DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the RCE filed on 1/13/21.
	Claims 1-3, 5-9, 21-23, and 25-33 are pending.

Response to Arguments
	The applicant argues that the cited references do not teach an audio recording of the speech. New grounds of rejection addressing the amended limitations are presented below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5-8, 21, 22, and 25-32 are rejected under 35 U.S.C. 103 as being unpatentable over Duwenhorst (US 20140035920) in view of Carney et al. (US 20130198642, herein “Carney”) in view of Issa et al. (US 20120066592, herein “Issa”) in view of Gibbon et al. (US 20120323575, herein “Gibbon”).
	Regarding claim 1, Duwenhorst teaches a device comprising:
a processor (computer system and processor (fig. 7; [0081])); and
a memory in communication with the processor (fig. 7; memory [0085]), the memory comprising executable instructions that, when executed by the processor, cause the processor to control the device (fig. 7) to perform functions of:
capturing a first enhancement element (e.g., a displayable second segment type [0011]) for an audio recording (audio data as a function of time [0006] including recorded audio data of speakers [0009]) while an audio is being recorded to generate the audio recording (receiving digital audio data [0006]), the first enhancement element comprising visual content (displayable second segment type which may be displayed, such as upon selection of a visual indicator of the rendered audio recording [0011]) contextually relevant to content of the audio recording at a first time (relation with primary audio content [0006]);
associating the first enhancement element with a portion of the audio recording at the first time (the displayable second segment type associated with a selectable segment of the audio data [0011]);
causing a first visual representation of the audio recording to be displayed via a graphical user interface (GUI) (displayed amplitude waveform [0029] displayed on a display along with selectable identifiers within the visual representation of the audio data [0011]);
identifying the first enhancement element associated with the audio recording (additional information about a particular audio segment, such as transcribed content of the segment [0058]).

Even though Duwenhorst discloses retrieving audio data from a particular file [0033], Duwenhorst fails to specifically teach and/or make abundantly clear capturing, at a first time of recording the speech, a first enhancement element for the audio recording. 

It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the recording audio data for analysis of Carney with the analyzing audio data of Duwenhorst to have capturing, at a first time of recording the speech, a first enhancement element for the audio recording. The combination would allow for, according to the motivation of Carney, analyzing a live broadcast by recording it for analysis so that supplemental content may be identified and presented to the user in a visual manner, such as by providing an image preview of content associated with a point in time of the broadcast ([0089] to [0092]). As such, the user may conveniently be able to select the preview of the supplemental content if the user desires and conveniently render supplemental content related to the preview of the supplemental content associated with the relevant time segment in the recorded audio [0093]. 

However, Duwenhorst in view of Carney fails to specifically teach causing a second visual representation of the first enhancement element to be displayed via the GUI, the second visual representation being displayed along with the first visual representation.
Yet, in a related art, Issa discloses previews of the enhancement item displayed as selectable links, each link rendered on the display as a visual representation of the corresponding additional content (fig. 3), the visual previews displayed along with the visual representation of the audio data visually representing a chronological sequence of corresponding audio data (e.g., the scrollable portion of fig. 3 represents a chronology of the audio of program narrative 300) (fig. 3).

Furthermore, Issa teaches and/or makes abundantly clear:
capturing a first enhancement element for an audio recording (based on a live broadcast such as a broadcast received from a talk radio station, examine the streaming audio content such as a live feed [0013] to capture a first enhancement element (e.g., video) corresponding with audio data such as terms of reference within the audio recording [0025], such that the enhancement element (e.g., video) may be displayed upon the user request; generate an audio recording such as a recording that may be transcribed using, e.g., a natural language process to detect terms of the audio recording [0016]), the first enhancement element comprising visual content contextually relevant to content of the audio recording (associated corresponding additional content ([0004], [0011], [0012], [0015], and [0016]), the additional content comprising visual content such as retrieved additional content including, e.g., video, at a first time (chronology of enhancement data corresponding with selectable preview links and corresponding sequence within audio data (fig. 3));
	associating the first enhancement element with a portion of the audio recording at the first time (synchronization of previews such as “left tackle” and corresponding additional content comprising the respective enhancement elements which are rendered upon selection of a given preview link at the chronologically/synchronized time associated with the respective portion of the audio recording (figs. 2 and 3));
	causing a first visual representation of the audio recording to be displayed via a graphical user interface (GUI) (the chronologically oriented, scrollable presentation on the display of fig. 3 provides a visual representation of the sequential content of the audio data, such as a chronological presentation of audio data corresponding with, e.g., “Adam Schefter,” “left tackle,” and “Miami Dolphins,” the order of which is a visual representation of the content presented within the audio of program narrative 300 (fig. 3); in other words, the display of fig. 3 is a visual representation of the audio recording corresponding with the audio of program narrative 300);
	identifying the first enhancement element associated with the audio recording (identifying additional content (e.g., supplemental video) associated with relevant text of the recorded audio ([0036] to [0038]));
	causing a second visual representation of the first enhancement element to be displayed via the GUI (a visual audio link such as the image of the “left tackle” corresponding to additional content ([0016] and fig. 3); identify visual content corresponding with visual audio links based on the contextually-based terms of relevance of the audio recording, the visual content related to additional, contextually relevant content ([0016] and [0019]) such as a visual representation (e.g., “left tackle” and corresponding image) linked to cause a display of additional content based on a user selection (figs. 2 and 3)); for instance, fig. 3 shows a visual image of a “left tackle” displayed on the display, such as at a first time (each of the identified visual content elements corresponds with a particular point in time of the recorded audio such that a visual element may be displayed that corresponds with the relevant audio portion ([0021] and [0022])), the second visual representation being displayed along with the first visual representation (the preview links displayed along with the first visual representation of the scrollable presentation corresponding with audio data (fig. 3));
	receiving a first user input to select the displayed second visual representation (user selection of a visual content element link/representation associated with a corresponding portion of the recorded audio ([0023] and [0024])); and
	in response to the received first user input, causing the first enhancement element to be displayed via the GUI (once selected, the selected visual representation is used as a reference to retrieve additional content associated with the selected additional content, the additional content may comprise video, audio, text, webpage, etc. [0025]).

	However, Duwenhorst in view of Carney in view of Issa fails to specifically teach recording a speech by a person to generate an audio recording of the speech.
	Yet, in a related art, Gibbon discloses recording data of audio speakers such that audio/speaker content may be recorded in, e.g., an audio conference, such as including audible communications among humans [0018], the audio data recorded and monitored for speaker audio content, and further analyzed such as for determining visual representations of the audio content ([0019] and [0020]). 
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the recording speech by a person such as by identifying words for the recorded audio of the speech of Gibbon with the recording content for determining associated content and corresponding timeline of Duwenhorst in view of Carney in view of Issa to have recording a speech by a person to generate an audio recording of the speech. The combination would allow for, according 
Furthermore, Gibbon teaches and/or makes abundantly clear:
	capturing, at a first time of recording the speech (identifying words within the audio to be associated with particular speakers [0030]; a recording or audio such as audio including speaker information ([0019] and [0020])), a first enhancement element for the audio recording (visual detailed content and/or context of a given audio segment such that the detailed content and/or context of the segment may be displayed based on a user selection [0023]; see also ([0020] and [0021])), the first enhancement element comprising visual content contextually relevant to content of the audio recording at the first time (visualized detailed content and/or context of the audio segment ([0023]; see also [0021] and [0032] to [0035]));
	adding the first enhancement element to the audio recording (include the visual representations with the recording for presentation with corresponding presentation information such as when the content was presented ([0021] and [0022])), the first enhancement element being associated with a portion of the audio recording at the first time (the determined spoken content being associated with portions of the audio conference such that the visual representations may be presented corresponding with the current audio data played to the user, such as the audio data being played in real time during a conference audio presentation [0024]);
	causing a first visual representation of the audio recording to be displayed via a graphical user interface (GUI) (as the audio is presented real time ([0031] to [0036]) such that the audio is presented 
	identifying the first enhancement element associated with the audio recording (identify detailed content and/or context of the segment and present selectable visualization(s) [0023]);
	causing a second visual representation of the first enhancement element to be displayed via the GUI, the second visual representation being displayed along with the first visual representation (e.g., a selectable visualization such that a user may select (e.g., roll over with a cursor) a selectable layered text corresponding with an interactive visualization [0023]);
	receiving a first user input to select the displayed second visual representation (e.g., a user rolling a cursor over the selectable layered text [0023]); and
	in response to the received first user input, causing the first enhancement element to be displayed via the GUI (cause to display the detailed content and/or context of the segment [0023]). 

Regarding claim 2, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Duwenhorst teaches the device of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the device to perform functions of:
causing at least a second portion of the audio recording at a second time to be transcribed (scripts identified from corresponding audio speech ([0003] to [0016]); received audio data including transcription information [0033]; transcribed audio data [0035] by producing transcription information); and
causing text from the transcribed second portion of the audio recording to be identified (performing transcription using the audio data [0037] identifying text (e.g., speech segments of the audio data) [0038]; identifying words, phrases, sentences, etc. [0038]).

Furthermore, Issa teaches and/or makes abundantly clear:
causing at least a second portion of the audio recording at a second time to be transcribed (transcribing terms of relevance at various time points within the digital audio content [0019]);
causing text from the transcribed second portion of the audio recording to be identified (terms of relevance identified from the digital audio recording [0004]); and
causing a third visual representation of a second enhancement element to be displayed via the GUI along with the first and second visual representations (each of the visual representations of the additional content corresponding with enhancement elements are displayed on the display in a manner that corresponds with the sequence in which they are presented in the digital audio recording (fig. 3)), the second enhancement element comprising the text (the additional content associated with the corresponding term of relevance [0004]).

Regarding claim 5, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Issa teaches the device of claim 1, wherein the first enhancement element comprises at least one of a video and photo (the additional content comprising, e.g., video [0025]).

Regarding claim 6, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Duwenhorst teaches the device of claim 1, wherein the first enhancement element includes a timestamp indicating the first time at which the first enhancement element was introduced to the audio recording (the enhancement entity items include transcribed audio data including their location within the audio data, and a duration of the respective segments [0038]; as such, the information which may be displayed based on the user input to enhance the waveform presentation may include additional information ([0048] and [0049]), the additional information further including, e.g., a location and duration of each enhancement entity within the audio data [0036]).

Furthermore, Issa teaches and/or makes abundantly clear the first enhancement element includes a timestamp indicating the first time at which the first enhancement element was introduced to the audio recording (keep track of the time in the audio recording in which a term of relevance was mentioned such that a respective enhancement element may be presented at a time indicating the time at which the respective enhancement element was presented in the audio corresponding with the relevant term of relevance mentioned in the audio [0029]).

Regarding claim 7, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Duwenhorst teaches the device of claim 1, wherein the first enhancement element includes a display duration for which the first enhancement element is to be displayed along with the first visual representation via the display (a user can hover over a particular segment of the visual representation corresponding with the waveform and for the duration that the hover is 

Furthermore, Issa teaches and/or makes abundantly clear the first enhancement element includes a display duration for which the first enhancement element is to be displayed along with the first visual representation via the display (restricting access to the additional content to particular times ([0027] and [0039])).

Regarding claim 8, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Issa teaches the device of claim 1, wherein:
the first enhancement element comprises at least one of a photo and video (the additional content may be, e.g., video [0025]); and
the second visual representation comprises a preview of the first enhancement element (fig. 3 shows a visual representation such as the image of a “left tackle” providing a preview of the additional content corresponding with the left tackle, for instance).

Regarding claim 21, claim 21 recites similar limitations as claim 1 – see rejection rationale above.
For instance, Gibbon teaches:
recording a speech by a person to generate an audio recording of the speech (using a processor system (figs. 1 to 3) to record audio content and analyze speech of the recorded content (figs. 4 and 5));
capturing, at a first time of recording the speech, a first enhancement element for the audio recording, the first enhancement element comprising visual content contextually relevant to content of the audio recording at the first time (related to audio content such as words corresponding with particular speakers [0030], supplemental content including, e.g., sequential visualizations, are identified [0031]);
adding the first enhancement element to the audio recording (visualizations generated for the recorded audio including temporally sequential visualizations for each word associated with particular time segments of the audio such as beginning and ending of speaking segments for particular speakers ([0031] to [0033])), the first enhancement element being associated with a portion of the audio recording at the first time (visualizations associated with particular time segments ([0031] to [0034])),
causing a first visual representation of the audio recording to be displayed via a graphical user interface (GUI) (present the visualizations to the user [0036] such as displaying the visualizations [0048]);
identifying the first enhancement element associated with the audio recording (in the hardware system (figs. 1 to 3) identify speech segments for spoken content associated with speakers, such as for determining determined content (fig. 4) and, in particular, each enhancement element may be visualized selectively such as upon selection by a user [0023]);
causing a second visual representation of the first enhancement element to be displayed via the GUI, the second visual representation being displayed along with the first visual representation (the visual representation of the recorded audio may also be presented with a selectable layer [00890, presented as a second visualization such as by presenting selectable text as the second visualization that is presented along with the visual representation of the audio [0023]);
receiving a first user input to select the displayed second visual representation (a user selection of the selectable layered text such that the user may select the selectable layered text, resulting in a display of additional content [0023]); and
in response to the received first user input, causing the first enhancement element to be displayed via the GUI (a display of the detailed content and/or context of the segment [0023], the detailed content and context having been determined with respect to the analyzed audio content, such as the content and context of the speech in a segment [0034]). 

Regarding claim 22, claim 22 recites similar limitations as claim 2 – see rejection rationale above.

Regarding claim 25, claim 25 recites similar limitations as claim 5 – see rejection rationale above.

Regarding claim 26, claim 26 recites similar limitations as claim 6 – see rejection rationale above.

Regarding claim 27, claim 27 recites similar limitations as claim 7 – see rejection rationale above.

Regarding claim 28, claim 28 recites similar limitations as claim 8 – see rejection rationale above.

Regarding claim 29, Duwenhorst teaches a non-transitory computer readable medium containing instructions which, when executed by a processor (user system architecture ([0081] and fig. 7); memory and processor ([0081] to [0085])), cause a computer to perform functions of:
Claim 29 recites similar limitations as claim 1 – see rejection rationale above.
For instance, Gibbon teaches:
recording a speech by a person to generate an audio recording of the speech (recorded audio corresponding with speech (figs. 4 and 5));
capturing, at a first time of recording the speech, an enhancement element for the audio recording (detailed content and/or context corresponding with the recorded audio [0023]), the enhancement element comprising visual content contextually relevant to content of the audio recording at the first time (visualized detailed content and/or context that may be rendered upon a user selection of a visualization [0023]);
adding the first enhancement element to the audio recording, the enhancement element being associated with a portion of the audio recording at the first time (storing the detailed content and/or context in association with a given segment of the audio such that the enhancement item may be visually rendered upon the user making a selection [0023]). 

Regarding claim 30, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 29, as explained above.
Furthermore, Issa teaches the non-transitory computer readable medium of claim 29, wherein:
the enhancement element comprises at least one of a photo and video (the additional content may comprise, e.g., video [0025]); and 
the second visual representation comprises a preview of the enhancement element (fig. 3 shows a link representative of the additional content that may be displayed upon selection of the preview link).

Regarding claim 31, claim 31 recites similar limitations as claim 7 – see rejection rationale above.

Regarding claim 32, claim 32 recites similar limitations as claim 6 – see rejection rationale above.



	Claims 3 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Duwenhorst in view of Carney in view of Issa in view of Gibbon, as applied above, and further in view of Sano et al. (US 20130308922, herein “Sano”).
Regarding claim 3, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claims 1 and 2, as explained above.
Furthermore, Duwenhorst teaches the device of claim 2, wherein the text comprises at least one of a name, website, event time, event location and hashtag (as for event time and event location, Duwenhorst discloses transcription information identifying speech segments of the audio data including their location within the audio data and a duration of the segments [0038]).

Furthermore, Issa teaches a name (e.g., Jake Long (fig. 3)).

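However, Duwenhorst in view of Carney in view of Issa in view of Gibbon fails to specifically teach a website.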
	Yet, in a related art, Sano discloses that the transcript may include hyperlinks to enhance the experience of the user, such as by allowing a particular website or other content to be linked within the transcript, and the user may be enabled to click on the link to navigate to the content corresponding with the link (e.g., website) [0088].
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the including a website entity within the transcript/audio of Sano with the transcribing audio and providing representations of entities within the audio/transcript of Duwenhorst in view of Carney in view of Issa in view of Gibbon to have website. The combination would allow for, according to the motivation of Sano, enabling the user to access a diverse array of content within the transcript, such as by allowing the user to access a particular website in addition to the other content, such as by enabling the user to click on a link and, as an additional benefit, automatically link to the content (e.g., website) [0088].

Regarding claim 23, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 22, as explained above.
Furthermore, Duwenhorst teaches the method of claim 22, wherein the text comprises at least one of a name, website, event time, event location and hashtag (as for event time and event location, Duwenhorst discloses transcription information identifying speech segments of the audio data including their location within the audio data and a duration of the segments [0038]).

	Furthermore, Issa teaches a name (e.g., Jake Long of fig. 3).

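However, Duwenhorst in view of Carney in view of Issa in view of Gibbon fails to specifically teach a website.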
	Yet, in a related art, Sano discloses that the transcript may include hyperlinks to enhance the experience of the user, such as by allowing a particular website or other content to be linked within the transcript, and the user may be enabled to click on the link to navigate to the content corresponding with the link (e.g., website) [0088].
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the including a website entity within the transcript/audio of Sano with the transcribing audio and providing representations of entities within the audio/transcript of Duwenhorst in view of Carney in view of Issa in view of Gibbon to have website. The combination would allow for, according to the motivation of Sano, enabling the user to access a diverse array of content within the transcript, such as by allowing the user to access a particular website in addition to the other content, such as by enabling the user to click on a link and, as an additional benefit, automatically link to the content (e.g., website) [0088].


Claims 9 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Duwenhorst in view of Carney in view of Issa in view of Gibbon, as applied above, and in view of Bennett et al. (US 20100247061, herein “Bennett”).
Regarding claim 9, Duwenhorst in view of Carney in view of Issa in view of Gibbon teaches the limitations of claim 1, as explained above.
Furthermore, Duwenhorst teaches the device of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the device to perform functions of:
receiving a second user input for selecting an editing mode (the user may input a modification request to modify the presented locations of the transcribed audio content (i.e., enhancement items) with respect to its presentation on the display [0050]);
	in response to receiving the second user input, causing the first enhancement element to be presented at an original location on the first visual representation (the user may select a slider with the audio data displayed at a current location with respect to time and intensity; subsequently, the user input may cause the data presentation to change with respect to the audio segments; for example, the audio data may first be presented with respect to positions corresponding to words whereas a subsequent user input may cause the enhancement entity (e.g., word or sentence) to be presented at a second location on the waveform such as presenting the segments with respect to words at different locations/segments than the sentences were previously presented [0051]);
receiving a third user input to modify the location of the first enhancement element on the first visual representation (the user may modify the locations of the transcribed audio text (i.e., enhancement entities) presented corresponding with the waveform as it is currently being viewed, such as according to words or sentences based on a user input edit request [0051]); and
in response to receiving the third user input, causing the first enhancement element presented at the modified location on the first visual representation (cause the words (or sentences) to be presented at particular locations on the waveform with respect to the displayed segments according to the user input [0051]).

Furthermore, Carney teaches and/or makes abundantly clear receiving a second user input for selecting an editing mode (user selection of the supplemental content (e.g., a selection of the timeline for adjusting the presentation of the timeline’s supplemental content), such as a user control/input to control the supplemental content to be synchronized with the primary content [0093]);
in response to receiving the second user input, causing the first enhancement element to be presented at an original location on the first visual representation (input to control the supplemental content view to be, e.g., synchronized with the primary content, such as by the user selecting a resume key [0093]);
receiving a third user input to modify the location of the first enhancement element on the first visual representation (modify the view/location of the supplemental content by selecting keys associated with the supplemental content [0093]); and
in response to receiving the third user input, causing the first enhancement element presented at the modified location on the first visual representation (the supplemental content view modification [0093]).

Furthermore, Issa teaches and/or makes abundantly clear:
receiving a second user input for selecting an editing mode (receive input from a user such as input to the license rights manager [0039]);
in response to receiving the second user input (in response to receiving, e.g., user keys, determine a license for viewing the additional content [0039]), causing the first enhancement element to be presented at an original location on the first visual representation (cause the additional content to be presented on the display [0039]).

However, Duwenhorst in view of Carney in view of Issa in view of Gibbon fails to specifically teach receiving a third user input to modify the location of the first enhancement element on the first visual representation; and in response to receiving the third user input, causing the first enhancement element presented at the modified location on the first visual representation.

It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the repositioning of the supplemental content of Bennett with the providing dynamic supplemental content of Duwenhorst in view of Carney in view of Issa in view of Gibbon to have receiving a third user input to modify the location of the first enhancement element on the first visual representation; and in response to receiving the third user input, causing the first enhancement element presented at the modified location on the first visual representation. The combination would allow for, according to the motivation of Bennett, the user to conveniently manage and control the presentation and consumption and therefore enjoyment of an expanding amount of content that the user may like to consume based on the variety of media content being streamed and made available to the user, thus affording the user an adequate means to identify and enjoy such information in an efficient, coherent, and timely manner [0006], further improving the user’s viewing experience ([0026] and [0027]).

Regarding claim 33, claim 33 recites similar limitations as claim 9 – see rejection rationale above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.

	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.
/JASON T EDWARDS/Examiner, Art Unit 2144