DETAILED ACTION
Introduction
This office action is in response to Applicant’s amendment filed on July 28, 2022. 
Claims 1-20 are pending in the application. Claims 1, 8 and 15 have been amended. As such, claims 1-20 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In view of the original abstract having only 145 words, as well as the accompanying explanations of record, the objection to the abstract has been withdrawn.
The amendments to claims 1, 8 and 15 have been acknowledged and entered.
In view of the amendments to claims 1, 8 and 15, the rejections of claims 1-20 under 35 U.S.C. 103 have been withdrawn.
In light of the amendments to the claims, new grounds of rejection of claims 1-20 under 35 U.S.C. 103 are set forth below.
Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. 103, received on July 28, 2022, have been fully considered.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  See new rejections below based on combinations of references.
Additionally, Applicant’s further arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 6-9, 12, 13, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Turner et al. (US Patent Pub. No. 2020/0226642), hereinafter Turner, in view of Means, Jr. et al. (US Patent Pub. No. 2013/0110565), hereinafter Means, in further view of Gauci (US Patent Pub. No. 2017/0011740).

Regarding claim 1, Turner discloses a system (Turner [0039] - System 100 may include one or more devices 102) comprising: 
one or more processors configured by executable instructions (Turner [0006] - may comprise a memory storing instructions and a processor configured to execute the instructions) to perform operations comprising: 
presenting, by the one or more processors, 
text of the transcribed audio content (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example), 

identifying, by the one or more processors, a plurality of keywords in the text of the transcribed audio content (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags); 

determining, by the one or more processors, 
based on a first one or more keywords of the plurality of keywords (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags), 

at least one first image to associate with (Turner [0064] - For example, API 504 may facilitate the reception and transmission of advertising content such as audio, video, and image advertising content by advertising storage 108 via network 104) 

a first time (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example);
determining, by the one or more processors, 
based on second one or more keywords of the plurality of keywords (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example), 

at least one second image to associate with (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example) 

a second time (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example);

and storing, by the one or more processors (Turner [0058] - It should be noted that software 302 may be stored on a memory of device 102 and executed by a processor. Moreover, it should be noted that internal cache and other memory space of the memory may be used by software 302 as swap space for performing advertising and/or digital content processing, for example, as well as space for storing intermediate processing files and configuration settings for processing, for example, by the one or more processors), 
to generate enhanced audio content (Turner [0044] - Device 102 may then insert such advertising content into the obtained video and/or audio content, generating finalized content),

the at least one first image in association with the audio content and the first time, and the at least one second image in association with the audio content and the second time (Turner [0085] -For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example).  
Turner discloses text of a transcript and audio content. However, Turner does not disclose a user interface or a timeline:
presenting, by the one or more processors, in a user interface, text of a transcript,
the user interface further including a timeline representative of at least a portion of the transcribed audio content.
Turner discloses a first time and first one or more keywords. However, Turner does not disclose a user interface or a timeline:
a first time in the timeline in the user interface, 
wherein the first time in the timeline is associated with the first one or more keywords.
Turner discloses a second time and second one or more keywords. However, Turner does not disclose a user interface or a timeline:
a second time in the timeline in the user interface, 
wherein the second time in the timeline is associated with the second one or more keywords.
Turner discloses a first image and a first time in association with the audio content, and a second image and a second time in association with the audio content. However, Turner does not disclose a timeline:
the at least one first image in association with the audio content and the first time in the timeline, 
and the at least one second image in association with the audio content and the second time in the timeline.  
Turner discloses text of a transcript, audio content, and a processor for presenting. However, Turner does not disclose transcribing, or presenting contemporaneously with the transcribing:
“transcribing”, by the one or more processors, audio content received from an audio source;
presenting, by the one or more processors, “contemporaneously with the transcribing,”
Turner discloses a processor for performing the adding, as well as keywords and first and second images. However, Turner does not disclose a timeline, a user interface, or images determined based on keywords:
adding, by the one or more processors, “to the timeline in the user interface,” a first visible indication of the at least one “first image determined based on the first one or more keywords,” and a second visible indication of the at least one “second image determined based on the second one or more keywords;”

However, Means does disclose
a user interface (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment),

the user interface further including a timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact with, review, and analyze user activity including audio.

Turner in view of Means discloses a user interface, a timeline, text of a transcript, audio content, and a processor for presenting. However, Turner in view of Means does not disclose transcribing, presenting contemporaneously with the transcribing, or images determined based on keywords:
“transcribing”, by the one or more processors, audio content received from an audio source;
presenting, by the one or more processors, “contemporaneously with the transcribing,”
adding, by the one or more processors, to the timeline in the user interface, a first visible indication of the at least one “first image determined based on the first one or more keywords,” and a second visible indication of the at least one “second image determined based on the second one or more keywords.”
Gauci teaches
transcribing, by the one or more processors, audio content received from an audio source; presenting, by the one or more processors, in a user interface, contemporaneously with the transcribing, (Gauci [0065] Alternatively, speech-to-text units 14 may transcribe the speech and generate transcript 16 as the video conference is executed. Communication server 22 may retrieve the combined media stream in real-time (e.g., as the combined media stream is generated and transmitted, communication server 22 may simultaneously process the combined media stream for generation and/or amendment of transcript 16). During the video conference, speech-to-text units 14 may this continually transcribe speech into text and update transcript 16 while the end users are communicating. Translation unit 17 may also continually translate the necessary text before the transcript is sent to the end users. In this manner, transcript 16 may be continually updated to include recently generated text. Transcript 16 may thus be updated for each end user as new text is added or segments (e.g., words, phrases, or sentences) of transcript 16 may be transmitted to each user as the segments are generated),

adding…a first visible indication of the at least one first image determined based on the first one or more keywords (Gauci [0039] In this manner, annotation module 23 (or one or more processors of server device 20, for example) may be configured to annotate the transcribed text for the audio component of each respective media sub-stream to include additional content. Annotation of the text may include determining one or more keywords of the text. The keywords may be nouns, pronouns, addresses, phone numbers, or any other words or phrases identified as important based on the context of the transcription and/or the frequency with which the word or phrase is used. The additional content for the transcription may be selected based on the one or more keywords. For example, the additional content may be a web element (e.g., a picture, text, or other feature) or a hyperlink (e.g., a link to a web element) selected based on the one or more keywords and inserted into the text. The additional content may be inserted in place of the one or more associated keywords or near the keyword. In other examples, the additional content may be one or more advertisements selected based on the one or more keywords. Annotation module 23, for example, may match an advertisement indexed within a database (e.g., a database stored within server device 20 or stored in a repository networked to server device 20) to the one or more keywords. The advertisement may be presented within the transcript or otherwise associated with the real-time communication session),

and a second visible indication of the at least one second image determined based on the second one or more keywords (Gauci [0039] In this manner, annotation module 23 (or one or more processors of server device 20, for example) may be configured to annotate the transcribed text for the audio component of each respective media sub-stream to include additional content. Annotation of the text may include determining one or more keywords of the text. The keywords may be nouns, pronouns, addresses, phone numbers, or any other words or phrases identified as important based on the context of the transcription and/or the frequency with which the word or phrase is used. The additional content for the transcription may be selected based on the one or more keywords. For example, the additional content may be a web element (e.g., a picture, text, or other feature) or a hyperlink (e.g., a link to a web element) selected based on the one or more keywords and inserted into the text. The additional content may be inserted in place of the one or more associated keywords or near the keyword. In other examples, the additional content may be one or more advertisements selected based on the one or more keywords. Annotation module 23, for example, may match an advertisement indexed within a database (e.g., a database stored within server device 20 or stored in a repository networked to server device 20) to the one or more keywords. The advertisement may be presented within the transcript or otherwise associated with the real-time communication session).
Gauci is considered to be analogous to the claimed invention because it is in the same field of text transcript generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means further in view of Gauci to provide for contemporaneously transcribing and adding images based on keywords. Doing so would allow a transcript (or a translated transcript from translation unit 17) to be provided to end users who indicate their desire to receive a transcription of the video conference.

Regarding claim 2, Turner in view of Means in view of Gauci discloses the system as recited in claim 1.
Turner further discloses the operations further comprising 
sending the enhanced audio content, including the audio content, the at least one first image, and the at least one second image, to at least one electronic device for presentation on the at least one electronic device (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content; also see [0044]).  

Regarding claim 6, Turner in view of Means in view of Gauci discloses the system as recited in claim 1.
Turner further discloses the operations further comprising 
sending the enhanced audio content to a computing device to be provided for download to a plurality of electronic devices (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content; also see [0044]).  

Regarding claim 7, Turner in view of Means in view of Gauci discloses the system as recited in claim 6.

Turner further discloses wherein: 
the enhanced audio content is configured to be played back at a respective electronic device to present the at least one first image and the at least one second image during playback of the enhanced audio content (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content; also see [0044])

and according to a timing corresponding to the first time (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example)

and the second time, respectively (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example).  
Turner discloses presenting the images according to a timing corresponding to the first time and the second time. However, Turner does not disclose the timeline:
and according to a timing corresponding to the first time in the timeline and the second time in the timeline, respectively.  

Means does disclose
the timeline, (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact with, review, and analyze user activity including audio.

Regarding claim 8, Turner discloses a system (Turner [0039] - System 100 may include one or more devices 102) comprising: 

one or more processors configured by executable instructions (Turner [0006] - may comprise a memory storing instructions and a processor configured to execute the instructions) 
to perform operations comprising: 

receiving audio content (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example); 

the audio content (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example); 

determining a plurality of keywords from the transcribed audio content (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags); 

associating respective keywords of the plurality of keywords with respective times representative of the portion of the audio content (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example); 

determining, based on at least one of the keywords (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags), 

a first image to associate with a first time corresponding to the audio content (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content [per [0064] advertising content may be an image], as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example);

and storing (Turner [0058] - It should be noted that software 302 may be stored on a memory of device 102 and executed by a processor. Moreover, it should be noted that internal cache and other memory space of the memory may be used by software 302 as swap space for performing advertising and/or digital content processing, for example, as well as space for storing intermediate processing files and configuration settings for processing, for example, by the one or more processors) 

the first image in association with the audio content to generate enhanced audio content (Turner [0044] - Device 102 may then insert such advertising content into the obtained video and/or audio content, generating finalized content).

Turner discloses a transcript of the audio content. However, Turner does not disclose transcribing the audio content:
transcribing the audio content

Turner discloses a transcript of the audio content. However, Turner does not disclose presenting, in a user interface and contemporaneously with the transcribing, a portion of the transcribed audio content and a timeline:
contemporaneously with the transcribing of the audio content, presenting in a user interface, a portion of the transcribed audio content and a timeline representative of the portion of the transcribed audio content.

Turner discloses associating respective keywords of the plurality of keywords with respective times corresponding to the audio content. However, Turner does not disclose a timeline:
associating respective keywords of the plurality of keywords with respective times in the timeline representative of the portion of the audio content.

Turner discloses at least one of the keywords and a first image corresponding to the audio content. However, Turner does not disclose a timeline:
determining, based on at least one of the keywords, a first image to associate with a first time in the timeline corresponding to the audio content.

Turner discloses at least one of the keywords and a first image corresponding to the audio content. However, Turner does not disclose a timeline, a user interface, or images determined based on keywords:
adding, to the timeline in the user interface, a visible indication of the first image determined based on the at least one of the keywords.

Turner discloses storing the first image in association with the audio content to generate enhanced audio content. However, Turner does not disclose a timeline:
and storing the first image in association with the audio content and the first time in the timeline to generate enhanced audio content.

However, Means does disclose
transcribing the audio content (Means [0022], lines 72-75 - transcribing audio)

a user interface (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment),


a timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact with, review, and analyze user activity including audio.

Turner in view of Means discloses a user interface, a timeline, text of a transcript, audio content, and a processor for presenting. However, Turner in view of Means does not disclose presenting contemporaneously with the transcribing, or images determined based on keywords:
transcribing the audio content; contemporaneously with the transcribing of the audio content, presenting in a user interface, a portion of the transcribed audio content and a timeline representative of the portion of the transcribed audio content;
adding, to the timeline in the user interface, a visible indication of the first image determined based on the at least one of the keywords.
Gauci teaches
transcribing audio content; contemporaneously with the transcribing of the audio content, presenting in a user interface, a portion of the transcribed audio content, (Gauci [0065] Alternatively, speech-to-text units 14 may transcribe the speech and generate transcript 16 as the video conference is executed. Communication server 22 may retrieve the combined media stream in real-time (e.g., as the combined media stream is generated and transmitted, communication server 22 may simultaneously process the combined media stream for generation and/or amendment of transcript 16). During the video conference, speech-to-text units 14 may this continually transcribe speech into text and update transcript 16 while the end users are communicating. Translation unit 17 may also continually translate the necessary text before the transcript is sent to the end users. In this manner, transcript 16 may be continually updated to include recently generated text. Transcript 16 may thus be updated for each end user as new text is added or segments (e.g., words, phrases, or sentences) of transcript 16 may be transmitted to each user as the segments are generated),

adding…in the user interface, a visible indication of the first image determined based on the first one or more keywords (Gauci [0039] In this manner, annotation module 23 (or one or more processors of server device 20, for example) may be configured to annotate the transcribed text for the audio component of each respective media sub-stream to include additional content. Annotation of the text may include determining one or more keywords of the text. The keywords may be nouns, pronouns, addresses, phone numbers, or any other words or phrases identified as important based on the context of the transcription and/or the frequency with which the word or phrase is used. The additional content for the transcription may be selected based on the one or more keywords. For example, the additional content may be a web element (e.g., a picture, text, or other feature) or a hyperlink (e.g., a link to a web element) selected based on the one or more keywords and inserted into the text. The additional content may be inserted in place of the one or more associated keywords or near the keyword. In other examples, the additional content may be one or more advertisements selected based on the one or more keywords. Annotation module 23, for example, may match an advertisement indexed within a database (e.g., a database stored within server device 20 or stored in a repository networked to server device 20) to the one or more keywords. The advertisement may be presented within the transcript or otherwise associated with the real-time communication session).
Gauci is considered to be analogous to the claimed invention because it is in the same field of text transcript generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means further in view of Gauci to provide for contemporaneously transcribing and adding images based on keywords. Doing so would allow a transcript (or a translated transcript from the translation unit) to be provided to end users who indicate their desire to receive a transcription of the video conference.

Regarding claim 9, Turner in view of Means in view of Gauci discloses the system as recited in claim 8.
Turner further discloses the operations further comprising sending the enhanced audio content, including the audio content and the first image, to an electronic device to cause, at least in part, an application executing on the electronic device to present the first image during presentation of the audio content (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content; also see [0044]).  

Regarding claim 12, Turner in view of Means in view of Gauci discloses the system as recited in claim 8.
Turner further discloses the operation of determining the plurality of keywords in the transcript further comprising referring to at least one of: a keyword data structure, or metadata associated with the audio content (Turner [0071] - The analysis may use metadata regarding the advertising and/or digital content).  

Regarding claim 13, Turner in view of Means in view of Gauci discloses the system as recited in claim 8.
Turner further discloses the operations further comprising sending the enhanced audio content to a computing device that provides the enhanced audio content for download to a plurality of electronic devices (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content; also see [0044]).  

Regarding claim 15, Turner discloses a computing device (Turner [0006] - may comprise a memory storing instructions and a processor configured to execute the instructions) comprising: 
a display (Turner [0039] - System 100 may include one or more devices 102. Device 102 may be, for example, a computer device, such as a server, desktop computer, laptop computer, tablet computer, mobile computer, mobile phone (e.g., PDA, cell phone, palmtop, etc.), mainframe, server, client, or any other type of special or general purpose computing device); 

a processor coupled to the display (Turner [0006] - may comprise a memory storing instructions and a processor configured to execute the instructions), 

the processor configured by executable instructions to perform operations comprising (Turner [0006] - may comprise a memory storing instructions and a processor configured to execute the instructions): 

receiving audio content at the computing device (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example); 

text of the audio content (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example); 

identifying, by the processor, a plurality of keywords in the text of the audio content (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags); 

determining, 
based on at least one of the keywords (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags), 

at least one image to present during presentation of the audio content on the computing device (Turner [0082] - keyword and/or topic data of the advertising content [per [0064] advertising content may be an image] and digital content may be analyzed to determine whether a match exists… one or more keyword tags)

at timing corresponding to a first time (Turner [0085] -For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example);

and presenting the at least one image (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags; also see [0044])

on the display (Turner [0039] - System 100 may include one or more devices 102. Device 102 may be, for example, a computer device, such as a server, desktop computer, laptop computer, tablet computer, mobile computer, mobile phone (e.g., PDA, cell phone, palmtop, etc.), mainframe, server, client, or any other type of special or general purpose computing device)

the at least the portion of the text of the audio content (Turner [0060] - Content management system logic 406 may also track data that describes digital content for targeting ads like content topic, content author, content key terms, and content transcript, for example).  

Turner discloses a transcript. However, Turner does not disclose generating text of a transcript;
transcribing at least a portion of the audio content to generate text of a transcript of the audio content

Turner discloses a transcript. However, Turner does not disclose displaying the transcript;
presenting on the display along with the transcript of the audio content

Turner discloses a transcript of the audio content. However, Turner does not disclose contemporaneously transcribing the audio content and presenting the resulting text in a user interface with a timeline;
contemporaneously with the transcribing of at least the portion of the audio content, presenting on the display in a user interface, at least a portion of the text of the audio content and a timeline representative of at least the portion of the text.

Turner discloses at least one of the keywords and a first image corresponding to the audio content. 
However, Turner does not disclose a timeline, a user interface, or images determined based on keywords;
adding, to the timeline in the user interface, a visible indication of the at least one image determined based on the at least one of the keywords.

Turner discloses presenting a first image at a first time of the audio content and the transcript on the display. However, Turner does not disclose a timeline or a user interface;
at timing corresponding to a first time in the timeline representative of at least a portion of the audio content; 
and presenting the at least one image in the user interface on the display along with the transcript of the audio content.

However, Means discloses 
transcribing at least a portion of the audio content to generate text of a transcript of the audio content (Means [0022] - transcribing audio)

presenting on the display along with the transcript of the audio content (Means [0022] - …analyzing an audio transcript…displaying individual analyzed data)

the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment);

the user interface on the display (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Turner in view of Means discloses a user interface, a timeline, text of a transcript, audio content, and a processor for presenting. However, Turner in view of Means does not disclose contemporaneous transcription or images determined based on keywords;
contemporaneously with the transcribing of at least the portion of the audio content, presenting on the display in a user interface, at least a portion of the text of the audio content and a timeline representative of at least the portion of the text.

adding, to the timeline in the user interface, a visible indication of the at least one image determined based on the at least one of the keywords.

Gauci teaches
contemporaneously with the transcribing of at least the portion of the audio content, presenting on the display in a user interface, at least a portion of the text of the audio content (Gauci [0065] Alternatively, speech-to-text units 14 may transcribe the speech and generate transcript 16 as the video conference is executed. Communication server 22 may retrieve the combined media stream in real-time (e.g., as the combined media stream is generated and transmitted, communication server 22 may simultaneously process the combined media stream for generation and/or amendment of transcript 16). During the video conference, speech-to-text units 14 may this continually transcribe speech into text and update transcript 16 while the end users are communicating. Translation unit 17 may also continually translate the necessary text before the transcript is sent to the end users. In this manner, transcript 16 may be continually updated to include recently generated text. Transcript 16 may thus be updated for each end user as new text is added or segments (e.g., words, phrases, or sentences) of transcript 16 may be transmitted to each user as the segments are generated),

adding…in the user interface, a visible indication of the at least one image determined based on the at least one of the keywords (Gauci [0039] In this manner, annotation module 23 (or one or more processors of server device 20, for example) may be configured to annotate the transcribed text for the audio component of each respective media sub-stream to include additional content. Annotation of the text may include determining one or more keywords of the text. The keywords may be nouns, pronouns, addresses, phone numbers, or any other words or phrases identified as important based on the context of the transcription and/or the frequency with which the word or phrase is used. The additional content for the transcription may be selected based on the one or more keywords. For example, the additional content may be a web element (e.g., a picture, text, or other feature) or a hyperlink (e.g., a link to a web element) selected based on the one or more keywords and inserted into the text. The additional content may be inserted in place of the one or more associated keywords or near the keyword. In other examples, the additional content may be one or more advertisements selected based on the one or more keywords. Annotation module 23, for example, may match an advertisement indexed within a database (e.g., a database stored within server device 20 or stored in a repository networked to server device 20) to the one or more keywords. The advertisement may be presented within the transcript or otherwise associated with the real-time communication session).
Gauci is considered to be analogous to the claimed invention because it is in the same field of text transcript generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means further in view of Gauci to provide for contemporaneously transcribing and adding images based on keywords. Doing so would allow a transcript (or a translated transcript from the translation unit) to be provided to end users who indicate their desire to receive a transcription of the video conference.


Regarding claim 20, Turner in view of Means in view of Gauci discloses the computing device as recited in claim 15. 
Turner discloses an association between the at least one image and the first time (Turner [0085] -For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example). 

Turner discloses an association between the at least one image and the first time. However, Turner does not disclose a user interface or a timeline that visually indicates this association;
wherein the user interface further presents the timeline to visually indicate an association between the at least one image and the first time in the timeline.   

Means does disclose 
wherein the user interface (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment)

further presents the timeline to visually indicate an association between an image and a first time in the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Claims 3-5, 10, 11, 14 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Turner in view of Means in view of Gauci in further view of Bilenko et al. (US Patent Pub. No. 2010/0208984), hereinafter Bilenko.

Regarding claim 3, Turner in view of Means in view of Gauci discloses the system as recited in claim 1.
Turner further discloses the operation of 
determining, based on the first one or more keywords (Turner [0082] - keyword and/or topic data of the advertising content and digital content may be analyzed to determine whether a match exists… one or more keyword tags), 

the at least one first image to associate with the first time (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content).

Turner discloses the at least one first image to associate with the first time. However, Turner does not disclose the timeline;
	the at least one first image to associate with the first time in the timeline

Turner does not disclose
further comprising inputting the first one or more keywords into a machine-learning model to determine, at least in part, based on an output of the machine-learning model, the at least one first image to associate with the first time in the timeline.

Means discloses
the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Means discloses a timeline. However, Turner in view of Means does not disclose 
further comprising inputting the first one or more keywords into a machine-learning model to determine, at least in part, based on an output of the machine-learning model, the at least one first image to associate with the first time in the timeline.

However, Bilenko does disclose
further comprising inputting the first one or more keywords into a machine-learning model (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.) 

to determine, at least in part, based on an output of the machine-learning model, the at least one first image (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.) to associate with the first time in the timeline.

Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for a machine learning algorithm for selecting data based on keywords. Doing so would provide flexibility to accommodate rapidly changing trends in keywords.

Regarding claim 4, Turner in view of Means in view of Gauci in view of Bilenko discloses the system as recited in claim 3.
Turner discloses 
the operations further comprising: 
at least one first image or the first time (Turner [0097] - For example, a content management system may specify one or more transmission destinations for stitched content in destination data associated with the selected digital content; [0087] Process 700 may include a step 724, where each piece of digital content and associated matching advertising content is processed to form a stitched, combined content).

Turner discloses at least one first image or the first time. However, Turner does not disclose receiving user input, the timeline, or the machine-learning model;
receiving a user input to change at least one of the at least one first image or the first time in the timeline; 
and updating the machine-learning model based at least in part on the user input.  

However, Means does disclose
receiving a user input (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment)

the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Means discloses receiving a user input and the timeline. However, Turner in view of Means does not disclose
and updating the machine-learning model based at least in part on the user input.  

However, Bilenko does disclose
receiving user input to change at least one of the first image or the first time and updating the machine-learning model based at least in part on the user input (Bilenko [0024] - the user's impression may be recorded in the form of a clickthrough response (i.e., clicking, hovering, etc.), stored in a click-through log 128. … The click-through log entries and their respective keyword pairs are then used to train the broad-match learning machine 120).  

Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for training a machine learning algorithm based on user input. Doing so would provide flexibility to accommodate rapidly changing trends in keywords.

Regarding claim 5, Turner in view of Means in view of Gauci in view of Bilenko discloses the system as recited in claim 3.
Turner in view of Means in view of Gauci does not disclose
the operations further comprising: 
receiving information regarding consumer interactions with the enhanced audio content on a plurality of electronic devices; 
and updating the machine-learning model based at least in part on receiving the information regarding the consumer interactions.

Bilenko discloses
the operations further comprising: 
receiving information regarding consumer interactions with the enhanced audio content on a plurality of electronic devices (Bilenko [0024] - the user's impression may be recorded in the form of a clickthrough response (i.e., clicking, hovering, etc.), stored in a click-through log 128. … The click-through log entries and their respective keyword pairs are then used to train the broad-match learning machine 120); 

and updating the machine-learning model based at least in part on receiving the information regarding the consumer interactions (Bilenko [0024] - the user's impression may be recorded in the form of a clickthrough response (i.e., clicking, hovering, etc.), stored in a click-through log 128. … The click-through log entries and their respective keyword pairs are then used to train the broad-match learning machine 120).

Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for training a machine learning algorithm based on user input. Doing so would provide flexibility to accommodate rapidly changing trends in keywords.

Regarding claim 10, Turner in view of Means in view of Gauci discloses the system as recited in claim 8.
Turner discloses 
the operation of determining, based on at least one of the keywords (Turner [0085] -For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example), 

the first image to associate with the first time (Turner [0085] -For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example).

Turner does not disclose
in the timeline corresponding to the audio content 

further comprising inputting the at least one keyword into a machine-learning model to determine, based at least in part on an output of the machine-learning model, the first image to associate with the first time in the timeline.  

Means discloses
the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).

Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Turner in view of Means does not disclose
further comprising inputting the at least one keyword into a machine-learning model to determine, based at least in part on an output of the machine-learning model, the first image to associate with the first time in the timeline.  

Bilenko discloses
further comprising inputting the at least one keyword into a machine-learning model (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.) 

to determine, based at least in part on an output of the machine-learning model, the first image to associate with the first time in the timeline (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.).  

Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means further in view of Bilenko to provide for a machine learning algorithm for selecting data based on keywords. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 11, Turner in view of Means in view of Gauci in view of Bilenko discloses the system as recited in claim 10.
Turner further discloses 
the operations further comprising: selecting, based on at least one second keyword (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example), 

a second image to associate with a second time (Turner [0085] - For example, one piece of digital content may be associated with multiple advertising content. For example, a first podcase may be associated with a first matching advertising content, as well as a second matching advertising content that is different to the first. The first matching advertising content may be associated with a first advertising slot, such as a pre-content advertising slot, for example. The second matching advertising content may be associated with a second advertising slot, such as a post-content advertising slot, for example).

Turner discloses a second keyword, a second image, and a second time. However, Turner does not disclose the timeline or receiving a user input, as recited in the following limitations:
selecting, based on at least one second keyword, a second image to associate with a second time in the timeline;
receiving a user input to change at least one of the second image or the second time in the timeline.

Turner does not disclose 
and updating the machine-learning model based at least in part on the user input.  

Means discloses
the timeline (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment);

receiving a user input (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).

Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Turner in view of Means does not disclose
and updating the machine-learning model based at least in part on the user input.  

Bilenko discloses
receiving user input to change at least one of the second image or the second time and updating the machine-learning model based at least in part on the user input (Bilenko [0024] - the user's impression may be recorded in the form of a clickthrough response (i.e., clicking, hovering, etc.), stored in a click-through log 128. … The click-through log entries and their respective keyword pairs are then used to train the broad-match learning machine 120).  
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means further in view of Bilenko to provide for training a machine learning algorithm based on user input. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 14, Turner in view of Means in view of Gauci discloses the system as recited in claim 8.
Turner in view of Means in view of Gauci does not disclose
the operations further comprising ranking the plurality of keywords based at least in part on a history of user interaction with previous enhanced audio content.

Bilenko discloses
the operations further comprising ranking the plurality of keywords based at least in part on a history of user interaction with previous enhanced audio content (Bilenko [0006] - A source keyword may be received multiple times and in response a machine-learning algorithm may be used to produce or train a ranker that ranks respective matching-keywords that have been determined to match the source keyword. A portion or unit of content may be generated based on one of the ranked matching-keywords. The content is transmitted via a network to a client device and a user's impression of the content is recorded. The machine-learning algorithm may continue to learn about matching-keywords for arbitrary source keywords from recorded impressions (e.g., clickthrough data) and in turn inform or train a ranking component that ranks keywords. The learning alters how the machine-learning algorithm evaluates matching-keywords determined to match the source keyword. It should be noted that "keyword" is used herein in a manner consistent with the meaning it conveys to those of ordinary skill in the art of keyword matching; "keyword" refers to a single word or a short phrase of words that form a semantic unit).
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for ranking of keywords. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 16, Turner in view of Means in view of Gauci discloses the computing device as recited in claim 15.
Turner does not disclose
the operations further comprising using a machine-learning model to select, at least in part, the at least one image based on the at least one keyword.

Bilenko does disclose
the operations further comprising using a machine-learning model to select, at least in part, the at least one image based on the at least one keyword (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.). 
 
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for a machine learning algorithm for selecting data based on keywords. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 17, Turner in view of Means in view of Gauci in view of Bilenko discloses the computing device as recited in claim 16.
Turner does not disclose
the operations further comprising receiving the machine-learning model with an application configured to present the user interface on the display.

Means does disclose
with an application configured to present the user interface on the display (Means [0090] - FIG. 3L depicts an exemplary embodiment of a diagram 396 of an exemplary supervisor interface management/reporting tool bar indicating exemplary member view, where various captured audio call activity, the audio level indicator may, e.g., but not limited to, allow the supervisor to play audio blocks, which actually contain audio, versus silence, etc., screen pane and web cam pane activity may be viewed via graphical user interface display of an exemplary timeline indicating call segments, and a selected magnified area for a selected call segment, including providing for adding comments by a coach/manager/supervisor, as well as review of the team member's action button selections, screen shot and web cam image of the team member, according to an exemplary embodiment).
Means is considered to be analogous to the claimed invention because it is in the same field of analyzing user activity including audio. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner further in view of Means to provide for presenting a timeline in a user interface. Doing so would provide users with the ability to interact, review and analyze user activity including audio.

Turner in view of Means does not disclose
the operations further comprising receiving the machine-learning model.

Bilenko does disclose
the operations further comprising receiving the machine-learning model (Bilenko [0023] - The advertisement platform 122 may receive input/source keywords from a variety of sources. In FIG. 2, a client application 124, hosted on a client computer, provides a source keyword, for example, in a search query string or other input. The advertisement platform, possibly after determining that the input keyword should be expanded with broad-matched keywords, passes the input keyword to the broad-match learning machine 120. The broad-match learning machine performs broad matching/ranking on the input keyword. The broad-match learning machine 120 returns one or more top-ranked broad-match keywords to the advertisement platform 122, which uses the returned broad-match keywords to select an advertisement. The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc.).
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for a machine learning algorithm for selecting data based on keywords. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 18, Turner in view of Means in view of Gauci discloses the computing device as recited in claim 15.
Turner does not disclose
the operation of identifying the plurality of keywords in the text further comprising referring to a keyword library stored on the computing device to determine the plurality of keywords.  

Bilenko does disclose
the operation of identifying the plurality of keywords in the text further comprising referring to a keyword library stored on the computing device to determine the plurality of keywords (Bilenko [0031] - FIG. 5 shows another example of broad-matching and online training. An input keyword is received 180 is received by a matching system 181 (executing on one or more computers). Matching keywords and respective hypotheses are found or selected 182, perhaps by a plurality of independent or integrated keyword matching algorithms. The hypotheses are applied 184 to the matching keywords to generate scores and rank the matching keywords. A top-ranked matching keyword may be selected 186, and a decision of the selection is recorded 188. It may be helpful to store information associating the selection with the particular hypothesis that was applied (as hypotheses may adapt over time in based on feedback). The recorded entry may be in the form of the input keyword (e.g., "kw1"), the selected 186 keyword (e.g., "bm-kw12"), and the hypothesis that was used (e.g., "h12")).
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for a machine learning algorithm which stores keywords with the particular hypothesis which was applied. Doing so would provide flexibility for rapidly changing trends in keywords.

Regarding claim 19, Turner in view of Means in view of Gauci discloses the computing device as recited in claim 15.
Turner does not disclose
the operations further comprising: sending one or more of the keywords to another computing device over a network; and receiving, over the network, the at least one image for presentation in the user interface.  

Bilenko does disclose
the operations further comprising: sending one or more of the keywords to another computing device over a network (Bilenko [0022] - One or more of the top-ranked broad-match keywords are returned or transmitted (e.g., via a network, bus, etc.) to the advertisement platform 122, which then uses the broad-match keywords to select one or more advertisements); 

and receiving, over the network, the at least one image for presentation in the user interface (Bilenko [0023] - The selected ad is returned to the client 124, for example in a web page, e-mail, embedded content, RSS feed, etc).
Bilenko is considered to be analogous to the claimed invention because it is in the same field of selecting data in response to keywords. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Turner in view of Means in view of Gauci further in view of Bilenko to provide for sending keywords over a network and receiving an advertisement in response. Doing so would provide flexibility in the functionality of the components and would allow the system to be arranged in a variety of ways.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached on 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J. MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657