DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on July 15, 2020. 
Claims 1-2, 4, 6-11, 13-16, and 18-24 are pending in the application. As such, claims 1-2, 4, 6-11, 13-16, and 18-24 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on July 15, 2020.  These drawings have been accepted and considered by the Examiner.
Claim Objections
Claim 21 is objected to because of the following informalities:  Claim 21 uses “the query” whereas all other claims use “the user query”.  Claim 21 should be corrected to provide consistency throughout the claims.  Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6-9, 11, 13-16, 18-19 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi et al. (US Patent Pub. No. 2013/0080159), hereinafter Sharifi, in view of Taboriskiy et al. (US Patent Pub. No. 2016/0306797), hereinafter Taboriskiy.

Regarding claim 1, Sharifi teaches a computer-implemented method (Sharifi [0027] Audio identification server 110 includes a memory that stores computer executable components and a processor that executes computer executable components stored in the memory, a non-limiting example of which can be found with reference to FIG. 13), 
comprising: 
analyzing spoken audio content associated with an audio presentation to identify one or more entities addressed in the audio presentation (Sharifi [0045] Fingerprint component 220 also provides a fingerprint generator component 430 that generates digital fingerprints for extracted audio segments. The fingerprint component can also employ voice recognition to convert voice within an audio segment, either spoken or singing, to text. The digital fingerprint and/or the text are employed by matching component 230 in order to identify data records in the library that match the audio segment); 
receiving a user query during playback of the audio presentation (Sharifi [0052] In one embodiment, client interface component 170 receives a query from client device 160 to provide identification information associated with an audio portion of a broadcast media stream); 
generating a response to the user query (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)), 
wherein determining if the user query is directed to the audio presentation or generating the response to the user query uses the identified one or more entities (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)).
Sharifi does not teach
and determining if the user query is directed to the audio presentation, 
and if the user query is determined to be directed to the audio presentation, 
Taboriskiy teaches
and determining if the user query is directed to the audio presentation (Taboriskiy [0059] In a more particular example, FIG. 4 shows an illustrative screen that includes an interface that prompts the user with one or more trigger terms or search initiating keywords for initiating a query relating to the presented media content in accordance with some implementations of the disclosed subject matter. As shown, in some implementations, a mobile device 410 can present interface 420 that prompts the user to speak the trigger terms “OK Smart TV” or “Hey TV” to initiate a query relating to the media content being presented on a media playback device. In some implementations, interface 420 can be updated to indicate different trigger terms that can be used to initiate a subsequent search (e.g., a follow-up question by one of the users being presented with the media content)).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the current audio. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).

Regarding claim 6, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
further comprising determining one or more suggestions using the identified one or more entities based on a particular point in the audio presentation (Sharifi [0052] In one embodiment, client interface component 170 receives a query from client device 160 to provide identification information associated with an audio portion of a broadcast media stream. For example, a viewer at client device 160 may be watching a show and hear a song that she would like identified. The viewer then initiates a request for identification information associated with the audio portion of the broadcast media stream in which she heard the song. Client device 160 sends a query to client interface component 170 providing information about the audio portion of the broadcast media stream. For example, this information can include an identifier for the broadcast media stream, such as a channel identifier, and further include an identifier of where the audio portion is located in the broadcast media stream. This location identifier can include a timestamp or counter value associated with the broadcast media stream. Moreover, the timestamp or counter value may indicate a single point in the stream, or a time or counter range. Given this information about the audio portion received in the query, client interface component 170 examines the metadata associated with the broadcast media stream by identification component 120 in order to provide identification information associated with creative works located in audio segments at or near the timestamp or counter value provided in the query).

Regarding claim 7, Sharifi in view of Taboriskiy teaches the method of claim 6, 
further comprising presenting the one or more suggestions on an assistant device (Sharifi [0050] Client device 160 can be any type of device that receives broadcast media streams, for example, mobile phone, personal data assistant, laptop computer, tablet computer, desktop computer, server system, cable set top box, satellite set top box, cable modem, television set, media extender device, video cassette recorder device, blu-ray device, DVD (digital versatile disc or digital video disc) device, compact disc device, video game system, audio/video receiver, radio device, portable music player, navigation system, car stereo, etc.; [0056] In another, non-limiting implementation, the viewer can manually initiate the query through an input device, non-limiting examples of which are described below in relation to FIG. 13. Query component 510 further receives identification information from audio identification server 110 in response to the query. Client device 160 further outputs the received identification information, e.g., displaying the received identification information on a display device)
during playback of the particular point in the audio presentation by the assistant device (Sharifi [0036] The audio identification server 110, in addition to proactively determining identification information, can push the identification information to client devices. In this manner, the client device 160 is be able to respond immediately to a viewer request for identification information. For example, the identification information can be pushed to client devices 160 in the background of respective media streams. The identification information can be masked or unmasked as a function of user requests for such information. The identification information can be displayed concurrently with the media stream, before, or after playback of the media stream. The identification information can be filtered as a function of historical, demographic, or other metrics associated with user viewing or preferences).

Regarding claim 8, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
further comprising preprocessing responses to one or more potential user queries prior to receiving the user query using the identified one or more entities (Sharifi [0036] The audio identification server 110, in addition to proactively determining identification information, can push the identification information to client devices. In this manner, the client device 160 is be able to respond immediately to a viewer request for identification information. For example, the identification information can be pushed to client devices 160 in the background of respective media streams. The identification information can be masked or unmasked as a function of user requests for such information. The identification information can be displayed concurrently with the media stream, before, or after playback of the media stream. The identification information can be filtered as a function of historical, demographic, or other metrics associated with user viewing or preferences).

Regarding claim 9, Sharifi in view of Taboriskiy teaches the method of claim 8.
Sharifi further teaches
wherein generating the response to the user query includes using a preprocessed response from the one or more preprocessed responses to generate the response to the user query (Sharifi [0036] The audio identification server 110, in addition to proactively determining identification information, can push the identification information to client devices. In this manner, the client device 160 is be able to respond immediately to a viewer request for identification information. For example, the identification information can be pushed to client devices 160 in the background of respective media streams. The identification information can be masked or unmasked as a function of user requests for such information. The identification information can be displayed concurrently with the media stream, before, or after playback of the media stream. The identification information can be filtered as a function of historical, demographic, or other metrics associated with user viewing or preferences).

Regarding claim 11, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
further comprising buffering audio data from the audio presentation prior to receiving the user query (Sharifi [0057] In order to compensate for delay in user initiating a request for identification information, client device 160 can continuously buffer a predetermined amount of recorded audio associated with a broadcast media stream and send this buffer of recorded audio with the query. Audio identification server 110 then employs, this buffer of recorded audio to determine identification information, for example, in the manner it performs for received broadcast media streams. Accordingly, rather than a manual process in which a user attempts to capture a snippet using his phone, in one embodiment an automated process is employed in which the client device is continuously buffering audio from the media stream, so the client device is ready to send a relevant snippet when the user initiates a query),
wherein analyzing the spoken audio content associated with the audio presentation includes 
analyzing spoken audio content from the buffered audio data after receiving the user query to identify one or more entities addressed in the buffered audio data (Sharifi [0057] In order to compensate for delay in user initiating a request for identification information, client device 160 can continuously buffer a predetermined amount of recorded audio associated with a broadcast media stream and send this buffer of recorded audio with the query. Audio identification server 110 then employs, this buffer of recorded audio to determine identification information, for example, in the manner it performs for received broadcast media streams. Accordingly, rather than a manual process in which a user attempts to capture a snippet using his phone, in one embodiment an automated process is employed in which the client device is continuously buffering audio from the media stream, so the client device is ready to send a relevant snippet when the user initiates a query), 
and wherein determining if the user query is directed to the audio presentation or generating the response to the user query uses the identified one or more entities addressed in the buffered audio data (Sharifi [0057] In order to compensate for delay in user initiating a request for identification information, client device 160 can continuously buffer a predetermined amount of recorded audio associated with a broadcast media stream and send this buffer of recorded audio with the query. Audio identification server 110 then employs, this buffer of recorded audio to determine identification information, for example, in the manner it performs for received broadcast media streams. Accordingly, rather than a manual process in which a user attempts to capture a snippet using his phone, in one embodiment an automated process is employed in which the client device is continuously buffering audio from the media stream, so the client device is ready to send a relevant snippet when the user initiates a query).
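
For illustration only, the continuous buffering described in Sharifi [0057] can be sketched as a fixed-size buffer that keeps the most recent audio and is attached to the query when the user initiates one. This is a minimal sketch under stated assumptions: the names AudioBuffer, push, snapshot, and build_query, the chunk representation, and the query fields are hypothetical and are not taken from the reference.

```python
from collections import deque


class AudioBuffer:
    """Keeps only the most recent audio chunks (hypothetical helper)."""

    def __init__(self, max_chunks):
        # A deque with maxlen silently discards the oldest chunk when full.
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk):
        """Append one chunk of recorded audio bytes."""
        self._chunks.append(chunk)

    def snapshot(self):
        """Return the buffered audio, oldest chunk first."""
        return b"".join(self._chunks)


def build_query(buffer, stream_id):
    """Bundle the buffered audio with a stream identifier (assumed fields)."""
    return {"stream_id": stream_id, "audio": buffer.snapshot()}


# Usage: buffer continuously, then send the recent audio with the query.
buf = AudioBuffer(max_chunks=3)
for chunk in (b"a", b"b", b"c", b"d"):
    buf.push(chunk)
query = build_query(buf, stream_id="channel-7")
```

Because the buffer holds a predetermined amount of audio, the oldest chunk is dropped once the limit is reached, so the query carries only the most recent audio.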

Regarding claim 13, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi in view of Taboriskiy teaches determining if the user query is directed to the audio presentation; however, Sharifi in view of Taboriskiy does not teach
wherein determining if the user query is directed to the audio presentation includes 
determining if the user query is directed to the audio presentation using the identified one or more entities.
Taboriskiy further teaches
determining if the user query is directed to the audio presentation using the identified one or more entities (Taboriskiy [0062] Referring back to FIG. 3, in some implementations, in response to receiving one or more trigger terms and a query phrase, process 300 can determine media playback information associated with the media content presented during the receipt of the query at 370. For example, in response to receiving one or more trigger terms (e.g., “OK Smart TV”) and a query phrase (e.g., “how old is this actor?”), the mobile application executing on the mobile device that received the query can request media playback information corresponding to the received query from the media playback application executing on the media playback device).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the current audio using the entity. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).

Regarding claim 14, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
wherein generating the response to the user query includes 
generating the response to the user query using the identified one or more entities (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)).

Regarding claim 15, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
wherein determining if the user query is directed to the audio presentation includes 
determining if the user query is directed to a particular point in the audio presentation (Sharifi [0039] Referring to FIG. 3, a non-limiting implementation of an audio tagging component 210 is illustrated. The audio tagging component 210 infers or determines distinct audio segments within the broadcast media streams in connection with tagging searchable creative works identification information. For example, a distinct audio segment that is tagged may be a segment of the broadcast media stream where a song is playing in the background, or a line or series of lines spoken by an actor. It is to be appreciated that audio segments may overlap each other, and thus a hierarchy of tags can be employed to facilitate pinpoint identification of audio segments or portions thereof. Audio tagging component 210 includes audio segment component 310 which infers or determines audio segments within the media streams. Furthermore, audio tagging component 210 includes metadata component 320 that associates metadata with the media streams. The metadata can provide a vast array of information in connection with identification of creative works--further details regarding the metadata is provided below).

Regarding claim 16, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
wherein determining if the user query is directed to the audio presentation includes 
determining if the user query is directed to a particular entity in the audio presentation (Sharifi [0039] Referring to FIG. 3, a non-limiting implementation of an audio tagging component 210 is illustrated. The audio tagging component 210 infers or determines distinct audio segments within the broadcast media streams in connection with tagging searchable creative works identification information. For example, a distinct audio segment that is tagged may be a segment of the broadcast media stream where a song is playing in the background, or a line or series of lines spoken by an actor. It is to be appreciated that audio segments may overlap each other, and thus a hierarchy of tags can be employed to facilitate pinpoint identification of audio segments or portions thereof. Audio tagging component 210 includes audio segment component 310 which infers or determines audio segments within the media streams. Furthermore, audio tagging component 210 includes metadata component 320 that associates metadata with the media streams. The metadata can provide a vast array of information in connection with identification of creative works--further details regarding the metadata is provided below).

Regarding claim 18, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
wherein receiving the user query is performed on an assistant device (Sharifi [0050] Client device 160 can be any type of device that receives broadcast media streams, for example, mobile phone, personal data assistant, laptop computer, tablet computer, desktop computer, server system, cable set top box, satellite set top box, cable modem, television set, media extender device, video cassette recorder device, blu-ray device, DVD (digital versatile disc or digital video disc) device, compact disc device, video game system, audio/video receiver, radio device, portable music player, navigation system, car stereo, etc.).
Sharifi in view of Taboriskiy teaches a user query and determining if the user query is directed to the audio presentation; however, Sharifi in view of Taboriskiy does not teach
and wherein determining if the user query is directed to the audio presentation includes 
determining that the user query is directed to the audio presentation rather than a general query directed to the assistant device.
Taboriskiy teaches
determining that the user query is directed to the audio presentation rather than a general query directed to the assistant device (Taboriskiy [0059] In a more particular example, FIG. 4 shows an illustrative screen that includes an interface that prompts the user with one or more trigger terms or search initiating keywords for initiating a query relating to the presented media content in accordance with some implementations of the disclosed subject matter. As shown, in some implementations, a mobile device 410 can present interface 420 that prompts the user to speak the trigger terms “OK Smart TV” or “Hey TV” to initiate a query relating to the media content being presented on a media playback device. In some implementations, interface 420 can be updated to indicate different trigger terms that can be used to initiate a subsequent search (e.g., a follow-up question by one of the users being presented with the media content)).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the current audio. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).

Regarding claim 19, Sharifi in view of Taboriskiy teaches the method of claim 18.
Sharifi in view of Taboriskiy teaches a user query and determining if the user query is directed to the audio presentation; however, Sharifi in view of Taboriskiy does not teach
wherein determining if the user query is directed to the audio presentation further includes
determining that the user query is directed to the assistant device rather than a non-query utterance.
Taboriskiy teaches
determining that the user query is directed to the assistant device rather than a non-query utterance (Taboriskiy [0057] In response to converting ambient sounds to one or more text inputs, process 300 can determine whether the text inputs include a trigger term that corresponds to a request to initiate a query relating to the presented media content and a query phrase at 360. For example, in response to receiving the text inputs of “OK Smart TV, how old is this actor?,” process 300 can determine whether one or more trigger terms have been received to initiate a query. In this example, the text input “OK Smart TV” can be determined to match one or more trigger terms stored on the mobile device for initiating a query relating to presented media content. In a more particular example, process 300 can ignore text inputs from ambient sounds until one or more trigger terms have been received).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the assistant device. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).
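
For illustration only, the trigger-term gating described in Taboriskiy [0057] can be sketched as a check that ignores a text input unless it begins with a stored trigger term and is followed by a query phrase. This is a minimal sketch under stated assumptions: the name extract_query, the TRIGGER_TERMS constant, and the prefix-matching rule are hypothetical, and only the example terms “OK Smart TV” and “Hey TV” come from the cited passages.

```python
# Trigger terms named in the cited passages; storing them lowercase is an assumption.
TRIGGER_TERMS = ("ok smart tv", "hey tv")


def extract_query(text_input):
    """Return the query phrase that follows a trigger term, or None.

    None means the input is treated as ambient speech and ignored.
    """
    stripped = text_input.strip()
    for term in TRIGGER_TERMS:
        # Case-insensitive comparison of the leading characters only.
        if stripped[:len(term)].lower() == term:
            # Drop the trigger term plus any separating punctuation or spaces.
            remainder = stripped[len(term):].lstrip(" ,.:;!?")
            return remainder or None
    return None


# Usage with the example utterance from the cited passage.
phrase = extract_query("OK Smart TV, how old is this actor?")
ignored = extract_query("what a nice day")
```

In this sketch an utterance containing only a trigger term also returns None, since the cited passage looks for both a trigger term and a query phrase.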

Regarding claim 23, Sharifi teaches a computer-implemented method (Sharifi [0027] Audio identification server 110 includes a memory that stores computer executable components and a processor that executes computer executable components stored in the memory, a non-limiting example of which can be found with reference to FIG. 13), 
comprising: 
during playback of an audio presentation including spoken audio content (Sharifi [0045] Fingerprint component 220 also provides a fingerprint generator component 430 that generates digital fingerprints for extracted audio segments. The fingerprint component can also employ voice recognition to convert voice within an audio segment, either spoken or singing, to text. The digital fingerprint and/or the text are employed by matching component 230 in order to identify data records in the library that match the audio segment), 
receiving a user query (Sharifi [0052] In one embodiment, client interface component 170 receives a query from client device 160 to provide identification information associated with an audio portion of a broadcast media stream); 
generating a response to the user query (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)), 
wherein determining if the user query is directed to the audio presentation or generating the response to the user query uses one or more entities identified from analysis of the audio presentation (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)).
Sharifi does not teach
and determining if the user query is directed to the audio presentation, 
and if the user query is determined to be directed to the audio presentation, 
Taboriskiy teaches
and determining if the user query is directed to the audio presentation (Taboriskiy [0059] In a more particular example, FIG. 4 shows an illustrative screen that includes an interface that prompts the user with one or more trigger terms or search initiating keywords for initiating a query relating to the presented media content in accordance with some implementations of the disclosed subject matter. As shown, in some implementations, a mobile device 410 can present interface 420 that prompts the user to speak the trigger terms “OK Smart TV” or “Hey TV” to initiate a query relating to the media content being presented on a media playback device. In some implementations, interface 420 can be updated to indicate different trigger terms that can be used to initiate a subsequent search (e.g., a follow-up question by one of the users being presented with the media content)).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the current audio. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).

Regarding claim 24, Sharifi teaches a computer-implemented method, 
comprising: 
during playback of an audio presentation including spoken audio content (Sharifi [0045] Fingerprint component 220 also provides a fingerprint generator component 430 that generates digital fingerprints for extracted audio segments. The fingerprint component can also employ voice recognition to convert voice within an audio segment, either spoken or singing, to text. The digital fingerprint and/or the text are employed by matching component 230 in order to identify data records in the library that match the audio segment), 
buffering audio data from the audio presentation (Sharifi [0057] In order to compensate for delay in user initiating a request for identification information, client device 160 can continuously buffer a predetermined amount of recorded audio associated with a broadcast media stream and send this buffer of recorded audio with the query. Audio identification server 110 then employs, this buffer of recorded audio to determine identification information, for example, in the manner it performs for received broadcast media streams. Accordingly, rather than a manual process in which a user attempts to capture a snippet using his phone, in one embodiment an automated process is employed in which the client device is continuously buffering audio from the media stream, so the client device is ready to send a relevant snippet when the user initiates a query)
and receiving a user query (Sharifi [0052] In one embodiment, client interface component 170 receives a query from client device 160 to provide identification information associated with an audio portion of a broadcast media stream); 
after receiving the user query, analyzing spoken audio content from the buffered audio data to identify one or more entities addressed in the buffered audio data (Sharifi [0057] In order to compensate for delay in user initiating a request for identification information, client device 160 can continuously buffer a predetermined amount of recorded audio associated with a broadcast media stream and send this buffer of recorded audio with the query. Audio identification server 110 then employs, this buffer of recorded audio to determine identification information, for example, in the manner it performs for received broadcast media streams. Accordingly, rather than a manual process in which a user attempts to capture a snippet using his phone, in one embodiment an automated process is employed in which the client device is continuously buffering audio from the media stream, so the client device is ready to send a relevant snippet when the user initiates a query); 
generating a response to the user query (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)), 
wherein determining if the user query is directed to the audio presentation or generating the response to the user query uses the identified one or more entities (Sharifi [0066] At 1030, a response to the query is provided (e.g. to a client device) including identification information associated with creative works located in audio segments at or near the timestamp or counter value/range provided in the query (e.g. by a client interface component)).
Sharifi does not teach
and determining if the user query is directed to the audio presentation, 
and if the user query is determined to be directed to the audio presentation, 
Taboriskiy teaches
and determining if the user query is directed to the audio presentation (Taboriskiy [0059] In a more particular example, FIG. 4 shows an illustrative screen that includes an interface that prompts the user with one or more trigger terms or search initiating keywords for initiating a query relating to the presented media content in accordance with some implementations of the disclosed subject matter. As shown, in some implementations, a mobile device 410 can present interface 420 that prompts the user to speak the trigger terms “OK Smart TV” or “Hey TV” to initiate a query relating to the media content being presented on a media playback device. In some implementations, interface 420 can be updated to indicate different trigger terms that can be used to initiate a subsequent search (e.g., a follow-up question by one of the users being presented with the media content)).
Taboriskiy is considered to be analogous to the claimed invention because it is in the same field of processing queries relating to presented media content. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy to allow for determining that a request is directed to the current audio. Doing so would provide the user with additional information relating to the presented answer (Taboriskiy [0068]).

Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Taboriskiy in further view of Abebe et al. (US Patent Pub. No. 2018/0069914), hereinafter Abebe.

Regarding claim 2, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi further teaches
wherein analyzing the spoken audio content associated with the audio presentation includes: 
executing speech recognition processing on the spoken audio content to generate transcribed text (Sharifi [0045] Fingerprint component 220 also provides a fingerprint generator component 430 that generates digital fingerprints for extracted audio segments. The fingerprint component can also employ voice recognition to convert voice within an audio segment, either spoken or singing, to text. The digital fingerprint and/or the text are employed by matching component 230 in order to identify data records in the library that match the audio segment).
Sharifi in view of Taboriskiy does not teach
and executing natural language processing on the transcribed text to identify the one or more entities.
Abebe teaches
and executing natural language processing on the transcribed text to identify the one or more entities (Abebe [0027] The synchronizer 202 may forward contents of the input signal to one or both of the speech recognizer 204 and the text parser 206. To this end, the synchronizer 202 may comprise a splitter or demultiplexer. For instance, if the input signal contains an audio signal recorded from a television broadcast, the synchronizer 202 may forward the audio signal to the speech recognizer 204 for further processing. The speech recognizer 204 performs one or more speech recognition techniques (e.g., automatic speech recognition, natural language processing, etc.) on the audio signal in order to recognize the words contained in the audio signal. To this end, the speech recognizer 204 may be programmed to recognize speech in one or more languages. The speech recognizer 204 may further include speech-to-text capabilities for producing a text transcription of the audio signal. This text transcription may be forwarded to the text parser 206).
Abebe is considered to be analogous to the claimed invention because it is in the same field of enhancing digital media with supplemental contextually relevant content. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in further view of Abebe to allow for using natural language processing. Doing so would provide a user with contextually relevant supplemental content about a character, object, or location in a timely manner.

Regarding claim 4, Sharifi in view of Taboriskiy in view of Abebe teaches the method of claim 2.
Sharifi further teaches
wherein receiving the user query is performed on an assistant device (Sharifi [0050] Client device 160 can be any type of device that receives broadcast media streams, for example, mobile phone, personal data assistant, laptop computer, tablet computer, desktop computer, server system, cable set top box, satellite set top box, cable modem, television set, media extender device, video cassette recorder device, blu-ray device, DVD (digital versatile disc or digital video disc) device, compact disc device, video game system, audio/video receiver, radio device, portable music player, navigation system, car stereo, etc.)
during playback of the audio presentation by the assistant device (Sharifi [0036] The audio identification server 110, in addition to proactively determining identification information, can push the identification information to client devices. In this manner, the client device 160 is be able to respond immediately to a viewer request for identification information. For example, the identification information can be pushed to client devices 160 in the background of respective media streams. The identification information can be masked or unmasked as a function of user requests for such information. The identification information can be displayed concurrently with the media stream, before, or after playback of the media stream. The identification information can be filtered as a function of historical, demographic, or other metrics associated with user viewing or preferences), 
and wherein at least one of executing the speech recognition processing and executing the natural language processing is performed prior to playback of the audio presentation (Sharifi [0036] The audio identification server 110, in addition to proactively determining identification information, can push the identification information to client devices. In this manner, the client device 160 is be able to respond immediately to a viewer request for identification information. For example, the identification information can be pushed to client devices 160 in the background of respective media streams. The identification information can be masked or unmasked as a function of user requests for such information. The identification information can be displayed concurrently with the media stream, before, or after playback of the media stream. The identification information can be filtered as a function of historical, demographic, or other metrics associated with user viewing or preferences).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Taboriskiy in further view of Parthasarathi et al. (US Patent Pub. No. 2017/0270919), hereinafter Parthasarathi.

Regarding claim 10, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi in view of Taboriskiy teaches determining if the user query is directed to the audio presentation, transcribing text from the audio presentation, and a user query; however, Sharifi in view of Taboriskiy does not teach
wherein determining if the user query is directed to the audio presentation includes 
providing transcribed text from the audio presentation and the user query to a neural network-based classifier trained to output an indication of whether a given user query is likely directed to a given audio presentation.
Parthasarathi teaches
providing transcribed text from the audio presentation and the user query to a neural network-based classifier trained to output an indication of whether a given user query is likely directed to a given audio presentation (Parthasarathi [0147] The server may also include an RNN encoder 950 for encoding data into a vector form as described above. The server may also include a model training component 2070 for training or retraining various model or classifiers discussed above. Various machine learning techniques may be used to perform various steps described above, such as training/retraining an RC, entity tagger, semantic parser, etc. Models may be trained and operated according to various machine learning techniques. Such techniques may include, for example, neural networks (such as deep neural networks and/or recurrent neural networks), inference engines, trained classifiers, etc. Examples of trained classifiers include Support Vector Machines (SVMs), neural networks, decision trees, AdaBoost (short for “Adaptive Boosting”) combined with decision trees, and random forests. Focusing on SVM as an example, SVM is a supervised learning model with associated learning algorithms that analyze data and recognize patterns in the data, and which are commonly used for classification and regression analysis; [0047] Once audio data corresponding to speech is identified, an ASR module 250 may convert the audio data 111 into text. The ASR transcribes audio data into text data representing the words of the speech contained in the audio data; [0130] FIG. 17 illustrates an example of classifying input audio data as desired speech or undesired speech using reference data that includes a wakeword. In this example a first user speaks an utterance “Alexa, play . . . some music.”).
Parthasarathi is considered to be analogous to the claimed invention because it is in the same field of speech processing systems using automatic speech recognition and natural language understanding. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in further view of Parthasarathi to allow for using neural network trained classifiers. Doing so would allow the system to distinguish desired speech from undesired speech (Parthasarathi [0030]).

Claims 20 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Taboriskiy in further view of Suga (US Patent Pub. No. 2010/0232759).

Regarding claim 20, Sharifi in view of Taboriskiy teaches the method of claim 1.
Sharifi teaches the user query; however, Sharifi in view of Taboriskiy does not teach
further comprising determining whether to pause the audio presentation in response to receiving the user query.
Suga teaches
further comprising determining whether to pause the audio presentation (Suga [0145] If the display changeover information represents ordinary TV program viewing ("YES" at S122), then the display-information discriminator 114 determines that the display information is a moving picture that cannot be paused (S143). On the other hand, if the display changeover information represents an embedded application ("YES" at S144), the display-information discriminator 114 checks the data format decoded in the data decoder 1108. If the format is the data format of a moving picture ("YES" at S145), then the display-information discriminator 114 determines that the display information is a moving picture that can be paused (i.e., a controllable moving picture) (S146))
in response to receiving the user query (Suga [0105] The print controller 112 detects occurrence of print error by receiving the response status indicating print error from the printer 2. Upon receiving response status, the print controller 112 [based on receiving a response, the print controller determines if display information is pauseable] acquires the type of display information, which is being displayed on the display unit 110 at this time, from the display-information discriminator 114. If the display information is a moving-picture broadcast program ("YES" at S34), then the print controller 112 starts storing the data (audio data, video data, etc.) of the moving-picture broadcast program, which is output from the broadcast receiver 11, in the storage unit 115 in concurrence with display of the data (S35). For example, data of an error message that notifies the user of occurrence of the print error is generated using the graphics generator 1903 of the display controller 19 and the error-message data is output to the image synthesizer 1902. As a result, the error message is displayed by being superimposed on the moving-picture program to notify the user of occurrence of the error and prompt recovery (S36)).
Suga is considered to be analogous to the claimed invention because it is in the same field of controlling a playback apparatus. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in further view of Suga to allow for pause determination based on receiving a signal. Doing so would provide a method of eliminating disruption of viewing by notifications and recording of unnecessary video.

Regarding claim 22, Sharifi in view of Taboriskiy in view of Suga teaches the method of claim 20.
Sharifi in view of Taboriskiy does not teach; however, Suga teaches
wherein determining whether to pause the audio presentation includes 
determining whether the audio presentation is being played on a pauseable device (Suga [0149] In a case where a response representing error has been received at S33, the print controller 112 acquires the type of display information, which is being displayed on the display unit 110 at this time, from the display-information discriminator 114. If the display information is display information indicative of a moving picture that cannot be paused ("YES" at S94), then the print controller 112 starts storing the display information (video and audio data), which is output from the broadcast receiver 11, in the storage unit 115 in concurrence with display of the information (S35). As mentioned above, a moving picture that cannot be paused refers to a moving picture, such as a moving-picture broadcast program, for which the pause operation cannot be controlled from the television receiver 1. The print controller 112 displays the error message by superimposing it on the moving-picture display information, as illustrated in FIG. 10A (S36)), 
the method further comprising: 
in response to determining that the audio presentation is not being played on a pauseable device, presenting the generated response without pausing the audio presentation (Suga [0149] In a case where a response representing error has been received at S33, the print controller 112 acquires the type of display information, which is being displayed on the display unit 110 at this time, from the display-information discriminator 114. If the display information is display information indicative of a moving picture that cannot be paused ("YES" at S94), then the print controller 112 starts storing the display information (video and audio data), which is output from the broadcast receiver 11, in the storage unit 115 in concurrence with display of the information (S35). As mentioned above, a moving picture that cannot be paused refers to a moving picture, such as a moving-picture broadcast program, for which the pause operation cannot be controlled from the television receiver 1. The print controller 112 displays the error message by superimposing it on the moving-picture display information, as illustrated in FIG. 10A (S36)); 
and in response to determining that the audio presentation is being played on a pauseable device, pausing the audio presentation and presenting the generated response while the audio presentation is paused (Suga [0152] It is preferred that the error message presented at this time also include a message to the effect that display of the moving picture has been paused automatically and will be resumed automatically when the operating state is restored from the normal state from the error state).
Suga is considered to be analogous to the claimed invention because it is in the same field of controlling a playback apparatus. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in further view of Suga to allow for pause determination. Doing so would provide a method of eliminating disruption of viewing by notifications and recording of unnecessary video.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Taboriskiy in view of Suga in further view of Carbune et al. (US Patent Pub. No. 2018/0061400), hereinafter Carbune.

Regarding claim 21, Sharifi in view of Taboriskiy in view of Suga teaches the method of claim 20.
Sharifi in view of Taboriskiy does not teach; however, Suga teaches
wherein determining whether to pause the audio presentation includes 
the method further comprising: 
presenting the generated response visually and without pausing the audio presentation (Suga [0149] In a case where a response representing error has been received at S33, the print controller 112 acquires the type of display information, which is being displayed on the display unit 110 at this time, from the display-information discriminator 114. If the display information is display information indicative of a moving picture that cannot be paused ("YES" at S94), then the print controller 112 starts storing the display information (video and audio data), which is output from the broadcast receiver 11, in the storage unit 115 in concurrence with display of the information (S35). As mentioned above, a moving picture that cannot be paused refers to a moving picture, such as a moving-picture broadcast program, for which the pause operation cannot be controlled from the television receiver 1. The print controller 112 displays the error message by superimposing it on the moving-picture display information, as illustrated in FIG. 10A (S36)); 
pausing the audio presentation and presenting the generated response while the audio presentation is paused (Suga [0152] It is preferred that the error message presented at this time also include a message to the effect that display of the moving picture has been paused automatically and will be resumed automatically when the operating state is restored from the normal state from the error state).
Suga is considered to be analogous to the claimed invention because it is in the same field of controlling a playback apparatus. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in further view of Suga to allow for pause determination. Doing so would provide a method of eliminating disruption of viewing by notifications and recording of unnecessary video.
Sharifi in view of Taboriskiy in view of Suga does not teach
determining whether the query can be responded to with a visual response,
in response to determining that the query can be responded to with a visual response, 
and in response to determining that the query cannot be responded to with a visual response.
Carbune teaches
determining whether the query can be responded to with a visual response (Carbune [0054] In various implementations, in response to textual input provided to the automated assistant 120 during a dialog that includes the automated assistant 120, the automated assistant 120 may generate responsive reply content based on the textual input and based on user state information. The automated assistant 120 may then provide, as output, the reply content for presentation (visual and/or audible) to one or more of the users involved in the dialog).
Carbune is considered to be analogous to the claimed invention because it is in the same field of presenting a response to a user request using an assistant. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharifi in view of Taboriskiy in view of Suga in further view of Carbune to allow for determining if the reply content is visual or audible. Doing so would allow for taking into account user state information, avoiding undue utilization of computational resources and/or other technical problems in various situations (Carbune [0042]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL J. MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657