DETAILED ACTION
This office action is in response to Applicant’s submission filed on 8/28/2020. Claims 1-21 are pending in the application. As such, claims 1-21 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 1/27/2022 has been considered by the examiner.

Drawings
The drawings filed on 8/28/2020 have been accepted and considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1, 7, 8, 15, 17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Labsky et al. (US20140058732A1) (hereinafter "Labsky").

Regarding claims 1, 8, and 15 Labsky teaches a system, device and method comprising: a processor of a client device; a memory configured to store a plurality of instructions, which, when executed, cause the processor to: (Labsky, Par. 0012:” One such embodiment comprises a computer program product that has a computer-storage medium [e.g., a non-transitory, tangible, computer-readable media, disparately located or commonly located storage media, computer storage media or medium, etc.] including computer program logic encoded thereon that, when performed in a computerized device having a processor and corresponding memory, programs the processor to perform [or causes the processor to perform] the operations disclosed herein. Such arrangements are typically provided as software, firmware, microcode, code data [e.g., data structures], etc., arranged or encoded on a computer readable storage medium such as an optical medium [e.g., CD-ROM], floppy disk, hard disk, one or more ROM or RAM or PROM chips, an Application Specific Integrated Circuit [ASIC], a field-programmable gate array [FPGA], and so on.”, and Par. 0016:” Also, it is to be understood that each of the systems, methods, apparatuses, etc. herein can be embodied strictly as a software program, as a hybrid of software and hardware, or as hardware alone such as within a processor, or within an operating system or within a software application, or via a non-software application such a person performing all or part of the operations.”).
encode an audio input signal received from a microphone, the audio input signal comprising a request; (Labsky, Par. 0056:” In step 300, the speech recognition response microphone.”, and Par. 0004:” For example, a speaker can utter a command to execute a specific task, or utter a query to retrieve specific results. Spoken input can follow a rigid set of phrases that perform specific tasks, or spoken input can be natural language, which is interpreted by a natural language unit of a speech recognition system.”).
transmit the encoded audio input signal to a cloud service that is configured to generate a primary response; (Labsky, Par. 0011:” The client device transmits [implied encoding] at least a portion of the spoken utterance over a communication network to a remote automated speech recognizer that analyzes spoken utterances and returns remote speech recognition results [primary response], such as by a network-accessible server.”).
determine an initial response to the request, the initial response corresponding to a first audio clip; (Labsky, Par. 0013:” …. prior to receiving a remote speech recognition result from the remote automated speech recognizer, initiating a response [initial response] via a user interface of the client electronic device, the response corresponding to the spoken utterance, at least an initial portion of the response is based on a local speech recognition result from the local automated speech recognizer …”).
render the first audio clip for presentation prior to receiving the primary response, the primary response corresponding to a second audio clip; and (Labsky, Par. 0027:” …With techniques herein, however, the client device 112 can initiate a response to spoken utterance prior to having any specific results [primary response]. For example, client device can analyze the spoken utterance 107 and identify that the user is searching for something. initiate a response via a user interface [render], such as with a text-to-speech system. In this non-limiting example, the local recognizer initiates producing or speaking word 151, ‘Searching the Internet For.’ These introductory or filler words are then modified by adding words 152, ‘Apple Pie Recipe,’ which are presented after words 151. With such a technique, a response to user input is initiated via a user interface prior to having complete results, and then the UI response is modified [in this example the UI response is added-to] to convey results corresponding to the spoken query, such as search results.”, and Par. 0011:”For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add [append] words to the response after receiving results from a remote server. An initial response then begins immediately instead of waiting for results from all recognizers. This reduces a perceived delay by the user because even with the initial response comprising filler or introductory words, the commencement of a response signals to the user that results have been retrieved and the client device is in the process of conveying the results.”).
append the second audio clip to follow the first audio clip, the second audio clip being presented after the presentation of the first audio clip. (Labsky, Par. 0011:” For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add [append] words to the response after receiving results from a remote server. An initial response then begins immediately instead of waiting for results from all recognizers. This reduces a perceived delay by the user because even with the initial response comprising filler or introductory words, the commencement of a response signals to the user that results have been retrieved and the client device is in the process of conveying the results.”, and Par. 0082:”In step 340, the speech recognition response manager modifies the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response [appending process] are delivered via the user interface [rendering] as a portion of the response. The modifications are based on the remote speech recognition result.”).

Regarding claim 7, Labsky teaches the system of claim 1, wherein the plurality of instructions, which, when executed, further cause the processor to generate the second audio clip from the primary response. (Labsky, Par. 0011:” For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add [append] words to the response after receiving results [second audio] from a remote server. An initial [primary] response then begins immediately instead of waiting for results from all recognizers. This reduces a perceived delay by the user because even with the initial response comprising filler or introductory words, the commencement of a response signals to the user that results have been retrieved and the client device is in the process of conveying the results.”, and Par. 0082:”In step 340, the speech recognition response manager modifies the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response [appending process] are delivered via the user interface [rendering] as a portion of the response. The modifications are based on the remote speech recognition result.”).

Regarding claim 17 Labsky teaches the client device of claim 15, further comprising: categorizing, by the local device, the audio input signal into a directional result, the directional result indicating whether the remote service is able to respond to the request, wherein the initial response is generated according to the directional result. (Labsky, Par. 0011:” The speech recognition response manager or client device receives a spoken utterance at a client electronic device. The client electronic device includes a local automated speech recognizer. The speech recognition response manager analyzes the spoken utterance using the local automated speech recognizer. The client device transmits at least a portion of the spoken utterance over a communication network to a remote automated speech recognizer that analyzes spoken utterances and returns remote speech recognition results, such as by a network-accessible server. Prior to receiving a remote speech recognition result from the remote automated speech recognizer, the speech recognition response manager initiates a response [initial response] via a user interface of the client electronic device. The response corresponds to the spoken utterance [directional response]. At least an initial portion of the response is based on a local speech recognition result [directional response] from the local automated speech recognizer. The speech recognition response manager can then modify the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response are delivered via the user interface as a portion of the response. Such modifications are based on the remote speech recognition result. 
For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add words to the response after receiving results from a remote server. An initial response then begins immediately instead of waiting for results from all recognizers.”) Note: directional result is attained when the continuum result from the server is provided as taught by Labsky as if the entire result is spoken from the same source.

Regarding claim 20, Labsky teaches the client device of claim 15, further comprising generating, by the local device, the second audio clip from the primary response. (Labsky, Par. 0011:” For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add [append] words to the response after receiving results [second audio] from a remote server. An initial [primary] response then begins immediately instead of waiting for results from all recognizers. This reduces a perceived delay by the user because even with the initial response comprising filler or introductory words, the commencement of a response signals to the user that results have been retrieved and the client device is in the process of conveying the results.”, and Par. 0082:”In step 340, the speech recognition response manager modifies the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response [appending process] are delivered via the user interface [rendering] as a portion of the response. The modifications are based on the remote speech recognition result.”).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2, 9, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Labsky, and in further view of Qi Zhang (US20210256057A1) (hereinafter "Zhang").

Regarding claims 2 and 16, Labsky fails to explicitly disclose, however, Zhang teaches wherein the first audio clip is a predetermined audio clip that is stored in the memory prior to the audio input signal being received from the microphone. (Zhang, claim 8:” A method for playing audio, applied to a server, the method comprising: receiving a response request sent by a terminal; obtaining, based on the response request, a plurality of response audio data; sending, in response to determining that at least one piece of response audio data of the plurality of response audio data is synthesized into a response audio clip, the response audio clip to the terminal until finishing sending the plurality of response audio data.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Zhang such that the first audio clip is a predetermined audio clip that is stored in the memory prior to the audio input signal being received from the microphone, in order to improve the accuracy of the information conveyed to the user during the voice playback of the terminal, as evidenced by Zhang (see Par. 0079).

Regarding claim 9, Labsky fails to explicitly disclose, however, Zhang teaches wherein the first audio clip is a predetermined audio clip that is stored in the memory prior to the input signal being received at the client device. (Zhang, claim 8:” A method for playing audio, applied to a server, the method comprising: receiving a response request sent by a terminal; obtaining, based on the response request, a plurality of response audio data; sending, in response to determining that at least one piece of response audio data of the plurality of response audio data is synthesized into a response audio clip, the response audio clip to the terminal until finishing sending the plurality of response audio data.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Zhang such that the first audio clip is a predetermined audio clip that is stored in the memory prior to the input signal being received at the client device, in order to improve the accuracy of the information conveyed to the user during the voice playback of the terminal, as evidenced by Zhang (see Par. 0079).

Regarding claim 10, Labsky fails to explicitly disclose, however, Zhang teaches wherein the first audio clip is selected from a library of predetermined audio clips stored in the memory. (Zhang, claim 8:” A method for playing audio, applied to a server, the method comprising: receiving a response request sent by a terminal; obtaining, based on the response request, a plurality of response audio data; sending, in response to determining that at least one piece of response audio data of the plurality of response audio data is synthesized into a response audio clip, the response audio clip to the terminal until finishing sending the plurality of response audio data.”). Note: plurality of the response audio data implies storage in the memory.
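Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Zhang such that the first audio clip is selected from a library of predetermined audio clips stored in the memory.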

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Labsky, and in further view of Jon Schiller (US20110035222A1) (hereinafter "Schiller").

Regarding claim 3, Labsky fails to explicitly disclose, however, Schiller teaches the system of claim 1, wherein the first audio clip is randomly selected from a library of predetermined audio clips stored in the memory. (Schiller, Par. 0018:” The electronic device can select which of several audio clips to play back using any suitable approach. In some embodiments, the user can direct an audio clip for playback. In some embodiments, the electronic device can instead randomly select an audio clip, or cycle through the available audio clips each time an audio clip for the text item is provided. In some embodiments, the electronic device can instead or in addition select an audio clip based on an attribute of a media item being played back. For example, the electronic device can select an audio clip based on an attribute [e.g., metadata] of the played back media, media playlist, past or future media, or any other suitable media item. In some embodiments, the electronic device can select an audio clip based on an attribute of the environment of the electronic device playing back the media.”)
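Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Schiller such that the first audio clip is randomly selected from a library of predetermined audio clips stored in the memory.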

Claims 4, 5, 11, 12, 14, 19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Labsky, and in further view of Aher et al. (US20200167384A1)(hereinafter "Aher").

Regarding claim 4, Labsky fails to explicitly disclose, however, Aher teaches the system of claim 1, wherein the plurality of instructions, which, when executed, further cause the processor to determine the initial response to the request by applying a Deep Neural Network (DNN) algorithm to the audio input signal to generate the initial response. (Aher, Par. 0032:” The control circuitry 304 may include audio processing circuitry and/or audio generation circuitry, other digital encoding or decoding circuitry, or any other suitable audio circuits or combinations of such circuits. Encoding circuitry [e.g., for converting received audio input or digital signals to audio signals for analysis or storage] may also be provided. The audio circuitry may be used by the media device 300 to receive, process, and generate audio input [e.g., the search query 104] or output [e.g., the audio response 108].”, and Par. 0043:”At block 418, the control circuitry 304 generates audio output using the voice profile of the personality [identified at block 414]. The audio output generated by the control circuitry 304 is an audio response 108 that includes the answer to the search query 104. In some embodiments, the audio response 108 further includes the phrase, tune, or jingle identified at block 416 [initial response]. For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Aher to further cause the processor to determine the initial response to the request by applying a Deep Neural Network (DNN) algorithm to the audio input signal to generate the initial response, in order to improve the user experience of interactive searching tools by having audio responses that are more interactive and contextually relevant to the search queries provided, as evidenced by Aher (see Par. 0002).

Regarding claim 5, Labsky further teaches wherein the DNN algorithm is configured to categorize the audio input signal into a directional result, the directional result indicating whether the cloud service is able to respond to the request, wherein the initial response is determined according to the directional result. (Labsky, Par. 0011:” The speech recognition response manager or client device receives a spoken utterance at a client electronic device. The client electronic device includes a local automated speech recognizer. The speech recognition response manager analyzes the spoken utterance using the local automated speech recognizer. The client device transmits at least a portion of the spoken utterance over a communication network to a remote automated speech recognizer that analyzes spoken utterances and returns remote speech recognition results, such as by a network-accessible server. Prior to receiving a remote speech recognition result from the remote automated speech recognizer, the speech recognition response manager initiates a response [initial response] via a user interface of the client electronic device. The response corresponds to the spoken utterance [directional response]. At least an initial portion of the response is based on a local speech recognition result [directional response] from the local automated speech recognizer. The speech recognition response manager can then modify the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response are delivered via the user interface as a portion of the response. Such modifications are based on the remote speech recognition result. For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add words to the response after receiving results from a remote server. 
An initial response then begins immediately instead of waiting for results from all recognizers.”) Note: directional result is attained when the continuum result from the server is provided as taught by Labsky as if the entire result is spoken from the same source.

Regarding claim 11, Labsky fails to explicitly disclose, however, Aher teaches the system of claim 1, wherein the processor is further configured to determine the initial response to the request by applying an artificial intelligence algorithm to the input signal to generate the initial response. (Aher, Par. 0032:” The control circuitry 304 may include audio processing circuitry and/or audio generation circuitry, other digital encoding or decoding circuitry, or any other suitable audio circuits or combinations of such circuits. Encoding circuitry [e.g., for converting received audio input or digital signals to audio signals for analysis or storage] may also be provided. The audio circuitry may be used by the media device 300 to receive, process, and generate audio input [e.g., the search query 104] or output [e.g., the audio response 108].”, and Par. 0043:”At block 418, the control circuitry 304 generates audio output using the voice profile of the personality [identified at block 414]. The audio output generated by the control circuitry 304 is an audio response 108 that includes the answer to the search query 104. In some embodiments, the audio response 108 further includes the phrase, tune, or jingle identified at block 416 [initial response]. For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks [AI] to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Aher such that the processor is further configured to determine the initial response to the request by applying an artificial intelligence algorithm to the input signal to generate the initial response, in order to improve the user experience of interactive searching tools by having audio responses that are more interactive and contextually relevant to the search queries provided, as evidenced by Aher (see Par. 0002).

client device receives a spoken utterance at a client electronic device. The client electronic device includes a local automated speech recognizer. The speech recognition response manager analyzes the spoken utterance using the local automated speech recognizer. The client device transmits at least a portion of the spoken utterance over a communication network to a remote automated speech recognizer that analyzes spoken utterances and returns remote speech recognition results, such as by a network-accessible server. Prior to receiving a remote speech recognition result from the remote automated speech recognizer, the speech recognition response manager initiates a response [initial response] via a user interface of the client electronic device. The response corresponds to the spoken utterance [directional response]. At least an initial portion of the response is based on a local speech recognition result [directional response] from the local automated speech recognizer. The speech recognition response manager can then modify the response after the response has been initiated and prior to completing delivery of the response via the user interface such that modifications to the response are delivered via the user interface as a portion of the response. Such modifications are based on the remote speech recognition result. For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add words to the response after receiving results from a remote server. An initial response then begins immediately instead of waiting for results from all recognizers.”, and Par. 0085:” In step 360, the speech recognition response manager identifies that the initiated response conveyed via a text-to-speech system is incorrect [binary zero] based on remote speech recognition results, and in response corrects [binary one] the initiated response using an audible excuse transition, such as word-based [“pardon me”] and/or otherwise [throat clearing sound].”) Note: directional result is attained when the continuum result from the server is provided as taught by Labsky as if the entire result is spoken from the same source.

Regarding claim 14, Labsky further teaches the client device of claim 11, wherein the second audio clip is received from the cloud service as part of the primary response. (Labsky, Par. 0011:” For example, in embodiments in which the user interface includes a text-to-speech system, the client device can begin speaking words as if the client device were already in possession of results, and then add [append] words to the response after receiving results from a remote server [cloud]. An initial response then begins immediately instead of waiting for results from all recognizers. This reduces a perceived delay by the user because even with the initial response comprising filler or introductory words, the commencement of a response signals to the user that results have been retrieved and the client device is in the process of conveying the results.”).

Regarding claim 19, Labsky fails to explicitly disclose, however, Aher teaches the client device of claim 15, wherein the local device comprises a locally executed artificial intelligence module configured to generate the initial response. (Aher, Par. 0032:” The control circuitry 304 may include audio processing circuitry and/or audio generation circuitry, other digital encoding or decoding circuitry, or any other suitable audio circuits or combinations of such circuits. Encoding circuitry [e.g., for converting received audio input or digital signals to audio signals for analysis or storage] may also be provided. The audio circuitry may be used by the media device 300 to receive, process, and generate audio input [e.g., the search query 104] or output [e.g., the audio response 108].”, and Par. 0043:”At block 418, the control circuitry 304 generates audio output using the voice profile of the personality [identified at block 414]. The audio output generated by the control circuitry 304 is an audio response 108 that includes the answer to the search query 104. In some embodiments, the audio response 108 further includes the phrase, tune, or jingle identified at block 416 [initial response]. For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks [AI] to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Aher such that the local device comprises a locally executed artificial intelligence module configured to generate the initial response, in order to improve the user experience of interactive searching tools by having audio responses that are more interactive and contextually relevant to the search queries provided, as evidenced by Aher (see Par. 0002).

Regarding claim 21, Labsky fails to explicitly disclose, however, Aher teaches the client device of claim 15, further comprising generating the initial response to the request by applying a Deep Neural Network (DNN) algorithm to the audio input signal to generate the initial response. (Aher, Par. 0032:” The control circuitry 304 may include audio processing circuitry and/or audio generation circuitry, other digital encoding or decoding circuitry, or any other suitable audio circuits or combinations of such circuits. Encoding circuitry [e.g., for converting received audio input or digital signals to audio signals for analysis or storage] may also be provided. The audio circuitry may be used by the media device 300 to receive, process, and generate audio input [e.g., the search query 104] or output [e.g., the audio response 108].”, and Par. 0043:” At block 418, the control circuitry 304 generates audio output using the voice profile of the personality [identified at block 414]. The audio output generated by the control circuitry 304 is an audio response 108 that includes the answer to the search query 104. In some embodiments, the audio response 108 further includes the phrase, tune, or jingle identified at block 416 [initial response]. For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky in view of Aher to further comprise generating the initial response to the request by applying a Deep Neural Network (DNN) algorithm to the audio input signal to generate the initial response.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Labsky and Aher, and in further view of Venkataraman et al. (US20170161320A1) (hereinafter "Venkataraman").

Regarding claim 6, Labsky fails to explicitly disclose, however, Aher teaches wherein the DNN algorithm is configured (Aher, Par. 0032:”… For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky, in view of Aher to wherein the DNN algorithm is configured, in order to improve the user experience of interactive searching tools by having audio responses that are more interactive and contextually relevant to the search queries provided, as evidenced by Aher (See Par. 0002).

Regarding claim 6, Labsky and Aher fail to explicitly disclose, however, Venkataraman teaches the system of claim 4, [[wherein the DNN algorithm is configured]] to identify a topic associated with the audio input signal, wherein the plurality of instructions, which, when executed, further cause the processor to identify the initial response based on the identified topic. (Venkataraman, Par. 0114:” In some embodiments control circuitry 904 generates a response to a natural language query as described by process 1100 of FIG. 11. At step 1102, the control circuitry receives, from a user input interface, a natural language query. “, and Par. 0116:” At step 1104, control circuitry 904 determines which query template of a plurality of query templates corresponds to the natural language query. As referred to herein, the term ‘query template’ refers to a generalized template for a specific type of query. For example, a basic query template may be ‘Show me <. . . >,’ where the dots represent what the user wants to be shown. A more complete query template may be ‘Show me <. . . >,’ directed by <. . . >. Thus, a natural language query ‘Show me movies [topic] directed by James Cameron will fit this query template. It should be noted that these two query templates are used as examples and other more complex query templates may be used by the system described.”, and Par. 0125:” At step 1106, control circuitry 904 retrieves one or more search results corresponding to the natural language query. The control circuitry may determine a search query based on the query template. To continue with example above, ‘show me’ may be excluded from the search because it is part of the template. So, the search string may include terms such as ‘movie,’ ‘directed,’ and ‘James Cameron.” When the search results are retrieved, the control circuitry may extract movie titles from the results or this can be done by another system before the results reach the control circuitry.”, and par. 0126:” At step 1108, control circuitry 904 selects, based on a selection criteria, one or more attributes of a plurality of attributes associated with a user. For example, process 1300 of FIG. 13 illustrates one possible method of selecting the one or more attributes. …. Attributes associated with users may be stored in a database on a server and cached locally to a user equipment device as required.”, and Par. 0134:” At step 1110, control circuitry 904 identifies, based on the one or more attributes, a response template of a plurality of response templates previously assigned to the query template. Various ways may be used to make the identification. For example, if the control circuitry determines that time of day in the user's location is selected attribute, the control circuitry may use process 1400 of FIG. 14 to identify an appropriate response template.”, and Par. 0142:” At step 1112, control circuitry 904 generates a response to the natural language query based on the identified response template and the retrieved one or more search results. For example, as described above, if the natural language query is: ‘Who directed Titanic,’ and the control circuitry selects time of day at the user's location as an attribute associated with the user, then the control circuitry may select a response template that is of the shortest length, based on that time of the day. As a result, the control circuitry may generate a response ‘James Cameron’ without any other words.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky, and Aher in view of Venkataraman to [[wherein the DNN algorithm is configured to]] identify a topic associated with the audio input signal, wherein the plurality of instructions, which, when executed, further cause the processor to identify the initial response based on the identified topic, in order to generate an intelligent response to a natural language query based on one or more attributes associated with a device that receives the query, as evidenced by Venkataraman (See Par. 0152).

Regarding claim 13, Labsky fails to explicitly disclose, however, Aher teaches wherein the artificial intelligence algorithm is configured (Aher, Par. 0032:”… For example, the control circuitry 304 may execute one or more speech synthesis algorithms, including diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov models-based synthesis, and sinewave synthesis, and/or may employ deep learning neural networks [artificial intelligence] to generate the audio output using the voice profile of the personality.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky, in view of Aher to wherein the artificial intelligence algorithm is configured, in order to improve the user experience of interactive searching tools by having audio responses that are more interactive and contextually relevant to the search queries provided, as evidenced by Aher (See Par. 0002).

Regarding claim 13, Labsky and Aher fail to explicitly disclose, however, Venkataraman teaches the client device of claim 11, [[wherein the artificial intelligence algorithm is configured]] to identify a topic associated with the input signal, wherein the plurality of instructions, which, when executed, further cause the processor to identify the initial response based on the identified topic. (Venkataraman, Par. 0114:” In some embodiments control circuitry 904 generates a response to a natural language query as described by process 1100 of FIG. 11. At step 1102, the control circuitry receives, from a user input interface, a natural language query. “, and Par. 0116:” At step 1104, control circuitry 904 determines which query template of a plurality of query templates corresponds to the natural language query. As referred to herein, the term ‘query template’ refers to a generalized template for a specific type of query. For example, a basic query template may be ‘Show me <. . . >,’ where the dots represent what the user wants to be shown. A more complete query template may be ‘Show me <. . . >,’ directed by <. . . >. Thus, a natural language query ‘Show me movies [topic] directed by James Cameron will fit this query template. It should be noted that these two query templates are used as examples and other more complex query templates may be used by the system described.”, and Par. 0125:” At step 1106, control circuitry 904 retrieves one or more search results corresponding to the natural language query. The control circuitry may determine a search query based on the query template. To continue with example above, ‘show me’ may be excluded from the search because it is part of the template. So, the search string may include terms such as ‘movie,’ ‘directed,’ and ‘James Cameron.” When the search results are retrieved, the control circuitry may extract movie titles from the results or this can be done by another system before the results reach the control circuitry.”, and par. 0126:” At step 1108, control circuitry 904 selects, based on a selection criteria, one or more attributes of a plurality of attributes associated with a user. For example, process 1300 of FIG. 13 illustrates one possible method of selecting the one or more attributes. …. Attributes associated with users may be stored in a database on a server and cached locally to a user equipment device as required.”, and Par. 0134:” At step 1110, control circuitry 904 identifies, based on the one or more attributes, a response template of a plurality of response templates previously assigned to the query template. Various ways may be used to make the identification. For example, if the control circuitry determines that time of day in the user's location is selected attribute, the control circuitry may use process 1400 of FIG. 14 to identify an appropriate response template.”, and Par. 0142:” At step 1112, control circuitry 904 generates a response to the natural language query based on the identified response template and the retrieved one or more search results. For example, as described above, if the natural language query is: ‘Who directed Titanic,’ and the control circuitry selects time of day at the user's location as an attribute associated with the user, then the control circuitry may select a response template that is of the shortest length, based on that time of the day. As a result, the control circuitry may generate a response ‘James Cameron’ without any other words.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky, and Aher in view of Venkataraman to [[wherein the artificial intelligence algorithm is configured to]] identify a topic associated with the input signal, wherein the plurality of instructions, which, when executed, further cause the processor to identify the initial response based on the identified topic, in order to generate an intelligent response to a natural language query based on one or more attributes associated with a device that receives the query, as evidenced by Venkataraman (See Par. 0152).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Labsky in view of Venkataraman.

Regarding claim 18, Labsky fails to explicitly disclose, however, Venkataraman teaches the client device of claim 15, further comprising: identifying, by the local device, a topic associated with the audio input signal; and identify, by the local device, the initial response based on the identified topic. (Venkataraman, Par. 0114:” In some embodiments control circuitry 904 generates a response to a natural language query as described by process 1100 of FIG. 11. At step 1102, the control circuitry receives, from a user input interface, a natural language query. “, and Par. 0116:” At step 1104, control circuitry 904 determines which query template of a plurality of query templates corresponds to the natural language query. As referred to herein, the term ‘query template’ refers to a generalized template for a specific type of query. For example, a basic query template may be ‘Show me <. . . >,’ where the dots represent what the user wants to be shown. A more complete query template may be ‘Show me <. . . >,’ directed by <. . . >. Thus, a natural language query ‘Show me movies [topic] directed by James Cameron will fit this query template. It should be noted that these two query templates are used as examples and other more complex query templates may be used by the system described.”, and Par. 0125:” At step 1106, control circuitry 904 retrieves one or more search results corresponding to the natural language query. The control circuitry may determine a search query based on the query template. To continue with example above, ‘show me’ may be excluded from the search because it is part of the template. So, the search string may include terms such as ‘movie,’ ‘directed,’ and ‘James Cameron.” When the search results are retrieved, the control circuitry may extract movie titles from the results or this can be done by another system before the results reach the control circuitry.”, and par. 0126:” At step 1108, control circuitry 904 selects, based on a selection criteria, one or more attributes of a plurality of attributes associated with a user. For example, process 1300 of FIG. 13 illustrates one possible method of selecting the one or more attributes. …. Attributes associated with users may be stored in a database on a server and cached locally to a user equipment device as required.”, and Par. 0134:” At step 1110, control circuitry 904 identifies, based on the one or more attributes, a response template of a plurality of response templates previously assigned to the query template. Various ways may be used to make the identification. For example, if the control circuitry determines that time of day in the user's location is selected attribute, the control circuitry may use process 1400 of FIG. 14 to identify an appropriate response template.”, and Par. 0142:” At step 1112, control circuitry 904 generates a response to the natural language query based on the identified response template and the retrieved one or more search results. For example, as described above, if the natural language query is: ‘Who directed Titanic,’ and the control circuitry selects time of day at the user's location as an attribute associated with the user, then the control circuitry may select a response template that is of the shortest length, based on that time of the day. As a result, the control circuitry may generate a response ‘James Cameron’ without any other words.”).
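Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Labsky, in view of Venkataraman to further comprising: identifying, by the local device, a topic associated with the audio input signal; and identify, by the local device, the initial response based on the identified topic, in order to generate an intelligent response to a natural language query based on one or more attributes associated with a device that receives the query, as evidenced by Venkataraman (See Par. 0152).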

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Goel (US-11194973B1) teaches: Col. 2, lines 24 – 33:” Certain systems may be configured to perform actions responsive to user inputs. For example, for the user input of “Alexa, play Adele music,” a system may output music sung by an artist named Adele. For further example, for the user input of “Alexa, what is the weather,” a system may output synthesized speech representing weather information for a geographic location of the user. In a further example, for the user input of “Alexa, send a message to John,” a system may capture spoken message content and cause same to be output via a device registered to “John.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/             Examiner, Art Unit 2656
/HUYEN X VO/             Primary Examiner, Art Unit 2656