Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 102	
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a) NOVELTY; PRIOR ART.—A person shall be entitled to a patent unless— 
(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention; or 
(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention. 

(b) EXCEPTIONS.— 
(1) DISCLOSURES MADE 1 YEAR OR LESS BEFORE THE EFFECTIVE FILING DATE OF THE CLAIMED INVENTION.—A disclosure made 1 year or less before the effective filing date of a claimed invention shall not be prior art to the claimed invention under subsection (a)(1) if— 
(A) the disclosure was made by the inventor or joint inventor or by another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or 
(B) the subject matter disclosed had, before such disclosure, been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor. 
(2) DISCLOSURES APPEARING IN APPLICATIONS AND PATENTS.—A disclosure shall not be prior art to a claimed invention under subsection (a)(2) if— 
(A) the subject matter disclosed was obtained directly or indirectly from the inventor or a joint inventor;
(B) the subject matter disclosed had, before such subject matter was effectively filed under subsection (a)(2), been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or
(C) the subject matter disclosed and the claimed invention, not later than the effective filing date of the claimed invention, were owned by the same person or subject to an obligation of assignment to the same person.

Claims 1-8, 10-17, and 19-20 are rejected under 35 USC 102(a)(1)-(a)(2) as being anticipated by Mamkina et al. (US 10032451 B1).
Regarding Claim 1, Mamkina discloses a method for processing data representing a spoken command, the method comprising: 
receiving, from a voice-controlled device, first audio data representing an utterance (Col 6, Rows 14-20, speech controlled device 110 / microphone 103 captures input audio corresponding to a spoken utterance); 
determining that the first audio data includes: 
a first portion of audio data representing a wakeword corresponding to a first speech-processing system having a first style of synthesized speech (Col 7, Rows 32-37, audio data being transmitted to server 120 having data corresponding to the wakeword; in view of Col 3, Row 65 – Col 4, Row 2, server 120 capable of performing TTS processing), and 
a second portion of audio data representing a command (Col 7, Rows 33-35, audio data 111 corresponding to input audio 11 is sent to server 120 for the purposes of executing a command in the speech); 
determining a domain associated with the command (Col 9, Rows 1-13 and Col 10, Rows 16-21, perform ASR processing to obtain a textual representation of the speech for natural language processing to determine a domain of the utterance so as to determine and narrow down which services offered by an endpoint device (e.g., the server 120 or an application server) may be relevant); 
determining that the domain is associated with a second speech-processing system having a second style of synthesized speech (Col 10, Rows 16-21 and Col 33, Rows 8-12, determine an endpoint device such as server 120, application server 125 / command processor 290 (“content source”) associated with the domain / command to be executed; in view of Col 32, Rows 32-44, a content source generates user specific TTS content based on user recognition confidence level thresholds); 
sending, from the first speech-processing system to the second speech-processing system, command data including the command (Col 33, Rows 1-15, server 120 sends output of NLU processing to send a signal to each determined content source (e.g., command processor 290 / application server 125 associated with the domain / command to be executed)); 
receiving, from the second speech-processing system, a first response to the command, the first response including a representation of first natural language corresponding to the second style of synthesized speech (Col 32, Rows 27-44, “your Bank A account balance is $500” is generated by a TTS content source / application server 125 that requires a user recognition confidence threshold of “high”); 
determining, using the first speech-processing system and the first response, a second response to the command, the second response including a representation of second natural language corresponding to the first style of synthesized speech (Col 32, Rows 27-35, “Hello John” may be created by a TTS content source (per Col 34, Rows 15-16, the server 120 may perform the TTS) that requires a user recognition confidence threshold of “low” in order to create the ultimate output of the system “Hello John, your Bank A account balance is $500”); and 
sending second audio data to the voice-controlled device corresponding to the second response (Col 34, Rows 16-17, the output audio data may be sent to and output from the speech controlled device 110).  
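For illustration only, the sequence of steps mapped above for Claim 1 can be sketched as follows. This is a minimal sketch and is not taken from Mamkina or from the application: the wakeword "Computer", the keyword lookup standing in for ASR/NLU domain determination, the table `DOMAIN_TO_SYSTEM`, and the stubbed content-source replies are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical routing table: domain -> second speech-processing system.
DOMAIN_TO_SYSTEM = {"banking": "bank_app_server", "music": "music_app_server"}

# Toy keyword lookup standing in for ASR + NLU domain determination.
DOMAIN_KEYWORDS = {"balance": "banking", "play": "music"}


@dataclass
class Utterance:
    wakeword: str  # first portion of the audio data
    command: str   # second portion of the audio data


def split_utterance(text: str, wakeword: str) -> Utterance:
    """Separate the wakeword portion from the command portion."""
    if not text.lower().startswith(wakeword.lower()):
        raise ValueError("wakeword not detected")
    return Utterance(wakeword, text[len(wakeword):].strip(" ,"))


def determine_domain(command: str) -> str:
    """Return the first domain whose keyword appears in the command."""
    for keyword, domain in DOMAIN_KEYWORDS.items():
        if keyword in command.lower():
            return domain
    return "general"


def second_system_respond(system: str, command: str) -> str:
    """Stub for the second system's reply (the 'first response')."""
    if system == "bank_app_server":
        return "your Bank A account balance is $500"
    return "playing music"


def handle(text: str, wakeword: str = "Computer", user: str = "John") -> str:
    """First system: route the command, then compose the 'second response'."""
    utterance = split_utterance(text, wakeword)
    domain = determine_domain(utterance.command)
    system = DOMAIN_TO_SYSTEM.get(domain)
    if system is None:
        return f"Hello {user}, I cannot help with that."
    first_response = second_system_respond(system, utterance.command)
    # The first system wraps the content-source text in its own greeting.
    return f"Hello {user}, {first_response}"
```

Under these assumptions, `handle("Computer, what is my bank account balance")` yields the combined output quoted in the mapping above, with the greeting contributed by the first system and the balance statement contributed by the second.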
Regarding Claim 2, Mamkina discloses determining that the command corresponds to an action set to occur at a later time (Col 3, Rows 5-18 and Rows 30-43, banking application requires stringent user recognition confidence threshold and refrains from outputting user-customized content unless the highest user recognition confidence threshold of the content sources can be satisfied); and 
sending, from the second speech-processing system to the voice-controlled device, a second command to execute the action, wherein the second audio data includes a representation of a name of the second speech-processing system (Col 32, Rows 38-39, user customized content corresponds to “your Bank A account balance is $500” where Bank A represents the bank application server 125).  
Regarding Claim 3, Mamkina discloses receiving, from the voice-controlled device, third audio data representing a second utterance (Col 6, Rows 1-4 and Col 31, Rows 40-44, ongoing exchange between the system and the user corresponding to a dialog of multiple utterances speaking commands); 
determining that the third audio data corresponds to a second command (Col 6, Rows 1-4, when user interacts with the system at runtime by speaking commands); 
determining that the second command corresponds to a user device proximate a user (Col 30, Rows 1-8, receive secondary data 809 indicating location data of speech controlled device 110 as user recognition confidence data; e.g., if the speech controlled device 110 is located in user A’s bedroom, such location may increase user recognition confidence data associated with user A); and 
sending, to the user device via an application programming interface, an instruction corresponding to the second command (Col 31, Rows 44-51, API may be used to exchange user recognition confidence thresholds for purposes of processing a speech command / session).
Regarding Claims 4 and 13, Mamkina discloses a computing device (Fig. 11) comprising: 
at least one processor (Fig 11, processor 1104); and 
at least one memory including instructions that, when executed by the at least one processor (Col 35, Rows 26-35, memory 1106 / storage 1108 storing computer instructions for respective device’s processors 1104), cause the computing device to: 
receive input data corresponding to a command (Col 7, Rows 32-45, receiving audio data 111 from local device 110 for speech processing into executing system commands); 
determine, using a first natural-understanding system, that the command is associated with a second natural-understanding system (Col 9, Rows 10-15 and Col 10, Rows 16-21, server 120 performs natural language understanding processing to determine a domain of the utterance to determine and narrow down which services offered by an endpoint device (e.g., the server 120, the speech-controlled device 110, an application server, etc.) may be relevant; e.g., Col 10, Rows 22-28, telephone services, contact list service, calendar / scheduling service, music player service etc.); 
send, from the first natural-understanding system to the second natural-understanding system, first data corresponding to the command (Col 4, Rows 4-5 and Col 12, Rows 53-60, implement multiple servers with command processor 290 to receive output from NLU processing; e.g., Col 12, Rows 34-52, results of NLU processing tagged to attribute meaning to the utterance); 
receive, from the second natural-understanding system, second data corresponding to a first response to the command, the second data including a first indication of the second natural-understanding system (Col 13, Rows 1-7, in one example, NLU output includes a search utterance / requesting the return of search results, select a command processor 290 located on a search server to execute a search command and determine search results; Col 33, Rows 1-15, determine content sources / application server having access to content responsive to spoken utterance, with signal requesting respective content from the content source); 
determine, using the first natural-understanding system and the second data, third data corresponding to a second response to the command, the third data including a second indication of the first natural-understanding system (Col 34, Rows 9-16, server 120 receives content from respective content sources, perform TTS on the received text data to create output audio data); and 
cause output corresponding to the third data (Col 34, Rows 15-17, output audio data may be sent to and output from the speech controlled device 110).
Regarding Claims 5 and 14, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
prior to causing the output, determine that the second natural-understanding system is associated with a request for permission (Col 2, Rows 26-29, Col 32, Rows 27-30, and Col 34, Rows  each content source / server may have a respective user recognition confidence threshold that must be satisfied prior to the content source providing access to the requested content), determine to cause second output corresponding to the request for permission before sending the first data (Col 34, Rows 18-21 and Rows 46-60, for example, server 120 performs NLU on input audio data and determines that a bank application content source requires that a high user recognition confidence threshold be satisfied, and server 120 causes the speech controlled device 110 to prompt the user “please provide an additional verification input so that I may more accurately verify you”); 
receive fourth input data corresponding to the request for permission (Col 34, Rows 51-53, receiving additional recognition data as a result of prompting the user), wherein the third data further includes a representation of the request for permission (Col 34, Rows 47-60 in view of Col 32, Rows 29-44, “Hello John, your Bank A account balance is $500” where “your Bank A account balance is $500” was created using output from a banking content source / application server that requires a user recognition confidence threshold of “high” be satisfied).  
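For illustration only, the threshold check relied upon above (content is released only when the strictest user recognition confidence threshold among the content sources is satisfied, and the user is otherwise prompted for additional verification) can be sketched as follows. This is a minimal sketch under assumptions: the three-level ordering in `LEVELS`, the function name `decide`, and the source names are hypothetical and are not recited in Mamkina.

```python
# Hypothetical ordering of user recognition confidence levels.
LEVELS = {"low": 0, "medium": 1, "high": 2}

PROMPT = ("Please provide an additional verification input "
          "so that I may more accurately verify you")


def decide(required_by_source: dict, observed: str) -> str:
    """Release content only if the strictest source threshold is satisfied.

    required_by_source maps a content-source name to its required level;
    observed is the confidence level produced by user recognition.
    """
    strictest = max(required_by_source.values(), key=LEVELS.__getitem__)
    if LEVELS[observed] >= LEVELS[strictest]:
        return "release"
    # Threshold not met: ask the user for additional verification first.
    return PROMPT
```

For example, with a greeting source requiring "low" and a banking source requiring "high", an observed confidence of "medium" produces the verification prompt, and an observed confidence of "high" releases the content.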
Regarding Claims 6 and 15, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that the command corresponds to a second output of the second natural-understanding system, the second output occurring after the output (Col 32, Rows 29-33 and Col 34, Rows 50-51, in response to the command “What is my bank account balance”, first output “Please provide an additional verification input so that I may more accurately verify you” and thereafter output “Hello John, your Bank A account balance is $500”), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125).
Regarding Claims 7 and 16, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that receiving the input data is associated with a first device (Col 25, Rows 16-20, to perform user recognition, determine the speech controlled device 110 from which the audio data 111 originated with a tag indicating the speech controlled device 110); and 
determine that causing the output is associated with a second device (Col 10, Rows 16-28, determine a domain of utterance so as to narrow down which services offered by an endpoint device (e.g., the server 120, the speech controlled device 110, an application server, an endpoint device offering music player service) may be relevant), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125; in another example, Col 2, Rows 20-23, in response to user request to play music, respond in a spoken form “playing music” before actually outputting the music content).  
Regarding Claims 8 and 17, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that receiving the input data is associated with a first device (Col 25, Rows 16-20, to perform user recognition, determine the speech controlled device 110 from which the audio data 111 originated with a tag indicating the speech controlled device 110); and 
determine a user account associated with the first device (Col 2, Row 62 – Col 3, Row 4, banking service will send balance information if the requesting user is indicated by a user profile authorized to access the appropriate bank account; Col 20, Rows 59-65, user profile storage includes data regarding users of a device related to individual users, accounts etc.; Col 25, Rows 24-32, access user profile to perform user recognition); 
determine that the user account includes a third indication of the second natural-understanding system (Col 2, Row 62 – Col 3, Row 4, user profile authorizes access to appropriate bank account), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125).  
Regarding Claims 10 and 19, Mamkina discloses wherein the at least one memory further includes instructions to determine that the command is associated with the second natural-understanding system and that, when executed by the at least one processor, further cause the computing device to: 
determine a domain corresponding to the input data (Col 10, Rows 16-21, server 120 performs natural language understanding processing to determine a domain of the utterance to determine and narrow down which services offered by an endpoint device (e.g., the server 120, the speech-controlled device 110, an application server, etc.) may be relevant); and 
determine that the second natural-understanding system corresponds to the domain (e.g., Col 10, Rows 22-28, telephone services, contact list service, calendar / scheduling service, music player service etc.; see further, Col 31, Rows 12-15, determine that “Play my music” to send NLU results to music playing application server 125).  
Regarding Claim 11, Mamkina discloses receiving second audio data corresponding to a second command (Col 9, Rows 1-15, perform ASR processing on received sounds and pass ASR results to a server for natural language understanding processing and conversion of the ASR results / text data into commands for execution); 
determining that the second command is associated with the second natural-understanding system (Col 9, Rows 14-15, for execution either by the speech controlled device 110, by the server 120, or by another device; Col 10, Rows 16-24, perform NLU processing of speech input to determine a domain of the utterance so as to determine and narrow down which services offered by an endpoint device may be relevant); and 
sending, to the second natural-understanding system, a third command to process third audio data (Col 12, Row 53 – Col 13, Row 7, output from the NLU processing includes tagged text data and commands are sent to command processor 290 located on a same or separate server 120; e.g., a command to play music selects a command processor 290 corresponding to a music playing application; in another example, NLU output includes a request for the return of search results selects a command processor 290 of a search engine processor to execute a search command and determine search results).
Regarding Claims 12 and 20, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that the second data lacks first information (Col 13, Rows 1-7, in one example, NLU output includes a search utterance / requesting the return of search results, select a command processor 290 located on a search server to execute a search command and determine search results); 
send, from the first natural-understanding system to the second natural-understanding system, fourth data corresponding to a request for the first information (in view of Col 33, Rows 8-15, server 120 may send a signal to each determined content source (e.g., command processor 290 / application server 125) with the signal requesting respective content from the content source); and 
receive, from the second natural-understanding system, fifth data corresponding to the first information (Col 13, Rows 1-7, search server returns search results).
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 9 and 18 are rejected under 35 USC 103 as being unpatentable over Mamkina et al. (US 10032451 B1) in view of Fainberg et al. (US 11100923 B2).
Regarding Claims 9 and 18, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: determine that the input data includes a representation of a wakeword (Col 6, Rows 18-20 and Col 7, Rows 32-37, device 110 has a wakeword detection module 220 to process audio data to determine if a wakeword is detected and transmitted to server 120 for speech processing of the wakeword).
Mamkina does not disclose determining that the input data includes a representation of a wakeword associated with the first natural-understanding system.
Fainberg discloses determining that input data includes a representation of a wakeword associated with a first natural-understanding system (Col 2, Row 66 – Col 3, Row 15, extract detected sound to identify a wakeword and transmit the sound data to an appropriate voice assistant service (VAS) / remote service implemented using cloud servers such as Amazon Alexa, Apple Siri, Microsoft Cortana etc.; Col 24, Rows 28-35, identify wake word “Alexa” and invoke Amazon VAS; identify “Ok Google” and invoke Google VAS).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Mamkina to determine that the input data includes a representation of a wakeword associated with the first natural-understanding system as taught by Fainberg in order to invoke an appropriate voice assistant service for interpretation (Fainberg, Col 2, Rows 63-67).
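For illustration only, the wakeword-to-service association described in Fainberg can be sketched as a lookup from a detected wakeword to the voice assistant service (VAS) to be invoked. This is a minimal sketch under assumptions: the two table entries follow the “Alexa” and “Ok Google” examples cited above, while the function name `select_vas`, the table `WAKEWORD_TO_VAS`, and the prefix-matching logic are hypothetical and are not taken from either reference.

```python
from typing import Optional

# Wakeword -> voice assistant service, per the two examples cited above.
WAKEWORD_TO_VAS = {"alexa": "Amazon VAS", "ok google": "Google VAS"}


def select_vas(transcript: str) -> Optional[str]:
    """Return the VAS whose wakeword begins the transcript, if any."""
    lowered = transcript.lower().lstrip()
    for wakeword, vas in WAKEWORD_TO_VAS.items():
        if lowered.startswith(wakeword):
            return vas
    # No known wakeword detected: nothing is invoked.
    return None
```

Under these assumptions, `select_vas("Alexa, play music")` returns the Amazon entry and `select_vas("Ok Google, what time is it")` returns the Google entry; input without a recognized wakeword returns `None`.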
Conclusion
Prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
US 10002189 B2 teaches an apparatus receiving a search string, generating a semantic representation of the search string in accordance with an ontology, searching the database using the semantic representation, and outputting a result of searching. 
US 2003/0125955 A1 discloses a distributed speech recognition system provides speech-driven control and remote service access. The distributed speech recognition system comprises a client device and a central server having response synthesis manager.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/ Primary Examiner, Art Unit 2675    05/10/2022