Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement  
Acknowledgement is made of applicant’s amendment filed on 09/12/2022. Applicant’s submission has been entered and made of record.
Status of the Claims
Claims 1-20 are pending.
Response to Applicant’s Arguments
Applicant argues: “Conversely, the claim recites using a first natural-understanding system to determine that the command is associated with a second natural-understanding system which is different from a speech processing system having access to multiple content sources through the single speech processing system. In other words, the system as claimed uses the first natural-understanding system to associate the command with the second natural-understanding system. Then the system generates an indication that the second natural-understanding system is capable of processing the first data associated with the particular command that is received by the first natural-understanding system. In particular, the claim utilizes the first natural-understanding system to engage the second natural-understanding system without the second natural-understanding system directly receiving input data. Thus, the claim recites a system for determining, using a first natural-understanding system, that the command is associated with a second natural-understanding system; determining a first indication that the second natural-understanding system is capable of processing first data corresponding to the command”.
In view of the amendment to claims 1, 4, and 13, the anticipation rejection has been withdrawn. Upon further consideration and search, a new ground of prior art rejection is set forth below.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8, 10-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mamkina et al. (US 10032451 B1) in view of Mutagi et al. (US 9984686 B1).
Regarding Claim 1, Mamkina discloses a method for processing data representing a spoken command, the method comprising: 
receiving, from a voice-controlled device, first audio data representing an utterance (Col 6, Rows 14-20, speech controlled device 110 / microphone 103 captures input audio corresponding to a spoken utterance); 
determining that the first audio data includes: 
a first portion of audio data representing a wakeword corresponding to a first speech-processing system having a first style of synthesized speech (Col 7, Rows 32-37, audio data being transmitted to server 120 having data corresponding to the wakeword; in view of Col 3, Row 65 – Col 4, Row 2, server 120 capable of performing TTS processing), and 
a second portion of audio data representing a command (Col 7, Rows 33-35, audio data 111 corresponding to input audio 11 is sent to server 120 for the purposes of executing a command in the speech); 
determining a domain associated with the command (Col 9, Rows 1-13 and Col 10, Rows 16-21, perform ASR processing to obtain a textual representation of the speech for natural language processing to determine a domain of the utterance so as to determine and narrow down which services offered by an endpoint device (e.g., the server 120 or an application server) may be relevant); 
determining that the domain is associated with a second speech-processing system having a second style of synthesized speech (Col 10, Rows 16-21 and Col 33, Rows 8-12, determine an endpoint device such as server 120, application server 125 / command processor 290 (“content source”) associated with the domain / command to be executed; in view of Col 32, Rows 32-44, a content source generates user specific TTS content based on user recognition confidence level thresholds); 
sending, from the first speech-processing system to the second speech-processing system, command data including the command (Col 33, Rows 1-15, server 120 sends output of NLU processing to send a signal to each determined content source (e.g., command processor 290 / application server 125 associated with the domain / command to be executed)); 
receiving, from the second speech-processing system, a first response to the command, the first response including a representation of first natural language corresponding to the second style of synthesized speech (Col 32, Rows 27-44, “your Bank A account balance is $500” is generated by a TTS content source / application server 125 that requires a user recognition confidence threshold of “high”); 
determining, using the first speech-processing system and the first response, a second response to the command, the second response including a representation of second natural language corresponding to the first style of synthesized speech (Col 32, Rows 27-35, “Hello John” may be created by a TTS content source (per Col 34, Rows 15-16, the server 120 may perform the TTS) that requires a user recognition confidence threshold of “low” in order to create the ultimate output of the system “Hello John, your Bank A account balance is $500”); and 
sending second audio data to the voice-controlled device corresponding to the second response (Col 34, Rows 16-17, the output audio data may be sent to and output from the speech controlled device 110).  
Mamkina does not disclose determining an indication that the second speech processing system is capable of processing command data including the command; Mamkina only determines which services may be relevant (Col 10, Rows 16-21, determining a domain of the utterance so as to narrow down which services offered by an endpoint device (e.g., server 120) may be relevant).
Mutagi teaches a system for processing spoken command data comprising a voice controlled device and a plurality of speech processing systems (Fig. 1 and see Col 7, Rows 20-52, voice controlled device 104 uploading audio signal comprising natural language command to remote server 112 comprising multiple servers / speech processing systems) by determining a domain associated with the command (Col 12, Rows 59-65, after identifying a command from an audio signal, route the request to the appropriate domain at the remote service 112), determining an indication that a second speech processing system is capable of processing command data including the command (Col 13, Rows 5-9, Rows 22-25, and Rows 58-64, by having a device capability abstraction module 722 map capabilities of new devices to a set of predefined device capabilities, when a user issues a request to create a group consisting of devices capable of being dimmed, orchestration component 128 is able to identify which devices have this particular capability and route the user’s request to the appropriate location within remote service 112; Fig. 7 and Col 14, Rows 20-25, orchestration component 128 routes the request to turn on desk lamp to appropriate secondary device drivers 122, which may in turn generate the appropriate command), and sending, from a first speech processing system to the second speech processing system, the command data (Col 16, Rows 43-49, when the user says “please dim my desk lamp”, orchestration component 128 utilizes the mapping created by device capability abstraction module 722 to map this voice command to a predefined device capability (dimming, see Col 16, Rows 32-35, device capability abstraction module 722 stores an indication that a device will perform this operation upon the user requesting to “dim my <device>”) and routes the request to a device driver associated with the desk lamp for generating a command for causing the desk lamp to dim).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine an indication that the second speech processing system is capable of processing command data including the command by implementing a device capability abstraction module as taught by Mutagi in order to send a response to the voice controlled device to execute an appropriate command issued by the user (Mutagi, Col 10, Rows 1-10, remote server 112 sends generated commands back to voice controlled device 104 to which the user initially issued the natural language command and passes the command to secondary devices for execution).
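For illustration only, the capability mapping relied upon from Mutagi (a stored indication that a device performs a predefined operation upon a request to “dim my <device>”, followed by routing of the request) may be sketched as follows. This is a hypothetical sketch and is not code from either reference; the names CAPABILITY_PATTERNS, DEVICE_CAPABILITIES, and route_request, the "turn on my" phrase, and the second device are assumptions introduced for the example.

```python
import re

# Hypothetical templates mapping a spoken request to a predefined capability.
# Only the "dim my <device>" phrasing comes from the cited passage.
CAPABILITY_PATTERNS = {
    "dim": re.compile(r"\bdim my (?P<device>.+)$"),
    "turn_on": re.compile(r"\bturn on my (?P<device>.+)$"),
}

# Hypothetical registry of which devices support which predefined capability.
DEVICE_CAPABILITIES = {
    "desk lamp": {"dim", "turn_on"},
    "kitchen light": {"turn_on"},
}


def route_request(utterance):
    """Map an utterance to a capability and report whether the target supports it."""
    text = utterance.lower().strip().rstrip(".")
    for capability, pattern in CAPABILITY_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue
        device = match.group("device")
        # The "capable" flag plays the role of the capability indication.
        capable = capability in DEVICE_CAPABILITIES.get(device, set())
        return {"device": device, "capability": capability, "capable": capable}
    return None  # no predefined capability matched the request
```

In this sketch the request would be forwarded to the device driver only when the "capable" flag is true; a request naming a device that lacks the capability is still recognized but flagged as not capable.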
Regarding Claim 2, Mamkina discloses determining that the command corresponds to an action set to occur at a later time (Col 3, Rows 5-18 and Rows 30-43, banking application requires stringent user recognition confidence threshold and refrains from outputting user-customized content unless the highest user recognition confidence threshold of the content sources can be satisfied); and 
sending, from the second speech-processing system to the voice-controlled device, a second command to execute the action, wherein the second audio data includes a representation of a name of the second speech-processing system (Col 32, Rows 38-39, user customized content corresponds to “your Bank A account balance is $500” where Bank A represents the bank application server 125).
Regarding Claim 3, Mamkina discloses receiving, from the voice-controlled device, third audio data representing a second utterance (Col 6, Rows 1-4 and Col 31, Rows 40-44, ongoing exchange between the system and the user corresponding to a dialog of multiple utterances speaking commands); 
determining that the third audio data corresponds to a second command (Col 6, Rows 1-4, when user interacts with the system at runtime by speaking commands); 
determining that the second command corresponds to a user device proximate a user (Col 30, Rows 1-8, receive secondary data 809 indicating location data of speech controlled device 110 as user recognition confidence data; e.g., if the speech controlled device 110 is located in user A’s bedroom, such location may increase user recognition confidence data associated with user A); and 
sending, to the user device via an application programming interface, an instruction corresponding to the second command (Col 31, Rows 44-51, an API may be used to exchange user recognition confidence thresholds for purposes of processing a speech command / session).
Regarding Claims 4 and 13, Mamkina discloses a computing device (Fig. 11) comprising: 
at least one processor (Fig 11, processor 1104); and 
at least one memory including instructions that, when executed by the at least one processor (Col 35, Rows 26-35, memory 1106 / storage 1108 storing computer instructions for respective device’s processors 1104), cause the computing device to: 
receive input data corresponding to a command (Col 7, Rows 32-45, receiving audio data 111 from local device 110 for speech processing into executing system commands); 
determine, using a first natural-understanding system, that the command is associated with a second natural-understanding system (Col 9, Rows 10-15 and Col 10, Rows 16-21, server 120 performs natural language understanding processing to determine a domain of the utterance to determine and narrow down which services offered by an endpoint device (e.g., the server 120, the speech-controlled device 110, an application server, etc.) may be relevant; e.g., Col 10, Rows 22-28, telephone services, contact list service, calendar / scheduling service, music player service etc.); 
determining a first indication that the second natural understanding system is relevant for processing first data corresponding to the command (Col 10, Rows 16-21, determining a domain of the utterance so as to narrow down which services offered by an endpoint device (e.g., server 120) may be relevant);
send, from the first natural-understanding system to the second natural-understanding system, first data (Col 4, Rows 4-5 and Col 12, Rows 53-60, implement multiple servers with command processor 290 to receive output from NLU processing; e.g., Col 12, Rows 34-52, results of NLU processing tagged to attribute meaning to the utterance); 
receive, from the second natural-understanding system, second data corresponding to a first response to the command, the second data including a second indication of the second natural-understanding system (Col 13, Rows 1-7, in one example, NLU output includes a search utterance / requesting the return of search results, select a command processor 290 located on a search server to execute a search command and determine search results; Col 33, Rows 1-15, determine content sources / application server having access to content responsive to spoken utterance, with signal requesting respective content from the content source); 
determine, using the first natural-understanding system and the second data, third data corresponding to a second response to the command, the third data including a third indication of the first natural-understanding system (Col 34, Rows 9-16, server 120 receives content from respective content sources and performs TTS on the received text data to create output audio data); and 
cause output corresponding to the third data (Col 34, Rows 15-17, output audio data may be sent to and output from the speech controlled device 110).
Mamkina does not disclose that the second indication indicates that the second natural understanding system is capable of processing first data corresponding to the command.
Mutagi teaches a system for processing spoken command data comprising a voice controlled device and a plurality of natural understanding systems (Fig. 1 and see Col 7, Rows 20-52, voice controlled device 104 uploading audio signal comprising natural language command to remote server 112 comprising multiple servers / speech processing systems) by determining a domain associated with the command (Col 12, Rows 59-65, after identifying a command from an audio signal, route the request to the appropriate domain at the remote service 112), determining an indication that a second natural understanding system is capable of processing command data including the command (Col 13, Rows 5-9, Rows 22-25, and Rows 58-64, by having a device capability abstraction module 722 map capabilities of new devices to a set of predefined device capabilities, when a user issues a request to create a group consisting of devices capable of being dimmed, orchestration component 128 is able to identify which devices have this particular capability and route the user’s request to the appropriate location within remote service 112; Fig. 7 and Col 14, Rows 20-25, orchestration component 128 routes the request to turn on desk lamp to appropriate secondary device drivers 122, which may in turn generate the appropriate command), and sending, from a first natural understanding system to the second natural understanding system, the command data (Col 16, Rows 43-49, when the user says “please dim my desk lamp”, orchestration component 128 utilizes the mapping created by device capability abstraction module 722 to map this voice command to a predefined device capability (dimming, see Col 16, Rows 32-35, device capability abstraction module 722 stores an indication that a device will perform this operation upon the user requesting to “dim my <device>”) and routes the request to a device driver associated with the desk lamp for generating a command for causing the desk lamp to dim).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine an indication that the second natural understanding system is capable of processing command data including the command by implementing a device capability abstraction module as taught by Mutagi in order to send a response to the voice controlled device to execute an appropriate command issued by the user (Mutagi, Col 10, Rows 1-10, remote server 112 sends generated commands back to voice controlled device 104 to which the user initially issued the natural language command and passes the command to secondary devices for execution).
Regarding Claims 5 and 14, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
prior to causing the output, determine that the second natural-understanding system is associated with a request for permission (Col 2, Rows 26-29, Col 32, Rows 27-30, and Col 34, Rows  each content source / server may have a respective user recognition confidence threshold that must be satisfied prior to the content source providing access to the requested content), determine to cause second output corresponding to the request for permission before sending the first data (Col 34, Rows 18-21 and Rows 46-60, for example, server 120 performs NLU on input audio data and determines that a bank application content source requires that a high user recognition confidence threshold be satisfied, and server 120 causes the speech controlled device 110 to prompt the user “please provide an additional verification input so that I may more accurately verify you”); 
receive fourth input data corresponding to the request for permission (Col 34, Rows 51-53, receiving additional recognition data as a result of prompting the user), wherein the third data further includes a representation of the request for permission (Col 34, Rows 47-60 in view of Col 32, Rows 29-44, “Hello John, your Bank A account balance is $500” where “your Bank A account balance is $500” was created using output from a banking content source / application server that requires a user recognition confidence threshold of “high” be satisfied).
Regarding Claims 6 and 15, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that the command corresponds to a second output of the second natural-understanding system, the second output occurring after the output (Col 32, Rows 29-33 and Col 34, Rows 50-51, in response to the command “What is my bank account balance”, first output “Please provide an additional verification input so that I may more accurately verify you” and thereafter output “Hello John, your Bank A account balance is $500”), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125).
Regarding Claims 7 and 16, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that receiving the input data is associated with a first device (Col 25, Rows 16-20, to perform user recognition, determine the speech controlled device 110 from which the audio data 111 originated with a tag indicating the speech controlled device 110); and 
determine that causing the output is associated with a second device (Col 10, Rows 16-28, determine a domain of utterance so as to narrow down which services offered by an endpoint device (e.g., the server 120, the speech controlled device 110, an application server, an endpoint device offering music player service) may be relevant), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125; in another example, Col 2, Rows 20-23, in response to user request to play music, respond in a spoken form “playing music” before actually outputting the music content).  
Regarding Claims 8 and 17, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that receiving the input data is associated with a first device (Col 25, Rows 16-20, to perform user recognition, determine the speech controlled device 110 from which the audio data 111 originated with a tag indicating the speech controlled device 110); and 
determine a user account associated with the first device (Col 2, Row 62 – Col 3, Row 4, banking service will send balance information if the requesting user is indicated by a user profile authorized to access the appropriate bank account; Col 20, Rows 59-65, user profile storage includes data regarding users of a device related to individual users, accounts etc.; Col 25, Rows 24-32, access user profile to perform user recognition); 
determine that the user account includes a fourth indication of the second natural-understanding system (Col 2, Row 62 – Col 3, Row 4, user profile authorizes access to appropriate bank account), wherein the third data includes a representation of a name of the second natural-understanding system (Col 32, Rows 29-33, in response to “What is my bank account balance”, the response “Hello John, your Bank A account balance is $500” where “Bank A” corresponds to a name of banking content source / application server 125).
Regarding Claims 10 and 19, Mamkina discloses wherein the at least one memory further includes instructions to determine that the command is associated with the second natural-understanding system and that, when executed by the at least one processor, further cause the computing device to: 
determine a domain corresponding to the input data (Col 10, Rows 16-21, server 120 performs natural language understanding processing to determine a domain of the utterance to determine and narrow down which services offered by an endpoint device (e.g., the server 120, the speech-controlled device 110, an application server, etc.) may be relevant); and 
determine that the second natural-understanding system corresponds to the domain (e.g., Col 10, Rows 22-28, telephone services, contact list service, calendar / scheduling service, music player service etc.; see further, Col 31, Rows 12-15, determine that “Play my music” to send NLU results to music playing application server 125).  
Regarding Claim 11, Mamkina discloses receiving second audio data corresponding to a second command (Col 9, Rows 1-15, perform ASR processing on received sounds and pass ASR results to a server for natural language understanding processing and conversion of the ASR results / text data into commands for execution); 
determining that the second command is associated with the second natural-understanding system (Col 9, Rows 14-15, for execution either by the speech controlled device 110, by the server 120, or by another device; Col 10, Rows 16-24, perform NLU processing of speech input to determine a domain of the utterance so as to determine and narrow down which services offered by an endpoint device may be relevant); and 
sending, to the second natural-understanding system, a third command to process third audio data (Col 12, Row 53 – Col 13, Row 7, output from the NLU processing includes tagged text data and commands are sent to command processor 290 located on a same or separate server 120; e.g., a command to play music selects a command processor 290 corresponding to a music playing application; in another example, NLU output includes a request for the return of search results selects a command processor 290 of a search engine processor to execute a search command and determine search results).
Regarding Claims 12 and 20, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: 
determine that the second data lacks first information (Col 13, Rows 1-7, in one example, NLU output includes a search utterance / requesting the return of search results, select a command processor 290 located on a search server to execute a search command and determine search results); 
send, from the first natural-understanding system to the second natural-understanding system, fourth data corresponding to a request for the first information (in view of Col 33, Rows 8-15, server 120 may send a signal to each determined content source (e.g., command processor 290 / application server 125) with the signal requesting respective content from the content source); and 
receive, from the second natural-understanding system, fifth data corresponding to the first information (Col 13, Rows 1-7, search server returns search results).
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mamkina et al. (US 10032451 B1) in view of Mutagi et al. (US 9984686 B1) as applied to claims 4 and 13, and further in view of Fainberg et al. (US 11100923 B2).
Regarding Claims 9 and 18, Mamkina discloses wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to: determine that the input data includes a representation of a wakeword (Col 6, Rows 18-20 and Col 7, Rows 32-37, device 110 has a wakeword detection module 220 to process audio data to determine if a wakeword is detected, and the audio data is transmitted to server 120 for speech processing of the wakeword).
Mamkina does not disclose determining that the input data includes a representation of a wakeword associated with the first natural-understanding system.
Fainberg discloses determining that input data includes a representation of a wakeword associated with a first natural understanding system (Col 2, Row 66 – Col 3, Row 15, extract detected sound to identify a wakeword and transmit the sound data to an appropriate voice assistant service (VAS) / remote service implemented using cloud servers such as Amazon Alexa, Apple Siri, Microsoft Cortana etc.; Col 24, Rows 28-35, identify wake word “Alexa” and invoke Amazon VAS; identify “Ok Google” and invoke Google VAS).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine that the input data includes a representation of a wakeword associated with the first natural understanding system as taught by Fainberg in order to invoke an appropriate voice assistant service for interpretation (Fainberg, Col 2, Rows 63-67).
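For illustration only, the wakeword-based selection of a voice assistant service relied upon from Fainberg may be sketched as follows. This is a hypothetical sketch and is not code from the reference; it assumes the detected sound has already been converted to a text transcript, and the names WAKEWORD_TO_VAS and select_vas are assumptions introduced for the example. The two table entries mirror the “Alexa” and “Ok Google” examples cited above.

```python
# Hypothetical wakeword table; entries follow the two examples cited from Fainberg.
WAKEWORD_TO_VAS = {
    "alexa": "Amazon VAS",
    "ok google": "Google VAS",
}


def select_vas(transcript):
    """Return (service, remaining command) if the transcript starts with a known wakeword."""
    text = transcript.lower().lstrip()
    # Check longer wakewords first so a short wakeword cannot shadow a longer one.
    for wakeword in sorted(WAKEWORD_TO_VAS, key=len, reverse=True):
        if text.startswith(wakeword):
            command = text[len(wakeword):].lstrip(" ,")
            return WAKEWORD_TO_VAS[wakeword], command
    return None  # no wakeword detected, so no service is invoked
```

Under these assumptions, the wakeword alone determines which service receives the remainder of the utterance for interpretation.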
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Y. Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
11/18/2022