DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions
Applicants’ election with traverse of Invention II, Claims 5 to 20, in the reply filed on 10 August 2020 is acknowledged.  
Claims 1 to 4 are withdrawn from further consideration pursuant to 37 CFR 1.142(b), as being drawn to a nonelected invention, there being no allowable generic or linking claim.  Applicants timely traversed the restriction (election) requirement in the reply filed on 10 August 2020.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 5 to 7, 10 to 15, and 18 to 28 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventors, at the time the application was filed, had possession of the claimed invention. 
Independent claims 5 and 13 set forth new limitations directed to “receiving an indication to process the first audio data using a first speech processing system” and “outputting, based at least in part on the indication”, which appear to be new matter and/or misdescriptive of the invention.  Here, dependent claims 25 and 27 set forth that this “receiving an indication” can be construed as “detecting a selection of a button”, but there is no “outputting, based at least in part on the indication” due to a selection of a button.  Generally, a voice-controlled device is activated by a user speaking a wakeword or by pressing a button, but using both is not conventional as now set forth by the independent claims.  Applicants’ Remarks state that support for the new and amended claims is provided by ¶[0023], ¶[0038], ¶[0040], and ¶[0059] to ¶[0060] of the Specification.  The Specification, ¶[0023], ¶[0025], and ¶[0060], does literally describe an embodiment of detecting a wakeword “and/or” “instead of or in addition to” a button press for a standard ‘Push-to-Talk’.  Still, doing both would appear to be redundant.  If audio of an utterance is a command and not a wakeword, then it would appear that one must logically press a button in a standard ‘Push-to-Talk’ before speaking the audio, so that, at best, the steps of “receiving, from a voice-controlled device, first audio data representing an utterance” and “receiving an indication to process the first audio data using a first speech processing system” should be reversed, where the step of “receiving an indication” should be placed before the step of “receiving, from a voice-controlled device, first audio data representing an utterance”.
That is, a button must be selected before an utterance is received to be processed using natural language understanding, or the first audio will be ignored; if the first audio includes a wakeword, then the wakeword and the button press will simply be redundant to wake up the voice-controlled device.  The Specification, as originally filed, appears to describe a wakeword and a button press as alternatives, but does not provide a description as to how they might be used together.  Moreover, Applicants’ Specification, ¶[0091] and ¶[0131], includes the only occurrences of the term “indication”, but neither of these occurrences has anything to do with an embodiment of “receiving an indication to process the first audio data using a first speech processing system” or “outputting, based at least in part on the indication”.  The Specification, as originally filed, does not provide support for using the terminology of “an indication” to represent a button press, and does not describe outputting due to a selection of a first speech processing system by “an indication” of a specific button press.  Nor is there support for a button press that specifically activates a first speech processing system.  The Specification, ¶[0036], describes that a wakeword may specify one of two alternative speech processing systems, e.g., “Alexa” or “Ford”, but a button press is not described as being enabled to select one of two alternative speech processing systems, and output of second audio does not verify selection of “Alexa” or “Ford”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353).
Concerning independent claims 5 and 13, Hart et al. discloses a method and system for a speech recognition platform, comprising:
“receiving, from a voice-controlled device, first audio data representing an utterance” – a user within an environment may audibly state a request to a voice-controlled device, e.g., a request to play a certain song, a request to purchase an item, or a request for a reminder; the voice-controlled device may capture sound and generate an audio signal for analysis of any speech therein (column 2, lines 13 to 19); a microphone of voice-controlled device 106 detects audio including sounds uttered from user 104 (column 6, lines 36 to 38: Figure 2);  
“receiving an indication to process the first audio data using a first speech processing system, the first speech processing system including a first text-to-speech (TTS) component [corresponding to a first speech style indicating output generated by the first speech processing system]” – generally, voice-controlled device 106 is implemented with one or more haptic input components, e.g., a dedicated button to initiate configuration, power on/off, etc. (column 6, lines 22 to 30: Figure 2); dialog engine 230 references a dialog database 232 to determine one or more questions to pose to the user for the purpose of selecting a domain; dialog engine 230 then provides an indication of the question to a rendering engine, which provides a rendering directive to a text-to-speech (TTS) engine 236 (“the first speech processing system including a first text-to-speech (TTS) component”) (column 12, lines 24 to 36: Figure 2); upon identifying a particular intent, dialog engine 230 may provide a rendering directive to TTS engine 236, which may pose a question, “Would you like to listen to music on an internet radio application or from your personal storage?” (column 13, lines 19 to 34: Figure 2); here, voice-controlled device 106 receives a press of a button to initiate a power-on (“receiving an indication to process the first audio data using the first speech processing system”); 
“processing using a natural language understanding (NLU) component, the first audio data to determine intent data corresponding to an intent associated with the utterance” – after determining automatic speech recognition (ASR) results, a speech recognition component may provide the ASR results and the text to a natural language understanding (NLU) component to identify multiple different intents potentially represented by the speech (column 2, lines 50 to 54); NLU component 122 may identify multiple possible intents of the user’s speech across multiple different domains based on the ASR results and the context; a domain may represent a set of related activities, e.g., shopping, music, calendaring, etc., while an intent within a domain may represent one of the activities, e.g., buying a particular item, listening to a particular song, setting up a particular appointment (column 7, line 65 to column 8, line 6: Figure 1A); NLU component 122 receives ASR results and attempts to generate a list of potential intents associated with the speech of the user based on both the ASR results and the received context, which again may include past interactions or preferences of the user (column 11, lines 31 to 50: Figure 1A);
“determining at least a first application corresponding to the first speech processing system; determining at least a second application corresponding to a second speech processing system[, the second speech processing system including a second TTS component corresponding to a second speech style different from the first speech style and indicating output generated by the second speech processing system]” – each intent is associated with a respective domain; a ‘shopping’ domain may include an intent to purchase a particular digital music file, to purchase a grocery item, or to purchase a gift card; a ‘music’ domain may include an intent to play a particular song owned by a user, launch a particular music service, or send a particular song to a friend (column 2, lines 54 to 64); NLU component 122 may identify multiple possible intents across multiple different domains; a domain may represent a related set of activities, e.g., shopping, music, calendaring, etc. (column 7, line 65 to column 8, line 5: Figure 2); here, a ‘shopping’ domain corresponds to “a first application corresponding to the first speech processing system” and a ‘music’ domain corresponds to “a second application corresponding to a second speech processing system”; 
“determining, based at least in part on the at least first application, a first ability of the first speech processing system to respond to the intent; determining, based at least in part on the at least second application, a second ability of the second speech processing system to respond to the intent” – after identifying multiple intents associated with multiple different domains, an NLU component may rank the intents based on one or more criteria (column 3, lines 16 to 24); requests may be for essentially any type of operation, e.g., database inquiries, requesting and consuming entertainment, e.g., gaming, finding and playing music or movies, personal information management, e.g., calendaring and note taking, online shopping, financial transactions, etc. (column 7, lines 30 to 35: Figure 1A); NLU component 122 provides the ranked list of intents across domains and corresponding probabilities associated with the intents to determine if dialog engine 230 is able to select a domain with a confidence that is greater than a predefined threshold (column 12, lines 12 to 23: Figure 2); here, “a first speech processing system” corresponds to one of the domains, e.g., a ‘shopping’ domain, and ‘an ability of a speech processing system to respond to the intent’ reflects the ranking of the intent for that domain by its probability; broadly, a score or confidence corresponds to “a first ability of the first speech processing system to respond to the intent” and to “a second ability of the second speech processing system to respond to the intent” because ranking of an intent for a domain by probability is the likelihood that an utterance is relevant to that domain; compare Specification, ¶[0022], ¶[0040], and ¶[0063], which state that a first score or ranking corresponds to a first ability of a first speech processing system to respond to the intent and a second score or ranking corresponds to a second ability of a second speech processing system to respond to the intent;
 “based at least in part on the second ability and the first ability, processing the intent data using the second speech processing system to generate first response data” – a platform may identify an intent of a voice command (“processing the intent data”) (Abstract); after identifying multiple different intents, NLU component may rank the intents based on one or more criteria (“based at least in part on the second ability and the first ability”) (column 3, lines 16 to 24); an orchestration component may then provide those intents from the ranked list of intents, and dialog engine may attempt to select a single intent (column 4, lines 60 to 65); dialog engine 238(3) functions to select an intent from these intents; if the dialog engine 238(3) can select an intent with a confidence that is greater than a predefined threshold, then engine 238(3) selects the intent (column 13, lines 5 to 9: Figure 2); engine 238(3) may determine that the selected intent is now actionable, and may identify one or more actions to take based on the selected intent; speechlet engine 246(3) then works with response component 126 to determine a response to provide to rendering engine 242(3) (“to generate first response data”) (column 13, lines 39 to 55: Figure 2); process 300 then ranks the multiple intents, and then selects an intent (column 15, lines 46 to 50: Figure 3: Steps 310 and 314); here, if an intent for an internet radio application ranks higher than an intent for a shopping application, then an ability for an internet radio application is higher than an ability for a shopping application (“based at least in part on the second ability and the first ability, processing the intent data using the second speech processing system”); 
“causing the voice-controlled device to perform an action using the first response data” – a platform may perform a corresponding action of streaming audio to a device (Abstract); response component 126 may perform a corresponding action of providing audio for output at device 106 by streaming the channel to device 106 (column 8, line 41 to column 9, line 16: Figure 1A); engine 238(3) may also route the selected intent to an appropriate application; an action may include providing audio for output on device 106, e.g., “I will begin playing your music shortly”, as well as requesting that an internet radio application begin streaming a particular channel to device 106 (“to perform an action using the first response data”) (column 13, lines 39 to 55: Figure 2);
“outputting, based at least in part on the indication [and using the first speech processing system], second audio data [corresponding to the first speech style], the second audio data indicating performance of the action” – response component 126 may perform a corresponding action, including providing audio for output at device 106, e.g., “I’ll begin playing your music shortly” (“outputting, based on the intent data . . . second audio data representing the first response data . . . the second audio data indicating performance of the action”), as well as begin streaming the channel to device 106 (column 9, lines 10 to 16: Figure 1A); an action may include providing audio for output on the device, e.g., ‘I will begin playing your music shortly’, as well as performing one or more additional actions, e.g., requesting that the internet radio application begin streaming a particular channel to device 106; rendering engine 242(2) may provide a rendering directive to text-to-speech (TTS) engine 236, and the rendering engine may provide audio for output on device 106, e.g., ‘I will begin playing your music shortly’ (column 13, lines 51 to 64: Figure 2); process 300 performs an action for the user in addition to providing audio for output at 316 (column 15, lines 65 to 67: Figure 3); here, ‘I will begin playing your music shortly’ is “second audio data representing the first response data” indicating performance of the action, and is “based on the indication” that an intent to stream music on an internet radio application has a highest ranking.
Concerning independent claims 5 and 13, Hart et al. additionally discloses a text-to-speech engine 236 for “outputting, based on the intent data . . . , second audio data . . . , the second audio data indicating performance of the action.”  Here, “second audio, the second audio indicating performance of the action” is an output of ‘I will begin playing your music shortly’.  However, Hart et al. omits the limitations directed to “the first speech processing system including a first text-to-speech (TTS) component corresponding to the first speech style indicating output generated by the first speech processing system”, “the second speech processing system including a second TTS component corresponding to a second speech style different from the first speech style and indicating output generated by the second speech processing system”, and outputting second audio data “using the first speech processing system . . . corresponding to the first speech style”.  That is, Hart et al. discloses a conventional text-to-speech system for outputting second audio to confirm performance of an action, so as to provide at least one of “a first TTS component” or “a second TTS component”, but does not differentiate between first and second speech styles corresponding to first and second speech processing systems, and does not output audio in a first speech style by a first speech processing system when the second speech processing system generates the first response data.  Here, it might be expected that a first speech processing system outputs audio in a first speech style when a first speech processing system is selected to perform an action, but not that the first speech processing system speaks for a second speech processing system when the latter is to perform the action.
Concerning independent claims 5 and 13, however, Chen et al. teaches conversational agents, where a first computer-implemented agent may be for a first application, and a second computer-implemented agent may be for a second application on a user device.  (¶[0004])  A system enables a computer-implemented agent to automatically initiate conversation between another computer-implemented agent and a user, e.g., to hand off a conversation based on input received from the user, e.g., specific to the other computer-implemented agent.  (¶[0007])  (Compare Specification, ¶[0018] and ¶[0065].)  Customizations may include changing settings of a conversational agent, e.g., a particular style of speech (particular language or a particular voice), e.g., an agent can have a male or female voice, as text-to-speech parameters.  (¶[0017])  Agent settings 212 may include a particular style of speech, a particular voice, or a particular language.  An administrator may select a deep male voice with a certain accent as a particular voice.  Styles of speech may include serious formal, bubbly, funny, and casual.  (¶[0032]: Figure 2)  Chen et al., then, teaches “the first speech processing system including a first text-to-speech (TTS) component corresponding to a first speech style indicating output generated by the first speech processing system” and “the second speech processing system including a second TTS component corresponding to a second speech style different from the first speech style and indicating output generated by the second speech processing system” because agent 104 and agent 106 may be assigned different speech styles, e.g., male or female, for different applications on a same user device.  Moreover, Chen et al. teaches that a platform may enable a conversational agent for a user device to ‘hand off’ a user conversation to another conversational agent, e.g., a specific third party conversational agent.  (¶[0022])  Conversational agent 104 receives and analyzes input, and determines that another computer-implemented conversational agent 106 for John’s Pizza Joint is available to assist the user in ordering pizza from John’s Pizza Joint.  Agent 104 responds to user 102 stating, “Let me connect you with the agent for John’s Pizza Joint”.  (¶[0025]: Figure 1)  Agent 104, then, is “outputting . . . using the first speech processing system, second audio data representing first response data and corresponding to the first speech style, the second audio data indicating performance of the action.”  That is, agent 104 is outputting second audio, “Let me connect you with the agent for John’s Pizza Joint”, where this second audio is in a ‘style’ corresponding to a speech style of agent 104 (“the first speech style”), but agent 104 provides this ‘second audio data’ as ‘indicating’ that a second speech processing system of John’s Pizza Joint is going to be performing an action of assisting in ordering the pizza.  An objective is to enable agents to hand off conversations to third party agents.  (¶[0024])  It would have been obvious to one having ordinary skill in the art to provide first and second speech styles to first and second applications and to output audio data in a first speech style indicating performance of an action associated with a second application as taught by Chen et al. to perform an action of an identified intent of Hart et al. for a purpose of enabling agents to hand off conversations to third party agents.
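
For illustration only, the ranking-and-threshold selection attributed to Hart et al. above (ranked intents across domains, with an intent selected only when its confidence exceeds a predefined threshold) can be sketched as follows.  This is a minimal sketch and not code from any cited reference: the names RankedIntent and select_intent, the probability values, and the 0.7 threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RankedIntent:
    """Hypothetical record for one candidate intent within a domain."""
    domain: str         # e.g. "music" or "shopping"
    intent: str         # e.g. "play_internet_radio"
    probability: float  # assumed score standing in for an "ability" to respond


def select_intent(candidates: List[RankedIntent],
                  threshold: float = 0.7) -> Optional[RankedIntent]:
    """Rank candidates by probability; return the top one only if it clears the threshold.

    Returning None models the case where a clarifying question would be posed instead.
    """
    ranked = sorted(candidates, key=lambda c: c.probability, reverse=True)
    if ranked and ranked[0].probability > threshold:
        return ranked[0]
    return None


# Illustrative values only; nothing here is taken from the references.
candidates = [
    RankedIntent("shopping", "buy_song", 0.15),
    RankedIntent("music", "play_internet_radio", 0.80),
]
chosen = select_intent(candidates)
```

Under these assumed values, the ‘music’ domain outranks the ‘shopping’ domain and is selected; raising the threshold above 0.80 would leave no intent selected, which corresponds to the case where the dialog engine poses a question to the user.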

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353) as applied to claims 5 and 13 above, and further in view of Van Os et al. (U.S. Patent No. 10,540,976).
Concerning claims 6 and 14, Chen et al. does not expressly teach the limitations directed to “prior to outputting the second audio data, causing output of a first indication associated with the second speech processing system, and during outputting of the second audio data, causing output of a second indication associated with the first speech processing application.”  Applicants’ Specification is not entirely clear on a scope of “a first indication” and “a second indication”, but this can be broadly construed as some graphical or audio indication that a first application and/or a second application is available or active.  Generally, Van Os et al. teaches contextual voice commands, where a data item in a first context is displayed, a displayed data item in the first context is selected, and a voice input is received that relates the selected data item to an operation in a second context.  (Abstract)  Specifically, Figures 2a to 2b illustrate that a device initially displays a plurality of application icons 110, then a user can select one of the icons by voice, which may be a mail icon 110b, and then a user interface is displayed for email in Figures 3a to 3b.  Van Os et al. displays contextual voice command icons 110 before the contextual voice command mode is activated, and then user selection of any of these icons using an audio input can be communicated via a visual indication or an audio indication.  (Column 3, Line 51 to Column 4, Line 67: Figures 2a to 2b and 3a to 3b)  Van Os et al., then, provides for “causing output of a first indication associated with the second speech processing system”, e.g., web application 110c or music application 110d, “prior to outputting the second audio data”, and then “during the outputting of the second audio data, causing output of a second indication associated with the first speech processing system”, i.e., a visual indicator or an audio indication of “Email Application” after selection of a mail application 110b by voice command.  An objective is to implement contextual voice commands so that a user can execute desired operations faster than by navigating through a set of nested menu items.  (Column 1, Lines 16 to 30)  It would have been obvious to one having ordinary skill in the art to cause output of first and second indications prior to and during output of second audio data as taught by Van Os et al. in conversational agents of Chen et al. for a purpose of implementing contextual voice commands that can be executed faster than through a set of nested menu items.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353) as applied to claims 5 and 13 above, and further in view of Nadkar et al. (U.S. Patent Publication 2018/0204569).
Hart et al. discloses “determining that the first audio data includes a representation of a wakeword” when a user speaks a predefined command of ‘Awake’.  (Column 6, Lines 45 to 51: Figure 1A)  Hart et al. does not expressly disclose “determining that the wakeword corresponds to the first speech processing system.”  Applicants’ Specification, ¶[0016] - ¶[0018] and ¶[0036], appears to describe an embodiment where different wakewords are associated with different applications.  Still, if a wakeword is spoken, then this implicitly is a wakeword for a first speech processing application even if there is only one application.  Specifically, Nadkar et al. teaches voice assistant tracking and activation, where a listening component is configured to receive audio and detect a first wake word of a plurality of wake words that corresponds to a first voice assistant service of a plurality of voice assistant services.  (¶[0011])  Different voice assistant services on the same device or system may be activated based on different wake words.  (¶[0024]: Figure 1)  Voice assistant manager 104 may maintain a list of wake words and their corresponding voice assistant application/service, where each wake word may, when detected, activate a corresponding voice assistant service native to the vehicle infotainment system.  (¶[0031]: Figure 1)  Statements may include, “Siri, play music.  Alexa, open garage door.  Google, tell me about the weather”, all without pauses or delays between commands, and voice assistant manager 104 may then receive responses from each system, and in turn, provide the responses in a visual or audio format to the user.  (¶[0033]: Figure 1)  A user may simply say, “Accuweather, how’s the weather today”, after which voice assistant manager 104 detects the wake word, and identifies the corresponding application.  (¶[0037]: Figure 1)  Nadkar et al., then, equivalently teaches “determining that the wakeword corresponds to the first speech processing system”, where the first speech system can be a weather application.  An objective is to manage a plurality of available speech-to-text or voice assistant services.  (¶[0002])  It would have been obvious to one having ordinary skill in the art to determine that a wakeword corresponds to a first speech processing application as taught by Nadkar et al. in natural language understanding that detects wakewords of Hart et al. for a purpose of managing a plurality of available voice assistant services.
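
For illustration only, the list of wake words and corresponding voice assistant services described above for Nadkar et al. can be sketched as a simple lookup.  This is a minimal sketch under stated assumptions and not code from any cited reference: the table WAKE_WORD_TO_SERVICE, the service identifiers, and the function route are hypothetical, and the sketch assumes the wake word is the first word of already-transcribed text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table of wake words and the service each one activates.
WAKE_WORD_TO_SERVICE: Dict[str, str] = {
    "siri": "assistant_service_1",
    "alexa": "assistant_service_2",
    "google": "assistant_service_3",
    "accuweather": "weather_service",
}


def route(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (service, command) if the utterance starts with a known wake word, else None."""
    head, _, rest = utterance.strip().partition(" ")
    key = head.rstrip(",.!?:").lower()  # drop trailing punctuation after the wake word
    service = WAKE_WORD_TO_SERVICE.get(key)
    if service is None:
        return None
    return service, rest.strip()


routed = route("Accuweather, how's the weather today")
```

Here the leading word of the utterance determines which service receives the remainder of the utterance; an utterance with no recognized wake word is not routed to any service.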

Claims 10 to 12 and 18 to 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353) as applied to claims 5 and 13 above, and further in view of Van Scheltinga et al. (U.S. Patent Publication 2018/0338191).
Concerning claims 10 and 18, Chen et al. teaches outputting second audio data for a second speech processing system that is associated with performing an action, e.g., ordering pizza.  However, Chen et al. does not address what happens after an application is selected by Hart et al., namely “receiving, from the first speech processing system, second response data corresponding to a second response to the intent” and “outputting, based on the second response data and using the first speech processing system, third audio data.”
Concerning claims 10 and 18, Van Scheltinga et al. teaches cross-device handoff, where a first computing device may receive an indication of user input that is part of a conversation of a user and a first assistant executing at the first computing device, and the first assistant may determine whether to hand off the conversation from the first assistant executing at the first computing device to a second assistant executing at a second computing device.  The first assistant system may then send to the second computing assistant a request to hand off the conversation which includes an indication of the conversation.  (Abstract)  Van Scheltinga et al., then, teaches that a second speech processing system may send first audio data to a first speech processing system when there is a handoff between a second assistant and a first assistant.  Specifically, Van Scheltinga et al. teaches an embodiment where a user requests that assistant module 122A make dinner reservations at a restaurant.  Then assistant module 122A or 122C may interact with a restaurant reservation service, and output a confirmation that it has made the requested reservation for 7 PM at Luigi’s.  When assistant module 122A receives a query 126, “How do I get to the restaurant?”, assistant module 122A or 122C may be able to determine that the restaurant refers to Luigi’s, may determine a relevant response for query 126 to be directions from the user’s current location to the location of the restaurant, and may formulate a response that includes the directions.  Assistant module 122A or 122C may determine whether to hand off the conversation between the user and assistant module 122A to another assistant module 122B, and may enable the computing device that receives the handoff to output a response to a query received by assistant module 122A.  (¶[0052] - ¶[0055]: Figure 1)  Van Scheltinga et al., then, teaches that a different assistant module can receive audio data corresponding to a conversation thread of a restaurant reservation and generate second response data.  Here, “the first speech processing system” is an assistant module that generates directions as second response data after receiving a conversation thread of a handoff from “the second speech processing system” that made the restaurant reservation.  An objective is to improve usability of an assistant so that a user does not have to explicitly instruct the assistant to hand off a conversation from a first computing platform to a second computing platform.  (¶[0004])  It would have been obvious to one having ordinary skill in the art to send first audio data to a different speech processing system to generate a response as third audio data as taught by Van Scheltinga et al. in a natural language understanding system of Hart et al. for a purpose of improving usability by not requiring a user to explicitly instruct an assistant to hand off a conversation.

Concerning claims 11 and 19, Hart et al. discloses determining intent.  Van Scheltinga et al. teaches that a request of a conversation “corresponds to a prior command”, where this prior command is for a restaurant reservation.  (¶[0052] - ¶[0053]: Figure 1)  Additionally, assistant module 122A or 122C may determine an action that is likely to be taken based on analyzing user preferences or user history stored at user information data store 124A and user information data store 124C.  (¶[0083]: Figure 1)  Here, a user information data store that stores user preferences and user history is equivalent to “a user profile”.  So, Van Scheltinga et al. teaches “determining that a user profile includes second response data corresponding to the prior command and corresponding to the first speech processing system” when a response is generated based on user preferences and user history.  Assistant module 122A or 122C may be able to determine the meaning of query 126 based on the contextual information determined from conversation 132, and determine that the restaurant refers to Luigi’s, may determine a relevant response for query 126 to be directions from the user’s current location to the location of the restaurant, and may formulate a response that includes the directions, where the conversation is handed off to generate the relevant responses by assistant module 122B (“sending, to the second speech processing system, the second response data”).  (¶[0053] - ¶[0055]: Figure 1)
Concerning claims 12 and 20, Hart et al. discloses determining intent data.  Van Scheltinga et al. teaches “sending, to the first speech processing system, the intent data” when a conversation thread is sent from assistant module 122A or 122C that makes a restaurant reservation to assistant module 122B that provides directions to the restaurant.  Assistant module 122B then determines the directions to the restaurant (“receiving, from the first speech processing system, second response data”), and outputs the directions as audio (“outputting, based on the second response data and using the first speech processing system, third audio data”).  (¶[0053] - ¶[0055]: Figure 1) 

Claims 21 to 24 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353) as applied to claims 5 and 13 above, and further in view of Talwar et al. (U.S. Patent Publication 2018/0233138).
Broadly, Hart et al. discloses “causing the voice-controlled device to perform the action includes causing a change in a state” at least for an embodiment of playing a song or internet radio on voice-controlled device 106.  That is, Hart et al. discloses various embodiments of “a change in a state” that include ‘play’ or ‘pause’ a particular song in a music domain, or outputting a reminder in a to-do list application to ‘Remember to pick up Grace from soccer in 15 minutes’.  (Column 3, Lines 4 to 16; Column 5, Lines 18 to 46)  However, Hart et al. does not expressly disclose that a voice-controlled device is “associated with a vehicle” to change a state “of the vehicle”, or “wherein causing a change in a state of the vehicle includes one or more of raising or lowering a window, setting a thermostat, adjusting a seat, locking or unlocking doors, or turning on or turning off lights.”  Still, it is fairly well known in the prior art to perform actions from voice commands to control various accessories in a vehicle, and an application of a voice-controlled device to a vehicle is mainly ‘an intended use’.  
Specifically, Talwar et al. teaches vehicle control for multi-intent queries input by voice, where a module is configured to determine a primary intent included in voice input using automated speech recognition (ASR), and an execution module is configured to execute the primary intent.  (Abstract)  An infotainment system of a vehicle controls various aspects of in-vehicle entertainment, selection of a source of sound output via speakers, information displayed on a display within the vehicle, climate control settings of the vehicle, etc.  (¶[0031])  Climate control module 220 controls climate control actuators 224 of the vehicle based on user input regarding climate control settings, where user input may be in the form of user voice.  (¶[0048]: Figure 2)  User requests may be included within voice input 232, where multi-intent queries include: ‘Can you turn on my seat vibrator to medium setting and play a jazz station from the radio?’ and ‘Can you turn the temperature down and tell me how stock for company X is doing?’  (¶[0051]: Figure 2)  Execution module transmits a command to climate control module 220 to decrease a target temperature within passenger cabin and to adjust one or more climate control actuators 224.  (¶[0055]: Figure 2)  Talwar et al., then, teaches “causing the voice-controlled device to perform the action includes changing a state of the vehicle” and “wherein causing a change in a state of the vehicle includes one or more of . . . setting a thermostat, adjusting a seat”.  An objective is to improve request completion of an infotainment module for multi-intent queries.  (¶[0035] - ¶[0036])  It would have been obvious to one having ordinary skill in the art to apply a voice-controlled device to identify an intent of a voice command in Hart et al. to a vehicle control of setting a thermostat or adjusting a seat as taught by Talwar et al. for a purpose of improving request completion of an infotainment module executing multi-intent queries.

Claims 25 to 28 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353) as applied to claims 5 and 13 above, and further in view of Daly (U.S. Patent Publication 2010/0333163).
Hart et al. arguably discloses the limitations of these claims directed to “receiving the indication includes detecting a selection of a button” and “in response to receiving the indication, causing the voice-controlled device to output audio identifying the first speech processing system.”  Specifically, Hart et al. discloses that a voice-controlled device 106 can be implemented with a haptic input component that includes one or more control buttons and a dedicated button to initiate configuration and power on/off.  (Column 6, Lines 24 to 30: Figure 2)  Broadly, Hart et al. then discloses that voice-controlled device 106 receives and detects an indication of a user pressing a dedicated button to turn on or turn off operation.  Subsequently, Hart et al. discloses that voice-controlled device 106 identifies a domain, and outputs audio identifying that domain.  That is, voice-controlled device 106 poses various questions: ‘Do you wish to shop or listen to music?’, ‘Would you like to listen to music on an internet radio application or from your personal storage?’, and then ‘I will begin playing your music shortly’.  (Column 12, Line 50 to Column 13, Line 64: Figure 2)  Broadly, Hart et al. then discloses “in response to receiving the indication, causing the voice-controlled device to output audio identifying the first speech processing system”.  That is, “the first speech processing system” is an application for playing music, and audio is output to identify that this application is about to begin playing music.  Even if there are intermediate steps between “receiving the indication” to turn on power to voice-controlled device 106 by pressing a button and “causing the voice-controlled device to output audio identifying the first speech processing system”, these limitations are broadly disclosed by Hart et al.
However, even if these limitations are omitted by Hart et al., they are taught by Daly.  Generally, Daly teaches voice enabled media presentation systems.  (Abstract)  A voice enable key provides push-to-talk capability for remote-control device 100.  When a user 220 pushes or otherwise activates a voice enable key 204, remote-control device 100 begins to receive or capture audio signals received by microphone 206, and to initiate a voice recognition process.  (¶[0039]: Figure 2)  One embodiment provides that a user 200 has pressed voice enable button 204, and in response the voice enabled media presentation system (VEMPS) initiates a display of an icon 304 that indicates that voice interface manager 101 is ready to accept, e.g., listening for, a spoken command.  Icon 304 serves as a prompt for a user to begin speaking.  (¶[0049]: Figure 3A)  Compare Specification, ¶[0059], which describes one or more lights 308 to identify a speech processing system being used to perform a task.  The VEMPS may use text-to-speech synthesis to prompt or otherwise interact with a user.  When the user selects voice enable button 204, the system may play an audio prompt that says, ‘I’m listening’ or ‘How may I help you?’ that is output by a speaker associated with presentation device 124.  (¶[0057]: Figure 3D)  Daly, then, teaches “receiving the indication includes detecting a selection of a button” when a user selects voice enable button 204, and “in response to receiving the indication, causing the voice-controlled device to output audio identifying the first speech processing system” by displaying icon 304 or using text-to-speech synthesis to output an audio prompt of ‘I’m listening’.  Here, a voice enabled media presentation system (VEMPS) is “the first speech processing system”, and icon 304 or an audio prompt is “identifying the first speech processing system.”  (Applicants’ claim language only requires identification of one “first speech processing system”.)  
An objective is to control a set-top box via voice commands.  (¶[0001])  It would have been obvious to one having ordinary skill in the art to identify a first speech processing system in response to receiving an indication of a selection of a button as taught by Daly in a speech recognition platform that identifies a domain of a voice command in Hart et al. for a purpose of controlling a set-top box via voice commands.

Response to Arguments
Applicants’ arguments filed 03 May 2022 have been fully considered but they are not persuasive.
Applicants provide some significant amendments and revisions to independent claims 5 and 13, cancel claims 8 to 9 and 16 to 17, and add new claims 25 to 28.  Then Applicants present some brief arguments traversing the prior rejection of the independent claims as being obvious under 35 U.S.C. §103 over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353).  Applicants note that the rejection characterizes Hart et al. as disclosing a first score corresponding to an ability of a first speech processing system to respond to the intent and that a first score reflects the ranking of the intent for a domain.  Applicants do not concede this, but allege that unspecified claimed features “patentably improve over the references alone or in any proper combination.”  Applicants state that Hart et al. discloses an array of different applications that may work with a speech recognition platform to perform actions requested by the user, e.g., a shopping application, a to-do list application, a music application.  Then Applicants characterize Chen et al. as teaching that an agent may be chosen by name, where a device agent may receive a request from a user and identify John’s Pizza Joint, but contend that an agent is not selected based on intent or applications associated with the agent in Chen et al.  Applicants conclude that the independent claims patentably improve over the cited references because they do not disclose or teach processing the intent data using a speech processing system selected based on applications corresponding to the speech processing system.  These arguments are not persuasive.
Firstly, new grounds of rejection are set forth under 35 U.S.C. §112(a).  Applicants’ independent claims are amended to set forth limitations of “receiving an indication to process the first audio data using a first speech processing system” and “outputting, based at least in part on the indication”, which limitations are problematic in combination as misdescriptive of the invention or as introducing new matter under 35 U.S.C. §112(a).  Here, Applicants’ Specification, as originally filed, does not provide express support in this context for the term “an indication”.  New claims 25 and 27 state that “receiving the indication includes detecting a selection of a button.”  Generally, it is known in the prior art that a speech recognition device can be awakened to listen for a command by a user speaking a wakeword or by an activation of a button, e.g., a push-to-talk button, but not both.  The Specification, ¶[0025] and ¶[0060], literally describes a wakeword “and/or” “or in addition” a button press, but using both appears to be redundant.  Moreover, if “first audio representing an utterance” is a command without a wakeword that is issued before “receiving an indication to process the first audio data” by pressing of a push-to-talk button, then this first audio will be ignored because the voice-controlled device is not yet activated.  Conceivably, Applicants’ limitation of “receiving, from a voice controlled device, first audio data representing an utterance” should be situated after the limitation of “receiving an indication”.  Additionally, Applicants’ Specification, as originally filed, does not support that pressing a button (“receiving an indication”) actually is used to select “a first speech processing system”.  That is, there appears to be nothing in the originally-filed Specification that enables a user to select one of the first speech processing system and the second speech processing system by pressing a button.  
The Specification, ¶[0059], describes an embodiment of lights or audio output to identify which of the two speech processing systems is performing the task, but this does not correspond to receiving an indication from a user as a button press.  Nor is output of second audio data “based at least in part on the indication” of pressing a button as set forth by the last clause of the independent claims.  These new grounds of rejection are necessitated by amendment.
Secondly, new grounds of rejection are set forth under 35 U.S.C. §103 as directed to new dependent claims 25 to 28 being obvious further in view of Daly (U.S. Patent Publication 2010/0333163).  Hart et al., Column 6, Lines 22 to 51, discloses that a dedicated button to initiate a power on may be pressed as an alternative to a command to ‘Awake’.  Arguably, Hart et al. then discloses “receiving the indication includes detecting a selection of a button.”  Similarly, Hart et al. may be construed to subsequently disclose “in response to receiving the indication, causing the voice-controlled device to output audio identifying the first speech processing system” because there is a subsequent identification by audio output of performing a task in a music domain.  Anyway, Daly, ¶[0039], ¶[0049], and ¶[0057], teaches these limitations by expressly including a voice enable key that provides push-to-talk capability, and then displaying an icon 304 or an audio prompt that indicates that a voice interface is ready to accept a spoken command.  Compare Specification, ¶[0059], where one or more lights 308 or audio output identify a speech processing system being used.  Applicants’ claim language only requires identifying one speech processing system, i.e., “a first speech processing system”, and this corresponds to the VEMPS system of Daly.
Thirdly, Applicants’ arguments are not persuasive as directed against the independent claims, and it is maintained that the obviousness rejection is proper over Hart et al. (U.S. Patent No. 9,424,840) in view of Chen et al. (U.S. Patent Publication 2016/0035353).  The rejection of some of the dependent claims continues to rely upon Van Os et al. (U.S. Patent No. 10,540,976), Nadkar et al. (U.S. Patent Publication 2018/0204569), Van Scheltinga et al. (U.S. Patent Publication 2018/0338191), and Talwar et al. (U.S. Patent Publication 2018/0233138).  New grounds of rejection are set forth under 35 U.S.C. §112(a) and for dependent claims 25 to 28 as being obvious further in view of Daly.  These new grounds of rejection are necessitated by amendment.
Mainly, Applicants’ arguments are not persuasive because the amendments do not significantly change the scope of the independent claims and these arguments do not specifically point out what is deficient in the rejection.  Applicants allege that a broad range of quoted features “patentably improve over the references”, but do not fully explain why this is so.  Applicants’ claim language is amended to delete the limitations of “a first score” and “a second score” from the independent claims so that they now only refer to “a first ability” and “a second ability”.  The Specification, ¶[0022], ¶[0040], and ¶[0063], however, states that a first ability and a second ability are equivalent to a first score or ranking and a second score or ranking.  Hart et al. discloses these first scores or rankings and second scores or rankings.  Additionally, Applicants include a new limitation of “receiving an indication to process the first audio data using a first speech processing system”, but do not actually highlight this limitation as significant or provide arguments as to why it should provide a patentable feature over the prior art.  Hart et al., Column 6, Lines 22 to 51, discloses both a dedicated button and a command to ‘Awake’ that appear equivalent to this limitation.  However, this limitation is subject to new grounds of rejection under 35 U.S.C. §112(a).  
Applicants’ argument, at best, only appears to attack the references individually without consideration as to what the prior art as a whole suggests to one skilled in the art.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  Applicants appear to admit that a first application and a second application are disclosed by Hart et al., but argue that an agent is not selected based on intent by Chen et al.  However, even if an agent is not selected based on intent by Chen et al., a first application or a second application is selected based on intent by Hart et al.  Applicants’ argument, then, does not demonstrate that there is any patentable improvement over Hart et al. and Chen et al. in combination, but merely addresses the references individually. 
Applicants’ arguments are not persuasive.  New grounds of rejection are set forth as directed to the independent claims under 35 U.S.C. §112(a) and to new dependent claims 25 to 28 under 35 U.S.C. §103 further in view of Daly (U.S. Patent Publication 2010/0333163).  These new grounds of rejection are necessitated by amendment.  Accordingly, this rejection is properly FINAL.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Coon et al., Column 3, Lines 34 to 40, discloses related prior art similar to Daly, directed to a push button to activate a system and an audible output indicating a system is ready to process voice commands.

Applicants’ amendment necessitated the new grounds of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP §706.07(a).  Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657
May 11, 2022