DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Multi-Assistant Natural Language Processing from Triggers to Determine Voice Model for Synthesized Speech.
The disclosure is objected to because of the following informalities:
In ¶[0038], “signal represent” should be “signal to represent”.
In ¶[0054], “associate detectable” should be “to associate detectable”.  
In ¶[0056], “include one or more” should be “including one or more”.
In ¶[0076], “may represent processing may be caused to be performed” (three occurrences) does not appear to be grammatical, but could be “may represent processing to be performed”.
In ¶[0076], “may representing may be caused to be performed” does not appear to be grammatical, but could be “may represent processing to be performed”.
In ¶[0076], “when a natural language input” should be “for a natural language input”. 
In ¶[0095], “processes text data” should be “that processes text data”.
In ¶[00102], there should be some reference to Figure 9.

In ¶[00115], there should be some reference to Figure 10.
In ¶[00142], “is associated” should be “are associated”.
In ¶[00147], there should be some reference to Figure 11B.
In ¶[00147], “plan generate 1170” should be “plan generator 1170”. 
In ¶[00148], “may generated updated” should be “may generate updated”.
In ¶[00150], “determine (1204)” should be “may determine (1204)”.
In ¶[00152], there should be some reference to Figure 12B beginning at Step 1212.
In ¶[00154], there should be some reference to Figure 12C beginning at Step 1222.
In ¶[00155], “corresponding to the in” should be “corresponding to the skill in”.
In ¶[00155], there should be some reference to Figure 12D beginning at Step 1234.
In ¶[00157], there should be some reference to Figure 12E beginning at Step 1248.
In ¶[00157], “that that output” should be “that had output”.
In ¶[00158], there should be some reference to Figure 12F beginning at Step 1258.
In ¶[00159], “rand” should be “and”.
In ¶[00163], “user’s over” should be “users over”.
In ¶[00173], there should be some reference to Figure 13B beginning at Step 1312.

In ¶[00177], it appears that this paragraph should be located earlier in the Specification, e.g., before ¶[00168].
In ¶[00178], “the last assistant” should be “as the last assistant”.
In ¶[00184], “the user the provided” should be “the user that provided”.
In ¶[00188], “component14” should be “component 1414”.
In ¶[00198], “audio data 405” should be “audio data 1504”.  See Figure 15.
In ¶[00211], “may send query the anonymous” is not grammatical, but could be “may query the anonymous” or “may send a query to the anonymous”.
In ¶[00228], “to recognition the user” should be “to recognize the user”.
Appropriate correction is required.

Claim Objections
Claims 5 to 20 are objected to because of the following informalities:  
Independent claims 5 and 13 set forth a limitation of “NLU results data”, but there is no unabbreviated designation in these independent claims for ‘NLU’.  Here, ‘NLU’ should be written out in full at its first occurrence as “natural language understanding”. 
Claims 12 and 20 set forth “the record speech”, which should be “the recorded speech”.
Appropriate correction is required.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 to 6, 8, 12 to 14, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Maker et al. (U.S. Patent Publication 2019/0251960) in view of Brown et al. (U.S. Patent Publication 2015/0185996).
Concerning independent claims 5 and 13, Maker et al. discloses a system and method for trigger word detection with multiple digital assistants, comprising:
“a first component that outputs NLU results data” – natural language understanding unit (NLU) 1104 receives text results from automated speech recognizer (ASR) 1102; NLU 1104 may generate a meaning representation of text results through natural language understanding techniques; NLU 1104 may generate an intent through natural language understanding techniques; a user 136 may say, “Hey Roku, play jazz on Pandora on my television”, and NLU 1104 may determine an intent of the user 136 to play jazz on an application 194 (e.g., Pandora) on display device 104 (¶[0100] - ¶[0101]: Figure 11); broadly, a meaning and intent of text results from speech recognition is “NLU results data”;
“a second component that: receives first data representing at least one trigger corresponding to at least one natural language processing (NLP) system assistant” – a 
“determines, from among a plurality of NLP system assistants, a first NLP system assistant based at least in part on the at least one trigger” – a digital assistant may be selected from among multiple digital assistants (Abstract); a digital assistant is selected from multiple digital assistants in response to a voice input; a mediator selects a digital assistant for the user that is associated with a voice adapter that outputs a highest confidence score (¶[0009]); audio responsive electronic device 122 may select a digital assistant 180 from among a plurality of digital assistants 180 in voice platform 192 to process voice commands (¶[0062]: Figure 1); 
“outputs second data including a first NLP system assistant identifier corresponding to the first NLP system assistant” – voice platform 192 may select a digital assistant 180 based on a lookup table that maps trigger words to a particular digital assistant 180 (¶[0094]: Figure 1); voice platform 192 may select a digital assistant 180 based on a lookup table that maps different trigger words to digital assistants 180 (¶[0135]: Figure 13: Step 1306); here, “a first NLP system assistant identifier” is determined from a lookup table; Compare Specification, ¶[0054], which equivalently describes a table that stores wakewords associated with assistant 
“a text-to-speech (TTS) component that: receives text data associated with the first NLP system assistant identifier” – text-to-speech (TTS) unit 1106 may generate an audio response in response to generation of an intent (¶[0117]: Figure 11); audio responsive electronic device 122 sends a reply message to audio responsive electronic device 122, where the reply message to “When does the new season of Game of Thrones start?” may be “I don’t know” or “Soon” (¶[0225] - ¶[0230]); implicitly, a text-to-speech unit receives text data and outputs speech corresponding to the text data;
“[using the voice model], generates synthesized speech corresponding to the text data [in the voice specific] to the first NLP system assistant” – text-to-speech (TTS) unit 1106 may generate an audio response in response to generation of an intent (¶[0117]: Figure 11); audio responsive electronic device 122 processes the response, where the response may be a message to audibly play back to user 136; audio responsive electronic device may play over speakers 190, “I don’t know” or “Soon” (¶[0232]: Figure 10: Step 1014); audio responsive electronic device 1402 may generate a message that is based on a retrieved topic and customized for user 136, where the customized message is, “The most popular Hulu show in Palo Alto is Shark Tank” or “You listened to Pandora for 13 hours last month” (¶[0266] - ¶[0268]: Figure 17).
Concerning independent claims 5 and 13, the only limitations not expressly disclosed by Maker et al. are “determines a voice model associated with the first NLP system assistant identifier, the voice model corresponding to a voice specific to the first Maker et al. discloses generating synthesized speech by a text-to-speech unit 1106, and states that digital assistants 180 may use a library 188 to control audio responsive electronic device 122.  (¶[0210]: Figure 1)  A tone category may correspond to an emotional state that a digital assistant 180 may wish to convey when sending a message to a user 136 via audio responsive electronic device 122.  An emotional state may be designated as a ‘happy’ emotional state or a ‘sad’ emotional state.  (¶[0212]: Figure 9)  Broadly, synthesizing audio according to an emotional state may be construed to be “a voice model” and “in a voice specific to” a digital assistant.  However, even if determining a voice model corresponding to a voice specific to an assistant and unique from other assistants is omitted by Maker et al., this is taught by Brown et al.  
Concerning independent claims 5 and 13, Brown et al. teaches virtual assistant team identification, where a team may include multiple virtual assistants that are configured with different characteristics including base language models and personalities.  (Abstract)  A virtual assistant characteristic module 218 may be configured to customize or configure a characteristic of a virtual assistant.  (¶[0089]: Figure 2)  Characteristics of a virtual assistant may include an audible manner of output, i.e., how a virtual assistant speaks to a user.  This may include an accent of a virtual assistant, e.g., English, Australian, etc., or a personality, i.e., how a virtual assistant responds to a user.  This may include a virtual assistant acting cheerful, angry, depressed, etc.  (¶[0099] and ¶[0101])  Compare Specification, ¶[0148], which implies Brown et al., then, teaches determining “a voice model” associated with a digital assistant, where a voice model is specific to a digital assistant and unique from other digital assistants.  An objective is to enhance a user’s experience with a virtual assistant.  (¶[0002])  It would have been obvious to one having ordinary skill in the art to provide a unique voice model for each of a plurality of digital assistants as taught by Brown et al. in a selected digital assistant of Maker et al. for a purpose of enhancing a user’s experience with a virtual assistant.
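For illustration only, the combination discussed above can be pictured as two table lookups: a trigger word is mapped to an assistant identifier, and the identifier is then mapped to a set of voice characteristics. The following is a minimal sketch under stated assumptions and does not reproduce any implementation of Maker et al., Brown et al., or the Specification. The table names, function names, identifier strings, and the particular accent and personality values are all hypothetical.

```python
from typing import Optional

# Hypothetical lookup table mapping trigger words to assistant identifiers.
TRIGGER_TABLE = {
    "hey roku": "assistant_roku",
    "hey siri": "assistant_siri",
}

# Hypothetical voice characteristics keyed by assistant identifier.
# The accent and personality values are placeholders, not facts about any product.
VOICE_MODELS = {
    "assistant_roku": {"accent": "English", "personality": "cheerful"},
    "assistant_siri": {"accent": "Australian", "personality": "calm"},
}


def select_assistant(utterance: str) -> Optional[str]:
    """Return the identifier of the assistant whose trigger word begins the utterance."""
    # Normalize case and drop commas so "Hey, Siri" matches "hey siri".
    text = utterance.lower().replace(",", "")
    for trigger, assistant_id in TRIGGER_TABLE.items():
        if text.startswith(trigger):
            return assistant_id
    return None


def voice_model_for(assistant_id: str) -> dict:
    """Return the voice characteristics stored for the given assistant identifier."""
    return VOICE_MODELS[assistant_id]
```

In this sketch, each assistant identifier has its own entry in the voice table, so text routed to a text-to-speech stage together with an identifier could be rendered with characteristics that differ from those of the other assistants.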

Concerning claims 6 and 14, Brown et al. teaches that characteristics of a virtual assistant may include a lexicon, i.e., a set of words that are understood and/or used by a virtual assistant; a travel assistant may be associated with a set of travel words, e.g., flight terminology, car rental terminology, etc.  (¶[0090] and ¶[0094]: Figure 2)
Concerning claims 8 and 16, Maker et al. discloses that a user 136 may say, “Hey Roku, play jazz on Pandora on my television”, and NLU 1104 may determine an intent of the user 136 to play jazz on an application 194 (e.g., Pandora) on display device 104 (¶[0100] - ¶[0101]: Figure 11).  Here, “Pandora on my television” is “the first data comprises a first NLP system assistant trigger representing the first NLP system assistant is associated with a first device that captured a natural language input corresponding to the NLU results data”; that is, ‘Pandora’ or ‘my television’ is “a first device” in “NLU results data”.  Additionally, “Hey Roku” is “the first data comprises a second NLP system assistant trigger corresponding to a natural language name of the first NLP system assistant included in the natural language input”; that is, ‘Roku’ is “a 
Concerning claims 12 and 20, Brown et al. teaches that contextual information may be used by a virtual assistant characteristic module 218 to customize a virtual assistant, where conversation information describes a conversation between a user and a virtual assistant, either during a current session or during a previous session, e.g., a conversational history, may be used to customize a conversation.  (¶[0057] and ¶[0059]: Figure 2)  A virtual assistant may be configured to emulate or mimic how a user interacts with the virtual assistant, e.g., if the user talks fast, a virtual assistant may speak fast.  (¶[0102])  Broadly, “recorded speech of a human” can be speech obtained during a current session or during a previous session via a conversational history.  If a virtual assistant outputs speech that is spoken fast when a user speaks quickly, then “the record speech corresponding to the voice specific to the first NLP system assistant” and “causes the audio data to be output”.  That is, speech is output as audio by a first virtual assistant that corresponds to some characteristic of speech received from a human.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Maker et al. (U.S. Patent Publication 2019/0251960) in view of Brown et al. (U.S. Patent Publication 2015/0185996) as applied to claims 5 and 13 above, and further in view of Liu et al. (U.S. Patent Publication 2020/0227039).
Maker et al. discloses that a confidence score is generated for a trigger word, and a digital assistant is selected based on the confidence score.  (Abstract; ¶[0009])  Maker et al., then, discloses that a platform “determines the first NLP system assistant based in part on a first NLP system assistant trigger word in the first data” and “determines a second NLP system assistant based at least in part on a second NLP system assistant trigger in the first data”, and “outputs the second data based at least in part on the first NLP system assistant trigger . . . the second NLP system assistant trigger”.  That is, a digital assistant is selected from a first and second digital assistant based on first data of a trigger word.  Moreover, an embodiment of voice platform 192 may overrule a selected digital assistant 180 by selecting a different digital assistant 180 than is normally selected based on a detected trigger word.  (¶[0107]: Figure 1)  Voice platform 192 may track the usage of different digital assistants 180 using various criteria including time of day, location, and frequency.  A majority of users 136 may use a digital assistant 180 from Google, Inc., to look up general information, but a user 136 may submit a voice input of “Hey, Siri, what is the capital of Minnesota” that would normally be processed by Apple, Inc., due to a user’s use of a trigger word, “Hey, Siri”.  Voice platform 192 may consult a crowdsource server to determine that another digital assistant 180 should be used, and may then send the voice input to Google digital assistant 180 rather than Siri.  A server may increment a Siri counter relating to general information queries by one, and if a majority of users request Siri to process general information queries, so that Siri’s counter becomes greater than Google’s, then voice platform 192 will direct these queries to Siri.  (¶[0107] - ¶[0110]: Figure 1)  
Here, Maker et al. does not expressly disclose the limitations of “determines a first weight associated with the first NLP system assistant trigger” and “determines a second weight associated with the second NLP system assistant trigger”, but these Liu et al. teaches electronic device and voice command identification that includes weight assignment.  One or more commands including a target command are mapped to a target voice signal.  If a total weight corresponding to a first command is greater than a confidence threshold, then a voice command mapping circuit determines that the first command is reliable as a target command.  (¶[0090] - ¶[0094]: Figure 6)  Liu et al., then, teaches using weights instead of counts incremented by Maker et al.  An objective is to provide voice identification in assistants of Google assistant, Apple Siri, and Amazon Alexa to execute a target command.  (¶[0003] - ¶[0004])  It would have been obvious to one having ordinary skill in the art to determine first and second weights associated with commands as taught by Liu et al. as confidence scores of trigger words in digital assistants of Maker et al. for a purpose of executing a target command by voice identification.  
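As a rough sketch of the proposed combination, and not of the actual method of either reference, a weight may be associated with each detected trigger, and an assistant may be selected only if its weight exceeds a confidence threshold. The function name, the identifier strings, the weight values, and the threshold are assumptions introduced solely for illustration.

```python
from typing import Dict, Optional


def select_by_weight(trigger_weights: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the assistant with the largest trigger weight if that weight exceeds the threshold.

    trigger_weights maps a hypothetical assistant identifier to the weight
    determined for the trigger associated with that assistant.
    """
    if not trigger_weights:
        return None
    # Pick the assistant whose trigger carries the largest weight.
    best = max(trigger_weights, key=trigger_weights.get)
    # Treat the selection as reliable only above the confidence threshold.
    return best if trigger_weights[best] > threshold else None
```

Under these assumptions, a first weight and a second weight play the role that incremented counters play in the crowdsourced selection described for Maker et al.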

Claims 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Maker et al. (U.S. Patent Publication 2019/0251960) in view of Brown et al. (U.S. Patent Publication 2015/0185996) as applied to claims 5 and 13 above, and further in view of Bricklin et al. (U.S. Patent Publication 2019/0221225).
Maker et al. discloses trigger words for voice assistants, but does not expressly disclose a trigger for “a default NLP system assistant”.  However, ‘default’ systems are generally known when a selection must be performed among a plurality of options.  Bricklin et al. teaches an automated voice assistant personality selector, where a response may be in connection with a task using a default personality for a voice-enabled virtual assistant selected based on the task.  (Abstract)  Speech output of a voice-enabled virtual assistant may have characteristics that define its personality including tone, timbre, sex, accent, etc.  (¶[0002])  Multiple voice personalities or characteristics may be stored in a voice-enabled virtual assistant that may be selected as circumstances dictate.  (¶[0012])  A personality selector 165 may select a default personality for a task to use.  (¶[0027]: Figure 1)  Voice processor 125 may work in conjunction with device 110 to output a description of a task using a default personality selected for a voice-enabled virtual assistant by personality selector 165.  (¶[0029]: Figure 1)  An objective is to provide a voice-enabled virtual assistant with an ability to determine an optimal personality characteristic for a task and a user to help drive the user to complete the task in a shorter amount of time.  (¶[0009])  It would have been obvious to one having ordinary skill in the art to determine a default virtual assistant as taught by Bricklin et al. from trigger words of Maker et al. for a purpose of determining an optimal personality for a user to complete a task in a shorter amount of time.

Claims 10 to 11 and 18 to 19 are rejected under 35 U.S.C. 103 as being unpatentable over Maker et al. (U.S. Patent Publication 2019/0251960) in view of Brown et al. (U.S. Patent Publication 2015/0185996) as applied to claims 5 and 13 above, and further in view of Casado et al. (U.S. Patent Publication 2020/0342866).
Concerning claims 10 and 18, Maker et al. discloses that a text-to-speech unit “causes the synthesized speech to be output”, and that a trigger word may stop a Maker et al. omits “after causing the synthesized speech to be output, causes the first NLP system assistant to no longer be an active assistant for a set of related natural language inputs and system outputs occurring via a first device of a period of time.”  That is, Maker et al. discloses that voice input of “Hey Roku, Stop” may ‘cause an assistant to no longer be an active assistant for inputs and outputs’, but does not appear to state that this inactivity is “for a period of time”.  Compare Specification, ¶[00166].  Still, timeout periods for voice assistants are fairly well known.  Specifically, Casado et al. teaches context-specific hotwords to invoke automated assistants, where an automated assistant transitions back into an inactive state once processing is complete.  If no audio input is received after detection of an invocation word, then a timeout (‘TO’) may transition automated assistant 120 from a general listening state back into an inactive state.  If a sufficient amount of time passes while automated assistant 120 is in a first context-specific listening state without detection of activated hotwords, then a timeout (‘TO’) may transition automated assistant 120 back into an inactive state.  (¶[0066] and ¶[0068]: Figure 1)  Casado et al., then, teaches that an assistant may “no longer be an active assistant . . . via a first device of a period of time.”  An objective is to transition an automated assistant back into a default inactive state so that later utterances not intended for processing are not captured or processed.  (¶[0066])  It would have been obvious to one having ordinary skill in the art to provide an inactive state after a period of time as taught by Casado et al. in voice assistants of Maker et al. for a purpose of ensuring that later utterances not intended for processing are not captured or processed.

Casado et al., a user can then invoke a trigger word for another voice assistant of Maker et al.  So, Maker et al. “causes a second NLP system assistant to be a second active assistant of the set of related natural language inputs and outputs” “after causing the first NLP system assistant to no longer be an active assistant” when there is a timeout period of Casado et al.
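The timeout behavior relied upon above may be pictured, purely by way of example, as a small tracker that records which assistant is active and clears it once a period of time passes without activity, after which a different assistant may become active. This is a sketch under stated assumptions rather than the implementation of Casado et al. or Maker et al. The class name, method names, and timeout value are hypothetical, and time is passed in explicitly so that the behavior is deterministic.

```python
from typing import Optional


class ActiveAssistantTracker:
    """Tracks a single active assistant and deactivates it after a timeout."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.active_id: Optional[str] = None
        self.last_activity: float = 0.0

    def activate(self, assistant_id: str, now: float) -> None:
        """Make the given assistant active, e.g., after its trigger word is detected."""
        self.active_id = assistant_id
        self.last_activity = now

    def current(self, now: float) -> Optional[str]:
        """Return the active assistant, or None once the timeout period has elapsed."""
        if self.active_id is not None and now - self.last_activity >= self.timeout_seconds:
            # Timeout: the assistant is no longer the active assistant.
            self.active_id = None
        return self.active_id
```

For example, with a hypothetical timeout of 8 seconds, an assistant activated at time 0 is still active at time 5, is no longer active at time 8, and a second assistant activated at time 9 then becomes the active assistant.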

Allowable Subject Matter
Claims 1 to 4 are allowed.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Smith et al. and Dunjic et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/MARTIN LERNER/Primary Examiner
Art Unit 2657
May 3, 2021