DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. The prior-filed application (provisional application No. 62/923342, filed on 10/18/2019) is acknowledged.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 1/21/2021 and 3/1/2021 have been considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 9, 14, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Endo et al. (US7228275B1) (hereinafter "Endo"), Gilbert et al. (US20120084086A1) (hereinafter "Gilbert"), and Mathias et al. (US20190325873A1) (hereinafter "Mathias").

Regarding claims 1, 17, and 18, Endo teaches a method comprising, by one or more computing systems: determining, for each transcription, a combination of one or more intents and one or more slots to be associated with the transcription; (Col. 5, lines 52 – 64: “For example, one speech recognizer 202 may output its speech recognition result as “I want to turn on the television” with a confidence score of 70. In another embodiment, the speech recognizers 202, 204, 206 may output their results in the form of slot-value pairs. For example, one speech recognizer 204 may output its speech recognition result as slot-value pairs, such as <device=“television”: confidence score=80> and <action=“on”: confidence score=60>. In the context of the speech recognition results output from the speech recognizers used in the speech recognition system of the present invention, the term “speech text” includes speech recognition results in the form of slot-value pairs.”).
selecting, by a meta-speech engine, one or more combinations of intents and slots from the plurality of combinations to be associated with the first audio input; (Col. 7, lines 8 – 19: “The decision module 208 [meta-speech engine] is coupled to the speech recognizers 202, 204, 206, the external input module 108, and the NIDI module 110. The decision module 208 includes a speech text and confidence score buffer array 302, a processor 304, an external data buffer 306, a memory device 308, and a control data buffer 316. The speech text and confidence score buffer array 302 is a memory, and receives and temporarily stores the recognized speech text and associated raw confidence scores while the decision module 208 selects the most accurate speech text according to the method of the present invention.”, and Col. 5, lines 52 – 64: “For example, one speech recognizer 202 may output its speech recognition result as “I want to turn on the television” with a confidence score of 70. In another embodiment, the speech recognizers 202, 204, 206 may output their results in the form of slot-value pairs. For example, one speech recognizer 204 may output its speech recognition result as slot-value pairs, such as <device=“television”: confidence score=80> and <action=“on”: confidence score=60>. In the context of the speech recognition results output from the speech recognizers used in the speech recognition system of the present invention, the term “speech text” includes speech recognition results in the form of slot-value pairs.”).

Gilbert teaches generating a plurality of transcriptions corresponding to the first audio input based on a plurality of automatic speech recognition (ASR) engines, (Par. 0037:” Next, the machine-learning algorithm 300 of FIG. 3 selects speech recognition candidates from segments of the speech recognition outputs 312a, 312b, 314, 316, and 318 of FIG. 3, based on the at least one speech recognition confidence score for the respective speech recognition outputs 608. The machine-learning algorithm 300 of FIG. 3 then combines the speech recognition candidates to yield a combination of the speech recognition candidates 610, and generates a text string 330 of FIG. 3 based on the combination 612.”).
wherein each ASR engine is associated with a respective domain of a plurality of domains; (Par. 0017:” The system determines the best recognition performance by aggregating information from a collection of domain-specific speech recognizers.”, and Par. 0029:” The system 202 first receives speech 302. The system 202 then recognizes the received speech with a collection of domain-specific speech recognizers 304, 306, 308, and 310, to yield respective speech recognition outputs 302. The collection of domain-specific speech recognizers 304, 306, 308, and 310 includes at least two experts from different domains; at least one of the different domains includes SMS, question/answering, video search, broadcast news, voicemail to text, web search, or local business search.”).
generating a response to the first audio input based on the selected combinations; and (Par. 0026:” The ASR module 202 analyzes speech input and provides a textual transcription of the speech input as output. SLU module 204 can receive the transcribed input and can use a natural language understanding model to analyze the group of words that are included in the transcribed input to derive a meaning from the input. The role of the DM module 206 is to interact in a natural way and help the user to achieve the task that the system is designed to support. The DM module 206 receives the meaning of the speech input from the SLU module 204 and determines an action, such as, for example, providing a response, based on the input.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo in view of Gilbert to generate a plurality of transcriptions corresponding to the first audio input based on a plurality of automatic speech recognition (ASR) engines, wherein each ASR engine is associated with a respective domain of a plurality of domains, and to generate a response to the first audio input based on the selected combinations, in order to allow for speech recognition across multiple applications or environments without model customization or knowledge of the domain of the received speech, which requires a lower volume of data, thereby increasing scalability and reducing cost, and provides numerous additional benefits, such as higher speech recognition performance and rapid deployment of speech applications without intensive development of expertise, as evidenced by Gilbert (see Par. 0039).

Neither Endo nor Gilbert teaches receiving, from a client system associated with a first user, a first audio input; and sending, to the client system, instructions for presenting the response to the first audio input.
Mathias teaches receiving, from a client system associated with a first user, a first audio input; (Par. 0027:” The client device 102 may include a speaker or other audio output component for presenting or facilitating presentation of audio content. In addition, the client device 102 may contain a microphone or other audio input component for accepting speech input on which to perform speech recognition.”).
sending, to the client system, instructions for presenting the response to the first audio input. (Par. 0021:” The spoken language processing system 100 may transmit the TTS audio to the client device 102 at [B].”, and Par. 0026:” A user may use the client device 102 to submit utterances, receive information, and initiate various processes, either on the client device 102 or at the spoken language processing system 100. For example, the user can issue spoken commands to the client device 102 in order to get directions or listen to music, as described above.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo and Gilbert in view of Mathias to receive, from a client system associated with a first user, a first audio input, and to send, to the client system, instructions for presenting the response to the first audio input, in order to improve the efficiency and accuracy of the ASR module, as evidenced by Mathias (see Par. 0011).


Regarding claim 17, Gilbert further teaches one or more computer-readable non-transitory storage media embodying software that is operable when executed to (Par. 0040: “Embodiments within the scope of the present disclosure may also include tangible computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon for controlling a data processing device or other computing device. Such computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above....or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo in view of Gilbert to employ one or more computer-readable non-transitory storage media embodying software that is operable when executed to perform the claimed steps, in order to allow for speech recognition across multiple applications or environments without model customization or knowledge of the domain of the received speech, which requires a lower volume of data, thereby increasing scalability and reducing cost, and provides numerous additional benefits, such as higher speech recognition performance and rapid deployment of speech applications without intensive development of expertise, as evidenced by Gilbert (see Par. 0039).


Regarding claim 18, Gilbert further teaches one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to (“… computing device 100 further includes storage devices 160 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like… The drives and the associated computer readable storage media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 100. In one aspect, a hardware module that performs a particular function includes the software component stored in a non-transitory computer-readable medium in connection with the necessary hardware components, such as the processor 120, bus 110, display 170, and so forth, to carry out the function.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo in view of Gilbert to employ one or more processors and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to perform the claimed steps, in order to allow for speech recognition across multiple applications or environments without model customization or knowledge of the domain of the received speech, which requires a lower volume of data, thereby increasing scalability and reducing cost, and provides numerous additional benefits, such as higher speech recognition performance and rapid deployment of speech applications without intensive development of expertise, as evidenced by Gilbert (see Par. 0039).


Regarding claim 9, Endo further teaches the method of Claim 1, wherein generating the plurality of transcriptions comprises: sending the first audio input to each of the ASR engines of the plurality of ASR engines; (Col. 4, lines 4 – 11: “the microphone 102 receives speech commands from a user or a speaker [not shown] and converts the speech to an input speech signal on signal lines 120. The microphone 102 passes the input speech signal 120 to the speech recognition system 104. The speech recognition system 104 preferably comprises multiple speech recognizers to recognize the input speech signal 120, according to an embodiment of the present invention.”).
and receiving the plurality of transcriptions from the plurality of ASR engines. (Col. 2, lines 28 – 33:” The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.", and Col. 5, lines 17 – 22:” The speech recognition system includes N number of speech recognizers including a first speech recognizer 202, a second speech recognizer 204, and an Nth speech recognizer 206, and a decision module 208 coupled to receive recognized speech texts 130 from each of the N speech recognizers #1-#N.").


Regarding claim 14, Endo further teaches the method of Claim 1, wherein one of the plurality of ASR engines is a combined ASR engine based on two or more discrete ASR engines, (Col. 5, lines 17 – 22: “The speech recognition system includes N number of speech recognizers including a first speech recognizer 202, a second speech recognizer 204, and an Nth speech recognizer 206, and a decision module 208 coupled to receive recognized speech texts 130 from each of the N speech recognizers #1-#N.”, and Col. 8, lines 52 – 67: “Consider the example of the speech recognition system of the present invention attempting to recognize the input speech “Ten University Avenue, Palo Alto” using two grammar-based speech recognizers and a statistical speech recognizer. The first grammar-based speech recognizer may recognize the input speech as “Ten University Avenue, Palo Alto” with a confidence score of 66. The second grammar-based speech recognizer may recognize the input speech as “Ten University Avenue, Palo Cedro” with a confidence score of 61. The statistical speech recognizer may recognize the input speech as “When University Avenue, Palo Alto” with a confidence score of 60. According to the embodiment described in FIG. 4, the speech recognition system will select the speech recognition result “Ten University Avenue Palo Alto” from the first grammar-based speech recognizer, since it has the highest confidence score [66].”, and Col. 9, lines 21 – 28: “The method then receives 509 a speech input and performs 510 multiple speech recognition on the speech input. The method then receives and stores 511 the recognized speech texts and associated raw confidence scores from the multiple speech recognizers. The raw confidence scores are then adjusted 512 based on the speech selection parameters to generate adjusted confidence scores.”).
wherein each of the two or more discrete ASR engines is associated with a separate domain of the plurality of domains. (Col. 5, lines 29 – 39: “The speech recognizers 202, 204, 206 can be any type of conventional speech recognizer, such as a grammar-based speech recognizer or a statistical speech recognizer. The speech recognizers 202, 204, 206 may be customized to operate well with the present invention. In addition, the speech recognizers 202, 204, 206 may include more than one of the same type of speech recognizers. For example, the speech recognizers 202, 204, 206 may include two grammar-based speech recognizers each using a different set of grammars and one statistical speech recognizer.”).
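
For illustration only, the confidence-based selection described in the cited passage of Endo (Col. 8, lines 52 – 67) can be sketched as follows. This is a minimal sketch, not Endo's implementation: the names Recognition and select_speech_text and the recognizer labels are hypothetical, and only the raw confidence scores from the quoted example are used (the adjustment step at Col. 9, lines 21 – 28 is omitted).

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    recognizer: str   # hypothetical label for the recognizer that produced the text
    text: str         # recognized speech text
    confidence: int   # raw confidence score reported by the recognizer


def select_speech_text(results):
    """Return the recognition result with the highest confidence score."""
    if not results:
        raise ValueError("no recognition results to select from")
    return max(results, key=lambda r: r.confidence)


# Values taken from the example quoted from Endo, Col. 8, lines 52 - 67.
results = [
    Recognition("grammar-1", "Ten University Avenue, Palo Alto", 66),
    Recognition("grammar-2", "Ten University Avenue, Palo Cedro", 61),
    Recognition("statistical", "When University Avenue, Palo Alto", 60),
]

best = select_speech_text(results)
print(best.recognizer, best.confidence, best.text)
```

In this sketch the first grammar-based recognizer's result is chosen because its score of 66 is the highest of the three, which is the outcome stated in the quoted passage.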


Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of  Varadharajan et al. (US20180330725A1)(hereinafter "Varadharajan").

Regarding claim 2, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein each ASR engine is associated with one or more agents of a plurality of agents specific to the respective ASR engine.
Varadharajan teaches wherein each ASR engine is associated with one or more agents of a plurality of agents specific to the respective ASR engine. (Par. 0029:” In at least one embodiment, language-based intelligent agents 128[a-c] are added to the bot directory 110 by the intelligent agent system 100 such that multiple different language-based intelligent agents are accessible to a single speech recognition engine 116.”).


Claims 3, 5, 6, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of  Lavallee et al. (US20160070696A1)(hereinafter "Lavallee").

Regarding claims 3, 5, 6, 15, and 16, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
With respect to claim 3, Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein each domain of the plurality of domains comprises one or more agents specific to the respective domain.
Lavallee teaches wherein each domain of the plurality of domains comprises one or more agents specific to the respective domain. (Par. 0065:” In some embodiments, when a task 204 is inactive, then none of its agents may collect information and/or be activated. Further, if a corresponding condition is met, then a task 204 may be activated, and agent 206 may begin to collect information. This may be referred to as task activation. Activation conditions may be based on one or more semantic slot values 208, which may communicate with the selector 202. Examples of slot values may be domain, intent, etc. A slot value 208 may correspond to a task, and may be used to identify the action to be taken by the application.”, and Par. 0066:” System 300 may an example of a banking dialogue system and/or application. System 300 may include a selector 302, a slot value 304 [e.g., intent], three tasks [e.g., transfer fund 306, pass bill 308, check balance 310], 7 tasks [e.g., from account 312, to account 314, amount 316, from account 318, bill 320, amount 322, and account 324]. Task 306 may be associated with the condition of intent to transfer, task 308 may be associated with the condition of intent to pay, and condition 310 may be associated with the condition of intent to determine a balance. Agents 312, 314, and 316 may be associated with task 306, agents 318, 320, and 322 may be associated with task 308, and agent 324 may be associated with 310. Thus, according to some aspects, a task's corresponding agents might not be activated until the task is activated.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Lavallee such that each domain of the plurality of domains comprises one or more agents specific to the respective domain, in order to provide flexibility in managing task interruptions by allowing or not allowing rules or restrictions to be implemented that may, for example, allow or not allow some currently active operations/tasks (such as bank transferring task) to be interrupted by other operations/tasks (such as a balance checking task), as evidenced by Lavallee (see Par. 0008).
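
For illustration only, the task activation described in the cited passages of Lavallee (Par. 0065 and Par. 0066) can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary TASK_AGENTS, the function activate, and the intent keys "transfer", "pay", and "balance" are hypothetical, and the task and agent labels paraphrase the reference numerals in the quoted banking example.

```python
# Hypothetical mapping from an intent slot value to the task it activates
# and the agents associated with that task (labels paraphrase Lavallee, Par. 0066).
TASK_AGENTS = {
    "transfer": ("transfer fund 306", ["from account 312", "to account 314", "amount 316"]),
    "pay": ("pay bill 308", ["from account 318", "bill 320", "amount 322"]),
    "balance": ("check balance 310", ["account 324"]),
}


def activate(intent):
    """Return (task, agents) activated for an intent slot value.

    When no activation condition is met, no task is active and none of the
    agents is activated, so (None, []) is returned.
    """
    task, agents = TASK_AGENTS.get(intent, (None, []))
    return task, list(agents)


task, agents = activate("transfer")
print(task, agents)
```

The sketch reflects only the point relied upon in the rejection: a task's agents are not activated until the task itself is activated by a matching slot value.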


With respect to claim 5, Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein each domain of the plurality of domains comprises a set of tasks specific to the respective domain.
Lavallee teaches wherein each domain of the plurality of domains comprises a set of tasks specific to the respective domain. (Par. 0065:” Examples of slot values may be domain, intent, etc. A slot value 208 may correspond to a task, and may be used to identify the action to be taken by the application.”, and Par. 0070:” For example, a dialogue application may be able to switch from a transfer task to a check balance task, but not from transfer task to a payment task. According to some aspects, the rules or restrictions may include which data might need to captured before switching to a particular task. This may include intent or domain information.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Lavallee such that each domain of the plurality of domains comprises a set of tasks specific to the respective domain, in order to provide flexibility in managing task interruptions by allowing or not allowing rules or restrictions to be implemented that may, for example, allow or not allow some currently active operations/tasks (such as bank transferring task) to be interrupted by other operations/tasks (such as a balance checking task), as evidenced by Lavallee (see Par. 0008).


With respect to claim 6, Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein the plurality of domains are associated with a plurality of agents, and wherein each agent is operable to execute one or more tasks specific to one or more of the domains.
Lavallee teaches wherein the plurality of domains are associated with a plurality of agents, and (Par. 0037:” For example, the conversational language processor 120 may include, among other things, an intent determination engine 130a, a constellation model 130b, one or more domain agents 130c, a context tracking engine 130d, a misrecognition engine 130e, and a voice search engine 130f.”).
wherein each agent is operable to execute one or more tasks specific to one or more of the domains. (Par. 0066:” Thus, according to some aspects, a task's corresponding agents might not be activated until the task is activated. For example, it system 300 determined that the intent of an input was to transfer money, and a corresponding condition was met, then task 306 may be activated, and one of agents 312, 314, and 316 may be activated and/or may begin to collect information associated with task 306 and/or input [e.g., by asking questions of a user to collect this information]”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Lavallee such that the plurality of domains are associated with a plurality of agents, and each agent is operable to execute one or more tasks specific to one or more of the domains, in order to provide flexibility in managing task interruptions by allowing or not allowing rules or restrictions to be implemented that may, for example, allow or not allow some currently active operations/tasks (such as bank transferring task) to be interrupted by other operations/tasks (such as a balance checking task), as evidenced by Lavallee (see Par. 0008).


With respect to claim 15, Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein the response comprises one or more of an action to be performed or one or more results generated from a query.
Lavallee teaches wherein the response comprises one or more of an action to be performed or one or more results generated from a query. (Par. 0083:” The responses based on the query produced by each set of models is scored, with the overall highest ranked result from all applied domains is ordinarily selected to be the correct result”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Lavallee such that the response comprises one or more of an action to be performed or one or more results generated from a query, in order to provide flexibility in managing task interruptions by allowing or not allowing rules or restrictions to be implemented that may, for example, allow or not allow some currently active operations/tasks (such as bank transferring task) to be interrupted by other operations/tasks (such as a balance checking task), as evidenced by Lavallee (see Par. 0008).



With respect to claim 16, Endo, Gilbert and Mathias do not teach wherein the instructions for presenting the response comprise a notification of the action to be performed or a list of one or more results.
Lavallee teaches wherein the instructions for presenting the response comprise a notification of the action to be performed or a list of one or more results. (Par. 0012: “After causing the electronic device to output a notification that the first choice restaurant is incapable of fulfilling the request [and, in some embodiments, confirming that the alternative restaurant is acceptable], the system may then send instructions to the alternative restaurant to process the order, fulfilling the request”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Lavallee such that the instructions for presenting the response comprise a notification of the action to be performed or a list of one or more results, in order to provide flexibility in managing task interruptions by allowing or not allowing rules or restrictions to be implemented that may, for example, allow or not allow some currently active operations/tasks (such as bank transferring task) to be interrupted by other operations/tasks (such as a balance checking task), as evidenced by Lavallee (see Par. 0008).


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, Mathias, and Lavallee as applied to claim 3, in further view of  Ni et al. (US20190012198A1)(hereinafter "Ni").

Regarding claim 4, Endo, Gilbert, Mathias, and Lavallee teach a method comprising, by one or more computing systems.
With respect to claim 4, Endo, Gilbert, Mathias, and Lavallee do not teach the method of Claim 3, wherein the agents comprise one or more of a first-party agent or a third-party agent.
Ni teaches wherein the agents comprise one or more of a first-party agent or a third-party agent. (Par. 0031:” The agent modules can be managed by separate remote servers and the automated assistant 104 can access the remote servers over a network 130. When the automated assistant 104 identifies the agent modules suitable for completing the multitask command, an agent interaction engine 112 can delegate tasks to each identified agent module. The automated assistant 104 can invoke each agent module to perform one or more tasks of the delegated tasks by transmitting a signal over the network 130 to each server device that hosts an agent module. For example, the automated assistant 104 can access a first server 118, a second server 120, and an Nth server 122 that each host a first agent module 124, a second agent module 126, and an Nth agent module 128, respectively.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert, Mathias, and Lavallee in view of Ni such that the agents comprise one or more of a first-party agent or a third-party agent, so that, in a multitask command scenario issued by the user to the automated assistant 104, the agent interaction engine 112 can delegate tasks to the agent modules in series or in parallel, as evidenced by Ni (see Par. 0032).

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claims 1 and 7 respectively, in further view of Secker-Walker et al. (US20200388282A1) (hereinafter "Secker-Walker").

Regarding claims 7 and 8, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
With respect to claim 7, Endo, Gilbert and Mathias do not teach the method of Claim 1, further comprising: identifying, for each combination of intents and slots, a domain of the plurality of domains, wherein selecting the one or more combinations of intents and slots comprises mapping the domain of each combination of intents and slots to the domain associated with one of the plurality of ASR engines.
Secker-Walker teaches identifying, for each combination of intents and slots, a domain of the plurality of domains (Par. 0036:” In some embodiments, the spoken language processing system 102 may include multiple single-domain ASR modules. Each ASR module can correspond to a single domain or intent. For example, one ASR module may correspond to a “music domain” [including intents such as a play music intent], one ASR module may correspond to a weather domain, one may correspond to a travel domain, and another may correspond to a shopping domain. Each module may include its own decoding graph with tags corresponding to the carrier phrase and content slot portions of intents for the specific domain. For example, in a single-domain ASR module decoding graph corresponding to the play music intent, the word “shop” may only be tagged as corresponding to the “song title” content slot.”).
wherein selecting the one or more combinations of intents and slots comprises mapping the domain of each combination of intents and slots to the domain associated with one of the plurality of ASR engines. (Par. 0036: “In some embodiments, the spoken language processing system 102 may include multiple single-domain ASR modules. Each ASR module can correspond to a single domain or intent. For example, one ASR module may correspond to a “music domain” [including intents such as a play music intent], one ASR module may correspond to a weather domain, one may correspond to a travel domain, and another may correspond to a shopping domain. Each module may include its own decoding graph with tags corresponding to the carrier phrase and content slot portions of intents for the specific domain. For example, in a single-domain ASR module decoding graph corresponding to the play music intent, the word “shop” may only be tagged as corresponding to the “song title” content slot.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Secker-Walker to identify, for each combination of intents and slots, a domain of the plurality of domains, wherein selecting the one or more combinations of intents and slots comprises mapping the domain of each combination of intents and slots to the domain associated with one of the plurality of ASR engines, in order to improve the efficiency and accuracy of the ASR module, as evidenced by Secker-Walker (see Par. 0027).



With respect to claim 8, Endo, Gilbert and Mathias do not teach the method of Claim 7, wherein the one or more combinations of intents and slots are selected when the domain of the respective combination of intents and slots matches the domain of one of the plurality of ASR engines.
Secker-Walker teaches wherein the one or more combinations of intents and slots are selected when the domain of the respective combination of intents and slots matches the domain of one of the plurality of ASR engines. (Par. 0055: “In another embodiment, a single-domain ASR module contains a single decoding graph corresponding to a single possible intent. The decoding graph may contain tags identifying, among other things, carrier phrase and content portions of the given intent. A results generator can produce a plurality of tokens for the content slots corresponding to a given intent while collapsing the carrier portions of the given intent. The results generator can produce results from numerous single-domain ASR modules that simultaneously [or substantially simultaneously] processed the same utterance, each for a different intent. The results generator can then deliver a top list of results, as an N-best list, a confusion network, etc., to the NLU module. Speech recognition results from the spoken language processing system 100 may include multiple transcriptions of the audio data. Each transcription may be associated with a different intent or may contain different values for the content slots of each utterance. The results may include a transcript or n-best list of transcripts for a portion of the audio data, a cumulative transcript or n-best list of transcripts, part of a lattice, part of a consensus network, any other kind of speech recognition result known to those of skill in the art, etc. The results may only include transcriptions associated with a specific intent.”).
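
For illustration only, the domain-matching selection recited in claims 7 and 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Secker-Walker or from the application: the class Candidate, the function select_matching, and the sample data are hypothetical, and the sketch simply keeps each intent/slot combination whose identified domain equals the domain of the ASR engine that produced the underlying transcription.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    engine_domain: str        # domain of the ASR engine that produced the transcription
    identified_domain: str    # domain identified for the intent/slot combination
    intent: str
    slots: dict = field(default_factory=dict)


def select_matching(candidates):
    """Keep the combinations whose identified domain matches the engine's domain."""
    return [c for c in candidates if c.identified_domain == c.engine_domain]


# Hypothetical candidates for one utterance processed by two single-domain engines.
candidates = [
    Candidate("music", "music", "play_music", {"song title": "shop"}),
    Candidate("shopping", "music", "play_music", {"song title": "shop"}),
]

selected = select_matching(candidates)
print([(c.engine_domain, c.intent) for c in selected])
```

Only the candidate produced by the music-domain engine survives in this sketch, since the combination derived from the shopping-domain engine was identified as belonging to a different domain than its engine.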



Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of Chao et al. (US20190318729A1)(hereinafter "Chao").

Regarding claim 10, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
Endo further teaches wherein generating the plurality of transcriptions comprises selecting the one or more [[transcriptions generated from the third-party ASR engine ]] to determine the combination of intents and slots associated with each respective transcription. (Col. 5, lines 52 – 64:”For example, one speech recognizer 202 may output its speech recognition result as “I want to turn on the television” with a confidence score of 70. In another embodiment, the speech recognizers 202, 204, 206 may output their results in the form of slot-value pairs. For example, one speech recognizer 204 may output its speech recognition result as slot-value pairs, such as <device=“television”: confidence score=80>and <action=“on”: confidence score=60>. In the context of the speech recognition results output from the speech recognizers used in the speech recognition system of the present invention, the term “speech text” includes speech recognition results in the form of slot-value pairs.”).
Endo, Gilbert and Mathias do not teach transcriptions generated from the third-party ASR engine, wherein one or more of the ASR engines of the plurality of ASR engines are third-party ASR engines associated with third-party systems that are separate from and external to the one or more computing systems, the method further comprising: sending, to one of the third-party ASR engines, the first audio input to generate one or more transcriptions; and receiving, from the one of the third-party ASR engines, the one or more transcriptions generated by the third-party ASR engine.
Chao teaches transcriptions generated from the third-party ASR engine (Par. 0055:” The server device 102 can include multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic representations [e.g., each corresponding to a different language]. The probability metrics [optionally dependent on current contextual parameter[s]] for the multiple different languages and/or measures for each of the multiple candidate semantic representations can be utilized to select only one of the candidate semantic representations as appropriate for generating and providing content that is responsive to the given spoken utterance.”).
wherein one or more of the ASR engines of the plurality of ASR engines are third-party ASR engines associated with third-party systems that are separate from and external to the one or more computing systems, (Par. 0055:” The server device 102 can include multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic representations [e.g., each corresponding to a different language]. “).
the method further comprising: sending, to one of the third-party ASR engines, the first audio input to generate one or more transcriptions; and (Par. 0056:” The speech recognition engine 134 can receive an audio recording of voice input, e.g., in the form of input audio signals or digital audio data, and uses one or more models to convert the received data into one or more text tokens.”).
receiving, from the one of the third-party ASR engines, the one or more transcriptions generated by the third-party ASR engine, (Par. 0057:” The text, and/or semantic representations of text, converted from the audio data can parsed by a text parser engine 110 and made available to the automated assistant 104 as textual data or semantic data that can be used to generate and/or identify command phrases from the user 130 and/or a third party application.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Chao to employ wherein one or more of the ASR engines of the plurality of ASR engines are third-party ASR engines associated with third-party systems that are separate from and external to the one or more computing systems, the method further comprising: sending, to one of the third-party ASR engines, the first audio input to generate one or more transcriptions; and receiving, from the one of the third-party ASR engines, the one or more transcriptions generated by the third-party ASR engine, in order to have an improved response to user input may be received, where reducing occasions on which an automatic assistant is unresponsive or  


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of Jacobson et al. (US20200202845A1)(hereinafter "Jacobson").

Regarding claim 11, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
Endo, Gilbert and Mathias do not teach the method of Claim 1, further comprising: identifying one or more features for each combination of intents and slots, wherein the one or more features are indicative of whether the combination of intents and slots have an attribute; and ranking the plurality of combinations based on their respective identified features, wherein selecting the one or more combinations of intents and slots comprises selecting the one or more combinations of intents and slots based on the ranking of the plurality of combinations.
Jacobson teaches identifying one or more features for each combination of intents and slots (Par. 0107:” According to the arbitration example 300, whether or not a mode analysis 310 is performed, the various SIM-FS pairings [Slot Intent Model - Fulfillment strategy] 302, 304, 306, 308, or a subset thereof if any of the SIM-FS pairings have already been eliminated by the mode analysis 310, are further arbitrated by the arbitrator 74 using a taste chooses a chosen one of the SIM-FS pairings by determining, using taste profile arbitration rules 314, which SIM-FS pairing most closely aligns with the taste profile 242 [FIG. 1]. In the arbitration 300, for example, the arbitrator 74 determines that the SIM-FS pairing 302 is most closely aligned with the taste profile 242 associated with the account because that taste profile indicates an affinity for the artist Jane Doe [or a type of artist with which Jane Doe is affiliated] that exceeds an affinity associated with any of the other SIM-FS pairings. As a result the chosen SIM-FS pairing 316 as chosen by the arbitrator 74 is the SIM-FS pairing 302, causing initiation 318 of a playback service that plays back the track Coffee and Donuts by Jane Doe.”, and Par. 0114:” It should be appreciated that the arbitrations 300 [FIG. 4] and 400 described above represent non-limiting examples of arbitration schemes that can be performed by the arbitrator 74. In some examples, rules from multiple arbitration schemes [such as the taste profile arbitration rules 314 and multi-level confidence [feature] score arbitration rules 416 from the first and second example arbitrations 300 and 400, respectively, described above] are combined and the arbitrator 74 chooses the chosen SIM-FS pairing by applying rules from the multiple schemes, e.g., by using both a taste profile associated with an account and a ranking technique using multi-level confidence scores.”).
wherein the one or more features are indicative of whether the combination of intents and slots have an attribute; and (Par. 0114:” It should be appreciated that the arbitrations 300 [FIG. 4] and 400 described above represent non-limiting examples of arbitration schemes that can be performed by the arbitrator 74. In some examples, rules from multiple arbitration schemes [such as the taste profile arbitration rules 314 and multi-level confidence [feature] score arbitration rules 416 from the first and second example arbitrations 300 and 400, respectively, described above] are combined and the arbitrator 74 chooses the chosen SIM-FS pairing by applying rules from the multiple schemes, e.g., by using both a taste profile associated with an account and a ranking technique using multi-level confidence [feature] scores.”).
ranking the plurality of combinations based on their respective identified features, (Par. 0109:” FIG. 5 schematically illustrates a second arbitration 400 carried out by the arbitrator 74 of FIG. 2 according to a further example arbitration scheme, using in part a multi-level set of a confidence [features] scores.”, and Par. 0110:” Referring to FIG. 5, according to a second example arbitration 400, the arbitrator 74 chooses the chosen SIM-FS [Slot Intent Model - Fulfillment strategy] pairing by ranking the plurality of selected fulfillment strategies output by the selector 70”).
wherein selecting the one or more combinations of intents and slots comprises selecting the one or more combinations of intents and slots based on the ranking of the plurality of combinations. (Par. 0110:” … the arbitrator 74 chooses the chosen SIM-FS [Slot Intent Model - Fulfillment strategy] pairing by ranking the plurality of selected fulfillment strategies output by the selector 70”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Jacobson to identify one or more features for each combination of intents and slots, wherein the one or more features are indicative of whether the combination of intents and slots have an attribute; and ranking the plurality of combinations based on their respective .


Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of Trim et al. (US20210011684A1)(hereinafter "Trim").

Regarding claim 12, Endo, Gilbert and Mathias teach a method comprising, by one or more computing systems.
Endo, Gilbert and Mathias do not teach the method of Claim 1, further comprising: identifying one or more same combinations of intents and slots from the plurality of combinations; and ranking the one or more same combinations of intents and slots based on the number of same combinations of intents and slots, wherein selecting the one or more combinations of intents and slots comprises using the ranking of the one or more same combinations of intents and slots.
Trim teaches identifying one or more same combinations of intents and slots from the plurality of combinations; and (Par. 0039:” In operation 320, the artifact component 130 ranks action pairs based on one or more utterance characteristics of the one or more vocalization profiles.”).
ranking the one or more same combinations of intents and slots based on the number of same combinations of intents and slots, (Par. 0039:”Usage of the common commands may represent a frequency of command usage, a time of command usage, a context of command usage [e.g., device context, event context, location context], and other suitable and relevant usage statistics.”, and Par. 0040:”The artifact component 130 may rank the set of action pairs based on the profile characteristics defined for the global profile [e.g., a first vocalization profile] or the user profile [e.g., a second vocalization profile]. In some embodiments, the artifact component 130 ranks the set of action pairs based on the global profile where profile characteristics for the user profile indicate the utterance is new or infrequently used by the user. The artifact component 130 may rank the set of action pairs based on the user profile as a default or where profile characteristics of the user profile represent a more accurate ranking.”).
wherein selecting the one or more combinations of intents and slots comprises using the ranking of the one or more same combinations of intents and slots. (Par. 0042:” In some embodiments, the artifact component 130 selects ranked action pairs based on a frequency of use of one or more of the utterance or the function. In such embodiments, the artifact component 130 may determine a number of ranked action pairs which are most frequently used for inclusion in the subset of ranked action pairs. The number of such subsets may be dynamically determined, determined based on user profile settings, or determined through any other suitable manner. Although described with respect to specified embodiments, it should be understood that the artifact component 130 may select any suitable number of ranked action pairs for inclusion in the subset of ranked action pairs.”, Par. 0043:”In operation 340, the artifact component 130 generates a set of visual artifacts corresponding to the subset of ranked action pairs. … In some embodiments, the visual artifacts may include text indicating the function to be performed. For example, where a first function of a first action pair opens a music application and a second function of a second action pair initiates playback of a recent playlist, the artifact component 130 may generate a first visual artifact with the icon of the music application and text reading ‘Open the App.’”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Trim to identify one or more same combinations of intents and slots from the plurality of combinations; and ranking the one or more same combinations of intents and slots based on the number of same combinations of intents and slots, wherein selecting the one or more combinations of intents and slots comprises using the ranking of the one or more same combinations of intents and slots, in order to enable dynamic generation of a contextualized augmented reality interface, as evidenced by Trim (See Par. 0013).
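For illustration only, the limitation of ranking same combinations of intents and slots by the number of times each occurs can be sketched as a simple frequency count. This is a minimal sketch under stated assumptions and does not reproduce Trim or the application: the rank_by_agreement function and the tuple encoding of a combination are hypothetical choices made for this example.

```python
from collections import Counter


def rank_by_agreement(combinations):
    """Group identical (intent, slots) combinations and rank them by count.

    Each combination must be hashable; here it is assumed to be a tuple of
    (intent, tuple of (slot_name, slot_value) pairs). Returns a list of
    (combination, count) pairs, highest count first.
    """
    return Counter(combinations).most_common()


# Assumed example: three transcriptions, two of which yield the same combination.
combos = [
    ("turn_on", (("device", "television"),)),
    ("turn_on", (("device", "television"),)),
    ("turn_on", (("device", "radio"),)),
]
ranked = rank_by_agreement(combos)
best_combination, best_count = ranked[0]
```

Selecting the top-ranked entry corresponds to choosing the combination on which the most transcriptions agree; ties keep first-seen order in this sketch.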


Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Endo, Gilbert, and Mathias, as applied to claim 1, in further view of Gelfenbeyn et al. (US20170300831A1)(hereinafter "Gelfenbeyn").


Endo, Gilbert and Mathias do not teach the method of Claim 1, wherein generating the response to the first audio input comprises: sending the selected combinations to a plurality of agents; receiving a plurality of responses from the plurality of agents corresponding to the selected combinations; ranking the plurality of responses received from the plurality of agents; and selecting the response from the plurality of responses based on the ranking of the plurality of responses.
Gelfenbeyn teaches sending the selected combinations to a plurality of agents; (Par. 0081:” … the requests component 124A can send the agent requests to agents 140A-D and/or additional agents [e.g., it can be sent to all of agents 140A-N] without regard to intent.”).
receiving a plurality of responses from the plurality of agents corresponding to the selected combinations; (Par. 0085:” As one example, the selection component 124E may select the particular agent 176 based on only the responses 174 [e.g., select the agent with the response most indicative of ability to respond] ...As another example, the selection component 124E may utilize the responses 174 and a ranking of one or more of the agents 140A-D that is provided by agent context component 124B.”).
ranking the plurality of responses received from the plurality of agents; and (Par. 0085:” For instance, the selection component 124E may initially select two of the agents 140A-D whose responses are most indicative of ability to respond, then select only one of those based on the selected one having a higher ranking than the non-selected one.”).
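selecting the response from the plurality of responses based on the ranking of the plurality of responses. (Par. 0085:” … select only one of those based on the selected one having a higher ranking than the non-selected one.”).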
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Endo, Gilbert and Mathias in view of Gelfenbeyn to send the selected combinations to a plurality of agents; receiving a plurality of responses from the plurality of agents corresponding to the selected combinations; ranking the plurality of responses received from the plurality of agents; and selecting the response from the plurality of responses based on the ranking of the plurality of responses, in order to provide an ability of a corresponding agent to generate appropriate responsive content, as evidenced by Gelfenbeyn (See Par. 0126).
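For illustration only, the four recited steps (sending the selected combinations to a plurality of agents, receiving their responses, ranking the responses, and selecting one based on the ranking) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Gelfenbeyn's implementation: the agents are modeled as plain callables, and the select_response function, the response dictionary fields, and the confidence values are hypothetical.

```python
def select_response(combinations, agents, score):
    """Send each combination to each agent, rank the responses, return the best."""
    # Steps 1 and 2: send every selected combination to every agent and collect replies.
    responses = [agent(combo) for agent in agents for combo in combinations]
    # Step 3: rank the responses, highest score first.
    ranked = sorted(responses, key=score, reverse=True)
    # Step 4: select the top-ranked response, or None if there were no responses.
    return ranked[0] if ranked else None


# Hypothetical agents; each reports how confident it is in handling the intent.
def music_agent(combo):
    intent, value = combo
    confidence = 0.9 if intent == "play_music" else 0.1
    return {"agent": "music", "text": f"Playing {value}", "confidence": confidence}


def weather_agent(combo):
    intent, value = combo
    confidence = 0.8 if intent == "get_weather" else 0.2
    return {"agent": "weather", "text": f"Forecast for {value}", "confidence": confidence}


combos = [("play_music", "Coffee and Donuts")]
best = select_response(combos, [music_agent, weather_agent],
                       score=lambda r: r["confidence"])
```

The scoring function is passed in as a parameter so that the ranking criterion (here, an assumed self-reported confidence) can be swapped without changing the selection logic.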




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Kennewick et al. (US20040193420) teach one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created, and creates, stores and uses .
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689.  The examiner can normally be reached on Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-



/D.A./Examiner, Art Unit 2656   

/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656