DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on September 7, 2021 has been entered. 
In view of the amendment to the specification, the amendment(s) to the title of the invention have been entered.
In view of the amendment(s) to the title of the invention, the objections to the specification have been withdrawn.
In view of the amendment to the claim(s), the amendment of claim(s) 1, 4-5, 10-11, 14-15 and 20 and the cancellation of claim(s) 3, 9, 13, and 19 have been acknowledged and entered.  
In view of the amendment to claim(s) 1, 4-5, 10-11, 14-15 and 20 and the cancellation of claim(s) 3, 9, 13, and 19, the rejections of claims 1-20 under 35 U.S.C. §102 and 35 U.S.C. §103 are withdrawn.
In light of the amended claims, new grounds of rejection under 35 U.S.C. §103 of claims 1-2, 4-8, 10-12, 14-18 and 20 are provided in the response below.

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102 and 35 U.S.C. §103, see pages 12-14 of the Response to Non-Final Office Action dated June 7, 2021, which was received on September 7, 2021 (hereinafter Response and Office Action
Prior to entry of this amendment, claims 1, 4, 5, 7, 11, 14, 15, 17 and 20 were rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Burke (U.S. Pat. App. Pub. No. 2006/0009980, hereinafter Burke); claims 2 and 12 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Tang (U.S. Pat. App. Pub. No. 2015/0161994, hereinafter Tang); claims 3 and 13 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Kemp (U.S. Pat. App. Pub. No. 2005/0114135, hereinafter Kemp); claims 6 and 16 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Lebeau (U.S. Pat. App. Pub. No. 2015/0310867, hereinafter Lebeau) and White (U.S. Pat. App. Pub. No. 2019/0066670, hereinafter White); claims 8 and 18 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Endo (U.S. Pat. No. 7,228,275, hereinafter Endo); and claims 9, 10 and 19 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Pasko (U.S. Pat. App. Pub. No. 2019/0311720, hereinafter Pasko).
With respect to the rejection(s) of claim(s) 1, 4, 5, 7, 11, 14, 15, 17 and 20 under 35 U.S.C. §102(a)(1) in light of Burke, applicant asserts that the cited references above fail to teach or suggest at least “based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user; and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.” Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 1, 4, 5, 7, 11, 14, 15, 17 and 20 under 35 U.S.C. §102 are withdrawn.
Applicant further argues that dependent claims 2, 4-8, 10, 12 and 14-18 are allowable for at least the same reasons as independent claims 1, 11, and 20. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2, 4-8, 10, 12 and 14-18 under 35 U.S.C. §102(a)(1) and 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of Burke, Tang, Kemp, Lebeau, White, Endo, Pasko, and newly cited reference Braho (U.S. Pat. App. Pub. No. 2014/0278391, hereinafter Braho).
Further, with respect to the rejection(s) of claim(s) 1, 4, 5, 7, 11, 14, 15, 17 and 20 under 35 U.S.C. §102(a)(1) in light of Burke, applicant asserts that Pasko and Burke are silent as to the above-described features. The Examiner respectfully disagrees.
As discussed in the newly presented rejection below, Burke further discloses “wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user… based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user; and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device.” (Burke, ¶ [0059]). Pasko, in turn, discloses wherein ASR further comprises “performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.” (Pasko, ¶ [0111]).
The Applicant has not provided any further statement; the Examiner therefore directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
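The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.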

Claims 1, 4-5, 7, 10-11, 14-15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Burke in view of Braho and Pasko.

Regarding claim 1, Burke discloses An electronic device comprising (“System 100 includes a mobile device 104”; Burke, ¶ [0035]): a memory storing one or more instructions (“computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104,” where “computer system 300 also includes a main memory 306... [storing] instructions.”; Burke, ¶¶ [0064]-[0065]), and at least one processor configured to execute the one or more instructions stored in the memory (“computer system 300” includes a “processor 304 [configured to] execut[e] sequences of instructions contained in main memory 306”; Burke, ¶ [0068]), wherein when executing the one or more instructions the at least one processor is configured to: (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]) determine that the electronic device is to perform automated speech recognition (ASR) (“Allocation of speech recognition tasks...” which is the speech recognition tasks being allocated to one or more of the “multiple speech recognizers,” “is determined based on complexity which is measured using one or more of several metrics,” where speech recognition tasks are automated speech recognition {“distributed recognition tasks have been allocated and recognized by the individual recognition engines” also referred to as “automated speech recognition (ASR) engines”}.; Burke, ¶¶ [0052], [0054]-[0055]) of a speech or an utterance of a user of the electronic device (The speech recognizer can be “mobile device 104,” thus an electronic device, and the user produces “utterances to be recognized”; Burke, ¶¶ [0051], [0052]), based on ambient noise information of the electronic device (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device (“A noise detector is used on mobile device 104, which measures the noise level of the speech signal,” thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces “utterances to be recognized” to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶ [0052]), perform the ASR of the speech or the utterance of the user of the electronic device (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…” where “lightweight recognition tasks {not complex} can be performed on mobile device 104 while heavyweight recognition tasks {complex} are allocated to server 108,” where recognition tasks are automated speech recognition; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104.” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]), and output a response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result”; Burke, ¶ [0054], FIG. 2), based on a result of performing the ASR of the speech or the utterance of the user of the electronic device (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as “return results to user”; Burke, ¶ [0052], FIG. 2), wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user… (Burke describes that “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device.
Braho teaches “analysis of sounds in detecting and/or recognizing speech for use with or in voice-driven systems.” (Braho, ¶ [0001]). Regarding claim 1, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (Braho discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information.; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho (Braho, ¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
Pasko teaches systems and methods for time-based local device arbitration. (Pasko, ¶ [0014]). Regarding claim 1, Pasko teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“The local speech processing component 740 may also include a natural language understanding (NLU) component 744 that performs NLU on the generated ASR text data to determine an intent so that directives may be determined based on the intent.”; Pasko, ¶ [0111]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of Pasko to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. The device arbitration described in Pasko may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by Pasko (Pasko, ¶¶ [0029]-[0030]).
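For illustration only, and forming no part of the rejection or of the cited disclosures, the confidence-based allocation recited in claim 1 may be sketched as follows. The sketch assumes a confidence score in the range 0 to 1; the names LocalResult, route_by_confidence, local_asr, server_asr, and local_nlu, as well as the default threshold of 0.6, are hypothetical and appear in neither the application nor Burke, Braho, or Pasko.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LocalResult:
    """Output of the on-device recognizer (hypothetical structure)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def route_by_confidence(
    audio: bytes,
    local_asr: Callable[[bytes], LocalResult],
    server_asr: Callable[[bytes], str],
    local_nlu: Callable[[str], str],
    first_preset_value: float = 0.6,  # illustrative threshold only
) -> dict:
    """Run on-device ASR first, then allocate based on its confidence."""
    local = local_asr(audio)
    intent: Optional[str] = None
    if local.confidence < first_preset_value:
        # Accuracy below the first preset value: send the audio to the server.
        # What happens after server-side ASR is not modeled in this sketch.
        return {"asr_by": "server", "text": server_asr(audio), "intent": intent}
    # Accuracy at or above the first preset value: stay on the device
    # and continue with on-device NLU.
    intent = local_nlu(local.text)
    return {"asr_by": "device", "text": local.text, "intent": intent}
```

The comparison uses a strict less-than so that a score exactly equal to the first preset value is handled on the device, mirroring the "greater than or equal to" branch of the claim language.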

Regarding claim 4, Burke further discloses wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]) determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104.” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]) based on the ambient noise information (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, background noise {ambient noise information} of the mobile device 104; Burke, ¶ [0052]) indicating that an ambient noise level of the electronic device is less than a second preset value (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex”; Burke, ¶ [0052]).

Regarding claim 5, Burke further discloses further comprising a communicator configured to transmit to and receive data from an external device (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 318 provides two-way data communication {configured to transmit to and receive data from an external device}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the external device (“Computer system 300” controls the communication interface 318 {communicator} to “send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318,” such as “submitting recognition input {the audio signal of the speech or utterance of the user} to multiple recognition systems... [such as] server 108 {external device}.” Also shown in FIG. 2 in the interactions, shown by way of arrows, between mobile device 104 and servers 108A and 108B.; Burke, ¶¶ [0072], [0074], FIG. 2), and receive, from the external device, an ASR result of the speech or the utterance of the user of the electronic device (“the mobile device 104 allocates the recognition tasks, using a task allocation mechanism according to one of the above-described approaches, to multiple recognizers based on one or more of the aforementioned allocation methods” where the “recognizer performs speech recognition processing based on the same speech input received {the speech or the utterance of the user of the electronic device} and provides the results to {thus, receiving the ASR result from the external device} the mobile device 104”; Burke, ¶ [0061]), based on the ambient noise information indicating that the ambient noise level of the electronic device is greater than or equal to the second preset value (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity,” and “if the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex.” Thus, noise information of the speech signal (ambient noise information) indicates that the noise level (ambient noise level) exceeds a preset threshold level (greater than or equal to the second preset value); Burke, ¶¶ [0041], [0052]).
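For illustration only, the noise-based allocation addressed in claims 4 and 5 may be sketched as a single threshold comparison. The function name allocate_by_noise and its parameters are hypothetical; how the noise level is measured, and in what units, is outside the sketch and is assumed to be supplied by a noise detector.

```python
def allocate_by_noise(noise_level: float, second_preset_value: float) -> str:
    """Pick the recognizer from a measured ambient noise level.

    Both arguments are assumed to share the same (unspecified) unit.
    """
    if noise_level < second_preset_value:
        # Quiet enough: the task is treated as lightweight and stays on the device.
        return "device"
    # At or above the threshold: the audio is sent to the external device.
    return "server"
```

A level exactly equal to the second preset value is routed to the server, matching the "greater than or equal to" wording of claim 5.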

Regarding claim 7, Burke further discloses further comprising a communicator configured to transmit to and receive data from an external device (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 318 provides two-way data communication {configured to transmit to and receive data from an external device}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): obtain a first ASR result by performing the ASR of the speech or the utterance of the user of the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing. According to the distributed embodiment, each recognizer performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the mobile device 104 (electronic device) performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a first ASR result}; Burke, ¶ [0061], FIG. 2), control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the external device (“Computer system 300” controls the communication interface 318 {communicator} to “send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318,” such as “submitting recognition input {the audio signal of the speech or utterance of the user} to multiple recognition systems... [such as] server 108 {external device}.” Also shown in FIG. 2 in the interactions, shown by way of arrows, between mobile device 104 and servers 108A and 108B.; Burke, ¶¶ [0072], [0074], FIG. 2), receive a second ASR result from the external device (In the distributed embodiment the “back-end telecom server 108A... [also] receive the same speech for speech recognition processing... [where the back-end telecom server 108A] performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the back-end telecom server 108A {external device} performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a second ASR result}; Burke, ¶ [0061], FIG. 2), select an ASR result from among the first ASR result and the second ASR result (“After receiving each recognizer's results {the first ASR result and the second ASR result}, mobile device 104 combines the results based on a plural voting technique ... [where] Each word in the recognized result from each recognizer is compared and if at least two out of three recognizer results for a given word match, then that word is selected as the recognized word. If none of the recognizer results match, then the confidence score and weighting for each word recognized by a recognizer are combined to arrive at a comparison value.”; Burke, ¶ [0063]), and output the response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result” where the “individual results” of the “multiple speech recognizers” and the […]; Burke, ¶ [0054], FIG. 2), based on the ASR result (The single recognized result is a combination of the individual results, as described above, based on the ASR performed at the mobile device 104 and the server 108A.; Burke, ¶ [0063]).
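For illustration only, the plural voting technique quoted from Burke, ¶ [0063] may be sketched as follows. The sketch rests on assumptions not stated in the quoted passage: the recognizer outputs are taken to be word-aligned and of equal length, the "comparison value" is taken to be the product of a word's confidence score and its recognizer's weight, and the word with the largest comparison value is selected. The name combine_by_voting and the data layout are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# One recognized word with its confidence score (hypothetical layout).
Word = Tuple[str, float]


def combine_by_voting(results: List[List[Word]], weights: List[float]) -> List[str]:
    """Combine word-aligned recognizer outputs into one word sequence."""
    combined: List[str] = []
    # zip(*results) walks the outputs position by position (assumes alignment).
    for candidates in zip(*results):
        counts = Counter(word for word, _ in candidates)
        word, votes = counts.most_common(1)[0]
        if votes >= 2:
            # At least two recognizers agree on this word.
            combined.append(word)
            continue
        # No agreement: fall back to confidence times weight (an assumption).
        best = max(
            range(len(candidates)),
            key=lambda i: candidates[i][1] * weights[i],
        )
        combined.append(candidates[best][0])
    return combined
```

For example, with three recognizers returning "call mom", "call tom", and "tall bob" at equal weights, the first position is decided by the two matching votes for "call", and the second falls back to the highest weighted confidence.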

Regarding claim 10, the rejection of claim 1 is incorporated. Burke and Braho disclose all of the elements of the current invention as stated above. However, Burke and Braho fail(s) to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device.
The relevance of Pasko is described above with relation to claim 1. Regarding claim 10, Pasko discloses wherein the at least one processor is further configured to execute the one or more instructions to (“the device 102 includes one or more processors 702” and “the processor(s) 702” which “execute instructions stored on the memory 704” to implement “the functionally described herein”; Pasko, ¶¶ [0103]-[0104]) determine the response by performing the at least one of natural language understanding (NLU) or dialogue management (DM) (“local speech processing component 740 may also include a natural language understanding (NLU) component 744 that performs NLU on the generated ASR text data to determine an intent so that directives may be determined based on the intent” and “The local speech processing component 740 may also provide a dialog management function to engage in speech dialogue with the user 112 to determine (e.g., clarify) user intents by asking the user 112 for information using speech prompts.”; Pasko, ¶ [0111]) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device (“the NLU component 744 takes textual input (e.g., from the ASR component 742) and attempts to make a semantic interpretation of the ASR text data. That is, the NLU component 744 determines the meaning behind the ASR […]”; Pasko, ¶ [0111]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of Pasko to include wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device. The device arbitration described in Pasko may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by Pasko. (Pasko, ¶¶ [0029]-[0030]).

Regarding claim 11, Burke further discloses An operation method of an electronic device, the operation method comprising (the method disclosed with reference to the “System 100 includ[ing] a mobile device 104”; Burke, ¶ [0035]): determining that the electronic device is to perform automated speech recognition (ASR) (“Allocation of speech recognition tasks...” which is the speech recognition tasks being allocated to one or more of the “multiple speech recognizers,” “is determined based on complexity which is measured using one or more of several metrics,” where speech recognition tasks are automated speech recognition {“distributed recognition tasks have been allocated and recognized by the individual recognition engines” also referred to as “automated speech recognition (ASR) engines”}.; Burke, ¶¶ [0052], [0054]-[0055]) of a speech or an utterance of a user of the electronic device (The speech recognizer can be “mobile device 104,” thus an electronic device, and the user produces “utterances to be recognized”; Burke, ¶¶ [0051], [0052]), based on ambient noise information of the electronic device (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device (“A noise detector is used on mobile device 104, which measures the noise level of the speech signal,” thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces “utterances to be recognized” to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶ [0052]), performing the ASR of the speech or the utterance of the user of the electronic device (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…” where “lightweight recognition tasks {not complex} can be performed on mobile device 104 while heavyweight recognition tasks {complex} are allocated to server 108,” where recognition tasks are automated speech recognition; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104.” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]), and outputting a response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result”; Burke, ¶ [0054], FIG. 2), based on a result of performing the ASR of the speech or the utterance of the user of the electronic device (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as “return results to user”; Burke, ¶ [0052], FIG. 2), wherein the at least one processor is further configured to: estimating an accuracy of the ASR of the speech or the utterance of the user… (Burke describes that “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmitting the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, performing… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 11, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (Discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Pasko is described above with relation to claim 1. Regarding claim 11, Pasko teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“The local speech processing component 740 may also include a natural language understanding (NLU) component 744 that performs NLU on the generated ASR text data to determine an intent so that directives may be determined based on the intent.”; Pasko, ¶ [0111]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, to incorporate the teachings of Pasko to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. The device arbitration described in Pasko may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by Pasko. (Pasko, ¶¶ [0029]-[0030]).

Regarding claim 14, the rejection of claim 11 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 17, the rejection of claim 11 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Regarding claim 20, Burke discloses An automated speech recognition (ASR) system comprising (“System 100 includes a mobile device 104” and a “server 108A”; Burke, ¶ [0035]): an electronic device configured to receive a speech or an utterance of a user of the electronic device (“computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104,” where “audio input [is] received by the mobile device,” wherein audio input is also referred to as a speech input and where speech input is “user-provided speech input.”; Burke, ¶¶ [0064], [0058], [0070]), and a server configured to perform ASR of the speech or the utterance of the user of the electronic device based on an audio signal of the speech or the utterance of the user of the electronic device received from the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing.” Thus, the server 108A performs speech recognition processing (ASR) of the same speech (speech or the utterance of the user of the electronic device); Burke, ¶ [0061]), wherein the electronic device comprises at least one processor configured to: (The method is performed “by computer system 300”, such as the mobile device 104 (electronic device), “in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): estimate an accuracy of the ASR of the speech or the utterance of the user… (describes “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. 
The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 20, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (Discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Pasko is described above with relation to claim 1. Regarding claim 20, Pasko teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“The local speech processing component 740 may also include a natural language understanding (NLU) component 744 that performs NLU on the generated ASR text data to determine an intent so that directives may be determined based on the intent.”; Pasko, ¶ [0111]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, to incorporate the teachings of Pasko to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. The device arbitration described in Pasko may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by Pasko. (Pasko, ¶¶ [0029]-[0030]).
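For illustration only, the allocation logic mapped above (the confidence-threshold test of Burke, ¶ [0059], combined with a noise-based adjustment of the confidence score as in Braho, ¶ [0024]) may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from any cited reference: the names `LocalResult`, `adjust_confidence`, and `allocate`, the linear noise penalty, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalResult:
    """Output of the on-device recognizer (hypothetical structure)."""
    text: str
    confidence: float  # raw confidence score in [0, 1]


def adjust_confidence(raw: float, noise_db: float,
                      quiet_db: float = 40.0,
                      penalty_per_db: float = 0.01) -> float:
    """Lower the raw score as ambient noise rises above quiet_db.

    The linear penalty is an assumption made for illustration; the cited
    references do not specify a particular adjustment formula here.
    """
    excess = max(0.0, noise_db - quiet_db)
    return max(0.0, min(1.0, raw - penalty_per_db * excess))


def allocate(result: LocalResult, noise_db: float,
             threshold: float = 0.6) -> str:
    """Return where recognition should be performed.

    Below the preset threshold the task goes to the server; otherwise
    (greater than or equal to the threshold) it stays on the device.
    """
    score = adjust_confidence(result.confidence, noise_db)
    return "server" if score < threshold else "device"
```

Under these assumed values, a raw score of 0.9 stays on the device in a quiet setting (40 dB) and is sent to the server at 80 dB, because the adjusted score falls below the threshold.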

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, and Pasko as applied to claims 1 and 11 above, and further in view of Tang.

Regarding claim 2, the rejection of claim 1 is incorporated. Burke, Braho, and Pasko disclose all of the elements of the current invention as stated above. However, Burke, Braho, and Pasko fail to expressly recite wherein the ASR is processed using an artificial intelligence (AI) algorithm.
Tang teaches systems and methods for adaptation of neural networks to different speakers. (Tang, ¶ [0012]). Regarding claim 2, Tang discloses wherein the ASR is processed using an artificial intelligence (AI) algorithm. (“FIG. 1 is a diagram illustrating a deep belief neural network (DNN) 100 for automatic speech recognition (ASR)” where “the parameters of the DNN 100 are adjusted based on both speech data and speaker information”; Tang, ¶¶ [0013], [0016]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the time-based local device arbitration of Pasko, to incorporate the teachings of Tang to include wherein the ASR is processed using an artificial intelligence (AI) algorithm. “By training the DNN 100 with both the speaker representation data 108 and the speech data 104, the recognition …” as recognized by Tang. (Tang, ¶ [0016]).

Regarding claim 12, the rejection of claim 11 is incorporated. Claim 12 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, and Pasko as applied to claims 1 and 11 above, and further in view of Lebeau and White.

Regarding claim 6, the rejection of claim 1 is incorporated. Burke, Braho, and Pasko disclose all of the elements of the current invention as stated above. However, Burke, Braho, and Pasko fail to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to: extract a keyword included in the speech or the utterance of the user of the electronic device based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range; and determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the keyword being a preset keyword and based on the ambient noise information of the electronic device.
Lebeau teaches methods, systems, and techniques for automatically monitoring for voice input using current context of the computing device or user interaction. (Lebeau, ¶ [0004]). Regarding claim 6, Lebeau discloses wherein the at least one processor is further configured to execute the one or more instructions to extract a keyword (“the mobile computing device 202 is configured to automatically determine when to start and when to stop monitoring for voice input based on a current context associated with the mobile computing device,” where “when at least the microphone 206 a and the speech analysis subsystem 212 are activated during an audio monitoring mode of operation and the speech analysis subsystem 212 detects voice input from a …”; Lebeau, ¶¶ [0057], [0067]-[0068]) included in the speech or the utterance of the user of the electronic device (“detects voice input from a stream of audio data provided by the microphone 206” which can include “a user request” (speech or utterance of the user) where the microphone 206 is part of the mobile computing device 202 (the electronic device); Lebeau, ¶ [0067], FIG. 2) based on the ambient noise information (Discloses systems for “monitoring for voice input using a mobile computing device 172 a-d” where monitoring can be stopped based on the current context including “high level of ambient noise,” thus based on ambient noise information; Lebeau, ¶¶ [0050], [0054]) indicating that an ambient noise level of the electronic device has a value in a preset range (using the “high level of ambient noise,” “the mobile device 172 d can generally infer that it is located in a public area…[and] determine to not monitor for voice input.” High level indicates that the ambient noise level has a value above a preset range (where the preset range would be a range of ambient noise expected from a non-public area). Thus the system determines whether to monitor for input or not, based on the ambient noise level having a value in a preset range; Lebeau, ¶ [0054]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the time-based local device arbitration of Pasko, to incorporate the teachings of Lebeau to include wherein the at least one processor is further configured to execute the one or more instructions to: extract a keyword included in the speech or the utterance of the user of the electronic device based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range. The use of current context in recognition of voice input allows for less intrusive voice monitoring without specific adherence to “the formalities associated with prompting a mobile computing device to use voice input,” as recognized by Lebeau. (Lebeau, ¶ [0021]). However, Burke, Braho, Pasko, and Lebeau fail to expressly recite determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the keyword being a preset keyword and based on the ambient noise information of the electronic device.
White teaches systems and methods for using context in device arbitration. (White, ¶ [0010]). Regarding claim 6, White discloses determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Following detection of a wakeword, the voice-enabled device 108 sends an audio signal 114 corresponding to the speech utterance 106, to a computing device of the speech processing system 110 that includes the ASR component 126.”; White, ¶ [0091]) based on the keyword being a preset keyword (“voice-enabled devices 108 may receive or capture sound corresponding to the speech utterance 106 of the user via one or more microphones. In certain implementations, the speech utterance 106 may include or be preceded by a wakeword” where “the wakeword... may be a predefined word, phrase, or other sound,” and where when the wakeword is detected “the voice-enabled devices 108 may begin streaming the audio signal, and other data, to the speech processing system 110.”; White, ¶ [0034]) and based on the ambient noise information of the electronic device. (“At 306, the voice-enabled device may determine voice activity using voice activity detection (VAD) to detect the presence of voice in the directional audio signals... [where] the voice activity may be a ratio of the signal strength of the speech utterance 106 in an audio signal 114 with the ambient noise in the audio signal 114.” and then “the voice-enabled device 108 may detect a wakeword by performing wakeword detection on the directional audio signal within which voice activity has been detected” The decision to perform ASR is based on wakeword detection (the preset keyword) [0091], the wakeword detection is based on voice activity detection, and voice activity detection is based on ambient noise in the audio signal 114 {ambient noise information}; White, ¶¶ [0077], [0079], [0091]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the time-based local device arbitration of Pasko, and by the systems and methods for automatic context monitoring for voice input of Lebeau, to incorporate the teachings of White to include determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the keyword being a preset keyword and based on the ambient noise information of the electronic device. The use of context in device arbitration allows for the selection of the “best suited voice-enabled device… to respond to the speech utterance,” as recognized by White. (White, ¶ [0027]).

Regarding claim 16, the rejection of claim 11 is incorporated. Claim 16 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, and Pasko as applied to claims 7 and 17 above, and further in view of Endo.

Regarding claim 8, the rejection of claim 7 is incorporated. Burke, Braho, and Pasko disclose all of the elements of the current invention as stated above. However, Burke, Braho, and Pasko fail to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to select the ASR result from among the first ASR result and the second ASR result, based on the ambient noise information of the electronic device.
Endo teaches a speech recognition system having multiple speech recognizers. (Endo, Col. 1, lines 16-17). Regarding claim 8, Endo discloses wherein the at least one processor is further configured to execute the one or more instructions to (Discloses a “speech recognition system 104 [including a] decision module 208... coupled to the speech recognizers 202, 204, 206…the decision module 208 includes ...a processor 304.”; Endo, Col. 7, lines 6-12) select the ASR result from among the first ASR result and the second ASR result (“Each …”; Endo, Col. 5, lines 44-51, Col. 6, lines 50-52), based on the ambient noise information of the electronic device. (In some examples, “the decision module 208 adjusts the raw confidence scores to generate adjusted confidence scores associated with the recognized speech text, based upon ...the external data 109… [including] level of background noise.” Thus, the level of background noise {ambient noise information} is used to adjust the confidence score, where the confidence score is used to select the ASR result, as produced by speech recognizers 202, 204, and 206; Endo, Col. 6, lines 35-43).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the time-based local device arbitration of Pasko, to incorporate the teachings of Endo to include wherein the at least one processor is further configured to execute the one or more instructions to select the ASR result from among the first ASR result and the second ASR result, based on the ambient noise information of the electronic device. “Because the output speech text is selected from the outputs from a plurality of speech recognizers in the speech recognition system of the present invention, the speech recognition system can take advantage of the strengths, while complementing the weaknesses, of each speech recognizer,” as recognized by Endo. (Endo, Col. 3, lines 3-8).
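For illustration only, the selection step mapped to Endo above (adjusting each recognizer's raw confidence score using the level of background noise, then selecting among the recognizer outputs) may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Endo: the names `Hypothesis`, `NOISE_SENSITIVITY`, `adjusted`, and `select_result`, the per-recognizer sensitivity values, and the linear adjustment are all hypothetical.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    """One recognizer's output (hypothetical structure)."""
    recognizer: str        # e.g. "device" or "server"
    text: str              # recognized speech text
    raw_confidence: float  # raw confidence score in [0, 1]


# Assumed penalty per dB of noise above the quiet level, per recognizer.
# These values are illustrative and do not come from any cited reference.
NOISE_SENSITIVITY = {"device": 0.010, "server": 0.004}


def adjusted(h: Hypothesis, noise_db: float, quiet_db: float = 40.0) -> float:
    """Raw confidence reduced in proportion to noise above quiet_db."""
    excess = max(0.0, noise_db - quiet_db)
    return h.raw_confidence - NOISE_SENSITIVITY[h.recognizer] * excess


def select_result(hyps: List[Hypothesis], noise_db: float) -> Hypothesis:
    """Pick the hypothesis with the highest noise-adjusted confidence."""
    return max(hyps, key=lambda h: adjusted(h, noise_db))
```

With these assumed values, a device result scored 0.85 is preferred over a server result scored 0.80 at 40 dB, while at 70 dB the server result is selected because the device score is penalized more heavily.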

Regarding claim 18, the rejection of claim 17 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657