Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 7, 2022 has been entered.
 
Response to Amendments 
Applicant’s amendment filed on January 7, 2022 has been entered. 
The amendment of claims 1, 6, 11, 16, and 20 and the cancellation of claim 2 have been acknowledged and entered.
In view of the amendment to claim(s) 1, 6, 11, 16, and 20 and the cancellation of claim(s) 2, the rejection of claims 1, 2, 4-8, 10-12, 14-18, and 20 under 35 U.S.C. §103 is withdrawn.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102 and 35 U.S.C. §103 (see pages 12-14 of the Response to the Non-Final Office Action dated June 7, 2021, hereinafter “Response” and “Office Action,” respectively) have been fully considered.
Prior to entry of this amendment, claims 1, 4-5, 7, 10-11, 14-15, 17 and 20 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke (U.S. Pat. App. Pub. No. 2006/0009980, hereinafter Burke) in view of Braho (U.S. Pat. App. Pub. No. 2014/0278391, hereinafter Braho) and Pasko (U.S. Pat. App. Pub. No. 2019/0311720, hereinafter Pasko); claims 2 and 12 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Braho, Pasko, and Tang (U.S. Pat. App. Pub. No. 2015/0161994, hereinafter Tang); claims 6 and 16 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Braho, Pasko, Lebeau (U.S. Pat. App. Pub. No. 2015/0310867, hereinafter Lebeau) and White (U.S. Pat. App. Pub. No. 2019/0066670, hereinafter White); claims 8 and 18 were rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Braho, Pasko, and Endo (U.S. Pat. No. 7,228,275, hereinafter Endo).
As Applicant has amended independent claim(s) 1, 11 and 20 to incorporate the limitations of claim 2, the rejections of claim(s) 1, 11, and 20 have been amended to incorporate the rejection of the respective limitations of claim 2, as appropriate.
With respect to the rejection(s) of claim(s) 1, 11, and 20 under 35 U.S.C. §103, applicant asserts that the cited references above fail to teach or suggest at least “wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user; obtain one or more keywords and reliability data corresponding to each of the one or more keywords by inputting the extracted feature to the one or more neural networks; and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal.” Applicant’s arguments regarding the current combination of references in light of the amended claims are persuasive. As such, the rejections of claims 1, 11, and 20 under 35 U.S.C. §103 are withdrawn.
Regarding claims 1, 11, and 20, Applicant asserts that the claims do not recite “wherein the confidence score is based on the ambient noise information of the electronic device,” and, thus, Braho should be withdrawn. However, this argument is not persuasive. Braho is cited, not for an express recitation of applicant’s claim language, but to modify the disclosure found in Burke. Thus, the recited portions of Braho are read in combination with the disclosure of Burke to indicate what was known in the art prior to the effective filing date of the present application. 
As indicated in the Office Action, Burke discloses wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user based on a confidence score. (Office Action, pg. 7). However, Burke fails to expressly recite that said confidence score is based on ambient noise information of the electronic device. Braho cures that deficiency by the use of “non-transient background noise” to modify a “confidence value or score.” (Office Action, pg. 7 citing Braho, ¶ [0024]). Thus, Burke and Braho, read in combination, render the limitation of “wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user, based on the ambient noise information of the electronic device” obvious. Further explanation is provided below to clarify the mapping of the claim elements. Therefore, the rejection of the above element over the combinations of Burke and Braho is maintained.
Further regarding claims 1, 11, and 20, applicant mischaracterizes examiner’s indication of the deficiencies of Burke in view of Braho and Pasko. Specifically, applicant asserts that “In the rejection of claim 2, the Examiner acknowledges that the combination of Burke, Braho and Pasko fails to teach or suggest ‘wherein the ASR is processed using an artificial intelligence (AI) algorithm,’” which applicant appears to allude to in asserting that the references are “silent” with regard to the applicant’s newly recited features. (Response, pg. 12, referencing Office Action, p. 21). However, examiner made no such concession regarding the limitations of Burke, Braho and Pasko. As stated in the Office Action at page 21, Burke, Braho, and Pasko “fail to expressly recite wherein the ASR is processed using an artificial intelligence (AI) algorithm.” This failure to
Further regarding claims 1, 11, and 20, examiner does not agree that Burke, Braho, and Pasko are silent with regard to the above presented limitation. Though indicated as part of the background, Burke at paragraph [0007] explains that “Speech recognition is a complex process requiring analysis of speech signals, extraction of features, searching statistical models (such as Gaussian Mixture Models, Neural Networks, etc.), and combinations of word and language statistics.” Further, Pasko indicates the use of a neural network in keyword detection and voice activity detection. (Pasko, ¶ [0108]). However, examiner agrees that Burke, Braho, and Pasko fail to expressly recite all limitations of claims 1, 11, and 20 as amended.
Applicant further argues that dependent claims 4-8, 10, 12 and 14-18 are allowable for at least the same reasons as independent claims 1, 11, and 20. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 4-8, 10, 12 and 14-18 under 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of Burke, Braho, Tang, Kemp, Lebeau, White, and Endo, presented in detail with relation to the claim elements below.
The Applicant has not provided any further statement and, therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 7, 10-12, 14-15, 17, and 20 is/are rejected under 35 U.S.C. 103 as being obvious in light of Burke in view of Braho and White.

Regarding claim 1, Burke discloses An electronic device comprising (“System 100 includes a mobile device 104”; Burke, ¶ [0035]): a memory storing one or more instructions (“computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104,” where “computer system 300 also includes a main memory 306... [storing] instructions.”; Burke, ¶¶ [0064]-[0065]), and at least one processor configured to execute the one or more instructions stored in the memory (“computer system 300” includes a “processor 304 [configured to] execut[e] sequences of instructions contained in main memory 306”; Burke, ¶ [0068]), wherein when executing the one or more instructions the at least one processor is configured to: (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]) determine that the electronic device is to perform automated speech recognition (ASR) (“Allocation of speech recognition tasks...” which is the speech recognition tasks being allocated to one or more of the “multiple speech recognizers,” “is determined based on complexity which is measured using one or more of several metrics,” where speech recognition tasks are automated speech recognition {“distributed Burke, ¶¶ [0052], [0054]-[0055]) of a speech or an utterance of a user of the electronic device (The speech recognizer can be “mobile device 104,” thus an electronic device, and the user produces “utterances to be recognized” to create a speech signal; Burke, ¶¶ [0051], [0052]), based on ambient noise information of the electronic device (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device (“A noise detector is used on mobile device 104, which measures the noise level of the speech signal,” thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces “utterances to be recognized” to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶ [0052]), perform the ASR of the speech or the utterance of the user of the electronic device (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…” where “lightweight recognition tasks {not complex} can be performed on mobile device 104 while heavyweight recognition tasks {complex} are allocated to server 108,” where recognition tasks are automated speech recognition; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104.” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of Burke, ¶¶ [0041], [0052]), and output a response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result” where the “individual results” of the “multiple speech recognizers,” which includes the mobile device 104, and the “single recognized result” are output.; Burke, ¶ [0054], FIG. 2), based on a result of performing the ASR of the speech or the utterance of the user of the electronic device (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as “return results to user”; Burke, ¶ [0052], FIG. 2) wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user… [based on a confidence score] (describes “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the Burke, ¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
Braho teaches “analysis of sounds in detecting and/or recognizing speech for use with or in voice-driven systems.” (Braho, ¶ [0001]). Regarding claim 1, Braho teaches wherein the confidence score of a hypothesis in a speech recognition system is based on the ambient noise information of the electronic device (As indicated above, Burke discloses the use of a confidence score in determining the accuracy of ASR. However, Burke does not expressly recite what is included in producing said confidence score. Braho discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information.; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract a feature from 
White teaches systems and methods for using context in device arbitration. (White, ¶ [0010]). Regarding claim 1, White teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. ("an orchestration component of the speech processing system may call an automated speech recognition (ASR) component of the speech processing system to process one or more of the audio signals received from the voice-enabled devices using automated speech recognition to generate text data representing the speech utterance... [and] a natural language understanding (NLU) component to process the text data representing the speech utterance" from the ASR component "using natural language understanding to determine an intent" and "the computer-readable media 402 may further store a dialog management component 408 that is responsible for conducting speech dialogs with the user 104 in response to meanings or intents of user speech determined by the NLU component 128"; White, ¶¶ [0017], [0094])…wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks ("The device or devices performing the ASR processing may include an acoustic front end (AFE) 416 and a speech recognition engine 418... [where] a number of approaches may be used by the AFE 416 to process the audio data, such as...neural network feature vector techniques."; White, ¶ [0099]) and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user ("the AFE 416 determines a number of values, called features, representing the qualities of the audio data, along with a set of those values, called a feature vector, representing the features/qualities of the audio data within the frame" where "a keyword spotter may use simplified ASR (automatic speech recognition) techniques...[and] a portion of an White, ¶¶ [0099], [0081]); obtain one or more keywords and reliability data corresponding to each of the one or more keywords ("wakeword detection may be implemented using keyword spotting technology {obtain one or more keywords...}" and "wakeword detection may also use a support vector machine (SVM) classifier that receives the one or more feature scores produced by the HMM recognizer. The SVM classifier produces a confidence score {reliability data} indicating the likelihood that an audio signal contains the trigger expression {corresponding to each of the one or more keywords}."; White, ¶¶ [0080], [0082]) by inputting the extracted feature to the one or more neural networks (the system discloses the use of "Neural network feature vector techniques..." to "...process the audio data" which is the input of extracted features into a neural network.; White, ¶ [0099]); and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal (the confidence score "represent[s] the likelihood that a particular set of words matches those spoken in the utterance" where "Based on... the assigned ASR confidence score," and thus the likelihood that the words match those spoken in the utterance "the ASR component 126 outputs the most likely text {determine a keyword having a highest reliability among the one or more keywords} recognized in the audio data {...as a keyword corresponding to the audio signal}."; White, ¶ [0098]).
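For illustration only, the final limitation mapped above (determining the keyword having the highest reliability) may be sketched as follows. The function name `select_keyword`, the dictionary layout, and the example scores are hypothetical and do not appear in the claims or in White; the scores are assumed to have already been produced by the one or more neural networks.

```python
from typing import Dict


def select_keyword(reliability: Dict[str, float]) -> str:
    """Return the candidate keyword with the highest reliability score.

    `reliability` maps each candidate keyword to its reliability data.
    The mapping is assumed (hypothetically) to be the output of a
    neural-network keyword model; that model is not shown here.
    """
    if not reliability:
        raise ValueError("at least one candidate keyword is required")
    # max() over the keys, ranked by their associated reliability value.
    return max(reliability, key=reliability.get)


# Hypothetical candidates and scores, for illustration only.
candidates = {"hello": 0.2, "play": 0.7, "stop": 0.1}
chosen = select_keyword(candidates)
```

With the example scores shown, `chosen` is "play". The sketch does not address tie-breaking, which neither the claim language nor the cited passages specify.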
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of White to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract White. (White, ¶ [0027]).
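For illustration only, the accuracy-based routing mapped to Burke ¶ [0059], as modified by the noise-adjusted confidence score of Braho ¶ [0024], may be sketched as follows. The adjustment formula, the `noise_penalty` parameter, and the numeric threshold are assumptions made for this sketch; neither reference nor the claims specify how the noise information enters the score or what the first preset value is.

```python
def estimate_accuracy(confidence: float, noise_level: float,
                      noise_penalty: float = 0.5) -> float:
    """Estimate ASR accuracy from a confidence score and ambient noise.

    Hypothetical adjustment: the confidence is scaled down in proportion
    to a noise level normalized to [0, 1]. This formula is an assumption,
    not the method of Burke or Braho.
    """
    adjusted = confidence * (1.0 - noise_penalty * noise_level)
    # Clamp to the [0, 1] range expected of a score.
    return max(0.0, min(1.0, adjusted))


def route_asr(accuracy: float, first_preset_value: float = 0.6) -> str:
    """Follow the claim language: below the first preset value, the audio
    signal goes to the server; otherwise processing stays on the device."""
    return "server" if accuracy < first_preset_value else "device"


# Hypothetical inputs: the same confidence in a quiet and a noisy setting.
quiet = route_asr(estimate_accuracy(0.8, noise_level=0.0))
noisy = route_asr(estimate_accuracy(0.8, noise_level=1.0))
```

Under these assumed values, the quiet case stays on the device and the noisy case is sent to the server, which is the behavior the combination of Burke and Braho is relied upon to suggest.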

Regarding claim 4, Burke further discloses wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]) determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶ [0041], [0052]) based on the ambient noise information (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, background noise {ambient noise information} of the mobile device 104; Burke, ¶ [0052]) indicating that an ambient noise level of the electronic device is less than a second preset value (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex.” Thus, when noise information of the speech signal (ambient noise information) indicates that the noise level (ambient noise level) does not Burke, ¶ [0052]).  

Regarding claim 5, Burke further discloses further comprising a communicator configured to transmit to and receive data from an external device (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 318 provides two-way data communication {configured to transmit to and receive data from an external device}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the external device (“Computer system 300” controls the communication interface 318 {communicator} to “send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318,” such as “submitting recognition input {the audio signal of the speech or utterance of the user} to multiple recognition systems... [such as] server 108 {external device}.” Also shown in FIG. 2 in the interactions, shown by way of arrows, between mobile device 104 and servers 108A and 108B.; Burke, ¶¶ [0072], [0074], FIG. 2), and receive, from the external device, an ASR result of the speech or the utterance of the user of the electronic device (“the mobile device 104 allocates the recognition tasks, using a task allocation mechanism according to one of the above-described approaches, to multiple recognizers based on one or more of the aforementioned allocation methods” where the “recognizer performs speech recognition processing based on the same speech input received {the speech or the utterance of the user of the electronic device} and provides the results to {thus, receiving the ASR result from the external device} the mobile device 104.”; Burke, ¶ [0061]), based on the ambient noise information indicating that the ambient noise level of the electronic device is greater than or equal to the second preset value (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity,” and “if the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex.” Thus, noise information of the speech signal (ambient noise information) indicates that the noise level (ambient noise level) exceeds a preset threshold level (greater than or equal to the second preset value); Burke, ¶¶ [0041], [0052]).
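For illustration only, the noise-based allocation mapped in claims 4 and 5 may be sketched as follows. The comparison follows the claim language (less than the second preset value for on-device ASR, greater than or equal to it for the external device), whereas Burke ¶ [0052] speaks of the signal “exceed[ing]” a preset threshold. The decibel unit, the threshold value, and the function name are hypothetical.

```python
# Hypothetical second preset value; neither the claims nor Burke give a number.
SECOND_PRESET_VALUE_DB = 60.0


def allocate_recognizer(noise_level_db: float,
                        threshold_db: float = SECOND_PRESET_VALUE_DB) -> str:
    """Choose where ASR runs based on the measured ambient noise level.

    Below the threshold the task is treated as lightweight and kept on
    the device; at or above it the audio is sent to the external device.
    """
    if noise_level_db < threshold_db:
        return "device"
    return "server"


# Hypothetical measurements, for illustration only.
quiet_room = allocate_recognizer(40.0)
busy_street = allocate_recognizer(75.0)
```

With the assumed values, the 40 dB measurement keeps recognition on the device and the 75 dB measurement routes it to the server.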

Regarding claim 7, Burke further discloses further comprising a communicator configured to transmit to and receive data from an external device (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 318 provides two-way data communication {configured to transmit to and receive data from an external device}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): obtain a first ASR result by performing the ASR of the speech or the utterance of the user of the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing. According to the distributed embodiment, each recognizer performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the mobile device 104 (electronic device) performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a first ASR result}; Burke, ¶ [0061], FIG. 2), control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the external device (“Computer system 300” controls the communication interface 318 Burke, ¶¶ [0072], [0074], FIG. 2), receive a second ASR result from the external device (In the distributed embodiment the “back-end telecom server 108A... [also] receive the same speech for speech recognition processing...[where the back-end telecom server 108A] performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the back-end telecom server 108A {external device} performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a second ASR result}; Burke, ¶ [0061], FIG. 2), select an ASR result from among the first ASR result and the second ASR result (“After receiving each recognizer's results {the first ASR result and the second ASR result}, mobile device 104 combines the results based on a plural voting technique ... [where] Each word in the recognized result from each recognizer is compared and if at least two out of three recognizer results for a given word match, then that word is selected as the recognized word. If none of the recognizer results match, then the confidence score and weighting for each word recognized by a recognizer are combined to arrive at a comparison value.”; Burke, ¶ [0063]), and output the response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result” where the “individual results” of the “multiple speech recognizers” and the “single recognized result” are output.; Burke, ¶ [0054], FIG. 2), based on the ASR result (The Burke, ¶ [0063]).
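For illustration only, the plural voting technique quoted from Burke ¶ [0063] may be sketched as follows. Several details are assumptions made for this sketch and are not stated in the quoted passage: the recognizer outputs are assumed to be aligned word by word and of equal length, the “comparison value” is assumed to be the product of confidence and weighting, and the word with the largest comparison value is assumed to be selected.

```python
from collections import Counter
from typing import List, Tuple

# One recognized word together with its confidence score.
ScoredWord = Tuple[str, float]


def plural_vote(results: List[List[ScoredWord]],
                weights: List[float]) -> List[str]:
    """Combine word-aligned results from several recognizers.

    A word that at least two recognizers agree on is selected. If no two
    agree, the word with the largest confidence * weight (an assumed form
    of the comparison value) is selected.
    """
    if len(results) != len(weights):
        raise ValueError("one weight is required per recognizer")
    combined = []
    # zip(*results) walks the recognizer outputs position by position.
    for candidates in zip(*results):
        counts = Counter(word for word, _ in candidates)
        word, count = counts.most_common(1)[0]
        if count >= 2:
            combined.append(word)
        else:
            best = max(range(len(candidates)),
                       key=lambda i: candidates[i][1] * weights[i])
            combined.append(candidates[best][0])
    return combined


# Hypothetical outputs from the device and the two servers.
device = [("call", 0.9), ("home", 0.4)]
telecom_server = [("call", 0.8), ("hone", 0.5)]
application_server = [("tall", 0.7), ("phone", 0.9)]
final = plural_vote([device, telecom_server, application_server],
                    weights=[1.0, 1.0, 1.0])
```

In this example the first word is chosen by agreement of two recognizers ("call"), and the second word, on which no recognizers agree, is chosen by the assumed comparison value ("phone").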

Regarding claim 10, the rejection of claim 1 is incorporated. Burke and Braho disclose all of the elements of the current invention as stated above. However, Burke fail(s) to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device.
The relevance of White is described above with relation to claim 1. Regarding claim 10, White discloses wherein the at least one processor is further configured to execute the one or more instructions to (“The NLU component 128 (e.g., server) may include various components, including potentially dedicated processor(s), memory, storage, etc.” which “execute instructions stored on the computer-readable media”; White, ¶ [0104]-[0135]) determine the response by performing the at least one of natural language understanding (NLU) or dialogue management (DM) (“ASR results … may be sent to the speech processing system 110, for natural language understanding (NLU) processing, such as conversion of the text into commands for execution, either by the user device, by the speech processing system 110, or by another device” and “The computer-readable media 402 may further store a dialog management component 408 that is responsible for conducting speech dialogs with the user 104 in response to meanings or intents of user speech determined by the NLU component 128.”; White, ¶ [0103], [0094]) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device. (“the NLU component 128 takes textual input (such as the textual input determined by the ASR component 126) and attempts to make a semantic interpretation of the text. That is, the NLU component 128 determines the meaning behind the text based on the individual words and then implements that meaning.”; White, ¶ [0105]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of White to include wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device. The device arbitration described in White may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by White. (White, ¶¶ [0029]-[0030]).

Regarding claim 11, Burke further discloses An operation method of an electronic device, the operation method comprising (the method disclosed with reference to the “System 100 includ[ing] a mobile device 104”; Burke, ¶ [0035]): determining that the electronic device is to perform automated speech recognition (ASR) (“Allocation of speech recognition tasks...” which is the speech recognition tasks being allocated to one or more of the “multiple speech recognizers,” “is determined based on complexity which is measured using one or more of several metrics,” where speech recognition tasks are automated speech recognition {“distributed recognition tasks have been allocated and recognized by the individual recognition engines” also referred to as “automated speech recognition (ASR) engines”}.; Burke, ¶¶ [0052], [0054]-[0055]) of a speech or an utterance of a user of the electronic device (The speech recognizer can be “mobile device 104,” thus an electronic device, and the user produces “utterances to be recognized” to create a speech signal; Burke, ¶¶ [0051], [0052]), based on ambient noise information of the electronic device (“Allocation of speech recognition tasks is determined Burke, ¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device (“A noise detector is used on mobile device 104, which measures the noise level of the speech signal,” thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces “utterances to be recognized” to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶ [0052]), performing the ASR of the speech or the utterance of the user of the electronic device (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…” where “lightweight recognition tasks {not complex} can be performed on mobile 
device 104 while heavyweight recognition tasks {complex} are allocated to server 108,” where recognition tasks are automated speech recognition; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶ [0041], [0052]), and outputting a response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result” where the “individual results” of the “multiple speech recognizers,” which includes the Burke, ¶ [0054], FIG. 2), based on a result of performing the ASR of the speech or the utterance of the user of the electronic device (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as “return results to user”; Burke, ¶ [0052], FIG. 
2) wherein the at least one processor is further configured to: estimating an accuracy of the ASR of the speech or the utterance of the user… (describes “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmitting the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, performing… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶¶ [0059]).  However, Burke fail(s) to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 11, Braho teaches wherein the confidence score of a hypothesis in a speech recognition system is based on the ambient noise information of the electronic device (Discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score” thus, the confidence score is based on ambient noise information.; Braho, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user; obtain one or more keywords and reliability data corresponding to each of the one or more keywords by inputting the extracted feature to the one or more neural networks; and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal.
The relevance of White is described above with relation to claim 1. Regarding claim 11, White teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. ("an orchestration component of the speech processing system may call an automated speech recognition (ASR) component of the speech processing system to process one or more of the audio signals received from the voice-enabled devices using automated speech recognition to generate text data representing the speech utterance... [and] a natural language understanding (NLU) component to process the text data representing the speech utterance" from the ASR component "using natural language understanding to determine an intent" and "the computer-readable media 402 may further store a dialog management component 408 that is responsible for conducting speech dialogs with the user 104 in response to meanings or intents of user speech determined by the NLU component 128"; White, ¶¶ [0017], [0094])…wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks ("The device or devices performing the ASR processing may include an acoustic front end (AFE) 416 and a speech recognition engine 418... [where] a number of approaches may be used by the AFE 416 to process the audio data, such as...neural network feature vector techniques."; White, ¶¶ [0099]) and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user ("the AFE 416 determines a number of values, called features, representing the qualities of the audio data, along with a set of those values, called a feature vector, representing the features/qualities of the audio data within the frame" where "a keyword spotter may use simplified ASR (automatic speech recognition) techniques...[and] a portion of an audio signal {extract... from the audio signal of the speech} is analyzed...yielding a feature score {...a feature} that represents the similarity of the audio signal model to the trigger expression model."; White, ¶¶ [0099], [0081]); obtain one or more keywords and reliability data corresponding to each of the one or more keywords ("wakeword detection may be implemented using keyword spotting technology {obtain one or more keywords...}" and "wakeword detection may also use a support vector machine (SVM) classifier that receives the one or more feature scores produced by the HMM recognizer. 
The SVM classifier produces a confidence score {reliability data} indicating the likelihood that an audio signal contains the trigger expression {corresponding to each of the one or more keywords}."; White, ¶¶ [0080], [0082]) by inputting the extracted feature to the one or more neural networks (the system discloses the use of "Neural network feature vector techniques..." to "...process the audio data" which is the input of extracted features into a neural network.; White, ¶¶ [0099]); and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal (the confidence score "represent[s] the likelihood that a particular set of words matches those spoken in the utterance" where "Based on... the assigned ASR confidence score," and thus the likelihood that the words match those spoken in the utterance "the ASR component 126 outputs the most likely text {determine a keyword having a highest reliability among the one or more keywords} recognized in the audio data {...as a keyword corresponding to the audio signal}."; White, ¶¶ [0098]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of White to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user; obtain one or more keywords and reliability data corresponding to each of the one or more keywords by inputting the extracted feature to the one or more neural networks; and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal. The use of context in device arbitration allows for the selection of the “best suited voice-enabled device… to respond to the speech utterance,” as recognized by White. (White, ¶ [0027]).
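
For illustration only, the confidence-threshold allocation attributed to Burke, ¶ [0059] in the claim 11 mapping above may be sketched as follows. This is a minimal sketch and not code from Burke or from the application: the names `Recognition` and `allocate`, the 0.0 to 1.0 confidence scale, and the default threshold of 0.6 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recognition:
    text: str          # recognized text
    confidence: float  # output confidence score, assumed to lie in [0.0, 1.0]
    source: str        # "device" or "server" (hypothetical labels)


def allocate(audio: bytes,
             device_asr: Callable[[bytes], Recognition],
             server_asr: Callable[[bytes], Recognition],
             threshold: float = 0.6) -> Recognition:
    """Run the embedded recognizer first; hand off to the server only when
    the device confidence score is below the preset threshold."""
    local = device_asr(audio)
    if local.confidence < threshold:
        # Below the first preset value: the audio signal goes to the server.
        return server_asr(audio)
    # Greater than or equal to the first preset value: stay on the device.
    return local
```

Under this sketch, a device result whose score equals the threshold is kept on the device, which corresponds to the "greater than or equal to the first preset value" limitation as mapped above.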

Regarding claim 12, the rejection of claim 11 is incorporated. Burke and Braho disclose all of the elements of the current invention as stated above. However, Burke fail(s) to expressly recite further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device.
The relevance of White is described above with relation to claim 1. Regarding claim 12, White discloses further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device (“The acoustic front end (AFE) 416 transforms {converting} the audio data from the microphone {the speech or utterance of the user of the electronic device…} into data for processing by the speech recognition engine 418 {…into the audio signal of the speech or the utterance of the user of the electronic device}”; White, ¶ [0099]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the context-based device arbitration of White, to further incorporate the teachings of White to include further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device. The device arbitration described in White may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by White. (White, ¶¶ [0029]-[0030]).

Regarding claim 14, the rejection of claim 11 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 17, the rejection of claim 11 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Regarding claim 20, Burke discloses An automated speech recognition (ASR) system comprising (“System 100 includes a mobile device 104” and a “server 108A”; Burke, ¶ [0035]): an electronic device configured to receive a speech or an utterance of a user of the electronic device (“computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104,” where “audio input [is] received by the mobile device,” wherein audio input is also referred to as a speech input and where speech input is “user-provided speech input.”; Burke, ¶¶ [0064], [0058], [0070]), and a server configured to perform ASR of the speech or the utterance of the user of the electronic device based on an audio signal of the speech or the utterance of the user of the electronic device received from the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing.” Thus, the server 108A performs speech recognition processing (ASR) of the same speech (speech or the utterance of the user of the electronic device); Burke, ¶ [0061]), wherein the electronic device comprises at least one processor configured to: (The method is performed “by computer system 300”, such as the mobile device 104 (electronic device), “in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): estimate an accuracy of the ASR of the speech or the utterance of the user… (describes “the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. 
The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}”; Burke, ¶¶ [0059]); based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value, transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user (“If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}, the recognition task is allocated to server 108 recognizer {transmit the audio signal of the speech or the utterance of the user to a server to perform the ASR of the speech or the utterance of the user}.”; Burke, ¶¶ [0059]) and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶¶ [0059]).  However, Burke fail(s) to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 20, Braho teaches wherein the confidence score of a hypothesis in a speech recognition system is based on the ambient noise information of the electronic device (Discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score” thus, the confidence score is based on ambient noise information.; Braho, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user; obtain one or more keywords and reliability data corresponding to each of the one or more keywords by inputting the extracted feature to the one or more neural networks; and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal.
The relevance of White is described above with relation to claim 1. Regarding claim 20, White teaches wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. ("an orchestration component of the speech processing system may call an automated speech recognition (ASR) component of the speech processing system to process one or more of the audio signals received from the voice-enabled devices using automated speech recognition to generate text data representing the speech utterance... [and] a natural language understanding (NLU) component to process the text data representing the speech utterance" from the ASR component "using natural language understanding to determine an intent" and "the computer-readable media 402 may further store a dialog management component 408 that is responsible for conducting speech dialogs with the user 104 in response to meanings or intents of user speech determined by the NLU component 128"; White, ¶¶ [0017], [0094])…wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks ("The device or devices performing the ASR processing may include an acoustic front end (AFE) 416 and a speech recognition engine 418... [where] a number of approaches may be used by the AFE 416 to process the audio data, such as...neural network feature vector techniques."; White, ¶¶ [0099]) and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user ("the AFE 416 determines a number of values, called features, representing the qualities of the audio data, along with a set of those values, called a feature vector, representing the features/qualities of the audio data within the frame" where "a keyword spotter may use simplified ASR (automatic speech recognition) techniques...[and] a portion of an audio signal {extract... 
from the audio signal of the speech} is analyzed...yielding a feature score {...a feature} that represents the similarity of the audio signal model to the trigger expression model."; White, ¶¶ [0099], [0081]); obtain one or more keywords and reliability data corresponding to each of the one or more keywords ("wakeword detection may be implemented using keyword spotting technology {obtain one or more keywords...}" and "wakeword detection may also use a support vector machine (SVM) classifier that receives the one or more feature scores produced by the HMM recognizer. The SVM classifier produces a confidence score {reliability data} indicating the likelihood that an audio signal contains the trigger expression {corresponding to each of the one or more keywords}."; White, ¶¶ [0080], [0082]) by inputting the extracted feature to the one or more neural networks (the system discloses the use of "Neural network feature vector techniques..." to "...process the audio data" which is the input of extracted features into a neural network.; White, ¶¶ [0099]); and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal (the confidence score "represent[s] the likelihood that a particular set of words matches those spoken in the utterance" where "Based on... the assigned ASR confidence score," and thus the likelihood that the words match those spoken in the utterance "the ASR component 126 outputs the most likely text {determine a keyword having a highest reliability among the one or more keywords} recognized in the audio data {...as a keyword corresponding to the audio signal}."; White, ¶¶ [0098]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of White to include wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device, wherein the ASR is processed using an artificial intelligence (AI) algorithm including one or more neural networks and wherein the at least one processor is further configured to: extract a feature from the audio signal of the speech or the utterance of the user; obtain one or more keywords and reliability data corresponding to each of the one or more keywords by inputting the extracted feature to the one or more neural networks; and determine a keyword having a highest reliability among the one or more keywords, as a keyword corresponding to the audio signal. The use of context in device arbitration allows for the selection of the “best suited voice-enabled device… to respond to the speech utterance,” as recognized by White. (White, ¶ [0027]).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, and White as applied to claims 1 and 11 above, and further in view of Lebeau and White.

Regarding claim 6, the rejection of claim 1 is incorporated. Burke, Braho, and White disclose all of the elements of the current invention as stated above. However, Burke fail(s) to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to: extract the one or more keywords included in the audio signal of the speech or the utterance of the user of the electronic device based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range and determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the one or more keywords being a preset keyword and based on the ambient noise information of the electronic device.
Lebeau teaches methods, systems, and techniques for automatically monitoring for voice input using current context of the computing device or user interaction. (Lebeau, ¶ [0004]). Regarding claim 6, Lebeau discloses wherein the at least one processor is further configured to execute the one or more instructions to extract one or more keywords (“the mobile computing device 202 is configured to automatically determine when to start and when to stop monitoring for voice input based on a current context associated with the mobile computing device,” where “when at least the microphone 206 a and the speech analysis subsystem 212 are activated during an audio monitoring mode of operation and the speech analysis subsystem 212 detects voice input from a stream of audio data provided by the microphone 206...” the system “can determine... the presence of keywords ...in the particular voice input.”; Lebeau, ¶ [0057], [0067]-[0068]) included in the audio signal of the speech or the utterance of the user of the electronic device (“detects voice input from a stream of audio data provided by the microphone 206” which can include “a user request” (speech or utterance of the user) where the microphone 206 is part of the mobile computing device 202 (the electronic device).; Lebeau, ¶ [0067], FIG. 
2) based on the ambient noise information (Discloses systems for “monitoring for voice input using a mobile computing device 172 a-d” where monitoring can be stopped based on the current context including “high level of ambient noise,” thus based on ambient noise information.; Lebeau, ¶ [0050], [0054]) indicating that an ambient noise level of the electronic device has a value in a preset range (using the “high level of ambient noise,” “the mobile device 172 d can generally infer that it is located in a public area…[and] determine to not monitor for voice input.” High level indicates that the ambient noise level has a value above a preset range (where the preset range would be a range of ambient noise expected from a non-public area). Thus the system determines whether to monitor for input or not, based on the ambient noise level having a value in a preset range.; Lebeau, ¶ [0054]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the context-based device arbitration of White, to incorporate the teachings of Lebeau to include wherein the at least one processor is further configured to execute the one or more instructions to: extract one or more keywords included in the audio signal of the speech or the utterance of the user of the electronic device based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range. The use of current context in recognition of voice input allows for less intrusive voice monitoring without specific adherence to “the formalities associated with prompting a mobile computing device to use voice input,” as recognized by Lebeau. (Lebeau, ¶ [0021]). However, Burke, Braho, White, and Lebeau fail(s) to expressly recite determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the keyword being a preset keyword and based on the ambient noise information of the electronic device.
The relevance of White is described above with relation to claim 1. Regarding claim 6, White discloses determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Following detection of a wakeword, the voice-enabled device 108 sends an audio signal 114 corresponding to the speech utterance 106, to a computing device of the speech processing system 110 that includes the ASR component 126.”; White, ¶ [0091]) based on the one or more keywords being a preset keyword (“voice-enabled devices 108 may receive or capture sound corresponding to the speech utterance 106 of the user via one or more microphones. In certain implementations, the speech utterance 106 may include or be preceded by a wakeword” where “the wakeword... may be a predefined word, phrase, or other sound,” and where when the wakeword is detected “the voice-enabled devices 108 may begin streaming the audio signal, and other data, to the speech processing system 110.”; White, ¶ [0034]) and based on the ambient noise information of the electronic device. (“At 306, the voice-enabled device may determine voice activity using voice activity detection (VAD) to detect the presence of voice in the directional audio signals... [where] the voice activity may be White, ¶¶ [0077], [0079], [0091]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the context-based device arbitration of White, and by the systems and methods for automatic context monitoring for voice input of Lebeau, to further incorporate the teachings of White to include determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the one or more keywords being a preset keyword and based on the ambient noise information of the electronic device. The use of context in device arbitration allows for the selection of the “best suited voice-enabled device… to respond to the speech utterance,” as recognized by White. (White, ¶ [0027]).
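
For illustration only, the combined determination recited in claim 6 (a preset keyword together with an ambient noise level in a preset range) may be sketched as follows. This is a minimal sketch of the claim language as mapped above, not code from Lebeau, White, or the application: the function name `should_run_asr`, the example keyword, the decibel unit, and the range bounds are all hypothetical.

```python
from typing import FrozenSet, Optional, Tuple


def should_run_asr(keyword: Optional[str],
                   noise_db: float,
                   preset_keywords: FrozenSet[str] = frozenset({"hello device"}),
                   noise_range: Tuple[float, float] = (0.0, 60.0)) -> bool:
    """Return True only when the ambient noise level falls inside the preset
    range AND the extracted keyword matches a preset keyword."""
    low, high = noise_range
    in_range = low <= noise_db <= high
    if not in_range or keyword is None:
        # Noise outside the preset range, or nothing extracted: do not run ASR.
        return False
    return keyword.lower() in preset_keywords
```

In this sketch the noise check gates keyword matching, so a preset keyword detected in a high-noise environment does not by itself trigger on-device ASR.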

Regarding claim 16, the rejection of claim 11 is incorporated. Claim 16 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, and White as applied to claims 7 and 17 above, and further in view of Endo.

Regarding claim 8, the rejection of claim 7 is incorporated. Burke, Braho, and White disclose all of the elements of the current invention as stated above. However, Burke, Braho, and White fail(s) to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to select the ASR result from among the first ASR result and the second ASR result, based on the ambient noise information of the electronic device.
Endo teaches a speech recognition system having multiple speech recognizers. (Endo, Col. 1, lines 16-17). Regarding claim 8, Endo discloses wherein the at least one processor is further configured to execute the one or more instructions to (Discloses a “speech recognition system 104 [including a] decision module 208... coupled to the speech recognizers 202, 204, 206…the decision module 208 includes ...a processor 304.”; Endo, Col. 7, lines 6-12) select the ASR result from among the first ASR result and the second ASR result (“Each speech recognizer 202, 204, 206 recognizes the input speech signal 120 output from the microphone 102 according to its own speech recognition mechanism (whether it is a grammar-based speech recognizer or a statistical speech recognizer) and outputs the recognized speech text 130 along with an associated raw confidence score 13... to the decision module 208.” where “The decision module 208 selects the speech text with the highest adjusted confidence score as the most accurate recognized speech text”; Endo, Col. 5, lines 44-51, Col. 6, lines 50-52), based on the ambient noise information of the electronic device. (In some examples, “the decision module 208 adjusts the raw confidence scores to generate adjusted confidence scores associated with the recognized speech text, based upon ...the external data 109… [including] level of background noise.” Thus, the level of background noise {ambient noise information} is used to adjust the confidence score, where the confidence score is used to select the ASR result, as produced by speech recognizers 202, 204, and 206.; Endo, Col. 6, lines 35-43).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the context-based device arbitration of White, to incorporate the teachings of Endo to include wherein the at least one processor is further configured to execute the one or more instructions Endo. (Endo, Col. 3, lines 3-8).
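
For illustration only, the result selection attributed to Endo's decision module 208 in the claim 8 mapping above (raw confidence scores adjusted using the level of background noise, then selection of the highest adjusted score) may be sketched as follows. This is a minimal sketch: Endo as quoted does not specify an adjustment formula, so the linear per-recognizer penalty, the function name `select_result`, and the recognizer labels are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (recognizer name, recognized text, raw confidence score)
Result = Tuple[str, str, float]


def select_result(results: List[Result],
                  noise_level: float,
                  noise_penalty: Dict[str, float]) -> Result:
    """Adjust each raw confidence score by a noise-dependent penalty and
    return the result with the highest adjusted score."""
    def adjusted(result: Result) -> float:
        name, _text, raw = result
        # Hypothetical linear adjustment: recognizers assumed to be less
        # robust to noise are given a larger penalty per unit of noise.
        return raw - noise_penalty.get(name, 0.0) * noise_level

    return max(results, key=adjusted)
```

With two recognizers whose raw scores are close, the selected result can change as the background noise level rises, which is the behavior the mapping above relies on for "based on the ambient noise information."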

Regarding claim 18, the rejection of claim 17 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Kristjansson et al. (U.S. Pat. App. Pub. No. 2013/0238325) discloses systems and methods for geotagging areas based on levels of environmental audio including localized correction for varying noise levels to increase audio quality.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657                                                                                                                                                                                                        

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657