Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on June 15, 2022 has been entered. 
The amendment of claims 1, 4, 5, 7, 11, 14, 15, 17, and 20, the cancellation of claims 6 and 16, and the addition of claim 21 have been acknowledged and entered.
After entry of this amendment, claims 1, 4, 5, 7, 8, 10-12, 14, 15, 17, 18, 20, and 21 are presently pending.
In view of the amendment to claim(s) 1, 4, 5, 7, 11, 14, 15, 17, and 20 and the cancellation of claim(s) 6 and 16, the rejection of claims 1, 2, 4-8, 10-12, 14-18, and 20 under 35 U.S.C. §103 is withdrawn.
In light of the amended claims and newly added claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 12-18 of the Response to Non-Final Office Action dated March 15, 2022, which was received on June 15, 2022 (hereinafter Response and Office Action, respectively), have been fully considered.
Prior to entry of this amendment, claims 1, 4-5, 7, 10-12, 14-15, 17, and 20 stood rejected under 35 U.S.C. § 103 as being unpatentable over Burke (U.S. Pat. App. Pub. No. 2006/0009980, hereinafter Burke) in view of Braho (U.S. Pat. App. Pub. No. 2014/0278391, hereinafter Braho) and White (U.S. Pat. App. Pub. No. 2019/0066670, hereinafter White); claims 6 and 16 stood rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Braho, White, and LeBeau (U.S. Pat. App. Pub. No. 2015/0310867, hereinafter LeBeau); and claims 8 and 18 stood rejected under 35 U.S.C. § 103 as being unpatentable over Burke in view of Braho, White, and Endo (U.S. Pat. No. 7,228,275, hereinafter Endo).
As Applicant has amended independent claim(s) 1, 11 and 20 to incorporate the limitations of claims 6 and 16, the rejections of claim(s) 1, 11, and 20 have been amended to incorporate the rejection of the respective limitations of claims 6 and 16, as appropriate.
With respect to the rejection(s) of claim(s) 1, 11, and 20 under 35 U.S.C. §103, Applicant asserts that the cited references above fail to teach or suggest at least “based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract[ing] a keyword included in the speech or the utterance of the user of the electronic device,” or “determin[ing] that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device.” Applicant’s arguments regarding the current combination of references in light of the amended claims are persuasive. As such, the rejections of claims 1, 11, and 20 under 35 U.S.C. §103 are withdrawn.
Further regarding claims 1, 11, and 20, Applicant asserts that LeBeau fails to teach or suggest "determin[ing] that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device." However, LeBeau is not cited to cure the above deficiency. As such, the argument is moot.
Further regarding claims 1, 11, and 20, Applicant asserts that LeBeau fails to teach or suggest “based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract[ing] a keyword included in the speech or the utterance of the user of the electronic device.” This argument is not persuasive.
Applicant asserts that LeBeau is limited to “toggling between monitoring and not monitoring for voice input based on a relatively high level of ambient noise, and detecting voice input when monitoring is activated.” As such, applicant argues that LeBeau fails to teach or suggest "based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract[ing] a keyword included in the speech or the utterance of the user of the electronic device." Even assuming, arguendo, that the disclosure of LeBeau is limited to that which is described above, we must consider what LeBeau discloses in the context of monitoring for voice input and the broadest reasonable interpretation of the phrase “based on the ambient noise information.” 
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. (MPEP 2111). The applicant does not clarify the meaning of the phrase “based on” as having any particular intended meaning in the specification, outside of the plain meaning of the phrase.  As such, we apply the plain meaning of the phrase “based on.” The broadest reasonable interpretation of the phrase “based on” is “to use an idea, a fact, a situation, or other element as the point from which something can be developed.” Examiner specifically notes that the phrase “based on” is purposefully broad, indicating anything which might develop from the fact or element indicated as the basis, regardless of the distance between the basis and outcome which develops. “Based on” does not require that the basis be causative of the outcome or that there be only one outcome derived from any one basis.
As explained in the Office Action, LeBeau discloses that “[w]hen at least the microphone 206a and the speech analysis subsystem 212 are activated,” and thus when the ambient noise level is not relatively high, the system “can determine whether the detected voice input indicates a request from the user for the mobile computing device to perform an operation” (LeBeau, ¶ [0067]). LeBeau further explains that the system “can use various subsystems to aid in determining whether a particular voice input indicates a user request, such as a keyword identifier.” LeBeau explains that the keyword identifier “can determine whether a particular voice input is directed at the mobile computing device 202 based on the presence of keywords from a predetermined group of keywords stored in a keyword repository 243 in the particular voice input.” Thus, LeBeau discloses that the outcome of determining “the presence of keywords from a predetermined group of keywords stored in a keyword repository 243 in the particular voice input {…extract a keyword included in the speech or the utterance of the user of the electronic device}” is based on “determin[ing] whether a particular voice input is directed at the mobile computing device 202,” which is based on “monitor[ing] for voice input,” which is based on the ambient noise level being within an acceptable range. Therefore, the rejection of these elements under LeBeau is maintained, as described in more detail in the rejection below.
Applicant further argues that the rejection of the dependent claims 4, 5, 7, 8, 10, 12, 14, 15, 17, and 18 should be withdrawn for at least the same reasons as independent claims 1, 11, and 20. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 4, 5, 7, 8, 10, 12, 14, 15, 17, and 18 under 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are presented for claims 1, 4, 5, 7, 8, 10-12, 14, 15, 17, 18, 20, and 21, in light of combinations of Burke, Braho, LeBeau, White, Endo and newly cited reference Wood (U.S. Pat. App. Pub. No. 2021/0327433, hereinafter Wood), presented in detail with relation to the claim elements below.
The Applicant has not provided any further statement and therefore, the Examiner directs the Applicant to the below rationale.	

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 7, 11, 14-15, 17, and 20 are rejected under 35 U.S.C. 103 as being obvious in light of Burke in view of Braho, LeBeau, and Wood.

Regarding claim 1, Burke discloses An electronic device comprising ("System 100 includes a mobile device 104"; Burke, ¶¶ [0035]): a memory storing one or more instructions ("computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104," where "computer system 300 also includes a main memory 306... [storing] instructions."; Burke, ¶¶ [0064]-[0065]); and at least one processor configured to execute the one or more instructions stored in the memory ("computer system 300" includes a "processor 304 [configured to] execut[e] sequences of instructions contained in main memory 306; Burke, ¶¶ [0068]), wherein when executing the one or more instructions the at least one processor is configured to (The method is performed "by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306."; Burke, ¶¶ [0068]) : determine whether the electronic device or a server is to perform automated speech recognition (ASR) ("Allocation of speech recognition tasks..." which is the speech recognition tasks being allocated to one or more of the "multiple speech recognizers," where the multiple recognizers includes "the mobile device 104 and... server 108," can be "determined based on complexity which is measured using one or more of several metrics," where speech recognition tasks are automated speech recognition.; Burke, ¶¶ [0052]) of a speech or an utterance of a user of the electronic device, (The speech recognizer can be "mobile device 104," thus an electronic device, and the user produces "utterances to be recognized" to create a speech signal; Burke, ¶¶ [0051], [0052]) based on ambient noise information of the electronic device ("Allocation of speech recognition tasks is determined based on complexity... 
[where] background noise determines the complexity level," and "a noise detector is used on mobile device 104, which measures the noise level of the speech signal." Thus, based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device ("A noise detector is used on mobile device 104, which measures the noise level of the speech signal," thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces "utterances to be recognized" to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶¶ [0052]) and whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device (complexity can further include "the vocabulary of words a user is expected to speak" where "low complexity means few alternative words and large complexity means many words," thus keywords which include many alternative words; Burke, ¶¶ [0052]) … perform the ASR of the speech or the utterance of the user of the electronic device ("If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…" where "lightweight recognition tasks {not complex} can be performed on mobile device 104 while heavyweight recognition tasks {complex} are allocated to server 108."; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device ("Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task’s complexity." 
In the case of the speech signal being determined to not be noisy, thus not complex, the "recognition tasks can be performed on mobile device 104" Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the "speech signal" (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]), and output a response to the speech or the utterance of the user of the electronic device, ("After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result" where the "individual results" of the "multiple speech recognizers," which includes the mobile device 104, and the "single recognized result" are output.; Burke, ¶¶ [0054], FIG. 2) based on a result of performing the ASR of the speech or the utterance of the user of the electronic device, (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the "speech signal" (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as "return results to user"; Burke, ¶¶ [0052], FIG. 2) wherein the at least one processor is further configured to: estimate an accuracy of the ASR of the speech or the utterance of the user…[based on a confidence score] (describe "the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. 
The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}"; Burke, ¶¶ [0059]);… based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value ("If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}..."; Burke, ¶¶ [0059]), determine that the server is to perform the ASR of the speech or the utterance of the user ("the recognition task is allocated to server 108 recognizer," which is an action by the system which is based on a determination of the same system to allocate the task to the server {determine that the server is to perform the ASR of the speech or the utterance of the user}."; Burke, ¶¶ [0059]); based on determining that the server is to perform the ASR of the speech or the utterance of the user (the system determines that "the recognition task is allocated to server 108 recognizer {determine that the server is to perform the ASR of the speech or the utterance of the user}."; Burke, ¶¶ [0059]), transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user ("the [speech] recognition task is allocated [by the system] to server 108 recognizer," where allocating the recognition task to the server inherently discloses transmission of the audio signal including the speech which is the subject of the recognition task to the server for performance of said recognition.; Burke, ¶¶ [0059]); and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the 
first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
Braho teaches “analysis of sounds in detecting and/or recognizing speech for use with or in voice-driven systems.” (Braho, ¶ [0001]). Regarding claim 1, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (As indicated above, Burke discloses the use of a confidence score in determining the accuracy of ASR. However, Burke doesn’t expressly recite what is included in producing said confidence score. Braho discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score” thus, the confidence score is based on ambient noise information.; Braho, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
LeBeau teaches methods, systems, and techniques for automatically monitoring for voice input using current context of the computing device or user interaction. (LeBeau, ¶ [0004]). Regarding claim 1, LeBeau teaches based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range (Discloses systems for “monitoring for voice input using a mobile computing device 172 a-d” where monitoring can be stopped based on the current context including “high level of ambient noise,” thus based on ambient noise information. Further, using the “high level of ambient noise,” “the mobile device 172 d can generally infer that it is located in a public area…[and] determine to not monitor for voice input.” High level indicates that the ambient noise level has a value above a preset range (where the preset range would be a range of ambient noise expected from a non-public area). Thus the system determines whether to monitor for input or not, based on the ambient noise level having a value in a preset range.; LeBeau, ¶¶ [0050], [0054]), extract a keyword included in the speech or the utterance of the user of the electronic device ("the mobile computing device 202 is configured to automatically determine when to start and when to stop monitoring for voice input based on a current context associated with the mobile computing device,” where “when at least the microphone 206 a and the speech analysis subsystem 212 are activated during an audio monitoring mode of operation and the speech analysis subsystem 212 detects voice input from a stream of audio data provided by the microphone 206...” the system “can determine... the presence of keywords...in the particular voice input.”; LeBeau, ¶¶ [0057], [0067]-[0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of LeBeau to include based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device. The use of current context in recognition of voice input allows for less intrusive voice monitoring without specific adherence to “the formalities associated with prompting a mobile computing device to use voice input,” as recognized by LeBeau. (LeBeau, ¶ [0021]). However, Burke, Braho, and LeBeau fail(s) to expressly recite determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
Wood teaches systems and methods for “distributing the performance of speech recognition among a remote control device and a voice platform”. (Wood, ¶ [0002]). Regarding claim 1, Wood teaches determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device ("In some embodiments, audio responsive electronic device 122 may send the preprocessed voice input to voice platform 192 at the third-party entity based on a detected trigger word and configuration information provided by a voice adaptor 196," where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... [or] on a server computer," and where digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶¶ [0085], [0063]) determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device ("user interface and command module 128" makes a determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 {determine that the electronic device is to perform...}" where the digital assistant 180 "may analyze the voice input to recognize... commands {...to perform the ASR of the speech or the utterance...," and where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... 
[or] on a server computer."; Wood, ¶¶ [0061], [0085]) based on the extracted keyword being the particular keyword (where the determination is "based on a recognized trigger word," and where the trigger word is extracted from the audio input to match a recognized trigger word.; Wood, ¶¶ [0061]), and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device (In the absence of a recognized trigger word, "the user interface and command module 128 may analyze the voice input to recognize trigger words and commands, using any well-known signal recognition techniques, procedures, technologies"; Wood, ¶¶ [0063]); based on determining that the server is to perform the ASR of the speech or the utterance of the user (Based on the determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 based on a recognized trigger word,"; Wood, ¶¶ [0061], [0063]), transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user (the command module 128 may then "transmit the audio input to digital assistant(s) 180 via transceiver 130" where the digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶¶ [0063]) and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“digital assistant 180 includes an automated speech recognizer (ASR) 1102, natural language unit (NLU) 1104, and a text-to-speech (TTS) unit 1106. In some other embodiments, voice platform 192 may include a common ASR 1102 for one or more digital assistants 180.”; Wood, ¶¶ [0090]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the systems and methods for automatic context monitoring for voice input of LeBeau, to incorporate the teachings of Wood to include determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. “Distributing the performance of speech recognition among a remote control device and a voice platform… [can] improve speech recognition and reduce power usage, network usage, memory usage, and processing time,” as recognized by Wood. (Wood, ¶ [0002]).
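For illustration only, the allocation logic recited in claim 1, as the limitations are quoted above, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Applicant's specification or of any cited reference. The names PRESET_NOISE_RANGE, PARTICULAR_KEYWORDS, FIRST_PRESET_VALUE, extract_keyword, decide_by_keyword, and decide_by_accuracy are hypothetical, as are all numeric values, and the list of spotted words stands in for the output of an unspecified keyword spotter. The two decisions are kept as separate functions because the sketch does not assume how they are combined.

```python
from typing import Iterable, Optional

# Hypothetical placeholder values; none of these appear in the claims
# or in the cited references.
PRESET_NOISE_RANGE = (30.0, 60.0)                  # assumed "preset range"
PARTICULAR_KEYWORDS = frozenset({"call", "play"})  # assumed "particular keyword" set
FIRST_PRESET_VALUE = 0.7                           # assumed accuracy threshold

DEVICE = "device"
SERVER = "server"


def extract_keyword(spotted_words: Iterable[str]) -> Optional[str]:
    """Return the first spotted word, lower-cased, or None if none was spotted."""
    for word in spotted_words:
        return word.lower()
    return None


def decide_by_keyword(noise_level: float,
                      spotted_words: Iterable[str]) -> Optional[str]:
    """Keyword-based decision, applied only when noise is in the preset range.

    Returns None when the noise level is outside the range, because the
    quoted limitation only addresses the in-range case.
    """
    low, high = PRESET_NOISE_RANGE
    if not (low <= noise_level <= high):
        return None
    keyword = extract_keyword(spotted_words)
    # Device if the extracted keyword is a particular keyword, otherwise server.
    return DEVICE if keyword in PARTICULAR_KEYWORDS else SERVER


def decide_by_accuracy(estimated_accuracy: float) -> str:
    """Accuracy below the first preset value goes to the server, else the device."""
    return SERVER if estimated_accuracy < FIRST_PRESET_VALUE else DEVICE
```

Under these assumptions, decide_by_keyword(45.0, ["call"]) selects the device, decide_by_keyword(45.0, ["weather"]) selects the server, and decide_by_accuracy(0.5) selects the server.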

Regarding claim 4, Burke further discloses wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]) determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity.” In the case of the speech signal being determined to not be noisy, thus not complex, the “recognition tasks can be performed on mobile device 104.” Thus the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the “speech signal” (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]) based on the ambient noise information (“Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level,” and “a noise detector is used on mobile device 104, which measures the noise level of the speech signal.” Thus, the allocation is based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶ [0052]) indicating that the ambient noise level of the electronic device is less than a second preset value (“If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex.” Thus, when noise information of the speech signal (ambient noise information) indicates that the noise level (ambient noise level) does not exceed a preset threshold level (less than a second preset value), the system determines that the mobile device 104 is to perform the ASR.; Burke, ¶ [0052]).
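For illustration only, the noise-threshold allocation that Burke is cited for above (a signal exceeding a preset threshold is complex, and complex tasks go to the server) can be sketched as follows. This is a minimal sketch under stated assumptions, not Burke's implementation. The name allocate_by_noise and its parameters are hypothetical, and treating a level exactly equal to the threshold as not exceeding it is an assumption of the sketch.

```python
def allocate_by_noise(noise_level: float, preset_threshold: float) -> str:
    """Allocate a recognition task using a single noise threshold.

    A signal whose noise level exceeds the threshold is treated as complex
    and allocated to the server; otherwise it stays on the device.
    """
    if noise_level > preset_threshold:
        return "server"  # too noisy, treated as a complex task
    return "device"      # not complex, treated as a lightweight task
```

With a hypothetical threshold of 50.0, a measured level of 40.0 keeps the task on the device, while a level of 60.0 allocates it to the server.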

Regarding claim 5, Burke further discloses further comprising a communicator configured to transmit to and receive data from the server (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 318 provides two-way data communication {configured to transmit to and receive data from the server}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the server (“Computer system 300” controls the communication interface 318 {communicator} to “send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318,” such as “submitting recognition input {the audio signal of the speech or utterance of the user} to multiple recognition systems... [such as] server 108 {server}.” Also shown in FIG. 2 in the interactions, shown by way of arrows, between mobile device 104 and servers 108A and 108B.; Burke, ¶¶ [0072], [0074], FIG. 2), and receive, from the server, an ASR result of the speech or the utterance of the user of the electronic device (“the mobile device 104 allocates the recognition tasks, using a task allocation mechanism according to one of the above-described approaches, to multiple recognizers based on one or more of the aforementioned allocation methods” where the “recognizer performs speech recognition processing based on the same speech input received {the speech or the utterance of the user of the electronic device} and provides the results to {thus, receiving the ASR result from the server} the mobile device 104.”; Burke, ¶ [0061]), based on the ambient noise information indicating that the ambient noise level of the electronic device is greater than or equal to the second preset value (“Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task's complexity,” and “if the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex.” Thus, noise information of the speech signal (ambient noise information) indicates that the noise level (ambient noise level) exceeds a preset threshold level (greater than or equal to the second preset value); Burke, ¶¶ [0041], [0052]).

Regarding claim 7, Burke further discloses further comprising a communicator configured to transmit to and receive data from a server (“Computer system 300 also includes a communication interface 318 {communicator},” where the “communication interface 308 provides two-way data communication {configured to transmit to and receive data from a server}”; Burke, ¶ [0070]), wherein the at least one processor is further configured to execute the one or more instructions to (The method is performed “by computer system 300 in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): obtain a first ASR result by performing the ASR of the speech or the utterance of the user of the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing. According to the distributed embodiment, each recognizer performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the mobile device 104 (electronic device) performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a first ASR result}; Burke, ¶ [0061], FIG. 2), control the communicator to transmit the audio signal of the speech or the utterance of the user of the electronic device to the server (“Computer system 300” controls the communication interface 318 {communicator} to “send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318,” such as “submitting recognition input {the audio signal of the speech or utterance of the user} to multiple recognition systems... [such as] server 108 {server}.” Also shown in FIG. 2 in the interactions, shown by way of arrows, between mobile device 104 and servers 108A and 108B.; Burke, ¶¶ [0072], [0074], FIG. 2), receive a second ASR result from the server (In the distributed embodiment the “back-end telecom server 108A... [also] receive the same speech for speech recognition processing...[where the back-end telecom server 108A] performs speech recognition processing based on the same speech input received and provides the results to the mobile device 104.” Thus, the back-end telecom server 108A {server} performs speech recognition processing based on the speech input received {performing the ASR of the speech or the utterance of the user of the electronic device} and provides the results to the mobile device 104 {obtaining a second ASR result}; Burke, ¶ [0061], FIG. 2), select an ASR result from among the first ASR result and the second ASR result (“After receiving each recognizer's results {the first ASR result and the second ASR result}, mobile device 104 combines the results based on a plural voting technique ... [where] Each word in the recognized result from each recognizer is compared and if at least two out of three recognizer results for a given word match, then that word is selected as the recognized word. If none of the recognizer results match, then the confidence score and weighting for each word recognized by a recognizer are combined to arrive at a comparison value.”; Burke, ¶ [0063]), and output the response to the speech or the utterance of the user of the electronic device (“After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result” where the “individual results” of the “multiple speech recognizers” and the “single recognized result” are output.; Burke, ¶ [0054], FIG. 2), based on the ASR result (The single recognized result is a combination of the individual results, as described above, based on the ASR performed at the mobile device 104 and the server 108A.; Burke, ¶ [0063]).
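For illustration only, the plural voting technique quoted from Burke, ¶ [0063], can be sketched as follows. This is a hedged sketch and not Burke's implementation. The quoted passage does not say how the confidence score and weighting are combined into a comparison value, so the product used here is an assumption, as are the function names `vote_word` and `combine_results` and the premise that each recognizer returns the same number of words aligned by position.

```python
from collections import Counter

# Each recognizer result is a list of (word, confidence, weight) tuples.
# Alignment by word position is an assumption made for this sketch.


def vote_word(candidates):
    """Pick one word from the candidates offered by the recognizers."""
    counts = Counter(word for word, _, _ in candidates)
    word, votes = counts.most_common(1)[0]
    # As quoted: if at least two out of three recognizer results for a
    # given word match, that word is selected as the recognized word.
    if votes >= 2:
        return word
    # As quoted: otherwise confidence score and weighting are combined into
    # a comparison value. ASSUMPTION: the combination is a simple product.
    return max(candidates, key=lambda c: c[1] * c[2])[0]


def combine_results(results):
    """Combine per-recognizer word lists into a single recognized result."""
    return [vote_word(list(position)) for position in zip(*results)]


if __name__ == "__main__":
    device = [("call", 0.9, 1.0), ("tom", 0.4, 1.0)]
    server_a = [("call", 0.8, 1.2), ("mom", 0.7, 1.2)]
    server_b = [("tall", 0.5, 1.1), ("bob", 0.6, 1.1)]
    print(combine_results([device, server_a, server_b]))
```

In the usage example, the first word is chosen by a two-out-of-three match, and the second word falls back to the assumed comparison value because no two recognizers agree.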

Regarding claim 11, Burke discloses An operation method of an electronic device, the operation method comprising (the method disclosed with reference to the “System 100 includ[ing] a mobile device 104”; Burke, ¶ [0035]): determining whether the electronic device or a server is to perform automated speech recognition (ASR) ("Allocation of speech recognition tasks..." which is the speech recognition tasks being allocated to one or more of the "multiple speech recognizers," where the multiple recognizers includes "the mobile device 104 and... server 108," can be "determined based on complexity which is measured using one or more of several metrics," where speech recognition tasks are automated speech recognition.; Burke, ¶ [0052]) of a speech or an utterance of a user of the electronic device, (The speech recognizer can be "mobile device 104," thus an electronic device, and the user produces "utterances to be recognized" to create a speech signal; Burke, ¶¶ [0051], [0052]) based on ambient noise information of the electronic device ("Allocation of speech recognition tasks is determined based on complexity... [where] background noise determines the complexity level," and "a noise detector is used on mobile device 104, which measures the noise level of the speech signal." Thus, based on background noise {ambient noise information} of the mobile device 104.; Burke, ¶ [0052]) obtained from an audio signal of the speech or the utterance of the user of the electronic device ("A noise detector is used on mobile device 104, which measures the noise level of the speech signal," thus the noise level (ambient noise information) is obtained from the speech signal (audio signal of the speech or utterance), and the user produces "utterances to be recognized" to create a speech signal at the mobile device 104 (electronic device).; Burke, ¶ [0052]) and whether a … keyword is recognized in the speech or the utterance of the user of the electronic device (complexity can further include "the vocabulary of words a user is expected to speak" where "low complexity means few alternative words and large complexity means many words," thus keywords which include many alternative words; Burke, ¶ [0052]) … performing the ASR of the speech or the utterance of the user of the electronic device ("If the speech signal is too noisy, i.e., the signal is determined to exceed a preset threshold level, then the signal is determined to be complex…" where "lightweight recognition tasks {not complex} can be performed on mobile device 104 while heavyweight recognition tasks {complex} are allocated to server 108."; Burke, ¶¶ [0052], [0041]) based on determining that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device ("Using a complexity-based allocation scheme, the speech recognition task is allocated to a speech recognizer based on the recognition task’s complexity." In the case of the speech signal being determined to not be noisy, thus not complex, the "recognition tasks can be performed on mobile device 104". Thus, the mobile device 104 (electronic device) determines that the mobile device 104 (electronic device) is to perform the recognition tasks (ASR) of the "speech signal" (speech or utterance) of the user of the mobile device 104 (electronic device).; Burke, ¶¶ [0041], [0052]), and outputting a response to the speech or the utterance of the user of the electronic device, ("After the distributed recognition tasks have been allocated and recognized by the individual recognition engines, e.g., mobile device 104, back-end telecom server 108A, and application server 108B, the individual results are combined to generate a single recognized result" where the "individual results" of the "multiple speech recognizers," which includes the mobile device 104, and the "single recognized result" are output.; Burke, ¶ [0054], FIG. 2) based on a result of performing the ASR of the speech or the utterance of the user of the electronic device, (The single recognized result is a combination of the individual results, including the result of the performance of the recognition tasks (ASR) of the "speech signal" (speech or utterance) of the user of the mobile device 104 (electronic device), as performed at the mobile device 104 (electronic device). Also shown in FIG. 2 as "return results to user"; Burke, ¶ [0052], FIG. 2) wherein the determining further comprises: estimating an accuracy of the ASR of the speech or the utterance of the user…[based on a confidence score] ("the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}"; Burke, ¶ [0059]);… based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value ("If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}..."; Burke, ¶ [0059]), determining that the server is to perform the ASR of the speech or the utterance of the user ("the recognition task is allocated to server 108 recognizer," which is an action by the system which is based on a determination of the same system to allocate the task to the server {determine that the server is to perform the ASR of the speech or the utterance of the user}.; Burke, ¶ [0059]); based on determining that the server is to perform the ASR of the speech or the utterance of the user (the system determines that "the recognition task is allocated to server 108 recognizer {determine that the server is to perform the ASR of the speech or the utterance of the user}."; Burke, ¶ [0059]), transmitting the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user ("the [speech] recognition task is allocated [by the system] to server 108 recognizer," where allocating the recognition task to the server inherently discloses transmission of the audio signal including the speech which is the subject of the recognition task to the server for performance of said recognition.; Burke, ¶ [0059]); and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶ [0059]). However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 11, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (As indicated above, Burke discloses the use of a confidence score in determining the accuracy of ASR. However, Burke does not expressly recite what is included in producing said confidence score. Braho discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information.; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of LeBeau is described above with relation to claim 1. Regarding claim 11, LeBeau teaches based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range (Discloses systems for “monitoring for voice input using a mobile computing device 172 a-d” where monitoring can be stopped based on the current context including “high level of ambient noise,” thus based on ambient noise information. Further, using the “high level of ambient noise,” “the mobile device 172 d can generally infer that it is located in a public area…[and] determine to not monitor for voice input.” High level indicates that the ambient noise level has a value above a preset range (where the preset range would be a range of ambient noise expected from a non-public area). Thus the system determines whether to monitor for input or not, based on the ambient noise level having a value in a preset range.; LeBeau, ¶¶ [0050], [0054]), extract a keyword included in the speech or the utterance of the user of the electronic device ("the mobile computing device 202 is configured to automatically determine when to start and when to stop monitoring for voice input based on a current context associated with the mobile computing device,” where “when at least the microphone 206 a and the speech analysis subsystem 212 are activated during an audio monitoring mode of operation and the speech analysis subsystem 212 detects voice input from a stream of audio data provided by the microphone 206...” the system “can determine... the presence of keywords...in the particular voice input.”; LeBeau, ¶¶ [0057], [0067]-[0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of LeBeau to include based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device. The use of current context in recognition of voice input allows for less intrusive voice monitoring without specific adherence to “the formalities associated with prompting a mobile computing device to use voice input,” as recognized by LeBeau. (LeBeau, ¶ [0021]). However, Burke, Braho, and LeBeau fail to expressly recite determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user… and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Wood is described above with relation to claim 1. Regarding claim 11, Wood teaches determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device ("In some embodiments, audio responsive electronic device 122 may send the preprocessed voice input to voice platform 192 at the third-party entity based on a detected trigger word and configuration information provided by a voice adaptor 196," where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... [or] on a server computer," and where digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶¶ [0085], [0063]) determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device ("user interface and command module 128" makes a determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 {determine that the electronic device is to perform...}" where the digital assistant 180 "may analyze the voice input to recognize... commands" {...to perform the ASR of the speech or the utterance...}, and where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... [or] on a server computer."; Wood, ¶¶ [0061], [0085]) based on the extracted keyword being the particular keyword (where the determination is "based on a recognized trigger word," and where the trigger word is extracted from the audio input to match a recognized trigger word.; Wood, ¶ [0061]), and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device (In the absence of a recognized trigger word, "the user interface and command module 128 may analyze the voice input to recognize trigger words and commands, using any well-known signal recognition techniques, procedures, technologies"; Wood, ¶ [0063]); based on determining that the server is to perform the ASR of the speech or the utterance of the user (Based on the determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 based on a recognized trigger word,"; Wood, ¶¶ [0061], [0063]), transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user (the command module 128 may then "transmit the audio input to digital assistant(s) 180 via transceiver 130" where the digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶ [0063]) and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“digital assistant 180 includes an automated speech recognizer (ASR) 1102, natural language unit (NLU) 1104, and a text-to-speech (TTS) unit 1106. In some other embodiments, voice platform 192 may include a common ASR 1102 for one or more digital assistants 180.”; Wood, ¶ [0090]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the systems and methods for automatic context monitoring for voice input of LeBeau, to incorporate the teachings of Wood to include determine whether the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device, determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being the particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; based on determining that the server is to perform the ASR of the speech or the utterance of the user, transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user. “Distributing the performance of speech recognition among a remote control device and a voice platform… [can] improve speech recognition and reduce power usage, network usage, memory usage, and processing time,” as recognized by Wood. (Wood, ¶ [0002]).

Regarding claim 14, the rejection of claim 11 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 17, the rejection of claim 11 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Regarding claim 20, Burke discloses An automated speech recognition (ASR) system comprising (“System 100 includes a mobile device 104” and a “server 108A”; Burke, ¶ [0035]): an electronic device configured to receive a speech or an utterance of a user of the electronic device (“computer system 300 upon which an embodiment of the invention may be implemented including server 108 and with some differences mobile device 104,” where “audio input [is] received by the mobile device,” wherein audio input is also referred to as a speech input and where speech input is “user-provided speech input.”; Burke, ¶¶ [0064], [0058], [0070]), and a server configured to perform ASR of the speech or the utterance of the user of the electronic device based on an audio signal of the speech or the utterance of the user of the electronic device received from the electronic device (“In a distributed embodiment according to the present invention, multiple recognizers, i.e., the mobile device 104, back-end telecom server 108A, and application server 108B, receive the same speech for speech recognition processing.” Thus, the server 108A performs speech recognition processing (ASR) of the same speech (speech or the utterance of the user of the electronic device); Burke, ¶ [0061]), wherein the electronic device comprises at least one processor configured to (The method is performed “by computer system 300”, such as the mobile device 104 (electronic device), “in response to processor 304 executing sequences of instructions contained in main memory 306.”; Burke, ¶ [0068]): estimate an accuracy of the ASR of the speech or the utterance of the user…[based on a confidence score] ("the embedded recognizer on mobile device 104 is executed first [to perform ASR of the speech signal] {ASR of the speech or the utterance of the user}. The accuracy of device 104 recognizer is then measured using an output confidence score {estimating the accuracy of the ASR}"; Burke, ¶ [0059]);… based on the accuracy of the ASR of the speech or the utterance of the user being less than a first preset value ("If the output confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is below a preset threshold {being less than a first preset value}..."; Burke, ¶ [0059]), determine that the server is to perform the ASR of the speech or the utterance of the user ("the recognition task is allocated to server 108 recognizer," which is an action by the system which is based on a determination of the same system to allocate the task to the server {determine that the server is to perform the ASR of the speech or the utterance of the user}.; Burke, ¶ [0059])… based on determining that the server is to perform the ASR of the speech or the utterance of the user (the system determines that "the recognition task is allocated to server 108 recognizer {determine that the server is to perform the ASR of the speech or the utterance of the user}."; Burke, ¶ [0059]), transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user ("the [speech] recognition task is allocated [by the system] to server 108 recognizer," where allocating the recognition task to the server inherently discloses transmission of the audio signal including the speech which is the subject of the recognition task to the server for performance of said recognition.; Burke, ¶ [0059]); and based on the accuracy of the ASR of the speech or the utterance of the user being greater than or equal to the first preset value, perform… [ASR] in the electronic device (Conversely, if the confidence score {based on the accuracy of the ASR of the speech or the utterance of the user} is not below the preset threshold {thus, being greater than or equal to the first preset value}, the recognition task is maintained at “the embedded recognizer on mobile device 104”; Burke, ¶ [0059]).
However, Burke fails to expressly recite wherein the confidence score is based on the ambient noise information of the electronic device; based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being a particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Braho is described above with relation to claim 1. Regarding claim 20, Braho teaches wherein the confidence score is based on the ambient noise information of the electronic device (As indicated above, Burke discloses the use of a confidence score in determining the accuracy of ASR. However, Burke does not expressly recite what is included in producing said confidence score. Braho discloses using “non-transient background noise and transient noise events... to adjust a threshold or confidence value or score”; thus, the confidence score is based on ambient noise information.; Braho, ¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke to incorporate the teachings of Braho to include wherein the confidence score is based on the ambient noise information of the electronic device. “It may be advantageous to know whether each frame of audio represents speech, non-transient background noise or transient noise events” as this may allow for the incorporation of “features [which] better match the models,” as recognized by Braho. (Braho, ¶¶ [0097]). However, Burke and Braho fail to expressly recite based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device; determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being a particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of LeBeau is described above with relation to claim 1. Regarding claim 20, LeBeau teaches based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range (Discloses systems for “monitoring for voice input using a mobile computing device 172 a-d” where monitoring can be stopped based on the current context including “high level of ambient noise,” thus based on ambient noise information. Further, using the “high level of ambient noise,” “the mobile device 172 d can generally infer that it is located in a public area…[and] determine to not monitor for voice input.” High level indicates that the ambient noise level has a value above a preset range (where the preset range would be a range of ambient noise expected from a non-public area). Thus the system determines whether to monitor for input or not, based on the ambient noise level having a value in a preset range.; LeBeau, ¶¶ [0050], [0054]), extract a keyword included in the speech or the utterance of the user of the electronic device ("the mobile computing device 202 is configured to automatically determine when to start and when to stop monitoring for voice input based on a current context associated with the mobile computing device,” where “when at least the microphone 206 a and the speech analysis subsystem 212 are activated during an audio monitoring mode of operation and the speech analysis subsystem 212 detects voice input from a stream of audio data provided by the microphone 206...” the system “can determine... the presence of keywords...in the particular voice input.”; LeBeau, ¶¶ [0057], [0067]-[0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke as modified by the sound analysis techniques of Braho, to incorporate the teachings of LeBeau to include based on the ambient noise information indicating that an ambient noise level of the electronic device has a value in a preset range, extract a keyword included in the speech or the utterance of the user of the electronic device. The use of current context in recognition of voice input allows for less intrusive voice monitoring without specific adherence to “the formalities associated with prompting a mobile computing device to use voice input,” as recognized by LeBeau. (LeBeau, ¶ [0021]). However, Burke, Braho, and LeBeau fail to expressly recite determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being a particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device.
The relevance of Wood is described above with relation to claim 1. Regarding claim 20, Wood teaches determine that the electronic device or a server is to perform automated speech recognition (ASR) of a speech or an utterance of a user of the electronic device, based on… whether a particular keyword is recognized in the speech or the utterance of the user of the electronic device ("In some embodiments, audio responsive electronic device 122 may send the preprocessed voice input to voice platform 192 at the third-party entity based on a detected trigger word and configuration information provided by a voice adaptor 196," where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... [or] on a server computer," and where digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶¶ [0085], [0063]), determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device ("user interface and command module 128" makes a determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 {determine that the electronic device is to perform...}" where the digital assistant 180 "may analyze the voice input to recognize... commands {...to perform the ASR of the speech or the utterance...}," and where the voice platform for the command module and/or the digital assistant can be "implemented in a cloud computing platform... [or] on a server computer."; Wood, ¶¶ [0061], [0085]) based on the extracted keyword being the particular keyword (where the determination is "based on a recognized trigger word," and where the trigger word is extracted from the audio input to match a recognized trigger word; Wood, ¶ [0061]), and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device (In the absence of a recognized trigger word, "the user interface and command module 128 may analyze the voice input to recognize trigger words and commands, using any well-known signal recognition techniques, procedures, technologies"; Wood, ¶ [0063]); based on determining that the server is to perform the ASR of the speech or the utterance of the user (Based on the determination to "transmit the audio input (e.g., voice input) to digital assistant(s) 180 based on a recognized trigger word,"; Wood, ¶¶ [0061], [0063]), transmit the audio signal of the speech or the utterance of the user to the server to perform the ASR of the speech or the utterance of the user (the command module 128 may then "transmit the audio input to digital assistant(s) 180 via transceiver 130" where the digital assistant 180 "may analyze the voice input to recognize trigger words and commands."; Wood, ¶ [0063]), and wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device (“digital assistant 180 includes an automated speech recognizer (ASR) 1102, natural language unit (NLU) 1104, and a text-to-speech (TTS) unit 1106. In some other embodiments, voice platform 192 may include a common ASR 1102 for one or more digital assistants 180.”; Wood, ¶ [0090]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, and by the systems and methods for automatic context monitoring for voice input of LeBeau, to incorporate the teachings of Wood to include determine that the electronic device is to perform the ASR of the speech or the utterance of the user of the electronic device based on the extracted keyword being a particular keyword, and otherwise determine that the server is to perform the ASR of the speech or the utterance of the user of the electronic device; [and] wherein ASR further comprises performing at least one of natural language understanding (NLU) or dialogue management (DM) in the electronic device. “Distributing the performance of speech recognition among a remote control device and a voice platform… [can] improve speech recognition and reduce power usage, network usage, memory usage, and processing time,” as recognized by Wood. (Wood, ¶ [0002]).
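For illustration only, and not as part of the grounds of rejection, the allocation logic recited in the limitations discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the noise range, the keyword set, the toy keyword extractor, and all function and field names are hypothetical and appear in neither the claims nor the cited references, and the behavior when the noise level falls outside the preset range is an assumption, since the limitations quoted above do not address that case.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; the Office Action specifies none.
PRESET_NOISE_RANGE_DB = (0.0, 60.0)
PARTICULAR_KEYWORDS = {"call", "timer"}


@dataclass
class Utterance:
    text_hint: str            # stand-in for what a keyword spotter would see
    ambient_noise_db: float   # ambient noise information of the device


def noise_in_preset_range(level_db: float) -> bool:
    """True when the ambient noise level has a value in the preset range."""
    low, high = PRESET_NOISE_RANGE_DB
    return low <= level_db <= high


def extract_keyword(utterance: Utterance) -> Optional[str]:
    """Toy extractor (assumption): treat the first word as the keyword."""
    words = utterance.text_hint.lower().split()
    return words[0] if words else None


def choose_asr_target(utterance: Utterance) -> str:
    """Return "device" or "server" as the party that performs ASR."""
    if not noise_in_preset_range(utterance.ambient_noise_db):
        # Assumption: outside the preset range, defer to the server.
        return "server"
    keyword = extract_keyword(utterance)
    # Device performs ASR only when the extracted keyword is a particular
    # keyword; otherwise the server performs ASR.
    return "device" if keyword in PARTICULAR_KEYWORDS else "server"
```

Under these assumptions, an utterance such as "call mom" at 40 dB would be handled on the device, while "weather today" at the same noise level would be sent to the server.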

Claims 10, 12, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, LeBeau, and Wood as applied to claims 1 and 11 above, and further in view of White.

Regarding claim 10, the rejection of claim 1 is incorporated. Burke, Braho, LeBeau, and Wood disclose all of the elements of the current invention as stated above. However, Burke, Braho, LeBeau, and Wood fail to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device.
White teaches systems and methods for using context in device arbitration. (White, ¶ [0010]). Regarding claim 10, White discloses wherein the at least one processor is further configured to execute the one or more instructions to (“The NLU component 128 (e.g., server) may include various components, including potentially dedicated processor(s), memory, storage, etc.” which “execute instructions stored on the computer-readable media”; White, ¶¶ [0104]-[0135]) determine the response by performing the at least one of natural language understanding (NLU) or dialogue management (DM) (“ASR results … may be sent to the speech processing system 110, for natural language understanding (NLU) processing, such as conversion of the text into commands for execution, either by the user device, by the speech processing system 110, or by another device” and “The computer-readable media 402 may further store a dialog management component 408 that is responsible for conducting speech dialogs with the user 104 in response to meanings or intents of user speech determined by the NLU component 128.”; White, ¶¶ [0103], [0094]) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device (“the NLU component 128 takes textual input (such as the textual input determined by the ASR component 126) and attempts to make a semantic interpretation of the text. That is, the NLU component 128 determines the meaning behind the text based on the individual words and then implements that meaning.”; White, ¶ [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the systems and methods for automatic context monitoring for voice input of LeBeau, and by the speech recognition distribution systems of Wood, to incorporate the teachings of White to include wherein the at least one processor is further configured to execute the one or more instructions to determine the response by performing at least one of natural language understanding (NLU) or dialogue management (DM) based on the result of performing the ASR of the speech or the utterance of the user of the electronic device. The device arbitration described in White may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by White. (White, ¶¶ [0029]-[0030]).

Regarding claim 12, the rejection of claim 11 is incorporated. Burke, Braho, LeBeau, and Wood disclose all of the elements of the current invention as stated above. However, Burke, Braho, LeBeau, and Wood fail to expressly recite further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device.
The relevance of White is described above with relation to claim 10. Regarding claim 12, White discloses further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device (“The acoustic front end (AFE) 416 transforms {converting} the audio data from the microphone {the speech or utterance of the user of the electronic device…} into data for processing by the speech recognition engine 418 {…into the audio signal of the speech or the utterance of the user of the electronic device}”; White, ¶ [0099]).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the systems and methods for automatic context monitoring for voice input of LeBeau, and by the speech recognition distribution systems of Wood, to incorporate the teachings of White to include further comprising converting the speech or the utterance of the user of the electronic device into the audio signal of the speech or the utterance of the user of the electronic device. The device arbitration described in White may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by White. (White, ¶¶ [0029]-[0030]).

Regarding claim 21, the rejection of claim 1 is incorporated. Burke, Braho, LeBeau, and Wood disclose all of the elements of the current invention as stated above. However, Burke, Braho, LeBeau, and Wood fail to expressly recite wherein the ASR is processed using an artificial intelligence (AI) algorithm.
The relevance of White is described above with relation to claim 10. Regarding claim 21, White discloses wherein the ASR is processed using an artificial intelligence (AI) algorithm ("The device or devices performing the ASR processing may include an acoustic front end (AFE) 416 and a speech recognition engine 418... [where] a number of approaches may be used by the AFE 416 to process the audio data, such as...neural network feature vector techniques {an artificial intelligence (AI) algorithm}."; White, ¶ [0099]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the systems and methods for automatic context monitoring for voice input of LeBeau, and by the speech recognition distribution systems of Wood, to incorporate the teachings of White to include wherein the ASR is processed using an artificial intelligence (AI) algorithm. The device arbitration described in White may determine the most appropriate device to both “‘listen’ for sound representing user speech in the environment” and “‘respond’ to the utterance,” thus accounting for the capability of the device to respond to a user utterance (ability to respond) in light of user expectations regarding appropriate timing and context, as recognized by White. (White, ¶¶ [0029]-[0030]).

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Burke, Braho, LeBeau, and Wood as applied to claims 7 and 17 above, and further in view of Endo.

Regarding claim 8, the rejection of claim 7 is incorporated. Burke, Braho, LeBeau, and Wood disclose all of the elements of the current invention as stated above. However, Burke, Braho, LeBeau, and Wood fail to expressly recite wherein the at least one processor is further configured to execute the one or more instructions to select the ASR result from among the first ASR result and the second ASR result, based on the ambient noise information of the electronic device.
Endo teaches a speech recognition system having multiple speech recognizers. (Endo, Col. 1, lines 16-17). Regarding claim 8, Endo discloses wherein the at least one processor is further configured to execute the one or more instructions to (Discloses a “speech recognition system 104 [including a] decision module 208... coupled to the speech recognizers 202, 204, 206…the decision module 208 includes ...a processor 304.”; Endo, Col. 7, lines 6-12) select the ASR result from among the first ASR result and the second ASR result (“Each speech recognizer 202, 204, 206 recognizes the input speech signal 120 output from the microphone 102 according to its own speech recognition mechanism (whether it is a grammar-based speech recognizer or a statistical speech recognizer) and outputs the recognized speech text 130 along with an associated raw confidence score 13... to the decision module 208.” where “The decision module 208 selects the speech text with the highest adjusted confidence score as the most accurate recognized speech text”; Endo, Col. 5, lines 44-51, Col. 6, lines 50-52), based on the ambient noise information of the electronic device (In some examples, “the decision module 208 adjusts the raw confidence scores to generate adjusted confidence scores associated with the recognized speech text, based upon ...the external data 109… [including] level of background noise.” Thus, the level of background noise {ambient noise information} is used to adjust the confidence score, where the confidence score is used to select the ASR result, as produced by speech recognizers 202, 204, and 206.; Endo, Col. 6, lines 35-43).  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and method for speech recognition task allocation of Burke, as modified by the sound analysis techniques of Braho, by the systems and methods for automatic context monitoring for voice input of LeBeau, and by the speech recognition distribution systems of Wood, to incorporate the teachings of Endo to include wherein the at least one processor is further configured to execute the one or more instructions to select the ASR result from among the first ASR result and the second ASR result, based on the ambient noise information of the electronic device. “Because the output speech text is selected from the outputs from a plurality of speech recognizers in the speech recognition system of the present invention, the speech recognition system can take advantage of the strengths, while complementing the weaknesses, of each speech recognizer,” as recognized by Endo. (Endo, Col. 3, lines 3-8).
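For illustration only, and not as part of the grounds of rejection, the selection step attributed to Endo above (adjusting each recognizer's raw confidence score using the level of background noise, then selecting the text with the highest adjusted score) can be sketched as follows. This is a minimal sketch under stated assumptions: the linear adjustment rule, the per-recognizer `noise_sensitivity` weight, and all names are hypothetical; Endo's actual adjustment mechanism is not reproduced in this Office Action.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsrResult:
    text: str
    raw_confidence: float     # raw confidence score output by a recognizer
    noise_sensitivity: float  # hypothetical weight; not taken from Endo


def adjusted_confidence(result: AsrResult, noise_level: float) -> float:
    """Assumed adjustment: penalize the raw score in proportion to noise."""
    return result.raw_confidence - result.noise_sensitivity * noise_level


def select_result(results: List[AsrResult], noise_level: float) -> AsrResult:
    """Select the result with the highest adjusted confidence score."""
    return max(results, key=lambda r: adjusted_confidence(r, noise_level))


first = AsrResult("turn on the light", raw_confidence=0.90, noise_sensitivity=0.5)
second = AsrResult("turn on the lights", raw_confidence=0.80, noise_sensitivity=0.1)

quiet_choice = select_result([first, second], noise_level=0.0)
noisy_choice = select_result([first, second], noise_level=0.5)
```

With these hypothetical numbers, the first result is selected in a quiet environment, while the second, less noise-sensitive result is selected once the noise level rises, which is the sense in which the selection is based on ambient noise information.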

Regarding claim 18, the rejection of claim 17 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.
                                                                                                                                                                                           
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kim et al. (U.S. Pat. App. Pub. No. 2016/0267913) discloses a device incorporating a wake-up keyword model which detects a wake-up keyword from a received speech signal of a user.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Sean E Serraguard/Patent Examiner, Art Unit 2657
/LAMONT M SPOONER/Primary Examiner, Art Unit 2657
9/9/2022