DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were submitted on 12/03/2020. These drawings have been reviewed and are accepted by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-3, 6, 9, 11, 15-17, 20, 23, and 25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shankar et al. (US 20190325862 A1).

Regarding claims 1 and 15, Shankar teaches:
“receiving, at data processing hardware of a voice-enabled device, an indication of a microphone trigger event indicating a possible user interaction with the voice-enabled device through speech, the voice-enabled device having a microphone that, when open, is configured to capture speech for recognition by an automated speech recognition (ASR) system” (par. 0025; ‘The input signal 205 to the ASR system 200 may be an analog signal or a time-varying digital value. In either case, the input signal 205 is typically derived from a microphone.’; par. 0027; ‘The VAD 210, when present, receives the input signal 205 and determines whether or not the input signal 205 contains, or is likely to contain, the sound of a human voice.’ ‘When speech activity is detected (or when the VAD 210 is not present), the elements of the ASR system 200 process the input signal 205 as described in the following paragraphs.’)
“in response to receiving the indication of the microphone trigger event: instructing, by the data processing hardware, the microphone to open or remain open for an open microphone duration window to capture an audio stream in an environment of the voice-enabled device” (The operation time periods of a few seconds to minutes read on the open microphone duration window, par. 0038; ‘It is understood that system 300 may also operate for shorter periods of time such as for a few seconds or minutes if desired.’; par. 0050; ‘Consequently, system 300 as shown in FIGS. 3-4 can be continuously receiving input 305…’); and
“providing, by the data processing hardware, the audio stream captured by the open microphone to the ASR system to perform ASR processing over the audio stream” (par. 0043; ‘The ASR system 300 includes the trigger path 350 to perform speech recognition processing of a current frame of the consecutive feature vectors received from the feature extractor 320 during the long period of time.’); and
“while the ASR system is performing the ASR processing over the audio stream captured by the open microphone: decaying, by the data processing hardware, a level of the ASR processing that the ASR system performs over the audio stream based on a function of the open microphone duration window” (par. 0055; ‘To realize the reduction in power consumption possible with this ASR system architecture, the processing device or devices that implement the detector 360 can be capable of transitioning between an active mode and a low (or zero) power quiescent mode. The processing device or devices that implement the detector may be placed in the quiescent mode except when performing speech recognition processing or reviewing the output signals of the neural network under control of the trigger neural path or signal. When a single processing device implements more than one function of the ASR system, the processing device may be in an active mode for a portion of each frame and a quiescent mode during another portion of each frame offset interval.’); and
“instructing, by the data processing hardware, the ASR system to use the decayed level of the ASR processing over the audio stream captured by the open microphone” (a low (or zero) power quiescent mode, par. 0055).
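For illustration only, the limitation of decaying a level of ASR processing "based on a function of the open microphone duration window" can be sketched as below. The claim does not recite a particular function; the linear schedule, the function name `decayed_level`, and the normalized level range of 0.0 to 1.0 are assumptions made for this sketch and are not taken from the application or from Shankar.

```python
def decayed_level(elapsed_s: float, window_s: float, full_level: float = 1.0) -> float:
    """Return the ASR processing level after elapsed_s seconds of an open
    microphone duration window lasting window_s seconds.

    Assumption: the level falls linearly from full_level to 0.0 across the window.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Fraction of the window still remaining, clamped to the range [0, 1].
    fraction_remaining = min(1.0, max(0.0, 1.0 - elapsed_s / window_s))
    return full_level * fraction_remaining


# Hypothetical 10-second window sampled at three points.
levels = [decayed_level(t, 10.0) for t in (0.0, 5.0, 12.0)]
```

Under this assumed schedule the level is full at the start of the window, half at the midpoint, and zero once the window has elapsed.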

Regarding claims 2 (dep. on claim 1) and 16 (dep. on claim 15), Shankar further teaches:
“further comprising, while the ASR system is performing the ASR processing over the audio stream captured by the open microphone: determining, by the data processing hardware, whether voice activity is detected in the audio stream captured by the open microphone, wherein decaying the level of the ASR processing the ASR system performs over the audio stream is further based on the determination of whether any voice activity is detected in the audio stream” (par. 0027; ‘When speech or speech activity is not detected, the other elements of the ASR system 200 may be held in a low power or quiescent state in which they do not process the input signal or perform speech recognition processing.’).

Regarding claims 3 (dep. on claim 1) and 17 (dep. on claim 15), Shankar further teaches:
“the ASR system initially uses a first processing level to perform the ASR processing over the audio stream upon commencement of the open microphone duration window, the first processing level associated with full processing capabilities of the ASR system” (par. 0027; ‘When speech activity is detected (or when the VAD 210 is not present), the elements of the ASR system 200 process the input signal 205 as described in the following paragraphs.’),
“decaying the level of the ASR processing the ASR system performs over the audio stream based on the function of the open microphone duration window comprises: determining whether a first interval of time has elapsed since commencing the open microphone duration window” (par. 0044; ‘The trigger path 350 performs trigger processing of the content of the consecutive feature vectors of vectors 325. The trigger path 350 is configured to determine when during the long period of time, the detector 360 should perform the speech recognition processing on the output signals S.sub.1-S.sub.X to attempt to recognize a word by triggering when the detector 360 reviews the output signals S.sub.1-S.sub.X to recognize the word and output that word as output vector 345.’ ‘At other times, the detector 360 remains in a low power quiescent state. The detector 360 does not perform speech recognition processing when it is in the quiescent state.’); and
“when the first interval of time has elapsed, decaying the level of the ASR processing the ASR system performs over the audio stream by reducing the level of the ASR processing from the first processing level to a second processing level, the second processing level less than the first processing level” (par. 0044; ‘At other times, the detector 360 remains in a low power quiescent state. The detector 360 does not perform speech recognition processing when it is in the quiescent state.’).
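For illustration only, the interval-based reduction recited in claims 3 and 17 can be sketched as a step function of the time elapsed since the open microphone duration window commenced. The names `FIRST_LEVEL`, `SECOND_LEVEL`, and `processing_level`, and the integer encoding of the levels, are assumptions made for this sketch.

```python
FIRST_LEVEL = 2   # assumed encoding of the full processing capabilities
SECOND_LEVEL = 1  # assumed encoding of a reduced level, less than FIRST_LEVEL


def processing_level(elapsed_s: float, first_interval_s: float) -> int:
    """Return the processing level in effect elapsed_s seconds after the
    open microphone duration window commenced."""
    # Before the first interval has elapsed, the first (full) level applies.
    if elapsed_s < first_interval_s:
        return FIRST_LEVEL
    # Once the first interval has elapsed, the level is reduced.
    return SECOND_LEVEL
```

The sketch covers a single interval; additional intervals and lower levels could be chained in the same manner.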

Regarding claims 6 (dep. on claim 1) and 20 (dep. on claim 15), Shankar further teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to reduce a number of ASR processing steps performed over the audio stream” (par. 0044; ‘At other times, the detector 360 remains in a low power quiescent state. The detector 360 does not perform speech recognition processing when it is in the quiescent state.’).

Regarding claims 9 (dep. on claim 1) and 23 (dep. on claim 15), Shankar further teaches:
“comprising obtaining, by the data processing hardware, a current context when the indication of the microphone trigger event is received, wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to bias speech recognition results based on the current context” (par. 0026; ‘For example, when multiple words are considered present, a particular word may be selected based on the context established by previously detected words.’).

Regarding claims 11 (dep. on claim 1) and 25 (dep. on claim 15), Shankar further teaches:
“wherein, while the ASR system is using [t]he decayed level of the ASR processing over the audio stream captured by the open microphone” (par. 0055; ‘To realize the reduction in power consumption possible with this ASR system architecture, the processing device or devices that implement the detector 360 can be capable of transitioning between an active mode and a low (or zero) power quiescent mode.’);
“the ASR system is configured to: generate a speech recognition result for audio data corresponding to a query spoken by the user; and provide the speech recognition result to an application to perform an action specified by the query” (Spoken commands, upon recognition, perform actions specified by content of the commands, par. 0023; ‘Additionally, the ASR system will preferably recognize words within its vocabulary in a continuous audio stream without having to be turned on or prompted by a user just before a user speaks a word or command. Such an ASR system may be applied, for example, in game controllers, remote controls for entertainment systems, and other portable devices with limited battery capacity.’).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 4-5 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shankar in view of Jagatheesan et al. (US 20150025890 A1).

Regarding claims 4 (dep. on claim 1) and 18 (dep. on claim 15), Shankar teaches a decayed level of the ASR processing. 
However, Shankar does not expressly teach switching from remote server to voice-enabled device, as in “wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to switch from performing the ASR processing on a remote server in communication with the voice-enabled device to performing the ASR processing on the data processing hardware of the voice-enabled device.”
Jagatheesan teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to switch from performing the ASR processing on a remote server in communication with the voice-enabled device to performing the ASR processing on the data processing hardware of the voice-enabled device” (par. 0071; ‘In one example, PLM and PAM represent selective processing of high-frequency or most-expected voice commands of a speaker at a particular point of time. In one embodiment, the PLM and PAM are transferred digitally from ASRs in a higher hierarchy level (such as a cloud) to lower hierarchy levels so that it enables cached processing of the frequent hot words, with less processing on the lower hierarchy level devices. This feature provides the lower hierarchy level ASR to process the high-frequency or "hot words or commands" locally with less processing power and better accuracy (or lower error rate), since there are less words and acoustics to be processed.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shankar’s method of switching from active mode to low power mode by incorporating Jagatheesan’s method of selective processing in order to switch from cloud processing to local processing. The combination would provide ASR processing that requires less processing power and achieves better accuracy (Jagatheesan: par. 0071).

Regarding claims 5 (dep. on claim 1) and 19 (dep. on claim 15), the combination of Shankar in view of Jagatheesan further teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to switch from using a first ASR model to a second ASR model for performing the ASR processing over the audio stream, the second ASR model comprising fewer parameters than the first ASR model” (Jagatheesan: par. 0070; ‘Personalized Language Model (PLM) and Personalized Acoustic Model (PAM) that are "selectively processed" and sent from higher hierarchies to be stored in lower hierarchies enabling almost a "cached-processing" of speech in lower hierarchies. In the example HSR ecosystem 500, the higher level ASR in the hierarchy (e.g., the cloud), may generate the PLM and PAM using data it has acquired about the speaker at multiple times. For example, if it finds the speaker is most likely to call certain friends on a Friday evening, or turn on a TV once at home, those relevant speech commands and the acoustics associated with only that environment (e.g., car or home) may be used in a PAM or PLM. A PLM is a smaller, geolocation/time-specific and rather than covering all the words and its variations in a language, uses only the words and grammar that a speaker is expected to use.’).

Claim(s) 7-8, 12-14, 21-22, and 26-28 are rejected under 35 U.S.C. 103 as being unpatentable over Shankar in view of Thomson et al. (US 20200175961 A1).

Regarding claims 7 (dep. on claim 1) and 21 (dep. on claim 15), Shankar does not expressly teach:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to adjust beam search parameters to reduce a decoding search space of the ASR system.”
Thomson teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to adjust beam search parameters to reduce a decoding search space of the ASR system” (par. 0336; ‘ASR2 may perform a beam search using a narrower beam, relative to the beam width of ASR1.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shankar’s method of switching from active mode to low power mode by incorporating Thomson’s different configurations of ASR systems in order to eliminate alignment paths or nodes where a performance criterion falls below a selected threshold.
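For illustration only, the effect of a narrower beam on the decoding search space can be sketched with a generic beam search. This is a textbook-style sketch and not the decoder of Thomson or of the application; the function name `beam_search`, the toy per-step log-probabilities, and the count of expanded hypotheses used as a proxy for search-space size are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def beam_search(
    step_log_probs: Sequence[Sequence[float]], beam_width: int
) -> Tuple[List[int], float, int]:
    """Return (best token sequence, its log-probability, hypotheses expanded)."""
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    expanded = 0
    for log_probs in step_log_probs:
        candidates = []
        for tokens, score in beams:
            for token, log_prob in enumerate(log_probs):
                candidates.append((tokens + [token], score + log_prob))
                expanded += 1
        # Keep only the beam_width highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_tokens, best_score = beams[0]
    return best_tokens, best_score, expanded


# Toy input: three decoding steps over a three-token vocabulary (assumed values).
toy = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.5), math.log(0.3)],
    [math.log(0.1), math.log(0.2), math.log(0.7)],
]
narrow = beam_search(toy, beam_width=1)
wide = beam_search(toy, beam_width=3)
```

On this toy input both beam widths return the same best sequence, but the narrow beam expands fewer hypotheses than the wide beam, which is the sense in which a narrower beam reduces the decoding search space.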

Regarding claims 8 (dep. on claim 1) and 22 (dep. on claim 15), the combination of Shankar in view of Thomson further teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to perform quantization and/or sparsification on one or more parameters of the ASR system” (Thomson: par. 0108; ‘In some embodiments, after extracting the speech recognition features, the features may be quantized or otherwise compressed.’).

Regarding claims 12 (dep. on claim 1) and 26 (dep. on claim 15), the combination of Shankar in view of Thomson further teaches:
“after instructing the ASR system to use the decayed level of the ASR processing over the audio stream: receiving, at the data processing hardware, an indication that a confidence for a speech recognition result for a voice query output by the ASR system fails to satisfy a confidence threshold” (Thomson: par. 1583; ‘If the ASR confidence or another objective metric related to accuracy is low, then the system may: [1584] a. Transfer the communication session to a revoiced ASR system.’), and
“instructing, by the data processing hardware, the ASR system to: increase the level of ASR processing from the decayed level” (Thomson: par. 1583; ‘If the ASR confidence or another objective metric related to accuracy is low, then the system may: [1584] a. Transfer the communication session to a revoiced ASR system.’) and
“reprocess the voice query using the increased level of ASR processing” (Thomson: par. 1583; ‘If the ASR confidence or another objective metric related to accuracy is low, then the system may: [1584] a. Transfer the communication session to a revoiced ASR system.’).
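For illustration only, the sequence recited in claims 12 and 26 (a confidence that fails a threshold, followed by an increased processing level and reprocessing of the voice query) can be sketched as below. The sketch illustrates the claim language and not the revoiced ASR transfer of Thomson; the function `recognize_with_fallback`, the `recognize` callback signature, and the stand-in recognizer `fake_recognize` are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical recognizer signature: (audio, level) -> (text, confidence).
Recognizer = Callable[[bytes, int], Tuple[str, float]]


def recognize_with_fallback(
    audio: bytes,
    recognize: Recognizer,
    decayed_level: int,
    increased_level: int,
    confidence_threshold: float,
) -> Tuple[str, float, int]:
    """Return (text, confidence, level used) for one voice query."""
    text, confidence = recognize(audio, decayed_level)
    if confidence >= confidence_threshold:
        return text, confidence, decayed_level
    # Confidence fails the threshold: raise the level and reprocess the query.
    text, confidence = recognize(audio, increased_level)
    return text, confidence, increased_level


def fake_recognize(audio: bytes, level: int) -> Tuple[str, float]:
    """Stand-in recognizer with assumed outputs, for demonstration only."""
    return ("turn off", 0.9) if level >= 2 else ("turn of", 0.4)


result = recognize_with_fallback(b"\x00", fake_recognize, 1, 2, 0.7)
```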

Regarding claims 13 (dep. on claim 1) and 27 (dep. on claim 15), the combination of Shankar in view of Thomson further teaches:
“while the ASR system is performing the ASR processing over the audio stream captured by the open microphone: determining, by the data processing hardware, when the decayed level of the ASR processing the ASR performs over the audio stream based on the function of the open microphone duration is equal to zero” (Shankar: par. 0055; ‘To realize the reduction in power consumption possible with this ASR system architecture, the processing device or devices that implement the detector 360 can be capable of transitioning between an active mode and a low (or zero) power quiescent mode.’); and
“when the decayed level of the ASR processing is equal to zero, instructing, by the data processing hardware, the microphone to close” (Thomson: par. 0706; muting the microphone).

Regarding claims 14 (dep. on claim 1) and 28 (dep. on claim 15), the combination of Shankar in view of Thomson further teaches:
“displaying, by the data processing hardware, in a graphical user interface of the voice-enabled device, a graphical indicator indicating the decayed level of ASR processing performed by the ASR system on the audio stream” (Thomson: Table 11; ‘Provide a means, such as via a GUI, for an operator to view the indicators and select a business objective. For example, a GUI may display a chart, such as a table or an ROC curve, showing overall accuracy vs. automation rate and allow the operator to select an automation rate.’).

Claim(s) 10 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Shankar in view of Lovitt (US 20180025731 A1).

Regarding claims 10 (dep. on claim 1) and 24 (dep. on claim 15), Shankar does not expressly teach switching from SOC-based processing to DSP-based processing, as in “wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to switch from system on a chip-based (SOC-based) processing to perform the ASR processing on the audio stream to digital signal processor-based (DSP-based) processing to perform the ASR processing on the audio stream.”
Lovitt teaches:
“wherein instructing the ASR system to use the decayed level of the ASR processing comprises instructing the ASR system to switch from system on a chip-based (SOC-based) processing to perform the ASR processing on the audio stream to digital signal processor-based (DSP-based) processing to perform the ASR processing on the audio stream” (par. 0009; ‘In some configurations, the specialized recognition engines, the policy engine, and the arbitrator can execute on a digital signal processor (“DSP”) while the listeners execute on a system on a chip (“SoC”).’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shankar’s method of switching from active mode to low power mode by incorporating Lovitt’s use of DSP and SoC in order to switch from SoC-based processing to DSP-based processing. The combination can reduce the power required by a computing device to recognize particular words, phrases, or other types of acoustic objects, particularly when operating in a low power state, as compared to previous speech recognition technologies. (Lovitt: par. 0004)

Conclusion
Other pertinent prior art references that teach similar features are listed in the PTO-892 for consideration.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571) 270-3191. The examiner can normally be reached 10 am - 6 pm EST Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/Examiner, Art Unit 2658