Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1.	This action is responsive to Application no. 16/953,510 filed 11/20/2020.  All claims have been examined and are currently pending.
Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
3.	Claim 16 recites the limitation “the first stage hotword detector”.  There is insufficient antecedent basis for this limitation. Appropriate correction is required.
Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-7, 10-13, 15-22, 25-28, 30 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein et al (2018/0233150) in view of Wu et al (10,872,599).

Regarding claim 1 Gruenstein teaches A method (abstract: methods, systems, and apparatus; fig 4; para: 70; 78-81) comprising:   
receiving, at data processing hardware, audio data characterizing a hotword event detected by a first stage hotword detector in streaming audio captured by a user device (abstract: detecting hotwords; receiving audio signal encoding one or more utterances; fig 1 102; para 2-3: client device; 24: client hotword detection);
processing, by the data processing hardware, using a second stage hotword detector, the audio data to determine whether a hotword is detected by the second stage hotword detector in a first segment of the audio data (abstract: sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase; fig 1: server hotword detection module 114; para: 30);
when the hotword is not detected by the second stage hotword detector in the first segment of the audio data (para 38-39: In response to determining that the audio signal likely does not encode the key phrase, the speech recognition system 112 may send a message to the client device 102 indicating that the audio signal does not likely encode the key phrase),
but does not specifically teach the classifying and updating limitations below.
Wu et al (10,872,599) teaches A method comprising: 
receiving, at data processing hardware, audio data characterizing a hotword event detected by a first stage hotword detector in streaming audio captured by a user device (fig 1 audio 11, device 110; fig 2A wakeword detection component; col 2 l. 21-23: device monitor audio to detect a wakeword); and 
when the hotword is not detected by the [second stage] hotword detector in the first segment of the audio data: 
classifying, by the data processing hardware, the first segment of the audio data as containing a negative hotword that caused a false detection of the hotword event in the streaming audio by the first stage hotword detector (abstract: A device monitors audio data for a predetermined and/or user-defined wakeword. The device detects an error in detecting the wakeword in the audio data, such as a false-positive detection of the wakeword or a false-negative detection of the wakeword. Upon detecting the error, the device updates a model trained to detect the wakeword to create an updated trained model; the updated trained model reduces or eliminates further errors in detecting the wakeword; fig 2A; fig 4; col 5 l. 16-19 determine occurrence of a false-positive detection or false-negative detection of the wakeword; 
col 7 l. 40-45; col 8 l. 24-31, l. 43-45; col 9 l. 35-43: false positive detection of a wakeword; the user 5 (or other source of sound or speech) issues an utterance that does not include the wakeword, but the corresponding score 904 is determined to be greater than the threshold 902. The trained model is updated using the audio data such that, when the user 5 or other source again issues the utterance, the score 906 is less than the threshold 902.); and 
based on the first segment of the audio data classified as containing the negative hotword, updating, by the data processing hardware, the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that contains the negative hotword (abstract: Upon detecting the error, the device updates a model trained to detect the wakeword to create an updated trained model; the updated trained model reduces or eliminates further errors in detecting the wakeword; 
col 9 l. 35-43: the user 5 (or other source of sound or speech) issues an utterance that does not include the wakeword, but the corresponding score 904 is determined to be greater than the threshold 902. The trained model is updated using the audio data such that, when the user 5 or other source again issues the utterance, the score 906 is less than the threshold 902 –
Where Wu teaches different options for determining a false-negative or false-positive detection and can perform training, where the training includes methods for eliminating the improper detections, including at least adding/classifying the term that caused the false positive so that it is identified as a non-wakeword, col 4 l. 42).  
	It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Wu for improved false hotword detection and training.
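For illustration only, the combined flow discussed above for claim 1 (a first stage detection, a second stage check, and an update of the first stage on a false detection) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of how Gruenstein, Wu, or the application actually implement these steps: audio segments are stood in for by plain strings, the class and function names are hypothetical, and the "update" is modeled as a simple block list rather than model retraining.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FirstStageDetector:
    """Hypothetical first stage detector: scores a segment against a threshold."""
    score_fn: Callable[[str], float]
    threshold: float = 0.5
    negatives: List[str] = field(default_factory=list)

    def detects(self, segment: str) -> bool:
        # Segments previously classified as negative hotwords no longer trigger.
        if segment in self.negatives:
            return False
        return self.score_fn(segment) >= self.threshold

    def update_with_negative(self, segment: str) -> None:
        # Assumption: the update is a stored block list, not retraining.
        if segment not in self.negatives:
            self.negatives.append(segment)


def handle_hotword_event(
    segment: str,
    first_stage: FirstStageDetector,
    second_stage_detects: Callable[[str], bool],
) -> str:
    """Process a segment that the first stage has already flagged."""
    if second_stage_detects(segment):
        return "wake"
    # Second stage disagrees: treat the segment as a negative hotword
    # and update the first stage so it does not trigger on it again.
    first_stage.update_with_negative(segment)
    return "suppressed"
```

In this sketch a segment that passes the first stage but fails the second stage is suppressed once and then no longer triggers the first stage, which is the behavior the combination is relied upon to show.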


Regarding claim 2 Gruenstein teaches The method of claim 1, further comprising, when the hotword is not detected by the second stage hotword detector in the first segment of the audio data (Gruenstein abstract; 38-39): 
suppressing, by the data processing hardware, a wake-up process on the user device for processing the hotword and/or one or more other terms following the hotword in the streaming audio (33 determine whether client device should wake up, perform an action; 38-39 – if wakeword not detected by second detector not waking up device); and 
Gruenstein does not specifically teach, but Wu teaches:
determining, by the data processing hardware, whether an immediate follow-up query was provided by a user of the user device after suppressing the wake-up process on the user device (col 7 l. 40-44), 
wherein classifying the first segment of the audio data as containing the negative hotword is further based on determining that no follow-up query was provided by the user of the user device after suppressing the wake-up process (col 7 l. 40 – 56: The server 120, using any of the speech-recognition techniques described herein, determines that the first audio data does not include a command, request, or other such user input corresponding to an intending speaking of the wakeword—i.e., the determination of the score corresponds to a false-positive detection of the wakeword.  The server 120 accordingly sends an indication of the false-positive detection to the device 110, which receives (412) the indication. In other embodiments, the device 110 instead or in addition determines that the first audio data does not include the command, request, or other such user input. Based on this determination and/or on receiving the indication, the device 110 generates (414) an updated trained model using the first audio data by, for example, back-propagating error data created from the difference between a candidate representation of the wakeword in the audio data and a stored representation of the wakeword.; col 9 l. 35-43).  
Rejected for similar rationale and reasoning as claim 1 above

Regarding claim 3 Gruenstein teaches The method of claim 1, further comprising, when the hotword is detected by the second stage hotword detector in the first segment of the audio data (38; 40): 
Gruenstein does not specifically teach, but Wu teaches:
processing, by the data processing hardware, a second segment of the audio data that follows the first segment of the audio data to determine whether the second segment of the audio data is indicative of a spoken query-type utterance (col 7 l. 40 – 56: The server 120, using any of the speech-recognition techniques described herein, determines that the first audio data does not include a command, request, or other such user input corresponding to an intending speaking of the wakeword); and 
when the second audio segment of the audio data is not indicative of the spoken query-type utterance: 
classifying, by the data processing hardware, the first segment of the audio data as containing the negative hotword (col 7 l. 40-56); and 
based on the first segment of the audio data classified as containing the negative hotword, updating, by the data processing hardware, the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that includes the negative hotword (abstract; col 7 l. 40-56; col 9 l. 35-43: trained model is updated using the audio data such that, when the user 5 or other source again issues the utterance, the score 906 is less than the threshold 902).  
Rejected for similar rationale and reasoning as claim 1 above


Regarding claim 4 Gruenstein does not specifically teach where Wu teaches The method of claim 3, further comprising, when the second audio segment of the audio data is not indicative of the spoken query-type utterance: 
determining, by the data processing hardware, whether an immediate follow-up query was provided by a user of the user device (col 7 l. 40-56; col 8 l. 1-34, 24-31: if a user utters a wakeword but the device 110 does not wake, the user is likely to repeat the wakeword soon after (e.g., one, two, or five seconds after). The device 110 may detect the repeated wakeword because the user speaks it more loudly, more clearly, and/or with less background noise. Based on the time difference, the device 110 determines that the first score corresponds to a false-negative detection of the keyword.), 
wherein classifying the first segment of the audio data as containing the negative hotword is further based on determining that no follow-up query was provided by the user of the user device (col 7 l. 40-56; col 8 l. 1-34; col 9 l. 35-43).  
Rejected for similar rationale and reasoning as claim 1 above


Regarding claim 5 Gruenstein does not specifically teach where Wu teaches The method of claim 3, further comprising, when the second audio segment of the audio data is indicative of the spoken query-type utterance: 
receiving, at the data processing hardware, a negative interaction result indicating that a user of the user device negatively interacted with results for the spoken query-type utterance provided to the user device (col 8 l. 35-58, 42-44: The device 110 receives (610), however, user input indicating a false negative detection of the wakeword or user input indicating a false positive detection of the wakeword.); 
classifying, by the data processing hardware, based on the received negative interaction result, the first segment of the audio data as containing the negative hotword (abstract; col 9 l. 35-43); and 
based on the first segment of the audio data classified as containing the negative hotword, updating, by the data processing hardware, the first stage hotword detector to prevent detecting the hotword event in subsequent audio data that contains the negative hotword (abstract: The device detects an error in detecting the wakeword in the audio data, such as a false-positive detection of the wakeword or a false-negative detection of the wakeword. Upon detecting the error, the device updates a model trained to detect the wakeword to create an updated trained model; the updated trained model reduces or eliminates further errors in detecting the wakeword; col 9 l. 35-43).  
Rejected for similar rationale and reasoning as claim 1 above


Regarding claim 6 Gruenstein does not specifically teach where Wu teaches The method of claim 1, further comprising, after receiving the audio data characterizing the hotword event detected by the first stage hotword detector: 
receiving, at the data processing hardware, a negative user interaction indicating user suppression of a wake-up process on the user device (col 8 l. 35-58, 42-44: The device 110 receives (610), however, user input indicating a false negative detection of the wakeword or user input indicating a false positive detection of the wakeword.), 
wherein classifying the first segment of the audio data as containing the negative hotword is further based on the negative user interaction indicating user suppression of the wake-up process (abstract; col 8 l. 35-58; col 9 l. 35-43).  
Rejected for similar rationale and reasoning as claim 1 above


Regarding claim 7 Gruenstein does not specifically teach where Wu teaches The method of claim 1, wherein updating the first stage hotword detector to prevent triggering the hotword event in subsequent audio data comprises providing the first segment of the audio data classified as containing the negative hotword to the user device, the user device configured to retrain the first stage hotword detector using the first segment of audio data classified as containing the negative hotword (abstract; fig 1; col 9 l. 35-43).  
Rejected for similar rationale and reasoning as claim 1 above


Regarding claim 10 Gruenstein does not specifically teach where Wu teaches The method of claim 1, wherein: 
updating the first stage hotword detector to prevent triggering the hotword event in subsequent audio data comprises providing the first segment of the audio data classified as containing the negative hotword to the user device (abstract: device updates a model; fig 1; col 9 l. 35-43), the user device configured to: 
obtain an embedding representation of the first segment of the audio data (fig 4 ; col 3 l. 32-35 – representation of audio); and 
store, in memory hardware of the user device, the embedding representation of the first segment of the audio data (abstract; col 9 l. 35-43;  col 2 l. 21-36; col 3 l. 64 – col 4 l. 3; col 4 l. 34-46 – storing representation of negative hotword); and 
the user device is configured to determine when subsequent audio data characterizing the hotword event detected by the first stage hotword detector includes the negative hotword (col 9 l. 35-43: when the user…again issues the utterance) by: 
computing an evaluation embedding representation for the subsequent audio data (fig 4; col 3 l. 32-35 – representation of audio); 
determining a similarity score between the embedding representation of the first segment of the audio data classified as the negative hotword and the evaluation embedding representation for the subsequent audio data (col 2 l. 27-34; col 4 l. 34-36; col 9 l. 35-43 – compare received audio representation to stored audio representation to determine likelihood/match score); and 
when the similarity score satisfies a similarity score threshold, determining that the subsequent audio data includes the negative hotword (col 2 l. 27-34: probability; score; l. 37: if the score is greater than the threshold; col 4 l. 34-46: wakeword detection component may compare audio data to stored models or data to detect a wakeword; decode the audio signals; wakeword spotting…non-wakeword; non-wakeword speech – recognizing wakewords and non-wakewords by comparing to stored representations; where when a false positive is detected an utterance can be classified as a non-wakeword and stored accordingly).  
Rejected for similar rationale and reasoning as claim 1
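For illustration only, the comparison recited in claim 10 (a stored embedding of a negative hotword compared against an evaluation embedding, with a similarity score threshold) can be sketched as follows. This is a sketch under stated assumptions: cosine similarity, the 0.9 threshold, and all function names are hypothetical choices made for the example, and neither the claim language nor the cited portions of Wu are asserted to specify them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contains_negative_hotword(
    evaluation_embedding: Sequence[float],
    stored_negative_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value, for illustration only
) -> bool:
    """True when the evaluation embedding is close enough to any stored negative."""
    return any(
        cosine_similarity(evaluation_embedding, stored) >= threshold
        for stored in stored_negative_embeddings
    )
```

Under these assumptions, subsequent audio whose embedding lies close to a stored negative embedding would be treated as containing the negative hotword, and the hotword event would not be triggered.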

Regarding claim 11 Gruenstein teaches The method of claim 1, wherein: 
the data processing hardware resides on a server in communication with the data processing hardware (abstract; fig 1); and 
the first stage hotword detector executes on a processor of the user device (abstract; fig 1).  

Regarding claim 12 Gruenstein teaches The method of claim 11, wherein processing the audio data to determine whether the hotword is detected by the second stage hotword detector in the first segment of the audio data comprises performing automated speech recognition to determine whether the hotword is recognized in the first segment of the audio data (fig 1: speech recognition; 30-32).  

Regarding claim 13 Gruenstein teaches The method of claim 1, wherein the data processing resides on the user device (fig 1).  


Regarding claim 15 Gruenstein teaches The method of claim 1, wherein the first stage hotword detector is configured to: 
generate a probability score indicating a presence of the hotword in audio features of the streaming audio captured by the user device (abstract; fig 1); and 
detect the hotword event in the streaming audio when the probability score satisfies a hotword detection threshold of the first stage hotword detector (abstract; fig 1).  


Regarding claim 16 Gruenstein and Wu teach A system comprising: 
data processing hardware; and 
memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: 
processing, using a second stage hotword detector, the audio data to determine whether a hotword is detected by the second stage hotword detector in a first segment of the audio data; and 
when the hotword is not detected by the second stage hotword detector in the first segment of the audio data: 
classifying, by the data processing hardware, the first segment of the audio data as containing a negative hotword that caused a false detection of the hotword event in the streaming audio by the first stage hotword detector; and 
based on the first segment of the audio data classified as containing the negative hotword, updating, by the data processing hardware, the first stage hotword detector to prevent triggering the hotword event in subsequent audio data that contains the negative hotword.  
Claim recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Claim 17 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.
Claim 18 recites limitations similar to claim 3 and is rejected for similar rationale and reasoning.
Claim 19 recites limitations similar to claim 4 and is rejected for similar rationale and reasoning.
Claim 20 recites limitations similar to claim 5 and is rejected for similar rationale and reasoning.
Claim 21 recites limitations similar to claim 6 and is rejected for similar rationale and reasoning.
Claim 22 recites limitations similar to claim 7 and is rejected for similar rationale and reasoning.

Claim 25 recites limitations similar to claim 10 and is rejected for similar rationale and reasoning.
Claim 26 recites limitations similar to claim 11 and is rejected for similar rationale and reasoning.
Claim 27 recites limitations similar to claim 12 and is rejected for similar rationale and reasoning.
Claim 28 recites limitations similar to claim 13 and is rejected for similar rationale and reasoning.

Claim 30 recites limitations similar to claim 15 and is rejected for similar rationale and reasoning.



7.	Claims 8-9, 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein et al (2018/0233150) in view of Wu et al (10,872,599) in further view of Weng et al (2012/0271631).

Regarding claim 8 Gruenstein does not specifically teach where Wu teaches The method of claim 7, wherein the user device is configured to retrain the first stage hotword detector by: 
storing, in memory hardware of the user device, each instance of the first segment of the audio data classified as containing the negative hotword in memory hardware of the user device (abstract; fig 1; col 4 l 42-43; col 9 l. 35-43); and 
retraining the first stage hotword detector based on [an aggregation of the number of instances of] the first segment of the audio data classified as containing the negative hotword stored in the memory hardware (abstract; fig 1; col 9 l. 35-43);
But does not specifically teach 
retraining the first stage hotword detector based on an aggregation of the number of instances of the first segment of the audio data classified as containing the negative hotword stored in the memory hardware.  
Weng et al (2012/0271631) teaches updating/training speech models based on a frequency of received utterances (abstract: generating models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a model using the high-frequency plurality of utterances as training data; para 23; 32).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Weng (with Wu) to allow for improved false hotword detection and training based on frequently used improper word(s).
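For illustration only, the frequency-based selection relied upon from Weng (counting instances and keeping those whose count exceeds a threshold before training) can be sketched as follows. This is a sketch under stated assumptions: stored negative-hotword instances are stood in for by strings, and the function name and the `min_count` parameter are hypothetical and do not appear in any cited reference.

```python
from collections import Counter
from typing import Iterable, List


def high_frequency_negatives(instances: Iterable[str], min_count: int) -> List[str]:
    """Return the distinct instances whose count exceeds min_count, sorted."""
    counts = Counter(instances)
    # "Exceeds" is read strictly: the count must be greater than min_count.
    return sorted(item for item, n in counts.items() if n > min_count)
```

Under these assumptions, only a negative hotword stored more than `min_count` times would be passed on as retraining data for the first stage detector.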

Regarding claim 9 Gruenstein does not specifically teach where Wu teaches The method of claim 8, the user device is further configured to, prior to retraining the first stage hotword detector: 
determine that a corresponding confidence score associated with each instance of the first segment of the audio data classified as containing the negative hotword fails to satisfy a negative hotword threshold score (col 2 l. 27-34: probability; score; col 4 l. 34-46: wakeword detection component may compare audio data to stored models or data to detect a wakeword; wakeword spotting…non-wakeword; non-wakeword speech – recognizing wakewords and non-wakewords); 
and does not specifically teach where Weng teaches
determine that the number of instances exceeds a threshold number of instances (Weng abstract; 23; 32).  
Rejected for similar rationale and reasoning as claim 8.

Claim 23 recites limitations similar to claim 8 and is rejected for similar rationale and reasoning.
Claim 24 recites limitations similar to claim 9 and is rejected for similar rationale and reasoning.


8.	Claims 14 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein et al (2018/0233150) in view of Wu et al (10,872,599) in further view of Ganong III et al (2014/0278435)

Regarding claim 14 Gruenstein already teaches the first and second stage hotword detectors, however Gruenstein and Wu do not specifically teach where Ganong teaches The method of claim 13, wherein: 
the first stage hotword detector executes on a digital signal processor (DSP) of the data processing hardware (27; 56; 62; 64; 72: first processing stage is performed on …DSP); and 
the second stage hotword detector executes on an application processor of the data processing hardware (72: second processing stage is performed on primary processor e.g. main CPU).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Ganong to allow for more efficient use of a device.  

Claim 29 recites limitations similar to claim 14 and is rejected for similar rationale and reasoning.

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: See PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655