Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
1.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 6/15/2022 has been entered.
Response to Amendment
2.	Claims 1, 3, 11, 13 have been amended.
Response to Arguments
3.	Applicant’s arguments as filed have been fully considered but are moot in view of the new grounds of rejection responsive to the amendments.
Applicant has amended the claims to recite that the attributes are obtained without performing speech recognition.  The newly cited prior art, Thomas, teaches this limitation (see the art rejection below).

Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-2, 9-12, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas (2009/0086934) in view of Gruenstein et al (2019/0362719).

Regarding claim 1, Thomas (2009/0086934) teaches A method (abstract: system; 0001: methods) comprising:
receiving, at data processing hardware, a first acoustic segment characterizing a [hot]word detected by a [hot]word detector in streaming audio captured by a user device (52: computer system which may embody the present invention; 48: receives speech from a caller); 
without performing speech recognition processing on the streaming audio: 
extracting, by the data processing hardware, one or more [hot]word attributes from the first acoustic segment, wherein one of the one or more [hot]word attributes extracted from the first acoustic segment comprises a pause duration measure indicating an extent that a speaker paused while speaking the [hot]word and/or between speaking the [hot]word and a second acoustic segment that characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device (fig 4, fig 5/para 52: computer system which may embody the present invention; 48: receives speech from a caller; modification apparatus 401; 56: callers speak; 57: duration of pauses between words; this information can be from previous utterances in current call; 59: speech samples, energy;
61: modification apparatus 401 monitors the energy of the speech signal which will have a waveform similar to that shown in FIG. 8. The high energy regions of this waveform correspond to periods in time when the caller is speaking, whilst the troughs corresponds to periods when the caller is silent. It will be seen that four distinct words are said by the caller during the time of the waveform. The troughs represent the spaces between consecutive words. To achieve this, the modification apparatus 401 includes a speech signal measuring device (not shown in FIG. 4) which measures the short term energy waveform, using any commonly understood algorithm. The speech signal measuring device also uses a speech/silence energy threshold 802 to identify the portions of the speech signal that are deemed to be speech and the portions which are classed as silence.); 
instructing, by the data processing hardware, an endpointer to one of increase or decrease an audio capture duration for the second acoustic segment from a default value based on the pause duration measure (59; 113);
adjusting, by the data processing hardware, based on the one or more [hot]word attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model (59: Once this typical range of durations of pauses is known, the modification apparatus can calculate how long it should wait after an utterance before the result should be returned. This length of time will be calculated to ensure that there is a high probability that the caller has finished speaking before the result is returned. The modification apparatus modifies the command signal accordingly. The end result is that the length of time is lengthened for slow speakers and is shortened for fast speakers; 62; 113 variable end silence figure; 115); and 
after adjusting the speech recognition parameters of the ASR model and instructing the endpointer to one of increase or decrease the audio capture duration for the second acoustic segment, processing, by the data processing hardware, using the ASR model, the second acoustic segment to generate a speech recognition result (fig 4: 403 speech recognizer; 48: speech recognizer 403 also includes an output arranged to produce a speech result signal; 67; 115: based upon the durations of these periods, it instructs the recognizer to behave differently according to the different speaker rates of the caller;
56: the modification apparatus 401 is able to modify the command signals according to the actual situation. For example, it might be determined that a 1 s delay will be insufficient for some callers to complete their utterances as they speak very slowly. Therefore, it might be appropriate to override the specified 1 s delay and replace it with a 2 s one. Conversely, for a fast speaker, it might be appropriate to reduce the delay below 1 s.
57: For this modification to be possible, the modification apparatus 401 will need to be aware of the typical duration of pauses between words that the current caller is likely to use. This information can be derived either from previous utterances in the current call, or from a database of information that is associated with the caller, such as their account number. This embodiment uses the utterances from the current call).
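
For illustration only, the energy-threshold pause measurement that Thomas describes (61) and the resulting lengthening or shortening of the end-of-utterance wait (56-59) can be sketched as follows. This is a minimal sketch and is not Thomas's implementation: the function names, the non-overlapping framing, the mean-of-pauses rule, and the default, margin, and clamp values are all assumptions introduced here. No speech recognition is performed at any step.

```python
from statistics import mean


def short_term_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed framing)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_durations(energies, threshold, frame_seconds):
    """Durations, in seconds, of silent runs that lie between two speech runs.

    Frames at or above `threshold` count as speech, frames below it as silence.
    Leading and trailing silence is ignored.
    """
    pauses, run, seen_speech = [], 0, False
    for energy in energies:
        if energy >= threshold:
            if seen_speech and run:
                pauses.append(run * frame_seconds)
            seen_speech, run = True, 0
        elif seen_speech:
            run += 1
    return pauses


def end_silence_timeout(pauses, default=1.0, margin=2.0, lo=0.5, hi=3.0):
    """Hypothetical rule: wait `margin` times the typical pause, clamped to [lo, hi]."""
    if not pauses:
        return default
    return min(hi, max(lo, margin * mean(pauses)))


# A slow speaker (long pauses) gets a longer wait than the 1 s default.
energies = short_term_energy([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0], frame_len=2)
pauses = pause_durations(energies, threshold=0.5, frame_seconds=0.5)
timeout = end_silence_timeout(pauses)
```

Under these assumed values, a speaker whose pauses are short would receive a wait below the default, and a speaker whose pauses are long would receive a wait above it, which corresponds to the lengthening and shortening described in paragraph 59.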

Thomas does not specifically teach that the first acoustic segment is a hotword or that hotword attributes are extracted.  Gruenstein teaches a hotword (34: hotword).  It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Gruenstein so that the analysis is performed on the hotword, with a reasonable expectation of success, since a first segment of speech would still be analyzed to adapt the recognizer for subsequent recognition, yielding improved and more customized recognition.


Regarding claim 2 Thomas and Gruenstein teach The method of claim 1, wherein the one or more hotword attributes extracted from the first acoustic segment further comprise at least one of: 
a rate of speech measure (Thomas 112: speaking rate; 115); a pitch measure; an ASR prediction measure; or a loudness/tone measure.
Rejected for similar rationale and reasoning as claim 1 where Gruenstein teaches the hotword.  


Regarding claim 9, Thomas and Gruenstein teach The method of claim 1, further comprising, when one of the one or more hotword attributes extracted from the first acoustic segment comprises a rate of speech measure indicating a rate at which a speaker spoke the hotword in the streaming audio, instructing, by the data processing hardware, the endpointer to one of additionally increase or additionally decrease the audio capture duration for the second acoustic segment from the default value based on the rate of speech measure (Thomas 112: speaking rate; 115).
Rejected for similar rationale and reasoning as claim 1 where Gruenstein teaches the hotword.  


Regarding claim 10 Gruenstein teaches The method of claim 1, wherein the one or more hotword attributes are extracted from the first acoustic segment using at least one of a neural network-based model or a heuristic-based model (5 neural networks to determine…the hotword). 
Rejected for similar rationale and reasoning as claim 1

Regarding claim 11 Thomas and Gruenstein teach A system comprising: 
data processing hardware; and 
memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
Attorney Docket No: 23 1441-476255
receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device; 
without performing speech recognition processing on the streaming audio: 
extracting one or more hotword attributes from the first acoustic segment, wherein one of the one or more hotword attributes extracted from the first acoustic segment comprises a pause duration measure indicating an extent that a speaker paused while speaking the hotword and/or between speaking the hotword and a second acoustic segment that characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device; 
instructing an endpointer to one of increase or decrease an audio capture duration for the second acoustic segment from a default value based on the pause duration measure; 
adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model; and
after adjusting the speech recognition parameters of the ASR model and instructing the endpointer to one of increase or decrease the audio capture duration for the second acoustic segment, processing, using the ASR model, the second acoustic segment to generate a speech recognition result.
Recites limitations similar to claim 1 and is rejected for similar rationale and reasoning  

Claim 12 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.

Claim 19 recites limitations similar to claim 9 and is rejected for similar rationale and reasoning.
Claim 20 recites limitations similar to claim 10 and is rejected for similar rationale and reasoning.



7.	Claims 3-5, 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas (2009/0086934) in view of Gruenstein et al (2019/0362719) in further view of Arun (2006/0074651).


Regarding claim 3, Gruenstein teaches The method of claim 1, further comprising, when receiving the first acoustic segment:
receiving, at the data processing hardware, a confidence score indicating a confidence on the likelihood that the streaming audio captured by user device contains a hotword the hotword detector is trained to detect (34 hotword confidence score, hotword confidence score threshold), 
but does not specifically teach
wherein extracting one or more hotword attributes from the first acoustic segment comprises extracting an ASR prediction measure indicating a likelihood that the ASR model will accurately recognize the query/command portion in the second acoustic segment.
Arun teaches receiving a noise error from the speech recognition unit responsive to a user voice command and reducing a confidence threshold for an appropriate grammar (abstract); and dynamically adapting the confidence thresholds for speech recognition (0001).
Arun teaches receiving a spoken command from a user, and attempting recognition.  If the recognition does not meet a certain confidence threshold level due to certain factors, such as SNR (ASR prediction measure), the likelihood that the ASR model will accurately recognize the query/command is low, and the system can then adjust the confidence threshold level to meet the needs of different speakers in different conditions.
Arun teaches The method of claim 1, further comprising, when receiving the first acoustic segment:
receiving, at the data processing hardware, a confidence score indicating a confidence on the likelihood that the streaming audio captured by user device contains a [hot]word the [hot]word detector is trained to detect (6 grammar; 7 user voice command, confidence, confidence threshold level), 
wherein extracting one or more [hot]word attributes from the first acoustic segment comprises extracting an ASR prediction measure indicating a likelihood that the ASR model will accurately recognize the spoken query/command portion in the second acoustic segment (abstract; 7: signal-to-noise ratio; 6-8).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Arun so that the ASR is adjusted based on the confidence of initial or previous terms and other factors (indicating a likelihood of successful recognition), yielding improved recognition that better meets the needs of different speakers in different conditions (when combined with Gruenstein, the confidence is determined for the hotword, to better recognize the hotword itself or subsequent commands).


Regarding claim 4, Arun teaches The method of claim 3, wherein adjusting the one or more speech recognition parameters of the ASR model comprises:
when the confidence score is greater than a confidence score threshold, decreasing a number of speech recognition hypotheses output by the ASR model and/or decreasing a beam search width of the ASR model (47: if the confidence of recognizing the user voice command is above the confidence threshold level, then not as many hypotheses (possible matches) are needed; however, for the limitation below, if the confidence is below the confidence threshold level, the threshold is lowered, allowing for an increase in the number of hypotheses); or
when the confidence score is less than the confidence score threshold, increasing the number of speech recognition hypotheses output by the ASR model and/or increasing the beam search width of the ASR model (47: The speech recognition unit 136 now has a higher probability of recognizing the user voice command since the confidence threshold is lower. In one embodiment, given the reduced confidence threshold, the speech recognition unit 136 will now find more than one possible match to the repeated user voice command).
Rejected for similar rationale and reasoning as claim 3
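
For illustration only, the two alternative limitations of claim 4 can be sketched as a comparison of the confidence score against its threshold, followed by narrowing or widening the decoder search. This is a minimal sketch under stated assumptions and is not taken from Arun or from the application: `DecoderParams`, `adjust_for_confidence`, the default values, and the scaling `factor` are hypothetical names and numbers introduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecoderParams:
    """Hypothetical container for two ASR search parameters."""
    beam_width: int = 8
    n_best: int = 4


def adjust_for_confidence(params, score, threshold, factor=2):
    """Narrow the search when the score is above the threshold, widen it when below."""
    if score > threshold:
        return replace(
            params,
            beam_width=max(1, params.beam_width // factor),
            n_best=max(1, params.n_best // factor),
        )
    if score < threshold:
        return replace(
            params,
            beam_width=params.beam_width * factor,
            n_best=params.n_best * factor,
        )
    return params  # score equals the threshold: leave the parameters unchanged


confident = adjust_for_confidence(DecoderParams(), score=0.9, threshold=0.5)
uncertain = adjust_for_confidence(DecoderParams(), score=0.2, threshold=0.5)
```

In this sketch a high score yields fewer hypotheses and a narrower beam, and a low score yields more hypotheses and a wider beam, mirroring the "greater than" and "less than" branches of the claim.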

Regarding claim 5 Arun teaches The method of claim 3, wherein adjusting the one or more speech recognition parameters of the ASR model comprises: 
when the confidence score is greater than a confidence score threshold, adjusting the one or more speech recognition parameters to bias recognition hypotheses toward recognizing the hotword in the first acoustic segment (47);
or when the confidence score is less than the confidence score threshold, adjusting the one or more speech recognition parameters to not bias recognition hypotheses toward recognizing the hotword in the first acoustic segment (47).
Rejected for similar rationale and reasoning as claims 3 & 4


Claims 13-15 recite limitations similar to claims 3-5 and are rejected for similar rationale and reasoning.


8.	Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas (2009/0086934) in view of Gruenstein et al (2019/0362719) in further view of Pogue et al (9,799,329).

Regarding claim 6, Thomas teaches analyzing speech attributes and adjusting the ASR but does not specifically teach the following limitations, which Pogue teaches: The method of claim 1, wherein:
extracting one or more hotword attributes from the first acoustic segment comprises extracting a pitch measure specifying a range of frequencies associated with the first acoustic segment (col 2 l. 43-61); and
adjusting the one or more speech recognition parameters comprises adjusting the one or more speech recognition parameters to apply frequency-based filtering on the second acoustic segment by focusing on the specified range of frequencies when processing the second acoustic segment to generate the speech recognition result (col 2 l. 43-61 where Pogue teaches receiving certain frequencies of sounds and ASR can remove unwanted frequencies or focus on wanted frequencies more).  
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Pogue for improved and more customized recognition.  
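
For illustration only, frequency-based filtering that focuses on a specified range of frequencies, as recited in claim 6, can be sketched by zeroing the spectral components outside that range. This is a minimal sketch and is not Pogue's implementation or the applicant's: the function name `band_limit`, the naive discrete Fourier transform, and the hard cutoff at the band edges are assumptions introduced here.

```python
import cmath
import math


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Keep only spectral content within [low_hz, high_hz] using a naive DFT."""
    n = len(samples)
    # Forward DFT, O(n^2); adequate for a short illustrative segment.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the result is real because the mask is symmetric.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# A 4 Hz tone plus a 20 Hz tone, sampled at 64 Hz for one second.
rate = 64
low = [math.sin(2 * math.pi * 4 * t / rate) for t in range(rate)]
mixed = [low[t] + math.sin(2 * math.pi * 20 * t / rate) for t in range(rate)]
filtered = band_limit(mixed, rate, low_hz=2, high_hz=8)
```

Here the 2-8 Hz band stands in for the range of frequencies given by the pitch measure, and the filtered signal retains the 4 Hz tone while the 20 Hz tone is removed.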

Claim 16 recites limitations similar to claim 6 and is rejected for similar rationale and reasoning.

9.	Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas (2009/0086934) in view of Gruenstein et al (2019/0362719) in further view of Degges Jr et al (9,830,924).

Regarding claim 7, Thomas teaches analyzing speech attributes and adjusting the ASR but does not specifically teach the following limitation, which Degges teaches: The method of claim 1, further comprising, when one of the one or more hotword attributes extracted from the first acoustic segment comprises a tone and loudness score specifying a tone and loudness of a voice when speaking the hotword, influencing, by the data processing hardware, a natural language understanding (NLU) module or biasing of the ASR model when performing query interpretation on the generated speech recognition result (col 1 l. 57 – col 2 l. 5; col 2 l. 6-15: analyzes sound intensity of voice command; claim 21: command comprises a wake word; Degges detects tone/loudness and the ASR biases recognition accordingly).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Degges for improved and more customized recognition.  

Claim 17 recites limitations similar to claim 7 and is rejected for similar rationale and reasoning.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655