Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
2.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-28 are rejected under 35 U.S.C. 103 as being unpatentable over Hoffmeister (US 20160379632) in view of Buchanan (US 20150310879).
As per claim 1, Hoffmeister teaches receiving, at data processing hardware, audio data of an utterance spoken by a user of a user device (Fig. 1 and [0020]); 
processing, by the data processing hardware, the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: 
partial speech recognition results for the utterance ([0023]-[0024], [0063], partial speech recognition results of the spoken utterance are generated. For example, the ASR process may compare the input audio data with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the utterance of the audio data); and 
an endpoint indication indicating when the utterance has ended (Fig. 1, [0020], indicating when a spoken command has concluded); 
while processing the audio data, detecting, by the data processing hardware, based on the endpoint indication output from the speech recognition model, the end of the utterance ([0051], [0063], while processing the audio data, determining the begin point and endpoint of an utterance).
Hoffmeister does not explicitly disclose terminating, by the data processing hardware, the processing of any subsequent audio data received after the end of the utterance was detected in response to detecting the end of the utterance. 
Buchanan, in the same field of endeavor, teaches speech endpointing based on word comparison, wherein a computing device receives an audio input of an utterance spoken by a user and uses a continuous speech recognizer that incrementally transcribes the utterance. The computing device compares the incrementally recognized transcription to text samples from a collection of text samples to determine whether the transcription more likely represents a complete query or an incomplete query ([0004], [0022]). Buchanan further teaches terminating the processing of any subsequent audio data received after the end of the utterance was detected, in response to detecting the end of the utterance, by deactivating the corresponding microphone (Buchanan, [0006], [0037]). Therefore, it would have been obvious at the time the application was filed to combine Buchanan’s above feature with the system of Hoffmeister, in order to process the target data and provide to the user accurate and desirable results.
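For illustration only, the claim 1 limitations as mapped above (emitting partial results, detecting the endpoint indication, and terminating processing of subsequent audio) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any implementation from Hoffmeister, Buchanan, or the present application: the function name `decode_until_endpoint` and the endpoint marker `</s>` are hypothetical, and the list of per-frame labels stands in for the output of a speech recognition model.

```python
from typing import Iterable, List, Tuple

END_SYMBOL = "</s>"  # hypothetical endpoint marker, not taken from either reference


def decode_until_endpoint(labels: Iterable[str]) -> Tuple[List[str], int]:
    """Collect partial results until the endpoint symbol is seen.

    Returns the partial results and the number of labels consumed.
    Labels arriving after the endpoint symbol are never processed.
    """
    partial_results: List[str] = []
    consumed = 0
    for label in labels:
        consumed += 1
        if label == END_SYMBOL:
            # Endpoint indication detected: stop processing subsequent audio.
            break
        partial_results.append(label)
    return partial_results, consumed


results, consumed = decode_until_endpoint(["what", "time", "</s>", "ignored"])
```

In this sketch the label following the endpoint marker is never consumed, which corresponds to terminating the processing of audio data received after the end of the utterance was detected.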

As per claim 2, Hoffmeister teaches in response to detecting the end of the utterance, transmitting, by the data processing hardware, a transcription of the utterance to a search engine or a digital conversational assistant system, the transcription based on the partial speech recognition results output by the speech recognition model while processing the audio data ([0018], [0021].  See also, Buchanan, [0060]).
As per claim 3, Hoffmeister teaches wherein the speech recognition model comprises a streaming speech recognition model configured to emit the partial speech recognition results for the utterance before the endpointing decision indicates when the utterance has ended ([0058], [0063], [0066], wherein the server continues processing audio data and necessarily emitting or producing results until an endpoint is detected).
As per claim 4, Hoffmeister teaches prior to detecting the end of the utterance, providing, by the data processing hardware, for display on a screen in communication with the data processing hardware, any partial speech recognition results emitted by the speech recognition model ([0024], outputting the most likely text recognized in the audio data based on partial speech recognition results; see also [0023]-[0024], [0063]. The recitation "for display on a screen" is considered an intended use limitation).
As per claim 5, Hoffmeister teaches wherein the speech recognition model comprises a Recurrent Neural Network-Transducer (RNN-T) ([0053], various machine learning techniques may be used to determine a beginning of an utterance or an end of an utterance, i.e. neural networks (such as deep neural networks and/or recurrent neural networks)).
As per claim 6, Hoffmeister teaches wherein the data processing hardware resides on the user device of the user and executes the speech recognition model ([0021], [0022], the various components of processing a spoken command may be located on a same or different physical devices).
As per claim 7, Hoffmeister does not explicitly disclose in response to detecting the end of the utterance, deactivating, by the data processing hardware, an array of one or more microphones that captured the utterance spoken by the user.
Buchanan, in the same field of endeavor, teaches speech endpointing based on word comparison ([0004], [0022]), wherein in response to detecting the end of the utterance, deactivating, by the data processing hardware, an array of one or more microphones that captured the utterance spoken by the user (Buchanan, [0006], [0037]). Therefore, it would have been obvious at the time the application was filed to combine Buchanan’s above feature with the system of Hoffmeister, in order to process the target data and provide to the user accurate and desirable results.
As per claim 8, Hoffmeister teaches wherein the speech recognition model comprises a neural network configured to generate, as output, a probability distribution over a set of output labels for each audio feature in the sequence of audio features, the output labels comprising linguistic units and an utterance endpoint symbol ([0025], [0053], [0055], using labels about utterance beginnings, utterance conclusions, existence/lengths of pauses, etc.).
As per claim 9, Hoffmeister teaches wherein processing the audio data to obtain the endpoint indication comprises determining the speech recognition model has output the endpoint indication when the output label comprising the utterance endpoint symbol is associated with a probability score that satisfies a threshold score ([0055]-[0056], [0069], semantic tags representing the length of pauses determine the likelihood of an endpoint).
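For illustration only, the limitations of claims 8 and 9 (a probability distribution over output labels, with the endpoint indication output when the endpoint symbol's probability score satisfies a threshold score) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hoffmeister or the present application: the names `softmax` and `endpoint_detected`, the endpoint marker `</s>`, the default threshold of 0.5, and the use of a softmax over raw scores are all hypothetical.

```python
import math
from typing import Dict

END_SYMBOL = "</s>"  # hypothetical endpoint marker


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-label scores into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def endpoint_detected(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """True when the endpoint symbol's probability satisfies the threshold."""
    probabilities = softmax(scores)
    return probabilities.get(END_SYMBOL, 0.0) >= threshold


# Endpoint symbol dominates this frame's distribution.
frame_a = endpoint_detected({"a": 0.0, END_SYMBOL: 3.0})
# A linguistic unit dominates this frame's distribution.
frame_b = endpoint_detected({"a": 3.0, END_SYMBOL: 0.0})
```

Here `frame_a` evaluates to true and `frame_b` to false, because the endpoint symbol's probability is above the assumed threshold only in the first frame.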
As per claim 10, Hoffmeister teaches wherein the linguistic units comprise graphemes, wordpieces, or words ([0023], wherein the linguistic units comprise subword units, phonemes).
As per claim 11, Hoffmeister teaches wherein the output labels further comprise at least one of: an utterance start symbol indicating the beginning of an utterance; a punctuation symbol indicating a punctuation element; a space symbol indicating a space character; or a blank symbol that does not correspond to any orthographic element ([0071], semantic tags, state indicators, or other non-speech items).
As per claim 12, Hoffmeister teaches wherein the speech recognition model is configured to output, based on the audio data for the utterance, a decoded sequence of symbols; and processing the audio data to obtain the endpoint indication comprises determining whether an endpoint symbol corresponding to the endpoint indication is included in the decoded sequence of symbols output by the speech recognition model ([0020], [0025], [0055], and [0074]).
As per claim 13, Hoffmeister teaches wherein determining whether the endpoint symbol corresponding to the endpoint indication is included in the decoded sequence of symbols output by the speech recognition model comprises determining that a probability score associated with the endpoint symbol satisfies a threshold score ([0051], [0057], when the confidence level satisfies a confidence level threshold, an utterance begin/end may be marked).
As per claim 14, Hoffmeister teaches wherein processing the audio data to obtain the endpoint indication indicating the end of the utterance comprises determining, during a beam search process over multiple outputs of the speech recognition model, whether a symbol corresponding to the endpoint indication is included in one or more beams of the beam search process ([0055], [0070], considering labels and tags representing pause durations to determine whether an endpoint has been reached).
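For illustration only, the claim 14 determination (whether a symbol corresponding to the endpoint indication is included in one or more beams) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce a beam search from Hoffmeister or the present application: the function name `beams_with_endpoint`, the endpoint marker `</s>`, and the representation of each beam as a list of symbols are hypothetical, and the search that produces the beams is not shown.

```python
from typing import List, Sequence

END_SYMBOL = "</s>"  # hypothetical endpoint marker


def beams_with_endpoint(beams: Sequence[Sequence[str]]) -> List[int]:
    """Return the indices of beams whose hypothesis contains the endpoint symbol."""
    return [index for index, hypothesis in enumerate(beams)
            if END_SYMBOL in hypothesis]


# Two candidate hypotheses kept by an (unshown) beam search.
active_beams = [
    ["play", "music"],
    ["play", "music", END_SYMBOL],
]
hits = beams_with_endpoint(active_beams)
endpoint_reached = len(hits) >= 1  # "one or more beams" include the symbol
```

In this example only the second beam contains the endpoint marker, so `hits` holds its index and `endpoint_reached` is true.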
As per claims 15-28, system claims 15-28 and method claims 1-14 are related as apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step.  Accordingly, claims 15-28 are similarly rejected under the same rationale as applied above with respect to method claims 1-14.  Further, Hoffmeister teaches data processing hardware; and memory hardware in communication with the data processing hardware as claimed ([0090]). 
Conclusion
3.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELALI SERROU whose telephone number is (571)272-7638. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more 





/ABDELALI SERROU/Primary Examiner, Art Unit 2659