DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see pages 8-12, filed 10/20/2021, with respect to claims 28-33 have been fully considered and are persuasive.  The Restriction/Election of claims 28-33 has been withdrawn. 
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 9, 12, 15-18, 22, and 25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gruenstein et al. (US 10,170,112).
Regarding Claim 1, Gruenstein et al discloses a speech recognition method comprising: adding a preset special sequence to a front end of an input sequence that corresponds to an input utterance of a speaker (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword. For example, as shown in FIG. 1A, the phrase "OK Voice Service" is a hotword that activates the device 102 in a mode that enables it to receive voice queries) (col. 5, lines 41-56); recognizing the preset special sequence and the input sequence using an 
Regarding Claim 2, Gruenstein et al discloses a speech recognition method, wherein the input sequence includes the input utterance or vectors extracted from the input utterance (a user 104 may first speak the "OK Voice Service" hotword to activate the device 102 and then speak the query "Call Teresa's school" to prompt a dialing operation) (col. 6, lines 13-16).
Regarding Claim 3, Gruenstein et al discloses a speech recognition method, wherein the preset special sequence includes a preset utterance of the speaker, or at least one vector extracted from the preset utterance (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword. For example, as shown in FIG. 1A, the phrase "OK Voice Service" is a hotword that activates the device 102 in a mode that enables it to receive voice queries) (col. 5, lines 41-56).
Regarding Claim 4, Gruenstein et al discloses a speech recognition method, wherein the preset special sequence is a preset utterance of the speaker (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword. For example, as shown in FIG. 1A, the phrase "OK Voice Service" is a 
Regarding Claim 5, Gruenstein et al discloses a speech recognition method, wherein the preset special sequence is "hi." (For instance, the device 102 may include one or more microphones and a hotworder that is configured to constantly listen for pre-defined hotwords spoken in proximity of the device 102 (e.g., in a local environment of the device 102). The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword) (col. 5, lines 44-49).  It is being interpreted by the examiner that “hi” can be one of the pre-defined hotwords.
Regarding Claim 9, Gruenstein et al discloses a speech recognition method, wherein recognizing the preset special sequence and the input sequence comprises: outputting the recognition result that corresponds to the preset special sequence and the input sequence by inputting the preset special sequence and the input sequence to an end-to-end artificial neural network that has an encoder-decoder architecture (The hotworder 204 may use classifying windows to process these audio features using, for example, a support vector machine, a machine-learned neural network, or other models) (col. 9, line 60-col. 10, line 15).
Regarding Claim 12, Gruenstein et al discloses a speech recognition method, wherein recognizing the input sequence comprises: excluding a recognition result corresponding to the preset special sequence from the recognition result that corresponds to the preset special sequence and the input sequence (The suppressor 222 suppresses performance of operations associated with voice queries that are determined to be illegitimate) (col. 10, line 65-col. 11, line 6).
Claims 15 and 16 are rejected for the same reason as claim 1.
Claim 17 is rejected for the same reason as claim 2.
Claim 18 is rejected for the same reason as claim 3.
Claim 22 is rejected for the same reason as claim 9.
Claim 25 is rejected for the same reason as claim 12.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-8, 10, 11, 14, 19-21, 23, 24, 27-30, and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein et al. in view of Le et al. (US 2017/0316775).
Regarding Claim 6, Gruenstein et al teaches a speech recognition method, wherein recognizing the preset special sequence and the input sequence comprises: generating the preset special sequence and the input sequence (a user 104 may first speak the "OK Voice Service" hotword to activate the device 102 and then speak the query "Call Teresa's school" to prompt a dialing operation) (col. 6, lines 13-16); outputting at least one special token that corresponds to the preset special sequence based on the feature (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword) (col. 5, lines 48-52); and determining at least one output token that corresponds to the input sequence based on 
Gruenstein et al fails to teach a speech recognition method, generating an encoded feature by encoding.
Le teaches a speech recognition method, generating an encoded feature by encoding (a long short term memory (LSTM) unit encodes a context, word by word, into a vector) (page 5, paragraph [0042]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve and optimize speech recognition by encoding/decoding the hotwords and the voice query.
Regarding Claim 7, Gruenstein et al teaches a speech recognition method, wherein the special token includes a text corresponding to a preset utterance of the speaker (The fingerprint can include an audio component that represents acoustic features of the query, a textual component that represents a transcription of the query, or both) (col. 10, lines 58-61).
Regarding Claim 8, Gruenstein et al teaches a speech recognition method, wherein recognizing the preset special sequence and the input sequence comprises: generating the preset special sequence and the input sequence (a user 104 may first speak the "OK Voice Service" hotword to activate the device 102 and then speak the query "Call Teresa's school" to prompt a dialing operation) (col. 6, lines 13-16); and determining at least one output token that corresponds to the input sequence based on 
Gruenstein et al fails to teach a speech recognition method, generating an encoded feature by encoding.
Le teaches a speech recognition method, generating an encoded feature by encoding (a long short term memory (LSTM) unit encodes a context, word by word, into a vector) (page 5, paragraph [0042]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve and optimize speech recognition by encoding/decoding the hotwords and the voice query.
Regarding Claim 10, Gruenstein et al fails to teach a speech recognition method, wherein a decoder of the end-to-end artificial neural network is configured to output the recognition result dependent on a recognition result from previous steps based on information calculated from an encoder of the end-to-end artificial neural network in each step.
Le teaches a speech recognition method, wherein a decoder of the end-to-end artificial neural network is configured to output the recognition result dependent on a recognition result from previous steps based on information calculated from an encoder of the end-to-end artificial neural network in each step (A decoder similar to that used in the neural chat model can be used here) (page 7, paragraph [0059]).

Regarding Claim 11, Gruenstein et al fails to teach a speech recognition method, wherein the artificial neural network includes one or more of a recurrent neural network (RNN), a convolutional neural network (CNN), and a self-attention neural network (SANN).  
Le teaches a speech recognition method, wherein the artificial neural network includes one or more of a recurrent neural network (RNN), a convolutional neural network (CNN), and a self-attention neural network (SANN) (In dialog systems disclosed herein, the NLU/DM/NLG utterance generation chain is replaced by a set of language models (LMs) whose outputs are combined using a single Recurrent Neural Network (RNN) that generates a current natural language utterance from the dialogue history up to the point where this current utterance is to be produced) (page 2, paragraph [0019]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve speech recognition by employing an RNN to integrate or mix an arbitrary number of language models.
Regarding Claim 14, Gruenstein et al fails to teach a speech recognition method, wherein, in a case in which the input sequence includes only noise, the method further 
Le teaches a speech recognition method, wherein, in a case in which the input sequence includes only noise, the method further comprises: recognizing the input sequence subsequent to the preset special sequence as an end of state (EOS) token (For each context, to mark where its ends, a token "<EOC>" was added to its tail) (page 4, paragraph [0035]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve speech recognition by identifying when the utterance ends. 
Claim 19 is rejected for the same reason as claim 6.
Claim 20 is rejected for the same reason as claim 7.
Claim 21 is rejected for the same reason as claim 8.
Claim 23 is rejected for the same reason as claim 10.
Claim 24 is rejected for the same reason as claim 11.
Claim 27 is rejected for the same reason as claim 14.
Regarding Claim 28, Gruenstein et al teaches a processor-implemented speech recognition method comprising: extracting a feature vector from an utterance that includes an input utterance and a special utterance that is added prior to the input utterance (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword. For example, as shown in FIG. 1A, the phrase "OK Voice Service" is a hotword that activates the device 102 in a mode that enables it to 
Gruenstein et al fails to teach a processor-implemented speech recognition method, encoding the feature vector to generate an encoded feature.
Le teaches a processor-implemented speech recognition method, encoding the feature vector to generate an encoded feature (a long short term memory (LSTM) unit encodes a context, word by word, into a vector) (page 5, paragraph [0042]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve and optimize speech recognition by encoding/decoding the hotwords and the voice query.
Regarding Claim 29, Gruenstein et al teaches a processor-implemented speech recognition method, further comprising outputting a special token that corresponds to the special utterance as the output token (The device 102 may be configured to activate upon detecting ambient audio that contains a pre-defined hotword) (col. 5, lines 48-52).
Regarding Claim 30, Gruenstein et al fails to teach a processor-implemented speech recognition method, further comprising decoding an input token and the encoded feature to output the special token.
Le teaches a processor-implemented speech recognition method, further comprising decoding an input token and the encoded feature to output the special 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve and optimize speech recognition by encoding/decoding the hotwords and the voice query.   
Regarding Claim 33, Gruenstein et al teaches a processor-implemented speech recognition method.
Gruenstein et al fails to teach a processor-implemented speech recognition method, wherein encoding the feature vector includes transforming a dimension of the feature vector to generate the encoded feature.
Le teaches a processor-implemented speech recognition method, wherein encoding the feature vector includes transforming a dimension of the feature vector to generate the encoded feature (a long short term memory (LSTM) unit encodes a context, word by word, into a vector) (page 5, paragraph [0042]).
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Gruenstein with the teachings of Le to improve and optimize speech recognition by encoding/decoding the hotwords and the voice query.
Allowable Subject Matter
Claims 13, 26, 31, and 32 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  Specifically, the prior art fails 
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 02/11/2020 and 07/20/2020 are in compliance with the provisions of 37 CFR 1.97 and 1.98. Accordingly, the information disclosure statements are being considered by the examiner.
Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Biadsy et al. (US 9,842,592) discloses language models using non-linguistic context.
Biadsy et al. (US 2018/0053502) discloses language models using domain specific model components.
Gruenstein et al. (US 2019/0156828) discloses detecting and suppressing voice queries.
Gruenstein et al. (US 2020/0357400) discloses detecting and suppressing voice queries.
Biadsy et al. (US 2021/0020170) discloses language models using domain specific model components.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SATWANT K SINGH whose telephone number is (571)272-7468. The examiner can normally be reached Monday thru Friday 8:30 AM to 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SATWANT K SINGH/Primary Examiner, Art Unit 2672