DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 1/14/2021.   Claims 1-26 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings filed on 1/14/2021 have been accepted and considered by the Examiner.

Information Disclosure Statement
4.	The information disclosure statement (IDS) submitted on November 30, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


6.	Claims 1-9, 11, 13-22, 24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over US Pat. No. 11,158,307 (Ghias et al., hereinafter “Ghias”) in view of US Pat. No. 10,176,802 (Ladhak et al., hereinafter “Ladhak”).
With regard to Claim 1, Ghias describes:
“A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving a first-pass hypothesis and an encoded acoustic frame, (Column 25, lines 50-65 describes that multiple hypotheses are input into component 285.  The hypotheses are made of words or phonemes.  A first hypothesis is cited as “a first-pass hypothesis” and a second hypothesis is cited as “an encoded acoustic frame.”)
encoding the first-pass hypothesis at a hypothesis encoder; (Column 25, lines 53-56 describe that the hypotheses are encoded into a single data vector.)
generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector; (Column 25, lines 66-67 describe that an attention mechanism component acts on the encoded hypotheses.)
generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector; (Column 25, lines 66-67 describe that an attention mechanism component acts on the encoded hypotheses.) and
decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.” (Column 26, lines 1-5 and Figure 10 show that the output of attention mechanism 1010 is input into component 1015, which produces vocabulary for an alternate utterance component 285.  Column 10, lines 54-60 describe that component 285 determines an alternate possibility for the utterance, cited as “a second-pass hypothesis.”)
Ghias does not explicitly describe “the first-pass hypothesis generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame.”
However, column 5, lines 6-20 of Ladhak describes an ASR system that uses an RNN to process audio data and create a hypothesis.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN as described by Ladhak into the system of Ghias to provide improved updates of word hypotheses, as described at column 4, lines 26-44 of Ladhak.
With regard to Claim 2, Ghias describes “decoding the first context vector and the second context vector comprises decoding a concatenation of the first context vector and the second context vector.”  Column 25, lines 53-56 of Ghias describes that the hypotheses are encoded into a single data vector, so they are already concatenated before the attention process.
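For illustration only, the data flow recited in the quoted language of Claims 1 and 2 can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from Ghias, Ladhak, or the application: the dot-product form of attention, the function names `attend` and `deliberation_step`, the `decoder_state` query, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention (assumed form): weighted average of `keys`."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def deliberation_step(decoder_state, encoded_frames, encoded_hypothesis):
    # First attention mechanism: attends to the encoded acoustic frames.
    first_context = attend(decoder_state, encoded_frames)
    # Second attention mechanism: attends to the encoded first-pass hypothesis.
    second_context = attend(decoder_state, encoded_hypothesis)
    # Claim 2 language: the decoder consumes a concatenation of both vectors.
    return first_context + second_context


# Hypothetical toy inputs; a zero query yields uniform attention weights.
encoded_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded_hypothesis = [[0.5, 0.5], [0.0, 2.0]]
decoder_state = [0.0, 0.0]

decoder_input = deliberation_step(decoder_state, encoded_frames, encoded_hypothesis)
```

In this sketch the two context vectors are produced separately and joined only at the input to the context vector decoder, which is the ordering the quoted claim language recites.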

With regard to Claim 3, Ghias describes “encoding the first-pass hypothesis comprises bi-directionally encoding the first-pass hypothesis at the hypothesis encoder to generate contextual information from the first-pass hypothesis.” Column 25, lines 61-64 of Ghias describes that the encoding is a bidirectional process.
With regard to Claim 4, Ghias describes “the hypothesis encoder comprises a long short term memory (LSTM) network.” Column 25, lines 61-64 of Ghias describes that the encoding is done with an LSTM network.
With regard to Claim 5, Ghias does not explicitly describe the subject matter of this claim.  However, Ladhak describes “the operations further comprise:
encoding the acoustic frame at a shared encoder;  (Column 16, lines 25-30 describe that an encoder is used to encode the audio data.  Thus, the encoder is “shared” by all the data.) and
generating the first-pass hypothesis at the RNN decoder model based on the encoded acoustic frame communicated from the shared encoder.” (Column 19, lines 35-44 describes that the encoded audio data is used to generate the hypotheses.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN as described by Ladhak into the system of Ghias to provide improved updates of word hypotheses, as described at column 4, lines 26-44 of Ladhak.
With regard to Claim 6, Ghias describes “the operations further comprise generating an acoustic embedding at a unidirectional audio encoder based on the encoded acoustic frame communicated from the shared encoder.”  Column 25, lines 53-56 describe that the multiple hypotheses are encoded into a single data vector, which is cited as “an acoustic embedding.”
With regard to Claim 7, Ghias describes “the unidirectional audio encoder comprises a long short term memory (LSTM) network.” Column 25, lines 61-64 of Ghias describes that the encoding is done with an LSTM network.
With regard to Claim 8, Ghias describes “the LSTM network comprises at least two layers.”  Figure 14 of Ghias shows that the LSTM has at least 2 layers.
With regard to Claim 9, Ghias describes “the operations further comprise:
training a deliberation decoder [[while parameters of the trained RNN decoder model remain fixed]], the deliberation decoder comprising the hypothesis encoder, the first attention mechanism, the second attention mechanism, and the context vector decoder.”  (Column 3, lines 8-10 describe that the model that receives and encodes the multiple hypotheses (cited as “the hypothesis encoder, the first attention mechanism, the second attention mechanism, and the context vector decoder”) is a trained model.) 
Ghias does not describe “the operations further comprise:
training the RNN decoder model; and
training a deliberation decoder while parameters of the trained RNN decoder model remain fixed.”
However, Ladhak describes “the operations further comprise:
training the RNN decoder model; (Column 5, lines 13-15 describe that the RNN model is trained.) and
training a deliberation decoder while parameters of the trained RNN decoder model remain fixed.” (The training of the RNN decoder of Ladhak would be independent of the training of the LSTM of Ghias.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN as described by Ladhak into the system of Ghias to provide improved updates of word hypotheses, as described at column 4, lines 26-44 of Ladhak.
With regard to Claim 11, Ghias describes “the operations further comprise [[jointly]] training [[the RNN decoder model and]] a deliberation decoder, the deliberation decoder comprising the hypothesis encoder, the first attention mechanism, the second attention mechanism, and the context vector decoder.” (Column 3, lines 8-10 describe that the model that receives and encodes the multiple hypotheses (cited as “the hypothesis encoder, the first attention mechanism, the second attention mechanism, and the context vector decoder”) is a trained model.)
Ghias does not describe “the operations further comprise jointly training the RNN decoder model.”
However, Ladhak describes “the operations further comprise jointly training the RNN decoder model.”  Column 5, lines 13-15 of Ladhak describe that the RNN model is trained.  The simultaneous or subsequent training of the RNN decoder of Ladhak and the LSTM of Ghias would be done “jointly,” as the claims do not recite any requirement for “joint” training.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN as described by Ladhak into the system of Ghias to provide improved updates of word hypotheses, as described at column 4, lines 26-44 of Ladhak.
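For illustration only, the difference between the training language quoted for Claim 9 (parameters of the trained RNN decoder model remain fixed) and for Claim 11 (joint training) can be sketched with a toy model. This is a minimal sketch under stated assumptions and is not drawn from Ghias, Ladhak, or the application: each component is reduced to a single scalar weight, and the names `train`, `rnn`, and `delib`, the squared-error loss, and the data are all hypothetical.

```python
def train(params, trainable, data, lr=0.01, steps=200):
    """Gradient descent that updates only the parameters named in `trainable`."""
    params = dict(params)  # copy so the caller's starting values are untouched
    for _ in range(steps):
        for x, target in data:
            first_pass = params["rnn"] * x                # toy first-pass output
            second_pass = params["delib"] * first_pass    # toy second-pass output
            err = second_pass - target
            # Gradients of the squared error with respect to each weight.
            grads = {
                "rnn": 2 * err * params["delib"] * x,
                "delib": 2 * err * first_pass,
            }
            for name in trainable:
                params[name] -= lr * grads[name]
    return params


start = {"rnn": 0.5, "delib": 0.5}
data = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical (input, target) pairs

# Claim 9 language: the first-pass weight stays fixed; only "delib" is trained.
frozen = train(start, trainable={"delib"}, data=data)

# Claim 11 language: both weights are updated together.
joint = train(start, trainable={"rnn", "delib"}, data=data)
```

In the frozen run the `rnn` weight keeps its starting value, whereas in the joint run both weights move in response to the same loss.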
With regard to Claim 13, Ghias describes “the data processing hardware resides on a user device.”  Column 3, lines 39-41 describe that the hardware resides on user device 110.
With respect to Claims 14-22, 24, and 26, system Claim 14 and method Claim 1 are related as a system programmed to perform the same method, with each claimed product step function corresponding to each claimed method step. Further, Ghias describes data processing hardware (column 31, line 47) and memory hardware (column 31, line 50).  Accordingly, Claims 14-22, 24, and 26 are similarly rejected under the same rationale as applied above with respect to Claims 1-9, 11, and 13.

7.	Claims 10, 12, 23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Ghias in view of Ladhak and further in view of US Pat. App. Pub. No. 20170221474 (Hori et al., hereinafter “Hori”).
With regard to Claim 10, Ghias in view of Ladhak does not explicitly describe this subject matter.  However, Hori describes “the operations further comprise minimizing a word error rate during training of the RNN decoder model and the deliberation decoder model.”  Paragraph 18 of Hori describes that an RNN may be trained to minimize the word error rate.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN training as described by Hori into the system of Ghias in view of Ladhak to reduce recognition errors, as described at paragraph 16 of Hori.
With regard to Claim 12, Ghias in view of Ladhak does not explicitly describe this subject matter.  However, Hori describes “the operations further comprise minimizing a word error rate during the joint training of the RNN decoder model and the deliberation decoder model.”  Paragraph 18 of Hori describes that an RNN may be trained to minimize the word error rate.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN training as described by Hori into the system of Ghias in view of Ladhak to reduce recognition errors, as described at paragraph 16 of Hori.
With respect to Claims 23 and 25, system Claim 14 and method Claim 1 are related as a system programmed to perform the same method, with each claimed product step function corresponding to each claimed method step. Further, Ghias describes data processing hardware (column 31, line 47) and memory hardware (column 31, line 50).  Accordingly, Claims 23 and 25 are similarly rejected under the same rationale as applied above with respect to Claims 10 and 12.

Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Pat. App. Pub. No. 20200357392 (Zhou et al.) also encodes word hypotheses using an LSTM.
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656