DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 2/8/2021.  Claims 1-26 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings filed on 2/8/2021 have been accepted and considered by the Examiner.

Information Disclosure Statement
4.	The information disclosure statement (IDS) submitted on February 8, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.  The information disclosure statement (IDS) submitted on June 21, 2021 did not include a copy of “Training Neural Speech Recognition Systems with Synthetic Speech Augmentation” (Li et al.).  Thus, that reference is lined out on the enclosed form 1449.  However, that reference has been considered, and is listed on the enclosed form 892.

Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


6.	Claims 1, 4-10, 14, and 17-23 are rejected under 35 U.S.C. 103 as being unpatentable over “Training Neural Speech Recognition Systems with Synthetic Speech Augmentation” (Li et al., hereinafter “Li”) in view of US Pat. App. Pub. No. 20220051059 (Shang et al., hereinafter “Shang”).
With regard to Claim 1, Li describes:
“A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising:
receiving [[a set of training utterance pairs, each training utterance pair comprising:]]
a non-synthetic speech representation of a corresponding utterance; (Section 3.2, page 3, lines 8-10 describe that half the input data to a training process is natural speech) and
a synthetic speech representation of the corresponding utterance; (Section 3.2, page 3, lines 8-10 describe that half the input data to a training process is synthetic speech)
at each of a plurality of output steps for each training utterance pair in the set of training utterance pairs:
generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the corresponding non-synthetic speech representation of the corresponding utterance; (Table 2, page 3 shows a natural speech input. The WER data is based on probability distributions being produced.)
generating, for output by the speech recognition model, a second probability distribution over possible synthetic speech recognition hypotheses for the corresponding synthetic speech representation of the corresponding utterance; (Table 2, page 3 shows a synthetic speech input. The WER data is based on probability distributions being produced.) and
Li does not explicitly describe:
“receiving a set of training utterance pairs,
determining a consistent loss term for the corresponding training utterance pair based on the first probability distribution over possible non-synthetic speech recognition hypotheses and the second probability distribution over possible non-synthetic speech recognition hypotheses; and
updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair in the set of training utterance pairs.”
However, Shang describes:
“receiving a set of training utterance pairs, (Paragraph 113 describes that a second predicted probability and a third predicted probability can be calculated from a set of training utterance pairs.)
determining a consistent loss term for the corresponding training utterance pair based on the first probability distribution over possible non-synthetic speech recognition hypotheses and the second probability distribution over possible non-synthetic speech recognition hypotheses; (Paragraph 113 describes that a consistent loss can be calculated from the second predicted probability and the third predicted probability calculated from a set of training utterance pairs.) and
updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair in the set of training utterance pairs.” (Paragraph 120 describes that the consistent loss is used to update the model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the loss calculation as described by Shang into the system of Li to provide improved updates of the model during training, as described at paragraph 120 of Shang.

With regard to Claim 4, Li describes “the synthetic speech representation in each training utterance pair comprises synthesized speech representing the corresponding utterance.”  Page 1, last paragraph of Li describes that the synthetic speech represents a corresponding utterance.
With regard to Claim 5, Li describes “the operations further comprise, for each training utterance pair, converting, using a text-to-speech (TTS) model, a ground truth transcription of the corresponding utterance to generate the corresponding synthetic speech representation.” Page 2, section 2 of Li describes that the synthetic speech is based on a ground truth transcript.
With regard to Claim 6, Li describes “the operations further comprise, for at least one training utterance pair, applying data augmentation to at least one of the corresponding non-synthetic speech representation or the corresponding synthetic speech representation.”  Page 2, section 2 of Li describes that the dropout rate was manipulated for the synthetic data, which is cited as “data augmentation.”
With regard to Claim 7, Li describes “the applied data augmentation comprises at least one of adding noise, adding reverberation, or manipulating timing.”  Page 2, section 2 of Li describes that the dropout rate was manipulated for the synthetic data, which is cited as “manipulating timing.”
With regard to Claim 8, Li does not explicitly describe this subject matter.  However, Shang describes “determining the consistent loss term for the corresponding training utterance pair is based on a Kullback-Leibler divergence between the first probability distribution over possible non-synthetic speech recognition hypotheses and the second probability distribution over possible non-synthetic speech recognition hypotheses.”  (Paragraph 113 describes that a Kullback-Leibler divergence can be calculated from the second predicted probability and the third predicted probability calculated from a set of training utterance pairs.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the loss calculation as described by Shang into the system of Li to provide improved updates of the model during training, as described at paragraph 120 of Shang.
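For illustration only, the Kullback-Leibler consistency computation discussed with regard to Claim 8 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not the implementation of Shang, Li, or the application: the function names `kl_divergence` and `consistency_loss` are hypothetical, the direction KL(non-synthetic || synthetic) is assumed, and averaging over output steps is assumed because neither the claim language nor the cited paragraphs, as summarized above, fix these details.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same hypotheses.

    eps guards against log(0); it is an illustrative choice, not a cited value.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same hypotheses")
    # Terms with p_i == 0 contribute nothing to the divergence.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def consistency_loss(
    real_steps: Sequence[Sequence[float]],
    synth_steps: Sequence[Sequence[float]],
) -> float:
    """Mean per-step KL divergence for one training utterance pair (assumed form)."""
    if not real_steps or len(real_steps) != len(synth_steps):
        raise ValueError("both representations need the same, non-zero number of steps")
    total = sum(kl_divergence(p, q) for p, q in zip(real_steps, synth_steps))
    return total / len(real_steps)


# Hypothetical distributions at two output steps over three hypotheses.
real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
synth = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
loss = consistency_loss(real, synth)
```

In this sketch the loss is zero when the two sequences of distributions are identical and positive when they differ, which is the property a consistency term relies on.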
With regard to Claim 9, Li describes “the operations further comprise, prior to determining the consistent loss term for the corresponding training utterance pair, when the speech recognition model comprises a frame alignment-based transducer model, defining a valid alignment distribution for a sequence of first probability distributions over possible non-synthetic speech recognition hypotheses generated at each of the output steps and a sequence of second probability distributions over possible synthetic speech recognition hypotheses generated at each of the output steps based on a probability of valid alignments of a ground-truth transcription of the corresponding utterance.”  Page 2, section 2 of Li describes that the synthetic speech is based on a ground truth transcript for every type of model, including a frame alignment-based transducer model.  Thus, the synthetic data is always based on recognition hypotheses.  Table 2, page 3 shows both natural and synthetic speech inputs. The WER data is based on probability distributions being produced.
With regard to Claim 10, Li describes “the valid alignment distribution is defined over each output label in a sequence of output labels representing the ground-truth transcription of the corresponding utterance.” Page 2, section 2 of Li describes that the synthetic speech is based on a ground truth transcript.  The resulting synthetic data is a sequence of output labels based on the ground-truth transcription.
With respect to Claims 14 and 17-23, system Claim 14 and method Claim 1 are related as a system programmed to perform the same method, with each claimed product step function corresponding to each claimed method step. Further, Shang describes data processing hardware (processor, paragraph 10) and memory hardware (memory, paragraph 10).  Accordingly, Claims 14 and 17-23 are similarly rejected under the same rationale as applied above with respect to Claims 1 and 4-10.

7.	Claims 2, 3, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Shang and further in view of US Pat. No. 10,628,688 (Kim et al., hereinafter “Kim”).
With regard to Claim 2, Section 3.3 of Li describes generating probability distributions over synthetic and natural speech hypotheses, but Li in view of Shang does not explicitly describe “the operations further comprise, at each of the plurality of output steps for each training utterance pair in the set of training utterance pairs:
generating a non-synthetic speech loss term based on the first probability distribution over possible non-synthetic speech recognition hypotheses for the corresponding non-synthetic speech representation of the corresponding utterance and a ground-truth transcription of the corresponding utterance; (see Kim, column 2, lines 30-35) and
generating a synthetic speech loss term based on the second probability distribution over possible synthetic speech recognition hypotheses for the corresponding synthetic speech representation of the corresponding utterance and the ground-truth transcription of the corresponding utterance.”  
However, column 2, lines 30-35 of Kim describes generating a loss term based on the first probability distribution over possible recognition hypotheses and ground-truth data.  Thus, combining Kim with Li’s description of generating probability distributions over synthetic and natural speech hypotheses would create the subject matter of Claim 2.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the distribution calculations as described by Kim into the system of Li in view of Shang to generate estimated relational information, as described at column 2, lines 30-35 of Kim.
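For illustration only, the two supervised loss terms recited in Claim 2 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not the method of Kim, Li, or the application: a per-step cross-entropy against the ground-truth label is assumed as the loss form, one ground-truth label per output step is assumed (a simplification that ignores transducer alignments), and the function name `cross_entropy` and all numeric values are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(
    step_distributions: Sequence[Sequence[float]],
    target_labels: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Mean negative log-probability assigned to the ground-truth label per step."""
    if not target_labels or len(step_distributions) != len(target_labels):
        raise ValueError("need one distribution per ground-truth label")
    total = 0.0
    for dist, label in zip(step_distributions, target_labels):
        # eps guards against log(0); it is an illustrative choice.
        total -= math.log(dist[label] + eps)
    return total / len(target_labels)


# Hypothetical ground-truth transcription as label indices, one per output step.
ground_truth = [0, 1]

# Hypothetical model outputs for the two representations of the same utterance.
real_distributions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
synth_distributions = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

# One loss term per representation, both scored against the same transcription.
real_loss = cross_entropy(real_distributions, ground_truth)
synth_loss = cross_entropy(synth_distributions, ground_truth)
```

Both terms are computed against the same ground-truth transcription, so they differ only through the probability distributions produced for the two representations.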
With regard to Claim 3, Li does not explicitly describe this subject matter.  However, Shang describes “the parameters of the speech recognition model are updated based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair in the set of training utterance pairs independently of the non-synthetic and synthetic speech loss terms generated at each of the plurality of output steps for each training utterance pair in the set of training utterance pairs.”  Paragraph 120 of Shang describes that the consistent loss is used to update the model.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the loss calculation as described by Shang into the system of Li to provide improved updates of the model during training, as described at paragraph 120 of Shang.
With respect to Claims 15 and 16, system Claim 14 and method Claim 1 are related as a system programmed to perform the same method, with each claimed product step function corresponding to each claimed method step. Further, Shang describes data processing hardware (processor, paragraph 10) and memory hardware (memory, paragraph 10).  Accordingly, Claims 15 and 16 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 3.

8.	Claims 11-13 and 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Shang and further in view of US Pat. App. Pub. No. 20200349927 (Stoimenov et al., hereinafter “Sto”).
With regard to Claim 11, Li in view of Shang does not explicitly describe this subject matter.  However, Sto describes “the speech recognition model comprises a frame-alignment-based transducer model.”  Paragraph 21 describes the use of an RNN-T language model, which is a type of frame-alignment-based transducer model.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN-T as described by Sto into the system of Li in view of Shang to help detect custom words, as described at paragraph 21 of Sto.
With regard to Claim 12, Li in view of Shang does not explicitly describe this subject matter.  However, Sto describes “the frame-alignment-based transducer model comprises a Recurrent Neural Network-Transducer (RNN-T) model.” Paragraph 21 describes the use of an RNN-T language model.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN-T as described by Sto into the system of Li in view of Shang to help detect custom words, as described at paragraph 21 of Sto.
With regard to Claim 13, Li in view of Shang does not explicitly describe this subject matter.  However, Sto describes “the speech recognition model comprises an alignment-based encoder-decoder model.” Paragraph 21 describes the use of an RNN-T language model, which is a type of alignment-based encoder-decoder model.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the RNN-T as described by Sto into the system of Li in view of Shang to help detect custom words, as described at paragraph 21 of Sto.
With respect to Claims 24-26, system Claim 14 and method Claim 1 are related as a system programmed to perform the same method, with each claimed product step function corresponding to each claimed method step. Further, Shang describes data processing hardware (processor, paragraph 10) and memory hardware (memory, paragraph 10).  Accordingly, Claims 24-26 are similarly rejected under the same rationale as applied above with respect to Claims 11-13.

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Pat. App. Pub. No. 20210224606 (Lee et al.) also describes the use of Kullback-Leibler loss.
10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656