DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed September 12, 2022 (herein “Amendment”), with respect to the rejection of claims 7 and 17 under 35 U.S.C. 112(b) have been fully considered and are persuasive.  The rejection of claims 7 and 17 under 35 U.S.C. 112(b) has been withdrawn. 
Applicant’s arguments and amendments in the Amendment with respect to the rejection(s) of claim(s) 1 and 11, and claims depending therefrom under 35 U.S.C. 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Gong et al., "Sentence-wise smooth regularization for sequence to sequence learning", Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, No. 01, 2019.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 7-12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 11,037,547 B2, herein “Wang”) in view of Laurent et al., "Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions," Computer Speech and Language, Elsevier, 2014, 28 (4), pp. 979-996, 10.1016/j.csl.2014.02.006, hal-01433238 (herein “Laurent NPL”), further in view of Gong et al., "Sentence-wise smooth regularization for sequence to sequence learning," Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, No. 01, 2019 (herein “Gong NPL”).
Regarding claim 1, Wang teaches a computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations comprising (Wang col. 2, ll. 4-10, a processor that operates (method) as instructed by program code, an attention based end-to-end training of an automatic speech recognition system): 
Regarding claim 11, Wang teaches a system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising (Wang col. 2, ll. 4-10, and 27-30, a processor that operates as instructed by program code/instructions stored on a non-transitory computer readable medium, an attention based end-to-end training of an automatic speech recognition system):
Regarding claims 1 and 11, Wang teaches training a speech recognition model with a minimum word error rate loss function by (Wang col. 7, ll. 1-2, and col. 8, ll. 40-55, equation 12, a method for training an automatic speech recognition system that uses a token-wise training in beam (TWTiB) loss function, where tokens can represent words (such as “wreck” in the given example of “It is not easy to wreck a nice beach”)): 
receiving a training example (Wang col. 7, ll. 30-41, in the training process, X is input audio data, and r is a “reference token”/golden/labeled token); 
generating a plurality of hypotheses corresponding to the training example, each hypothesis of the plurality of hypotheses representing the proper noun (Wang col. 8, ll. 20-45, token-wise training (TWT) is applied on multiple hypotheses, where yt is an output token (generated) of a hypothesis j in a beam; Wang col. 1, ll. 9-10 teaches that tokens are mapped to words, and proper nouns are words (although Wang does not disclose proper nouns, as discussed below)) and comprising a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun (Wang col. 8, ll. 33-35, each hypothesis has its own posterior probability indicating whether it is the best hypothesis (likelihood of representing the correct output)); 
determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria, the penalty criteria indicating that: the corresponding probability satisfies a probability threshold (Wang col. 8, ll. 33-35, a hypothesis that includes the first wrong token ytw, and thus has a penalty/incorrectness associated with it, is selected (determining) as the hypothesis with the highest posterior probability (satisfies a probability threshold), that is, it is the one-best hypothesis); and the associated hypothesis incorrectly represents the proper noun (Wang col. 8, ll. 33-51, fig. 4, the hypothesis with the highest posterior probability incorrectly has the word “wreck” as a token to represent the output, when the reference/correct token is “rec” of “recognize”); and 
applying a penalty to the loss function (Wang col. 8, ll. 35-45, as given in equation 12, the overall loss function L(θ) is the summation of the loss functions lθ for individual tokens, and as such, the loss function for the selected hypothesis with the first wrong token contributes (with its error) to the summation, thus a penalty).
While Wang teaches using a loss function in its ASR training, Wang does not explicitly teach that the loss function is a “minimum word error rate” loss function. However, Wang teaches in col. 3, ll. 7-9 that the disclosed token-wise training method is flexible and can be combined with a variety of loss functions, and Wang teaches in col. 1, ll. 22-27 a loss function defined as an expected number of word errors. Therefore, although Wang does not explicitly teach combining the expected-number-of-word-errors loss function with the loss function disclosed in the embodiment of fig. 3, cited above, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the expected-number-of-word-errors loss function disclosed in the background section of Wang with the loss function for speech recognition training disclosed in the fig. 3 embodiment of Wang, at least because doing so would provide a training criterion close to commonly used evaluation metrics (see Wang col. 1, ll. 7-13), and because such a combination amounts to combining prior art elements according to known methods to yield predictable results. See MPEP § 2143(I)(A).
Further, while Wang does disclose using a training corpus, which likely contains proper noun samples, Wang does not explicitly disclose “a training example comprising a proper noun.” Accordingly, while Wang does teach predicted output tokens and reference tokens, Wang does not teach that the tokens are proper nouns.
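For illustration only, a loss defined as an expected number of word errors over a list of hypotheses can be sketched in Python. This is a generic sketch under stated assumptions, not a reproduction of Wang’s equation 12 or of the applicant’s implementation: the n-best format (hypothesis text paired with a log probability), the renormalization of probabilities over the n-best list, the optional subtraction of the average error count as a baseline, and the function names `word_errors` and `expected_word_errors` are all hypothetical.

```python
import math


def word_errors(hyp: str, ref: str) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # cost of producing each ref prefix from an empty hyp
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)  # deleting the first i hyp words
        for j, rw in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete hyp word hw
                cur[j - 1] + 1,            # insert ref word rw
                prev[j - 1] + (hw != rw),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def expected_word_errors(nbest, ref, subtract_mean=False):
    """Expected word errors over an n-best list of (text, log_prob) pairs.

    Hypothetical assumption: probabilities are renormalized over the n-best
    list only; subtract_mean optionally uses the average error count as a baseline.
    """
    m = max(lp for _, lp in nbest)
    weights = [math.exp(lp - m) for _, lp in nbest]  # shift by max for numerical stability
    z = sum(weights)
    probs = [w / z for w in weights]
    errs = [word_errors(text, ref) for text, _ in nbest]
    baseline = sum(errs) / len(errs) if subtract_mean else 0.0
    return sum(p * (e - baseline) for p, e in zip(probs, errs))


nbest = [("a b c", 0.0), ("a x c", 0.0)]
print(expected_word_errors(nbest, "a b c"))  # equal weights, 0 and 1 errors
```

With two equally likely hypotheses having zero and one word errors, the expected number of word errors is 0.5; with the average subtracted as a baseline, it becomes 0.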
Still further, while Wang teaches that a hypothesis with a highest probability is selected, Wang does not explicitly teach the probability “exceeds a value assigned to a probability threshold,” as claimed.
Laurent NPL teaches a training example comprising a proper noun (Laurent NPL page 17, section 7.1, a corpus used for training that includes annotations for named entities, allowing easy spotting of proper nouns).
Gong NPL teaches exceeds a value assigned to a probability threshold (Gong NPL section 3.1, page 6451, the prediction saturation principle for optimizing a loss function, which treats token predictions as good enough when their probability is larger than (exceeds) a threshold probability β).
Therefore, taking the teachings of Wang and Laurent NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training data of Wang to include the annotated named entity corpus data of Laurent NPL at least because doing so would provide a reduction in word error rate for an ASR system (see Laurent NPL page 2, section 1).
Therefore, taking the teachings of Wang and Gong NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training methodology of Wang to include consideration of token probabilities being greater than a threshold, as disclosed in Gong NPL, at least because doing so would provide smoother probabilities (reduced variance and difference between tokens), and consequently better performance (see Gong NPL page 6450, bottom left column, and page 6451, section 3.1, top left column).
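For illustration only, the penalty criteria as characterized in the rejection of claims 1 and 11 (the hypothesis probability exceeds a value assigned to a probability threshold, and the hypothesis incorrectly represents the reference) can be sketched as follows. The names `Hypothesis`, `satisfies_penalty_criteria`, and `penalized_loss`, the exact-string comparison used to decide incorrectness, and the additive per-hypothesis penalty are hypothetical assumptions, not taken from Wang, Gong NPL, or the application.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    prob: float  # likelihood that this hypothesis represents the reference


def satisfies_penalty_criteria(hyp: Hypothesis, reference: str, threshold: float) -> bool:
    # Criterion 1: the probability exceeds the value assigned to the threshold.
    # Criterion 2: the hypothesis incorrectly represents the reference
    # (hypothetical assumption: exact string mismatch).
    return hyp.prob > threshold and hyp.text != reference


def penalized_loss(base_loss, hyps, reference, threshold, penalty):
    # Hypothetical assumption: add a fixed penalty once per qualifying hypothesis.
    n = sum(1 for h in hyps if satisfies_penalty_criteria(h, reference, threshold))
    return base_loss + penalty * n


hyps = [
    Hypothesis("wreck a nice beach", 0.6),
    Hypothesis("recognize speech", 0.3),
    Hypothesis("wreck a nice peach", 0.1),
]
print(penalized_loss(2.0, hyps, "recognize speech", threshold=0.5, penalty=1.0))
```

In this example only the first hypothesis both exceeds the threshold of 0.5 and misrepresents the reference, so a single penalty is added and the result is 3.0.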
Regarding claims 2 and 12, Wang teaches wherein the corresponding probability satisfies the probability threshold when the corresponding probability is greater than the corresponding probabilities associated with the other hypotheses (Wang col. 8, ll. 33-39, the hypothesis that has the highest posterior probability, the one-best hypothesis, is selected; having the highest posterior probability means that its posterior probability is greater than those of all the other hypotheses).
As noted above in the rejection rationale for claims 1 and 11, and for the motivations given above also, Wang modified by Gong NPL teaches the probability exceeding the value assigned to the probability threshold.
Regarding claims 7 and 17, Wang teaches wherein the operations further comprise assigning the corresponding probability to each hypothesis of the plurality of hypotheses (Wang col. 8, ll. 33-35, each hypothesis has its own posterior probability indicating whether it is the best hypothesis (likelihood of representing the correct output)).
Regarding claims 8 and 18, Wang teaches wherein the operations further comprise: receiving an incorrect hypothesis (Wang col. 8, ll. 33-49, a hypothesis that includes a wrong token is selected); and assigning a respective probability to the incorrect hypothesis (Wang col. 8, ll. 33-35, the hypothesis with the wrong token has the highest posterior probability), wherein the penalty criteria further comprises an indication that the hypothesis includes the generated incorrect hypothesis (Wang col. 8, ll. 20-45, the loss function L(θ) includes a loss function for an output token that is the first wrong token, which is given a -log term in the loss function, thus an indication that the token it represents is incorrect).
Regarding claims 9 and 19, Wang teaches wherein the incorrect hypothesis comprises a phonetic similarity (Wang col. 8, ll. 48-51, fig. 4, the incorrect token is “wreck” in “wreck a nice” and the actual correct token is “rec” from “recognize”, where “wreck” and “rec” are phonetically similar).
While Wang discloses speech recognition generally, which could include training data that includes labeled proper nouns, Wang does not explicitly teach “the proper noun.”
Laurent NPL teaches the proper noun (Laurent NPL page 17, section 7.1, a corpus used for training that includes annotations for named entities, allowing easy spotting of proper nouns).
Therefore, taking the teachings of Wang and Laurent NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training data of Wang to include the annotated named entity corpus data of Laurent NPL at least because doing so would provide a reduction in word error rate for an ASR system (see Laurent NPL page 2, section 1).
Regarding claims 10 and 20, Wang teaches wherein the operations further comprise substituting the incorrect hypothesis for a generated hypothesis of the plurality of hypotheses (Wang col. 7, ll. 15-63, and col. 8, ll. 40-55, in cross-entropy training of the model, a first wrong token is corrected through updating the model based on the total loss, and an output token is substituted by a reference token).
Claims 3-6 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Laurent NPL in view of Gong NPL, as set forth above regarding claim 1, from which claim 3 depends, and claim 11, from which claim 13 depends, further in view of Sainath et al., “Two-Pass End-to-End Speech Recognition,” arXiv:1908.10992v1 [cs.CL], August 29, 2019, https://doi.org/10.48550/arXiv.1908.10992 (herein “Sainath NPL”).
Regarding claims 3 and 13, Wang does not teach the limitations of claims 3 or 13. Sainath NPL teaches [wherein the speech recognition model comprises a two-pass architecture – claim 3 only] comprising: a first pass network comprising a recurrent neural network transducer (RNN-T) decoder; and a second pass network comprising a listen-attend-spell (LAS) decoder [wherein the speech recognition model comprises the first pass network and the second pass network – claim 13 only]. (Sainath NPL section 2.1, two-pass end-to-end automatic speech recognition, in the first pass through an RNN-T decoder, in the second pass through an LAS decoder).
Therefore, taking the teachings of Wang as modified by Laurent NPL and Gong NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
Regarding claims 4 and 14, Wang does not teach the limitations of claims 4 and 14. Sainath NPL teaches [wherein the speech recognition model – claim 4 only] further compris[es – claim 4/ing – claim 14] a shared encoder [, the shared encoder encoding – claim 4/configured to encode – claim 14] acoustic frames for each of the first pass network and the second pass network (Sainath NPL section 2.1, in the first pass, each acoustic frame is passed through a shared encoder, and the output of the shared encoder is later passed to an LAS decoder in a second pass).
Therefore, taking the teachings of Wang as modified by Laurent NPL and Gong NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
Regarding claims 5 and 15, Wang does not teach the limitations of claims 5 and 15. Sainath NPL teaches wherein training with the minimum word error rate loss function occurs at the LAS decoder (Sainath NPL section 2.3.2, an additional training step is performed with the LAS decoder following an MWER (minimum word error rate) loss function).
Therefore, taking the teachings of Wang as modified by Laurent NPL and Gong NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
Regarding claims 6 and 16, Wang does not teach the limitations of claims 6 and 16. Sainath NPL teaches wherein the operations further comprise: training the RNN-T decoder; and prior to training the LAS decoder with the minimum word error rate loss function, training the LAS decoder while parameters of the trained RNN-T decoder remain fixed (Sainath NPL section 2.3.1, in the training process, the RNN-T decoder is trained in step 1, then the encoder trained in step 1 is frozen (parameters remain fixed) and then the LAS decoder is trained).
Therefore, taking the teachings of Wang as modified by Laurent NPL and Gong NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Wang et al., US 2020/0265831 A1, directed towards attention based end-to-end automatic speech recognition model training which generates n-best hypotheses, and uses a word based gradient with a loss function maximizing the distance between a reference sequence and a one-best hypothesis, to update the model.
Chiu et al., "State-of-the-Art Speech Recognition with Sequence-to-Sequence Models," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4774-4778, doi: 10.1109/ICASSP.2018.8462105. Chiu discloses minimum word error rate training for an automatic speech recognition model, including a loss function that uses the average number of word errors over all hypotheses in an n-best list.
He et al., “Streaming End-to-end Speech Recognition For Mobile Devices,” arXiv:1811.06621v1 [cs.CL], November 15, 2018, https://doi.org/10.48550/arXiv.1811.06621. He discloses end-to-end speech recognition using an RNN-T based model, and training it with a data set that includes proper noun samples.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656