DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
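By way of illustration only, the three-prong test above can be read as a conjunction of three conditions. The sketch below is a simplification: the class name, field names, and function name are hypothetical labels for prongs (A) through (C), and the sketch does not model the rebuttable presumptions or the fact-specific analysis that MPEP § 2181 requires.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    # Hypothetical flags standing in for an examiner's findings on each prong.
    uses_means_or_placeholder: bool        # prong (A)
    modified_by_functional_language: bool  # prong (B)
    recites_sufficient_structure: bool     # prong (C) is met when this is False


def meets_three_prong_test(lim: Limitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    return (
        lim.uses_means_or_placeholder
        and lim.modified_by_functional_language
        and not lim.recites_sufficient_structure
    )


# A generic placeholder coupled with functional language and no structure.
placeholder_only = Limitation(True, True, False)
# The same wording, but with sufficient structure recited for the function.
with_structure = Limitation(True, True, True)

print(meets_three_prong_test(placeholder_only))  # True
print(meets_three_prong_test(with_structure))    # False
```

Under this reading, reciting sufficient structure for the claimed function is enough by itself to take a limitation outside 35 U.S.C. 112(f).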
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation is “a shared encoder configured to encode” in claim 14.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Specifically, claims 7 and 17 both recite the limitation “assigning the probability.” There is insufficient antecedent basis for this limitation in the claims. Claims 1 and 11, from which claims 7 and 17 respectively depend, do not provide antecedent basis for “the probability”; rather, they recite “the corresponding probability.” It is therefore unclear whether “the probability” in claims 7 and 17 is intended to refer back to “the corresponding probability” of claims 1 and 11, or to introduce a new probability altogether.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 7-12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., (US 11,037,547 B2, herein “Wang”) in view of Laurent et al., "Improving recognition of proper nouns (in ASR) through generation and filtering of phonetic transcriptions" Computer Speech and Language, Elsevier, 2014, 28 (4), pp.979-996. 10.1016/j.csl.2014.02.006. hal-01433238 (herein “Laurent NPL”).
Regarding claim 1, Wang teaches a computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations comprising (Wang col. 2, ll. 4-10, a processor that operates (method) as instructed by program code, an attention based end-to-end training of an automatic speech recognition system):
Regarding claim 11, Wang teaches a system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising (Wang col. 2, ll. 4-10, and 27-30, a processor that operates  as instructed by program code/instructions stored on a non-transitory computer readable medium, an attention based end-to-end training of an automatic speech recognition system):
Regarding claims 1 and 11, Wang teaches training a speech recognition model with a minimum word loss function by (Wang col. 7, ll. 1-2, and col. 8, ll. 40-55, equation 12, method for training an automatic speech recognition system, that uses a token wise training in beam (TWTiB) loss function, where tokens can represent words (such as “wreck” in the given example of “It is not easy to wreck a nice beach”)): 
receiving a training example (Wang col. 7, ll. 30-41, in the training process, X is input audio data, and r is a “reference token”/golden/labeled token); 
generating a plurality of hypotheses corresponding to the training example, each hypothesis of the plurality of hypotheses representing the (Wang col. 8, ll. 20-45, token-wise training (TWT) is applied on multiple hypotheses, where yt is an output token (generated) of a hypothesis j in a beam, and where col. 1, ll. 9-10 teach that tokens are mapped to words, which proper nouns are (although Wang does not disclose proper nouns as discussed below)) and comprising a corresponding probability that indicates a likelihood that the hypothesis represents the (Wang col. 8, ll. 33-35, each hypothesis has its own posterior probability indicating whether it is the best hypothesis (likelihood of representing the correct output));
determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria, the penalty criteria indicating that: the corresponding probability satisfies a probability threshold (Wang col. 8, ll. 33-35, a hypothesis, including the first wrong token ytw, thus having a penalty/incorrectness associated with the hypothesis, is selected (determining) as the hypothesis with the highest posterior probability (satisfies a probability threshold), that is, it is a one-best hypothesis); and the associated hypothesis incorrectly represents the (Wang col. 8, ll. 33-51, fig. 4, the hypothesis with the highest posterior probability incorrectly has the word “wreck” as a token to represent the output, when the reference/correct token is “rec” of “recognize”); and 
applying a penalty to the loss function (Wang col. 8, ll. 35-45, as given in equation 12, the overall loss function L(θ) is the summation of the loss functions for individual tokens  lθ, and as such, the loss function for the selected hypothesis with the first wrong token contributes (with its error) to the summation, thus a penalty).
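The summation relied upon above can be sketched as follows. This is a minimal illustration and not a reproduction of Wang's equation 12: it assumes a negative-log per-token loss, and the function names and probability values are hypothetical.

```python
import math


def token_loss(prob_of_reference_token: float) -> float:
    """Per-token loss: negative log of the probability given to the reference token."""
    return -math.log(prob_of_reference_token)


def total_loss(reference_token_probs: list[float]) -> float:
    """Overall loss as the sum of the per-token losses."""
    return sum(token_loss(p) for p in reference_token_probs)


# Hypothetical probabilities the model assigns to each reference token.
all_confident = [0.9, 0.9, 0.9]
one_wrong_token = [0.9, 0.1, 0.9]  # the model favored a wrong token at position 2

# The wrong token contributes a larger term, so the summed loss increases.
print(total_loss(one_wrong_token) > total_loss(all_confident))  # True
```

In this sketch the term contributed by the wrong token is what raises the total, which corresponds to the reading of the summed loss as applying a penalty.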
While Wang teaches using a loss function in its ASR training, Wang does not explicitly teach that the loss function is a “minimum word error rate” loss function. However, Wang teaches in col. 3, ll. 7-9 that the disclosed token-wise training method is flexible and can be combined with a variety of loss functions, and in col. 1, ll. 22-27 teaches a loss function defined as an expected number of word errors. Therefore, while Wang does not explicitly teach using the expected number of word errors as a loss function to be combined with the loss function disclosed in the embodiment of fig. 3, cited above, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the loss function of an expected number of word errors as disclosed in the background section of Wang with the loss function for the speech recognition training disclosed in the fig. 3 embodiment of Wang, at least because doing so would provide a training criteria close to commonly used evaluation metrics (see Wang col. 1, ll. 7-13) and would thus combine prior art elements according to known methods to yield predictable results. See MPEP § 2143(I)(A).
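For context, a loss defined as an expected number of word errors can be sketched as below. This is a generic illustration and not code from Wang: the function names are hypothetical, the word error count is a standard word-level edit distance, and renormalizing the probabilities over the n-best list is an assumption of the sketch.

```python
def word_errors(hypothesis: str, reference: str) -> int:
    """Word-level edit distance (substitutions, insertions, deletions)."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def expected_word_errors(nbest: dict[str, float], reference: str) -> float:
    """Probability-weighted average of word errors over an n-best list."""
    total = sum(nbest.values())  # renormalize over the list (assumption)
    return sum((p / total) * word_errors(h, reference) for h, p in nbest.items())


# Hypothetical two-entry n-best list with made-up probabilities.
nbest = {"a b": 0.5, "a c": 0.5}
print(expected_word_errors(nbest, "a b"))  # 0.5
```

Minimizing such a quantity shifts probability toward hypotheses with fewer word errors, which is why it tracks the word error rate metric used in evaluation.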
Further, while Wang does disclose using a training corpus, which more than likely has proper noun samples within it, Wang does not explicitly disclose “a training example comprising a proper noun.” Accordingly, while Wang does teach predicted output tokens, and reference tokens, Wang does not teach the tokens are proper nouns.
Laurent NPL teaches a training example comprising a proper noun (Laurent NPL page 17, section 7.1 corpus used for training including annotations for named entities allowing easy spotting of proper nouns).
Therefore, taking the teachings of Wang and Laurent NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training data of Wang to include the annotated named entity corpus data of Laurent NPL at least because doing so would provide a reduction in word error rate for an ASR system (see Laurent NPL page 2, section 1).
Regarding claims 2 and 12, Wang teaches wherein the corresponding probability satisfies the probability threshold when the corresponding probability is greater than the corresponding probabilities associated with the other hypotheses (Wang col. 8, ll. 33-39, a hypothesis that has the highest posterior probability, the one-best hypothesis, is selected; having the highest posterior probability means that it is greater than the posterior probabilities of all the other hypotheses).
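The one-best selection discussed above, together with the check that the selected hypothesis is incorrect, can be sketched as follows. The function names, hypothesis strings, and probability values are hypothetical and chosen only for illustration; they are not taken from Wang.

```python
def one_best(hypotheses: dict[str, float]) -> str:
    """Return the hypothesis whose probability is greater than all the others."""
    return max(hypotheses, key=hypotheses.get)


def satisfies_penalty_criteria(hypotheses: dict[str, float], reference: str) -> bool:
    """True when the highest-probability hypothesis differs from the reference."""
    return one_best(hypotheses) != reference


# Hypothetical beam with made-up posterior probabilities.
beam = {
    "it is not easy to wreck a nice beach": 0.6,
    "it is not easy to recognize speech": 0.4,
}
reference = "it is not easy to recognize speech"

print(one_best(beam))                               # it is not easy to wreck a nice beach
print(satisfies_penalty_criteria(beam, reference))  # True
```

Here the probability threshold is implicit: a hypothesis satisfies it exactly when no other hypothesis in the beam has a higher probability.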
Regarding claims 7 and 17, Wang teaches wherein the operations further comprise assigning the probability to each hypothesis of the plurality of hypotheses (Wang col. 8, ll. 33-35, hypothesis has its own posterior probability indicating whether it is the best hypothesis (likelihood of representing the correct output)).
Regarding claims 8 and 18, Wang teaches wherein the operations further comprise: receiving an incorrect hypothesis (Wang col. 8, ll. 33-49, a hypothesis that includes a wrong token is selected); and assigning a respective probability to the incorrect hypothesis (Wang col. 8, ll. 33-35, the hypothesis with the wrong token has a highest posterior probability), wherein the penalty criteria further comprises an indication that the hypothesis includes the generated incorrect hypothesis (Wang col. 8, ll. 20-45, loss function L(θ) including a loss function for an output token that is the first wrong token which is given a -log term in the loss function, thus an indication that the token it represents is incorrect).
Regarding claims 9 and 19, Wang teaches wherein the incorrect hypothesis comprises a phonetic similarity (Wang col. 8, ll. 48-51, fig. 4, the incorrect token is “wreck” in “wreck a nice” and the actual correct token is “rec” from “recognize”, where “wreck” and “rec” are phonetically similar).
While Wang discloses speech recognition generally, which could include training data that includes labeled proper nouns, Wang does not explicitly teach “the proper noun.”
Laurent NPL teaches the proper noun (Laurent NPL page 17, section 7.1 corpus used for training including annotations for named entities allowing easy spotting of proper nouns).
Therefore, taking the teachings of Wang and Laurent NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training data of Wang to include the annotated named entity corpus data of Laurent NPL at least because doing so would provide a reduction in word error rate for an ASR system (see Laurent NPL page 2, section 1).
Regarding claims 10 and 20, Wang teaches wherein the operations further comprise substituting the incorrect hypothesis for a generated hypothesis of the plurality of hypotheses (Wang col. 7, ll. 15-63, and col. 8, ll. 40-55, in cross-entropy training of the model, a first wrong token is corrected through updating the model based on the total loss, and an output token is substituted by a reference token).
Claims 3-6 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Laurent NPL, as set forth above regarding claim 1 from which claim 3 depends, further in view of Sainath et al., “Two-Pass End-to-End Speech Recognition,” arXiv:1908.10992v1 [cs.CL], August 29, 2019, https://doi.org/10.48550/arXiv.1908.10992 (herein “Sainath NPL”).
Regarding claims 3 and 13, Wang does not teach the limitations of claims 3 or 13. Sainath NPL teaches [wherein the speech recognition model comprises a two-pass architecture – claim 3 only] comprising: a first pass network comprising a recurrent neural network transducer (RNN-T) decoder; and a second pass network comprising a listen-attend-spell (LAS) decoder [wherein the speech recognition model comprises the first pass network and the second pass network – claim 13 only] (Sainath NPL section 2.1, two-pass end-to-end automatic speech recognition, with a first pass through an RNN-T decoder and a second pass through an LAS decoder).
Therefore, taking the teachings of Wang as modified by Laurent NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
Regarding claims 4 and 14, Wang does not teach the limitations of claims 4 and 14. Sainath NPL teaches [wherein the speech recognition model – claim 4 only] further compris[es – claim 4/ing – claim 14] a shared encoder [, the shared encoder encoding – claim 4/configured to encode – claim 14] acoustic frames for each of the first pass network and the second pass network (Sainath NPL section 2.1, in the first pass, each acoustic frame is passed through a shared encoder, then later the output of the shared encoder is passed to an LAS decoder in a second pass).
Therefore, taking the teachings of Wang as modified by Laurent NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
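The data flow of a two-pass arrangement with a shared encoder, as characterized above, can be sketched as follows. This is a schematic only: the function and type names are hypothetical, the toy callables are not real models, and passing the first-pass candidates to the second pass is an assumption of this sketch rather than a statement of how Sainath NPL implements the second pass.

```python
from typing import Callable, Sequence

Frames = Sequence[float]
Encoding = Sequence[float]


def two_pass_recognize(
    frames: Frames,
    shared_encoder: Callable[[Frames], Encoding],
    first_pass: Callable[[Encoding], list[str]],
    second_pass: Callable[[Encoding, list[str]], str],
) -> str:
    """Encode once, then feed the same encoding to both passes."""
    encoding = shared_encoder(frames)         # computed a single time
    candidates = first_pass(encoding)         # stands in for the RNN-T decoder
    return second_pass(encoding, candidates)  # stands in for the LAS decoder


# Toy stand-ins so the sketch runs; none of these are real models.
calls: list[str] = []


def toy_encoder(frames: Frames) -> Encoding:
    calls.append("encode")
    return [2 * f for f in frames]


def toy_first_pass(encoding: Encoding) -> list[str]:
    return ["hypothesis a", "hypothesis b"]


def toy_second_pass(encoding: Encoding, candidates: list[str]) -> str:
    return candidates[-1]


result = two_pass_recognize([0.1, 0.2], toy_encoder, toy_first_pass, toy_second_pass)
print(result)  # hypothesis b
```

The point of the sketch is that the encoder runs once and its output serves both decoders, which is the sense in which the encoder is shared.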
Regarding claims 5 and 15, Wang does not teach the limitations of claims 5 and 15. Sainath NPL teaches wherein training with the minimum word error rate loss function occurs at the LAS decoder (Sainath NPL section 2.3.2, an additional training step is performed with the LAS decoder following an MWER (minimum word error rate) loss function).
Therefore, taking the teachings of Wang as modified by Laurent NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
Regarding claims 6 and 16, Wang does not teach the limitations of claims 6 and 16. Sainath NPL teaches wherein the operations further comprise: training the RNN-T decoder; and prior to training the LAS decoder with the minimum word error rate loss function, training the LAS decoder while parameters of the trained RNN-T decoder remain fixed (Sainath NPL section 2.3.1, in the training process, the RNN-T decoder is trained in step 1, then the encoder trained in step 1 is frozen (parameters remain fixed) and then the LAS decoder is trained).
Therefore, taking the teachings of Wang as modified by Laurent NPL, and Sainath NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition disclosed in Wang with the automatic speech recognition processing disclosed in Sainath NPL at least because doing so would achieve a reduction in word error rate (see Sainath NPL Abstract).
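The staged training characterized above, in which previously trained parameters remain fixed while the LAS decoder is trained, can be sketched as follows. The parameter group names, the constant update, and the function name are hypothetical placeholders; restricting the final update to the LAS decoder is an assumption of the sketch, not a statement of Sainath NPL's procedure.

```python
def train_phase(
    params: dict[str, float], trainable: set[str], step: float = 1.0
) -> dict[str, float]:
    """Return updated parameters; only groups named in `trainable` change."""
    return {
        name: (value + step) if name in trainable else value
        for name, value in params.items()
    }


# Each float stands in for a whole group of model parameters.
params = {"shared_encoder": 0.0, "rnnt_decoder": 0.0, "las_decoder": 0.0}

# Step 1: train the first-pass model (encoder and RNN-T decoder).
after_step1 = train_phase(params, {"shared_encoder", "rnnt_decoder"})
# Step 2: hold the step-1 parameters fixed and train only the LAS decoder.
after_step2 = train_phase(after_step1, {"las_decoder"})
# Later step: a further LAS-only update standing in for MWER training.
after_mwer = train_phase(after_step2, {"las_decoder"})

print(after_step2["rnnt_decoder"] == after_step1["rnnt_decoder"])  # True
```

Excluding a parameter group from the trainable set is what it means here for those parameters to remain fixed during a later training step.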

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Wang et al., US 2020/0265831 A1, directed towards attention based end-to-end automatic speech recognition model training which generates n-best hypotheses, and uses a word based gradient with a loss function maximizing the distance between a reference sequence and a one-best hypothesis, to update the model.
Chiu et al., "State-of-the-Art Speech Recognition with Sequence-to-Sequence Models," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4774-4778, doi: 10.1109/ICASSP.2018.8462105. Chiu discloses minimum word error rate training for an automatic speech recognition model, including a loss function that uses the average number of word errors over all hypotheses in an n-best list.
He et al., “Streaming End-to-end Speech Recognition For Mobile Devices,” arXiv:1811.06621v1 [cs.CL], November 15, 2018, https://doi.org/10.48550/arXiv.1811.06621. He discloses end-to-end speech recognition using an RNN-T based model, and training it with a data set that includes proper noun samples.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656