DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 14 July 2022 in reference to application 16/963,837.  Claims 1, 2, and 4-21 are pending and have been examined.

Response to Amendment
The amendment filed 14 July 2022 has been accepted and considered in this Office action.  Claims 1, 2, 4, and 5 have been amended, claim 3 has been cancelled, and claims 6-21 have been added.

Response to Arguments
Applicant's arguments filed 14 July 2022 have been fully considered but they are not persuasive.  Applicant argues (see Remarks, pages 9-10) that Prabhavalkar does not specifically teach the limitation of “obtains a parameter set for an entirety of the speech recognition model by backpropagation, based on a loss in word error rate and a policy gradient approximating a gradient of loss, the entirety of the speech recognition model minimizing an expected value of summation of loss in the word error rates.”  The examiner respectfully disagrees.  Although Prabhavalkar does not specifically use the words “back propagation,” it is clear from the discussion in sections 3.1-3.3 that the models are trained during this method.  Sections 3.1-3.3 discuss gradient loss functions based on approximated word error rates, and training to minimize these loss functions.  Moreover, Su et al. (Error Back Propagation for Sequence Training of Context-Dependent Deep Networks for Conversational Speech Transcription), cited by Prabhavalkar, discusses in great detail the use of back propagation to train recognition networks using word error loss functions.  Prabhavalkar at section 3.1 teaches a loss gradient function, and section 3.3 teaches a loss which is based on word error rates.  Although Applicant may have different methods for determining these functions, these differences are not claimed.  For these reasons, the examiner finds that Prabhavalkar teaches the limitations as claimed.
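For illustration only, the kind of word-error-based loss and gradient discussed above can be sketched as follows.  This is a minimal sketch under stated assumptions, not the method of the cited reference or of Applicant: it assumes an N-best list of hypotheses with model log-probabilities, renormalizes those probabilities over the list, and weights each hypothesis's word error count relative to the list average.  The names word_errors and nbest_expected_word_error are hypothetical.

```python
import math

def word_errors(hyp, ref):
    """Word-level edit distance between two token lists (hypothetical helper)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # skip a hypothesis word
                           cur[j - 1] + 1,            # skip a reference word
                           prev[j - 1] + (h != r)))   # match or substitution
        prev = cur
    return prev[-1]

def nbest_expected_word_error(log_probs, hyps, ref):
    """Return (loss, grad), where grad is d(loss)/d(log_probs[i]).

    Assumption: the posterior over the N-best list is a softmax of the
    hypothesis log-probabilities, and the loss is the posterior-weighted
    word error count relative to the list average.
    """
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    post = [w / z for w in weights]                   # renormalized posterior
    errs = [word_errors(h.split(), ref.split()) for h in hyps]
    mean_err = sum(errs) / len(errs)
    loss = sum(p * (e - mean_err) for p, e in zip(post, errs))
    expected = sum(p * e for p, e in zip(post, errs))
    # Softmax derivative: hypotheses with above-expected errors get a
    # positive gradient, so gradient descent lowers their probability.
    grad = [p * (e - expected) for p, e in zip(post, errs)]
    return loss, grad

hyps = ["the cat sat", "the cat sad", "a cat sat down"]
loss, grad = nbest_expected_word_error([math.log(1 / 3)] * 3, hyps, "the cat sat")
```

In a neural network, a gradient of this form with respect to the hypothesis scores would be propagated back through the network's layers by the chain rule, which is what back propagation refers to.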

Claim Rejections - 35 USC § 102
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1, 4-11, and 15-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Prabhavalkar et al. (Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models).

Consider claim 1, Prabhavalkar teaches a learning device (abstract), comprising: 
extracting circuitry that extracts features of speech from speech data for training (section 2, parameterizing speech feature vectors); 
probability calculating circuitry that, based on the features of speech, performs prefix searching using a speech recognition model of which a neural network is representative, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings (section 3.2, determining n-best hypothesis labels with a neural network; section 2, using previously determined labels); 
error calculating circuitry that calculates an error by word error rates of the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire speech recognition model by backpropagation, on the basis of loss in word error rate and a policy gradient approximating the gradient of loss (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function, back feeding from ground truth, i.e., back propagation), the entirety of the speech recognition model minimizing an expected value of summation of loss in the word error rates (sections 2.2, 3.1, and 3.3, determining the loss function, or error, which is used to train the neural network); and 
updating circuitry that updates a parameter of the speech recognition model in accordance with the parameter obtained by the error calculating circuitry (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function).

Consider claim 4, Prabhavalkar teaches a learning method executed by a learning device (abstract), the method comprising: 
extracting features of speech from speech data for training (section 2, parameterizing speech feature vectors); 
based on the features of speech, performing prefix searching using a speech recognition model of which a neural network is representative, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings (section 3.2, determining n-best hypothesis labels with a neural network; section 2, using previously determined labels); 
calculating an error by word error rates of the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire speech recognition model by backpropagation, on the basis of loss in word error rate and a policy gradient approximating the gradient of loss (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function, back feeding from ground truth, i.e., back propagation), the entirety of the speech recognition model minimizing an expected value of summation of loss in the word error rates (sections 2.2, 3.1, and 3.3, determining the loss function, or error, which is used to train the neural network); and 
updating a parameter of the speech recognition model in accordance with the parameter obtained by the error calculating circuitry (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function).

Consider claim 5, Prabhavalkar teaches a non-transitory computer readable medium storing computer executable instructions, which when executed by a computer cause the computer to: 
extract features of speech from speech data for training (section 2, parameterizing speech feature vectors); 
based on the features of speech, perform prefix searching using a speech recognition model of which a neural network is representative, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings (section 3.2, determining n-best hypothesis labels with a neural network; section 2, using previously determined labels); 
calculate an error by word error rates of the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire speech recognition model by backpropagation, on the basis of loss in word error rate and a policy gradient approximating the gradient of loss (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function, back feeding from ground truth, i.e., back propagation), the entirety of the speech recognition model minimizing an expected value of summation of loss in the word error rates (sections 2.2, 3.1, and 3.3, determining the loss function, or error, which is used to train the neural network); and 
update a parameter of the speech recognition model in accordance with the parameter obtained by the error calculating circuitry (sections 2.2, 3.1, and 3.3, using gradient descent to train the neural network to minimize the loss function).

Consider claim 6, Prabhavalkar teaches the learning device according to claim 1, further comprising storage circuitry that stores the parameter of the speech recognition model updated in the updating (section 4, model is trained and implemented on a computing device. Inherently the model parameters must be stored in order to be used by the computer for recognition as described in section 4.).

Consider claim 7, Prabhavalkar teaches the learning device according to claim 6, wherein the storage circuitry further stores the parameter set (section 4, model is trained and implemented on a computing device. Inherently the model parameters must be stored in order to be used by the computer for recognition as described in section 4.).

Consider claim 8, Prabhavalkar teaches the learning device according to claim 1, wherein the error calculating circuitry obtains the parameter set by backpropagation starting from a gradient regarding a loss parameter to minimize loss in the summation of the word error rates at each character in a hypothetical character string using a set of smallest elements selected by a minimum operation that makes up a final word error count (sections 3-3.3, training the model to minimize the expected word errors over the sequence of output characters for each training sequence).

Consider claim 9, Prabhavalkar teaches the learning device according to claim 1, wherein the probability calculating circuitry calculates the posterior probability of the recognition character string based on the speech features extracted by the extracting circuitry and the parameter set for the speech recognition model (section 4, outputting probabilities of symbols based on parameterized input features).

Consider claim 10, Prabhavalkar teaches the learning device according to claim 7, wherein the probability calculating circuitry calculates the posterior probability of the recognition character string based on the speech features extracted by the extracting circuitry and the parameter set for the speech recognition model (section 4, outputting probabilities of symbols based on parameterized input features).

Consider claim 11, Prabhavalkar teaches the learning device according to claim 1, wherein the probability calculating circuitry outputs a character string, of the plurality of hypothetical character strings, having a highest probability of calculated probabilities as recognition results (section 3.2, decoding n-best lists for example).

Claim 15 contains similar limitations as claim 6 and is therefore rejected for the same reasons. 

Claim 16 contains similar limitations as claim 7 and is therefore rejected for the same reasons. 

Claim 17 contains similar limitations as claim 8 and is therefore rejected for the same reasons. 

Claim 18 contains similar limitations as claim 9 and is therefore rejected for the same reasons. 

Claim 19 contains similar limitations as claim 10 and is therefore rejected for the same reasons. 

Claim 20 contains similar limitations as claim 11 and is therefore rejected for the same reasons. 

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar in view of Gemmeke (US PAP 2018/015177).

Consider claim 2, Prabhavalkar teaches the learning device according to claim 1, but does not specifically teach wherein the probability calculating circuitry selects a character candidate following a prefix that is an object of searching, on the basis of a polynomial distribution in accordance with a co-occurrence probability of a character candidate following a prefix that is an object of searching.
In the same field of speech recognition, Gemmeke teaches wherein the probability calculating circuitry selects a character candidate following a prefix that is an object of searching, on the basis of a polynomial distribution in accordance with a co-occurrence probability of a character candidate following a prefix that is an object of searching (0065, 0069, and 0088, using multinomial distributions to model co-occurrence probabilities given a sequence of acoustic observations).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to use multinomial distributions as taught by Gemmeke in the system of Prabhavalkar in order to improve the quality of the speech recognition system (Gemmeke 0003).
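For illustration only, selecting the character that follows a prefix according to a multinomial (categorical) distribution can be sketched as follows.  This is a minimal sketch under stated assumptions and is not taken from either reference: next_char_probs is a hypothetical stand-in for a model that returns the probability of each candidate character given the prefix, toy_probs is a made-up toy distribution, and "$" is an assumed end-of-string symbol.

```python
import random

def sample_next_char(prefix, next_char_probs, rng):
    """Draw one candidate character following `prefix` from a multinomial
    distribution whose weights are the model's next-character probabilities."""
    probs = next_char_probs(prefix)
    chars = list(probs)
    return rng.choices(chars, weights=[probs[c] for c in chars], k=1)[0]

def sample_hypothesis(next_char_probs, rng, end="$", max_len=20):
    """Extend a prefix one sampled character at a time until the assumed
    end symbol is drawn or max_len characters have been produced."""
    prefix = ""
    while len(prefix) < max_len:
        c = sample_next_char(prefix, next_char_probs, rng)
        if c == end:
            break
        prefix += c
    return prefix

def toy_probs(prefix):
    """Hypothetical toy model: three characters, then the end symbol."""
    if len(prefix) >= 3:
        return {"$": 1.0}
    return {"a": 0.6, "b": 0.4}

hypothesis = sample_hypothesis(toy_probs, random.Random(0))
```

Repeating the sampling with different random draws yields a plurality of hypothetical character strings, in contrast to a search that always keeps only the highest-scoring candidates.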

Claims 12-14 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar in view of Senior et al. (An Empirical Study of Learning Rates in Deep Neural Networks for Speech Recognition).

Consider claim 12, Prabhavalkar teaches the learning device according to claim 1, but does not specifically teach the device further comprising determining circuitry that determines whether the parameter set has converged or not.
In the same field of training neural networks for speech recognition, Senior teaches determining circuitry that determines whether the parameter set has converged or not (section 2.1, model is trained to convergence of minimum error).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to detect convergence as taught by Senior in the system of Prabhavalkar in order to use a well-known method of determining when a neural network model has been optimized to minimize expected error rates.

Consider claim 13, Senior teaches the learning device according to claim 12, wherein in a case that the determining circuitry determines that the parameter set has converged, the updating circuitry stops updating of parameters of the speech recognition model (section 2.1, model is trained to convergence of minimum error, then training ends).

Consider claim 14, Senior teaches the learning device according to claim 12, wherein in a case that the determining circuitry determines that the parameter set has not converged, the updating circuitry continues to update parameters of the speech recognition model (section 2.1, model is trained to convergence of minimum error, i.e., training continues until convergence, then training ends).

Claim 21 contains similar limitations as claim 12 and is therefore rejected for the same reasons. 
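
For illustration only, the convergence behavior discussed for claims 12-14 can be sketched as a training loop that keeps updating parameters while they have not converged and stops once they have.  This is a minimal sketch under stated assumptions, not the procedure of Senior: the convergence test (largest parameter change below a tolerance tol), the learning rate lr, and the toy quadratic objective are all hypothetical choices.

```python
def train_until_converged(params, grad_fn, lr=0.1, tol=1e-6, max_steps=10_000):
    """Gradient descent with a convergence check.

    Returns (params, steps_taken, converged). Convergence is assumed to
    mean that no parameter moved by `tol` or more in the last update.
    """
    for step in range(1, max_steps + 1):
        grads = grad_fn(params)
        new_params = [p - lr * g for p, g in zip(params, grads)]
        change = max(abs(n - p) for n, p in zip(new_params, params))
        params = new_params
        if change < tol:
            # Converged: stop updating the parameters.
            return params, step, True
        # Not converged: continue with the next update.
    return params, max_steps, False

# Toy objective (p - 3)^2, whose gradient is 2 * (p - 3); minimum at p = 3.
final_params, steps, converged = train_until_converged(
    [0.0], lambda p: [2.0 * (p[0] - 3.0)]
)
```

Other convergence criteria, such as a held-out error rate that stops improving, would fit the same loop structure.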

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655