Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Examiner’s Reasons for Allowance
Claims 1-20 are allowed. 
Claim 1: A computer-implemented method for training a transcription neural network, the method comprising:
inputting an utterance that comprises a set of spectrogram frames covering time steps of the utterance into a first layer of the transcription neural network that evaluates, for each time step of a set of time steps, a spectrogram frame from the set of spectrogram frames and an associated context of one or more spectrogram frames;
obtaining predicted character probabilities for the utterance from the transcription neural network;
using the predicted character probabilities for the utterance and a corresponding ground truth transcription for the utterance to determine a loss in predicting the corresponding ground truth transcription for the utterance; and
updating one or more parameters of the transcription neural network using a gradient based upon the loss in predicting the utterance.
The following is an examiner's statement of reasons for allowance:
Regarding claim 1, the prior art of record, specifically Huang et al. (US Patent Application Publication No. 20140257805), teaches a deep neural network (DNN) trained with multilingual training data, as well as performing a recognition task using a DNN trained with multilingual training data (paragraph [0020]).
Deng et al. (US 20120065976) teaches that speech recognition system 102 comprises a context-dependent Deep Belief Network (DBN)-Hidden Markov Model (HMM) system 106. A DBN is a probabilistic generative model with multiple layers of stochastic hidden units above a single bottom layer of observed variables that represent a data vector. Feed-forward artificial neural networks (ANNs) whose weights have been initialized by way of a pretraining phase, as described in Deng, can also be considered DBNs (paragraph [0019]).
However, none of the cited prior art, alone or in combination, provides the motivation to teach obtaining predicted character probabilities for the utterance from the transcription neural network; using the predicted character probabilities for the utterance and a corresponding ground truth transcription for the utterance to determine a loss in predicting the corresponding ground truth transcription for the utterance; and updating one or more parameters of the transcription neural network using a gradient based upon the loss in predicting the utterance.
Appl. No. 16/542,243; Atty. Docket No. 28888-1910D (BN150625USN1-DIV1); Office Action Date 25 August 2021
Claim 10: A computer-implemented method for transcribing speech comprising:
generating a set of spectrogram frames for an input audio;
inputting the set of spectrogram frames into a transcription neural network;
obtaining predicted character probabilities outputs from the transcription neural network; and
decoding a predicted transcription of the input audio using the predicted character probabilities outputs from the transcription neural network constrained by a language model that interprets a string of characters from the predicted character probabilities outputs as a word or words.
The following is an examiner's statement of reasons for allowance:
Regarding claim 10, the prior art of record, specifically Huang et al. (US Patent Application Publication No. 20140257805), teaches a deep neural network (DNN) trained with multilingual training data, as well as performing a recognition task using a DNN trained with multilingual training data (paragraph [0020]).
Deng et al. (US 20120065976) teaches that speech recognition system 102 comprises a context-dependent Deep Belief Network (DBN)-Hidden Markov Model (HMM) system 106. A DBN is a probabilistic generative model with multiple layers of stochastic hidden units above a single bottom layer of observed variables that represent a data vector. Feed-forward artificial neural networks (ANNs) whose weights have been initialized by way of a pretraining phase, as described in Deng, can also be considered DBNs (paragraph [0019]).
However, none of the cited prior art, alone or in combination, provides the motivation to teach inputting the set of spectrogram frames into a transcription neural network; obtaining predicted character probabilities outputs from the transcription neural network; and decoding a predicted transcription of the input audio using the predicted character probabilities outputs from the transcription neural network constrained by a language model that interprets a string of characters from the predicted character probabilities outputs as a word or words.
Claim 16: A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more processors, causes steps to be performed comprising:
generating a set of spectrogram frames for an input audio;
inputting the set of spectrogram frames into a transcription neural network;
obtaining predicted character probabilities outputs from the transcription neural network; and
decoding a predicted transcription of the input audio using the predicted character probabilities outputs from the transcription neural network constrained by a language model that interprets a string of characters from the predicted character probabilities outputs as a word or words.
The following is an examiner's statement of reasons for allowance:
Regarding claim 16, the prior art of record, specifically Huang et al. (US Patent Application Publication No. 20140257805), teaches a deep neural network (DNN) trained with multilingual training data, as well as performing a recognition task using a DNN trained with multilingual training data (paragraph [0020]).
Deng et al. (US 20120065976) teaches that speech recognition system 102 comprises a context-dependent Deep Belief Network (DBN)-Hidden Markov Model (HMM) system 106. A DBN is a probabilistic generative model with multiple layers of stochastic hidden units above a single bottom layer of observed variables that represent a data vector. Feed-forward artificial neural networks (ANNs) whose weights have been initialized by way of a pretraining phase, as described in Deng, can also be considered DBNs (paragraph [0019]).
However, none of the cited prior art, alone or in combination, provides the motivation to teach generating a set of spectrogram frames for an input audio; inputting the set of spectrogram frames into a transcription neural network; obtaining predicted character probabilities outputs from the transcription neural network; and decoding a predicted transcription of the input audio using the predicted character probabilities outputs from the transcription neural network constrained by a language model that interprets a string of characters from the predicted character probabilities outputs as a word or words.
Conclusion
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON, can be reached at 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
08/26/2022