DETAILED ACTION
Response to Amendment
Claims 1-27 are pending. Claims 1-27 are amended directly or by dependency on an amended claim.
Response to Arguments
Applicant’s arguments, see pages 8-9, filed April 28, 2021, with respect to the 35 USC 103 rejections of claims 1-27, along with accompanying amendments received on the same date, have been fully considered and are persuasive.  The 35 USC 103 rejections of claims 1-27 have been withdrawn. 
Allowable Subject Matter
Claims 1-27 are allowed.
The following is an examiner’s statement of reasons for allowance: Applicant’s arguments, as noted above with respect to the cited prior art, were persuasive. Applicant’s representative noted in the interview that support for these amendments can be found in Fig. 8D and paragraphs 206-210 of the specification. The following art is also cited but is not sufficient to disclose, teach, or fairly suggest the subject matter of the independent claims:

“REGULARIZATION OF CONTEXT-DEPENDENT DEEP NEURAL NETWORKS WITH CONTEXT-INDEPENDENT MULTI-TASK TRAINING”: We propose a DNN acoustic model which jointly predicts both CD and CI units using multitask learning

“Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition”: There is an efficient unsupervised algorithm, first described in [24], for learning the connection weights in a DBN that is equivalent to training each adjacent pair of [...] unsupervised pre-training phase, Hinton et al. [24] used the up-down algorithm to optimize all of the DBN weights jointly

“CONTEXT DEPENDENT PHONE MODELS FOR LSTM RNN ACOUSTIC MODELLING”: Section 3.2 describes our system architecture and data, while Section 4 describes initial experiments with context independent (CI) models and further experiments with duration modelling and CD phone models. The weights of all layers are randomly initialized prior to training.

US 10229672 B1: The training process may include "flat start" training, in which the CTC models do not rely on any other acoustic model (e.g., GMM, DNN, or other non-CTC acoustic model) and do not use any previously determined phonetic alignments or CD phone information. The CTC acoustic models may be trained from acoustic data and non-phonetically-aligned word-level transcriptions of the acoustic data. For example, the CI phone may be trained directly from written-domain word transcripts by aligning with all possible phonetic verbalizations; In another general aspect, a method of training an acoustic model includes: accessing, by the one or more computers, training data including audio data corresponding to utterances and transcriptions for the utterances; training, by the one or more computers, a first connectionist temporal classification (CTC) acoustic model to indicate labels for context-independent states; using, by the one or more computers, the first CTC acoustic model to determine approximate alignments between the audio data in the training data and phonetic sequences corresponding to the transcriptions in the training data; determining a set of [...]; First, the connectionist temporal classification (CTC) technique is used to train a model with context independent (CI) phones directly from written-domain word transcripts by aligning with all possible phonetic verbalizations. Then, a set of CD phones is generated using the CTC CI phone model alignments and train a CD phone model to improve the accuracy. This end-to-end training process does not require any previously trained GMM-HMM or DNN model for CD phone generation or alignment, and thus significantly reduces the overall model building time; Using the generated CD phone inventory, the computing system trains a unidirectional CTC model (e.g., a CD-CTC-sMBR model) predicting the CD phones identified using the trained bidirectional CTC model; A context dependency transducer, D, is constructed from the CD phone inventory, that maps CD phones to CI phones

US 20150127327 A1: The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network; At a given parent node of the one or more decision trees, a training frame corresponding to a context independent state can be assigned to one of a plurality of context dependent child nodes.  The multiple context-dependent states can be derived from the plurality of context-independent states using divisive, likelihood-based K-means clustering 

US 20210065683 A1: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the second attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker and simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model

US 20140142929 A1: In various embodiments, the model striping 122 may be applied more frequently to the training of context-dependent DNNs because in context-independent DNNs the top layer size is typically much smaller than that in the context-dependent DNNs.  By implementing model striping with respect to the top layer 114(N), the input v.sup.l of the top layer 114(N) may be distributed across the multi-core processors 108(1)-108(N) in forward propagation, in which each of the multi-core processors 108(1)-108(N) may compute a slice of the output vector

US 20180173240 A1: For example, the behavior prediction module 308 may invoke one or more machine learning models or algorithms, for example recurrent neural network (RNN) model, to learn from and make prediction on data provided by the vehicle-independent and/or vehicle-dependent information, thereby training the model(s) to predict one or more trajectories [...] training set of vehicle-independent and/or vehicle-dependent information.

US 20170160813 A1: From this training data, the training engine 1607 may derive speaker-dependent models, in the same way that the training engine 1607 derives speaker-independent models.  In these implementations, the system includes a combining engine 1632, which, when the system 1600 is in operative mode, can combine the speaker-dependent models with the speaker-adapted models

US 9953634 B1: The speaker-dependent keyword sensing model 330 can replace the speaker-independent keyword sensing model 320 after the training is complete or substantially complete (see FIG. 3B, for example.) In some embodiments, multiple utterances are collected in a batch mode during mobile device operation and can be used to train the speaker-dependent keyword sensing model 330 in a background process.

US 9477925 B2: The training of the DNNs 112 may be achieved by pipelining computations of back-propagation in a parallelized fashion (i.e., simultaneously executing multiple computations)

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084.  The examiner can normally be reached on 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH, can be reached at (571)272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M ENTEZARI/
Primary Examiner, Art Unit 2661