DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 8, 2020 has been considered by the examiner, but fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed. In particular, a copy of DELCROIX, M. et al., "Context Adaptive Neural Network Based Acoustics Models for Rapid Adaptation," IEEE/ACM Transactions on Audio Speech and Language Processing, volume 376, Issue 5, published in May 2018, has not been filed. This reference citation has been lined out in the annotated IDS attached hereto.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6, 8, 12-18, 20, 22, and 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Ko et al., "A study on data augmentation of reverberant speech for robust speech recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5220-5224, doi: 10.1109/ICASSP.2017.7953152 (herein “Ko”) in view of Goto et al., (US 2020/0110994 A1, herein “Goto”).
Regarding claim 1, Ko teaches a method of training an acoustic model, wherein the training includes a data preparation phase and a training loop which follows the data preparation phase (Ko page 5221, left column, and page 5222, section 4, algorithm for augmenting data which is then used to train acoustic models), wherein the training loop includes at least one epoch (Ko pages 5222-5223, section 4, data augmented with reverberation effects and noise used to train the acoustic model at least 2 epochs as shown in table 1), said method including: 
in the data preparation phase, providing training data, wherein the training data are or include at least one example of audio data (Ko pages 5220-5221, training data for the acoustic model including simulated far-field speech (audio data) which is a speech signal augmented by a room impulse response (noise/distortion) and additive noise sources); 
augmenting the training data, thereby generating augmented training data (Ko page 5221, algorithm for simulating far-end speech which augments samples t of a speech database X with reverberation and noise to output a reverberated speech database (augmented training data)); and 
during each epoch of the training loop, using at least some of the augmented training data to train the model (Ko page 5223, table 1, the simulated reverb training data is used to train the acoustic model for at least 2 epochs).
While Ko discloses that multiple sets of training data are created by simulating far-end speech by augmenting speech signals, Ko does not explicitly teach that the simulation/data augmentation occurs “during the training loop” as claimed.
Goto teaches performing the data augmentation during the training loop (Goto para. 23, augmented dataset generated during each pass through the training loop).
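Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop as disclosed in Goto at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).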
Regarding claim 2, Ko teaches by augmenting at least some of the training data using different sets of augmentation parameters drawn from a plurality of probability distributions (Ko pages 5221-5222, in algorithm 1 where the training data is output by augmenting inputted speech data, several probability distributions are used for the reverberation augmentation where there are multiple RIR (room impulse response) sets (sets of augmentation parameters) used to simulate reverb in different sized rooms, and where the probability distribution of all different rooms is computed by accumulating RIR probabilities corresponding to the specific rooms).
Ko does not explicitly teach wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop.
Goto teaches wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop (Goto paras. 23 and 27, augmented data is generated during each pass through the training loop and shuffling is performed as part of data augmentation so that each batch of a set of N batches of training data is different for each epoch).
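Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop as disclosed in Goto at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).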
Regarding claims 3 and 17, Ko teaches wherein the training data are indicative of a plurality of utterances of a user (Ko pages 5220-5221, input data is from a speech database, where the simulated far-field speech resulting from data augmentation is disclosed as using a speech signal (plurality of utterances) and a particular RIR corresponding to the speaker (user) position).
Regarding claim 4, Ko teaches wherein the training data are indicative of features extracted from time domain input audio data (Ko page 5222, section 3, 40-dimensional MFCCs (features) were used in the experiment, where page 5221, Algorithm 1 discloses iterating for each recording x[t] (time domain input) in the speech database X), and the augmentation occurs in at least one feature domain (Ko page 5221, algorithm 1 teaching that the data augmentation is performed in the time domain).
Regarding claims 6 and 20, Ko teaches wherein the acoustic model is a speech analytics model or a noise suppression model (Ko page 5222, section 4 the acoustic model training for speech detection (speech analytics model) as the metric measured for the results is a word error rate).
Regarding claim 8, Ko teaches wherein said augmentation includes at least one of adding fixed spectrum stationary noise, adding variable spectrum stationary noise, adding noise including one or more random stationary narrowband tones, adding reverberation, adding non-stationary noise, adding simulated echo residuals, simulating microphone equalization, simulating microphone cutoff, or varying broadband level (Ko pages 5220-5221, generation of simulated training data by data augmentation of real speech samples with reverberation).
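For illustration only, the kind of time-domain augmentation discussed above (reverberating a speech signal with a room impulse response and then adding noise) can be sketched as follows. This is a minimal sketch and not Ko's Algorithm 1 or the applicant's method: the function names, the direct-form convolution, and the SNR-based noise scaling are assumptions introduced here.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_reverb_and_noise(speech, rir, noise, snr_db):
    """Reverberate `speech` with `rir`, then add `noise` at `snr_db`.

    Hypothetical helper: assumes `noise` is at least as long as `speech`
    and has non-zero power.
    """
    reverberated = convolve(speech, rir)[:len(speech)]
    noise = noise[:len(reverberated)]
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberated, noise)]
```

In such a sketch, the impulse response and the signal-to-noise ratio are the augmentation parameters; in practice they would be selected from a collection of measured or simulated room responses and noise recordings.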
Regarding claim 12, Ko teaches wherein the augmenting is performed in a manner determined in part from the training data (Ko page 5221, algorithm 1, the data augmenting algorithm includes iterating through the input speech database data, reverb impulse response and point source noise in time, where the training data output is time-based, thus the manner (time domain) is determined from the training data also being in the time domain).
Regarding claim 13, Ko teaches wherein the training is implemented by a control system (Ko page 5223, table 1 systems for outputting data from the acoustic modeling and training same shown by type of training data, type of augmentation performed and type of test data used), the training includes providing the training data to the control system (Ko page 5221, algorithm 1, a reverberated speech database is output of algorithm 1 using a Kaldi math library, where page 5222 teaches the acoustic models are trained on the simulated reverberated data (training data), thus this data being provided to the acoustic model), and the training produces a trained acoustic model (Ko pages 5222-5223, acoustic models are trained with data reverberated per algorithm 1, thus the end result after the multiple training epochs to be trained acoustic models).
Ko does not explicitly teach the control system includes one or more processors and one or more devices implementing non-transitory memory, and wherein the method includes: storing parameters of the trained acoustic model in one or more of the devices.
Goto teaches the control system includes one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17), and wherein the method includes: storing parameters of the trained acoustic model in one or more of the devices (Goto paras 15 and 21, machine learning system 120 including neural network 117 implemented by software and residing (storing) within memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para 12).
Regarding claim 14, Ko teaches a control system, wherein the control system is configured to perform the method of claim 1 (Ko pages 5221 and 5223, table 1 systems for outputting data from the acoustic modeling and training same shown by type of training data, type of augmentation performed and type of test data used, and using algorithm 1). Ko does not explicitly teach the remainder of claim 14.
Goto teaches an apparatus (Goto fig. 1, para. 10, server system 10), comprising an interface system (Goto para. 13, server equipped with input devices (interface)), and a control system including one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 15, Ko teaches a system configured for training an acoustic model (Ko page 5223, table 1, reverberated system of training data and training an acoustic model), wherein the training includes a data preparation phase and a training loop which follows the data preparation phase (Ko page 5221, left column, and page 5222, section 4, algorithm for augmenting data which is then used to train acoustic models), wherein the training loop includes at least one epoch (Ko pages 5222-5223, section 4, data augmented with reverberation effects and noise used to train the acoustic model at least 2 epochs as shown in table 1), said system including: 
configured to implement the data preparation phase, including by receiving or generating training data, wherein the training data are or include at least one example of audio data (Ko pages 5220-5221, training data for the acoustic model including simulated far-field speech (audio data) which is a speech signal augmented by a room impulse response (noise/distortion) and additive noise sources); 
configured to augment the training data, thereby generating augmented training data (Ko page 5221, algorithm for simulating far-end speech which augments samples t of a speech database X with reverberation and noise to output a reverberated speech database (augmented training data)), and to use at least some of the augmented training data to train the model during each epoch of the training loop (Ko page 5223, table 1, the simulated reverb training data is used to train the acoustic model for at least 2 epochs).
While Ko discloses that multiple sets of training data are created by simulating far-end speech by augmenting speech signals, Ko does not explicitly teach that the simulation/data augmentation occurs “during the training loop” as claimed.
Ko further does not teach a data preparation subsystem and a training subsystem coupled to the data preparation subsystem.
Goto teaches performing the data augmentation during the training loop (Goto para. 23, augmented dataset generated during each pass through the training loop).
Goto further teaches a data preparation subsystem and a training subsystem coupled to the data preparation subsystem (Goto fig. 1, paras. 12-17, data augmentation module which is part of (thus coupled to) neural network training module 120).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop and modules as disclosed in Goto and specifically cited to above, at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
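For illustration only, the distinction drawn above between augmenting once in a data preparation phase and augmenting during the training loop can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Ko, Goto, or the claimed invention: the `augment` and `train` functions, the choice of a uniform gain distribution and a Gaussian noise-level distribution, and the `train_step` callback are all hypothetical.

```python
import random


def augment(example, rng):
    """Hypothetical augmentation with parameters drawn from two distributions."""
    gain_db = rng.uniform(-6.0, 6.0)        # broadband level, uniform in dB
    noise_std = abs(rng.gauss(0.0, 0.01))   # noise level, half-Gaussian
    gain = 10 ** (gain_db / 20)
    return [gain * x + rng.gauss(0.0, noise_std) for x in example]


def train(training_data, num_epochs, train_step, seed=0):
    """Training loop that regenerates the augmented data in every epoch."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        # Augmentation happens inside the loop, so each epoch sees a
        # differently augmented version of the same prepared training data.
        augmented = [augment(ex, rng) for ex in training_data]
        rng.shuffle(augmented)
        for ex in augmented:
            train_step(ex, epoch)
```

By contrast, an offline arrangement would call `augment` once before the loop and reuse the same augmented examples in every epoch.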
Regarding claim 16, Ko teaches including by augmenting at least some of the training data using different sets of augmentation parameters drawn from a plurality of probability distributions (Ko pages 5221-5222, in algorithm 1 where the training data is output by augmenting inputted speech data, several probability distributions are used for the reverberation augmentation where there are multiple RIR (room impulse response) sets (sets of augmentation parameters) used to simulate reverb in different sized rooms, and where the probability distribution of all different rooms is computed by accumulating RIR probabilities corresponding to the specific rooms).
Ko does not explicitly teach that a training subsystem is configured to do so, or wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop.
Goto teaches a training subsystem so configured (Goto para. 12, neural network training module).
Goto further teaches wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop (Goto paras. 23 and 27, augmented data is generated during each pass through the training loop and shuffling is performed as part of data augmentation so that each batch of a set of N batches of training data is different for each epoch).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop and modules as disclosed in Goto and specifically cited to above, at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
Regarding claim 18, Ko teaches wherein the training data are indicative of features extracted from time domain input audio data (Ko page 5222, section 3, 40-dimensional MFCCs (features) were used in the experiment, where page 5221, Algorithm 1 discloses iterating for each recording x[t] (time domain input) in the speech database X), and configured to augment the training data in at least one feature domain (Ko page 5221, algorithm 1 teaching that the data augmentation is performed in the time domain).
Goto teaches a training subsystem so configured (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 22, Ko teaches wherein augment the training data including by performing at least one of adding fixed spectrum stationary noise, adding variable spectrum stationary noise, adding noise including one or more random stationary narrowband tones, adding reverberation, adding non-stationary noise, adding simulated echo residuals, simulating microphone equalization, simulating microphone cutoff, or varying broadband level (Ko pages 5220-5221, generation of simulated training data by data augmentation of real speech samples with reverberation).
Ko does not explicitly teach that a training subsystem is configured to do so.
Goto teaches a training subsystem so configured (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 26, Ko teaches wherein is configured to augment the training data in a manner determined in part from said training data (Ko page 5221, algorithm 1, the data augmenting algorithm includes iterating through the input speech database data, reverb impulse response and point source noise in time, where the training data output is time-based, thus the manner (time domain) is determined from the training data also being in the time domain).
Ko does not explicitly teach that a training subsystem is configured to do so.
Goto teaches a training subsystem so configured (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 27, Ko teaches is configured to produce a trained acoustic model (Ko pages 5222-5223, acoustic models are trained with data reverberated per algorithm 1, thus the end result after the multiple training epochs to be trained acoustic models).
Ko does not explicitly teach wherein the training sub-system includes one or more processors and one or more devices implementing non-transitory memory, and the training sub-system is configured to store parameters of the trained acoustic model in one or more of the devices.
Goto teaches the training sub-system includes one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17), and the training sub-system is configured to store parameters of the trained acoustic model in one or more of the devices (Goto paras 15 and 21, machine learning system 120 including neural network 117 implemented by software and residing (storing) within memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ko in view of Goto, as set forth above regarding claim 4 from which claim 5 depends, and as set forth above regarding claim 15 from which claim 19 depends, further in view of Terrance Devries ET AL: "Workshop track ICLR 2017 DATASET AUGMENTATION IN FEATURE SPACE", 17 February 2017 (2017-02-17), XP055617306 (cited in the IDS filed 12/7/2021, herein “Devries”).
Regarding claims 5 and 19, while Ko teaches that the training data is MFCC feature based, Ko does not explicitly teach the limitations of claims 5 and 19. Devries teaches wherein the feature domain is the Mel Frequency Cepstral Coefficient (MFCC) domain, or the log of the band power for a plurality of frequency bands (Devries page 4, section 3.2, in augmenting a dataset, each sample is projected into feature space, where section 4 teaches that all experiments used the projection into feature space, and page 6, section 4.3 discloses an experiment using mel-frequency cepstrum coefficients (MFCCs)).
Therefore, taking the teachings of Ko and Devries together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to use MFCCs in data augmentation as disclosed in Devries at least because doing so would allow for generating synthetic testing data that is more realistic (see Devries page 1, section 1).
Claims 7, 9, 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ko in view of Goto, as set forth above regarding claim 4 from which claim 5 depends, further in view of Peddinti et al., "JHU ASpIRE system: Robust LVCSR with TDNNS, iVector adaptation and RNN-LMS," 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015, pp. 539-546, doi: 10.1109/ASRU.2015.7404842 (herein “Peddinti”).
Regarding claim 7, Ko teaches wherein said training is or includes training a deep neural network (DNN), or a convolutional neural network (CNN), or a recurrent neural network (RNN), or an HMM-GMM acoustic model (Ko page 5222 section 3, a time-delay neural network (TDNN) was used for the acoustic model, where page 5223 teaches the TDNNs used are similar to the ones specified in reference 4 (which is the Peddinti reference)).
Peddinti teaches that the TDNN is a deep neural network (Peddinti page 540, figure 1, as shown, the TDNN has 4 layers, making it deep).
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the acoustic model of the TDNN of Ko to be a deep neural network as disclosed in Peddinti, at least because Ko teaches that it uses the TDNN of Peddinti, and because such a TDNN would allow for the neural network to deal with late reverberations in an audio signal with reverberation noise (see Peddinti page 539, Introduction).
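Regarding claim 9, Ko, as modified by Goto, performs said augmentation as disclosed above. Ko modified by Goto however does not teach “is implemented in or on one or more Graphics Processing Units (GPUs).” Peddinti teaches is implemented in or on one or more Graphics Processing Units (GPUs) (Peddinti Abstract, training of acoustic models using 32 GPUs).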
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use a GPU as disclosed in Peddinti, at least because doing so would exponentially decrease the learning rate schedule (see Peddinti page 541, section 2.5).
Regarding claim 21, Ko teaches wherein the is configured to train the model including by training a deep neural network (DNN), or a convolutional neural network (CNN), or a recurrent neural network (RNN), or an HMM-GMM acoustic model (Ko page 5222 section 3, a time-delay neural network (TDNN) was used for the acoustic model, where page 5223 teaches the TDNNs used are similar to the ones specified in reference 4 (which is the Peddinti reference)).
Peddinti teaches that the TDNN is a deep neural network (Peddinti page 540, figure 1, as shown, the TDNN has 4 layers, making it deep).
Ko modified by Peddinti does not teach the training subsystem.
Goto teaches the training subsystem (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Further, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the acoustic model of the TDNN of Ko to be a deep neural network as disclosed in Peddinti, at least because Ko teaches that it uses the TDNN of Peddinti, and because such a TDNN would allow for the neural network to deal with late reverberations in an audio signal with reverberation noise (see Peddinti page 539, Introduction).
Regarding claim 23, Ko, as modified by Goto, performs said augmentation as disclosed above, with the training subsystem configured to do same. Ko modified by Goto however does not teach “is implemented in or on one or more Graphics Processing Units (GPUs).” Peddinti teaches is implemented in or on one or more Graphics Processing Units (GPUs) (Peddinti Abstract, training of acoustic models using 32 GPUs).
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use a GPU as disclosed in Peddinti, at least because doing so would exponentially decrease the learning rate schedule (see Peddinti page 541, section 2.5).
Claims 10, 11, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Ko in view of Goto, as set forth above regarding claim 1 from which claim 10 depends, further in view of Cui et al., (US 2017/0200446 A1, herein “Cui”).
Regarding claim 10, Ko teaches wherein the training data are indicative of features comprising frequency bands, the features are extracted from time domain input audio data (Ko page 5222, 40-dimensional MFCCs are used for the acoustic model, thus the training data training the model also being MFCCs, which comprise frequency bands (Mel-Frequency cepstral coefficients) and where page 5221, Algorithm 1 teaches iterating through the speech database per time sample (time domain input)).
Ko does not explicitly teach the augmentation occurs in the frequency domain.
Cui teaches the augmentation occurs in the frequency domain (Cui fig. 2, paras. 25 and 29-30, SFM (disclosed as stochastic feature mapping, a data augmentation process in para. 19) using a feature space of the feature extraction pipeline, which includes transforming the speech input into a spectrum (frequency domain) using FFT 201).
Therefore, taking the teachings of Ko and Cui together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use the frequency feature domain as disclosed in Cui at least because doing so would reduce the word error rate of the data output and processed through the trained model (see Cui para. 52).
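For illustration only, the general notion of extracting frequency-band features from time-domain audio and then augmenting in the frequency domain can be sketched as follows. This is a generic sketch, not Cui's stochastic feature mapping, Ko's MFCC front end, or the applicant's method: the function names, the plain DFT, the band edges given as bin indices, and the assumption that uncorrelated noise power simply adds to speech power in each band are all introduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Power per DFT bin (bins 0..N/2) of one time-domain frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def log_band_power(frame, band_edges, floor=1e-12):
    """Natural log of the power in each band; band_edges are DFT bin indices."""
    spec = power_spectrum(frame)
    return [math.log(sum(spec[lo:hi]) + floor)
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def add_noise_to_bands(log_bands, noise_band_power):
    """Augment in the band-power domain, assuming noise power adds per band."""
    return [math.log(math.exp(lb) + nb)
            for lb, nb in zip(log_bands, noise_band_power)]
```

Because the noise is applied to the band powers rather than to the waveform, features extracted once in a data preparation step could be re-augmented repeatedly without recomputing the transform.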
Regarding claims 11 and 25, Ko teaches wherein the frequency bands each to occupy a constant proportion of the Mel spectrum, or are equally spaced in log frequency, or are equally spaced in log frequency with the log scaled such that the [...] (Ko page 5222, MFCCs (Mel-Frequency Cepstral Coefficients) are used in the experiments with training the acoustic models, thus for the training data, where MFCCs feature frequency bands equally spaced (constant proportion) on the Mel-frequency scale (Mel spectrum)).
Regarding claim 24, Ko teaches wherein the training data are indicative of features comprising frequency bands, extract the features from time domain input audio data (Ko page 5222, 40-dimensional MFCCs are used for the acoustic model, thus the training data training the model also being MFCCs, which comprise frequency bands (Mel-Frequency cepstral coefficients) and where page 5221, Algorithm 1 teaches iterating through the speech database per time sample (time domain input)).
Ko does not explicitly teach the augmentation occurs in the frequency domain.
Ko further does not explicitly teach the data preparation subsystem is configured to, or the training subsystem is configured to.
Goto teaches the data preparation subsystem is configured to, and the training subsystem is configured to (Goto para. 12, neural network training module and data augmentation module).
Cui teaches the augmentation occurs in the frequency domain (Cui fig. 2, paras. 25 and 29-30, SFM (disclosed as stochastic feature mapping, a data augmentation process in para. 19) using a feature space of the feature extraction pipeline, which includes transforming the speech input into a spectrum (frequency domain) using FFT 201).
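Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).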
Further, taking the teachings of Ko and Cui together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use the frequency feature domain as disclosed in Cui, at least because doing so would reduce the word error rate of the data output and processed through the trained model (see Cui para. 52).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Pratap et al., "Wav2Letter++: A Fast Open-source Speech Recognition System," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6460-6464, doi: 10.1109/ICASSP.2019.8683535. Pratap is directed towards an automatic speech recognition system that is trained with speech data that has undergone a data preparation step.
Lee et al., "Personalizing Recurrent-Neural-Network-Based Language Model by Social Network," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 3, pp. 519-530, March 2017, doi: 10.1109/TASLP.2016.2635445. Lee is directed towards personalizing language models for specific users with training data that is for a specific user by interpolating a background corpus according to a personal corpus.
Li et al., "Multi-stream Network With Temporal Attention For Environmental Sound Classification", arXiv:1901.08608v1 [cs.SD], Jan. 24, 2019. Li is directed towards training a neural network including a data augmentation step by mixing training samples.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner




/MICHELLE M KOETH/Primary Examiner, Art Unit 2656