DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed June 2, 2022 (herein “Amendment”) with respect to the rejections of claims 1-27 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, those rejections have been withdrawn. However, upon further consideration, a new ground of rejection for the independent claims is made in view of De Vries et al., "Workshop track ICLR 2017 Dataset Augmentation in Feature Space", 17 February 2017 (2017-02-17), XP055617306.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-6, 8, 10-17, 19-20, 22, and 24-28 are rejected under 35 U.S.C. 103 as being unpatentable over Ko et al., "A study on data augmentation of reverberant speech for robust speech recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5220-5224, doi: 10.1109/ICASSP.2017.7953152 (herein “Ko”) in view of Goto et al., (US 2020/0110994 A1, herein “Goto”) further in view of De Vries et al., "Workshop track ICLR 2017 Dataset Augmentation in Feature Space", 17 February 2017 (2017-02-17), XP055617306 (herein “De Vries NPL”).
Regarding claim 1, Ko teaches a method of training an acoustic model, wherein the training includes a data preparation phase and a training loop which follows the data preparation phase (Ko page 5221, left column, and page 5222, section 4, algorithm for augmenting data which is then used to train acoustic models), wherein the training loop includes at least one epoch (Ko pages 5222-5223, section 4, data augmented with reverberation effects and noise used to train the acoustic model at least 2 epochs as shown in table 1), said method including: 
in the data preparation phase, providing training data, wherein the training data are or include time domain input audio data (Ko pages 5220-5221, training data for the acoustic model including simulated far-field speech (audio data) which is a speech signal); 
augmenting, thereby generating augmented training data (Ko page 5221, algorithm for simulating far-field speech which augments samples t of a speech database X with reverberation and noise to output a reverberated speech database (augmented training data)); and 
during each epoch of the training loop, using at least some of the augmented training data to train the model (Ko page 5223, table 1, the simulated reverb training data is used to train the acoustic model for at least 2 epochs).
While Ko discloses that multiple sets of training data are created by simulating far-field speech by augmenting speech signals, Ko does not explicitly teach that the simulation/data augmentation occurs “during the training loop” as claimed.
Further, Ko does not disclose that one or more frequency domain features are extracted from the audio data used for training, therefore Ko does not explicitly teach “one or more frequency domain features are extracted from.”
Still further, while Ko does teach data augmentation, Ko does not teach that it is augmenting “the one or more frequency features,” “in the frequency domain.” 
Lastly, while Ko does teach training a model with augmented training data, Ko does not teach using the augmented training data “in the frequency domain.”
Goto teaches during the training loop (Goto para. 23, augmented dataset generated during each pass through the training loop).
De Vries NPL teaches one or more frequency domain features are extracted from (De Vries NPL section 4.3, mel-frequency cepstrum coefficients (MFCCs- a type of frequency domain feature) are extracted from the Arabic digits dataset of 8,800 samples from audio clips).
De Vries NPL further teaches “the one or more frequency features,” “in the frequency domain” (De Vries NPL sections 3.1, 3.2 and 4.3, data augmentation is performed upon the MFCCs using a sequence encoder and is performed in the feature space (domain), where the features given in 4.3 are MFCCs which are frequency domain features).
De Vries NPL also teaches “in the frequency domain” (De Vries NPL sections 4.3, and 3.2, the vectors newly created as output from the data augmentation process in the feature space are used directly for input into a learning task).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop as disclosed in Goto at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
Further, taking the teachings of Ko and De Vries NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to use MFCCs in data augmentation in feature space as disclosed in De Vries NPL at least because doing so would allow for generating synthetic testing data that is more realistic (see De Vries NPL page 1, section 1).
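For illustration only, the arrangement discussed above (features already in the frequency domain, augmented inside the training loop rather than once beforehand) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Ko, Goto, or De Vries NPL: the function names, the level-shift and noise parameters, and the `train_step` callback are all hypothetical, and the input frames are assumed to be log band powers in dB.

```python
import random


def augment_features(frames, rng, max_shift_db=6.0, noise_std_db=1.0):
    """Augment frequency-domain features (assumed: log band powers in dB).

    Applies one random broadband level shift to the whole utterance and
    independent Gaussian noise to every band value. Both parameters are
    hypothetical choices made for this sketch.
    """
    shift = rng.uniform(-max_shift_db, max_shift_db)
    return [[v + shift + rng.gauss(0.0, noise_std_db) for v in frame]
            for frame in frames]


def run_training_loop(frames, n_epochs, train_step, seed=0):
    """Generate fresh augmented features inside the loop, once per epoch."""
    rng = random.Random(seed)
    for epoch in range(n_epochs):
        augmented = augment_features(frames, rng)  # generated during the loop
        train_step(epoch, augmented)  # hypothetical model-update callback


# Usage: two frames of three band powers each, three epochs.
seen = []
features = [[-30.0, -25.0, -40.0], [-28.0, -22.0, -35.0]]
run_training_loop(features, 3, lambda epoch, batch: seen.append(batch))
```

In this sketch the unaugmented features are computed once, before the loop, and only the augmentation is repeated, so each epoch sees a differently perturbed copy of the same features.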
Regarding claim 2, Ko teaches by augmenting at least some of the training data using different sets of augmentation parameters drawn from a plurality of probability distributions (Ko pages 5221-5222, in algorithm 1 where the training data is output by augmenting inputted speech data, several probability distributions are used for the reverberation augmentation where there are multiple RIR (room impulse response) sets (sets of augmentation parameters) used to simulate reverb in different sized rooms, and where the probability distribution of all different rooms is computed by accumulating RIR probabilities corresponding to the specific rooms).
Ko does not explicitly teach wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop.
Goto teaches wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop (Goto paras. 23 and 27, augmented data is generated during each pass through the training loop and shuffling is performed as part of data augmentation so that each batch of a set of N batches of training data is different for each epoch).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop as disclosed in Goto at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
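For illustration only, drawing a different set of augmentation parameters from more than one probability distribution for each epoch can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Ko's algorithm 1 or Goto's training loop: the room categories, their weights, and the SNR range are hypothetical values chosen for the example.

```python
import random

# Hypothetical categorical distribution over room sizes (assumed weights).
ROOM_WEIGHTS = {"small": 0.5, "medium": 0.3, "large": 0.2}


def draw_augmentation_params(rng):
    """Draw one parameter set from two distributions.

    The room type comes from a categorical distribution and the
    signal-to-noise ratio from a uniform distribution (assumed 0-20 dB).
    """
    room = rng.choices(list(ROOM_WEIGHTS),
                       weights=list(ROOM_WEIGHTS.values()), k=1)[0]
    snr_db = rng.uniform(0.0, 20.0)
    return {"room": room, "snr_db": snr_db}


def params_per_epoch(n_epochs, n_utterances, seed=0):
    """Draw a new parameter set for every utterance in every epoch."""
    rng = random.Random(seed)
    return [[draw_augmentation_params(rng) for _ in range(n_utterances)]
            for _ in range(n_epochs)]


# Usage: two epochs, four utterances per epoch.
schedule = params_per_epoch(n_epochs=2, n_utterances=4)
```

Because the parameters are redrawn in every epoch, the same utterance is paired with different augmentation settings from one epoch to the next.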
Regarding claims 3 and 17, Ko teaches wherein the training data are indicative of a plurality of utterances of a user (Ko pages 5220-5221, input data is from a speech database, where the simulated far-field speech resulting from data augmentation is disclosed as using a speech signal (plurality of utterances) and a particular RIR corresponding to the speaker (user) position).
Regarding claims 5 and 19, while Ko teaches that the training data is MFCC feature based, Ko does not explicitly teach the limitations of claims 5 and 19. De Vries NPL teaches wherein the feature domain is the Mel Frequency Cepstral Coefficient (MFCC) domain, or the log of the band power for a plurality of frequency bands (De Vries NPL page 4, section 3.2, in augmenting a dataset, each sample is projected into feature space, where section 4 teaches that all experiments used the projection into feature space, and page 6, section 4.3 discloses an experiment using mel-frequency cepstrum coefficients (MFCCs)).
Therefore, taking the teachings of Ko and De Vries NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to use MFCCs in data augmentation as disclosed in De Vries NPL at least because doing so would allow for generating synthetic testing data that is more realistic (see De Vries NPL page 1, section 1).
Regarding claims 6 and 20, Ko teaches wherein the acoustic model is a speech analytics model or a noise suppression model (Ko page 5222, section 4 the acoustic model training for speech detection (speech analytics model) as the metric measured for the results is a word error rate).
Regarding claim 8, Ko teaches wherein said augmentation includes at least one of adding fixed spectrum stationary noise, adding variable spectrum stationary noise, adding noise including one or more random stationary narrowband tones, adding reverberation, adding non-stationary noise, adding simulated echo residuals, simulating microphone equalization, simulating microphone cutoff, or varying broadband level (Ko pages 5220-5221, generation of simulated training data by data augmentation of real speech samples with reverberation).
Regarding claim 10, Ko teaches wherein the training data are indicative of features comprising frequency bands (Ko page 5222, 40-dimensional MFCCs are used for the acoustic model, thus the training data training the model also being MFCCs, which comprise frequency bands (Mel-Frequency cepstral coefficients)).
Regarding claims 11 and 25, Ko teaches wherein the frequency bands each occupy a constant proportion of the Mel spectrum, are equally spaced in log frequency, or are equally spaced in log frequency with the log scaled such that the features represent the band powers in decibels (dB) (Ko page 5222, MFCCs (Mel-Frequency Cepstral Coefficients) are used in the experiments with training the acoustic models, thus for the training data, where MFCCs feature frequency bands equally spaced (constant proportion) on the Mel-frequency scale (Mel spectrum)).
Regarding claim 12, Ko teaches wherein the augmenting is performed in a manner determined in part from the training data (Ko page 5221, algorithm 1, the data augmenting algorithm includes iterating through the input speech database data, reverb impulse response and point source noise in time, where the training data output is time-based, thus the manner (time domain) is determined from the training data also being in the time domain).
Regarding claim 13, Ko teaches wherein the training is implemented by a control system (Ko page 5223, table 1 systems for outputting data from the acoustic modeling and training same shown by type of training data, type of augmentation performed and type of test data used), the training includes providing the training data to the control system (Ko page 5221, algorithm 1, a reverberated speech database is output of algorithm 1 using a Kaldi math library, where page 5222 teaches the acoustic models are trained on the simulated reverberated data (training data), thus this data being provided to the acoustic model), and the training produces a trained acoustic model (Ko pages 5222-5223, acoustic models are trained with data reverberated per algorithm 1, thus the end result after the multiple training epochs to be trained acoustic models).
Ko does not explicitly teach the control system includes one or more processors and one or more devices implementing non-transitory memory, and wherein the method includes: storing parameters of the trained acoustic model in one or more of the devices.
Goto teaches the control system includes one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17), and wherein the method includes: storing parameters of the trained acoustic model in one or more of the devices (Goto paras 15 and 21, machine learning system 120 including neural network 117 implemented by software and residing (storing) within memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para 12).
Regarding claim 14, Ko teaches a control system, wherein the control system is configured to perform the method of claim 1 (Ko pages 5221 and 5223, table 1 systems for outputting data from the acoustic modeling and training same shown by type of training data, type of augmentation performed and type of test data used, and using algorithm 1). Ko does not explicitly teach the remainder of claim 14.
Goto teaches an apparatus (Goto fig. 1, para. 10, server system 10), comprising an interface system (Goto para. 13, server equipped with input devices (interface)), and a control system including one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 15, Ko teaches a system configured for training an acoustic model (Ko page 5223, table 1, reverberated system of training data and training an acoustic model), wherein the training includes a data preparation phase and a training loop which follows the data preparation phase (Ko page 5221, left column, and page 5222, section 4, algorithm for augmenting data which is then used to train acoustic models), wherein the training loop includes at least one epoch (Ko pages 5222-5223, section 4, data augmented with reverberation effects and noise used to train the acoustic model at least 2 epochs as shown in table 1), said system including: 
configured to implement the data preparation phase, including by receiving or generating training data, wherein the training data are or include time domain input audio data (Ko pages 5220-5221, training data for the acoustic model including simulated far-field speech (audio data) which is a speech signal); 
configured to augment, thereby generating augmented training data (Ko page 5221, algorithm for simulating far-field speech which augments samples t of a speech database X with reverberation and noise to output a reverberated speech database (augmented training data)), and to use at least some of the augmented training data to train the model during each epoch of the training loop (Ko page 5223, table 1, the simulated reverb training data is used to train the acoustic model for at least 2 epochs).
While Ko discloses that multiple sets of training data are created by simulating far-field speech by augmenting speech signals, Ko does not explicitly teach that the simulation/data augmentation occurs “during the training loop” as claimed.
Further, Ko does not disclose that one or more frequency domain features are extracted from the audio data used for training, therefore Ko does not explicitly teach “one or more frequency domain features are extracted from.”
Still further, while Ko does teach data augmentation, Ko does not teach that it is augmenting “the one or more frequency features,” “in the frequency domain.” 
Lastly, while Ko does teach training a model with augmented training data, Ko does not teach using the augmented training data “in the frequency domain.”
Ko further does not teach a data preparation subsystem, coupled and, a training subsystem, coupled to the data preparation subsystem.
Goto teaches during the training loop (Goto para. 23, augmented dataset generated during each pass through the training loop).
Goto further teaches a data preparation subsystem, coupled and, a training subsystem, coupled to the data preparation subsystem (Goto fig. 1, paras. 12-17, data augmentation module which is part of (thus coupled to) neural network training module 120).
De Vries NPL teaches one or more frequency domain features are extracted from (De Vries NPL section 4.3, mel-frequency cepstrum coefficients (MFCCs- a type of frequency domain feature) are extracted from the Arabic digits dataset of 8,800 samples from audio clips).
De Vries NPL further teaches “the one or more frequency features,” “in the frequency domain” (De Vries NPL sections 3.1, 3.2 and 4.3, data augmentation is performed upon the MFCCs using a sequence encoder and is performed in the feature space (domain), where the features given in 4.3 are MFCCs which are frequency domain features).
De Vries NPL also teaches “in the frequency domain” (De Vries NPL sections 4.3, and 3.2, the vectors newly created as output from the data augmentation process in the feature space are used directly for input into a learning task).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop and modules as disclosed in Goto and specifically cited to above, at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
Further, taking the teachings of Ko and De Vries NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to use MFCCs in data augmentation in feature space as disclosed in De Vries NPL at least because doing so would allow for generating synthetic testing data that is more realistic (see De Vries NPL page 1, section 1).
Regarding claim 16, Ko teaches including by augmenting at least some of the training data using different sets of augmentation parameters drawn from a plurality of probability distributions (Ko pages 5221-5222, in algorithm 1 where the training data is output by augmenting inputted speech data, several probability distributions are used for the reverberation augmentation where there are multiple RIR (room impulse response) sets (sets of augmentation parameters) used to simulate reverb in different sized rooms, and where the probability distribution of all different rooms is computed by accumulating RIR probabilities corresponding to the specific rooms).
Ko does not explicitly teach “wherein the training subsystem is configured to,” or that different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop.
Goto teaches wherein the training subsystem is configured to (Goto para. 12, neural network training module).
Goto further teaches wherein different subsets of the augmented training data are generated during the training loop, for use in different epochs of the training loop (Goto paras. 23 and 27, augmented data is generated during each pass through the training loop and shuffling is performed as part of data augmentation so that each batch of a set of N batches of training data is different for each epoch).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop and modules as disclosed in Goto and specifically cited to above, at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
Regarding claim 22, Ko teaches wherein augment the training data including by performing at least one of adding fixed spectrum stationary noise, adding variable spectrum stationary noise, adding noise including one or more random stationary narrowband tones, adding reverberation, adding non-stationary noise, adding simulated echo residuals, simulating microphone equalization, simulating microphone cutoff, or varying broadband level (Ko pages 5220-5221, generation of simulated training data by data augmentation of real speech samples with reverberation).
Ko does not explicitly teach the training subsystem is configured to.
Goto teaches wherein the training subsystem is configured to (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 24, Ko teaches wherein the training data are indicative of features comprising frequency (Ko page 5222, 40-dimensional MFCCs are used for the acoustic model, thus the training data training the model also being MFCCs, which comprise frequency bands and thus frequency (Mel-Frequency cepstral coefficients)).
Regarding claim 26, Ko teaches wherein is configured to augment the training data in a manner determined in part from said training data (Ko page 5221, algorithm 1, the data augmenting algorithm includes iterating through the input speech database data, reverb impulse response and point source noise in time, where the training data output is time-based, thus the manner (time domain) is determined from the training data also being in the time domain).
Ko does not explicitly teach the training subsystem is configured to.
Goto teaches wherein the training subsystem is configured to (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Regarding claim 27, Ko teaches is configured to produce a trained acoustic model (Ko pages 5222-5223, acoustic models are trained with data reverberated per algorithm 1, thus the end result after the multiple training epochs to be trained acoustic models).
Ko does not explicitly teach wherein the training sub-system includes one or more processors and one or more devices implementing non-transitory memory, and the training sub-system is configured to store parameters of the trained acoustic model in one or more of the devices.
Goto teaches the training sub-system includes one or more processors and one or more devices implementing non-transitory memory (Goto fig. 1, paras. 14-15, server system 10 with various blocks including machine learning system 15 and processor 16, and memory 17), and the training sub-system is configured to store parameters of the trained acoustic model in one or more of the devices (Goto paras 15 and 21, machine learning system 120 including neural network 117 implemented by software and residing (storing) within memory 17).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the server structure as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para 12).
Regarding claim 28, Ko does not explicitly teach the limitations of claim 28. 
Goto teaches further comprising applying a first augmentation during a first training epoch and applying a second augmentation during a second training epoch (Goto para. 34, fig. 4, training at a first epoch is performed with data augmented at a step 420, then 440 the dataset is augmented by an updated variable (thus a different, second augmentation) and the neural network is trained with the new augmented data (second epoch), where para 24 teaches that an epoch is a number of times the same training data set is used to train the neural network - thus in the change of the training data via augmentation via an updated variable, when training the neural network with the second augmented data, it is different data, and thus a different (second) epoch).
De Vries NPL teaches applying a first augmentation to the one or more frequency domain features and applying a second augmentation to the one or more frequency domain features (De Vries NPL sections 4.3 and 3.2, multiple augmentations within the feature space of the spoken Arabic digits dataset, which has MFCC features (frequency domain features) including adding random noise, and interpolating or extrapolating between data sample vectors).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to occur during the training loop, each loop having differently augmented data (hence different epochs) as disclosed in Goto at least because doing so would allow for balancing of class accuracies in the classes processed by a neural network and thus provide greater overall accuracy of the neural network (see Goto paras. 28 and 31 and Abstract).
Further, taking the teachings of Ko and De Vries NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to use MFCCs in data augmentation in feature space, and with different types of augmentation in that feature space, as disclosed in De Vries NPL at least because doing so would allow for generating synthetic testing data that is more realistic (see De Vries NPL page 1, section 1).
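For illustration only, interpolating and extrapolating between two feature vectors, two of the feature-space augmentations referred to above, can be sketched as follows. This is a minimal sketch assuming simple linear forms with a scalar factor `lam`; the function names and the example vectors are hypothetical and are not taken from De Vries NPL.

```python
def interpolate(c_j, c_k, lam):
    """Move c_j toward its neighbor c_k by a fraction lam (0 <= lam <= 1)."""
    return [a + lam * (b - a) for a, b in zip(c_j, c_k)]


def extrapolate(c_j, c_k, lam):
    """Move c_j away from its neighbor c_k by a fraction lam of their gap."""
    return [a + lam * (a - b) for a, b in zip(c_j, c_k)]


# Usage with two hypothetical two-dimensional feature vectors.
midpoint = interpolate([0.0, 0.0], [2.0, 4.0], 0.5)
pushed_out = extrapolate([2.0, 4.0], [0.0, 0.0], 0.5)
```

Applying one of these functions in a first epoch and the other in a second epoch would be one way to obtain two different augmentations of the same features.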
Claims 7, 9, 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ko in view of Goto in view of De Vries NPL, as set forth above regarding claim 1 from which claims 7 and 9 depend, and regarding claim 15 from which claims 21 and 23 depend, further in view of Peddinti et al., "JHU ASpIRE system: Robust LVCSR with TDNNS, iVector adaptation and RNN-LMS," 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015, pp. 539-546, doi: 10.1109/ASRU.2015.7404842 (herein “Peddinti”).
Regarding claim 7, Ko teaches wherein said training is or includes training a deep neural network (DNN), or a convolutional neural network (CNN), or a recurrent neural network (RNN), or an HMM-GMM acoustic model (Ko page 5222 section 3, a time-delay neural network (TDNN) was used for the acoustic model, where page 5223 teaches the TDNNs used are similar to the ones specified in reference 4 (which is the Peddinti reference)).
Peddinti teaches that the TDNN is a deep neural network (Peddinti page 540, figure 1, as shown, the TDNN has 4 layers, making it deep).
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the acoustic model of the TDNN of Ko to be a deep neural network as disclosed in Peddinti, at least because Ko teaches that it uses the TDNN of Peddinti, and because such a TDNN would allow for the neural network to deal with late reverberations in an audio signal with reverberation noise (see Peddinti page 539, Introduction).
Regarding claim 9, Ko as modified by Goto performs said augmentation as disclosed above. Ko modified by Goto, however, does not teach “is implemented in or on one or more Graphics Processing Units (GPUs).” Peddinti teaches is implemented in or on one or more Graphics Processing Units (GPUs) (Peddinti Abstract, training of acoustic models using 32 GPUs).
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use a GPU as disclosed in Peddinti, at least because doing so would exponentially decrease the learning rate schedule (see Peddinti page 541, section 2.5).
Regarding claim 21, Ko teaches wherein the is configured to train the model including by training a deep neural network (DNN), or a convolutional neural network (CNN), or a recurrent neural network (RNN), or an HMM-GMM acoustic model (Ko page 5222 section 3, a time-delay neural network (TDNN) was used for the acoustic model, where page 5223 teaches the TDNNs used are similar to the ones specified in reference 4 (which is the Peddinti reference)).
Peddinti teaches that the TDNN is a deep neural network (Peddinti page 540, figure 1, as shown, the TDNN has 4 layers, making it deep).
Ko modified by Peddinti does not teach the training subsystem.
Goto teaches wherein the training subsystem (Goto para. 12, neural network training module).
Therefore, taking the teachings of Ko and Goto together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation disclosed in Ko to include the modules as disclosed in Goto at least because doing so would allow for enabling users to submit datasets on their own for training a neural network in a client-server configuration (see Goto para. 12).
Further, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the acoustic model of the TDNN of Ko to be a deep neural network as disclosed in Peddinti, at least because Ko teaches that it uses the TDNN of Peddinti, and because such a TDNN would allow for the neural network to deal with late reverberations in an audio signal with reverberation noise (see Peddinti page 539, Introduction).
Regarding claim 23, Ko, as modified by Goto, performs said augmentation as disclosed above, with the training subsystem configured to do the same. Ko modified by Goto, however, does not teach “is implemented in or on one or more Graphics Processing Units (GPUs).” Peddinti teaches “is implemented in or on one or more Graphics Processing Units (GPUs)” (Peddinti Abstract, training of acoustic models using 32 GPUs).
Therefore, taking the teachings of Ko and Peddinti together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data augmentation of Ko to use a GPU as disclosed in Peddinti, at least because doing so would exponentially decrease the learning rate schedule (see Peddinti page 541, section 2.5).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Pratap et al., "Wav2Letter++: A Fast Open-source Speech Recognition System," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6460-6464, doi: 10.1109/ICASSP.2019.8683535. Pratap is directed towards an automatic speech recognition system that is trained with speech data that has undergone a data preparation step.
Lee et al., "Personalizing Recurrent-Neural-Network-Based Language Model by Social Network," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 3, pp. 519-530, March 2017, doi: 10.1109/TASLP.2016.2635445. Lee is directed towards personalizing language models for specific users with training data that is for a specific user by interpolating a background corpus according to a personal corpus.
Li et al., "Multi-stream Network With Temporal Attention For Environmental Sound Classification", arXiv:1901.08608v1 [cs.SD], Jan. 24, 2019. Li is directed towards training a neural network including a data augmentation step by mixing training samples.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant can use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice, or call the examiner directly during the above stated office hours.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656




/MICHELLE M KOETH/Primary Examiner, Art Unit 2656