DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 8-13 and 19-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Le Roux et al. (US 2020/0058314).

Claims 1, 12 and 20,
Le Roux teaches a method of training a neural network for de-noising audio enhancement, the method comprising: creating simulated noisy speech data from high quality speech data ([Fig. 4] [0003] [0075] training of an audio signal processing system 400 for speech enhancement; a noisy input speech signal 405 including a mixture of speech and noise and the corresponding clean signals 461 for the speech); and 
performing training on a neural network using the high quality speech data and the simulated noisy speech data to train the neural network to create de-noised speech data given noisy speech data, wherein performing the training includes minimizing errors in the neural network according to at least one of: a decoding error of an Automatic Speech Recognition (ASR) system processing current de-noised speech data results that are generated by the neural network during the training; and spectral distance between the high quality speech data and the current de-noised speech data results that are generated by the neural network during the training ([Fig. 1c] [0012] [0075] training a neural network for an audio signal processing system 400 for speech enhancement to separate the speech from the noise within a noisy speech signal; the noisy input signal 405 is processed by an enhancement network 454 to compute a filter 460 for the target signal; an objective function computation module 463 computes an objective function by computing a distance between the clean speech and the enhanced speech; the objective function can be used by a network training module 457 to update the network parameters 452; an algorithm that is optimized (minimized) under an objective function (the distance between its estimated magnitude and the true magnitude)).
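For illustration only, the spectral-distance criterion recited in claims 1, 12 and 20 can be sketched in Python. The sketch is not drawn from Le Roux or from the application; it assumes non-overlapping, unwindowed frames, a naive DFT, and a mean squared difference of magnitude spectra, none of which the claim language specifies. The function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude of one real-valued frame (O(n^2), illustration only)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)  # non-negative frequency bins only
    ]


def spectral_distance(clean, denoised, frame_len=8):
    """Mean squared difference between magnitude spectra of clean and de-noised speech.

    Assumption: non-overlapping frames, no window, squared error. The claims do
    not say how the spectral distance is computed.
    """
    if len(clean) != len(denoised):
        raise ValueError("signals must have equal length")
    total, count = 0.0, 0
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = magnitude_spectrum(clean[start:start + frame_len])
        d = magnitude_spectrum(denoised[start:start + frame_len])
        total += sum((a - b) ** 2 for a, b in zip(c, d))
        count += len(c)
    return total / count if count else 0.0
```

A value of 0.0 means the de-noised output matches the clean reference spectrally; training under this criterion would adjust the network weights to drive the value down.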

Claims 2 and 13,
Le Roux further teaches the method of Claim 1 further comprising: generating the current de-noised speech data results during the training by processing at least a portion of the simulated noisy speech data with the neural network ([0075] enhancing speech based on the noisy input speech signal; the noisy input signal 405 is processed by an enhancement network 454 to compute a filter 460 for the target signal, using stored network parameters 452).

Claims 8 and 19,
Le Roux further teaches the method of Claim 1 further comprising: performing the training by training the neural network to learn a maximum-likely encryption of the high quality speech data given the simulated noisy speech data ([0076] magnitude computation module 550 can use these probabilities as a plurality of weighted magnitude codes 570 to combine multiple values in the magnitude codebook 576 in a weighted fashion, or it can use only the largest probability as a unique magnitude code 570 to select the corresponding value in the magnitude codebook 576, or it can use a single value sampled according to these probabilities as a unique magnitude code 570 to select the corresponding value in the magnitude codebook 576, among multiple ways of using the output of the enhancement network 554 to obtain a filter magnitude 574).

Claim 9,
Le Roux further teaches the method of Claim 1 wherein minimizing the errors in the neural network includes: adjusting one or more weights of the neural network ([0075-0076] magnitude computation module 550 can use these probabilities as a plurality of weighted magnitude codes 570 to combine multiple values in the magnitude codebook 576 in a weighted fashion).

Claims 10 and 20,
Le Roux further teaches the method of Claim 1 further comprising: after the training, processing noisy speech data using the trained neural network to determine enhanced speech data ([0084] a neural network trained to process the noisy audio signal).

Claims 11 and 21,
Le Roux further teaches the method of Claim 1 wherein the training is deep normalizing flow training ([0009] using deep neural networks or deep recurrent neural networks).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Le Roux et al. (US 2020/0058314) and further in view of Naylor-Teece et al. (US 2021/0110812).

Claims 3 and 14,
Le Roux teaches all the limitations in claim 2. The difference between the prior art and the claimed invention is that Le Roux does not explicitly teach determining the decoding error during the training by comparing (1) speech recognition results generated by the ASR system processing the current de-noised speech data results and (2) a transcript of at least a portion of the high quality speech data upon which the at least a portion of the simulated noisy speech data was created.
Naylor-Teece teaches determining the decoding error during the training by comparing (1) speech recognition results generated by the ASR system processing the current de-noised speech data results and (2) a transcript of at least a portion of the high quality speech data upon which the at least a portion of the simulated noisy speech data was created ([Abstract] a difference between a result of speech recognition performed on the input audio data and a result of speech recognition performed on an instance of corresponding output audio data is determined).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Le Roux with the teachings of Naylor-Teece by modifying the method and system for enhancing audio signals corrupted by noise as taught by Le Roux to include determining the decoding error during the training by comparing (1) speech recognition results generated by the ASR system processing the current de-noised speech data results and (2) a transcript of at least a portion of the high quality speech data upon which the at least a portion of the simulated noisy speech data was created as taught by Naylor-Teece, for the benefit of improving the quality of speech for listeners to the audio output of the sound system (Naylor-Teece [0001]).
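For illustration only, the comparison recited in claims 3 and 14, between ASR output on de-noised speech and a reference transcript, can be expressed as a word error rate. Word error rate is an assumed choice of decoding-error measure; neither the claim language quoted above nor the cited Naylor-Teece abstract names a specific metric, and the helper names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion of a reference word
                         cur[j - 1] + 1,       # insertion of a hypothesis word
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[-1]


def decoding_error(transcript, asr_result):
    """Word error rate of an ASR result against the reference transcript."""
    n = len(transcript.split())
    return word_errors(transcript, asr_result) / n if n else 0.0
```

For example, decoding_error("turn on the light", "turn the lights") counts one deletion and one substitution, giving 2/4 = 0.5.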

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Le Roux et al. (US 2020/0058314) and further in view of Niemisto (RU 2517315 C2).

Claims 4 and 15,
Le Roux teaches all the limitations in claim 1. The difference between the prior art and the claimed invention is that Le Roux does not explicitly teach collecting the high quality speech data in a low noise environment.
Niemisto teaches collecting the high quality speech data in a low noise environment ([pg. 14] configured to collect high-quality speech reproduction data in a low noise environment).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Le Roux with the teachings of Niemisto by modifying the method and system for enhancing audio signals corrupted by noise as taught by Le Roux to include collecting the high quality speech data in a low noise environment as taught by Niemisto, for the benefit of improving the method and system of audio processing (Niemisto [pg. 2]).

Claims 5-6 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Le Roux et al. (US 2020/0058314) and further in view of Nagatani (EP 0989540).

Claims 5 and 16,
Le Roux teaches all the limitations in claim 1. The difference between the prior art and the claimed invention is that Le Roux does not explicitly teach creating the simulated noisy speech data by adding reverberation to the high quality speech data using convolution.
Nagatani teaches creating the simulated noisy speech data by adding reverberation to the high quality speech data using convolution ([pg. 7] adding reverberation to original sound using convolution).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Le Roux with the teachings of Nagatani by modifying the method and system for enhancing audio signals corrupted by noise as taught by Le Roux to include creating the simulated noisy speech data by adding reverberation to the high quality speech data using convolution as taught by Nagatani, for the benefit of obtaining a more natural, high-quality audio signal (Nagatani [pg. 2]).

Claims 6 and 17,
Nagatani further teaches the method of Claim 5 further comprising: adding the reverberation using convolution by accessing a database comprising at least one of: measured impulse responses from a reverberant environment and synthetically generated impulse responses ([pg. 3] digitally synthesizes reverberation; generating reverberation in a real hall or with a steel-plate echo apparatus, collecting an impulse response corresponding to the generated reverberation).
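For illustration only, the reverberation step of claims 5-6 and 16-17 (convolving high quality speech with an impulse response drawn from a database) can be sketched as follows. The database contents are made-up placeholder values rather than measured or published impulse responses, and truncating the output to the clean-signal length, to keep clean/noisy training pairs aligned, is an assumption not recited in the claims or in Nagatani.

```python
def convolve(signal, impulse_response):
    """Full linear convolution; output length is len(signal) + len(impulse_response) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical impulse-response database; the values are made-up placeholders,
# standing in for measured ("room_a") and synthetically generated entries.
IR_DATABASE = {
    "room_a": [1.0, 0.0, 0.5, 0.0, 0.25],
    "synthetic_decay": [0.8 ** k for k in range(6)],
}


def add_reverberation(clean, ir_name, database=IR_DATABASE):
    """Create reverberant (simulated noisy) speech from clean speech by convolution.

    The result is truncated to len(clean) so each noisy sample lines up with
    its clean counterpart (an assumed design choice).
    """
    return convolve(clean, database[ir_name])[:len(clean)]
```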

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Le Roux et al. (US 2020/0058314) and further in view of Kuroiwa et al. (JP H08211888).

Claims 7 and 18,
Le Roux teaches all the limitations in claim 1. The difference between the prior art and the claimed invention is that Le Roux does not explicitly teach collecting data from an environment in which the ASR system is to be deployed; and creating the simulated noisy speech data in accordance with the data collected from the environment.
Kuroiwa teaches collecting data from an environment in which the ASR system is to be deployed; and creating the simulated noisy speech data in accordance with the data collected from the environment ([0012] [pg. 3] the environment-adaptive speech recognition apparatus includes an adding means for superimposing the learning environmental noise data on the learning speech data to create the noise-superimposed learning speech data).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Le Roux with the teachings of Kuroiwa by modifying the method and system for enhancing audio signals corrupted by noise as taught by Le Roux to include collecting data from an environment in which the ASR system is to be deployed; and creating the simulated noisy speech data in accordance with the data collected from the environment as taught by Kuroiwa, for the benefit of improving the environment adaptation method in speech recognition using a statistical model (Kuroiwa [0001]).
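For illustration only, the noise-superposition approach attributed to Kuroiwa (superimposing environmental noise data on learning speech data) can be sketched as follows, where env_noise stands in for noise recorded in the deployment environment. Scaling the noise to a target signal-to-noise ratio is an assumed design choice not stated in the cited passage, and the function names are hypothetical.

```python
import math


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def superimpose_noise(speech, env_noise, snr_db):
    """Add collected environmental noise to clean speech at a chosen SNR (dB).

    The noise is tiled or truncated to the speech length, then scaled so that
    10 * log10(P_speech / P_scaled_noise) equals snr_db.
    """
    noise = [env_noise[i % len(env_noise)] for i in range(len(speech))]
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```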

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al. (CN 110992934) teaches a defense method and defense device against a black-box attack model for a speech recognition system. The defense method adds environmental noise to the original audio to simulate real-world speech input conditions, adds random noise to form preliminary adversarial samples, and optimizes the adversarial samples through a genetic algorithm and gradient estimation to obtain precise adversarial samples. The original audio files and the adversarial samples are then mixed to form an adversarial training set on which the model is trained, improving the model's recognition accuracy on adversarial samples and thereby its robustness against attack.
Kupryjanow et al. (US 2019/0043491) teaches a processor-implemented method for training a recursive neural network (RNN) to generate a time-frequency mask (TFM), the method comprising: selecting, by a processor-based system, a sample clean-speech signal from a speech database; selecting, by the processor-based system, a reverberation filter from a filter database; selecting, by the processor-based system, a sample noise-signal from a noise database; scaling, by the processor-based system, the sample noise-signal based on a selected signal to noise ratio (SNR); applying, by the processor-based system, the reverberation filter to the sample clean-speech signal, and adding the scaled sample noise-signal to the result of the reverberation filter application, to generate a noisy-speech signal; generating, by the processor-based system, an estimated TFM based on application of the RNN to features extracted from the noisy-speech signal; generating, by the processor-based system, a target TFM based on a ratio of features extracted from the sample clean-speech signal to the features extracted from the noisy-speech signal; and training, by the processor-based system, the RNN based on a calculated error between the estimated TFM and the target TFM.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Examiner
Art Unit 2657



/SHREYANS A PATEL/Examiner, Art Unit 2656