DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments regarding the rejection of claims 1 and 13 under 35 U.S.C. 102, presented on pages 7-8 of the Remarks dated 10/10/2022, have been considered but are not persuasive for the following reasons. Applicant argues that Germain fails to disclose “a pre-processing neural network configured to receive a segment of the input audio signal, including metadata associated with the segment of the input audio signal.” Examiner respectfully disagrees. Germain, in paragraph [0015], teaches that “in the training phase 102, at operation 110, an audio classifier neural network is first trained with labeled classification training audio 140 to generate the trained audio classifier neural network 170.” The labels of the classification training audio 140 constitute metadata associated with the audio segments received by the audio classifier neural network. Therefore, the rejection is maintained. See the detailed rejection below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-10 and 13-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Germain et al. (US 2019/0043516).

Claim 1,
Germain teaches a system comprising: a trained neural network configured to receive an input audio signal and generate an enhanced target signal, the trained neural network comprising ([Fig. 1] [0017] [0019] operation 130, speech denoising neural network to produce denoised speech 190): 
a pre-processing neural network configured to receive a segment of the input audio signal, including metadata associated with the segment of the input audio signal, and generate an audio classification at an output layer ([Fig. 1] [0015] training phase 102; at operation 110, an audio classifier neural network is first trained with labeled (metadata associated with the segment of the input audio signal) classification training audio 140 to generate the trained audio classifier neural network 170), 
the pre-processing neural network including at least one hidden layer ([Fig. 1] [0015] neural network comprising multiple convolutional layers) comprising an embedding vector generated based at least in part on the metadata associated with the segment of the input audio signal ([Fig. 1] [0018] training employs deep feature losses based on the labeled classification of the training audio 140); and 
a noise reduction neural network configured to receive the segment of the input audio signal, and the embedding vector and generate the enhanced target signal ([Fig. 1] [0017-0018] [0021] a speech denoising neural network is trained based on a combination of noisy training speech 150 (input audio signal) and associated clean training speech 155; this operation generates a trained speech denoising neural network 180; the trained speech denoising neural network 180 is employed to process noisy operational speech 160 to generate denoised speech 190).

Claims 13-14 contain subject matter similar to claim 1, and thus are rejected under similar rationale.

Claim 2,
Germain further teaches the system of claim 1, wherein the pre-processing neural network comprises a target signal pre-processing neural network configured to receive the segment of the input audio signal and generate a target signal classification at the output layer; wherein the at least one hidden layer comprises a target embedding vector ([0028] the average pooling layer circuit 320 is configured to average each channel output of the final convolutional layer, over a period of time, to yield an output feature vector; the output feature vector can be fed to one (or more) logistic classifiers 330 with cross-entropy loss to perform one (or more) classification tasks).

Claim 3,
Germain further teaches the system of claim 2, wherein the target signal pre-processing neural network further comprises a neural network trained to classify speech ([0015] an audio classifier neural network is first trained with labeled classification training audio 140 to generate the trained audio classifier neural network 170); and wherein the noise reduction neural network is configured to extract a speech waveform from the segment of the input audio signal ([0012] a denoising neural network which is trained with deep feature losses extracted from an audio classifier neural network; the audio classifier neural network is pre-trained to identify various types of audio sounds).

Claim 4,
Germain further teaches the system of claim 2, wherein the target signal pre-processing neural network comprises an autoencoder neural network trained to classify a plurality of semantic categories ([0012] [0016] encoded by the audio classifier network; labeled classification training audio 140 may include domestic (household) sounds such as, for example, background noise from appliances, percussive sounds (e.g., crashes, bangs, knocks, footsteps), videogame/television sounds, and speech from adult males, adult females, and children, etc.).

Claim 5,
Germain further teaches the system of claim 1, wherein the pre-processing neural network comprises a noise pre-processing neural network configured to receive the segment of the input audio signal and generate a noise classification at the output layer; wherein the at least one hidden layer comprises a noise embedding vector ([0016-0018] the labeled classification training audio 140 may also include sounds associated with different environments, such as, for example, restaurants, bus stations, urban locations, forests, beaches, etc.; targeting meaningful features in a noisy signal).

Claim 15 contains subject matter similar to claim 5, and thus is rejected under similar rationale.

Claim 6,
Germain further teaches the system of claim 5, wherein the noise pre-processing neural network further comprises a neural network trained to classify audio sounds and wherein the noise embedding vector comprises information describing a corresponding noise classification ([0016] the labeled classification training audio 140).

Claim 7,
Germain further teaches the system of claim 1, wherein the noise reduction neural network is trained with random speech and noise sequences and corresponding embedding vector ([Fig. 1] speech denoising neural network training 120 receives noisy training speech 150 and clean training speech 155 and extracts feature vectors).

Claim 8,
Germain further teaches the system of claim 1, wherein the pre-processing neural network further comprises: a speech signal pre-processing neural network configured to receive the segment of the input audio signal and generate a speech signal classification at an output layer of the speech signal pre-processing neural network, the speech signal pre-processing neural network including a speech signal preprocessing neural network hidden layer comprising a speech embedding vector; and a noise pre-processing neural network configured to receive the segment of the input audio signal and generate a noise classification at an output layer of the noise pre-processing neural network, the noise pre-processing neural network including a noise pre-processing neural network hidden layer comprising a noise embedding vector; and wherein the system further comprises an auxiliary neural network configured to classify the segment of the audio input signal as speech or noise, and wherein the segment is processed by a corresponding pre-processing neural network ([0016-0019] [0021-0022] the labeled classification training audio classifies speech and noise sounds; the speech denoising neural network is trained based on a combination of the noisy training speech 150 and associated clean training speech 155; noisy samples of training speech signals 150 are applied to the speech denoising neural network (in training) 210, from which processed training speech signals 220 are generated; the processed training speech signals 220 are applied to the trained audio classifier neural network 170, which generates a first set of activation features 230a; clean samples on the training speech signals 155 are applied to the trained audio classifier neural network 170, which generates a second set of activation features 230b; the clean samples 155 and the noisy samples 150 include the same speech signals, but the noisy samples also include additive background noise at a selected level of signal-to-noise ratio; the trained speech denoising neural network 180 is employed to process noisy operational speech 160 to generate denoised speech 190).

Claim 16 contains subject matter similar to claim 8, and thus is rejected under similar rationale.

Claim 9,
Germain further teaches the system of claim 8, wherein an average embedding vector is calculated for each pre-processing neural network ([0028] the average pooling layer circuit 320 is configured to average each channel output of the final convolutional layer, over a period of time, to yield an output feature vector).

Claims 17-18 contain subject matter similar to claim 9, and thus are rejected under similar rationale.

Claim 10,
Germain further teaches the system of claim 1, wherein the embedding vector is a predefined embedding vector corresponding to a predetermined audio classification ([0015-0016] [0021-0022] labeled classification training audio is predetermined; the average pooling layer circuit is configured to yield an output feature vector to generate classification labels of the input audio signal).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Germain et al. (US 2019/0043516) and further in view of Watson et al. (US 2004/0024588).

Claim 12,
Germain teaches all the limitations in claim 10. The difference between the prior art and the claimed invention is that Germain does not explicitly teach wherein the predefined embedding vector is selected by a user.
Watson teaches wherein the predefined embedding vector is selected by a user ([0048] the present invention allows a user to choose the strength or energy of the embedded signal).
Germain and Watson are analogous art because they both involve audio processing. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Germain with the teachings of Watson by modifying the neural network for speech denoising trained with deep feature losses, as taught by Germain, to include wherein the predefined embedding vector is selected by a user, as taught by Watson, for the benefit of solving the problem of how to embed a watermark in a perceptual encoder while maximizing the strength and minimizing the perceptibility of the embedded signal ([0048]).

Claim 20 contains subject matter similar to claim 12, and thus is rejected under similar rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Examiner
Art Unit 2657



/SHREYANS A PATEL/Examiner, Art Unit 2656