Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 05, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 9-11, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kupryjanow (U.S. Publication No. 20200184987).
Regarding claim 1, Kupryjanow discloses a noise cancellation method using a computer system (Figure 1 – Noise Reduction Model Selector 108), the noise cancellation method comprising:
generating a first voice signal by canceling a first portion of noise included in an input voice signal using a first network, the first network being a trained u-net structure, and the first portion of the noise being in a time domain ([0025] - the neural network based spectral speech enhancement can be a LSTM network that estimates a TFM and a denoised signal is obtained via multiplication of the input and TFM. In various examples, the noise suppressor 104 can use auto-encoders to perform neural network-based time domain speech enhancement… auto-encoders such as… u-net… may be used. The network may be fed with audio signal chunks and the encoder-decoder layers transform a signal to a higher dimension);
applying a first window to the first voice signal to obtain a first windowed voice signal ([0039] - which may be a result from passing a time window of the audio signal through Fourier Transform);
performing a fast Fourier transform (FFT) on the first windowed voice signal to acquire a magnitude signal and a phase signal ([0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled);
acquiring a mask using a second network based on the magnitude signal, the second network being another trained u-net structure ([0025] - the neural network based spectral speech enhancement can be a LSTM network that estimates a TFM and a denoised signal is obtained via multiplication of the input and TFM. In various examples, the noise suppressor 104 can use auto-encoders to perform neural network-based time domain speech enhancement… auto-encoders such as… u-net… may be used. The network may be fed with audio signal chunks and the encoder-decoder layers transform a signal to a higher dimension. [0026] - the noise suppressor 104 of system 100 uses time-frequency masks);
applying the mask to the magnitude signal to obtain a masked magnitude signal ([0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled);
generating a second voice signal by canceling a second portion of the noise by performing an inverse fast Fourier transform (IFFT) on the first windowed voice signal based on the masked magnitude signal and the phase signal ([0027] - Training a neural network to infer disturbance time frequency masks is different than just inverting the speech TFM because the disturbances are often foreground sounds which overlap with speech but are not identical to the acoustic background);
and applying a second window to the second voice signal to obtain a second windowed voice signal ([0039] - which may be a result from passing a time window of the audio signal through Fourier Transform).
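For illustration only, the sequence of steps recited in claim 1 can be sketched for a single frame as below. This is a minimal sketch under stated assumptions, not the applicant's or Kupryjanow's implementation: the trained u-net structures are replaced by hypothetical stand-ins (`first_network` returns its input unchanged and `second_network` returns a unity mask), the periodic Hann window is an assumed choice for both the first and second windows, and a naive DFT stands in for the FFT because it yields the same coefficients.

```python
import cmath
import math


def hann(length):
    # Periodic Hann window; an assumed choice, since the claim recites only "a first window".
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def dft(x):
    # Naive O(N^2) DFT; an FFT computes the same coefficients faster.
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    # Inverse DFT; the imaginary part is discarded because the input frame is real.
    size = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)) / size).real
            for n in range(size)]


def first_network(frame):
    # Hypothetical stand-in for the trained time-domain u-net (identity here).
    return list(frame)


def second_network(magnitude):
    # Hypothetical stand-in for the trained mask-estimating u-net (unity mask here).
    return [1.0] * len(magnitude)


def denoise_frame(frame):
    first_voice = first_network(frame)                      # first voice signal
    window = hann(len(frame))
    first_windowed = [w * s for w, s in zip(window, first_voice)]
    spectrum = dft(first_windowed)
    magnitude = [abs(c) for c in spectrum]                  # magnitude signal
    phase = [cmath.phase(c) for c in spectrum]              # phase signal
    mask = second_network(magnitude)
    masked = [m * a for m, a in zip(mask, magnitude)]       # masked magnitude signal
    recombined = [a * cmath.exp(1j * p) for a, p in zip(masked, phase)]
    second_voice = idft(recombined)                         # second voice signal
    return [w * s for w, s in zip(window, second_voice)]    # second windowed voice signal
```

With both stand-in networks acting as identities, each output sample equals the input sample multiplied by the square of the window value, which makes the effect of the two windowing steps easy to inspect.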
Regarding claim 9, Kupryjanow discloses the noise cancellation method, wherein the input voice signal comprises a plurality of frames ([0020] - the classifiers may process frames produced by the frontends and return posterior probabilities that the audio frame belongs to a target event class).
Regarding claim 10, Kupryjanow discloses a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause a computer system including the at least one processor to perform the noise cancellation method ([0065] - The computer readable media 600 may be accessed by a processor 602 over a computer bus 604. Furthermore, the computer readable medium 600 may include code configured to direct the processor 602 to perform the methods described herein. In some embodiments the computer readable media 600 may be non-transitory computer readable media. In some examples, the computer readable media 600 may be storage media).
Regarding claim 11, Kupryjanow discloses a computer system for cancelling noise (Figure 1 – Noise Reduction Model Selector 108), the computer system comprising:
a memory storing computer-readable instructions ([0054] - The computing device 500 may include a central processing unit (CPU) 502 that is configured to execute stored instructions, as well as a memory device 504);
and at least one processor configured to execute the computer-readable instructions to cause the computer system to ([0054] - The computing device 500 may include a central processing unit (CPU) 502 that is configured to execute stored instructions, as well as a memory device 504),
generate a first voice signal by canceling a first portion of noise included in an input voice signal using a first network, the first network being a trained u-net structure, and the first portion of the noise being in a time domain ([0025] - the neural network based spectral speech enhancement can be a LSTM network that estimates a TFM and a denoised signal is obtained via multiplication of the input and TFM. In various examples, the noise suppressor 104 can use auto-encoders to perform neural network-based time domain speech enhancement… auto-encoders such as… u-net… may be used. The network may be fed with audio signal chunks and the encoder-decoder layers transform a signal to a higher dimension),
apply a first window to the first voice signal to obtain a first windowed voice signal ([0039] - which may be a result from passing a time window of the audio signal through Fourier Transform),
perform a fast Fourier transform (FFT) on the first windowed voice signal to acquire a magnitude signal and a phase signal ([0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled),
acquire a mask using a second network based on the magnitude signal, the second network being another trained u-net structure ([0025] - the neural network based spectral speech enhancement can be a LSTM network that estimates a TFM and a denoised signal is obtained via multiplication of the input and TFM. In various examples, the noise suppressor 104 can use auto-encoders to perform neural network-based time domain speech enhancement… auto-encoders such as… u-net… may be used. The network may be fed with audio signal chunks and the encoder-decoder layers transform a signal to a higher dimension. [0026] - the noise suppressor 104 of system 100 uses time-frequency masks),
apply the mask to the magnitude signal to obtain a masked magnitude signal ([0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled),
generate a second voice signal by canceling a second portion of the noise by performing an inverse fast Fourier transform (IFFT) on the first windowed voice signal based on the masked magnitude signal and the phase signal ([0027] - Training a neural network to infer disturbance time frequency masks is different than just inverting the speech TFM because the disturbances are often foreground sounds which overlap with speech but are not identical to the acoustic background),
and apply a second window to the second voice signal to obtain a second windowed voice signal ([0039] - which may be a result from passing a time window of the audio signal through Fourier Transform).
Regarding claim 17, Kupryjanow discloses the noise cancellation method, further comprising: generating an audio signal based on the second windowed voice signal; and driving a speaker to output the audio signal (Figure 2 – Acoustic Event Detector (Sound 2) 106, Audio Output 116).
Regarding claim 18, Kupryjanow discloses the noise cancellation method, wherein the second portion of the noise is in a frequency domain ([0018] - the frontend of the acoustic event detector 106 may include a fast Fourier transform (FFT) and filters in the frequency domain).
Dependent claims 19-20 are analogous in scope to claims 17-18 and are rejected according to the same reasoning.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically taught as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-3, 5-6, 8, 12-14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kupryjanow (U.S. Publication No. 20200184987) in view of Helmrich (U.S. Publication No. 20170365266).
Regarding claim 2, Kupryjanow discloses all aforementioned limitations of claim 1.
However, Kupryjanow does not disclose the noise cancellation method, wherein at least one of the first window or the second window comprises a Kaiser-Bessel-derived window for time domain aliasing cancellation (TDAC) in modified discrete cosine transform (MDCT).
Helmrich does teach the noise cancellation method, wherein at least one of the first window or the second window comprises a Kaiser-Bessel-derived window for time domain aliasing cancellation (TDAC) in modified discrete cosine transform (MDCT) ([0078] - The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of time-adjacent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC). [0083] - AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Kupryjanow to include the teaching of Helmrich in order to implement the noise cancellation method, wherein at least one of the first window or the second window comprises a Kaiser-Bessel-derived window for time domain aliasing cancellation (TDAC) in modified discrete cosine transform (MDCT). Doing so allows perfect invertibility to be achieved by adding the overlapped IMDCTs of time-adjacent overlapping blocks, causing the errors to cancel and the original data to be retrieved (Helmrich [0078]).
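For illustration only, a Kaiser-Bessel-derived window of the kind cited from Helmrich [0083] can be constructed as sketched below. This is a minimal sketch, assuming the common KBD construction (cumulative sums of a Kaiser window of length N+1, square-rooted and mirrored to length 2N); the shape parameter `alpha` is a free parameter and the default value used here is illustrative only. The check at the end shows the Princen-Bradley condition, w[n]^2 + w[n+N]^2 = 1, which is the property that permits time-domain aliasing cancellation with 50% overlap.

```python
import math


def bessel_i0(x, tol=1e-12):
    # Power series for the zeroth-order modified Bessel function of the first kind.
    total, term, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kbd_window(length, alpha=4.0):
    # Kaiser-Bessel-derived window of even length 2N; alpha is an illustrative default.
    if length % 2:
        raise ValueError("KBD window length must be even")
    half = length // 2
    kaiser = [bessel_i0(math.pi * alpha * math.sqrt(max(0.0, 1.0 - (2.0 * n / half - 1.0) ** 2)))
              for n in range(half + 1)]
    total = sum(kaiser)
    first_half, running = [], 0.0
    for n in range(half):
        running += kaiser[n]                      # cumulative sum of Kaiser weights
        first_half.append(math.sqrt(running / total))
    return first_half + first_half[::-1]          # mirror to obtain a symmetric window


window = kbd_window(16)
# Princen-Bradley condition: w[n]^2 + w[n + N]^2 == 1 for 0 <= n < N.
print(all(abs(window[n] ** 2 + window[n + 8] ** 2 - 1.0) < 1e-12 for n in range(8)))
```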
Regarding claim 3, Kupryjanow discloses all aforementioned limitations of claim 1.
Kupryjanow discloses the noise cancellation method, wherein the magnitude signal includes a first magnitude signal and a second magnitude signal;
and the acquiring the mask comprises:
acquiring a first mask using the second network based on the first magnitude signal, the first magnitude signal being in a first frequency band ([0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled),
calculating an average energy for each of the plurality of second magnitude sub-signals ([0018] - the backend may perform basic operations such as calculating a running average and comparing the running average with a threshold. As another example, the acoustic event detector 106 may be implemented as a neural network that includes lower layers to perform feature extraction),
and acquiring a second mask using the second network based on the average energy for each of the plurality of second magnitude sub-signals ([0018] - the backend may perform basic operations such as calculating a running average and comparing the running average with a threshold. As another example, the acoustic event detector 106 may be implemented as a neural network that includes lower layers to perform feature extraction [0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled).
However, Kupryjanow does not disclose dividing the second magnitude signal into a plurality of second magnitude sub-signals according to bandwidth, the second magnitude signal being in a second frequency band greater than the first frequency band.
Helmrich does teach dividing the second magnitude signal into a plurality of second magnitude sub-signals according to bandwidth, the second magnitude signal being in a second frequency band greater than the first frequency band ([0114] - when the pitch of the input signal is exactly, or very close to, an integer multiple of the frequency resolution of the transform (i.e. the bandwidth of one transform bin in the spectral domain), the MDCT-II or MDST-II may be employed for the affected frames and channels).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Kupryjanow to include the teaching of Helmrich in order to implement dividing the second magnitude signal into a plurality of second magnitude sub-signals according to bandwidth, the second magnitude signal being in a second frequency band greater than the first frequency band. Doing so allows an audio codec such as HE-AAC to be used in order to minimize issues (Helmrich [0114]).
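For illustration only, the dividing and averaging steps of claim 3 can be sketched as below. The helper names, the split bin separating the first and second frequency bands, and the use of equal-width sub-bands are hypothetical assumptions; the claim recites only dividing "according to bandwidth" and does not fix these values. Average energy is taken here as the mean squared magnitude within each sub-band.

```python
def split_bands(magnitude, split_bin, sub_band_width):
    # Hypothetical helper: bins below split_bin form the first (lower) band; bins at
    # or above it form the second (higher) band, divided into equal-width sub-bands.
    first = magnitude[:split_bin]
    second = magnitude[split_bin:]
    subs = [second[i:i + sub_band_width] for i in range(0, len(second), sub_band_width)]
    return first, subs


def average_energy(sub_bands):
    # Mean squared magnitude of each second magnitude sub-signal.
    return [sum(a * a for a in band) / len(band) for band in sub_bands]


first, subs = split_bands([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 4.0, 4.0], split_bin=4, sub_band_width=2)
print(average_energy(subs))  # [4.0, 16.0]
```

Under this sketch, the first mask would be estimated per bin of `first`, while the second mask would be estimated from one average-energy value per sub-band, which reduces the number of inputs for the higher band.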
Regarding claim 5, Kupryjanow in view of Helmrich teaches all aforementioned limitations of claim 3.
Kupryjanow discloses the noise cancellation method, wherein the first mask is an ideal ratio mask (IRM) for the first magnitude signal and the second mask is an IRM for the average energy ([0012] - A time-frequency mask (TFM)-based network using specific disturbance models was shown to be able to better match the ideal mask, which perfectly separates the speech from the disturbing sound);
and the applying the mask to the magnitude signal comprises multiplying the first mask by the first magnitude signal and multiplying the second mask by the second magnitude signal ([0025] - the TFM based speech enhancement may be implemented by using a LSTM network that estimates a Time-Frequency Mask and denoising the signal via multiplication of the input and TFM).
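For illustration only, an ideal ratio mask and its application by multiplication can be sketched as below. This assumes the widely used definition IRM = (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, where S and N are the clean-speech and noise magnitudes; computing the IRM requires oracle speech and noise, so in a system of the recited kind it would serve as the training target that the second network learns to estimate. The function names are hypothetical.

```python
def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    # IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta; bins where both are zero get a mask of 0.
    return [((s * s) / (s * s + n * n)) ** beta if (s or n) else 0.0
            for s, n in zip(speech_mag, noise_mag)]


def apply_mask(mask, magnitude):
    # Element-wise multiplication of the mask by the magnitude signal.
    return [m * a for m, a in zip(mask, magnitude)]


mask = ideal_ratio_mask([3.0, 0.0, 1.0], [4.0, 2.0, 0.0])
print([round(m, 6) for m in mask])  # [0.6, 0.0, 1.0]
```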
Regarding claim 6, Kupryjanow in view of Helmrich teaches all aforementioned limitations of claim 3.
Kupryjanow discloses the noise cancellation method, wherein the acquiring the first mask comprises: calculating a number of Mel-frequency cepstral coefficients (MFCCs) based on the first magnitude signal; and acquiring the first mask using the second network based on the MFCCs ([0020] - a Discrete Cosine Transform and logarithm may be applied to the filterbank features to obtain Mel-Frequency Cepstral Coefficients (MFCCs). A difference between the processing of continuous frontend and the impulsive frontend may be in the splicing of feature frames [0039] - the Fourier Transform may be approximated by FFT algorithm… Before feeding into the TFM net, the complex coefficients are converted to magnitude and scaled).
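For illustration only, the MFCC computation cited from Kupryjanow [0020] (filterbank features, logarithm, Discrete Cosine Transform) can be sketched from a one-sided magnitude spectrum as below. The HTK-style mel formula 2595 log10(1 + f/700), 20 triangular filters, 13 coefficients, and an unnormalized DCT-II are assumed choices, not values taken from either reference.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    # Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist;
    # n_bins is the number of one-sided FFT bins (n_fft // 2 + 1).
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [mel_to_hz(m) / nyquist * (n_bins - 1) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        row = []
        for k in range(n_bins):
            if left <= k <= center:
                row.append((k - left) / (center - left))
            elif center < k <= right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(magnitude, sample_rate, n_filters=20, n_coeffs=13, eps=1e-10):
    power = [a * a for a in magnitude]
    bank = mel_filterbank(n_filters, len(magnitude), sample_rate)
    # Log of the mel filterbank energies (eps avoids log(0)).
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in bank]
    # Unnormalized DCT-II of the log energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_coeffs)]


coeffs = mfcc([1.0] * 257, sample_rate=16000)
print(len(coeffs))  # 13
```

The resulting coefficient vector is a compact description of the spectral envelope, which is why MFCCs are a common input feature for a mask-estimating network in place of the full magnitude signal.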
Regarding claim 8, Kupryjanow discloses all aforementioned limitations of claim 1.
Kupryjanow discloses the noise cancellation method, wherein the generating the second voice signal comprises:
estimating a denoised magnitude signal by multiplying the magnitude signal and the mask ([0025] - the TFM based speech enhancement may be implemented by using a LSTM network that estimates a Time-Frequency Mask and denoising the signal via multiplication of the input and TFM);
and recovering an FFT coefficient based on the denoised magnitude signal and the phase signal ([0039] - the input spectrum is a vector of K complex coefficients, which may be a result from passing a time window of the audio signal through Fourier Transform).
However, Kupryjanow does not disclose recovering the second voice signal by performing the IFFT based on the FFT coefficient.
Helmrich does teach recovering the second voice signal by performing the IFFT based on the FFT coefficient ([0022] - the decoder has to apply the inverse transform kernel of the transform kernel used by the encoder to encode the audio signal in each frame and channel).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Kupryjanow to include the teaching of Helmrich in order to implement recovering the second voice signal by performing the IFFT based on the FFT coefficient. Doing so allows information to be stored in the control information and transmitted from encoder to decoder (Helmrich [0022]).
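For illustration only, the steps recited in claim 8 can be written per frame as follows. The notation is assumed here and does not appear in the references: X[k] denotes the FFT coefficients of the first windowed voice signal, phi[k] = arg X[k] the phase signal, M[k] the mask, and N the FFT length.

```latex
\hat{A}[k] = M[k]\,\lvert X[k]\rvert, \qquad
\hat{X}[k] = \hat{A}[k]\,e^{j\phi[k]}, \qquad
\hat{x}[n] = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}[k]\,e^{j 2\pi k n / N}
```

Here the first expression is the denoised magnitude signal, the second is the recovered FFT coefficient, and the third is the IFFT yielding the second voice signal. With M[k] = 1 for every bin, the recovered coefficient equals X[k] and the IFFT returns the first windowed voice signal unchanged.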
Dependent claims 12-13 are analogous in scope to claims 2-3 and are rejected according to the same reasoning.
Dependent claim 14 is analogous in scope to claim 8 and is rejected according to the same reasoning.
Dependent claim 16 is analogous in scope to claim 5 and is rejected according to the same reasoning.
Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Kupryjanow (U.S. Publication No. 20200184987) in view of Helmrich (U.S. Publication No. 20170365266), and further in view of Skovenborg (U.S. Publication No. 20190285673).
Regarding claim 4, Kupryjanow in view of Helmrich teaches all aforementioned limitations of claim 3.
However, Kupryjanow in view of Helmrich does not teach the noise cancellation method, wherein the dividing divides the second magnitude signal into the plurality of second magnitude sub-signals by dividing the second frequency band based on a bark scale unit.
Skovenborg does teach the noise cancellation method, wherein the dividing divides the second magnitude signal into the plurality of second magnitude sub-signals by dividing the second frequency band based on a bark scale unit ([0037] - the measuring of levels in neighboring frequency bands based on critical bandwidths may for example involve measuring specific loudness incorporating a psychoacoustical model of masking, e.g., measuring the “specific loudness” in Sone units in each band on the Bark scale).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Kupryjanow in view of Helmrich to include the teaching of Skovenborg in order to implement the noise cancellation method, wherein the dividing divides the second magnitude signal into the plurality of second magnitude sub-signals by dividing the second frequency band based on a bark scale unit. Doing so allows a filter-bank to mirror properties of critical bandwidth, allowing a more perceptual analysis to be achieved (Skovenborg [0037]).
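For illustration only, dividing a frequency band into Bark-scale sub-bands can be sketched as below. This assumes the Zwicker and Terhardt approximation of the Bark scale (one of several published formulas) and groups one-sided FFT bins by the integer part of their Bark value, so that each sub-band spans one critical band; the helper names and the `start_bin` parameter marking the beginning of the second frequency band are hypothetical.

```python
import math


def hz_to_bark(f):
    # Zwicker and Terhardt approximation of the Bark scale.
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_sub_bands(magnitude, sample_rate, start_bin):
    # Group one-sided FFT bins from start_bin upward into one-Bark-wide sub-bands.
    n_bins = len(magnitude)
    bands = {}
    for k in range(start_bin, n_bins):
        freq = k * (sample_rate / 2.0) / (n_bins - 1)
        bands.setdefault(int(hz_to_bark(freq)), []).append(magnitude[k])
    return [bands[z] for z in sorted(bands)]


bands = bark_sub_bands([1.0] * 257, sample_rate=16000, start_bin=0)
print([len(b) for b in bands])  # sub-bands widen toward higher frequencies
```

Because critical bands widen with frequency, the sub-bands near Nyquist contain many more FFT bins than those near 0 Hz, mirroring the critical-bandwidth rationale cited from Skovenborg [0037].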
Dependent claim 15 is analogous in scope to claim 4 and is rejected according to the same reasoning.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kupryjanow (U.S. Publication No. 20200184987) in view of Helmrich (U.S. Publication No. 20170365266), and further in view of Lample (U.S. Publication No. 20170169812).
Regarding claim 7, Kupryjanow in view of Helmrich teaches all aforementioned limitations of claim 3.
However, Kupryjanow in view of Helmrich does not teach the noise cancellation method, wherein the acquiring the first mask comprises:
calculating a zero-crossing rate (ZCR) based on the first magnitude signal;
and acquiring the first mask using the second network based on the ZCR.
Lample does teach the noise cancellation method, wherein the acquiring the first mask comprises:
calculating a zero-crossing rate (ZCR) based on the first magnitude signal ([0072] - Additionally, an example of an extracted feature that is related to the Sound input is a determination of a Zero-crossing rate. For example, as mentioned above, the Sound input can include auditory data in the form of a signal or a waveform that spans a plurality of frames. A Zero crossing is the point at which the waveform crosses the horizontal axis indicating a change in sign of the associated mathematical function (e.g., from positive to negative). In most embodiments, a high Zero-crossing rate is correlated with noise, as opposed to speech. Accordingly, in one or more embodiments, the lower the Zero-crossing rate is over the portion of the Sound input corresponding to the analysis word, the more likely the analysis word is correct);
and acquiring the first mask using the second network based on the ZCR ([0072] - Additionally, an example of an extracted feature that is related to the Sound input is a determination of a Zero-crossing rate. For example, as mentioned above, the Sound input can include auditory data in the form of a signal or a waveform that spans a plurality of frames. A Zero crossing is the point at which the waveform crosses the horizontal axis indicating a change in sign of the associated mathematical function (e.g., from positive to negative). In most embodiments, a high Zero-crossing rate is correlated with noise, as opposed to speech. Accordingly, in one or more embodiments, the lower the Zero-crossing rate is over the portion of the Sound input corresponding to the analysis word, the more likely the analysis word is correct).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Kupryjanow in view of Helmrich to include the teaching of Lample in order to implement the noise cancellation method, wherein the acquiring the first mask comprises: calculating a zero-crossing rate (ZCR) based on the first magnitude signal; and acquiring the first mask using the second network based on the ZCR. Doing so allows the system to determine whether the analysis word is correct or if it is just noise (Lample [0072]).
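For illustration only, the zero-crossing rate described in Lample [0072] can be sketched as below. This sketch operates on time-domain waveform samples, as Lample describes, and treats a zero-valued sample as non-negative; how a ZCR would be derived from the first magnitude signal, as recited in claim 7, is not addressed here. The function name is hypothetical.

```python
def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative).
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0
print(zero_crossing_rate([1.0, 2.0, 3.0]))         # 0.0
```

A rapidly alternating, noise-like frame yields a rate near 1.0, whereas a slowly varying, voiced-speech-like frame yields a rate near 0.0, consistent with Lample's observation that a high ZCR correlates with noise.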
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Fisher (U.S. Patent No. 10511908) teaches audio denoising and normalization using image transforming neural network. Jansson (U.S. Publication No. 20210104256) teaches systems and methods for jointly estimating sound sources and frequencies from audio. O’Shea (U.S. Publication No. 20200343985) teaches processing communication signals using a machine-learning network. Xu (U.S. Publication No. 20220027672) teaches label generation using neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658


/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658