DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 19-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Regarding Claims 19-24, the claims are rejected under 35 U.S.C. § 101 because the claimed invention is not directed to a process, machine, manufacture, or composition of matter. In the state of the art, transitory signals are commonplace as a medium for transmitting computer instructions and thus, in the absence of any evidence to the contrary and given a broadest reasonable interpretation, the scope of a "computer-readable medium" covers a signal per se. A transitory signal does not fall within the definition of a process, machine, manufacture, or composition of matter. Claims 19-24, when read in light of Paragraph [0297] of Applicant's filed specification, do not define a computer-readable medium to include the disclosed tangible computer-readable media while at the same time excluding intangible media such as signals, carrier waves, propagated signals, etc., and are thus non-statutory for that reason. Paragraph [0411], however, recites, in at least one embodiment, a non-transitory computer-readable storage medium that excludes transitory signals (e.g., a propagating transient electric or electromagnetic transmission) but includes non-transitory data storage circuitry (e.g., buffers, cache, and queues) within transceivers of transitory signals.
The specification or claims must be amended to limit the computer-readable storage medium to only non-transitory media, and to state the exclusion of transitory signals (See Official Gazette Notice 1351 OG 212, dated February 23, 2010).
Claims 19-24 recite “a machine readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least …”. The Examiner suggests amending claims 19-24 to state "A non-transitory machine-readable medium …" to overcome the rejection under 35 U.S.C. §101.

Allowable Subject Matter
Claims 6, 12, 18, 24, and 30 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7, 13, 19, and 25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Calle et al. (US #2018/0358003).

Regarding Claim 1, Calle discloses a processor (Figs. 1-12; processor 1204), comprising:
one or more circuits to use one or more neural networks (Calle title; Figs. 1-3, 5-7, and 10) to determine a noise signal in one or more speech signals (Calle ¶0042 discloses Fig. 3 illustrates an example of applying voice reconstruction using a neural network on a receiving UE 320 in a wireless communication system 300. ¶0044 discloses the UE 310 can include a noise filter/suppression, beam-forming component 312 that filters or suppresses noise and performs beam forming on the speech signal picked up by one or more microphones of the UE 310. Because of the environmental noise surrounding the UE 310, as well as the processing by the component 312 and standard voice codecs 314, the quality of the speech signal transmitted by the UE 310 can be poor. ¶0045 discloses the UE 320 can include a voice reconstruction block 326 that reconstructs the voice stream generated by the standard voice codecs 322 using a neural network to enhance the quality of the speech. As a result, the user of the UE 320 can hear a clean high definition (HD) voice [e.g., with increased SNR and/or fewer artifacts]).

Claims 7, 13, 19, and 25 are rejected for the same reasons as set forth for Claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4, 8-10, 14-16, 20-22, and 26-28 are rejected under 35 U.S.C. 103 as being unpatentable over Calle et al. (US PGPUB #2018/0358003) in view of Lee et al. (US PGPUB #2018/0190268).

Regarding Claim 2, Calle discloses the processor of claim 1,
wherein the one or more circuits are further to generate an audio spectrogram corresponding to one or more features extracted from the one or more speech signals (Calle ¶0047 discloses the ASR can be constructed by a neural network with convolutional layers acting on speech features, including MFCC, spectrogram and gammatone features, or conceivably on the audio signal itself).
Calle may not explicitly disclose wherein the one or more circuits are further to generate an audio spectrogram corresponding to one or more features extracted from the one or more speech signals.
However, Lee (Figs. 1-11) teaches wherein the one or more circuits are further to generate an audio spectrogram corresponding to one or more features extracted from the one or more speech signals (Lee ¶0065 discloses the speech recognizing apparatus 110 obtains or generates a spectrogram from/of the speech signal and extracts a frequency feature of the speech signal from the spectrogram).
Calle and Lee are analogous art as they pertain to enhancing speech using neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech-quality improvement taught by Calle to generate a spectrogram from/of the speech signal and extract a frequency feature of the speech signal from the spectrogram (as taught by Lee, ¶0065), in order to provide speech recognition, motivated by a desire for convenience (Lee, ¶0003).
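
For illustration only, and not as part of the disclosure of either reference, the operation characterized above (obtaining a spectrogram of a speech signal and extracting a frequency feature from it) can be sketched as follows. The function names, the frame length, the hop size, the Hann window, and the choice of "dominant frequency bin" as the extracted feature are all assumptions made for this example; none of them is taken from Lee or Calle.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=8, hop=4):
    """Hypothetical helper: split `signal` into overlapping frames, apply a
    periodic Hann window, and return the DFT magnitudes of each frame for
    bins 0..frame_len//2 (one list of magnitudes per frame)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Periodic Hann window: 0.5 * (1 - cos(2*pi*n/N)).
        windowed = [
            x * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / frame_len))
            for n, x in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def dominant_bin(frame):
    """Hypothetical frequency feature: index of the strongest bin in a frame."""
    return max(range(len(frame)), key=lambda k: frame[k])


# A pure tone that completes exactly two cycles per 8-sample frame.
tone = [math.sin(2.0 * math.pi * 2 * n / 8) for n in range(16)]
spec = magnitude_spectrogram(tone)
features = [dominant_bin(frame) for frame in spec]
```

In this sketch each frame of the tone peaks at bin 2, so the extracted feature is the same for every frame; a real speech signal would yield a time-varying feature sequence.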

Regarding Claim 3, Calle in view of Lee discloses the processor of claim 2,
wherein the one or more circuits are further to provide the audio spectrogram as input to the one or more neural networks (Calle Fig. 2: spectrogram as input to CNN, RNN).
Calle may not explicitly disclose wherein the one or more circuits are further to provide the audio spectrogram as input to the one or more neural networks, wherein the one or more neural networks generate an audio mask corresponding to the noise signal determined in the one or more speech signals.
However, Lee (Figs. 1-11) teaches wherein the one or more circuits are further to provide the audio spectrogram as input to the one or more neural networks (Lee ¶0115 discloses Fig. 9 operation 910 the speech recognizing apparatus obtains a spectrogram of a speech frame. The speech recognizing apparatus generates the spectrogram by ,
wherein the one or more neural networks generate an audio mask corresponding to the noise signal determined in the one or more speech signals (Lee ¶0069 discloses the speech recognizing model implemented by the speech recognizing apparatus 110 configured as the neural network can dynamically implement spectral masking by receiving a feedback on a result calculated by the neural network at the previous time. When the spectral masking is performed, feature values for each frequency band can selectively not be used in full as originally determined/captured, but rather, a result of a respective adjusting of the magnitudes of all or select feature values for all or select frequency bands, e.g., according to the dynamically implemented spectral masking, can be used for or within speech recognition. Also, for example, such a spectral masking scheme can be dynamically implemented to intensively recognize a speech of a person other than noise from a captured speech signal and/or to intensively recognize a speech of a particular or select speaker to be recognized when plural speeches of a plurality of speakers are present in the captured speech signal. ¶0112 discloses a masking function that reduces an influence of a particular component in speech recognition based on the attention weight can be implemented).
Calle and Lee are analogous art as they pertain to enhancing speech using neural networks. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention was made to modify improving .
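
For illustration only, the general notion of spectral masking referred to above (adjusting the magnitudes of feature values per frequency band so that noise contributes less) can be sketched as follows. This example substitutes a simple spectral-subtraction-style gain for the neural-network-generated, attention-based mask described in Lee; the function names and the fixed per-bin noise estimate are assumptions made for the example and do not appear in either reference.

```python
def subtraction_mask(spectrogram, noise_floor):
    """Hypothetical helper: per-bin gain in [0, 1], computed as
    1 - noise/magnitude and clipped at 0 (0 where the magnitude is 0)."""
    mask = []
    for frame in spectrogram:
        gains = [
            max(0.0, 1.0 - n / m) if m > 0 else 0.0
            for m, n in zip(frame, noise_floor)
        ]
        mask.append(gains)
    return mask


def apply_spectral_mask(spectrogram, mask):
    """Scale every magnitude by the matching mask weight."""
    return [
        [m * w for m, w in zip(frame, weights)]
        for frame, weights in zip(spectrogram, mask)
    ]


# Two frames, three frequency bins each (magnitudes), plus a noise estimate.
noisy = [[4.0, 1.0, 0.0], [2.0, 2.0, 8.0]]
noise_estimate = [1.0, 2.0, 1.0]

mask = subtraction_mask(noisy, noise_estimate)
cleaned = apply_spectral_mask(noisy, mask)
```

Bins whose magnitude is at or below the noise estimate receive a weight of zero and are suppressed, while bins well above it pass nearly unchanged.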

Regarding Claim 4, Calle in view of Lee discloses the processor of claim 3,
wherein the one or more neural networks include two parallel paths for determining patterns in the audio spectrogram (Calle Fig. 2: spectrogram as input to CNN, RNN; ¶0037 discloses RNNs can come in a variety of forms including GRU. The exemplary deep convolutional network 200 also includes multiple convolution blocks (e.g., C1 and C2). Each of the convolution blocks can be configured with a convolution layer [CONV], a normalization layer [LNorm], and a pooling layer [MAX POOL]. The convolution layers can include one or more convolutional filters. ¶0038 discloses the parallel filter banks, for example, of a deep convolutional network can be loaded on a CPU or GPU of an SOC, optionally based on an Advanced RISC Machine [ARM] instruction set, to achieve high performance and low power consumption),
the two parallel paths including a first path with a sequence of convolutional layers and a second path with one or more gated recurrent unit (GRU) layers (Calle ¶0037 discloses Fig. 2 shows the exemplary deep convolutional network 200 includes a preprocessing block. The preprocessing block has a waveform input. The preprocessing block includes a spectrogram block, convolutional neural network [CNN] block, recurrent neural network [RNN] block, and a decoding block. RNNs can come in a variety of forms including generic RNN, LSTM, and GRU, which can be designed with stable memory allowing association over long input sequences of indefinite lengths. ¶0047 discloses the ASR .
Calle may not explicitly disclose wherein the one or more neural networks include two parallel paths for determining patterns in the audio spectrogram.
However, Lee (Figs. 1-11) teaches wherein the one or more neural networks include two parallel paths for determining patterns in the audio spectrogram (Lee ¶0114 discloses operations of Fig. 9 can be performed in parallel or simultaneously).
Calle and Lee are analogous art as they pertain to enhancing speech using neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech-quality improvement taught by Calle to generate a spectrogram from/of the speech signal and extract a frequency feature of the speech signal from the spectrogram (as taught by Lee, ¶0065), in order to provide speech recognition, motivated by a desire for convenience (Lee, ¶0003).


Claims 8-10, 14-16, 20-22, and 26-28 are rejected for the same reasons as set forth for Claims 2-4.

Claims 5, 11, 17, 23, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Calle et al. (US PGPUB #2018/0358003) in view of Lee et al. (US PGPUB #2018/0190268) further in view of Kim et al. (US #2021/0350796).
Regarding Claim 5, Calle in view of Lee discloses the processor of claim 4, but may not explicitly disclose wherein the one or more circuits are further to concatenate the patterns determined by the two parallel paths and process those concatenated patterns using a sequence of GRU layers to identify important noise patterns in the one or more speech signals for use in generating the audio mask.
However, Kim (Figs. 1-5) teaches wherein the one or more circuits are further to concatenate the patterns determined by the two parallel paths and process those concatenated patterns using a sequence of GRU layers to identify important noise patterns in the one or more speech signals for use in generating the audio mask (Kim ¶0026 discloses [Figs. 1, 3] the first level of context aggregation in a densely connected convolutional and recurrent network [DCCRN] is done by a dilated 1D convolutional network component, with a DenseNet architecture [20], to extract the target speech from the noisy mixture in the time domain. It is followed by a compact gated recurrent unit [GRU] component [21] to further leverage the contextual information in the "many-to-one" fashion. ¶0027 discloses the speech processing represents that the hybrid architecture of dilated convolution neural network [CNN] and GRU in DCCRN consistently helps outperform the CNN variations with only one level of context aggregation. ¶0029 discloses here, the densely connected hybrid network can be a network in which a CNN and a recurrent neural network [RNN] are combined. ¶0030 discloses the densely connected hybrid network can include a plurality of dense blocks. And, each of the dense blocks can be composed of a plurality of convolutional layers. ¶0042 discloses each layer concatenates the output from all preceding layers as its input, while it feeds its own outputs to all subsequent layers).
Calle, Lee and Kim are analogous art as they pertain to enhancing speech using neural networks. Therefore it would have been obvious to someone of ordinary skill in .

Claims 11, 17, 23, and 29 are rejected for the same reasons as set forth for Claim 5.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached on 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, 





/YOGESHKUMAR PATEL/Primary Examiner, Art Unit 2651