DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 11/18/2019. Claims 1-13 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/18/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The disclosure is objected to because of the following informalities: see Specification, [0028]: "… be a process of generating an LPT residual signal."
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

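The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
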
Claims 10-13 are rejected under 35 U.S.C. 112, first paragraph, as failing to comply with the enablement requirement. The claims contain subject matter that was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. Specifically, claims 10-13 recite, for example, "a processor configured to: determine … remove … trained." Claims 10-13 are rejected under 35 U.S.C. 112, first paragraph, because they are similar to single means claims: they recite only one structure ("a processor configured to …"), and the apparatus is not recited in combination with any other hardware or memory element.
A single means claim, i.e., a claim in which a means recitation does not appear in combination with another recited element or means, is subject to an undue breadth rejection under 35 U.S.C. 112, first paragraph. In re Hyatt, 708 F.2d 712, 714-715, 218 USPQ 195, 197 (Fed. Cir. 1983) (a single means claim that covered every conceivable means for achieving the stated purpose was held non-enabling for the scope of the claim because the specification disclosed at most only those means known to the inventor). When claims depend on a recited property, a fact situation comparable to Hyatt is possible, in which the claim covers every conceivable structure (means) for achieving the stated property (result) while the specification discloses at most only those known to the inventor. Correspondingly, the processor in claims 10-13 is not limited to a particular structure for performing the claimed functions; as such, the claims may cover all devices that perform the claimed functions. This raises a concern as to whether the scope of enablement provided by Applicant's disclosure is commensurate with the scope of protection sought by the claims. Applicant cannot rely on the knowledge of one skilled in the art to supply 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-13 are rejected under 35 U.S.C. 103 as being unpatentable over Vos (US Patent Application Publication 2010/0174534) in view of J. Klejsa, P. Hedelin, C. Zhou, R. Fejgin, and L. Villemoes, "High-quality speech coding with SampleRNN," arXiv:1811.03021v1 [eess.AS], 2018 (hereinafter "Janusz").
Regarding claim 1, Vos teaches a method of processing a residual signal for audio coding, the method comprising: determining a residual signal of a first band in an original signal of an entire band (see Vos, [0005], [0051]: "according to many speech coding algorithms such as those using Linear Predictive Coding (LPC), a short-term filter is used to separate out the speech signal into two separate components: (i) a signal representative of the effect of the time-varying filter 104; and (ii) the remaining signal with the effect of the filter 104 removed, which is representative of the source signal. The signal representative of the effect of the filter 104 may be referred to as the spectral envelope signal, and typically comprises a series of sets of LPC parameters describing the spectral envelope at each stage. Once the varying spectral envelope is removed, the remaining signal representative of the source alone may be referred to as the LPC residual signal, as shown schematically in FIG. 2a"; "The encoder 500 further comprises a high-pass filter 502, a linear predictive coding (LPC) analysis block 504, a first vector quantizer 506, an open-loop pitch analysis block 508, a long-term prediction (LTP) analysis block 510"; the LPC residual signal is interpreted as the residual signal of the first band); determining a reference signal of a second band remaining excluding the first band from the original signal of the entire band (see Vos, [0047]: "the speech input signal is input to a voice activity detector 501. The voice activity detector is arranged to determine a measure of voicing activity, and spectral tilt and signal to noise estimate, for each frame. The voice activity detector uses a sequence of half-band filter banks to split the signal into four sub-bands"; the signal output of the voice activity detector is interpreted as the reference signal of the second band); removing an envelope from the residual signal and the reference signal (see Vos, [0005]: "once the varying spectral envelope is removed, the remaining signal representative of the source alone may be referred to as the LPC residual signal, as shown schematically in FIG. 2a"); and removing a pitch from the residual signal and the reference signal from which the envelope is removed (see Vos, [0008]: "the effect of this inter-period correlation is then removed from the LPC residual, leaving an LTP residual signal representing the source signal with the effect of the correlation between pitch periods removed. To represent the source signal, the LTP vectors and LTP residual signal are encoded separately for transmission"). However, Vos fails to teach wherein the residual signal of the first band and the reference signal of the second band are set to be an output and an input of a residual signal learning engine including a convolutional layer and a neural network so as to be trained. Janusz teaches wherein the residual signal of the first band and the reference signal of the second band are set to be an output and an input of a residual signal learning engine including a convolutional layer and a neural network so as to be trained (see Janusz, pg. 2, section 3.1: "Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers are the inputs to k-th tier. When k < K, the output from (k + 1)-th tier is additional input"; the waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ are interpreted as the reference signal of the second band input to the convolution layer of the learning engine, and the conditioning vector $h_f$ is interpreted as the residual signal of the first band).
Vos and Janusz are considered to be analogous to the claimed invention because they relate to digital signal coding techniques for speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the source-signal encoding techniques of Vos with the generative network architecture of Janusz in order to improve the rate-quality trade-off (see Janusz, pg. 1, section 1).
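For illustration only, the envelope-removal and pitch-removal steps mapped above to Vos, [0005] and [0008], can be sketched as follows. This is a minimal sketch under stated assumptions, not Vos's encoder: it assumes autocorrelation-method LPC solved by Levinson-Durbin for the envelope and a single-tap long-term predictor for the pitch, and the function names, filter order, lag range, and test signal are hypothetical choices.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of frame x (no windowing, for simplicity)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[0..order-1],
    where x(n) is predicted as sum_k a[k-1] * x(n - k)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def remove_envelope(x, order):
    """LPC analysis (inverse) filter: e(n) = x(n) - sum_k a_k x(n - k)."""
    a = levinson_durbin(autocorr(x, order), order)
    e = [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, order + 1) if n >= k)
         for n in range(len(x))]
    return a, e


def remove_pitch(e, min_lag, max_lag):
    """Single-tap LTP: choose the lag with the largest prediction gain,
    then r(n) = e(n) - b * e(n - lag)."""
    best_score, lag, b = 0.0, min_lag, 0.0
    for T in range(min_lag, max_lag + 1):
        c = sum(e[n] * e[n - T] for n in range(T, len(e)))
        energy = sum(e[n - T] ** 2 for n in range(T, len(e)))
        if energy > 0.0 and c * c / energy > best_score:
            best_score, lag, b = c * c / energy, T, c / energy
    r = [e[n] - b * e[n - lag] if n >= lag else e[n] for n in range(len(e))]
    return lag, b, r


# Synthetic voiced frame: pulse train (period 40 samples) through a one-pole filter.
x, state = [], 0.0
for n in range(400):
    state = 0.9 * state + (1.0 if n % 40 == 0 else 0.0)
    x.append(state)

a, e = remove_envelope(x, order=2)                    # envelope removed (LPC residual)
lag, b, r = remove_pitch(e, min_lag=20, max_lag=100)  # pitch removed (LTP residual)
print(lag, round(b, 2))
```

On this synthetic frame the lag search is expected to land on the 40-sample pulse period, and each stage lowers the signal energy, which is the sense in which the envelope and then the pitch are "removed."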
Regarding claim 2, Vos and Janusz teach the method of claim 1. Janusz further teaches wherein the reference signal is converted into a feature map including a frame and a frequency bin and set to be an input of the residual signal learning engine (see Janusz, pg. 1, section 2: "the encoder scheme is based on a wide-band version of a linear prediction coding (LPC) vocoder [7]. Signal analysis is performed on a per-frame basis, and it results in the following parameters: i) an M-th order LPC filter, ii) an LPC residual RMS level s, iii) pitch f0, and, iv) a k-band voicing vector v. A voicing component v(i), i = 1, . . . , k gives the fraction of periodic energy within a band. All these parameters are used for conditioning of SampleRNN, as described in Section 3").
Regarding claim 3, Vos and Janusz teach the method of claim 1. Janusz further teaches wherein the reference signal is convoluted through a filter defined based on a number of bins and a number of frames in the convolutional layer so that a feature is extracted (see Janusz, pg. 2, section 3.1: "Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers are the inputs to k-th tier. When k < K, the output from (k + 1)-th tier is additional input").
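The feature map and convolution recited in claims 2-3 (and correspondingly in claims 6-7 and 11-12) can be illustrated with a short sketch: the signal is split into frames, each frame is converted to DFT magnitudes (frequency bins), and a filter spanning a given number of frames and bins is slid over the resulting map. The frame length, hop, and kernel values are hypothetical, and this generic frames × bins convolution is not a reproduction of the 1 × 1 convolution layers described in Janusz.

```python
import cmath
import math


def feature_map(x, frame_len, hop):
    """Frame x and take DFT magnitudes; returns fmap[frame][bin], bins 0..frame_len//2."""
    fmap = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len]
        fmap.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                             for n in range(frame_len)))
                     for k in range(frame_len // 2 + 1)])
    return fmap


def conv2d_valid(fmap, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN layers) with a
    kernel of shape (number of frames) x (number of bins)."""
    kf, kb = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * fmap[t + i][f + j] for i in range(kf) for j in range(kb))
             for f in range(len(fmap[0]) - kb + 1)]
            for t in range(len(fmap) - kf + 1)]


# Sinusoid with an 8-sample period: 16-sample frames hold exactly 2 cycles (bin 2).
x = [math.sin(2 * math.pi * n / 8) for n in range(128)]
fmap = feature_map(x, frame_len=16, hop=8)  # 15 frames x 9 bins
kernel = [[1 / 3, 0.0]] * 3                 # 3 frames x 2 bins, averages over frames
out = conv2d_valid(fmap, kernel)            # 13 x 8 feature output
print(len(fmap), len(fmap[0]), len(out), len(out[0]))
```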
Regarding claim 4, Vos and Janusz teach the method of claim 1. Janusz further teaches wherein a node of an output layer of the neural network is mapped to an index of a quantization level of the residual signal (see Janusz, pg. 2, section 2: "the residual level s is quantized in the dB domain using a hybrid approach similar to that in [22]. Small level inter-frame variations are detected, signaled by one bit, and coded by a predictive scheme using fine uniform quantization. In other cases the coding is memoryless with a larger, yet uniform, step-size covering a wide range of levels").
Regarding claim 5, Vos teaches a method of processing a residual signal for audio coding, the method comprising: restoring a reference signal of a second band other than a first band of an entire band to extract a residual signal of the first band (see Vos, [0098]: "at the arithmetic decoding and dequantizing block 702, the arithmetically encoded bitstream is demultiplexed and decoded to create LSF indices, LTP indices, LTP scaling value indices (if used), quantization gains indices, pitch lags and a signal of quantization indices"; the bitstream is interpreted as the reference signal of the second band); adding an envelope to the residual signal of the first band (see Vos, [0104]: "the LPC excitation signal is input to an LPC synthesis filter to create the decoded speech signal $y(n)$ according to [equation] using the quantized LPC coefficients $a_Q$"; LPC synthesis filter 708 is interpreted as adding the envelope to the residual signal of the first band); adding a pitch to the residual signal of the first band (see Vos, [0104]: "the excitation signal is input to the LTP synthesis filter 706 to create the LPC excitation signal $e_{LPC}(n)$ according to [equation] using the pitch lag and quantized LTP coefficients $b_Q$"; LTP synthesis filter 706 is interpreted as adding the pitch to the residual signal of the first band); and restoring an original signal of the entire band by combining the residual signal to which the envelope and the pitch are added with the reference signal of the second band (see Vos, [0097]: "the LPC synthesis filter has an output arranged to provide a decoded output for supply to an output device such as a speaker or headphones"). However, Vos fails to teach inputting the reference signal of the second band to a residual signal learning engine including a convolutional layer and a neural network, and extracting the residual signal of the first band from the reference signal through the residual signal learning engine. Janusz teaches inputting the reference signal of the second band to a residual signal learning engine including a convolutional layer and a neural network (see Janusz, pg. 2, section 3.1, "Conditioning": "Without conditioning information, SampleRNN is only capable of 'babbling'. Hence, we provide decoded vocoder parameters, $h_f$, as conditioning information to the model. Eq. 1 thus becomes [equation] where $h_f$ represents the vocoder parameters corresponding to the audio sample at time i. Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers"; the waveform samples $x_i$ are interpreted as the reference signal of the second band), and extracting the residual signal of the first band from the reference signal through the residual signal learning engine (see Janusz, pg. 2, section 3: "SampleRNN is a deep neural generative model proposed in [2] for generating raw audio signals"; the output of the convolution layer is interpreted as the residual signal of the first band).
Vos and Janusz are considered to be analogous to the claimed invention because they relate to digital signal coding techniques for speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the source-signal encoding techniques of Vos with the generative network architecture of Janusz in order to improve the rate-quality trade-off (see Janusz, pg. 1, section 1).
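The decoder-side steps mapped above to Vos, [0104] (LTP synthesis filter 706 adding the pitch, then LPC synthesis filter 708 adding the envelope) are the inverses of the encoder-side analysis filters. The sketch below is illustrative only: it assumes a single-tap LTP and fixed, hypothetical coefficients in place of the quantized parameters $a_Q$ and $b_Q$ decoded from the Vos bitstream, and it checks that synthesis undoes analysis when the same parameters are used.

```python
import math


def lpc_analysis(x, a):
    """Encoder side: remove the envelope, e(n) = x(n) - sum_k a_k x(n - k)."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(x))]


def ltp_analysis(e, lag, b):
    """Encoder side: remove the pitch, r(n) = e(n) - b * e(n - lag)."""
    return [e[n] - (b * e[n - lag] if n >= lag else 0.0) for n in range(len(e))]


def ltp_synthesis(r, lag, b):
    """Decoder side: add the pitch back, e(n) = r(n) + b * e(n - lag)."""
    e = []
    for n, v in enumerate(r):
        e.append(v + (b * e[n - lag] if n >= lag else 0.0))
    return e


def lpc_synthesis(e, a):
    """Decoder side: add the envelope back, y(n) = e(n) + sum_k a_k y(n - k)."""
    y = []
    for n, v in enumerate(e):
        y.append(v + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n >= k))
    return y


# Hypothetical decoded parameters (both synthesis filters are stable) and a test signal.
a, lag, b = [0.9, -0.2], 40, 0.8
x = [math.sin(2 * math.pi * n / 40) + 0.3 * math.sin(2 * math.pi * n / 13)
     for n in range(200)]

r = ltp_analysis(lpc_analysis(x, a), lag, b)    # residual an encoder would transmit
y = lpc_synthesis(ltp_synthesis(r, lag, b), a)  # decoder: add pitch, then envelope
print(max(abs(u - v) for u, v in zip(x, y)))
```

The order of the decoder calls mirrors Vos's description: the pitch is restored first (LTP synthesis) and the envelope second (LPC synthesis).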
Regarding claim 6, Vos and Janusz teach the method of claim 5. Janusz further teaches wherein the reference signal is converted into a feature map including a frame and a frequency bin and set to be an input of the residual signal learning engine (see Janusz, pg. 1, section 2: "the encoder scheme is based on a wide-band version of a linear prediction coding (LPC) vocoder [7]. Signal analysis is performed on a per-frame basis, and it results in the following parameters: i) an M-th order LPC filter, ii) an LPC residual RMS level s, iii) pitch f0, and, iv) a k-band voicing vector v. A voicing component v(i), i = 1, . . . , k gives the fraction of periodic energy within a band. All these parameters are used for conditioning of SampleRNN, as described in Section 3").
Regarding claim 7, Vos and Janusz teach the method of claim 5. Janusz further teaches wherein the reference signal is convoluted through a filter defined based on a number of bins and a number of frames in the convolutional layer so that a feature is extracted (see Janusz, pg. 2, section 3.1: "Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers are the inputs to k-th tier. When k < K, the output from (k + 1)-th tier is additional input").
Regarding claim 8, Vos and Janusz teach the method of claim 5. Janusz further teaches wherein a node of an output layer of the neural network is mapped to an index of a quantization level of the residual signal (see Janusz, pg. 2, section 2: "The residual level s is quantized in the dB domain using a hybrid approach similar to that in [22]. Small level inter-frame variations are detected, signalled by one bit, and coded by a predictive scheme using fine uniform quantization. In other cases the coding is memoryless with a larger, yet uniform, step-size covering a wide range of levels").
Regarding claim 9, Vos and Janusz teach the method of claim 5. Janusz further teaches wherein an output value of an output layer of the neural network is obtained through one-hot coding (see Janusz, pg. 2, section 3.1: "the structure of a conditional SampleRNN is illustrated in Fig. 2. In a K-tier conditional SampleRNN, the k-th tier (1 < k ≤ K) operates on non-overlapping frames of length $FS^{(k)}$ samples at a time, and the lowest tier (k = 1) predicts one sample at a time").
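To illustrate the output-layer features recited in claims 4, 8, 9, and 13 (one output node per quantization level of the residual signal, with one-hot coded targets), a minimal sketch follows. It assumes a 256-level μ-law quantizer, a common choice in sample-level generative audio models; μ, the number of levels, and all function names are hypothetical and are not taken from Janusz or the application.

```python
import math

MU = 255          # assumed mu-law parameter
LEVELS = MU + 1   # one output node per quantization level (256)


def quantize(x):
    """Map a sample in [-1, 1] to a mu-law quantization index 0..LEVELS-1."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # companded, in [-1, 1]
    return int(round((y + 1.0) / 2.0 * MU))


def dequantize(index):
    """Inverse mapping from a quantization index back to a sample value."""
    y = 2.0 * index / MU - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def one_hot(index):
    """Training target: 1.0 at the node for the target index, 0.0 elsewhere."""
    v = [0.0] * LEVELS
    v[index] = 1.0
    return v


def decode_output(scores):
    """Select the output node with the highest score and map it back to a sample."""
    index = max(range(LEVELS), key=lambda i: scores[i])
    return index, dequantize(index)


target = quantize(0.25)                # index of the quantization level for 0.25
t = one_hot(target)                    # one-hot target vector over the output nodes
index, value = decode_output(t)        # node index -> quantization level -> sample
print(target, index, round(value, 3))
```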
Regarding claim 10, Vos teaches an audio processing apparatus comprising: a processor (see Vos, [0106]), wherein the processor is configured to determine a residual signal of a first band in an original signal of an entire band (see Vos, [0005], [0051]: "according to many speech coding algorithms such as those using Linear Predictive Coding (LPC), a short-term filter is used to separate out the speech signal into two separate components: (i) a signal representative of the effect of the time-varying filter 104; and (ii) the remaining signal with the effect of the filter 104 removed, which is representative of the source signal. The signal representative of the effect of the filter 104 may be referred to as the spectral envelope signal, and typically comprises a series of sets of LPC parameters describing the spectral envelope at each stage. Once the varying spectral envelope is removed, the remaining signal representative of the source alone may be referred to as the LPC residual signal, as shown schematically in FIG. 2a"; "The encoder 500 further comprises a high-pass filter 502, a linear predictive coding (LPC) analysis block 504, a first vector quantizer 506, an open-loop pitch analysis block 508, a long-term prediction (LTP) analysis block 510"; the LPC residual signal is interpreted as the residual signal of the first band), determine a reference signal of a second band remaining except for the first band from the original signal of the entire band (see Vos, [0047]: "the speech input signal is input to a voice activity detector 501. The voice activity detector is arranged to determine a measure of voicing activity, and spectral tilt and signal to noise estimate, for each frame. The voice activity detector uses a sequence of half-band filter banks to split the signal into four sub-bands"; the signal output of the voice activity detector is interpreted as the reference signal of the second band), remove an envelope from the residual signal and the reference signal (see Vos, [0005]: "once the varying spectral envelope is removed, the remaining signal representative of the source alone may be referred to as the LPC residual signal, as shown schematically in FIG. 2a"), and remove a pitch from the residual signal and the reference signal from which the envelope is removed (see Vos, [0008]: "the effect of this inter-period correlation is then removed from the LPC residual, leaving an LTP residual signal representing the source signal with the effect of the correlation between pitch periods removed. To represent the source signal, the LTP vectors and LTP residual signal are encoded separately for transmission"). However, Vos fails to teach wherein the residual signal of the first band and the reference signal of the second band are set to be an output and an input of a residual signal learning engine including a convolutional layer and a neural network so as to be trained. Janusz teaches wherein the residual signal of the first band and the reference signal of the second band are set to be an output and an input of a residual signal learning engine including a convolutional layer and a neural network so as to be trained (see Janusz, pg. 2, section 3.1: "Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers are the inputs to k-th tier. When k < K, the output from (k + 1)-th tier is additional input"; the waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ are interpreted as the reference signal of the second band input to the convolution layer of the learning engine, and the conditioning vector $h_f$ is interpreted as the residual signal of the first band).
Vos and Janusz are considered to be analogous to the claimed invention because they relate to digital signal coding techniques for speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the source-signal encoding techniques of Vos with the generative network architecture of Janusz in order to improve the rate-quality trade-off (see Janusz, pg. 1, section 1).
Regarding claim 11, Vos and Janusz teach the audio processing apparatus of claim 10. Janusz further teaches wherein the reference signal is converted into a feature map including a frame and a frequency bin and set to be an input of the residual signal learning engine (see Janusz, pg. 1, section 2: "the encoder scheme is based on a wide-band version of a linear prediction coding (LPC) vocoder [7]. Signal analysis is performed on a per-frame basis, and it results in the following parameters: i) an M-th order LPC filter, ii) an LPC residual RMS level s, iii) pitch f0, and, iv) a k-band voicing vector v. A voicing component v(i), i = 1, . . . , k gives the fraction of periodic energy within a band. All these parameters are used for conditioning of SampleRNN, as described in Section 3").
Regarding claim 12, Vos and Janusz teach the audio processing apparatus of claim 10. Janusz further teaches wherein the reference signal is convoluted through a filter defined based on a number of bins and a number of frames in the convolutional layer so that a feature is extracted (see Janusz, pg. 2, section 3.1: "Waveform samples $x_{i-FS^{(k)}}, \ldots, x_{i-1}$ and decoded vocoder conditioning vector $h_f$ processed by respective 1 × 1 convolution layers are the inputs to k-th tier. When k < K, the output from (k + 1)-th tier is additional input").
Regarding claim 13, Vos and Janusz teach the audio processing apparatus of claim 10. Janusz further teaches wherein a node of an output layer of the neural network is mapped to an index of a quantization level of the residual signal (see Janusz, pg. 2, section 2: "The residual level s is quantized in the dB domain using a hybrid approach similar to that in [22]. Small level inter-frame variations are detected, signalled by one bit, and coded by a predictive scheme using fine uniform quantization. In other cases the coding is memoryless with a larger, yet uniform, step-size covering a wide range of levels").
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Schmidt et al. (WO 2019081070) teaches using a neural network to generate a bandwidth-extended audio signal (see Schmidt, Fig. 5).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/NANDINI SUBRAMANI/Examiner, Art Unit 2656        

/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656