DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-19 are pending in this application. 
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy was filed on 03/23/2021.
Receipt is acknowledged of some certified copies of papers required by 37 CFR 1.55.
A translation of said application has not been made of record in accordance with 37 CFR 1.55. See MPEP §§ 215 and 216.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/27/2020, 05/06/2021, and 04/08/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to because of the following informalities: 
Fig. 14: speech signal 1400, as described in [0221] of the specification, is not shown in the figure;
Fig. 15: speech signal 1500, as described in [0230] of the specification, is not shown in the figure.  
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claim 13 is objected to because of the following informalities:  the claim recites “a corresponding fourth feature extracting module”, where a fourth feature extracting module has already been recited in claim 12, on which claim 13 is dependent. This claim element is lacking proper antecedent basis, but will be interpreted as referring to the “fourth feature extracting module” recited in claim 12 in the interest of compact prosecution.  Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim(s) 1 and 18, the limitation(s) of “extracting low frequency feature information” and “transmitting a speech signal”, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the limitations read on the mental process of a human calculating feature information using speech data values written on paper and reading the calculation results to another human. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application because the recitation of an “apparatus”, “transceiver”, “memory”, “processor”, and “receiving end” in claim 18 reads on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using [0220-37] of the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to extract and transmit amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
	
	
With respect to claim(s) 2, the claim(s) recite(s) “extracting” and “obtaining”, which reads on a human using a particular model to calculate the desired results at different steps in the process. No additional limitations are present.

With respect to claim(s) 3, the claim(s) recite(s) “performing feature extraction” and “outputting”, which reads on a human using a particular model with specific input parameters and writing down the results of the calculation. No additional limitations are present.

With respect to claim(s) 4 and 5, the claim(s) recite(s) the precise nature of the parameters used for the calculations. No additional limitations are present.

With respect to claim(s) 6, the claim(s) recite(s) “fusing one or more low frequency feature information”, which reads on a human using a particular model to combine different features into one value. No additional limitations are present.

With respect to claim(s) 7, the claim(s) recite(s) the nature of the calculation results. No additional limitations are present.

With respect to claim(s) 8, the claim(s) recite(s) “obtaining”, which reads on a human performing a particular type of calculation in order to get a particular result. No additional limitations are present.

With respect to claim(s) 19, the claim(s) recite(s) a “non-transitory computer-readable storage medium” and “computer programs”, which reads on generalized computer components as per the specification at [0239-40]. No additional limitations are present.

These claims likewise do not integrate the judicial exception into a practical application, and they fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim(s) 9, the limitation(s) of “receiving”, “extracting”, and “outputting”, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the limitations read on the mental process of a human hearing and writing down data related to speech values, performing a calculation to extract specific information from the data, and writing down the results. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application because the recitation of a “transmitting end” reads on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using [0220-37] of the specification. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to receive, extract, and output amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

With respect to claim(s) 10, the claim(s) recite(s) “performing data replication”, which reads on a human performing a particular calculation on the data set. No additional limitations are present.

With respect to claim(s) 11, the claim(s) recite(s) “extracting” and “recovering”, which reads on a human performing a particular series of calculations in order to obtain specific information. No additional limitations are present.
With respect to claim(s) 12, the claim(s) recite(s) “extracting”, “obtaining”, “recovering”, “performing” and “obtain”, which reads on a human performing a particular series of calculations using a specific set of models in order to determine the information desired at each step of the calculation. No additional limitations are present.

With respect to claim(s) 13, the claim(s) recite(s) “performs”, “outputs”, “performs fusing”, and “outputs”, which reads on a human performing specific calculations using a particular set of variables and writing down the results at each stage of the calculation. No additional limitations are present.

With respect to claim(s) 14, the claim(s) recite(s) the precise nature of the parameters used for the calculations. No additional limitations are present.

With respect to claim(s) 15, the claim(s) recite(s) “extracting” and “extracting”, which reads on a human performing a specific series of calculations using a specific set of models. No additional limitations are present.

With respect to claim(s) 16, the claim(s) recite(s) “performs”, “outputs”, “extracts” and “recovers”, which reads on a human performing a specific series of calculations using a particular set of variables and set of models to obtain desired results. No additional limitations are present.
With respect to claim(s) 17, the claim(s) recite(s) “fusing” and “obtain”, which reads on a human performing a specific series of calculations using a particular set of models to obtain desired results. No additional limitations are present.

These claims likewise do not integrate the judicial exception into a practical application, and they fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.
	
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 2, 6, 7, 9-12, 15, and 17 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hetherington et al. (US PG Pub No. 2006/0247922), as found in the IDS, hereinafter Hetherington.
Regarding claim 1, Hetherington teaches
A method of transmitting speech signal (a method of improving speech signals in communications systems [0001:1-3]), the method comprising:

extracting low frequency feature information from an input speech signal by using a first feature extracting network (the high frequency compressor of a high frequency encoder, i.e. by using a first feature extracting network, receives the speech signal in a frequency domain and splits the signal into two components, i.e. extracting...from an input speech signal, which are a high frequency and a low frequency component, i.e. low frequency feature information [0049],[0052:1-17]); and
transmitting a speech signal corresponding to the low frequency feature information to a receiving end (the lower frequency components of the speech signal are left unchanged, i.e. speech signal corresponding to the low frequency feature information, and are combined with compressed high frequencies to be down-sampled and transmitted over a communication channel to a receiver, i.e. transmitting...to a receiving end [0048],[0052:17-34],[0055:1-12]).  
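
For illustration only, the transmit-side mapping described above (low frequency components left unchanged, high frequency components compressed, the two combined before transmission) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hetherington or from the application: the function names, the cutoff index, and the use of bin averaging as the compression step are all hypothetical.

```python
# Hypothetical sketch: split a magnitude spectrum into low and high bands,
# compress only the high band, and recombine. Names, cutoff, and the
# averaging scheme are assumptions made for illustration.

def split_bands(spectrum, cutoff_bin):
    """Split frequency bins into (low, high) at an assumed cutoff index."""
    return spectrum[:cutoff_bin], spectrum[cutoff_bin:]


def compress_high_band(high, factor):
    """Compress the high band by averaging each group of `factor` adjacent bins."""
    compressed = []
    for start in range(0, len(high), factor):
        group = high[start:start + factor]
        compressed.append(sum(group) / len(group))
    return compressed


def encode(spectrum, cutoff_bin, factor):
    """Leave the low band unchanged and append the compressed high band."""
    low, high = split_bands(spectrum, cutoff_bin)
    return low + compress_high_band(high, factor)


spectrum = [8.0, 6.0, 4.0, 2.0, 1.0, 3.0, 5.0, 7.0]
encoded = encode(spectrum, cutoff_bin=4, factor=2)
print(encoded)  # [8.0, 6.0, 4.0, 2.0, 2.0, 6.0]
```

In this sketch the first four bins pass through untouched and the eight-bin input is reduced to six bins, which is the sense in which the combined signal occupies a narrower band.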

Regarding claim 2, Hetherington teaches claim 1, and further teaches
the first feature extracting network comprises at least one first feature extracting module and at least one second feature extracting module (the high frequency encoder, i.e. first feature extracting network, includes a time-domain-to-frequency-domain transform, i.e. first feature extracting module, and a high frequency compressor, i.e. second feature extracting module [0049]), and wherein the extracting the low frequency feature information from the input speech signal by using the first feature extracting network comprises:
extracting speech feature information of the input speech signal by using the at least one first feature extracting module (the digitized input speech signal is transformed from the time-domain into the frequency domain, i.e. extracting speech feature information of the input speech signal, using the time-domain-to-frequency-domain transform, i.e. using the at least one first feature extracting module [0050]); and
obtaining the low frequency feature information according to the extracted speech feature information by using the at least one second feature extracting module (the speech signal in the frequency domain, i.e. according to the extracted speech feature information, is separated into high frequency and low frequency components, i.e. obtaining the low frequency feature information, by the high frequency compressor, i.e. using the at least one second feature extracting module [0052:1-17]).  

Regarding claim 6, Hetherington teaches claim 1, and further teaches
fusing one or more low frequency feature information output by the first feature extracting network by using a first feature fusing network, to obtain the speech signal corresponding to the low frequency feature information (the high frequency components are compressed, and the compressed high frequency components and low frequency components, i.e. one or more low frequency feature information output by the first feature extracting network, are combined in a combiner, i.e. fusing...using a first feature fusing network, to output a combined signal, i.e. obtain the speech signal corresponding to the low frequency feature information [0052]).  
Regarding claim 7, Hetherington teaches claim 1, and further teaches
the low frequency feature information extracted by the first feature extracting network comprises relevant information between high frequency features and low frequency features (the high frequency compressor of a high frequency encoder, i.e. first feature extracting network, receives the speech signal in a frequency domain and splits the signal into two components, which are a high frequency and a low frequency component, i.e. comprises relevant information between high frequency features and low frequency features, where the high frequency component is compressed and combined with the unchanged low frequency component [0049],[0052]).  

Regarding claim 9, Hetherington teaches
A method for receiving speech signal (a method of improving speech signals in communications systems [0001:1-3]), the method comprising:
receiving a first speech signal transmitted by a transmitting end (the receiver receives the speech signals, i.e. receiving a first speech signal, transmitted over the communication channel by the transmitter, i.e. transmitted by a transmitting end [0048]);
extracting low frequency feature information from the first speech signal and recovering high frequency feature information based on the low frequency feature information, by using a second feature extracting network (the bandwidth extender, i.e. second feature extracting network, receives the band limited speech signal, i.e. first speech signal [0056], processes the signal so that the low frequency portion of the signal remains unchanged, i.e. extracting low frequency feature information, and a portion of the received speech signal is expanded to fill the higher frequency range, i.e. recovering high frequency feature information based on the low frequency feature information [0043-5]); and
outputting a second speech signal comprising the low frequency feature information and the high frequency feature information (the expanded speech signal, i.e. second speech signal comprising the low frequency feature information and the high frequency feature information, is reproduced by a loud speaker for the benefit of the receiver’s user, i.e. outputting [0059]).  
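
For illustration only, the receive-side mapping described above (low frequency portion unchanged, a portion of the received signal expanded to fill the higher frequency range) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hetherington or from the application: the function name, the cutoff index, and the use of simple bin repetition as the expansion step are hypothetical.

```python
# Hypothetical sketch: keep the low band of a band-limited spectrum as is and
# expand the remaining bins to fill a wider high band. The repetition scheme
# is an assumption made for illustration.

def expand(received, cutoff_bin, factor):
    """Keep bins below `cutoff_bin`; spread each remaining bin over `factor` bins."""
    low = received[:cutoff_bin]
    high = []
    for value in received[cutoff_bin:]:
        high.extend([value] * factor)
    return low + high


received = [8.0, 6.0, 4.0, 2.0, 2.0, 6.0]
expanded = expand(received, cutoff_bin=4, factor=2)
print(expanded)  # [8.0, 6.0, 4.0, 2.0, 2.0, 2.0, 6.0, 6.0]
```

The output contains both the unchanged low band and a regenerated high band, corresponding to the "second speech signal" of the claim language.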

Regarding claim 10, Hetherington teaches claim 9, and further teaches
performing data replication on the first speech signal to expand data scale of the first speech signal before the extracting the low frequency feature information from the first speech signal and recovering the high frequency feature information by using the second feature extracting network (the bandwidth extender includes an up sampler, i.e. using the second feature extracting network, that receives the digitized speech signal, i.e. first speech signal, and samples the signal at a sample rate corresponding to the highest rate of the intended highest frequency of the expanded signal, i.e. performing data replication ... to expand data scale of the first speech signal [0056-7], prior to further processing where the low frequency portion of the signal remains unchanged, i.e. extracting low frequency feature information from the first speech signal, and a portion of the received speech signal is expanded to fill the higher frequency range, i.e. recovering high frequency feature information based on the low frequency feature information [0043-5]).  

Regarding claim 11, Hetherington teaches claim 9, and further teaches
extracting the low frequency feature information from the first speech signal by using a low frequency feature extracting network in the second feature extracting network, wherein the low frequency feature information comprises relevant information between high frequency features and low frequency features (the bandwidth extender, i.e. second feature extracting network, receives the band limited speech signal, i.e. first speech signal [0056], and uses a spectral envelope extender, i.e. using a low frequency feature extracting network, with a frequency demapping matrix that maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, i.e. the low frequency feature information comprises relevant information between high frequency features and low frequency features, which is further shaped by the gain controller to provide an output, i.e. extracting the low frequency feature information [0058]); and
recovering the high frequency feature information according to the low frequency feature information and performing fusing processing on the high frequency feature information and the low frequency feature information, by using a high frequency feature extracting network in the second feature extracting network, to obtain feature information comprising the high frequency feature information and the low frequency feature information (the bandwidth extender, i.e. second feature extracting network, uses an excitation signal generator and combiner, i.e. high frequency feature extracting network, where the excitation signal generator creates harmonic information based on the un-expanded signal, and the combiner uses the output of the excitation signal generator to shape the expanded signal to add the proper harmonics and correct their phase relationships, i.e. recovering the high frequency feature information according to the low frequency feature information and performing fusing processing on the high frequency feature information and the low frequency feature information, where the output of the combiner is an expanded signal with proper harmonics and phase relationships between the high and low frequency components, i.e. obtain feature information comprising the high frequency feature information and the low frequency feature information [0058-9]).  

Regarding claim 12, Hetherington teaches claim 11, and further teaches
the low frequency feature extracting network comprises at least one third feature extracting module and at least one fourth feature extracting module (the bandwidth extender, i.e. second feature extracting network, includes a time-domain-to-frequency-domain transformer, i.e. third feature extracting module, and a frequency demapping matrix, i.e. fourth feature extracting module, of the spectral envelope extender [0056],[0058]), wherein the extracting the low frequency feature information from the first speech signal by using the low frequency feature extracting network in the second feature extracting network comprises:
extracting speech feature information of the first speech signal by using the at least one third feature extracting module (the digitized and upsampled signal, i.e. first speech signal, is transformed from the time-domain to the frequency domain, i.e. extracting speech feature information, by the time-domain-to-frequency-domain transformer, i.e. third feature extracting module [0057]); and
obtaining the low frequency feature information according to the extracted speech feature information by using the at least one fourth feature extracting module (the frequency demapping matrix, i.e. fourth feature extracting module, receives the frequency domain signal, i.e. according to the extracted speech feature information, and maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, which is further shaped by the gain controller to provide an output, i.e. obtaining the low frequency feature information [0058]), 
wherein the high frequency feature extracting network comprises at least one fifth feature extracting module and at least one sixth feature extracting module (the excitation signal generator and combiner, i.e. high frequency feature extracting network, uses both an excitation signal generator, i.e. fifth feature extracting module, and a combiner, i.e. sixth feature extracting module, to perform different parts of the process [0059]), wherein the recovering the high frequency feature information according to the low frequency feature information and performing the fusing processing on the high frequency feature information and the low frequency feature information comprises:
recovering the high frequency feature information according to the low frequency feature information by using the at least one fifth feature extracting module (the excitation signal generator, i.e. fifth feature extracting module, creates harmonic information based on the un-expanded signal, i.e. recovering the high frequency feature information according to the low frequency feature information [0059]); and
performing fusing processing on the high frequency feature information and the low frequency feature information extracted by a corresponding fourth feature extracting module, by using the at least one sixth feature extracting module, to obtain the feature information comprising the high frequency feature information and the low frequency feature information (the combiner, i.e. sixth feature extracting module, uses the output of the excitation signal generator to shape the expanded signal to add the proper harmonics and correct their phase relationships, i.e. performing fusing processing on the high frequency feature information and the low frequency feature information, where the output of the combiner is an expanded signal with proper harmonics and phase relationships between the high and low frequency components, i.e. obtain feature information comprising the high frequency feature information and the low frequency feature information [0058-9]).  

Regarding claim 15, Hetherington teaches claim 9, and further teaches
the second feature extracting network comprises at least one seventh feature extracting module and at least one eighth feature extracting module (the bandwidth extender, i.e. second feature extracting network, includes a time-domain-to-frequency-domain transformer, i.e. seventh feature extracting module, and a frequency demapping matrix, i.e. eighth feature extracting module, of the spectral envelope extender [0056],[0058]), wherein the extracting the low frequency feature information from the first speech signal and recovering the high frequency feature information by using the second feature extracting network comprises:
extracting speech feature information of the first speech signal by using the at least one seventh feature extracting module (the digitized and upsampled signal, i.e. first speech signal, is transformed from the time-domain to the frequency domain, i.e. extracting speech feature information, by the time-domain-to-frequency-domain transformer, i.e. seventh feature extracting module [0057]); and
extracting the low frequency feature information comprising relevant information between high frequency features and low frequency features according to the extracted speech feature information and recovering the high frequency feature information, by using the at least one eighth feature extracting module, to obtain feature information comprising the high frequency feature information and the low frequency feature information (the frequency demapping matrix, i.e. using the at least one eighth feature extracting module, receives the frequency domain signal and maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, which is further shaped by the gain controller to provide an output, i.e. extracting the low frequency feature information comprising relevant information between high frequency features and low frequency features [0058], and the excitation signal generator creates harmonic information based on the un-expanded signal, i.e. recovering the high frequency feature information, and the output of both the spectral envelope extender and excitation signal generator, i.e. obtain feature information comprising the high frequency feature information and the low frequency feature information, are sent to a combiner for further processing [0059]).  

Regarding claim 17, Hetherington teaches claim 9, and further teaches
fusing the feature information comprising the high frequency feature information and the low frequency feature information output by the second feature extracting network, by using a second feature fusing network, to obtain the second speech signal corresponding to the feature information comprising the high frequency feature information and the low frequency feature information (the combiner, i.e. second feature fusing network, uses the output of both the spectral envelope extender and excitation signal generator of the bandwidth extender, i.e. second feature extracting network, to shape the expanded signal to add the proper harmonics and correct their phase relationships, i.e. fusing the feature information comprising the high frequency feature information and the low frequency feature information, where the output of the combiner is an expanded signal with proper harmonics and phase relationships between the high and low frequency components, i.e. obtain the second speech signal corresponding to the feature information comprising the high frequency feature information and the low frequency feature information [0058-9]).  
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3, 5, 8, 13, 14, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hetherington, in view of Gu et al. (U.S. PG Pub No. 2021/0375294), hereinafter Gu.

Regarding claim 3, Hetherington teaches claim 2.
While Hetherington provides processing of an input speech signal to provide frequency information, Hetherington does not specifically teach that the process is based on convolution processing parameters, and thus does not teach
performing feature extraction on input information respectively based on at least two convolution processing parameters, and outputting the extracted feature information.  
Gu, however, teaches performing feature extraction on input information respectively based on at least two convolution processing parameters, and outputting the extracted feature information (the system performs a two-dimensional dilated convolution on the sound source audio signal, i.e. input information, to extract a plurality of inter-channel features, i.e. performing feature extraction [0041], and the convolution is performed utilizing convolution kernels, inter-channel dilation coefficients, and inter-channel strides, i.e. based on at least two convolution processing parameters, and generates feature maps as inter-channel features, i.e. outputting the extracted feature information [0043]).  
Hetherington and Gu are analogous art because they are from a similar field of endeavor in processing input audio to produce a desired result. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Hetherington's processing of an input speech signal to provide frequency information with the use of a dilated convolution to extract features of the audio signal as taught by Gu. The motivation to do so would have been to achieve a predictable result of enabling the processing of an audio signal from a multi-sound source (Gu [0042]).
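
For illustration only, feature extraction "respectively based on at least two convolution processing parameters" can be sketched as the same input being convolved under two different parameter sets, each producing its own feature map. This is a minimal sketch under stated assumptions: it uses a one-dimensional convolution for brevity, whereas Gu describes a two-dimensional dilated convolution, and the function name, kernels, and dilation values are hypothetical and not drawn from either reference.

```python
# Hypothetical sketch: 1-D dilated convolution applied to one input under two
# different parameter sets (kernel size and dilation). Values are illustrative.

def dilated_conv1d(signal, kernel, dilation=1, stride=1):
    """Valid-mode 1-D convolution (cross-correlation form) with dilation and stride."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one window
    out = []
    for start in range(0, len(signal) - span + 1, stride):
        out.append(sum(k * signal[start + i * dilation] for i, k in enumerate(kernel)))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
parameter_sets = [
    {"kernel": [1.0, -1.0], "dilation": 1},       # kernel size 2, adjacent samples
    {"kernel": [1.0, 0.0, -1.0], "dilation": 2},  # kernel size 3, every other sample
]
feature_maps = [
    dilated_conv1d(signal, p["kernel"], p["dilation"]) for p in parameter_sets
]
for fmap in feature_maps:
    print(len(fmap), fmap)
```

The second parameter set covers five input samples per output value, so it yields a shorter feature map than the first; each parameter set therefore produces a distinct view of the same input.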

Regarding claim 5, Hetherington in view of Gu teaches claim 3, and Gu further teaches
the convolution processing parameter comprises a convolution kernel size corresponding to a convolution operation (the system performs a two-dimensional dilated convolution on the sound source audio signal, i.e. corresponding to a convolution operation [0041], and the convolution is performed utilizing convolution kernels of different sizes, i.e. convolution processing parameter comprises a convolution kernel size [0043]).  
The motivation to combine is the same as previously presented.

Regarding claim 8, Hetherington teaches claim 2.
While Hetherington provides downsampling of the signal that is to be transmitted, Hetherington does not specifically teach the downsampling of the extracted speech information, and thus does not teach
down-sampling the extracted speech feature information at one or more scales.
Gu, however, teaches down-sampling the extracted speech feature information at one or more scales (the system performs a two-dimensional dilated convolution on the audio signal, i.e. extracted speech feature information [0041], and the convolution is performed to extract inter-channel features utilizing convolution kernels with different stride sizes, i.e. down-sampling...at one or more scales [0043]).
Hetherington and Gu are analogous art because they are from a similar field of endeavor in processing input audio to produce a desired result. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the downsampling of the signal teachings of Hetherington with the use of convolution kernels of different stride sizes to process data as taught by Gu. The motivation to do so would have been to achieve a predictable result of enabling the processing of an audio signal from a multi-sound source (Gu [0042]).

Regarding claim 13, Hetherington teaches claim 12, and further teaches
at least one of a plurality of feature extracting modules in the second feature extracting network performs feature extraction on input information respectively through at least two ... processing parameters, and outputs the extracted feature information (the frequency demapping matrix, i.e. fourth feature extracting module, receives the frequency domain signal, i.e. performs feature extraction on input information, and maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, i.e. through...processing parameters, which is further shaped by the gain controller using the spectral shape of the spectrum of the un-expanded signal, i.e. through...processing parameters, to provide an output, i.e. output the extracted feature information [0058]); and
 for the input high frequency feature information respectively corresponding to at least two ... processing parameters (the excitation signal generator creates harmonic information, i.e. high frequency feature information, based on the un-expanded signal, i.e. processing parameters [0059]), the at least one sixth feature extracting module respectively performs fusing processing on the high frequency feature information and the low frequency feature information (the combiner, i.e. sixth feature extracting module, uses the output of the excitation signal generator, i.e. processing parameter, and the spectral envelope extender, i.e. processing parameter, to shape the expanded signal to add the proper harmonics and correct their phase relationships, i.e. performing fusing processing on the high frequency feature information and the low frequency feature information, where the output of the combiner is an expanded signal with proper harmonics and phase relationships between the high and low frequency components [0058-9]), which is extracted by a corresponding fourth feature extracting module according to corresponding ... processing parameters (the frequency demapping matrix, i.e. fourth feature extracting module, receives the frequency domain signal and maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, which is further shaped by the gain controller to provide an output, i.e. extracted...according to corresponding processing parameters [0058]), and outputs the feature information comprising the high frequency feature information and the low frequency feature information (the combiner, i.e. sixth feature extracting module, uses the output of the excitation signal generator to shape the expanded signal to add the proper harmonics and correct their phase relationships, and the output of the combiner is an expanded signal with proper harmonics and phase relationships between the high and low frequency components, i.e. outputs feature information comprising the high frequency feature information and the low frequency feature information [0058-9]).
While Hetherington provides the determination of different information from a received band limited speech signal using different parameters, Hetherington does not specifically teach the extraction of features from a speech signal using convolution parameters, and thus does not teach
convolution processing parameters....
Gu, however, teaches convolution processing parameters... (feature maps for an audio signal are generated by performing a dilated convolution based on an inter-channel dilation coefficient, an inter-channel stride, and convolution kernels of different sizes, i.e. convolution processing parameters [0043]).
Hetherington and Gu are analogous art because they are from a similar field of endeavor in processing input audio to produce a desired result. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the determination of different information from a received band limited speech signal using different parameters teachings of Hetherington with the use of a convolution process utilizing different coefficients and values as taught by Gu. The motivation to do so would have been to achieve a predictable result of enabling the processing of an audio signal from a multi-sound source (Gu [0042]).

Regarding claim 14, Hetherington in view of Gu teaches claim 13, and Gu further teaches
a convolution kernel size corresponding to a convolution operation (feature maps for an audio signal are generated by performing a dilated convolution, i.e. corresponding to a convolution operation, based on an inter-channel dilation coefficient, an inter-channel stride, and convolution kernels of different sizes, i.e. convolution kernel size [0043]).  
The motivation to combine is the same as previously presented.

Regarding claim 16, Hetherington teaches claim 15, and further teaches
the at least one seventh feature extracting module performs feature extraction on input information respectively ..., and outputs the extracted speech feature information (the digitized and upsampled signal, i.e. input information, is transformed from the time-domain to the frequency domain, i.e. outputs the extracted speech feature information, by the time-domain-to-frequency-domain transformer, i.e. seventh feature extracting module performs feature extraction [0057]); and
 the at least one eighth feature extracting module extracts the low frequency feature information from the input information respectively through at least two ... processing parameters and recovers the high frequency feature information to obtain the feature information comprising the high frequency feature information and the low frequency feature information (the frequency demapping matrix, i.e. eighth feature extracting module, receives the frequency domain signal, i.e. extracts...from the input information, and maps the lower frequency bins of the received compressed speech signal to the higher frequency bins of the extended frequencies of the uncompressed signal, i.e. through...processing parameters, which is further shaped by the gain controller using the spectral shape of the spectrum of the un-expanded signal, i.e. through...processing parameters, to provide an output that is an extended signal out to the highest frequencies of the uncompressed signal, i.e. recovers the high frequency feature information to obtain the feature information comprising the high frequency feature information and the low frequency feature information [0058]).  
While Hetherington provides the determination of different information from a received band limited speech signal using different parameters, Hetherington does not specifically teach the extraction of features from a speech signal using convolution and deconvolution parameters, and thus does not teach
at least two convolution processing parameters...;
at least two deconvolution processing parameters....
Gu, however, teaches at least two convolution processing parameters... (feature maps for an audio signal are generated by performing a dilated convolution based on an inter-channel dilation coefficient, an inter-channel stride, and convolution kernels of different sizes, i.e. at least two convolution processing parameters [0043]);
at least two deconvolution processing parameters...(feature maps for an audio signal are generated by performing a dilated convolution based on an inter-channel dilation coefficient, an inter-channel stride, and convolution kernels of different sizes, i.e. at least two convolution processing parameters [0043]).
Hetherington and Gu are analogous art because they are from a similar field of endeavor in processing input audio to produce a desired result. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the determination of different information from a received band limited speech signal using different parameters teachings of Hetherington with the use of a convolution process utilizing different coefficients and values as taught by Gu. The motivation to do so would have been to achieve a predictable result of enabling the processing of an audio signal from a multi-sound source (Gu [0042]).



Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hetherington, in view of Gu, and further in view of Wu et al. (“Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation”, Proc. Interspeech, Jul 2019.), hereinafter Wu.


Regarding claim 4, Hetherington in view of Gu teaches claim 3.
While Hetherington in view of Gu provides a convolution process using multiple coefficients and values, Hetherington in view of Gu does not specifically teach that any of the parameters are related to the receptive field, and thus does not teach
a first convolution processing parameter corresponding to a first receptive field between adjacent samples of the speech signal, a second convolution processing parameter corresponding to a second receptive field of one pitch length, or a third convolution processing parameter corresponding to a third receptive field of at least two pitch lengths.
Wu, however, teaches a first convolution processing parameter corresponding to a first receptive field between adjacent samples of the speech signal, a second convolution processing parameter corresponding to a second receptive field of one pitch length, or a third convolution processing parameter corresponding to a third receptive field of at least two pitch lengths (in a pitch-dependent dilated convolution, the receptive field lengths are changed, i.e. first receptive field between adjacent samples... second receptive field of one pitch length... third receptive field of at least two pitch lengths, corresponding to the fundamental frequency values, for use in the convolution, i.e. first...second...third convolution processing parameter (Sec. 3.2)).
Hetherington, Gu, and Wu are analogous art because they are from a similar field of endeavor in processing audio input to produce a desired result. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the convolution process using multiple coefficients and values teachings of Hetherington, as modified by Gu, with the use of changing pitch-dependent receptive field lengths as taught by Wu. The motivation to do so would have been to achieve a predictable result of enabling the network to efficiently extend the receptive field without losing trajectory information of sequential signals (Wu, Sec. 3.2).

Claim(s) 18 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hetherington, in view of Takahashi (U.S. PG Pub No. 2013/0085751), hereinafter Takahashi.

Regarding claim 18, Hetherington teaches
An apparatus for transmitting speech signal (a system, i.e. apparatus, for improving speech signals in communications systems [0001:1-3]), the apparatus comprising:

extract low frequency feature information from an input speech signal by using a first feature extracting network (the high frequency compressor of a high frequency encoder, i.e. by using a first feature extracting network, receives the speech signal in a frequency domain and splits the signal into two components, i.e. extract...from an input speech signal, which are a high frequency and a low frequency component, i.e. low frequency feature information [0049],[0052:1-17]); and
... transmit a speech signal corresponding to the low frequency feature information to a receiving end (the lower frequency components of the speech signal are left unchanged, i.e. speech signal corresponding to the low frequency feature information, and are combined with compressed high frequencies to be down-sampled and transmitted over a communication channel to a receiver, i.e. transmit...to a receiving end [0048],[0052:17-34],[0055:1-12]).  
While Hetherington provides a system for the extraction and transmission of low frequency components of speech, Hetherington does not specifically teach that the transmitter and receiver can be a transceiver, or that the system includes a memory and processor, and thus does not teach
a transceiver;
at least one memory storing one or more instructions; and
at least one processor executing the one or more instructions and configured to:
controlling the transceiver to transmit....
Takahashi, however, teaches a transceiver (the voice communication system includes transmitter-receivers, i.e. transceiver [0024]);
at least one memory storing one or more instructions (the transmitter may be implemented by a system with program sequences, i.e. one or more instructions, stored on a computer-readable storage medium, i.e. memory [0030]); and
at least one processor executing the one or more instructions and configured to (the transmitter may be implemented by a system including a central processing unit, i.e. processor, that executes the program sequences stored in the system, i.e. executing the one or more instructions [0030]):
controlling the transceiver to transmit...(the transmitter-receivers may have both functions of the transmitter, such as transmitting encoded voice signals, i.e. controlling the transceiver to transmit, and the receiver [0028]).
Hetherington and Takahashi are analogous art because they are from a similar field of endeavor in transmitting low frequency voice signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system for the extraction and transmission of low frequency components of speech teachings of Hetherington with the specific use of a transceiver, processor, and memory to perform the tasks as taught by Takahashi. The motivation to do so would have been to achieve a predictable result of enabling the implementation of the processes in a computer system (Takahashi [0030]).

Regarding claim 19, Hetherington teaches claim 1.
While Hetherington provides a system for the extraction and transmission of low frequency components of speech, Hetherington does not specifically teach that the system includes a memory with stored instructions, and thus does not teach
A non-transitory computer-readable recording medium having recorded thereon computer programs for performing a method of claim 1.
Takahashi, however, teaches A non-transitory computer-readable recording medium having recorded thereon computer programs for performing a method... (the transmitter may be implemented by a system with program sequences, i.e. computer programs for performing a method, stored on a computer-readable storage medium, i.e. non-transitory computer-readable recording medium having recorded thereon computer programs [0030]).
Hetherington and Takahashi are analogous art because they are from a similar field of endeavor in transmitting low frequency voice signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system for the extraction and transmission of low frequency components of speech teachings of Hetherington with the specific use of a memory to store program sequences to perform the tasks as taught by Takahashi. The motivation to do so would have been to achieve a predictable result of enabling the implementation of the processes in a computer system (Takahashi [0030]).

Conclusion	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659 

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659