DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 4 is objected to because of the following informalities: line 2 recites “MFCC” but should indicate that this acronym stands for Mel Frequency Cepstrum Coefficients by reciting “Mel Frequency Cepstrum Coefficients (MFCC).” Appropriate correction is required.
Claim 5 is objected to because of the following informalities: line 2 recites “fbank” but should indicate that this acronym stands for filterbank by reciting “filterbank (fbank).” Appropriate correction is required.
Claims 9-10 are objected to because of the following informalities: line 2 recites “GMM-HMM” but should indicate that this acronym stands for “Gaussian Mixture Model-Hidden Markov Model” by reciting “Gaussian Mixture Model-Hidden Markov Model (GMM-HMM).” Appropriate correction is required.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


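The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.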


Claim 1, and therefore claims 2-13 which depend therefrom, are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitations "the first speech data" in lines 7-8 and "the speech data" in line 10. There is insufficient antecedent basis for these limitations in the claim, and it is unclear which earlier limitation each is intended to refer back to.
Claims 6 and 8, and therefore claim 7 which depends therefrom, are further rejected as being indefinite because line 7 of each of claims 6 and 8 recites “the frame-aligned first speech training data” (claim 6) and “the frame aligned second speech training data” (claim 8). There is no antecedent basis for these limitations in the claims, and it is unclear which earlier limitation each is intended to refer back to.
Further regarding claim 8, line 6 recites “the forced alignment operation.” There is insufficient antecedent basis for this limitation in the claim, and it is unclear which earlier limitation it is intended to refer back to.




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al., "An experimental study on joint modeling of mixed-bandwidth data via deep neural networks for robust speech recognition," 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 2016, pp. 588-594, doi: 10.1109/IJCNN.2016.7727253 (herein “Gao NPL”) in view of Seltzer et al., "Training wideband acoustic models using mixed-bandwidth training data via feature bandwidth extension," Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., Philadelphia, PA, USA, 2005, pp. I/921-I/924 Vol. 1, doi: 10.1109/ICASSP.2005.1415265 (herein “Seltzer NPL”) in view of Maas et al., “Building DNN Acoustic Models for Large Vocabulary Speech Recognition,” arXiv:1406.7806v2 [cs.CL], January 20, 2015 (herein “Maas NPL”).
Regarding claim 1, Gao NPL teaches a training method of a hybrid frequency acoustic recognition model, wherein a unified hybrid frequency acoustic recognition model is formed by training (Gao NPL Introduction, constructing a unified model for mixed-band speech recognition with large-scale mixed-band speech data for training hybrid acoustic models via hidden Markov model HMM) to respectively perform acoustic recognition on a first speech signal having a first sampling frequency and to perform the acoustic recognition on a second speech signal having a second sampling frequency (Gao NPL pages 592-593, results section – the trained model performs speech recognition on 8 kHz and 16 kHz sampled speech data); the training method of the hybrid frequency acoustic recognition model specifically comprises (Gao NPL Abstract, Introduction, joint modeling strategies for training hybrid acoustic models):
step S1, obtaining a first-type speech feature of the first speech signal (Gao page 590, section III, for feature extraction, adopting 24-dimensional wideband LMFB features from wideband speech, where four datasets are referenced including Wideband_Ori, which is wideband speech (first-type speech feature) that is from the original wideband speech (of the first speech signal)), and processing the first speech data to obtain corresponding first speech training data (Gao page 590 section III, datasets used in the training strategy (training data) including Wideband_Ori, original wideband speech);
step S2, obtaining the first-type speech feature of the second speech signal (Gao page 590, section III, 24-dimensional LMFB features from narrowband signal, where four datasets are referenced including Wideband_US, which is wideband speech (first-type speech feature) that is up-sampled from the original narrowband speech (of the second speech signal)), and processing the second speech data to obtain corresponding second speech training data (Gao page 590 section III, datasets used in the training strategy (training data) including Wideband_US, wideband speech up-sampled from the original narrowband speech);
step S3, obtaining a second-type speech feature of the first speech signal (Gao page 590, datasets including Narrowband_DS, which is narrowband LMFB features (second-type speech feature) from the wideband signal (the first speech signal)) according to a power spectrum of the first speech signal (Gao page 588, right column, the log-Mel filterbank is used to obtain the features, where log-Mel defines a power spectrum and is applied to the wideband signal (first speech signal)), and obtaining the second-type speech feature of the second speech signal (Gao page 590, datasets including narrowband LMFB features from the narrowband signal) according to a power spectrum of the second speech signal (Gao page 588, right column, the log-Mel filterbank is used to obtain the features, where log-Mel defines a power spectrum and is applied to the narrowband signal (second speech signal));
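step S4, pre-training a deep neural network model according to the first speech signal and the second speech signal to form a preliminary recognition model of the hybrid frequency acoustic recognition model (Gao page 588, right column and page 590, section II, the first step in the joint modeling strategy is to train a regression DNN-BWE by mapping the features from the narrowband (second speech signal) to wideband speech (first speech signal), where the DNN-BWE model is input/mapped to (thus "of the") a classification DNN to jointly train a DNN acoustic model (the hybrid frequency acoustic recognition model)); and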
step S5, performing supervised parameter training on the preliminary recognition model according to the first speech training data, the second speech training data and the second-type speech feature (Gao NPL section II, training of the regression DNN using narrowband (second speech training data) and wideband speech data (first speech training data) pairs, where the downsampled wideband (second-type speech feature) speech as LMFB (log-Mel filterbank) features are used), so as to form the hybrid frequency acoustic recognition model (Gao NPL sections I and IV, the regression DNN-BWE is updated in a later joint training step, thus the first step of training the regression DNN being "so as to" form through joint training of the unified model for mixed-band speech recognition).
While Gao NPL does teach the training of the DNN-BWE, Gao does not explicitly teach that it is supervised training. 
Further, while Gao NPL does teach that one of the "four datasets" contemplated for the system of Gao NPL is a wideband speech upsampled from narrowband speech, Seltzer NPL further teaches obtaining the first-type speech feature of the second speech signal (Seltzer NPL page 922, sections 3, 3.1, 3.2, the HMM training using narrowband speech (second speech signal) that is upsampled through bandwidth extension to generate cepstral feature vectors of the wideband (first-type speech feature)).
Maas NPL teaches supervised parameter training (Maas NPL page 11, section D, training labels are created for DNN acoustic model training in a supervised training process, where the training as set forth in the Experiments subsection is on a 100M parameter DNN (thus parameter training)).
Therefore, taking the teachings of Gao NPL and Seltzer NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the determining of wideband feature vectors from the narrowband signal as disclosed in Seltzer NPL at least because doing so would reduce the amount of time and cost of expensive wideband training data while still maintaining recognition accuracy (Seltzer NPL Abstract).
Further, taking the teachings of Gao NPL and Maas NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the supervised training as disclosed in Maas NPL at least 
Regarding claim 2, Gao NPL teaches wherein the first sampling frequency is a sampling frequency of 16 KHz (Gao NPL section III, wideband speech at 16KHz).
Regarding claim 3, Gao NPL teaches wherein the second sampling frequency is a sampling frequency of 8 KHz (Gao NPL section III, narrowband speech at 8KHz).
Regarding claim 4, while Gao NPL teaches that log-Mel filterbank (LMFB) features are used in characterizing the narrowband and wideband speech, Gao NPL does not explicitly teach an MFCC feature.
Seltzer NPL teaches wherein the first-type speech feature is an MFCC feature (Seltzer NPL section 2, the mel-frequency cepstral coefficients are the features used for speech recognition).
Therefore, taking the teachings of Gao NPL and Seltzer NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the mel-frequency cepstral coefficients as features as disclosed in Seltzer NPL at least because doing so would be applying a feature set that is used for speech recognition, and thus appropriate for the frequencies and anticipated energy levels used in speech audio (Seltzer NPL page 921, section 1 and section 2, narrowband and wideband speech being within a frequency band of 0 to 8 kHz, where the mel filterbank from which the MFCC features are generated represents this range in a series of frequency regions).
Regarding claim 5, Gao NPL teaches wherein the second-type speech feature is a fbank feature (Gao NPL page 588, Introduction section, log-Mel filterbank (fbank) features from narrowband speech).
Regarding claim 13, Gao NPL teaches wherein the hybrid frequency acoustic recognition model is a partially connected deep neural network model; or the hybrid frequency acoustic recognition model is a fully connected deep neural network model (Gao NPL page 591, the concatenated DNN-BWE and DNN-AM having the connections between the layers as shown by the up/down arrows in fig. 3, where the claim recites the alternative “or” and either all the nodes of the DNN are connected in a fully connected DNN or some are not connected in a partially connected DNN).
Regarding claim 15, Gao NPL teaches wherein in the step S5, the supervised parameter training is performed on the preliminary recognition model by using a stochastic gradient descent method according to the first speech training data, the second speech training data and the second-type speech feature, so as to form the hybrid frequency acoustic recognition model (Gao NPL page 589 section II, in the training procedure of the DNN-BWE (preliminary recognition model), the stochastic gradient descent algorithm is used to minimize a loss-function according to equation 1 which includes the estimated (second type speech feature) and reference wideband features (first speech training data) and narrowband features (second speech training data)).
Claims 6, 8, 9, 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gao NPL in view of Seltzer NPL in view of Maas NPL, as set forth above regarding claim 1 from which claim 6 depends, further in view of Gu et al., "Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks," Interspeech 2016, San Francisco, USA, September 12, 2016, pp. 297-301 (herein “Gu NPL”).
Regarding claim 6, Gao NPL teaches the first speech signal and the first speech training data and the first-type speech feature (Gao page 590, section III, four datasets are referenced including Wideband_Ori, which is wideband speech (first speech training data) that is from the original wideband speech (the first speech signal), and where 24-dimensional wideband LMFB features are extracted from wideband speech (first-type speech feature)), but does not explicitly teach the remainder of the limitations of claim 6.
Gu NPL teaches wherein in the step S1, the method for processing the speech signal to obtain the speech training data specifically comprises: step S11, performing training by using the speech feature to form a first acoustic model (Gu NPL page 299, section 4, at training stage of DNN with a BN layer for HMM state classification (first acoustic model), input features are multi-frame MFCCs (speech feature)); and
step S12, performing a forced alignment operation on the first speech signal by using the first acoustic model to form the frame-aligned first speech training data (Gu NPL, page 299 section 4, target outputs are HMM state labels generated by forced alignment using trained GMM-HMM models, forming a BN feature extractor that extracts BN feature vectors for all frames (frame-aligned first speech training data)).
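Therefore, taking the teachings of Gao NPL and Gu NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the forced alignment from an acoustic model as disclosed in Gu NPL at least because doing so would provide additional linguistic information that is beneficial to the reconstruction of a frequency component (Gu NPL section 4).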
Regarding claim 8, Gao NPL teaches the second speech signal and the second speech training data and the second-type speech feature (Gao page 590, section III, four datasets are referenced including Narrowband_Ori, which is narrowband speech (second speech training data) that is from the original narrowband speech (the second speech signal), and where 24-dimensional narrowband LMFB features are extracted from narrowband speech (second-type speech feature)), but does not explicitly teach the remainder of the limitations of claim 8.
Gu NPL teaches wherein in the step S2, the method for processing the speech signal to obtain the speech training data specifically comprises: step S21, performing training by using the speech feature to form a second acoustic model (Gu NPL page 299, section 4, at training stage of DNN with a BN layer for HMM state classification (second acoustic model), input features are multi-frame MFCCs (speech feature)); and
step S22, performing a forced alignment operation on the speech signal by using the second acoustic model to form the frame-aligned second speech training data (Gu NPL, page 299 section 4, target outputs are HMM state labels generated by forced alignment using trained GMM-HMM models, forming a BN feature extractor that extracts BN feature vectors for all frames (frame-aligned second speech training data)).
Although Gu NPL explicitly teaches calculating a model and forced alignment for one type of speech signal, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the teachings of 
Further, taking the teachings of Gao NPL and Gu NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the forced alignment from an acoustic model as disclosed in Gu NPL at least because doing so would provide additional linguistic information that is beneficial to the reconstruction of a frequency component (Gu NPL section 4).
Regarding claim 9, Gao NPL does not explicitly teach the limitations of claim 9.
Seltzer NPL teaches wherein the first acoustic model is a GMM-HMM acoustic model (Seltzer NPL page 922, section 3, in a second stage, an HMM is trained using speech modeled as a GMM, thus the resulting model being a GMM-HMM model).
Therefore, taking the teachings of Gao NPL and Seltzer NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the HMM training as disclosed in Seltzer NPL at least because doing so would produce acoustic models that outperform other typically trained acoustic models (Seltzer NPL page 921, section 1).
Regarding claim 10, Gao NPL does not explicitly teach the limitations of claim 10.
Seltzer NPL teaches wherein the second acoustic model is a GMM-HMM acoustic model (Seltzer NPL page 922, section 3, in a first stage, an HMM is trained using speech modeled as a GMM, thus the resulting model being a GMM-HMM model).
Therefore, taking the teachings of Gao NPL and Seltzer NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition disclosed in Gao NPL with the HMM training as disclosed in Seltzer NPL at least because doing so would produce acoustic models that outperform other typically trained acoustic models (Seltzer NPL page 921, section 1).
Regarding claim 14, Gao NPL teaches the deep neural network model is pre-trained according to the first speech signal and the second speech signal to form the preliminary recognition model of the hybrid frequency acoustic recognition model (Gao page 588, right column and page 590, section II, the first step in the joint modeling strategy is to train a regression DNN-BWE by mapping the features from the narrowband (second speech signal) to wideband speech (first speech signal), where the DNN-BWE model is input/mapped to (thus "of the") a classification DNN to jointly train a DNN acoustic model (the hybrid frequency acoustic recognition model)).
Gao does not explicitly teach by using a restricted Boltzmann machine.
Gu teaches by using a restricted Boltzmann machine (Gu Introduction, restricted Boltzmann machines are used to model the relationship between low-frequency speech parameters and high-frequency ones, where page 298, left column from section 2 discloses that RBMs are used in the pre-training).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Gao NPL in view of Seltzer NPL in view of Maas NPL in view of Gu NPL, as set forth above regarding claim 1 from which claim 6 depends, further in view of Bauer et al., "Automatic recognition of wideband telephone speech with limited amount of matched training data," 2014 22nd European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, 2014, pp. 1232-1236 (herein “Bauer NPL”).
Regarding claim 7, Gao NPL does not explicitly teach the limitations of claim 7. Bauer NPL teaches wherein in the step S2, the first-type speech feature of the second speech signal is obtained by using a triphone decision tree the same as the first acoustic model (Bauer NPL pages 1233-1234, sections 3.2-3.3, two cycles of decision tree training and LDA are used for the creation of a tri-phone HMM inventory in the plain MFCC feature space). 
Therefore, taking the teachings of Gao NPL and Bauer NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the mixed-bandwidth speech recognition Bauer NPL section 1, page 1232, and section 5, page 1235).

Allowable Subject Matter
Claims 11 and 12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: 
Regarding claim 11, the closest prior art of record is Gao NPL. Gao is directed towards using a log-Mel filter bank in obtaining narrowband speech features. Gao further teaches that the first speech signal is downsampled to obtain narrowband features, which presumably represent a low frequency portion only. Neither Gao NPL nor any of the other cited references of record, in a combination that would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, explicitly teaches that in the obtaining of the second-type speech feature in the first speech signal, a high frequency portion and a low frequency portion are combined.
Regarding claim 12, the closest prior art of record is Gao NPL. Gao is directed towards using a log-Mel filter bank in obtaining narrowband speech .


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Garudadri et al., US 7,089,178 B2, directed towards a voice recognition system that obtains acoustic features at multiple frequencies by extracting high frequency components including feature extraction of speech sampled at 8 kHz and 16 kHz. Garudadri does not provide teachings regarding neural networks.
Wu et al., US 2018/0040336 A1, directed towards bandwidth expansion using a supervised regression process. Wu does not provide teachings regarding the training of a hybrid frequency acoustic recognition model.
Gu et al., "Restoring high frequency spectral envelopes using neural networks for speech bandwidth extension," 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 2015, pp. 1-8, doi: 10.1109/IJCNN.2015.7280483. Gu teaches generating wideband and 
Seltzer et al., "Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 235-245, Jan. 2007, doi: 10.1109/TASL.2006.876774. Seltzer is directed towards acoustic modeling with MFCCs using a GMM-HMM with wideband and narrowband data. The acoustic models trained in Seltzer are wideband and not hybrid-frequency.
Wang et al., "A Joint Training Framework for Robust Automatic Speech Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 796-806, April 2016, doi: 10.1109/TASLP.2016.2528171. Wang is directed towards training a DNN for automatic speech recognition. Wang does not specifically teach obtaining and processing features from 8 kHz and 16 kHz speech signals.
Ling et al., "Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2129-2139, Oct. 2013, doi: 10.1109/TASL.2013.2269291. Ling is directed towards speech acoustic modeling using Boltzmann machines. Ling does not provide teachings regarding the training of a hybrid frequency acoustic recognition model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached on M-Th, and every other Friday, 9:30a-7p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656