DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 13, and therefore claim 14 which depends therefrom, are objected to because of the following informalities:  lines 2-3 recite “an adapted acoustic model,” but should recite “the adapted acoustic model,” and line 5 recites “a speech recognition process,” but should recite “the speech recognition process.”  Appropriate correction is required.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 8-9, 13, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim (US 2012/0130716 A1, herein “Kim”).
Regarding claims 1, 18 and 19, Kim teaches a speech recognition [apparatus comprising processing circuitry configured to – claim 1 / method comprising – claim 18 / a non-transitory computer-readable storage medium storing a program for causing a computer to execute processing comprising – claim 19] (Kim figs. 2 and 7, Abstract, para. 95, a robot operating a speech recognition method, the robot including various circuitry, where para. 95 teaches that embodiments are implemented in computing hardware and software, the software implementing embodiments recorded on non-transitory computer-readable media): 
generate/generating, based on sensor information, environmental information relating to an environment in which the sensor information has been acquired (Kim paras. 47, 51 and 61, noise signal (sensor information) is received through the microphone (sensor) as an input, as ambient noise from the environment); 
generate/generating, based on the environmental information and generic speech data, an adapted [acoustic/environmental] model obtained by adapting a base acoustic model to the environment (Kim paras. 52-53, 66-72, via model adaptation, a pre-stored clean acoustic model (base acoustic model) is adapted to the input noisy voice signal from voice signal characteristics extracted from a voice signal received (generic speech data), where adaptation occurs if the noisy environment is detected to be a new noise); 
acquire/acquiring speech uttered in the environment as input speech data (Kim para. 50, voice signal is received through the microphone); and 
subject/subjecting the input speech data to a speech recognition process using the adapted acoustic model (Kim fig. 2, paras. 53 and 55-56, speech recognition is performed on the noisy speech signal using the acoustic model adapted to the noise).
Regarding claim 8, Kim teaches wherein the sensor information contains acoustic data acquired by a speech collection device placed in the environment (Kim paras. 48 and 51, noise signal is received through the robot's input unit 10, which includes a microphone).
Regarding claim 9, Kim teaches wherein the input speech data is acquired by the speech collection device (Kim paras. 48-50, voice signal is received through the robot's input unit 10, which includes a microphone).
Regarding claim 13, Kim teaches wherein when a predetermined criterion for generation of an adapted acoustic model is not satisfied, the processing circuitry is further configured to subject the input speech data to a speech recognition process using the base acoustic model (Kim paras. 65-68 and 89, the acoustic model is only adapted if “new noise” is detected (predetermined criterion); otherwise, a predetermined routine is selected for the speech recognition process, which involves using an existing acoustic model (the base acoustic model) suited for the received noise).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-5, 7 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, as set forth above regarding claim 1 from which claims 2-5, 7 and 11-12 depend, further in view of Tang et al., "Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 9, 2020, pp. 6974-6978, doi: 10.1109/ICASSP40776.2020.9054454 (herein “Tang NPL”).
Regarding claim 2, Kim does not teach the limitations of claim 2. Tang NPL teaches wherein the processing circuitry is further configured to: generate pseudo input speech data that mimics the speech uttered in the environment based on the environmental information and the generic speech data (Tang NPL sections 2 and 4.1.3, speech augmentation (pseudo input speech data) according to common augmentation procedure as disclosed in equation 1 using simulated IRs (impulse responses – environmental information) with clean speech (generic speech data)); and 
generate the adapted acoustic model using the pseudo input speech data (Tang NPL sections 4.2-4.3, automatic speech recognition models using a phone based model (thus acoustic) are trained using the augmented speech training data (pseudo input speech data)).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the augmented speech data disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Regarding claim 3, Kim does not teach the limitations of claim 3. Tang NPL teaches wherein the processing circuitry is further configured to: generate, based on the environmental information, a pseudo impulse response that mimics an impulse response between a generation source of the speech and a speech collection device that collects the speech (Tang NPL sections 3.2, 3.3 and 4.1.2, synthetic IRs (pseudo impulse response) generated from recorded IRs (environmental information), the synthetic IRs to represent the IR including meta-info of loudspeaker location (source of speech) and microphone location (speech collection device)); and 
generate the pseudo input speech data based on the generic speech data and the pseudo impulse response (Tang NPL sections 4.1.3 and 2, speech augmentation according to equation 1 which convolves clean speech (generic speech data) with an IR (pseudo impulse response)).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the augmented speech data disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Regarding claim 4, Kim does not teach the limitations of claim 4. Tang NPL teaches wherein the environmental information contains the impulse response, and the processing circuitry is further configured to generate the pseudo impulse response by performing a predetermined operation on the impulse response (Tang NPL section 3.3, synthetic IRs are generated by matching to real-world IRs via filtering (predetermined operation)).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the impulse response generation disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Regarding claim 5, Kim does not teach the limitations of claim 5. Tang NPL teaches wherein the predetermined operation is at least one of an increase in a waveform of the impulse response in a time direction, a decrease in the waveform in the time direction, and a change in a peak value of the waveform (Tang NPL section 3.3, EQ compensation filtering changing the dB (peak) levels of the IR waveform at various frequencies).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the impulse response generation disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Regarding claim 7, Kim does not teach the limitations of claim 7. Tang NPL teaches wherein the processing circuitry is further configured to generate the pseudo input speech data by performing a convolution operation of the generic speech data with the pseudo impulse response (Tang NPL sections 4.1.3 and 2, augmented speech data generated by convolving simulated IRs (pseudo impulse response) with clean speech data (generic speech data)).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the augmented speech data disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
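For illustration only, the convolution operation mapped to claim 7 (clean speech convolved with a simulated impulse response) can be sketched as follows. This is a minimal sketch and is not reproduced from Tang NPL; the function name `convolve_speech` and the toy sample values are assumptions introduced here, and any further terms of the reference's equation 1 are not modeled.

```python
def convolve_speech(clean_speech, impulse_response):
    """Linearly convolve clean speech samples with an impulse response.

    Hypothetical helper for illustration; not code from any cited reference.
    """
    if not clean_speech or not impulse_response:
        return []
    # Full linear convolution has len(x) + len(h) - 1 output samples.
    out = [0.0] * (len(clean_speech) + len(impulse_response) - 1)
    for i, sample in enumerate(clean_speech):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


# Toy values (assumed): a direct path plus one half-amplitude reflection.
clean = [1.0, 0.0, 0.0, 0.5]
ir = [1.0, 0.5]
pseudo_input = convolve_speech(clean, ir)
```

In this toy case each input sample reappears one sample later at half amplitude, which is the kind of reverberant "pseudo input speech data" the mapping above refers to.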
Regarding claim 11, Kim does not teach the limitations of claim 11. Tang NPL teaches wherein the processing circuitry is further configured to generate the adapted acoustic model by optimizing a parameter of the base acoustic model (Tang NPL section 4.2, automatic speech recognition training (generating the adapted) of time-delay neural networks (acoustic model) performed with a lattice-free maximum (optimizing) mutual information (parameter) criterion).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the training criterion disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Regarding claim 12, Kim does not teach the limitations of claim 12. Tang NPL teaches wherein the generic speech data is speech data contained in training data for the base acoustic model (Tang NPL sections 4.2 and 4.1.3, clean speech from the LibriSpeech corpus is used to generate the augmented speech data for training the neural network).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the training data disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Tang NPL, as set forth above regarding claim 3 from which claim 6 depends, further in view of Maziewski et al., (US 2019/0043514 A1, herein “Maziewski”).
Regarding claim 6, Kim does not teach the limitations of claim 6. Tang NPL teaches wherein the environmental information contains information on a breadth of a space in the environment (Tang NPL sections 3.2 and 4.1.2, BUT ReverbDB of real-world recorded environment IRs including meta-info of environment size in cubic meters (breadth of space in the environment - volume)), and the processing circuitry is further configured to generate the pseudo impulse response based on the information on the breadth of the space and the reverberation time (Tang NPL section 4.1.2, environment size and a sampled reverberation time are used to obtain simulated IRs (pseudo impulse response)).
Maziewski teaches calculate a reverberation time of the space based on the input speech data (Maziewski paras. 90-91, a SRMR value is calculated for input user speech, and a reverberation time is estimated based on the SRMR value).
Therefore, taking the teachings of Kim and Tang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the impulse response generation disclosed in Tang NPL at least because doing so would reduce word error rate and improve performance in an automatic speech recognition system (see Tang NPL Abstract).
Further taking the teachings of Kim and Maziewski together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the reverberation time calculation disclosed in Maziewski at least because doing so would provide better adaptation to an acoustical environment for better user experience in voice controlled devices (see Maziewski para. 15).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kim, as set forth above regarding claim 1 from which claim 10 depends, further in view of Kim et al., "Room Layout Estimation with Object and Material Attributes Information using a Spherical Camera," Centre for Vision Speech and Signal Processing, University of Surrey, FGA, University of Brasilia, 3DV, 2016, pp. 4321-4329 (cited by Applicant on the IDS filed 2/26/2021, herein “Kim3 NPL”).
Regarding claim 10, Kim does not teach the limitations of claim 10. Kim3 NPL teaches wherein the sensor information includes at least one of image data and point cloud data acquired by an imaging device placed in the environment (Kim3 NPL section 3, a stereo pair of images (sensor information) is obtained from a Spheron camera (sensor) placed in a room (environment) to determine information about the room).
Therefore, taking the teachings of Kim and Kim3 NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the stereo images captured by a camera disclosed in Kim3 NPL at least because doing so would provide complete scene coverage for accurate room layout and characteristic detections (see Kim3 NPL sections 2.1 and 1).
Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, as set forth above regarding claim 1 from which claim 15 depends, further in view of Fujimura et al., (US 2008/0201136 A1, herein “Fujimura”).
Regarding claim 15, Kim does not teach the limitations of claim 15. Fujimura teaches wherein the processing circuitry is further configured to: generate a first speech recognition result by subjecting the input speech data to a speech recognition process using the base acoustic model (Fujimura paras. 29 and 20-22, a first speech recognition is performed on input speech using a first acoustic model that is invariable for all speakers and environments (base acoustic model)); 
calculate a first reliability of the first speech recognition result based on the first speech recognition result (Fujimura para. 29, likelihoods (reliability) of candidate words for the input speech are calculated); 
generate a second speech recognition result by subjecting the input speech data to a speech recognition process using the adapted acoustic model (Fujimura paras. 20-21, 32 and 34, a second speech recognition is performed on the input speech using a second acoustic model that varies according to speaker or environment and is learned from speech (thus adapted)); 
calculate a second reliability of the second speech recognition result based on the second speech recognition result (Fujimura para. 34, likelihoods (reliability) of candidate words for the input speech are calculated by the second speech recognition process); and 
output the first speech recognition result or the second speech recognition result based on the first reliability or the second reliability (Fujimura paras. 29, 34, each of the likelihoods for top candidates for the input speech are stored, including the top candidates from the first and second speech recognition, and the second speech recognizing unit selects the candidate word with the highest likelihood (based on first or second reliability – whichever is higher) and outputs it as the final speech recognition result).
Therefore, taking the teachings of Kim and Fujimura together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the selection of the highest likelihood speech recognition result from two different acoustic models as disclosed in Fujimura at least because doing so would reduce computation requirements in speech recognition (see Fujimura para. 41).
Regarding claim 16, Kim does not teach the limitations of claim 16. Fujimura teaches wherein the processing circuitry is further configured to output a speech recognition result corresponding to either the first reliability or the second reliability, whichever is higher (Fujimura paras. 29, 34, each of the likelihoods for top candidates for the input speech are stored, including the top candidates from the first and second speech recognition, and the second speech recognizing unit selects the candidate word with the highest likelihood (based on first or second reliability – whichever is higher) and outputs it as the final speech recognition result).
Therefore, taking the teachings of Kim and Fujimura together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the selection of the highest likelihood speech recognition result from two different acoustic models as disclosed in Fujimura at least because doing so would reduce computation requirements in speech recognition (see Fujimura para. 41).
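For illustration only, the selection step mapped to claims 15 and 16 (outputting whichever of two recognition results has the higher reliability) can be sketched as follows. This is a minimal sketch, not code from Fujimura; the `RecognitionResult` type, the function name `select_result`, the example values, and the tie-breaking rule (ties favor the first result) are assumptions introduced here.

```python
from typing import NamedTuple


class RecognitionResult(NamedTuple):
    """Hypothetical container: recognized text plus a reliability score."""
    text: str
    reliability: float


def select_result(first: RecognitionResult,
                  second: RecognitionResult) -> RecognitionResult:
    # Output the result with the higher reliability.
    # Assumption: on a tie, the first (base-model) result is kept.
    return first if first.reliability >= second.reliability else second


# Assumed example values for a base-model and an adapted-model pass.
base_pass = RecognitionResult(text="turn on the light", reliability=0.62)
adapted_pass = RecognitionResult(text="turn off the light", reliability=0.81)
final = select_result(base_pass, adapted_pass)
```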
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Fujimura, as set forth above regarding claim 15 from which claim 17 depends, further in view of Kim et al., (US 2016/0260426 A1, herein “Kim2”).
Regarding claim 17, Kim does not explicitly teach the limitations of claim 17. Kim2 teaches wherein the processing circuitry is further configured to output, when a ratio between the first reliability and the second reliability satisfies a predetermined ratio, a speech recognition result corresponding to the higher reliability (Kim2 paras. 47-, input signal is converted to likelihoods of values output from at least one (thus including two) different acoustic models in neural networks/GMM, where the likelihoods are divided into speech versus non-speech likelihoods with a first maximum likelihood (first reliability) and a second maximum likelihood (second reliability) determined, and the likelihood ratio (LR) of the first maximum likelihood to the second maximum likelihood is calculated, and if LR is higher than a set threshold (predetermined ratio) then the speech is detected (speech recognition result)).
Therefore, taking the teachings of Kim and Kim2 together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the adaptation of an acoustic model disclosed in Kim with the speech recognition result from a ratio of likelihoods as disclosed in Kim2 at least because doing so would provide a higher degree of accuracy in performing speech detection for speech recognition (see Kim2 para. 26).
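For illustration only, the ratio test mapped to claim 17 (comparing a ratio of two likelihoods against a set threshold) can be sketched as follows. This is a minimal sketch, not code from Kim2; the function name `ratio_exceeds_threshold`, the example values, and the handling of a non-positive denominator are assumptions introduced here.

```python
def ratio_exceeds_threshold(first_likelihood: float,
                            second_likelihood: float,
                            threshold: float) -> bool:
    """Return True when first/second likelihood ratio is above the threshold.

    Hypothetical helper for illustration; not code from any cited reference.
    """
    if second_likelihood <= 0.0:
        # Assumption: a ratio is undefined for a non-positive denominator.
        raise ValueError("second_likelihood must be positive")
    likelihood_ratio = first_likelihood / second_likelihood
    return likelihood_ratio > threshold


# Assumed example values: the ratio is 0.75 / 0.25 = 3.0.
detected = ratio_exceeds_threshold(0.75, 0.25, threshold=2.0)
```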

Allowable Subject Matter
Claim 14 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The closest cited art of record, Kim, while providing teachings of using a base acoustic model rather than an adapted acoustic model according to a criterion of whether new noise is detected, does not teach or suggest the predetermined criterion to be a period of time required to adapt the base acoustic model to the environment. None of the other cited art of record teaches this limitation either, and so claim 14 is allowable over the cited art of record.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Weber et al., US 9,401,140 B1, directed towards a method of updating an acoustic model for speech recognition based on noise in a listening zone.
Komeji et al., US 2013/0231929 A1, directed towards a speech recognition device that adapts a clean acoustic model according to a detected amount of noise.
Asano, US 2004/0054531 A1, directed towards a speech recognition apparatus which selects an acoustic model that most closely represents acoustic conditions under which current speech is being processed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656