DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/19/2022 has been entered.

Response to Arguments
Applicant’s arguments, filed 08/15/2022, with respect to claim(s) 1-12 have been considered but are moot because of the new ground of rejection in view of Gomez and Kinoshita for claims 1, 8 and 9-12; Gomez, Kinoshita and Moon for claims 2, 3, 5-7, 13 and 14; and Gomez, Kinoshita, Moon and Siohan for claim 4.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Gomez (US PG Pub 20160203828) in view of Kinoshita (US PG Pub 20210400383).

As per claims 1, 8 and 9, Gomez discloses a generating device, method and non-transitory computer readable storage medium comprising:
a processor programmed (Gomez; p. 0284 - Each of the functional blocks of each of the speech processing device 10 may be individually realized in the form of a processor, or part or all of the functional blocks may be integrated in the form of a processor) to:
obtain training data including an acoustic feature value of a first observation signal (Gomez; p. 0115 - Similarly to the feature quantity calculation unit 104 (FIG. 1), the feature quantity calculation unit 142 calculates an acoustic feature quantity [f(x,y)] for the dereverberated speech signal relating to the sound source position (x,y) input from the reverberation processing unit 141. The feature quantity calculation unit 142 outputs the calculated acoustic feature quantity [f(x,y)] to the adjustment factor calculation unit 143 and the feature quantity adjustment unit 144; also see p. 0139), a late reverberation component corresponding to the first observation signal (Gomez; p. 0114 - The reverberation processing unit 141 is provided with a storage unit which stores an impulse response from each sound source position to the sound pickup unit 12 installed at a predetermined position in advance. The reverberation processing unit 141 performs a convolution operation of the speech signal of clean speech read from the clean speech data storage unit 13 and the impulse response of each sound source position and generates a reverberant speech signal indicating reverberant speech relating to the sound source position (x,y). Similarly to the reverberation suppression unit 103, the reverberation processing unit 141 suppresses a reverberation component for the generated reverberant speech signal to generate a dereverberated speech signal. The reverberation processing unit 141 outputs the generated dereverberated speech signal to the feature quantity calculation unit 142; also see p. 0138; also see p. 0099-0107 - explaining how the reverberation suppression unit calculates a reverberation component factor), and a phoneme label associated with the first observation signal (Gomez; p. 0116 - the adjustment factor calculation unit 143 calculates a likelihood for each of possible utterance state strings q^(s) in the clean speech acoustic model λ^(s) for a set w of known phoneme strings indicating utterance of clean speech used for calculating the acoustic feature quantity [f(x,y)] and the input acoustic feature quantity [f(x,y)] and selects an utterance state string q'^(s) having the maximum calculated likelihood; also see p. 0103-0105); and
generate an acoustic model to identify a phoneme label corresponding to an observation signal based on the obtained acoustic feature value, the obtained late reverberation component, and the obtained phoneme label (Gomez; p. 0121 - The model generation unit 145 generates a position-dependent acoustic model Ψ^(n) relating to the sound source position (x,y) using the clean speech acoustic model λ^(s) read from the model storage unit 109 and the adjusted feature quantity [f'] of each sound source position (x,y) input from the feature quantity adjustment unit 144. n is an index indicating the sound source position (x,y). In generating the position-dependent acoustic model Ψ^(n), the model generation unit 145 calculates a likelihood for each given adjusted feature quantity [f'] and updates the model parameters of the position-dependent acoustic model Ψ^(n) such that the likelihood increases (is maximized).).
Gomez, however, fails to explicitly disclose generating an acoustic model to identify a phoneme label corresponding to a second observation signal, such that the acoustic feature value of the first observation signal and the late reverberation component are used as input data to generate the acoustic model, wherein the late reverberation component is a portion of the first observation signal received after a predetermined time has passed from receipt of a direct sound component of the first observation signal.
Kinoshita does teach generating an acoustic model to identify a phoneme label corresponding to a second observation signal, such that the acoustic feature value of the first observation signal and the late reverberation component are used as input data to generate the acoustic model (Kinoshita; p. 0061-0067 – learning operation for learning neural network (generating an acoustic model)… the power estimating unit 12 receives input of the observation feature quantity calculated by the observation feature quantity calculating unit 11 with respect to the observation signal for learning (i.e., the sound including reverberation) in the learning data. The sound including reverberation implies, for example, the sound including the clean sound and reverberation… The learning data is provided in advance as the correct signal in the learning data), wherein the late reverberation component is a portion of the first observation signal received after a predetermined time has passed from receipt of a direct sound component of the first observation signal (Kinoshita; p. 0047 - wherein the late reverberation component is a portion of the first observation signal received after a predetermined time has passed from receipt of a direct sound component of the first observation signal).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez to include generating an acoustic model to identify a phoneme label corresponding to a second observation signal, such that the acoustic feature value of the first observation signal and the late reverberation component are used as input data to generate the acoustic model, wherein the late reverberation component is a portion of the first observation signal received after a predetermined time has passed from receipt of a direct sound component of the first observation signal, as taught by Kinoshita, in order to enable accurate reverberation removal even when the observation signals are short (Kinoshita; p. 0010).
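For illustration only, and not as a characterization of the implementation in Gomez or Kinoshita, the limitation that the late reverberation component is the portion arriving after a predetermined time from the direct sound may be sketched as follows. The sketch assumes the direct sound is the largest-magnitude tap of a room impulse response and assumes a 50 ms boundary; the function names, the boundary value, and the use of an impulse response are hypothetical choices made for the example.

```python
def split_impulse_response(rir, sample_rate, boundary_ms=50.0):
    """Split an impulse response into early and late parts.

    Assumption: the direct sound is the largest-magnitude tap, and the
    late part begins `boundary_ms` after it (hypothetical default).
    """
    if not rir:
        raise ValueError("impulse response is empty")
    # Index of the direct sound (first tap with the largest magnitude).
    direct_idx = max(range(len(rir)), key=lambda i: abs(rir[i]))
    # First sample index that counts as late reverberation.
    boundary = direct_idx + int(round(boundary_ms * sample_rate / 1000.0))
    early = [h if i < boundary else 0.0 for i, h in enumerate(rir)]
    late = [h if i >= boundary else 0.0 for i, h in enumerate(rir)]
    return early, late


def convolve(signal, kernel):
    """Plain full linear convolution of two sequences."""
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def late_reverberation(clean, rir, sample_rate, boundary_ms=50.0):
    """Late reverberation component: clean speech convolved with the late taps."""
    _, late = split_impulse_response(rir, sample_rate, boundary_ms)
    return convolve(clean, late)
```

Because the early and late parts sum to the original impulse response, the observation signal in this sketch is the sum of an early component and the late reverberation component.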

As per claims 10-12, Gomez in view of Kinoshita discloses the generating device, method and non-transitory computer-readable medium according to claims 1, 8 and 9, wherein the phoneme label is transcribed from the first observation signal (Gomez; p. 0081 - The speech recognition unit 111 calculates a likelihood of each candidate of a sentence represented by phoneme strings on the basis of a language model stored in the model storage unit 109 among possible phoneme strings, and outputs recognition data representing a sentence with the highest likelihood to the outside of the speech processing device 10).

Claims 2, 3, 5-7, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gomez in view of Kinoshita and further in view of Moon (US PG Pub 20160118039).
As per claims 2, 13 and 14, Gomez in view of Kinoshita discloses the generating device, method and non-transitory computer-readable medium according to claims 1, 8 and 9, upon which claims 2, 13 and 14 depend.
Gomez in view of Kinoshita, however, fails to disclose the obtained first observation signal has a signal-to-noise ratio that is lower than a first threshold.
Moon does teach the obtained first observation signal has a signal-to-noise ratio that is lower than a first threshold (Moon; p. 0035 - the electronic device 100 may determine an SNR of each of the sound samples S1, S2, S3, S4, and S5 as an acoustic feature. If an SNR of a sound sample is determined to be less than a threshold SNR, it may indicate that the sound sample has too much noise. Thus, the electronic device 100 may determine that the sound sample may not be used in generating a sound detection model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include the obtained first observation signal has a signal-to-noise ratio that is lower than a first threshold, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).
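For illustration only, the comparison of a signal-to-noise ratio against a threshold described in Moon at p. 0035 may be sketched as follows. This is not code from Moon; the function names, the decibel convention, and the assumption that a separate noise-only segment is available for each sample are hypothetical.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from mean power of each sequence.

    Assumption: `noise` is a noise-only segment for the same recording.
    """
    if not signal or not noise:
        raise ValueError("signal and noise must be non-empty")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        return math.inf
    if p_signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(p_signal / p_noise)


def partition_by_snr(samples, threshold_db):
    """Split (name, signal, noise) samples by whether SNR is below the threshold."""
    below, at_or_above = [], []
    for name, signal, noise in samples:
        if snr_db(signal, noise) < threshold_db:
            below.append(name)
        else:
            at_or_above.append(name)
    return below, at_or_above
```

In this sketch a sample labeled as falling below the threshold corresponds to a sound sample that Moon describes as having too much noise.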

As per claim 3, Gomez in view of Kinoshita teaches the generating device according to claim 1, upon which claim 3 depends.
Gomez in view of Kinoshita, however, fails to disclose wherein the obtained late reverberation component is larger than a second threshold.
Moon does teach wherein the obtained late reverberation component is larger than a second threshold (Moon; p. 0033 - the term "similar acoustic features" or equivalent variations thereof may mean that the acoustic features are the same or substantially the same within a specified tolerance or threshold value or percentage in feature values or parameters such as spectral features, time domain features, statistical measures, subwords, or the like; p. 0025 - the acoustic features may include a sound intensity level, a signal-to-noise ratio (SNR), or a reverberation time (RT), which may be indicative of sound quality).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include wherein the obtained late reverberation component is larger than a second threshold, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).

As per claim 5, Gomez in view of Kinoshita discloses the generating device according to claim 1, upon which claim 5 depends.
Gomez in view of Kinoshita, however, fails to disclose wherein the obtained late reverberation component is smaller than a second threshold.
Moon does teach wherein the obtained late reverberation component is smaller than a second threshold (Moon; p. 0033 - the term "similar acoustic features" or equivalent variations thereof may mean that the acoustic features are the same or substantially the same within a specified tolerance or threshold value or percentage in feature values or parameters such as spectral features, time domain features, statistical measures, subwords, or the like; p. 0025 - the acoustic features may include a sound intensity level, a signal-to-noise ratio (SNR), or a reverberation time (RT), which may be indicative of sound quality).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include wherein the obtained late reverberation component is smaller than a second threshold, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).

As per claim 6, Gomez in view of Kinoshita discloses the generating device according to claim 4, upon which claim 6 depends.
And further, Moon teaches wherein the second processor generates an observation signal having a late reverberation component smaller than a third threshold by removing the late reverberation component from the first observation signal (Moon; p. 0035 - the electronic device 100 may determine an SNR of each of the sound samples S1, S2, S3, S4, and S5 as an acoustic feature. If an SNR of a sound sample is determined to be less than a threshold SNR, it may indicate that the sound sample has too much noise. Thus, the electronic device 100 may determine that the sound sample may not be used in generating a sound detection model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include wherein the second processor generates an observation signal having a late reverberation component smaller than a third threshold by removing the late reverberation component from the first observation signal, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).

As per claim 7, Gomez in view of Kinoshita discloses the generating device according to claim 1, upon which claim 7 depends.
Gomez in view of Kinoshita, however, fails to disclose the first observation signal has a signal-to-noise ratio which is higher than a fourth threshold.
Moon does teach the first observation signal has a signal-to-noise ratio which is higher than a fourth threshold (Moon; p. 0035 - the electronic device 100 may determine an SNR of each of the sound samples S1, S2, S3, S4, and S5 as an acoustic feature. If an SNR of a sound sample is determined to be less than a threshold SNR, it may indicate that the sound sample has too much noise. Thus, the electronic device 100 may determine that the sound sample may not be used in generating a sound detection model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include the first observation signal has a signal-to-noise ratio which is higher than a fourth threshold, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gomez in view of Kinoshita and Moon and further in view of Siohan et al (US Patent 9299347; hereinafter “Siohan”).
As per claim 4, Gomez in view of Kinoshita discloses the generating device according to claim 1, upon which claim 4 depends.
Gomez in view of Kinoshita, however, fails to disclose a signal-to-noise ratio of which is lower than a first threshold.
Moon does teach a signal-to-noise ratio of which is lower than a first threshold (Moon; p. 0035 - the electronic device 100 may determine an SNR of each of the sound samples S1, S2, S3, S4, and S5 as an acoustic feature. If an SNR of a sound sample is determined to be less than a threshold SNR, it may indicate that the sound sample has too much noise. Thus, the electronic device 100 may determine that the sound sample may not be used in generating a sound detection model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita to include a signal-to-noise ratio of which is lower than a first threshold, as taught by Moon, because some utterances may be received in a noisy sound environment and thus may not provide sufficient quality for generating a sound model. Thus, the sound model generated or trained from such utterances may not produce adequate detection performance (Moon; p. 0005).
Furthermore, Gomez in view of Kinoshita and Moon fails to disclose the processor is further programmed to generate an observation signal having a reverberation component larger than a second threshold by adding reverberation to the first observation signal.
Siohan does teach a second generating unit that generates an observation signal having a reverberation component larger than a second threshold by adding reverberation to the first observation signal (Siohan; Col. 3, lines 20-23 - the corrupted versions of the uncorrupted audio segments are each a version of an uncorrupted audio segment that has been modified to add noise, reverberation, echo, or distortion; Col. 5, lines 57-67 - the corrupted versions of the uncorrupted audio segments are each a version of an uncorrupted audio segment that has been modified to add noise, reverberation, echo, or distortion; Col. 7, lines 24-29 - the clean audio 110 may be a large corpus of unsupervised recordings that includes common words and phrases, where the recordings have signal-to-noise ratio that is above a minimum threshold).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Gomez in view of Kinoshita and Moon to include a second generating unit that generates an observation signal having a reverberation component larger than a second threshold by adding reverberation to the first observation signal, as taught by Siohan, because, in order to deal with noise, a speech recognition system may use associative mappings between noisy audio and clean audio to identify less-noisy, or "clean," audio data corresponding to the same sounds that the user spoke. The identified clean audio data can be substituted for the noisy audio data in the speech recognition process to effectively filter out noise from the input audio data (Siohan; Col. 1, lines 25-34).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
Watanabe et al. (US Patent 8645130) discloses a processing unit that executes speech recognition on speech signals captured by a microphone for capturing sounds uttered in an environment. The processing unit has: an initial reflection component extraction portion that extracts initial reflection components by removing diffuse reverberation components from a reverberation pattern of an impulse response generated in the environment; and an acoustic model learning portion that learns an acoustic model for the speech recognition by reflecting the initial reflection components to speech data for learning (Watanabe et al.; Abstract).
Nakadai (US PG Pub 20180286423) discloses an audio processing device that includes a sound source localization unit that determines respective directions of sound sources from audio signals of a plurality of channels, a setting information selection unit that selects a setting information from a setting information storage unit that stores setting information including transfer functions of directions in advance for each acoustic environment, and a sound source separation unit that separates the audio signals of the plurality of channels into respective sound-source-specific signals of sound sources by applying a separation matrix based on transfer functions included in the setting information selected by the setting information selection unit (Nakadai; Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139.  The examiner can normally be reached on Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RODRIGO A CHAVEZ/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658