Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This Office action is in response to the correspondence filed 02/23/21 regarding application 16/510,565, in which claim 17 was amended. Claims 1-17 are pending in the application and have been considered.


Response to Arguments
The amended title of the invention overcomes the objection for not being descriptive, and so the objection is withdrawn.
Amended claim 17 overcomes the 35 U.S.C. 101 rejection, and so the rejection is withdrawn.
The arguments on pages 8-14 regarding the 35 U.S.C. 103 rejections based on Kim, Burnett, and Togneri have been considered but are not persuasive. 

First, on pages 8-9, Applicant argues that Kim does not disclose “separating an input audio signal into at least two separated signals”, allegedly because in Kim, “multiple signals are obtained by respective multiple microphones, not separated from an input signal”.
In response, Kim teaches “each microphone of the multiple microphones 102 may be configured to generate a respective audio signal of the multiple audio signals 112 based on sound 108 of the far-field acoustic environment 110 as detected at the microphone. To illustrate, the first microphone 104 may detect at least a portion of the sound 108 and may generate a first audio signal of the audio signals 112 based on the portion of the sound 108 detected at the first physically separated”, see [0094]. In other words, Kim teaches separating sound 108 (an input audio signal) into separate channels (at least two separated signals). While Applicant is correct that Kim obtains multiple signals using multiple microphones, they are separate signals obtained from a single input signal (sound 108), and there is no reason why the particular language of the claims, which merely recites “an input signal”, cannot be fairly interpreted as the acoustic input signal in Kim.
Next, on pages 9-10, Applicant argues that Kim does not “generate a denoised signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current frame”, allegedly because Kim forms a raked output based on multiple directional audio signals from a beamformer by combining two or more directional audio signals.
In response, it is unclear why Kim's forming of a raked output, based on multiple directional audio signals from a beamformer by combining two or more directional audio signals, necessarily means that Kim does not disclose generating a denoised signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current frame. Further, in response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Kim discloses generating a signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current frame (as a result of the combination process, the raked output 222 may have a signal to noise ratio that is greater than the SNR of any individual one of the combined directional audio streams, [0113], the second directional 
Burnett discloses generating a denoised signal (denoising acoustic signals, [0020], Fig 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal in order to improve intelligibility and quality of the speech of the user, as suggested by Burnett ([0025]). It is also worth noting that while the noise reduced signals generated by Kim are not considered completely “denoised”, they are noise reduced by the beamforming and especially the nullforming operations of Kim, which reduce interfering signals from stationary sources such as a television (Kim, [0065]). With regard to the raking operations of Kim, Fig 10 shows a user uttering a wake word from zone 2; the sound reflects off a wall, and the reflected sound is received from zone 5. The raking in Kim combines these signals to improve SNR. Thus, Kim is fairly considered to disclose, at the very least, generating a signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current frame.
Next, on page 10, Applicant argues that Kim does not disclose “performing a recognition decision according to the recognition score of each of the plurality of interesting signals at the current frame”, allegedly because “…It is unclear which element described by Kim allegedly corresponds to recitation of the “recognition score” in claim 1. Further, the statement “the greater SNR of the raked output 222 can improve reliability of results of the voice recognition system 124 ([0113])” can infer the effect of the raked output, not determine a recognition decision.”
In response, the examiner respectfully disagrees that it is unclear which element described by Kim corresponds to recitation of the “recognition score” in claim 1. The Office action of 11/24/20, at pages 4-5, explicitly states “each of the plurality of interesting signals at the current frame having a recognition score at the current frame associated with a result of the preliminary recognition at the current frame (in Fig 10, the confidence metric 324 is greater than the confidence threshold, and the comparison 350 indicates that the fifth confidence metric 330 is greater than the confidence threshold. The other comparisons in Fig 10 indicate that no keyword is detected in the other directional audio signals, [0104])”. As those familiar with speech recognition would have immediately recognized, speech recognition systems are never entirely certain which words were actually spoken, and frequently assign results a confidence score, which is a degree of certainty in the output. For example, as Kim teaches, “Each confidence metric 322-332 indicates a likelihood that the corresponding directional audio signals 302-312 encodes sound corresponding to a keyword. For example, a first confidence metric 322 indicates a likelihood that the corresponding directional audio signals 302-312 encodes sound corresponding to a keyword”. The examiner maintains that the confidence metric taught by Kim is fairly considered a “recognition score”. As for “recognition decision”, the comparison to a confidence threshold in Kim is fairly considered a “recognition decision” for at least the reasons that are apparent from the above.
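For illustration only, the kind of threshold comparison discussed above can be sketched as follows. This is not Kim's implementation: the function name, the zone labels, the sample confidence values, and the threshold of 0.5 are all assumptions chosen for the example. The sketch simply treats each per-signal confidence value as a score and reports which signals exceed a threshold.

```python
from typing import Dict, List


def recognition_decision(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the signals whose score is greater than the threshold, highest score first.

    Hypothetical helper: 'scores' maps a signal label to a confidence value.
    """
    hits = [name for name, score in scores.items() if score > threshold]
    return sorted(hits, key=lambda name: scores[name], reverse=True)


# Made-up confidence values for six directional signals (assumed, not from Kim).
example_scores = {
    "zone1": 0.10,
    "zone2": 0.90,
    "zone3": 0.20,
    "zone4": 0.05,
    "zone5": 0.70,
    "zone6": 0.15,
}
detected = recognition_decision(example_scores, threshold=0.5)
print(detected)
```

With these assumed values, only the signals labeled zone2 and zone5 pass the comparison, which mirrors the situation described for Fig 10.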
Next, on pages 10-11, Applicant argues that the technical principle described by Kim is different from that of claim 1, allegedly because Kim is optimized for detecting sound in a narrower angular range. To the extent it can be understood, this argument appears unpersuasive for reasons similar to those given above in response to the argument that Kim does not disclose “separating an input audio signal into at least two separated signals”. It is further noted that the features upon which Applicant relies (i.e., “the relationship between the separated signals”) are not recited in the rejected claims.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Next, on pages 11-12, Applicant argues that if Kim were modified by generating a denoised signal, the combined invention would generate a denoised signal by combining signals from the first microphone and the second microphone, just as with the raked output 222 in Kim. In response, just as there 
The remaining arguments on pages 12-14 regarding claims 2-17 as well as Togneri are similar to those addressed above, and are not persuasive for similar reasons.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-9 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (2018/0033428) in view of Burnett (2009/0010451).

Consider claim 1, Kim discloses a method for recognizing speech, comprising: separating an input audio signal into at least two separated signals (corresponding to zones 1 and 2, [0095-0096]); generating a signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current 
Kim does not specifically mention generating a denoised signal. 
Burnett discloses generating a denoised signal (denoising acoustic signals, [0020], Fig 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal in order to improve intelligibility and quality of the speech of the user, as suggested by Burnett ([0025]).


Consider claim 15, Kim discloses an apparatus for recognizing speech, comprising: a splitter configured to separate an input audio signal into at least two separated signals (corresponding to zones 1 and 2, [0095-0096]); a signal generator configured to generate a signal at a current frame based on a primary separated signal and one or more secondary separated signals selected from the at least two separated signals of the audio signal at the current frame (as a result of the combination process, the raked output 222 may have a signal to noise ratio that is greater than the SNR of any individual one of the combined directional audio streams, [0113], the second directional audio signal 304 includes first target speech, [0104], the fifth directional audio signal 310 includes second target speech, [0104]); a recognizer configured to perform a preliminary recognition on each of a plurality of interesting signals at the current frame (the keyword detection system 122 determines that utterances in zones 2 and 5 included the keyword, [0104]), the plurality of interesting signals at the current frame including the at least two separated signals and the signal at the current frame (the signal processing system may continue to output the directional audio signals 302-312 to the keyword detection system 122, and may concurrently or simultaneously provide the second directional audio signal 304 and the fifth directional audio signal 310 or the raked output to the voice recognition system 124, [0106]), and each of the plurality of interesting signals at the current frame having a recognition score at the current frame associated with the result of preliminary recognition at the current frame (in Fig 10, the comparison 344 indicates that the second confidence metric 324 is greater than the confidence threshold, and the comparison 350 indicates that the fifth confidence metric 330 is greater than the confidence threshold. The other comparisons in Fig 10 indicate that no keyword is detected in the other directional audio signals, [0104]); and a decision device configured to perform a recognition decision according to the recognition score at the current frame of each of a plurality of interesting signals at the current frame 
Kim does not specifically mention generating a denoised signal. 
Burnett discloses generating a denoised signal (denoising acoustic signals, [0020], Fig 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal for reasons similar to those for claim 1.


Consider claim 16, Kim and Burnett disclose an apparatus for recognizing speech, comprising: one or more processors (Kim, [0013]) configured to perform at least the method of claim 1 when starting (see claim 1). 

Consider claim 17, Kim and Burnett disclose a computer readable non-volatile storage medium having program instructions stored thereon (Kim, non-transitory computer readable medium, [0138]) which performs the method of claim 1 when executed.

Consider claim 2, Kim discloses the primary separated signal selected at the current frame has a recognition score at a preceding frame of the audio signal larger than or equal to the recognition score of any other separated signal in the at least two separated signals at the preceding frame (determining which frame of the subset 128 of the processed audio signals 120 corresponds to a beginning or ending of the keyword, [0042], confidence metrics are compared and the highest are chosen as the output, [0106]). 



Consider claim 4, Kim discloses the preliminary recognition further comprises: determining a confidence level of the decoding result of each of the plurality of interesting signals at the current frame matching a predetermined sentence including one or more predetermined words, the recognition score of each of the plurality of interesting signals at the current frame depending on the confidence level determined at the current frame (comparative confidence threshold for keyword detection, [0080], determining which frame of the subset 128 of the processed audio signals 120 corresponds to a beginning or ending of the keyword, [0042], confidence metric comparison and raking the results, [0104-0106]). 

Consider claim 5, Kim discloses the preliminary recognition further comprises: performing a natural language processing on the decoding result of each of the plurality of interesting signals at the current frame, the recognition score of each of the plurality of interesting signals at the current frame depending on a score of the natural language processing at the current frame (performing keyword detection, [0104-0106], Fig 1 elements 120 and 122, [0028]). 


Consider claim 7, Kim discloses the predetermined condition comprises one or more of following conditions: the recognition score of the first interesting signal at the current frame is larger than or equal to the recognition score of any other of the plurality of interesting signals at the current frame; predetermined word hit times corresponding to the preliminary recognition result of the first interesting signal at the current frame are larger than or equal to that corresponding to the preliminary recognition result of any other of the plurality of interesting signals at the current frame; and the recognition score of the first interesting signal at the current frame is larger than or equal to a first threshold (comparative confidence threshold for keyword detection, [0080], determining which frame of the subset 128 of the processed audio signals 120 corresponds to a beginning or ending of the keyword, [0042], confidence metric comparison and raking the results, [0104-0106]). 
Consider claim 8, Kim discloses the current recognition decision comprises: determining the recognition result for the audio signal at the current frame as an instruction to perform a wake-up operation according to a predetermined condition (user speaks a keyword to wake up a voice recognition system, [0005], [0129]). 
Consider claim 9, Kim discloses the predetermined condition comprises one or more of following conditions: the plurality of interesting signals at the current frame include a first interesting signal which has a recognition score at the current frame larger than or equal to a first threshold; the recognition 
Kim does not specifically mention generating a denoised signal. 
Burnett discloses generating a denoised signal (denoising acoustic signals, [0020], Fig 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal for reasons similar to those for claim 1.



Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (2018/0033428) in view of Burnett (2009/0010451), in further view of Beltman et al. (2012/0004909).
Consider claim 10, Kim and Burnett do not, but Beltman discloses determining a feature of a speaker at least according to the recognition result for audio signal at the current frame; and training a speaker model by the speaker feature (user training of particular characteristics as features, [0012]).


Consider claim 11, Kim discloses recording previous interesting signals at the time of at least one previous frame before the current frame, and the preliminary recognition result of each previous interesting signal at a corresponding previous frame is a recognition result for the audio signal at the corresponding previous frame; and determining the speaker feature according to the preliminary recognition result of each previous interesting signal at the corresponding previous frame (a speaker tracking process, [0076], comparative confidence threshold for keyword detection, [0080], determining which frame of the subset 128 of the processed audio signals 120 corresponds to a beginning or ending of the keyword, [0042], confidence metric comparison and raking the results, [0104-0106]). 

Consider claim 12, Kim discloses a probability of the primary separated signal selected at the current frame being associated with the speaker at the previous frame as determined by the speaker model is larger than or equal to a probability of any other of the at least two separated signals being associated with the speaker at the preceding frame as determined by the speaker model, and is larger than or equal to a first threshold (tracking multiple speakers by confidence threshold comparison, [0076], [0104]). 


Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (2018/0033428) in view of Burnett (2009/0010451), in further view of Togneri et al. (“An Overview of Speaker Identification: Accuracy and Robustness Issues”. IEEE Circuits and Systems Magazine, 2011).

Consider claim 13, Kim does not, but Burnett discloses generating a denoised signal and denoised factor (denoising acoustic signals, [0020], [0093], Fig 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal for reasons similar to those for claim 1.
Kim and Burnett do not specifically mention generating a denoised signal at the current frame comprises: analyzing each of the at least two separated signals to obtain its frequency spectrum and power spectrum; determining a denoised factor according to the power spectrum of the primary separated signal and the power spectrum of the one or more secondary separated signals; and obtaining the denoised signal at the current frame according to the denoised factor and the frequency spectrum of the primary separated signal. 
Togneri discloses generating a signal at the current frame comprises: analyzing each of the at least two separated signals to obtain its frequency spectrum and power spectrum; determining a noise reduction factor according to the power spectrum of the primary separated signal and the power spectrum of the one or more secondary separated signals; and obtaining the signal at the current frame according to the noise reduction factor and the frequency spectrum of the primary separated signal (separating signals and utilizing their power spectrums, pages 50-51, SNR-based Estimation). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim and Burnett such that generating a denoised signal at the current frame comprises: analyzing each of the at least two separated signals to obtain its 
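For illustration only, the three steps recited in claim 13 (spectral analysis of each separated signal, determining a factor from the power spectra, and applying the factor to the frequency spectrum of the primary separated signal) can be sketched as below. This is not the method of Togneri, Burnett, Kim, or the application. The function names, the plain DFT, and the Wiener-style per-bin gain are assumptions chosen only to make the steps concrete.

```python
import cmath
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Frequency spectrum of one frame via a direct discrete Fourier transform."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def power_spectrum(spectrum: Sequence[complex]) -> List[float]:
    """Power spectrum: squared magnitude of each frequency bin."""
    return [abs(c) ** 2 for c in spectrum]


def denoise_frame(primary: Sequence[float],
                  secondaries: Sequence[Sequence[float]]) -> List[complex]:
    """Scale each bin of the primary spectrum by an assumed factor in [0, 1].

    The factor used here is primary power / (primary power + summed secondary
    power), a Wiener-style gain picked for the sketch only.
    """
    primary_spec = dft(primary)
    primary_pow = power_spectrum(primary_spec)
    secondary_pows = [power_spectrum(dft(s)) for s in secondaries]
    denoised = []
    for k, coeff in enumerate(primary_spec):
        interference = sum(p[k] for p in secondary_pows)
        total = primary_pow[k] + interference
        factor = primary_pow[k] / total if total > 0 else 0.0
        denoised.append(factor * coeff)
    return denoised
```

In this sketch a silent secondary signal leaves the primary spectrum unchanged, while a secondary signal with the same power as the primary halves every bin.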

Consider claim 14, Kim does not, but Burnett discloses generating a denoised signal and denoised factor (denoising acoustic signals, [0020], [0093], Fig 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim by generating a denoised signal for reasons similar to those for claim 1.
Kim and Burnett do not specifically mention determining the denoised factor comprises: determining a power ratio according to the power spectrum of the primary separated signal and the power spectrum of the one or more secondary separated signals; and determining the denoised factor according to the power ratio such that the larger the power ratio, the larger the denoised factor. 
Togneri discloses determining the noise reduction factor comprises: determining a power ratio according to the power spectrum of the primary separated signal and the power spectrum of the one or more secondary separated signals; and determining the noise reduction factor according to the power ratio such that the larger the power ratio, the larger the noise reduction factor (separating signals and utilizing their power spectrums for noise reduction, pages 50-51, SNR-based Estimation). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Kim and Burnett such that determining the denoised factor comprises: determining a power ratio according to the power spectrum of the primary separated signal and the power spectrum of the one or more secondary separated signals; and determining the .
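For illustration only, the relationship recited in claim 14 (the larger the power ratio, the larger the factor) can be shown with a short sketch. The mapping ratio / (1 + ratio), the small constant guarding against division by zero, and the function names are assumptions; none of the cited references is asserted to use this particular formula.

```python
from typing import Sequence


def power_ratio(primary_power: float,
                secondary_powers: Sequence[float],
                eps: float = 1e-12) -> float:
    """Ratio of primary power to summed secondary power (eps avoids division by zero)."""
    return primary_power / (sum(secondary_powers) + eps)


def denoise_factor(ratio: float) -> float:
    """Assumed monotonically increasing map from [0, inf) into [0, 1)."""
    return ratio / (1.0 + ratio)


# Made-up power values: the primary signal grows while the secondary stays fixed.
ratios = [power_ratio(p, [1.0]) for p in (0.5, 1.0, 4.0, 9.0)]
factors = [denoise_factor(r) for r in ratios]
print(factors)
```

Any other strictly increasing mapping would satisfy the same recited relationship; this one is used only because it is bounded and simple to check.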

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Dan Washburn, can be reached on 571/272-5551.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2657                                        03/02/21