DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 3/14/2022 and supplemental amendment filed on 4/4/2022. Claims 1-16 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The response filed on 4/4/2022 has been accepted and considered in this Office Action. Claims 1-16 have been examined. Applicant’s amendments to claims 1, 4-8 and 12 reciting a processor coupled to a memory, and Applicant’s amendments to claims 2 and 9-11 reciting a processor, overcome the 35 U.S.C. 112(f) interpretation previously set forth in the Non-Final Office Action mailed 10/13/2021. Therefore, the above-referenced claim interpretations under 35 U.S.C. 112(f) are withdrawn.

Response to Arguments
Applicant's arguments filed 04/04/2022 on pg. 8 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 state that
“Each of independent claims 1, 2, and 13 now further recite “the background noise including non-speech”...”
	
The examiner respectfully disagrees. Jeong teaches “The Filler model is used to search for extraneous acoustics, such as noise or non-keywords, in speech.” (Jeong [0030]). Jeong teaches that whether the speech data includes a keyword can be determined from the likelihoods output by the Filler model and the keyword model. The Filler model module 130 calculates a likelihood that the extracted feature vectors are extraneous acoustic signals (i.e., noise or a non-keyword). The likelihood from the Filler model module is a measure of how likely it is that a portion of the recognized speech is not a keyword (i.e., that the input speech is noise or a non-keyword); see Jeong [0042]. The Filler model calculates the likelihood of extraneous acoustic signals; hence the model is utilizing background noise to calculate the likelihood of non-keyword or noise. Noise and non-keyword recognition is interpreted as background noise including non-speech. Therefore, Jeong teaches extracting the feature amount for the first frame of the speech data and inputting the feature amount to the model, the background noise including non-speech, and the rejections of claims 1, 2 and 13 under 35 U.S.C. 103 are sustained and further updated accordingly.
With respect to the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent said claims are discussed and/or argued for at least the same rationale presented in the Remarks filed 04/04/2022, Examiner respectfully notes as follows. For completeness, should the mentioned claims be likewise traversed for reasons similar to those given for independent claims 1, 2 and 13, Examiner respectfully directs Applicant to the reasons provided above in the response directed to claims 1, 2 and 13. For at least the same reasons, Examiner likewise respectfully disagrees; Applicant's arguments have been fully considered but they are not persuasive.


Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1, 12, 13, 14 and 16 are rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255).
Regarding claim 1, Jeong teaches the apparatus comprising: a processor coupled to a memory, the processor configured to: acquire speech data including a plurality of frames (Jeong [0038] teaches a speech receiver that receives a speech signal; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]); calculate a keyword score indicative of occurrence probability of the component of the keyword, based on the information output from the model, by extracting the feature amount for each of the frames of the speech data and inputting the feature amount to the model (Jeong [0042] teaches the keyword model compares the extracted feature vector with a stored keyword to calculate a likelihood that the feature vector of the recognized speech matches the keyword; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]); calculate a background noise score indicative of occurrence probability of the component of the background noise, based on the information output from the model, by extracting the feature amount for each of the frames of the speech data and inputting the feature amount to the model, the background noise including non-speech (Jeong [0042] teaches the Filler model module calculates a likelihood that the extracted feature vectors are extraneous acoustic signals (i.e., noise or a non-keyword). The likelihood from the Filler model module is a measure of how likely it is that a portion of the recognized speech is not a keyword (i.e., that the input speech is noise or a non-keyword); noise and non-keyword recognition is interpreted as background noise including non-speech, and the Filler model calculates the likelihood of extraneous acoustic signals, hence the model is utilizing background noise to calculate the likelihood of non-keyword or noise; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]); and determine whether or not the speech data includes the keyword based on the keyword score, the background noise score, and a threshold (Jeong [0054] teaches the determination module determines that the recognized word is in the keyword database. Conversely, if the determined position is included in the non-keyword area on the confidence coordinate system, the determination module determines that the recognized word is not in the keyword database; [0051] teaches the reference function generator generates a boundary serving as a standard of judgment as to whether the input speech signal corresponds to a word in a keyword database in accordance with the position determined by the first and second confidence scores on a predetermined confidence coordinate system; the boundary of judgment generated by the reference function generator is interpreted as a threshold; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]). However, Jeong fails to teach acquire a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise other than the keyword. However, Bocklet teaches acquire a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise other than the keyword (Bocklet [0043-0045] teaches an acoustic scoring module which scores feature vectors based on an acoustic model which is pretrained based on a training set of audio and provides any number of output scores based on feature vectors. For example, the outputs of the acoustic scoring module may, based on feature vectors, provide probabilities or scores or the like associated with such sub-phonetic units as to which phone has been spoken, as well as probabilities or scores associated with silence and/or background noise or the like at its outputs (different classes including component of keyword and background noise)).
Jeong and Bocklet are both considered to be analogous to the claimed invention because they both relate generally to speech recognition methods for keyword detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong to process the keyword and non-keyword detection with the model trained to output the likelihood of each of the plurality of classes of the speech data as taught by Bocklet to use neural network keyphrase detection system in an autonomous, stand-alone manner and achieve ultra-low power consumption compared to conventional systems (see Bocklet [0029]).
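For illustration only, the relationship between per-frame class likelihoods and the two scores discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the claims, of Jeong, or of Bocklet: the function name `utterance_scores`, the class labels, and the choice of averaging log posterior mass over frames are all hypothetical.

```python
import math

def utterance_scores(frames, keyword_classes, noise_classes, eps=1e-12):
    """Return (keyword_score, background_noise_score) for one utterance.

    frames: list of dicts mapping a class label to its posterior for that frame
            (assumed output of a trained model; labels are hypothetical).
    Each score is the mean over frames of the log posterior mass assigned to
    the corresponding group of classes. This averaging is an assumption.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    kw_total = 0.0
    bg_total = 0.0
    for posteriors in frames:
        # Posterior mass on keyword components and on background-noise components.
        kw_mass = sum(posteriors.get(c, 0.0) for c in keyword_classes)
        bg_mass = sum(posteriors.get(c, 0.0) for c in noise_classes)
        # eps guards against log(0) when a group receives no mass.
        kw_total += math.log(kw_mass + eps)
        bg_total += math.log(bg_mass + eps)
    n = len(frames)
    return kw_total / n, bg_total / n
```

Under these assumptions, frames dominated by keyword components yield a keyword score above the background noise score, which a later decision step can compare against a threshold.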
Regarding claim 12, Jeong and Bocklet teach the apparatus of claim 1. Furthermore, Jeong teaches wherein the classes include a plurality of components of the background noise, and the processor calculates the background noise score for each of the plurality of components of the background noise in each of the frames ( Jeong [0030, 0034] teaches the Filler model used to search and calculate the likelihood for noise or non-keywords for each recognized frame).
Claim 13 is directed to a method claim corresponding to the apparatus claim presented in claim 1 and is rejected under the same grounds stated above regarding claim 1.
Regarding claim 14, Jeong and Bocklet teach the apparatus of claim 1. Furthermore, Jeong teaches wherein the background noise includes speech (Jeong [0042] teaches the Filler model module calculates a likelihood that the extracted feature vectors are extraneous acoustic signals (i.e., noise or a non-keyword). The likelihood from the Filler model module is a measure of how likely it is that a portion of the recognized speech is not a keyword (i.e., that the input speech is noise or a non-keyword); non-keyword recognition is interpreted as background noise including speech, and the Filler model calculates the likelihood of extraneous acoustic signals, hence the model is utilizing background noise including speech). Furthermore, Bocklet also teaches wherein the background noise includes speech (Bocklet [0037] teaches that microphone 201 may receive audio input that is not intended to wake system 200 or other background noise or even silence. For example, audio input 111 may include any speech issued by user 101 and any other background noise or silence or the like in the environment of microphone 201; background noise is interpreted to include non-speech like silence or the like).
Claim 16 is directed to a method claim corresponding to the apparatus claim presented in claim 14 and is rejected under the same grounds stated above regarding claim 14.
Claim 3 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Knill (U.S. Patent 5,950,159).
Regarding claim 3, Jeong and Bocklet teach the apparatus of claim 1. Furthermore, Jeong teaches wherein the information includes correspondence between a phoneme as the component of the keyword and a first Hidden Markov Model, and correspondence between a phoneme as the component of the background noise (Jeong [0047, 0057] teaches the keyword model module and the second confidence score calculator; the combination is interpreted as the first HMM, with phoneme operations as explained in relation to the HMM. The Filler model calculates noise from the received speech signal per phoneme, as explained additionally in [0030-0034] and Fig. 2). However, Jeong and Bocklet fail to teach correspondence between a phoneme as the component of the background noise and a second Hidden Markov Model. However, Knill teaches wherein the information includes correspondence between a phoneme as the component of the keyword and a first Hidden Markov Model, and correspondence between a phoneme as the component of the background noise and a second Hidden Markov Model (Knill, Col. 3, lines 45-56 teaches the keyword and filler recognizer is a software module which applies the set of filler HMMs and the sequence of keyword HMMs to the audio data in order to map the audio data to a sequence of filler phones and keyword phone strings together with likelihood scores for each filler phone and keyword phone instance).
Jeong, Bocklet and Knill are considered to be analogous to the claimed invention because they relate to methods for keyword detection in acoustic data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the teachings of Knill to include the correspondence between the phonemes of keyword and non-keyword using HMMs to find a keyword in acoustic data in a manner which is faster than known methods as well as being memory-efficient (see Knill Col. 1, lines 47-50).
Claim 4 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Hayakawa (U.S. Patent Application Publication 2017/0148429).
Regarding claim 4, Jeong and Bocklet teach the apparatus of claim 1 but fail to teach wherein in calculating the keyword score, the processor calculates occurrence probability of correspondence between a phoneme as the component of the keyword and a Hidden Markov Model, and calculates a cumulative value of the occurrence probability of the correspondence by using Viterbi algorithm. However, Hayakawa teaches wherein in calculating the keyword score, the processor calculates occurrence probability of correspondence between a phoneme as the component of the keyword and a Hidden Markov Model, and calculates a cumulative value of the occurrence probability of the correspondence by using Viterbi algorithm (Hayakawa [0032] teaches the processing used for the computation of keyword detection using the phoneme HMM; [0072] teaches the likelihood calculation by the detection unit using the Viterbi computation method).
Jeong, Bocklet and Hayakawa are considered to be analogous to the claimed invention because they relate to methods for keyword detection in a speech signal. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the processing of the keyword detection using the phoneme HMM and Viterbi computation method as taught by Hayakawa to find a keyword in speech data in a more efficient manner (see Hayakawa [0007]).
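For illustration only, a cumulative score computed with the Viterbi algorithm over a phoneme HMM, as referenced in the discussion of claim 4 above, can be sketched as follows. This is a minimal sketch assuming a left-to-right HMM with one shared self-loop and one shared forward transition probability; the function name `viterbi_keyword_score` and its parameters are hypothetical and are not taken from Hayakawa or from the claims.

```python
NEG_INF = float("-inf")

def viterbi_keyword_score(log_emissions, log_self, log_next):
    """Cumulative log score of the best state path through a left-to-right HMM.

    log_emissions[t][s]: log-likelihood of frame t under HMM state s
                         (assumed to come from an acoustic model).
    log_self / log_next: log probabilities of staying in a state or advancing
                         to the next one (assumed shared by all states).
    The path must start in the first state and end in the last state.
    """
    n_states = len(log_emissions[0])
    prev = [NEG_INF] * n_states
    prev[0] = log_emissions[0][0]  # paths may only begin in state 0
    for t in range(1, len(log_emissions)):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s] + log_self
            advance = prev[s - 1] + log_next if s > 0 else NEG_INF
            best = max(stay, advance)
            if best > NEG_INF:
                # Accumulate the best predecessor score with this frame's emission.
                cur[s] = best + log_emissions[t][s]
        prev = cur
    return prev[-1]
```

If the utterance has fewer frames than HMM states, no valid path reaches the last state and the sketch returns negative infinity.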
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Kawazoe (U.S. Patent Application Publication 2003/0200086).
Regarding claim 5, Jeong and Bocklet teach the apparatus of claim 1 but fail to teach wherein in calculating the background noise score, the processor calculates occurrence probability of correspondence between a phoneme as the component of the background noise and a Hidden Markov Model and calculates a cumulative value of the occurrence probability of the correspondence by using Viterbi algorithm. However, Kawazoe teaches wherein in calculating the background noise score, the processor calculates occurrence probability of correspondence between a phoneme as the component of the background noise and a Hidden Markov Model and calculates a cumulative value of the occurrence probability of the correspondence by using Viterbi algorithm (Kawazoe [0107] teaches calculating extraneous speech (interpreted as noise/not keyword) likelihood using a garbage model which is an HMM; [0192] teaches that although these models are based on syllables in this embodiment, they may be generated based on phonemes; Fig. 5 and [0126] teach the second likelihood calculation by the extraneous-speech component HMM; and [0161] teaches the matching process calculates the likelihood using the Viterbi algorithm).
Jeong, Bocklet and Kawazoe are considered to be analogous to the claimed invention because they relate to methods for keyword detection in speech. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the processing of the noise probability calculation using the HMM and Viterbi computation method as taught by Kawazoe to prevent misrecognition that can occur due to noise level and recognize keywords reliably (see Kawazoe [0053]).
Claim 6 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Melanson (U.S. Patent Application Publication 2018/0040325).
Regarding claim 6, Jeong and Bocklet teach the apparatus of claim 1 but fail to teach wherein if the keyword score is larger than a first threshold and the background noise score is smaller than a second threshold, the processor determines that the speech data includes the keyword. However, Melanson teaches wherein if the keyword score is larger than a first threshold and the background noise score is smaller than a second threshold, the processor determines that the speech data includes the keyword (Melanson [0110, 0111] teaches determining whether the keyword discrimination score is greater than the threshold level of the selected keyword library (interpreted as the first threshold); [0103] teaches selecting the keyword library based on low background noise compared to some threshold noise level (interpreted as noise level below the second threshold), combined with Jeong [0058], which teaches calculating the noise likelihood (interpreted as determining the noise level)).
Jeong, Bocklet and Melanson are considered to be analogous to the claimed invention because they relate to methods for trigger phrase recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the processing of the keyword determination in speech data based on the keyword score and noise level comparison to the respective thresholds as taught by Melanson to overcome the loss of accuracy of phrase recognition in a background noise environment (see Melanson [0008]).
Claim 7 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Lee (U.S. Patent Application Publication 2016/0071516).
Regarding claim 7, Jeong and Bocklet teach the apparatus of claim 1 but fail to teach wherein if a difference between the keyword score and the background noise score is larger than a third threshold, the processor determines that the speech data includes the keyword. However, Lee teaches wherein if a difference between the keyword score and the background noise score is larger than a third threshold, the processor determines that the speech data includes the keyword (Lee [0064] teaches determining whether the difference between the keyword score and the non-keyword score is greater than a predetermined confidence value (which is interpreted as a threshold)).
Jeong, Bocklet and Lee are considered to be analogous to the claimed invention because they relate to methods for keyword recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the processing of the keyword determination methods as taught by Lee to improve keyword detection in voice commands (see Lee [0004]).
Claim 8 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255) further in view of Liu (U.S. Patent Application Publication 2019/0180734).
Regarding claim 8, Jeong and Bocklet teach the apparatus of claim 1. Furthermore, Jeong teaches calculating the ratio between the keyword score and the background noise score (Jeong [0044] teaches using the likelihood ratio between the likelihood from the keyword model module and the likelihood from the Filler model module to calculate the first confidence score). However, Jeong and Bocklet fail to teach wherein if a ratio between the keyword score and the background noise score is larger than a fourth threshold, the processor determines that the speech data includes the keyword. However, Liu teaches wherein if a ratio between the keyword score and the background noise score is larger than a fourth threshold, the processor determines that the speech data includes the keyword (Liu [0130-0132] teaches comparing the ratio of the cumulative silence probability and the cumulative keyword probability with a threshold to determine the audio data as an effective keyword).
Jeong, Bocklet and Liu are considered to be analogous to the claimed invention because they relate to methods for keyword recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet to process the keyword and non-keyword detection with the processing of the keyword determination methods as taught by Liu to reduce incorrect recognition of a non-keyword as a keyword (see Liu [0007]).
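For illustration only, the three determination conditions recited in claims 6, 7 and 8 as discussed above can be contrasted in a minimal sketch. The function names and the assumption that both scores are positive, probability-like values (required for the ratio to be meaningful) are hypothetical and are not drawn from Melanson, Lee or Liu.

```python
def rule_two_thresholds(kw_score, bg_score, first_threshold, second_threshold):
    """Claim 6 style: keyword score above one threshold AND noise score below another."""
    return kw_score > first_threshold and bg_score < second_threshold

def rule_difference(kw_score, bg_score, third_threshold):
    """Claim 7 style: difference between the scores exceeds a threshold."""
    return (kw_score - bg_score) > third_threshold

def rule_ratio(kw_score, bg_score, fourth_threshold):
    """Claim 8 style: ratio between the scores exceeds a threshold.

    Assumes a strictly positive background noise score (e.g. a probability),
    since the ratio is undefined or misleading otherwise.
    """
    if bg_score <= 0:
        raise ValueError("background noise score must be positive for a ratio test")
    return (kw_score / bg_score) > fourth_threshold
```

The three rules differ in how the two scores interact: the first tests each score independently, whereas the second and third combine them into a single quantity before comparison.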
Claims 2 and 15 are rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255), further in view of Koshiba (U.S. Patent Application Publication 2003/0125943) further in view of Guan (Chinese Patent Application Publication CN 109461456 A).
Regarding claim 2, Jeong teaches the apparatus comprising a first acquisition processor configured to acquire speech data including a plurality of frames (Jeong [0038] teaches a speech receiver that receives a speech signal; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]); a first calculation processor configured to calculate a keyword score indicative of occurrence probability of the component of the keyword, based on the information output from the model, by extracting the feature amount for each of the frames of the speech data and inputting the feature amount to the model (Jeong [0042] teaches the keyword model compares the extracted feature vector with a stored keyword to calculate a likelihood that the feature vector of the recognized speech matches the keyword; the above-mentioned functionality is configured to execute on one or more processors mentioned in [0035]). However, Jeong fails to teach a second acquisition processor configured to acquire a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise other than the keyword.
However, Bocklet teaches a second acquisition processor configured to acquire a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise other than the keyword, the background noise including non-speech (Bocklet [0043-0045] teaches an acoustic scoring module which scores feature vectors based on an acoustic model which is pretrained based on a training set of audio and provides any number of output scores based on feature vectors. For example, the outputs of the acoustic scoring module may, based on feature vectors, provide probabilities or scores or the like associated with such sub-phonetic units as to which phone has been spoken, as well as probabilities or scores associated with silence and/or background noise or the like at its outputs (different classes including component of keyword and background noise); Bocklet [0037] teaches that microphone 201 may receive audio input that is not intended to wake system 200 or other background noise or even silence. For example, audio input 111 may include any speech issued by user 101 and any other background noise or silence or the like in the environment of microphone 201; background noise is interpreted to include non-speech like silence or the like).
 Jeong and Bocklet are both considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong to process the keyword and non-keyword detection with the model trained to output the likelihood of each of the plurality of classes of the speech data as taught by Bocklet to use neural network keyphrase detection system in an autonomous, stand-alone manner and achieve ultra-low power consumption compared to conventional systems (see Bocklet [0029]). 
However, Jeong and Bocklet fail to teach a second calculation processor configured to determine whether or not the speech data includes a candidate for the keyword based on the keyword score and a first threshold, and if the speech data is determined to include the candidate for the keyword, calculate a background noise score indicative of occurrence probability of the component of the background noise, based on the information output from the model, by extracting the feature amount for each of the frames corresponding to the candidate for the keyword and inputting the feature amount to the model.  
However, Koshiba teaches a second calculation processor configured to determine whether or not the speech data includes a candidate for the keyword based on the keyword score and a first threshold, and if the speech data is determined to include the candidate for the keyword, calculate a background noise score indicative of occurrence probability of the component of the background noise, based on the information output from the model, by extracting the feature amount for each of the frames corresponding to the candidate for the keyword and inputting the feature amount to the model (Koshiba, Fig. 7 and [0095-0096] teach, in steps 401 and 402, determining the score of the recognition target vocabulary (interpreted as the keyword score); [0097] teaches, in step 403, comparing this score with a first threshold to determine whether to reject the input and, if not, proceeding to the noise score calculation (environment noise calculation) in step 404 ([0098]); [0099] teaches, in step 405, comparing whether the vocabulary score is less than the noise score to determine whether to output the selected vocabulary).
Jeong, Bocklet and Koshiba are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet further using the vocabulary word recognition method using score of the recognizing vocabulary and the noise score calculation and comparison of scores teachings of Koshiba to improve the success rate of speech word recognition (see Koshiba [0014]). 
However, Jeong, Bocklet and Koshiba fail to teach a determination processor configured to determine whether or not the speech data includes the keyword based on at least the background noise score and a second threshold.
However, Guan teaches a determination processor configured to determine whether or not the speech data includes the keyword based on at least the background noise score and a second threshold (Guan [0036] teaches comparing the obtained score with a noise interference threshold to determine if a keyword is present).
Jeong, Bocklet, Koshiba and Guan are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet and Koshiba to process the keyword and non-keyword detection further using the comparison to a noise interference threshold as taught by Guan to improve the success rate of the speech wake word recognition detection system (see Guan [0007]).
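For illustration only, the two-stage flow recited in claim 2 as discussed above (candidate determination against a first threshold, then a background noise score computed only over the candidate frames and compared against a second threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-frame score lists, the "first contiguous run above the threshold" candidate rule, the mean as the aggregate, and the names `find_candidate` and `detect_keyword` are hypothetical and are not taken from Koshiba, Guan or the claims.

```python
def find_candidate(kw_scores, first_threshold):
    """Return (start, end) of the first contiguous run of frames whose keyword
    score exceeds first_threshold, with end exclusive, or None if no run exists."""
    start = None
    for i, score in enumerate(kw_scores):
        if score > first_threshold:
            if start is None:
                start = i
        elif start is not None:
            return start, i
    if start is not None:
        return start, len(kw_scores)
    return None

def detect_keyword(kw_scores, bg_scores, first_threshold, second_threshold):
    """Stage 1: look for a keyword candidate. Stage 2: score background noise
    only over the candidate's frames and compare with second_threshold."""
    candidate = find_candidate(kw_scores, first_threshold)
    if candidate is None:
        return False  # no candidate, so the noise score is never computed
    start, end = candidate
    segment = bg_scores[start:end]
    bg_score = sum(segment) / len(segment)  # mean over candidate frames (assumption)
    return bg_score < second_threshold
```

Restricting the second stage to the candidate's start and end frames means the background noise score is evaluated only where a keyword is suspected.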
Regarding claim 15, Jeong, Bocklet, Koshiba and Guan teach the apparatus of claim 2. Furthermore, Jeong teaches wherein the background noise includes speech (Jeong [0042] teaches the Filler model module calculates a likelihood that the extracted feature vectors are extraneous acoustic signals (i.e., noise or a non-keyword). The likelihood from the Filler model module is a measure of how likely it is that a portion of the recognized speech is not a keyword (i.e., that the input speech is noise or a non-keyword); non-keyword recognition is interpreted as background noise including speech, and the Filler model calculates the likelihood of extraneous acoustic signals, hence the model is utilizing background noise including speech). Furthermore, Bocklet also teaches wherein the background noise includes speech (Bocklet [0037] teaches that microphone 201 may receive audio input that is not intended to wake system 200 or other background noise or even silence. For example, audio input 111 may include any speech issued by user 101 and any other background noise or silence or the like in the environment of microphone 201; background noise is interpreted to include non-speech like silence or the like).
Claim 9 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255), further in view of Koshiba (U.S. Patent Application Publication 2003/0125943), further in view of Guan (Chinese Patent Application Publication CN 109461456 A), further in view of Fujimura (U.S. Patent 10,964,311), further in view of Choi Chang et al. (Korea Patent Application Publication 20060082465 A).
Regarding claim 9, Jeong, Bocklet, Koshiba and Guan teach the apparatus of claim 2. Furthermore, Koshiba teaches the second calculation processor determines that the speech data includes the candidate for the keyword, and calculates the background noise score for the frames corresponding to the candidate for the keyword by using start information and end information of the candidate for the keyword (Koshiba [0097] teaches that in step 403 it is determined whether the input speech is noise or not by comparing the likelihood of the registered vocabulary to a predetermined threshold, and the likelihood of environment noise is then calculated. The noise is calculated over the same time length as the recognized vocabulary, which is interpreted as the start and end information of the vocabulary (keyword), as indicated by [0086], Fig. 6 and Fig. 8a; the registered vocabulary is interpreted as the candidate for the keyword), and if the background noise score is smaller than the second threshold, the determination processor determines that the speech data includes the keyword (Koshiba [0099] teaches determining that the input speech is not noise if the probability score of the registered vocabulary is less than the environmental noise score).
Jeong, Bocklet, Guan and Koshiba are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet further using the vocabulary word recognition method using score of the recognizing vocabulary and the noise score calculation and comparison of scores teachings of Koshiba to improve the success rate of speech word recognition (see Koshiba [0014]).  
However, Jeong, Bocklet, Koshiba and Guan fail to teach if the keyword score is larger than the first threshold and if the background noise score is smaller than the second threshold. 
However, Fujimura teaches that the keyword score is larger than the first threshold (Fujimura Col. 7, lines 42-45 teaches that the keyword's first detection processor compares the keyword's first score with the preset first threshold score, thereby determining whether a keyword having a score exceeding the first threshold score is present).
Jeong, Bocklet, Koshiba, Guan and Fujimura are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba and Guan to process the keyword and non-keyword detection using score of the keyword and the noise score calculation with the threshold comparison of keyword score teaching of Fujimura to improve the accuracy of a speech key word detection system (see  Fujimura Col. 1 lines 28-36). 
However, Jeong, Bocklet, Koshiba, Guan and Fujimura fail to teach if the background noise score is smaller than the second threshold, the determination processor determines that the speech data includes the keyword.
However, Choi Chang teaches if the background noise score is smaller than the second threshold, the determination processor determines that the speech data includes the keyword (Choi Chang lines 633-643 teaches determining whether the input frame belongs to a voice section when the voice absence probability of the noise source (interpreted as background noise) is smaller than a predetermined threshold).
Jeong, Bocklet, Koshiba, Guan, Fujimura and Choi Chang are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba, Guan and Fujimura to process the keyword and non-keyword detection using score of the keyword and the noise score calculation based on keyword threshold comparison with the noise threshold comparison teachings of Choi Chang to improve the accuracy of a speech keyword detection system (see Choi Chang lines 221-230).
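For illustration only, the two-threshold condition addressed in the rejection of claim 9 can be sketched as follows. This is a minimal sketch, not an implementation from the application or from any cited reference; the function name, the parameter names and the numeric values in the usage lines are hypothetical.

```python
def keyword_detected_two_thresholds(
    keyword_score: float,
    noise_score: float,
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Return True when both claimed conditions hold.

    Condition 1: the keyword score is larger than the first threshold.
    Condition 2: the background noise score is smaller than the second
    threshold.
    """
    return keyword_score > first_threshold and noise_score < second_threshold


# Hypothetical scores and thresholds, chosen only to exercise both branches.
accepted = keyword_detected_two_thresholds(0.75, 0.25, 0.5, 0.5)
rejected = keyword_detected_two_thresholds(0.75, 0.75, 0.5, 0.5)
```

In this sketch a high keyword score alone is not sufficient; a candidate whose frames also score high as background noise is rejected.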
Claim 10 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255), further in view of Koshiba (U.S. Patent Application Publication 2003/0125943), further in view of Guan (Chinese Patent Application Publication CN 109461456 A), further in view of Fujimura (U.S. Patent 10,964,311), further in view of Lee (U.S. Patent Application Publication 2016/0071516).
Regarding Claim 10, Jeong, Bocklet, Koshiba and Guan teach the apparatus of claim 2. Furthermore, Koshiba teaches the second calculation processor determines that the speech data includes the candidate for the keyword, and calculates the background noise score for the frames corresponding to the candidate for the keyword by using start information and end information of the candidate for the keyword (Koshiba [0097] teaches that in step 403 it is determined whether the input speech is noise after comparing the likelihood of the registered vocabulary to a predetermined threshold, and the likelihood of environment noise is then calculated. The noise calculation covers the same time length as the recognized vocabulary, which is interpreted as start and end information of the vocabulary (keyword), as indicated by [0086], Fig. 6 and Fig. 8a).
Jeong, Bocklet, Guan and Koshiba are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet further using the vocabulary word recognition method using score of the recognizing vocabulary and the noise score calculation and comparison of scores teachings of Koshiba to improve the success rate of speech word recognition (see Koshiba [0014]).  
However,  Jeong, Bocklet, Koshiba and Guan fail to teach if the keyword score is larger than the first threshold and if a difference between the keyword score and the background noise score is larger than a third threshold, the determination processor determines that the speech data includes the keyword. 
However, Fujimura teaches that the keyword score is larger than the first threshold (Fujimura Col. 7, lines 42-45 teaches that the keyword's first detection processor compares the keyword's first score with the preset first threshold score, thereby determining whether a keyword having a score exceeding the first threshold score is present).
 Jeong, Bocklet, Koshiba, Guan and Fujimura are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba and Guan to process the keyword and non-keyword detection using score of the keyword and the noise score calculation with the threshold comparison of keyword score teaching of Fujimura to improve the accuracy of a speech key word detection system (see Fujimura Col. 1 lines 28-36). 
However, Jeong, Bocklet, Koshiba, Guan and Fujimura fail to teach if a difference between the keyword score and the background noise score is larger than a third threshold, the determination processor determines that the speech data includes the keyword.
However, Lee teaches if a difference between the keyword score and the background noise score is larger than a third threshold, the determination processor determines that the speech data includes the keyword (Lee [0064] teaches determining if the input sound is indicative of the user if the difference of a keyword score and non-keyword score is greater than or equal to a predetermined confidence value).
Jeong, Bocklet, Koshiba, Guan, Fujimura and Lee are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for keyword detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba, Guan and Fujimura to process the keyword and non-keyword detection using score of the keyword and the noise score calculation based on the threshold comparison with the keyword determination methods as taught by Lee to improve keyword detection in voice commands (see Lee [0004]).
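For illustration only, the difference-based condition addressed in the rejection of claim 10 can be sketched as follows. This is a minimal sketch with hypothetical function and parameter names. The sketch uses a strict "larger than" comparison to follow the claim language, whereas Lee [0064] as characterized above uses "greater than or equal to".

```python
def keyword_detected_by_difference(
    keyword_score: float,
    noise_score: float,
    first_threshold: float,
    third_threshold: float,
) -> bool:
    """Return True when both claimed conditions hold.

    Condition 1: the keyword score is larger than the first threshold.
    Condition 2: the keyword score minus the background noise score is
    larger than the third threshold.
    """
    if keyword_score <= first_threshold:
        return False
    return (keyword_score - noise_score) > third_threshold


# Hypothetical scores and thresholds, chosen only to exercise both branches.
accepted = keyword_detected_by_difference(8.0, 2.0, 5.0, 4.0)
rejected = keyword_detected_by_difference(8.0, 6.0, 5.0, 4.0)
```

A difference test of this kind accepts a candidate only when the keyword score exceeds the noise score by a margin, rather than bounding the noise score in isolation.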
Claim 11 is rejected under 35 U.S.C. §103 as being unpatentable over Jeong (U.S. Patent Application Publication 2007/0136058) in view of Bocklet (U.S. Patent Application Publication 2017/0256255), further in view of Koshiba (U.S. Patent Application Publication 2003/0125943), further in view of Guan (Chinese Patent Application Publication CN 109461456 A), further in view of Fujimura (U.S. Patent 10,964,311), further in view of Liu (U.S. Patent Application Publication 2019/0180734).
Regarding Claim 11, Jeong, Bocklet, Koshiba and Guan teach the apparatus of claim 2. Furthermore, Koshiba teaches the second calculation processor determines that the speech data includes the candidate for the keyword, and calculates the background noise score for the frames corresponding to the candidate for the keyword by using start information and end information of the candidate for the keyword (Koshiba [0097] teaches that in step 403 it is determined whether the input speech is noise after comparing the likelihood of the registered vocabulary to a predetermined threshold, and the likelihood of environment noise is then calculated. The noise calculation covers the same time length as the recognized vocabulary, which is interpreted as start and end information of the vocabulary (keyword), as indicated by [0086], Fig. 6 and Fig. 8a).
Jeong, Bocklet, Guan and Koshiba are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for selected word detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong and Bocklet further using the vocabulary word recognition method using score of the recognizing vocabulary and the noise score calculation and comparison of scores teachings of Koshiba to improve the success rate of speech word recognition (see Koshiba [0014]). 
However, Jeong, Bocklet, Koshiba and Guan fail to teach if the keyword score is larger than the first threshold and if a ratio between the keyword score and the background noise score is larger than a fourth threshold, the determination processor determines that the speech data includes the keyword. 
However, Fujimura teaches that the keyword score is larger than the first threshold (Fujimura Col. 7, lines 42-45 teaches that the keyword's first detection processor compares the keyword's first score with the preset first threshold score, thereby determining whether a keyword having a score exceeding the first threshold score is present).
 Jeong, Bocklet, Koshiba, Guan and Fujimura are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba and Guan to process the keyword and non-keyword detection using score of the keyword and the noise score calculation with the threshold comparison of keyword score teaching of Fujimura to improve the accuracy of a speech key word detection system (see Fujimura Col. 1 lines 28-36). 
However, Jeong, Bocklet, Koshiba, Guan and Fujimura fail to teach if a ratio between the keyword score and the background noise score is larger than a fourth threshold, the determination processor determines that the speech data includes the keyword.
However, Liu teaches if a ratio between the keyword score and the background noise score is larger than a fourth threshold, the determination processor determines that the speech data includes the keyword (Liu [0130-0132] teaches that when a ratio of the cumulative silence probability and the cumulative keyword probability is greater than a second threshold, the first audio data is confirmed as an effective keyword).
Jeong, Bocklet, Koshiba, Guan, Fujimura and Liu are all considered to be analogous to the claimed invention because they relate generally to speech recognition methods for keyword detection. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong, Bocklet, Koshiba, Guan and Fujimura to process the keyword and non-keyword detection using score of the keyword and the noise score calculation compared to the threshold with the keyword determination methods as taught by Liu to improve keyword detection (see Liu [0007]).
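For illustration only, the ratio-based condition addressed in the rejection of claim 11 can be sketched as follows. This is a minimal sketch under stated assumptions: the ratio is assumed to be the keyword score divided by the background noise score, and the background noise score is assumed to be strictly positive so that the division is defined. Neither assumption is taken from the claims or from Liu, and the function and parameter names are hypothetical.

```python
def keyword_detected_by_ratio(
    keyword_score: float,
    noise_score: float,
    first_threshold: float,
    fourth_threshold: float,
) -> bool:
    """Return True when both claimed conditions hold.

    Condition 1: the keyword score is larger than the first threshold.
    Condition 2: keyword_score / noise_score is larger than the fourth
    threshold. The direction of the ratio is an assumption.
    """
    if noise_score <= 0:
        # Assumption: scores are positive; otherwise the ratio is undefined.
        raise ValueError("noise_score must be positive")
    if keyword_score <= first_threshold:
        return False
    return (keyword_score / noise_score) > fourth_threshold


# Hypothetical scores and thresholds, chosen only to exercise both branches.
accepted = keyword_detected_by_ratio(8.0, 2.0, 5.0, 3.0)
rejected = keyword_detected_by_ratio(8.0, 4.0, 5.0, 3.0)
```

Unlike a difference test, a ratio test is unaffected when both scores are scaled by the same positive factor.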

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Morin, US Patent 6,985,859 teaches a method for spotting words in a speech signal and further provides for calculating a first confidence score based on a matching ratio between a first minimum recognition value and a first background score. The spotting module continuously estimates the background score of each word. (see Morin, Fig. 4, Col 6, lines 6-19 and Col 4, lines 53-57).
Weiss et al., US Patent 8,131,543 teaches the classifier which includes a Gaussian mixture model for speech and a Gaussian mixture model for noise and uses a speech/noise probability (SNP) calculator to determine the probabilities that a frame is associated with noise, speech, or both (see Weiss, Col 6, lines 14-24).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656