DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 03/24/2022. Claims 1-10 are pending in the application and have been examined.
Notice of Pre-AIA  or AIA  Status
The present application is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The response filed on 03/24/2022 has been accepted and considered in this Office Action. Claims 1-10 have been examined. Applicant's amendments to claims 1, 2, and 7 recite a processor configured to operate as the various units, with support in the Specification at page 7, lines 3-11. These amendments overcome the claim interpretation under 35 U.S.C. 112(f) previously set forth in the Non-Final Office Action mailed 10/25/2021. Accordingly, the claim interpretation under 35 U.S.C. 112(f) is withdrawn.
Response to Arguments
Applicant's arguments filed 03/24/2022 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 on page 8 state that
“As recognized in the grounds for rejection Parada teaches analyzing only the portions of speech, The above-noted claim features are directed to an extraction from the first training data, to thereby extract second training data that includes one or more voice feature quantities of least one of a Keyword, a sub-word included in the keyword, a syllable included in the Keyword, or a phoneme included in the Keyword, the Keyword being preset. Applicant submits analyzing only the portions of speech as noted in Parada does not meet the noted claim features...”
	
Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s further arguments with respect to claim 1 state that
“Applicant submits Parada discloses a DNA that gets updated using a second training set smaller than an initial training set. Applicant submits Parada does not disclose or suggest adapting a trained acoustic model to a keyword model that is generated from the trained acoustic model...”
	
Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. With respect to independent claims 9 and 10, Examiner respectfully directs Applicant to the reasons provided above in response to the arguments directed to claim 1.
Regarding the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are argued for the same reasons presented for independent claim 1 in the Remarks filed 03/24/2022, Examiner respectfully directs Applicant to the reasons provided above in response to the arguments directed to claim 1. For at least those reasons, the arguments are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument regarding independent claim 1.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 4, 8, 9, and 10 are rejected under 35 U.S.C. §103 as being unpatentable over Parada (U.S. Patent Application Publication 2015/0125594) in view of G. Heigold et al., "Multilingual acoustic models using distributed deep neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8619-8623.
Regarding claim 1, Parada teaches an information processing apparatus comprising: a processor configured to operate as: (Parada et al., paragraphs [0099]-[0101] and Fig. 7, teaches a speech recognition system implemented in a computing device having processors 712) a first data acquisition unit configured to acquire first training data including at least one combination of a voice feature quantity and a correct phoneme label of the voice feature quantity (Parada et al., paragraphs [0023] and [0036], teaches feature vectors, including subword units/triphones/phonemes, as the training data provided to the DNN that is trained; the labels represent words/phonemes of the keywords); a training unit configured to train, using the first training data, an acoustic model, the input being performed by the acoustic model, the acoustic model outputting the correct phoneme label (Parada et al., paragraphs [0023], [0024], and [0036], teaches training the deep neural network (interpreted as the acoustic model) to predict posterior probabilities, and a posterior handling module that determines whether a keyword is present based on a confidence score; the labels represent words/phonemes of the keywords). However, Parada fails to teach an extraction unit configured to extract from the first training data, second training data including one or more voice feature quantities of at least one of a keyword, a sub-word included in the keyword, a syllable included in the keyword, or a phoneme included in the keyword, the keyword being preset; and an adaptation processing unit configured to adapt, using the second training data, the trained acoustic model to a keyword model used for detection of the keyword, the keyword model being generated from the trained acoustic model.
	Heigold teaches an extraction unit configured to extract from the first training data, second training data including one or more voice feature quantities of at least one of a keyword, a sub-word included in the keyword, a syllable included in the keyword, or a phoneme included in the keyword, the keyword being preset (see Heigold, pg. 8620, Sect. 2.1, “On top of these features, a comparably lightweight classifier (for example, Gaussian mixture models or a neural network with only a couple of layers) is trained for another language, keeping the features fixed”; Heigold, pg. 8620, Sect. 2.3, “In this paper, we use an architecture based on DNNs with a shared feature extraction and language-specific classifiers, see Fig. 1 and Fig. 2(b). In particular, the feature extraction and the classifiers are jointly optimized on the shared data for the different languages. As a side effect, it is expected that multitask learning is less sensitive to the optimal tuning of the network size. Feature learning (Section 2.1) and transfer learning (Section 2.2) can be considered (efficient) approximations of this implementation of multitask learning”; the language-specific classifiers are interpreted as the keyword); and an adaptation processing unit configured to adapt, using the second training data, the trained acoustic model to a keyword model used for detection of the keyword, the keyword model being generated from the trained acoustic model (see Heigold, pg. 8620, Sect. 3, “The basic approach to Downpour SGD is as follows. We divide the training data into a number of subsets and run a copy of the model on each of these subsets. Models periodically update their copies of the model parameters by requesting fresh values from the parameter server. The models send updates to a centralized parameter server, which keeps the current state of all parameters for the model, sharded across many machines (see Fig. 3)”; the models updating their copies of the model parameters with fresh values is interpreted as adapting the trained acoustic model to a keyword model generated from the trained acoustic model).
Parada and Heigold are considered to be analogous to the claimed invention because they relate to speech recognition, and in particular to efficiently training accurate acoustic models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the hotword detection teachings of Parada with the multitask learning teachings of Heigold in order to learn features jointly across parallel tasks rather than through a traditional single-task approach (see Heigold, pg. 8619, Sect. 1).
Regarding claim 4, Parada in view of Heigold teaches the apparatus according to claim 1. Parada further teaches wherein the extraction unit extracts the second training data up to a predetermined number of data pieces (Parada et al., paragraph [0060], teaches extracting a stack of frames depending on the length of the units predicted by the system; the size of the stack is interpreted as the predetermined number).
Regarding claim 8, Parada in view of Heigold teaches a keyword detecting apparatus configured to perform keyword detection using a keyword model adapted by the apparatus according to claim 1 (Parada et al., paragraph [0099] and Fig. 7, discloses computing device 700, on which keyword detection is performed using a keyword model; the keyword model is adapted as set forth above with respect to claim 1).
Regarding claim 9, the claim is directed to a method corresponding to the apparatus of claim 1 and is rejected on the same grounds stated above regarding claim 1.
Regarding claim 10, the claim is directed to a non-transitory computer-readable medium including computer-executable instructions corresponding to the apparatus of claim 1 and is rejected on the same grounds stated above regarding claim 1.
Claim 2 is rejected under 35 U.S.C. §103 as being unpatentable over Parada (U.S. Patent Application Publication 2015/0125594) in view of G. Heigold et al., "Multilingual acoustic models using distributed deep neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8619-8623, and further in view of Pearce (U.S. Patent 9,953,634).
Regarding claim 2, Parada in view of Heigold teaches the apparatus according to claim 1, but fails to teach the processor further configured to operate as a second data acquisition unit configured to acquire keyword utterance data including utterance voice of the keyword, wherein the adaptation processing unit adapts the acoustic model to the keyword model using the second training data and the keyword utterance data. However, Pearce teaches the processor further configured to operate as a second data acquisition unit configured to acquire keyword utterance data including utterance voice of the keyword, wherein the adaptation processing unit adapts the acoustic model to the keyword model using the second training data and the keyword utterance data (Pearce, Fig. 4, steps 404 and 408, teaches updating the trained speaker-dependent model based on detecting the keyword or key phrase in the spoken utterance).
Parada, Heigold and Pearce are considered to be analogous to the claimed invention because they relate generally to speech recognition training methods. It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to apply the training methods taught by Pearce, which update a trained model when new utterances are detected, to the trained model taught by Parada and Heigold. Using the known technique of updating a trained model, as taught by Pearce, to improve the quality of training of the model of Parada and Heigold when new utterances are detected would have been obvious to one of ordinary skill in the art (see Pearce, col. 2, lines 13-19).
Claim 3 is rejected under 35 U.S.C. §103 as being unpatentable over Parada (U.S. Patent Application Publication 2015/0125594) in view of G. Heigold et al., "Multilingual acoustic models using distributed deep neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8619-8623, and further in view of Chen (G. Chen, C. Parada and G. Heigold, "Small-footprint keyword spotting using deep neural networks," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4087-4091).
Regarding claim 3, Parada in view of Heigold teaches the apparatus according to claim 1. Parada further teaches wherein the extraction unit extracts as the second training data, a data piece in which a proportion in number of a letter of the keyword, a letter of the sub-word, the syllable, or the phoneme to the data piece is a predetermined value or more (Parada, paragraph [0024], teaches calculating a score for the feature vectors to determine whether a keyword is included, which is interpreted as the proportion of the keyword). Furthermore, Chen teaches a data piece in which a proportion in number of a letter of the keyword, a letter of the sub-word, the syllable, or the phoneme to the data piece is a predetermined value (see Chen, pg. 1, col. 2, lines 13-15, e.g., “A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s) followed by a posterior handling method producing a final confidence score”; the predicted subword units of the keyword are interpreted as the predetermined value).
Parada, Heigold and Chen are considered to be analogous to the claimed invention because they relate to speech recognition training methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Parada and Heigold, which calculate scores of feature vectors to determine the keyword, with the prediction of the subword units of the keyword as taught by Chen in order to simplify the implementation and reduce computation (see Chen, pg. 1, col. 2, lines 17-18).
Claims 5 and 6 are rejected under 35 U.S.C. §103 as being unpatentable over Parada (U.S. Patent Application Publication 2015/0125594) in view of G. Heigold et al., "Multilingual acoustic models using distributed deep neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8619-8623, and further in view of Liu (J. Liu, Z. Ling, S. Wei, G. Hu and L. Dai, "Cluster-based senone selection for the efficient calculation of deep neural network acoustic models," 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016, pp. 1-5).
Regarding claim 5, Parada in view of Heigold teaches the apparatus according to claim 1. Parada teaches wherein the extraction unit extracts data pieces as the second training data up to a predetermined number of data pieces (Parada, paragraph [0060], teaches extracting a stack of frames based on the length of the keyword as predicted by the system; interpreted as a predetermined stack of data pieces). However, Parada in view of Heigold fails to teach in descending order according to a proportion in number of a letter of the keyword, a letter of the sub-word, the syllable, or the phoneme to a data piece. Liu teaches in descending order according to a proportion in number of a letter of the keyword, a letter of the sub-word, the syllable, or the phoneme to a data piece (see Liu, pg. 3, col. 2, lines 3-9, e.g., “Figure 1 demonstrates the DNN structure with cluster-based senone selection for output calculation. Original DNN structure and its weight parameters are kept intactly. A new cluster layer is added on the top hidden layer to predict selected senones.”; Liu, pg. 2, col. 2, lines 11-14, “Mathematically, supposing the c-th cluster has Kc frames of acoustic features in the training data set, its average posterior vector can be calculated as (9). yL are sorted in descending order, and the top Nc senones, of which the accumulated posterior exceeds a predefined confidence α are determined as the selected senones of the c-th cluster”).
Parada, Heigold and Liu are considered to be analogous to the claimed invention because they relate generally to speech recognition training methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Parada and Heigold, which extract a predetermined stack of frames, with the descending-order, cluster-based senone selection taught by Liu in order to accelerate DNN computation (see Liu, pg. 1, col. 2, lines 27-30).
Regarding claim 6, Parada in view of Heigold teaches the apparatus according to claim 1. Parada further teaches wherein the extraction unit extracts as the second training data, data pieces in each of which a proportion in number of a letter of the keyword, a letter of the sub-word, the syllable, or the phoneme to a data piece is a predetermined value or more (Parada, paragraph [0024], teaches calculating a score for the feature vectors to determine whether a keyword is included, which is interpreted as the proportion of the keyword), up to a predetermined number of data pieces according to the proportion (Parada, paragraph [0060], teaches extracting a stack of frames based on the length of the keyword as predicted by the system; interpreted as a predetermined stack of data pieces). However, Parada in view of Heigold fails to teach in descending order according to the proportion. Liu teaches in descending order according to a proportion (see Liu, pg. 2, col. 2, lines 11-14, e.g., “Figure 1 demonstrates the DNN structure with cluster-based senone selection for output calculation. Original DNN structure and its weight parameters are kept intactly. A new cluster layer is added on the top hidden layer to predict selected senones.”; Liu, pg. 3, col. 2, lines 3-9,
“Mathematically, supposing the c-th cluster has Kc frames of acoustic features in the training data set, its average posterior vector can be calculated  as (9).  yL are sorted in descending order, and the top Nc senones, of which the accumulated posterior exceeds a predefined confidence α are determined as the selected senones of the c-th cluster”).
Parada, Heigold and Liu are considered to be analogous to the claimed invention because they relate generally to speech recognition training methods. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Parada and Heigold, which extract a predetermined stack of frames and calculate scores of the frames corresponding to the feature vectors, with the descending-order, cluster-based senone selection taught by Liu in order to accelerate DNN computation (see Liu, pg. 1, col. 2, lines 27-30).
Claim 7 is rejected under 35 U.S.C. §103 as being unpatentable over Parada (U.S. Patent Application Publication 2015/0125594) in view of G. Heigold et al., "Multilingual acoustic models using distributed deep neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8619-8623, and further in view of Yun (U.S. Patent Application Publication 2015/0302847).
Regarding claim 7, Parada in view of Heigold teaches the apparatus according to claim 1, but fails to teach further comprising a keyword setting unit configured to receive setting of the keyword from a user. However, Yun teaches further comprising a keyword setting unit configured to receive setting of the keyword from a user (Yun, Fig. 9 and paragraph [0077], teaches a flow chart of a method for generating a keyword model of a user-defined keyword from at least one input indicative of the user-defined keyword).
Parada, Heigold and Yun are considered to be analogous to the claimed invention because they relate generally to speech recognition methods for keyword model generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Parada and Heigold, which train acoustic models for keyword detection, with the user keyword input techniques taught by Yun in order to reduce the inconvenience to the user of training the keyword model (see Yun, paragraph [0007]).



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Bocklet et al. (U.S. Patent No. 9,792,907) (see abstract and Figs. 7-8) discloses techniques to update a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a key phrase likelihood score, and to determine whether received audio input is associated with a predetermined key phrase based on a rejection likelihood score and the key phrase likelihood score.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 2:00pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656                                                                                                                                                                                                        
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656