DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments/Amendments
2.	With respect to the claim objections, the formal issues have been corrected, and the objections to Claims 5 and 9 are withdrawn. 
	With respect to amended claims 1, 5, 9 and 16, the Applicant indicated on page 7 of the Remarks that “Support of these amendments can be found in at least paragraphs 49 and 53 of the original filed specification.” 
	In response, the Examiner notes that paragraph [0049] discloses a training phase for a wake-up word. The user’s utterance in the training phase is called the “enrollment utterance,” and the sequence of speech units corresponding to the enrollment utterance is the reference sequence of speech units. Paragraph [0053] discloses a detecting phase, in which the input sequence of speech units is compared with the reference sequence of speech units stored on the mobile device. Element 210 in Fig. 2 corresponds to the enrollment utterance, and element 300 in Fig. 3 shows an acoustic input; the acoustic input received from the user is compared with the reference sequence of speech units of the enrollment utterance. By contrast, amended claim 1 recites “comparing the acoustic input to stored sequence of speech units to increase recognition at the low-power mode of the wake-up word in the target language using the sequence of speech units stored on the user device,” where the stored sequence of speech units, as claimed, is generated from that same acoustic input. The Examiner has reviewed not only paragraphs [0049] and [0053] but every paragraph of the specification and every figure, and finds that amended claim 1 lacks support in the specification. Claims 5 and 16 have similar issues. See the further analysis in the 112(a) rejection below. 
	With respect to the 103 rejection, the Applicant argued on page 9 of the Remarks that “Hanazawa appears to disclose recognizing ‘accents’ where ‘speakers who for example speak English with the same Japanese accent’ and ‘an acoustic model is adapted to a specific speaker in the same language or dialect.’”
	In response, the Examiner notes that Hanazawa et al. also teach adapting an acoustic model to a different language (Hanazawa et al. [0119] Incidentally, in the present example, as an example of language adaptation in which an acoustic model is adapted to a language, an example of dialects in described. However, for example, the same is true for the case where an acoustic model is adapted to a difference between languages, i.e. between Japanese and English, or to English with a Japanese, Fig. 2 elements 14, 15, 16, 17, [0043] the phoneme detection unit 17 outputs a phoneme thereof as a detection result.) The example of language adaptation in paragraph [0119] is between English and Japanese. Hanazawa et al. utilize the language adaptation system to detect phonemes in the input voice, wherein the acoustic model in the language adaptation system is adapted to a different language. For these reasons, Applicant’s arguments are not persuasive, and the Examiner respectfully disagrees. 

Claim Rejections - 35 USC § 112
3.	The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

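4.	Claims 1-8 and 16-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.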
	Claim 1 recites the limitation of 
	“receiving acoustic input of a user speaking a wake-up word in the target language when the user device is in a low-power mode; 
 	providing acoustic features derived from the acoustic input to an acoustic model stored on the user device to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; and 
 	storing the sequence of speech units in the target language on the user device for use in subsequent wake-up word detection including comparing the acoustic input to stored sequence of speech units to increase recognition at the low-power mode of the wake-up word in the target language using the sequence of speech units stored on the user device.”
	Claim 5 recites the limitation of 
	“receiving, from the at least one microphone, acoustic input from the user speaking a wake-up word in the target language when the user device is in a low-power mode; 
 	providing acoustic features derived from the acoustic input to the acoustic model to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language; and 
 	storing the sequence of speech units in the target language on the user device for use in subsequent wake-up word detection including comparing the acoustic input to stored sequence of speech units to increase recognition at the low-power mode of the wake-up word in the target language using the sequence of speech units stored on the user device.” 
	Amended claims 1 and 5 each recite “comparing the acoustic input to stored sequence of speech units,” and the “stored sequence of speech units” as claimed is generated from “the acoustic input” itself.
	Paragraph [0006] of the specification discloses “Some embodiments include a user device configured to perform wake-up word detection in a target language. The user device comprises at least one microphone configured to obtain acoustic information from the environment of the user device, at least one computer readable medium storing an acoustic model trained on a corpus of training data in a source language different than the target language, and storing a first sequence of speech units obtained by providing acoustic features derived from audio comprising the user speaking a wake-up word in the target language to the acoustic model, and at least one processor coupled to the at least one computer readable medium and programmed to perform receiving, from the at least one microphone, acoustic input from the user speaking in the target language while the user device is operating in a low-power mode, applying acoustic features derived from the acoustic input to the acoustic model to obtain a second sequence of speech units corresponding to the acoustic input, determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to the second sequence of speech units, and exiting the low-power mode if it is determined that the user spoke the wake-up word.” This paragraph indicates that whether the user spoke the wake-up word is determined at least in part by comparing the first sequence of speech units to the second sequence of speech units. The two sequences are obtained from the user at different times: the first sequence of speech units is obtained in the training phase and stored on the device, and the second sequence of speech units is obtained in the detecting phase. 
Paragraph [0006] does not disclose “comparing the acoustic input to stored sequence of speech units” where the “stored sequence of speech units” is generated from “the acoustic input” itself.
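For illustration only, the two-phase flow described in paragraph [0006] can be sketched as below. This is a hypothetical Python sketch, not code from the specification or the cited references: the toy acoustic model (a function mapping feature labels to speech-unit labels), the class name `WakeWordDevice`, and the exact-match comparison are all assumptions. It shows that the stored first sequence comes from the enrollment audio, while the second sequence comes from a later acoustic input received in the low-power mode.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an acoustic model trained in a source language:
# it maps a sequence of acoustic feature frames to a sequence of speech units.
AcousticModel = Callable[[Sequence[str]], List[str]]


class WakeWordDevice:
    """Illustrative enrollment-then-detection flow (hypothetical names)."""

    def __init__(self, acoustic_model: AcousticModel) -> None:
        self.acoustic_model = acoustic_model
        self.first_sequence: List[str] = []  # stored at enrollment
        self.low_power = True

    def enroll(self, enrollment_features: Sequence[str]) -> None:
        # Training phase: the enrollment utterance yields the first
        # sequence of speech units, which is stored on the device.
        self.first_sequence = self.acoustic_model(enrollment_features)

    def detect(self, input_features: Sequence[str]) -> bool:
        # Detecting phase (low-power mode): a later acoustic input yields a
        # second sequence, compared against the previously stored first one.
        second_sequence = self.acoustic_model(input_features)
        if second_sequence == self.first_sequence:
            self.low_power = False  # exit the low-power mode
            return True
        return False


if __name__ == "__main__":
    # Toy model: lowercases each feature label to produce a "speech unit".
    device = WakeWordDevice(lambda frames: [f.lower() for f in frames])
    device.enroll(["H", "EH", "L", "OW"])
    print(device.detect(["G", "OW"]))             # False
    print(device.detect(["H", "EH", "L", "OW"]))  # True
```

In this sketch, the sequence that `detect` compares against is never produced from the same input it is comparing, which mirrors the distinction drawn above between the disclosed flow and the amended claims.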
 	Paragraph [0049] of the specification discloses “The user of mobile device 400 may speak a language different than the source language on which the acoustic model was trained. To enable wake-up word detection in the language of the user, techniques described herein may be performed. For example, method 200 described in connection with FIG. 2 may be performed to enable mobile device 400 to perform wake-up detection in a target language different than the source language on which the available acoustic model was trained. In particular, the user may be prompted to speak an enrollment utterance that includes the desired wake-up word in the target language (e.g., the language spoken by the user). The user may be prompted via mobile device, for example, using the display interface, via synthesized speech provided via a speaker, or via any one or combination of interface elements. Mobile device 400 may obtain the enrollment utterance via one or more microphones 130 provided on the device (e.g., by performing exemplary act 210 described in connection with FIG. 2). Mobile device 400, either alone or using network resources, may process the enrollment utterance by applying the wake-up word spoken by the user to the acoustic model (e.g., by performing exemplary act 220 described in connection with FIG. 2). For example, acoustic features derived from audio of the user speaking the wake-up word in the target language may be provided as input to the acoustic model trained in a source language to obtain a sequence of speech units corresponding to the wake-up word.” This paragraph discloses a training phase for a wake-up word. The user’s utterance in the training phase is called the “enrollment utterance,” and the sequence of speech units corresponding to the enrollment utterance is the reference sequence of speech units. 
Paragraph [0049] does not disclose “comparing the acoustic input to stored sequence of speech units” where the “stored sequence of speech units” is generated from “the acoustic input” itself.
 	Paragraph [0053] of the specification discloses “[0053] In turn, the input sequence of speech units may be compared to a reference sequence of speech units stored on mobile device 400 (e.g., stored on computer readable medium 435) to assess whether the user spoke a wake-up word (e.g., by performing exemplary act 330 described in connection with FIG. 3). As discussed above, any suitable comparison may be used to reach a determination as to whether the user spoke a wake-up word in the target language. When it is determined that the user spoke a wake-up word, initiation of a transition from the low-power mode, initiation or performance of one or more tasks associated with the wake-up word, or a combination thereof may be performed (e.g., by performing exemplary act 340 described in connection with FIG. 3). As discussed above, the input sequence of speech units obtained from the acoustic input can be compared to any reference sequence stored on the mobile device to determine if any valid wake-up was spoken by the user, and corresponding action may be performed when it is determined that the valid wake-up word was spoken. In this manner, voice activation and/or control of mobile device 400 in a user's language can be achieved even if an acoustic model trained via the user's language may not be available to the mobile device 400 (e.g., when operated in a low power mode or otherwise).” This paragraph discloses the detecting phase, in which the input sequence of speech units is compared with a reference sequence of speech units stored on the device to determine whether the user spoke a wake-up word. Paragraph [0053] does not disclose “comparing the acoustic input to stored sequence of speech units” where the “stored sequence of speech units” is generated from “the acoustic input” itself.
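Paragraph [0053] states that “any suitable comparison may be used” and that the input sequence may be compared to any stored reference sequence. As one hypothetical example of such a comparison (the specification does not name one), the sketch below scores speech-unit sequences with Levenshtein edit distance and accepts a match within an assumed tolerance `max_distance`. The function names and the threshold are illustrative assumptions only.

```python
from typing import Iterable, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two speech-unit sequences."""
    prev = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr[j] = min(prev[j] + 1,         # delete unit_a
                          curr[j - 1] + 1,     # insert unit_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


def matches_any_reference(
    input_units: Sequence[str],
    references: Iterable[Sequence[str]],
    max_distance: int = 1,  # hypothetical tolerance, not from the specification
) -> bool:
    """True if the input sequence is close enough to any stored reference."""
    return any(edit_distance(input_units, ref) <= max_distance for ref in references)


if __name__ == "__main__":
    stored = [["h", "eh", "l", "ow"], ["s", "t", "aa", "p"]]
    print(matches_any_reference(["h", "ah", "l", "ow"], stored))  # True (1 substitution)
    print(matches_any_reference(["g", "ow"], stored))             # False
```

A small tolerance lets one mis-recognized unit still match, which is one way a cross-language acoustic model's imperfect output could be accommodated; an exact-match test is the special case `max_distance=0`.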
 	Claim 16 recites the limitation of “storing the first sequent of speech units in the target language on the user device for use in subsequent wake-up word detection to increase recognition at the low-power mode of the wake-up word in the target language; and” 
 	The first sequence of speech units as claimed in claim 16 is generated in the detecting phase, from the acoustic input recited in the limitation “while the user device is operating in a low-power mode: receiving acoustic input from a user speaking in a target language;”. The specification and figures disclose storing a sequence of speech units in the training phase, but do not disclose storing a sequence of speech units obtained in the detecting phase. 
Claims 2-4, 6-8, and 17-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, by virtue of their dependency on Claims 1, 5, and 16.

	Claim Rejections - 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 9-12, 14, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zopt et al. (US 2016/0189706 A1) in view of Hanazawa et al. (US 2011/0224985 A1). 

	With respect to Claim 9, Zopt et al. disclose
 	A user device configured to perform wake-up word detection in a target language, the user device comprising: 
 	at least one microphone configured to obtain acoustic information from an environment of the user device (Zopt et al. Fig. 3 element 314, [0044] one or more microphones 314, [0050] During an initial training phase, a user may be prompt to utter their custom wake-up phrase (WUP) multiple times. The utterances are captured as user training sequences 402 at a sampling rate, e.g., of at least 8 kHz, and the audio representation thereof is provided to VAD 404.); 
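 	at least one computer readable medium storing a first sequence of speech units obtained by providing acoustic features derived from audio comprising the user speaking a wake-up word in the target language to the acoustic model (Zopt et al. Fig. 8 elements 802 Receive a user-specified wake-up phrase, 804 Generate a phoneme concatenation model of the user-specified wake-up phase, 806 Generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model, Fig. 2 elements 206, 208, 214, 216. Zopt et al. utilize an HMM model combined with phoneme recognition to generate a sequence of phonemes corresponding to the user utterance); and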
 	at least one processor coupled to the at least one computer readable medium and programmed to perform (Zopt et al. [0116] Such computer-readable storage media may, for example, store computer program logic, e.g., program modules, comprising computer executable instructions that, when executed by one or more processor circuits, provide and/or maintain one or more aspects of functionality described herein with reference to the figures, as well as any and all components, capabilities, and functions therein and/or further embodiments described herein):
 	receiving, from the at least one microphone, acoustic input from the user speaking in the target language while the user device is operating in a low-power mode (Zopt et al. Fig. 8 element 808 Detect audio activity of the user, [0066] VAD 504 may be configured as stand-alone electrical circuitry, and in further embodiments, such electrical circuitry may be configured to have a very low power consumption (e.g., during a stand-by or sleep mode of an electronic device)); 
 	applying acoustic features derived from the acoustic input to the acoustic model to obtain a second sequence of speech units corresponding to the acoustic input (Zopt et al. [0082] In flowchart 700, feature vectors of the at least one audio input are decoded using the one or more stored phoneme models to generate the phoneme transcription that comprises one or more phoneme identifiers); 
 	determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to the second sequence of speech units (Zopt et al. [0091] After detection by a VAD, an audio representation of the activity is passed to the WUPD portion. The WUPD portion is configured to sample frames of the audio representation with audio features extracted by the feature extraction portion and compare the samples to stored wake-up phrase model to determine if the wake-up phrase is present); and  
 	storing the first sequent of speech units in the target language on the user device (Zopt et al. Fig. 8 elements 802 Receive a user-specified wake-up phrase, 804 Generate a phoneme concatenation model of the user-specified wake-up phase, 806 Generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model, Fig. 2 elements 206, 208, 214, 216. Zopt et al. utilize an HMM model combined with phoneme recognition to generate a sequence of phonemes corresponding to the user utterance) for use in subsequent wake-up word detection to increase recognition at the low-power mode of the wake-up word in the target language (Zopt et al. [0028] When not in use, the device may be placed into a low-power stand-by mode. When in stand-by mode, the device may be woken by providing the wake-up phrase, after which the device may be put into a normal mode of operation); and 
 	exiting the low-power mode if it is determined that the user spoke the wake-up word (Zopt et al. [0028] When not in use, the device may be placed into a low-power stand-by mode. When in stand-by mode, the device may be woken by providing the wake-up phrase, after which the device may be put into a normal mode of operation).  
 	Zopt et al. fail to explicitly teach storing and using an acoustic model trained on a corpus of training data in a language different from the language of the user. More specifically, Zopt et al. fail to explicitly teach
 	storing an acoustic model trained on a corpus of training data in a source language different than the target language, and
	However, Hanazawa et al. teach
	storing an acoustic model trained on a corpus of training data in a source language different than the target language, and (Hanazawa et al. [0119] Incidentally, in the present example, as an example of language adaptation in which an acoustic model is adapted to a language, an example of dialects in described. However, for example, the same is true for the case where an acoustic model is adapted to a difference between languages, i.e. between Japanese and English, or to English with a Japanese, Fig. 2 elements 14, 15, 16, 17, [0043] the phoneme detection unit 17 outputs a phoneme thereof as a detection result.)
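 	Zopt et al. and Hanazawa et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting the wake-up phrase as taught by Zopt et al., using the teaching of adapting the acoustic model to a different language as taught by Hanazawa et al., for the benefit of achieving a high level of verification accuracy in speaker verification (Hanazawa et al. … when being used for speaker verification, the adapted acoustic model is expected to achieve a high level of verification accuracy.)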

 	With respect to Claim 10, Zopt et al. in view of Hanazawa et al. teach 
 	wherein the acoustic model was adapted using the audio comprising the user speaking the wake-up word in the target language and the first sequence of speech units obtained therefrom via the acoustic model (Hanazawa et al. [0119] Incidentally, in the present example, as an example of language adaptation in which an acoustic model is adapted to a language, an example of dialects in described. However, for example, the same is true for the case where an acoustic model is adapted to a difference between languages, i.e. between Japanese and English, or to English with a Japanese, Fig. 2 elements 14, 15, 16, 17, [0043] the phoneme detection unit 17 outputs a phoneme thereof as a detection result.)

 	With respect to Claim 11, Zopt et al. in view of Hanazawa et al. teach  
 	wherein the first sequence of speech units comprises a phoneme sequence corresponding to the user speaking the wake-up word in the target language (Zopt et al. [0035] The electronic device includes a first processing component and a second processing component. The first processing component is configured to receive a user-specified wake-up phrase. The first processing component is also configured to generate a phoneme concatenation model of the user-specified wake-up phrase, and to generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model), and wherein the second sequence of speech units comprises a phoneme sequence corresponding to the acoustic input (Zopt et al. [0035] The second processing component is configured to detect audio activity of the user, and to determine if the user-specified wake-up phrase is present within the audio activity based on the word model, [0082] In flowchart 700, feature vectors of the at least one audio input are decoded using the one or more stored phoneme models to generate the phoneme transcription that comprises one or more phoneme identifiers).

 	With respect to Claim 12, Zopt et al. in view of Hanazawa et al. teach  
 	wherein the at least one processor comprises a low-power processor (Zopt et al. [0063] the AP and lower-power processors of processor(s) 324 shown in Fig. 3 may be included in detection and adaptation system 500.)

 	With respect to Claim 14, Zopt et al. in view of Hanazawa et al. teach  
 	wherein exiting the low-power mode comprises transitioning the user device from an idle mode to an active mode (Zopt et al. [0028] When not in use, the device may be placed into a low-power stand-by mode. When in stand-by mode, the device may be woken by providing the wake-up phrase, after which the device may be put into a normal mode of operation).

 	With respect to Claim 15, Zopt et al. in view of Hanazawa et al. teach  
 	wherein the user device is a mobile device (Zopt et al. [0118] The techniques and embodiments described herein may be implemented as, or in, various types of devices. For instance, embodiments may be included in mobile devices such as laptop computers, handheld devices such as mobile phones (e.g., cellular and smart phones), handheld computers, and further types of mobile devices).

7.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zopt et al. (US 2016/0189706 A1) in view of Hanazawa et al. (US 2011/0224985 A1) and Stevans et al. (US 2018/0108343 A1). 

 	With respect to Claim 13, Zopt et al. in view of Hanazawa et al. teach all the limitations of Claim 9 upon which Claim 13 depends. Zopt et al. in view of Hanazawa et al. fail to explicitly teach 
 	wherein at least one task is associated with the wake-up word, and wherein the at least one processor is programmed to initiate performance of the at least one task when it is determined that the user spoke the wake-up word.  
	However, Stevans et al. teach  
 	wherein at least one task is associated with the wake-up word, and wherein the at least one processor is programmed to initiate performance of the at least one task when it is determined that the user spoke the wake-up word (Stevans et al. … By changing the configuration based on the wake-up phrase detected, it is possible to cause a virtual assistant to behave in very difference ways, which can have many potential applications, [0042] Some embodiments include dedicated wake-up phrases for device control commands, such as “turn off phone”. Some embodiments allow users to set wake-up phrases that invoke particular actions, such as "call home" to dial a particular user-configured phone number. Some embodiments include dedicated wake-up phrases for emergency functions, such as "call 9-1-1", [0066] certain wake-up phrases respond to commands that control operation of a local device, [0028] based on the wake-up phrase detected, it is possible to cause a virtual assistant to behave in very difference ways, which can have many potential applications.)
	Zopt et al., Hanazawa et al., and Stevans et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting the wake-up phrase as taught by Zopt et al., using the teaching of adapting the acoustic model to a different language as taught by Hanazawa et al. for the benefit of achieving a high level of verification accuracy in speaker verification, and using the teaching of detecting wake-up phrases as taught by Stevans et al. for the benefit of performing a task associated with the wake-up word (Stevans et al. [0042] Some embodiments include dedicated wake-up phrases for device control commands, such as “turn off phone”. Some embodiments allow users to set wake-up phrases that invoke particular actions, such as "call home" to dial a particular user-configured phone number. Some embodiments include dedicated wake-up phrases for emergency functions, such as "call 9-1-1", [0066] certain wake-up phrases respond to commands that control operation of a local device.)

Allowable Subject Matter
8.	Claims 1-8 and 16-20 would be allowable if the 112(a) rejection noted above is overcome.
The following is a statement of reasons for the indication of allowable subject matter: the prior art, taken alone or in combination, fails to teach the following elements in combination with the other recited elements in the claims.
	“receiving acoustic input of a user speaking a wake-up word in the target language when the user device is in a low-power mode; 
 	providing acoustic features derived from the acoustic input to an acoustic model stored on the user device to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; and 
 	storing the sequence of speech units in the target language on the user device for use in subsequent wake-up word detection including comparing the acoustic input to stored sequence of speech units to increase recognition at the low-power mode of the wake-up word in the target language using the sequence of speech units stored on the user device.” as recited in Claim 1. 
	“receiving, from the at least one microphone, acoustic input from the user speaking a wake-up word in the target language when the user device is in a low-power mode; 
 	providing acoustic features derived from the acoustic input to the acoustic model to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language; and 
 	storing the sequence of speech units in the target language on the user device for use in subsequent wake-up word detection including comparing the acoustic input to stored sequence of speech units to increase recognition at the low-power mode of the wake-up word in the target language using the sequence of speech units stored on the user device.” as recited in Claim 5. 
	“providing acoustic features derived from the acoustic input to an acoustic model stored on the user device to obtain a first sequence of speech units corresponding to the acoustic input, the acoustic model trained on a corpus of training data in a source language different than the target language; 
 	determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to a second sequence of speech units stored on the user device, the second sequence of speech units obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model;
storing the first sequent of speech units in the target language on the user device for use in subsequent wake-up word detection to increase recognition at the low-power mode of the wake-up word in the target language; and” as recited in Claim 16.  
	The closest prior art found is as follows. 
a. 	Zopt et al. (US 2016/0189706). In this reference, Zopt et al. disclose a method for training the word model of a user-specified wake-up word and utilizing the trained word model to detect the wake-up word in the user’s utterance (Zopt et al. Fig. 8 elements 802 Receive a user-specified wake-up phrase, 804 Generate a phoneme concatenation model of the user-specified wake-up phase, 806 Generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model, Fig. 2 elements 206, 208, 214, 216. Zopt et al. utilize an HMM model combined with phoneme recognition to generate a sequence of phonemes corresponding to the user utterance, [0028] When not in use, the device may be placed into a low-power stand-by mode. When in stand-by mode, the device may be woken by providing the wake-up phrase, after which the device may be put into a normal mode of operation.) The method for training and detecting the wake-up word in Zopt et al. has two parts: a training phase and a detecting phase. In Fig. 8, elements 802-804 illustrate the training phase and elements 808 and 810 illustrate the detecting phase. Zopt et al. do not teach or suggest comparing the acoustic input to a stored sequence of speech units, wherein the stored sequence of speech units is generated from that same acoustic input, as recited in claims 1 and 5. Zopt et al. also do not teach or suggest comparing the first sequence of speech units to a second sequence of speech units stored on the user device and then storing the first sequence of speech units in the target language on the user device for use in subsequent wake-up word detection, as recited in claim 16. 
b. 	Rosner et al. (US 2013/0339028 A1.) In this reference, Rosner et al. disclose a method for activating the device by voice (Rosner et al. [0037] In an embodiment, the speech recognition engine instead transitions to a wake-up word detection state from the stand by state based on the second activation signal. In the wake-up word detection state, the speech recognition engine can be configured to specifically recognize wake-up words in the audio signal. In doing so, only those sets of acoustic, key word, and/or grammar models that are need to recognize wake-up words are loaded. Moreover, because fewer models are located, the recognizing function can be less power consuming because fewer comparisons between the received audio signal and the different models need to be conducted. Thus, the speech recognition engine can use less power in the wake-up word detection state than in the fully-operational state. In a further embodiment, the speech recognition engine can be configured to transition from the wake-up word detection state to either the stand-by state or the fully-operational state depending on whether wake-up words are recognized within the audio signal. Specifically, if wake-up words are determined to be present in the received audio signal, the speech recognition engine can be transitioned to the fully-operational state. If not, the speech recognition engine can be transitioned to the stand-by state.) Rosner et al. detect a wake-up word in the received audio signal by first loading only the acoustic and keyword models needed to recognize wake-up words and then comparing the received audio signal against those models. Rosner et al. do not teach or suggest comparing the acoustic input to a stored sequence of speech units, wherein the stored sequence of speech units is generated from that same acoustic input, as recited in claims 1 and 5. Rosner et al. also do not teach or suggest comparing the first sequence of speech units to a second sequence of speech units stored on the user device and then storing the first sequence of speech units in the target language on the user device for use in subsequent wake-up word detection, as recited in claim 16. 
c. 	Basye et al. (US 2014/0163978 A1.) In this reference, Basye et al. disclose a method for managing the power consumption of a computing device. (Basye et al. [0010] Accordingly, aspects of the present disclosure are directed to power management for speech recognition. A computing device may be provided with a power management subsystem that selectively activates or deactivates one or more modules of the computing device. This activation may be responsive to an audio input that includes one or more pre-designated spoken words, sometimes referred to herein as "keywords." A keyword that prompts the activation of one or more components may be activated is sometimes referred to herein as a "wakeword," while a keyword that prompts the deactivation of one or more components is sometimes referred to herein as a "sleepword." In a particular example, the computing device may include a selectively activated network interface module that, when activated, consumes energy to provide the computing device with connectivity to a second computing device, such as a speech recognition server or other computing device. The power management subsystem may process an audio input to determine that the audio input includes a wakeword, and activate the network interface module in response to determining that the audio input comprises the wakeword. With the network interface module activated, the power management subsystem may cause transmission of the audio input to a speech recognition server for processing, [0036] The non-transitory computer-readable medium drive 204 may include any electronic data storage known in the art. In some embodiments, the non-transitory computer-readable medium drive 204 stores one or more keyword models (e.g., wakeword models or sleepword models) to which an audio input may be compared by the power management subsystem 100.) Basye et al. compare the audio input to one or more stored keyword models to detect a keyword in the audio input. Basye et al. do not teach or suggest comparing the acoustic input to a stored sequence of speech units, wherein the stored sequence of speech units is generated from that same acoustic input, as recited in claims 1 and 5. Basye et al. also do not teach or suggest comparing the first sequence of speech units to a second sequence of speech units stored on the user device and then storing the first sequence of speech units in the target language on the user device for use in subsequent wake-up word detection, as recited in claim 16. 

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Xiao et al. (US 2016/0019884.) This reference discloses a method in which the acoustic model may be “adapted” to a different acoustic environment without modifying the acoustic model itself. 
b.	Jagatheesan et al. (US 2015/0025890 A1.) This reference discloses a method for adapting the language model and the acoustic model. 
c. 	Weinstein et al. (US 2012/0278061 A1.) This reference discloses a method for generating an adapted acoustic model. 

10. 	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after 

11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THUYKHANH LE/Primary Examiner, Art Unit 2655