DETAILED ACTION

Introduction
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 08/05/2022 has been entered.

Information Disclosure Statement
3.	The information disclosure statement (IDS) submitted on 08/05/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments/Amendments
4.	The Applicant’s amendment, in conjunction with the Examiner’s Amendment below, has overcome the 112(a) rejection. Thus, the 112(a) rejection is withdrawn.

Examiner’s Amendment
5.         An examiner’s amendment to the record appears below.  Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
	Authorization for an examiner’s amendment was given in a telephone interview with Applicant’s attorney of record Mr. Fanqi Meng (L 0647) on 03/25/2022.

 	Please amend Claims 1, 5, 16, 17 and 19, filed on 03/17/2022, as follows.
	
	With respect to Claim 1, please delete Claim 1 and insert
	1. 	(Currently Amended)	A method of enabling wake-up word detection in a target language on a user device, the method comprising:
	receiving acoustic input of a user speaking a wake-up word in the target language when the user device is in a low-power mode;
	providing acoustic features derived from the acoustic input to an acoustic model stored on the user device to obtain a first sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; 
	comparing the first sequence of speech units with a reference sequence of speech units to recognize the wake-up word in the target language, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; 
responsive to recognizing the wake-up word, transitioning the user device from the low-power mode to an active mode; and 
adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units. 

	With respect to Claim 5, please delete Claim 5 and insert
	5. 	(Currently Amended) A user device configured to enable wake-up word detection in a target language, the user device comprising:
	at least one microphone configured to obtain acoustic information from an environment of the user device;
	at least one computer readable medium storing an acoustic model trained on a corpus of
training data in a source language different than the target language; and
	at least one processor coupled to the at least one computer readable medium and
programmed to perform:
		receiving, from the at least one microphone, acoustic input from the user speaking
	a wake-up word in the target language when the user device is in a low-power mode;
		providing acoustic features derived from the acoustic input to the acoustic model to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language; 
comparing the sequence of speech units to a reference sequence of speech units, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; and 
adapting the acoustic model to the user using the sequence of speech units and the reference sequence of the speech units. 

	With respect to Claim 16, please delete Claim 16 and insert
	16. 	(Currently Amended) 	A method of performing wake-up word detection on a user device, the method comprising:
	while the user device is operating in a low-power mode:
		receiving acoustic input from a user speaking in a target language;
		providing acoustic features derived from the acoustic input to an acoustic model
	stored on the user device to obtain a first sequence of speech units corresponding to the
	acoustic input, the acoustic model trained on a corpus of training data in a source
	language different than the target language;
determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to a reference sequence of speech units stored on the user device, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; 
		exiting the low-power mode if it is determined that the user spoke the wake-up word; and 
adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units.

 	With respect to Claim 17, please delete Claim 17 and insert
	17.	(Currently Amended)	 The method of claim 16, wherein the acoustic model was adapted to the user using the audio comprising the user speaking the wake-up word in the target language and the reference sequence of speech units obtained therefrom via the acoustic model.

	With respect to Claim 19, please delete Claim 19 and insert
	19.	(Currently Amended)	 The method of claim 16, wherein the first sequence of speech units comprises a phoneme sequence corresponding to the acoustic input, and wherein the reference sequence of speech units comprises a phoneme sequence corresponding to the user speaking the wake-up word in the target language.

Reasons for Allowance
6.	Claims 1-8 and 16-20 are allowed. 
The prior art(s) taken alone or in combination fail(s) to teach the following element(s) in combination with the other recited elements in the claim(s). 
	“providing acoustic features derived from the acoustic input to an acoustic model stored on the user device to obtain a first sequence of speech units corresponding to the wake-up word spoken by the user in the target language, the acoustic model trained on a corpus of training data in a source language different than the target language; 
	comparing the first sequence of speech units with a reference sequence of speech units to recognize the wake-up word in the target language, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; 
responsive to recognizing the wake-up word, transitioning the user device from the low-power mode to an active mode; and 
adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units.” as recited in Claim 1. 

“at least one computer readable medium storing an acoustic model trained on a corpus of
training data in a source language different than the target language; and
	at least one processor coupled to the at least one computer readable medium and
programmed to perform:
		receiving, from the at least one microphone, acoustic input from the user speaking
	a wake-up word in the target language when the user device is in a low-power mode;
		providing acoustic features derived from the acoustic input to the acoustic model to obtain a sequence of speech units corresponding to the wake-up word spoken by the user in the target language; 
comparing the sequence of speech units to a reference sequence of speech units, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; and 
adapting the acoustic model to the user using the sequence of speech units and the reference sequence of the speech units.” as recited in Claim 5. 

	 	“providing acoustic features derived from the acoustic input to an acoustic model
	stored on the user device to obtain a first sequence of speech units corresponding to the
	acoustic input, the acoustic model trained on a corpus of training data in a source
	language different than the target language;
determining if the user spoke the wake-up word at least in part by comparing the first sequence of speech units to a reference sequence of speech units stored on the user device, wherein the reference sequence of speech units is obtained by applying acoustic features derived from audio comprising the user speaking the wake-up word in the target language to the acoustic model; 
		exiting the low-power mode if it is determined that the user spoke the wake-up word; and 
adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units.” as recited in Claim 16.
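
The comparing step recited in Claims 1, 5 and 16 can be illustrated with a minimal sketch. The claims do not specify how the first sequence of speech units is compared with the reference sequence; the edit-distance measure, the threshold value, and the names edit_distance and matches_wake_word below are assumptions chosen only for illustration.

```python
def edit_distance(first_seq, reference_seq):
    """Levenshtein distance between two sequences of speech units."""
    prev = list(range(len(reference_seq) + 1))
    for i, unit_a in enumerate(first_seq, start=1):
        curr = [i]
        for j, unit_b in enumerate(reference_seq, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def matches_wake_word(first_seq, reference_seq, max_normalized_distance=0.25):
    """Return True when the sequences are close enough (threshold is hypothetical)."""
    if not reference_seq:
        return False
    distance = edit_distance(first_seq, reference_seq)
    return distance / len(reference_seq) <= max_normalized_distance
```

Because both sequences are produced by the same acoustic model, a sequence-level measure of this kind would compare the model's output for the enrollment audio with its output for the later input, whichever language the model was trained on.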

 	The closest prior art found is as follows. 
a.	Zopt et al. (US 2016/0189736 A1). In this reference, Zopt et al. disclose a method for isolated word training and detection (Zopt et al. Fig. 3 element 314, [0044] one or more microphones 314, [0050] During an initial training phase, a user may be prompted to utter their custom wake-up phrase (WUP) multiple times. The utterances are captured as user training sequences 402 at a sampling rate, e.g., of at least 8 kHz, and the audio representation thereof is provided to VAD 404, Fig. 8 elements 802 Receive a user-specified wake-up phrase, 804 Generate a phoneme concatenation model of the user-specified wake-up phrase, 806 Generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model, Fig. 2 elements 206, 208, 214, 216. Zopt et al. utilize an HMM model combined with phoneme recognition to generate a sequence of phonemes corresponding to the user utterance, [0091] After detection by a VAD, an audio representation of the activity is passed to the WUPD portion. The WUPD portion is configured to sample frames of the audio representation with audio features extracted by the feature extraction portion and compare the samples to stored wake-up phrase model to determine if the wake-up phrase is present.) However, Zopt et al. do not teach that the user speaks a wake-up phrase in the target language in the enrollment phase and/or in the authorization phase, or that the acoustic/phoneme model used to obtain the first sequence of speech units in the authorization phase and the reference sequence of speech units in the enrollment phase is trained in the source language, wherein the source language is different than the target language. Zopt et al. teach adapting the word model based on additional provisioning of the wake-up phrase. Zopt et al. do not teach adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units. Thus, Zopt et al. fail to teach and/or suggest the allowable subject matter noted above. 
b.	Prémont et al. (US 2019/0073999 A1). In this reference, Prémont et al. teach a method for detecting a designated wake-up word (Prémont et al. [0005] According to some aspects, a system for detecting a designated wake-up word is provided, the system comprising a plurality of microphones to detect acoustic information from a physical space having a plurality of acoustic zones, at least one processor configured to receive a first acoustic signal representing the acoustic information received by the plurality of microphones, process the first acoustic signal to identify content of the first acoustic signal originating from each of the plurality of acoustic zones, provide a plurality of second acoustic signals, each of the plurality of second acoustic signals substantially corresponding to the content identified as originating from a respective one of the plurality of acoustic zones, and performing automatic speech recognition on each of the plurality of second acoustic signals to determine whether the designated wake-up word was spoken, [0035] According to some embodiments, speech recognition models used by units 116a, 116b, . . . , 116n may have different acoustic models. Each of the speech recognition models are used to each detect a wake-up word within one particular acoustic zone, and since each acoustic zone may exhibit a different acoustic environment, it may be beneficial to train the speech recognition models for the acoustic environment of their associated acoustic zones. For example, a system in which the acoustic zones are different rooms of a house may exhibit different acoustic environments in each room due to differences in background noise, shapes and sizes of the rooms and/or contents of the rooms. The speech recognition model associated with each acoustic zone may therefore be trained to recognize a wake-up word within the acoustic environment of the associated acoustic zone to improve recognition of the wake-up word.) Prémont et al. teach training the speech recognition models for the acoustic environment of their associated acoustic zones. Prémont et al. do not teach that the user speaks a wake-up phrase in the target language in the enrollment phase and in the authorization phase, or that the acoustic/phoneme model used to obtain the first sequence of speech units in the authorization phase and the reference sequence of speech units in the enrollment phase is trained in the source language, wherein the source language is different than the target language. Prémont et al. do not teach adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units. Thus, Prémont et al. fail to teach and/or suggest the allowable subject matter noted above. 
c. 	Bapat et al. (US 2015/0154953 A1). In this reference, Bapat et al. disclose a method for generating one or more wake-up words (Bapat et al. [0008] Another embodiment includes a system for generating one or more wake-up words. The system includes an interface device and a wake-up-word (WUW) processing engine. The interface device is configured to receive a text representation of the one more wake-up words and an audio representation of the one or more wake-up words. The WUW processing engine is configured to determine a strength of the text representation of the one or more wake-up words based on one or more static measures and to determine a strength of the audio representation of the one or more wake-up words based on one or more dynamic measures. The interface device is also configured to provide feedback on the one or more wake-up words based on the strengths of the text and audio representations.) Bapat et al. utilize static measures to determine a strength of the text representation of the one or more wake-up words and dynamic measures to determine a strength of the audio representation. Bapat et al. do not teach that the user speaks a wake-up phrase in the target language in the enrollment phase and/or in the authorization phase, or that the acoustic/phoneme model used to obtain the first sequence of speech units in the authorization phase and the reference sequence of speech units in the enrollment phase is trained in the source language, wherein the source language is different than the target language. Bapat et al. do not teach adapting the acoustic model to the user using both the reference sequence of speech units and the first sequence of speech units. Thus, Bapat et al. fail to teach and/or suggest the allowable subject matter noted above. 
d.	Muschett et al. (US 2008/0154599 A1). In this reference, Muschett et al. disclose a method for authenticating a user based upon a spoken password processed through a standard speech recognition engine (Muschett et al. [0015] FIG. 2 is a flow chart of a method 200 for creating and using spoken free-form passwords to authenticate users in accordance with an embodiment of the inventive arrangements disclosed herein. The method 200 can be performed in the context of a system 100 or any system having speech recognition capabilities and an ability to acoustically generate and use speaker dependent grammars. The method 200 includes a process 205 to establish a password and a process 225 to utilize established passwords, [0016] The password establishment process 205 can begin in step 210, where a user can be prompted to audibly provide a password. The password can be free-form and can include any user generated utterance, such as a word, a phrase, or any other noise. In one embodiment, the utterance used for the password is used to generate an acoustic baseform and is not converted into text. Consequently, the utterance can be in any language or dialect and can include slang. The flexibility of the free-form utterance advantageously permits a user to create a highly unique password which is easy for the user to remember. Further, use of an acoustic baseform as a password is uniquely associated with a user's voice and is not readable by others (unlike textual passwords). Thus, acoustic baseform passwords are difficult for unauthorized users to steal by invading (i.e., hacking into) a security system.) The password in Muschett et al. is a free-form password in any language. However, Muschett et al. do not teach that the user speaks a wake-up phrase in the target language in the enrollment phase and in the authorization phase. Muschett et al. do not teach using the acoustic/phoneme model trained in the source language different from the target language to obtain the first sequence of speech units in the authorization phase and the reference sequence of speech units in the enrollment phase. Thus, Muschett et al. fail to teach and/or suggest the allowable subject matter noted above. 
e.	B. Ramabhadran, L. R. Bahl, P. V. deSouza and M. Padmanabhan, “Acoustics-only based automatic phonetic baseform generation,” Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998, pp. 309-312 vol.1, doi: 10.1109/ICASSP.1998.674429. In this reference, Ramabhadran et al. disclose a method for generating phonetic baseforms of words based on acoustic evidence alone (Section 2. Algorithm for generating baseforms. This section describes the algorithm that is used for generating the phonetic baseforms from the acoustics alone. The goal here is to find the phone string P that maximizes p(P|U), where U is the utterance for which the baseform is to be generated. The algorithm proceeds as follows. The acoustic data from the enrolled utterance is labelled in less than real time using the ballistic labeler. This involves the construction of a trellis of arc (sub phone units) nodes from the speech utterance. The probability of a transition occurring from one arc to another is determined by weighing the score obtained from a Hidden Markov Model (HMM) [3] with a precomputed arc to arc transition probability obtained from any training corpora. At any time frame the set of active nodes in the trellis is defined as the nodes with scores greater than a certain pruning threshold. Once the entire utterance has been processed, a back-tracking procedure is employed that traces the best arc-predecessor from the end of the utterance, forcing silence at the beginning and at the end of utterance. Thus a sequence of phonetic arcs are obtained from which a phone sequence (baseform) is derived for that enrolled utterance. This corresponds to one pronunciation for the word enrolled by the user that will be used subsequently for recognition.) Ramabhadran et al. generate the baseform for the enrolled utterance and subsequently use that baseform for recognition. However, Ramabhadran et al. do not teach that the utterance of the user is in the target language and that the acoustic model is trained in a source language different than the target language. Thus, Ramabhadran et al. fail to teach and/or suggest the allowable subject matter noted above. 
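
The trellis search and back-tracking procedure summarized in item e can be sketched as a standard best-path (Viterbi-style) decode. This is a simplified illustration, not the algorithm of Ramabhadran et al.: it omits the pruning threshold and the forced silence at the utterance boundaries, and the function names, the dictionary-based inputs, and the log-score convention are all assumptions.

```python
import math


def best_path(emission_logp, transition_logp):
    """Most likely unit per frame, found by forward scoring and back-tracking.

    emission_logp: list of dicts, one per frame, mapping unit -> log score
                   (every frame is assumed to list the same units).
    transition_logp: dict mapping (previous_unit, unit) -> log probability;
                     missing pairs are treated as impossible.
    """
    units = list(emission_logp[0])
    score = {u: emission_logp[0][u] for u in units}
    back_pointers = []
    for frame in emission_logp[1:]:
        new_score, pointers = {}, {}
        for cur in units:
            # Pick the predecessor that gives the best accumulated score.
            prev = max(units, key=lambda p: score[p]
                       + transition_logp.get((p, cur), -math.inf))
            new_score[cur] = (score[prev]
                              + transition_logp.get((prev, cur), -math.inf)
                              + frame[cur])
            pointers[cur] = prev
        score = new_score
        back_pointers.append(pointers)
    # Back-track from the best final unit to the start of the utterance.
    path = [max(units, key=lambda u: score[u])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def collapse(path):
    """Merge consecutive repeats so a frame-level path becomes a unit sequence."""
    sequence = []
    for unit in path:
        if not sequence or sequence[-1] != unit:
            sequence.append(unit)
    return sequence
```

Collapsing the frame-level path gives a single unit sequence for the enrolled utterance, which corresponds to the baseform that the reference describes as being used subsequently for recognition.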
 	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THUYKHANH LE/
Primary Examiner, Art Unit 2655