DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments/Amendments
2.	With respect to Claim Rejections 35 USC § 102/103, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

	Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1-9, 11, 14-23, 25, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Hughes et al. (US 2018/0182390 A1) in view of VanBlon et al. (US 2017/0169817 A1).

 	With respect to Claim 1, Hughes et al. disclose 
 	A method for activating speaker-dependent warm words, the method comprising: 
 	receiving, at data processing hardware, audio data corresponding to an utterance spoken by a user and captured by an assistant-enabled device associated with the user, the utterance comprising a command for a digital assistant to perform a long-standing operation (Hughes et al. [0017] the computing device 104 begins to play music in response to the utterance 106, “OK computer, play music.” The computing device 104 begins to play music);
 	after receiving the audio data corresponding to the utterance (Hughes et al. [0023] With the music playing 122 the computing device 104 is running a music application in either the foreground or the background. The computing device 104 may include a context identifier 124 and an active hotword selector 126. The context identifier 124 may be configured to identify a current context of the computing device 104. The active hotword selector 126 may use the current context of the computing device 104 to select active hotwords. In this example, the context of the device may be related to playing music 122 and running the music application): 
 		activating, by the data processing hardware, a set of one or more warm words each associated with a respective action for controlling the long-standing operation (Hughes et al. [0023] The active hotword selector 126 may examine the code of the music application to identify any hotwords that the developers of the music application want users to be able to speak to interact with the music application and the respective actions for each hotword. The music application may identify hotwords such as “play,” “next,” “stop,” and “back.” Based on the context of music actively playing the active hotword selector 126 may select the hotwords of “next,” “stop,” and “back” and store them in the active hotwords 112); and
 		 associating, by data processing hardware, the activated set of one or more warm words with only the user that spoke the utterance (Hughes et al. [0023] The active hotword selector 126 may examine the code of the music application to identify any hotwords that the developers of the music application want users to be able to speak to interact with the music application and the respective actions for each hotword. The music application may identify hotwords such as “play,” “next,” “stop,” and “back.” Based on the context of music actively playing the active hotword selector 126 may select the hotwords of “next,” “stop,” and “back” and store them in the active hotwords 112, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 	while the digital assistant is performing the long-standing operation (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music): 
 		receiving, at the data processing hardware, additional audio data corresponding to an additional utterance captured by the assistant-enabled device (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		identifying, by the data processing hardware, in the additional audio data, one of the warm words from the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		performing, by the data processing hardware, speaker verification on the additional audio data to determine whether the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 		when the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words, performing, by data processing hardware, the respective action associated with the identified one of the warm words for controlling the long-standing operation (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music.)  
	Hughes et al. fail to explicitly teach 
  	such that other users are not permitted to use the activated set of one or more warm words to control the long-standing operation;
	However, VanBlon et al. teach 
  	associating, by data processing hardware, the activated set of one or more warm words with only the user that spoke the utterance, such that other users are not permitted to use the activated set of one or more warm words to control the long-standing operation (VanBlon et al. [0037] The relationship determined at 340 may be based on one or more currently active applications. For example, if a user requests an embodiment to play a specific media file (e.g., music, video, etc.) it may anticipate a subsequent request regarding the media playing application, such as: volume up/down, pause, skip track/chapter, etc. By way of further example, an embodiment may play music based on a voice command (e.g., “Cortana, Play Tom Petty”), and then allow the user to issue an additional related command (e.g., “turn it up,” “skip,” “I like this,” “pause,” “stop,” etc.) without a wakeup word (i.e., activation cue), [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time);
 	Hughes et al. and VanBlon et al. are analogous art because they are from a similar field of endeavor, namely signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of activating the set of hotwords based on the identified context, as taught by Hughes et al., using the teaching of identifying the person who issued the initial command, as taught by VanBlon et al., for the benefit of accepting subsequent commands from only that user (VanBlon et al. [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time.)
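
 	For illustration only, the combined flow mapped above for Claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Hughes et al. or VanBlon et al.: the class name WarmWordSession, its fields, and the use of plain strings as speaker identities are hypothetical, and the speaker verification result is assumed to be supplied by some upstream component.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class WarmWordSession:
    """Hypothetical holder for warm words activated for a single user."""

    owner: str  # the user who spoke the initial command (assumed already identified)
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def handle(self, word: str, speaker: str) -> Optional[str]:
        """Perform the action only for an active warm word spoken by the owner."""
        action = self.actions.get(word)
        if action is None:
            return None  # not an active warm word
        if speaker != self.owner:
            return None  # a different user spoke; the action is not performed
        return action()


# Warm words activated after "play music", associated only with "alice".
session = WarmWordSession(owner="alice", actions={"stop": lambda: "stopped"})
```

 	In this sketch, a "stop" from the owner returns the action result, while the same word from any other speaker, or an inactive word from the owner, returns None.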

 	With respect to Claim 2, Hughes et al. in view of VanBlon et al. teach
 	wherein: 
 	activating the set of one or more warm words comprises activating, for each corresponding warm word in the activated set of one or more warm words, a respective warm word model to run on the assistant-enabled device associated with the user (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	identifying, in the additional audio data, the one of the warm words from the activated set of one or more warm words comprises detecting, using the respective warm word model activated for the corresponding one of the warm words, the one of the warm words in the additional audio data without performing speech recognition on the additional audio data (Hughes et al. [0058] the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model.)

 	With respect to Claim 3, Hughes et al. in view of VanBlon et al. teach
 	wherein detecting the one of the warm words in the additional audio data comprises: 
 	extracting audio features of the additional audio data (Hughes et al. [0020] the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112. The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114);
 	generating, using the respective warm word model activated for the corresponding one of the warm words, a warm word confidence score by processing the extracted audio features (Hughes et al. [0050] In some implementations, the hotword detector 108 generates a hotword confidence score for each initial portion of processed audio data. If the hotword confidence score satisfies a threshold then the hotword detector 108 determines that the audio data includes the hotword. For example, if the hotword confidence score is 0.9 and the hotword confidence threshold is 0.8, then the hotword detector 108 determines that the audio data includes the hotword); and
 	determining that the additional audio data corresponding to the additional utterance includes the corresponding one of the warm words when the warm word confidence score satisfies a warm word confidence threshold (Hughes et al. [0050] In some implementations, the hotword detector 108 generates a hotword confidence score for each initial portion of processed audio data. If the hotword confidence score satisfies a threshold then the hotword detector 108 determines that the audio data includes the hotword. For example, if the hotword confidence score is 0.9 and the hotword confidence threshold is 0.8, then the hotword detector 108 determines that the audio data includes the hotword.)
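
 	For illustration only, the extract, score, and threshold sequence mapped above for Claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the feature vectors stand in for MFCC-style features, and the cosine-based score is a swapped-in stand-in because the cited passages do not specify how the hotword confidence score is computed. The 0.8 default mirrors the example threshold quoted from Hughes et al. [0050].

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def warm_word_confidence(features: Sequence[float], template: Sequence[float]) -> float:
    """Assumed scoring rule: rescale cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(features, template) + 1.0) / 2.0


def includes_warm_word(
    features: Sequence[float], template: Sequence[float], threshold: float = 0.8
) -> bool:
    """The score 'satisfies' the threshold when it is at least the threshold."""
    return warm_word_confidence(features, template) >= threshold
```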

 	With respect to Claim 4, Hughes et al. in view of VanBlon et al. teach
 	wherein: 
 	activating the set of one or more warm words comprises executing a speech recognizer on the assistant-enabled device, the speech recognizer biased to recognize the one or more warm words in the activated set of one or more warm words (Hughes et al. [0067] the system may determine that the audio data includes more than one hotword. This may happen because currently active hotwords sound similar. For example, two active hotwords may be “next” and “text.” In some implementations, the system may only determine that the audio data includes a hotword, not necessarily which hotword. If the system determines that two or more hotword models match the audio data, then the system may perform speech recognition on the portion of the audio data that includes the hotword to determine what hotword the user spoke); and 
 	identifying, in the additional audio data, the one of the warm words from the activated set of one or more warm words comprises recognizing, using the speech recognizer executing on the assistant-enabled device, the one of the warm words in the additional audio data (Hughes et al. [0067] the system may determine that the audio data includes more than one hotword. This may happen because currently active hotwords sound similar. For example, two active hotwords may be “next” and “text.” In some implementations, the system may only determine that the audio data includes a hotword, not necessarily which hotword. If the system determines that two or more hotword models match the audio data, then the system may perform speech recognition on the portion of the audio data that includes the hotword to determine what hotword the user spoke.)

 	With respect to Claim 5, Hughes et al. in view of VanBlon et al. teach
 	further comprising, after receiving the audio data corresponding to the utterance spoken by the user, performing, by the data processing hardware, speaker identification on the audio data to identify the user that spoke the utterance by: 
 	extracting, from the audio data corresponding to the utterance spoken by the user, a first speaker-discriminative vector representing characteristics of the utterance spoken by the user (Hughes et al. [0020] In some implementations, the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112. The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114. As another example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating mel-scale filterbank energies from the audio data and classifying that the mel-scale filterbank energies include mel-scale filterbank energies that are similar to mel-scale filterbank energies that are characteristic of the hotword “ok computer” as stored in the hotword models 114);
 determining whether the extracted first speaker-discriminative vector matches any enrolled speaker vectors stored on the assistant-enabled device, each enrolled speaker vector associated with a different respective enrolled user of the assistant-enabled device (Hughes et al. [0058] The system determines that the audio data includes the hotword (240). In some implementations, the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 when the extracted first speaker-discriminative vector matches one of the enrolled speaker vectors, identifying the user that spoke the utterance as the respective enrolled user associated with the one of the enrolled speaker vectors that matches the extracted first speaker-discriminative vector (Hughes et al. [0058] The system determines that the audio data includes the hotword (240). In some implementations, the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.) 
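
 	For illustration only, the matching of an extracted speaker-discriminative vector against enrolled speaker vectors recited in Claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the cosine-similarity comparison, and the 0.75 match threshold are hypothetical, and none of them is taken from the cited references.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    vector: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.75,  # assumed match threshold
) -> Optional[str]:
    """Return the best-matching enrolled user, or None if no vector matches."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, reference in enrolled.items():
        score = cosine_similarity(vector, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


# Hypothetical enrolled speaker vectors, one per enrolled user.
enrolled_vectors = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
```

 	The same comparison, restricted to the single reference vector of the identified user, would correspond to the verification step recited in Claim 7.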

 	With respect to Claim 6, Hughes et al. in view of VanBlon et al. teach
 	wherein: 
 	the utterance spoken by the user further comprises a hotword preceding the command for the digital assistant to perform the long-standing operation (Hughes et al. [0020] In some implementations, the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112); 
 	the extracted first speaker-discriminative vector comprises a text-dependent speaker-discriminative vector extracted from the portion of the audio data that includes the hotword (Hughes et al. [0020] The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114. As another example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating mel-scale filterbank energies from the audio data and classifying that the mel-scale filterbank energies include mel-scale filterbank energies that are similar to mel-scale filterbank energies that are characteristic of the hotword “ok computer” as stored in the hotword models 114); and 
 	each enrolled speaker vector comprises a text-dependent enrolled speaker vector extracted from one or more audio samples of the respective enrolled user speaking the hotword (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

	With respect to Claim 7, Hughes et al. in view of VanBlon et al. teach
 	wherein performing the speaker verification on the additional audio data comprises: 
 	extracting, from the additional audio data corresponding to the additional utterance of the one of the warm words, a second speaker-discriminative vector representing characteristics of the additional utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music);
 	determining whether the extracted second speaker-discriminative vector matches a reference speaker vector for the respective enrolled user identified as the user that spoke the utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	when the extracted second speaker-discriminative vector matches the reference speaker vector, determining that the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 8, Hughes et al. in view of VanBlon et al. teach
 	wherein the reference speaker vector comprises the enrolled speaker vector associated with the respective enrolled user (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 9, Hughes et al. in view of VanBlon et al. teach
 	wherein the reference speaker vector comprises a text-dependent speaker vector extracted from one or more audio samples of the respective enrolled user speaking the identified one of the warm words (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 11, Hughes et al. in view of VanBlon et al. teach
 	further comprising, when the additional utterance was spoken by a different user than the user that is associated with the activated set of one or more warm words, suppressing, by the data processing hardware, performance of the respective action associated with the identified one of the warm words for controlling the long-standing operation (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples, [0006] The system recognizes a spoken hotword, and performs the corresponding operation, VanBlon et al. [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time. Examiner notes that the hotwords of Hughes are speaker-specific and must be associated with a speaker in order for the command to execute.)

 	With respect to Claim 14, Hughes et al. in view of VanBlon et al. teach
 	further comprising: determining, by the data processing hardware, when the digital assistant stops performing the long-standing operation (Hughes et al. [0009] The actions further include determining, by the computing device, that the context is no longer associated with the computing device; and determining that subsequently received audio data that includes the hotword is not to trigger an operation, [0053] the context may be playing music); and 
 	deactivating, by the data processing hardware, the set of one or more warm words (Hughes et al. [0009] The actions further include determining, by the computing device, that the context is no longer associated with the computing device; and determining that subsequently received audio data that includes the hotword is not to trigger an operation, [0056] The system may provide notification when the hotword becomes active and when the system deactivates it, [0062] the system removes a hotword from the active hotwords list when the context is no longer valid.)
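
 	For illustration only, the context-driven activation and deactivation mapped above for Claim 14 can be sketched as follows. This is a minimal sketch under stated assumptions: the table CONTEXT_HOTWORDS, the context label "playing_music", and the function name are hypothetical; only the example hotwords “next,” “stop,” and “back” come from the quoted passages of Hughes et al.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from a device context to the hotwords it makes active.
CONTEXT_HOTWORDS: Dict[str, Set[str]] = {
    "playing_music": {"next", "stop", "back"},
}


def active_hotwords(context: Optional[str]) -> Set[str]:
    """Return the hotwords active in the given context.

    When the context is no longer valid (None or unknown), the result is
    empty, which models removing the hotwords from the active list.
    """
    if context is None:
        return set()
    return set(CONTEXT_HOTWORDS.get(context, set()))
```

 	Under these assumptions, the set returned while music is playing contains the three example hotwords, and the set returned once playback has stopped is empty.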

With respect to Claim 15, Hughes et al. disclose 
 	A system comprising: 
 	data processing hardware (Hughes et al. [0069] The processor 302 can process instructions for execution within the computing device 300, including instructions stored in the memory 304); and 
 	memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations (Hughes et al. [0070] The memory 304 stores information within the computing device 300. In some implementations, the memory 304 is a volatile memory unit or units. The memory 304 may also be another form of computer-readable medium, such as a magnetic or optical disk) comprising: 
 	 receiving audio data corresponding to an utterance spoken by a user and captured by an assistant-enabled device associated with the user, the utterance comprising a command for a digital assistant to perform a long-standing operation (Hughes et al. [0017] the computing device 104 begins to play music in response to the utterance 106, “OK computer, play music.” The computing device 104 begins to play music);
 	after receiving the audio data corresponding to the utterance (Hughes et al. [0023] With the music playing 122 the computing device 104 is running a music application in either the foreground or the background. The computing device 104 may include a context identifier 124 and an active hotword selector 126. The context identifier 124 may be configured to identify a current context of the computing device 104. The active hotword selector 126 may use the current context of the computing device 104 to select active hotwords. In this example, the context of the device may be related to playing music 122 and running the music application):
 	 	activating a set of one or more warm words each associated with a respective action for controlling the long-standing operation (Hughes et al. [0023] The active hotword selector 126 may examine the code of the music application to identify any hotwords that the developers of the music application want users to be able to speak to interact with the music application and the respective actions for each hotword. The music application may identify hotwords such as “play,” “next,” “stop,” and “back.” Based on the context of music actively playing the active hotword selector 126 may select the hotwords of “next,” “stop,” and “back” and store them in the active hotwords 112); and
 		associating the activated set of one or more warm words with only the user that spoke the utterance (Hughes et al. [0023] The active hotword selector 126 may examine the code of the music application to identify any hotwords that the developers of the music application want users to be able to speak to interact with the music application and the respective actions for each hotword. The music application may identify hotwords such as “play,” “next,” “stop,” and “back.” Based on the context of music actively playing the active hotword selector 126 may select the hotwords of “next,” “stop,” and “back” and store them in the active hotwords 112, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 	while the digital assistant is performing the long-standing operation (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music):   
 		receiving additional audio data corresponding to an additional utterance captured by the assistant-enabled device (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		identifying in the additional audio data, one of the warm words from the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		performing speaker verification on the additional audio data to determine whether the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 		when the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words, performing the respective action associated with the identified one of the warm words for controlling the long-standing operation (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music.) 
 	Hughes et al. fail to explicitly teach 
  	such that other users are not permitted to use the activated set of one or more warm words to control the long-standing operation;
	However, VanBlon et al. teach 
  	associating the activated set of one or more warm words with only the user that spoke the utterance, such that other users are not permitted to use the activated set of one or more warm words to control the long-standing operation (VanBlon et al. [0037] The relationship determined at 340 may be based on one or more currently active applications. For example, if a user requests an embodiment to play a specific media file (e.g., music, video, etc.) it may anticipate a subsequent request regarding the media playing application, such as: volume up/down, pause, skip track/chapter, etc. By way of further example, an embodiment may play music based on a voice command (e.g., “Cortana, Play Tom Petty”), and then allow the user to issue an additional related command (e.g., “turn it up,” “skip,” “I like this,” “pause,” “stop,” etc.) without a wakeup word (i.e., activation cue), [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time);
 	Hughes et al. and VanBlon et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of activating the set of hotwords based on the identified context as taught by Hughes et al., using the teaching of identifying the person who issued the initial command as taught by VanBlon et al., for the benefit of accepting subsequent commands from only that user (VanBlon et al. [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time.)
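 	For illustration only, the combination discussed above (context-activated warm words per Hughes et al. [0023], restricted to the person that issued the initial command per VanBlon et al. [0039]) can be sketched as follows. This is a minimal sketch and not the implementation of either reference. The class name WarmWordSession, its fields, and the assumption that a speaker identifier is supplied by a separate verification step are all hypothetical and introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class WarmWordSession:
    """Hypothetical container tying an activated warm-word set to one user."""

    owner_id: str                        # user who spoke the initial command
    actions: Dict[str, Callable[[], str]]  # warm word -> action on the operation

    def handle(self, warm_word: str, speaker_id: str) -> Optional[str]:
        """Perform the action only for an active warm word spoken by the owner.

        speaker_id is assumed to come from a separate speaker-verification
        step; how that step works is outside this sketch.
        """
        action = self.actions.get(warm_word)
        if action is None:
            return None  # word is not in the activated set
        if speaker_id != self.owner_id:
            return None  # other users may not control the operation
        return action()


# Usage: "alice" starts music playback, so the warm words are tied to her.
session = WarmWordSession(
    owner_id="alice",
    actions={"stop": lambda: "stopped", "next": lambda: "skipped"},
)
```

 	In this sketch a warm word spoken by any user other than the owner is simply ignored, which corresponds to accepting subsequent commands from only the user that issued the first command.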

 	With respect to Claim 16, Hughes et al. in view of VanBlon et al. teach
	wherein: 
 	activating the set of one or more warm words comprises activating, for each corresponding warm word in the activated set of one or more warm words, a respective warm word model to run on the assistant-enabled device associated with the user (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	identifying, in the additional audio data, the one of the warm words from the activated set of one or more warm words comprises detecting, using the respective warm word model activated for the corresponding one of the warm words, the one of the warm words in the additional audio data without performing speech recognition on the additional audio data (Hughes et al. [0058] the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model.)

 	With respect to Claim 17, Hughes et al. in view of VanBlon et al. teach
 	wherein detecting the one of the warm words in the additional audio data comprises: 
 	extracting audio features of the additional audio data (Hughes et al. [0020] the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112. The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114);
 	generating, using the respective warm word model activated for the corresponding one of the warm words, a warm word confidence score by processing the extracted audio features (Hughes et al. [0050] In some implementations, the hotword detector 108 generates a hotword confidence score for each initial portion of processed audio data. If the hotword confidence score satisfies a threshold then the hotword detector 108 determines that the audio data includes the hotword. For example, if the hotword confidence score is 0.9 and the hotword confidence threshold is 0.8, then the hotword detector 108 determines that the audio data includes the hotword); and
 	determining that the additional audio data corresponding to the additional utterance includes the corresponding one of the warm words when the warm word confidence score satisfies a warm word confidence threshold (Hughes et al. [0050] In some implementations, the hotword detector 108 generates a hotword confidence score for each initial portion of processed audio data. If the hotword confidence score satisfies a threshold then the hotword detector 108 determines that the audio data includes the hotword. For example, if the hotword confidence score is 0.9 and the hotword confidence threshold is 0.8, then the hotword detector 108 determines that the audio data includes the hotword.)
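 	For illustration only, the confidence-score test cited from Hughes et al. [0050] (a score of 0.9 against a threshold of 0.8) can be sketched as follows. This is a minimal sketch under stated assumptions: the use of cosine similarity as the score, the treatment of "satisfies" as greater than or equal to, the function names, and the toy two-dimensional feature vectors are hypothetical and are not taken from the reference.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_warm_word(
    features: Sequence[float],
    models: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the best-scoring warm word whose score satisfies the threshold.

    Each "model" here is just a reference feature vector, which is an
    assumption made to keep the sketch self-contained.
    """
    best_word, best_score = None, None
    for word, template in models.items():
        score = cosine_similarity(features, template)
        if score >= threshold and (best_score is None or score > best_score):
            best_word, best_score = word, score
    return best_word


# Toy models for two active warm words.
models = {"stop": [1.0, 0.0], "next": [0.0, 1.0]}
```

 	Audio features close to one template yield a detection, while features equally distant from both templates fall below the threshold and yield no detection.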

 	With respect to Claim 18, Hughes et al. in view of VanBlon et al. teach
 	wherein: 
 	activating the set of one or more warm words comprises executing a speech recognizer on the assistant-enabled device, the speech recognizer biased to recognize the one or more warm words in the activated set of one or more warm words (Hughes et al. [0067] the system may determine that the audio data includes more than one hotword. This may happen because currently active hotwords sound similar. For example, two active hotwords may be “next” and “text.” In some implementations, the system may only determine that the audio data includes a hotword, not necessarily which hotword. If the system determines that two or more hotword models match the audio data, then the system may perform speech recognition on the portion of the audio data that includes the hotword to determine what hotword the user spoke); and 
 	identifying, in the additional audio data, the one of the warm words from the activated set of one or more warm words comprises recognizing, using the speech recognizer executing on the assistant-enabled device, the one of the warm words in the additional audio data (Hughes et al. [0067] the system may determine that the audio data includes more than one hotword. This may happen because currently active hotwords sound similar. For example, two active hotwords may be “next” and “text.” In some implementations, the system may only determine that the audio data includes a hotword, not necessarily which hotword. If the system determines that two or more hotword models match the audio data, then the system may perform speech recognition on the portion of the audio data that includes the hotword to determine what hotword the user spoke.)

	With respect to Claim 19, Hughes et al. in view of VanBlon et al. teach 
 	wherein the operations further comprise, after receiving the audio data corresponding to the utterance spoken by the user, performing speaker identification on the audio data to identify the user that spoke the utterance by: 
 	extracting, from the audio data corresponding to the utterance spoken by the user, a first speaker-discriminative vector representing characteristics of the utterance spoken by the user (Hughes et al. [0020] In some implementations, the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112. The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114. As another example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating mel-scale filterbank energies from the audio data and classifying that the mel-scale filterbank energies include mel-scale filterbank energies that are similar to mel-scale filterbank energies that are characteristic of the hotword “ok computer” as stored in the hotword models 114);
 determining whether the extracted first speaker-discriminative vector matches any enrolled speaker vectors stored on the assistant-enabled device, each enrolled speaker vector associated with a different respective enrolled user of the assistant-enabled device (Hughes et al. [0058] The system determines that the audio data includes the hotword (240). In some implementations, the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples); and 
 	 when the extracted first speaker-discriminative vector matches one of the enrolled speaker vectors, identifying the user that spoke the utterance as the respective enrolled user associated with the one of the enrolled speaker vectors that matches the extracted first speaker-discriminative vector (Hughes et al. [0058] The system determines that the audio data includes the hotword (240). In some implementations, the system determines that the audio data includes the hotword without performing speech recognition on the audio data. In some implementations, the system determines that the audio data includes the hotword by extracting audio features of the audio data that corresponds to the utterance. The system generates a hotword confidence score by processing the audio features and possibly by comparing the audio features to those in a hotword model, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.) 

 	With respect to Claim 20, Hughes et al. in view of VanBlon et al. teach
 	wherein: 
 	the utterance spoken by the user further comprises a hotword preceding the command for the digital assistant to perform the long-standing operation (Hughes et al. [0020] In some implementations, the hotword detector 108 may be configured to identify hotwords that are in the initial portion of the utterance 106. In this example, the hotword detector 108 may determine that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” if the hotword detector 108 detects acoustic features in the audio data that are characteristic of an active hotword 112); 
 	the extracted first speaker-discriminative vector comprises a text-dependent speaker-discriminative vector extracted from the portion of the audio data that includes the hotword (Hughes et al. [0020] The acoustic features may be mel-frequency cepstral coefficients (MFCCs) that are representations of short-term power spectrums of the utterance or may be mel-scale filterbank energies for the utterance 106. For example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating MFCCs from the audio data and classifying that the MFCCs include MFCCs that are similar to MFCCs that are characteristic of the hotword “ok computer” as stored in the hotword models 114. As another example, the hotword detector 108 may detect that the utterance 106 “Ok computer, play music” includes the hotword 110 “ok computer” based on generating mel-scale filterbank energies from the audio data and classifying that the mel-scale filterbank energies include mel-scale filterbank energies that are similar to mel-scale filterbank energies that are characteristic of the hotword “ok computer” as stored in the hotword models 114); and 
 	each enrolled speaker vector comprises a text-dependent enrolled speaker vector extracted from one or more audio samples of the respective enrolled user speaking the hotword (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 21, Hughes et al. in view of VanBlon et al. teach
 	wherein performing the speaker verification on the additional audio data comprises: 
 	extracting, from the additional audio data corresponding to the additional utterance of the one of the warm words, a second speaker-discriminative vector representing characteristics of the additional utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 	determining whether the extracted second speaker-discriminative vector matches a reference speaker vector for the respective enrolled user identified as the user that spoke the utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	when the extracted second speaker-discriminative vector matches the reference speaker vector, determining that the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 22, Hughes et al. in view of VanBlon et al. teach
 	wherein the reference speaker vector comprises the enrolled speaker vector associated with the respective enrolled user (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 23, Hughes et al. in view of VanBlon et al. teach
 	wherein the reference speaker vector comprises a text-dependent speaker vector extracted from one or more audio samples of the respective enrolled user speaking the identified one of the warm words (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)

 	With respect to Claim 25, Hughes et al. in view of VanBlon et al. teach
 	wherein the operations further comprise, when the additional utterance was spoken by a different user than the user that is associated with the activated set of one or more warm words, suppressing performance of the respective action associated with the identified one of the warm words for controlling the long-standing operation (Hughes et al. [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples, [0006] The system recognizes a spoken hotword, and performs the corresponding operation, VanBlon et al. [0039] Individual voice recognition may be used. Based on the recognition of an individual, for example, an embodiment may only accept commands from the person that issued the initial command at 310. In doing this, an embodiment may extend the available time to enter commands, while also ensuring that the commands are issued by a single user. Therefore, by way of voice filtering, an embodiment may identify an individual who issued a first command and accept subsequent commands from that user, e.g., for a predetermined period of time. Examiner notes that the hotwords of Hughes et al. are speaker-specific and must be associated with a speaker in order for the command to execute.)

 	With respect to Claim 28, Hughes et al. in view of VanBlon et al. teach
 	wherein the operations further comprise:
 	determining, by the data processing hardware, when the digital assistant stops performing the long-standing operation (Hughes et al. [0009] The action further include determining, by the computing device, that the context is no longer associated with the computing device; and determining that subsequently received audio data that includes the hotword is not to trigger an operation, [0053] the context may be playing music); and 
 	deactivating, by the data processing hardware, the set of one or more warm words (Hughes et al. [0009] The action further include determining, by the computing device, that the context is no longer associated with the computing device; and determining that subsequently received audio data that includes the hotword is not to trigger an operation, [0056] The system may provide notification when the hotword becomes active and when the system deactivates it, [0062] the system removes a hotword from the active hotwords list when the context is no longer valid.)
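 	For illustration only, the context-driven activation and deactivation cited from Hughes et al. [0009], [0023] and [0062] can be sketched as follows. This is a minimal sketch and not the implementation of the reference. The class name ActiveWarmWords, the context label "playing_music", and the mapping table are hypothetical and introduced here.

```python
from typing import Dict, Set

# Hypothetical mapping from a device context to the words it makes active.
CONTEXT_WARM_WORDS: Dict[str, Set[str]] = {
    "playing_music": {"next", "stop", "back"},
}


class ActiveWarmWords:
    """Tracks which warm words may currently trigger an operation."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def on_context_start(self, context: str) -> None:
        # Activate the words tied to the context (e.g. music starts playing).
        self.active |= CONTEXT_WARM_WORDS.get(context, set())

    def on_context_end(self, context: str) -> None:
        # Remove the words once the context is no longer valid.
        self.active -= CONTEXT_WARM_WORDS.get(context, set())

    def triggers(self, word: str) -> bool:
        # A word outside the active set does not trigger an operation.
        return word in self.active


words = ActiveWarmWords()
words.on_context_start("playing_music")
was_active = words.triggers("stop")
words.on_context_end("playing_music")
still_active = words.triggers("stop")
```

 	After the context ends, subsequently received audio containing "stop" no longer triggers an operation in this sketch.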

5.	Claims 10, 24 are rejected under 35 U.S.C. 103 as being unpatentable over Hughes et al. (US 2018/0182390 A1) in view of VanBlon et al. (US 2017/0169817 A1) and De Assis et al. (US 2021/0157542 A1.) 

With respect to Claim 10, Hughes et al. in view of VanBlon et al. teach
 	wherein:  
 	performing the speaker verification on the additional audio data comprises: 
 	 	extracting, from the additional audio data, a second speaker-discriminative vector representing characteristics of the additional utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		determining whether the extracted second speaker-discriminative vector matches the first speaker-discriminative vector representing the characteristics (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	 	when the extracted first and second speaker-discriminative vectors match, determining that the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)
	Hughes et al. in view of VanBlon et al. fail to explicitly teach 
 	when the extracted first speaker-discriminative vector does not match any of the enrolled speaker vectors, identifying the user that spoke the utterance as a guest user of the assistant-enabled device; and 
	However, De Assis et al. teach 
 	when the extracted first speaker-discriminative vector does not match any of the enrolled speaker vectors, identifying the user that spoke the utterance as a guest user of the assistant-enabled device; and (De Assis et al. [0031] Credentials authenticator 132 initially registers the voice of an individual person when he or she utters words during a voice ID registration/training session; credentials authenticator 132 identifies that the received voice input is from a non-registered user (e.g., a guest) when the received voice input does not correspond to any of the user IDs of users registry 122. Credentials authenticator 132 attaches or associates a non-registered status indicator to the received voice input (or other form of user input) from the non-registered user.)
  	Hughes et al., VanBlon et al., and De Assis et al. are analogous art because they are from a similar field of endeavor, namely signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of activating the set of hotwords based on the identified context as taught by Hughes et al., using the teaching of identifying the person who issued the initial command as taught by VanBlon et al. for the benefit of accepting subsequent commands from only that user, and using the teaching of registering the voice of an individual person as taught by De Assis et al. for the benefit of determining whether the user that spoke the utterance is a registered user or a guest user (De Assis et al. [0031] Credentials authenticator 132 initially registers the voice of an individual person when he or she utters words during a voice ID registration/training session; credentials authenticator 132 identifies that the received voice input is from a non-registered user (e.g., a guest) when the received voice input does not correspond to any of the user IDs of users registry 122. Credentials authenticator 132 attaches or associates a non-registered status indicator to the received voice input (or other form of user input) from the non-registered user.)
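
For illustration only, the guest-identification limitation discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of any cited reference: the cosine-similarity comparison, the 0.8 threshold, and the names `identify_speaker`, `cosine_similarity`, and `GUEST` are all hypothetical, since neither the claim language nor the cited paragraphs specify how a "match" is computed.

```python
import math
from typing import Dict, Sequence

GUEST = "guest"  # hypothetical label for a non-enrolled speaker


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    vector: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed value; the source gives no threshold
) -> str:
    """Return the best-matching enrolled user, or GUEST when nothing matches."""
    best_user, best_score = None, threshold
    for user, reference in enrolled.items():
        score = cosine_similarity(vector, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user if best_user is not None else GUEST


enrolled_vectors = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(identify_speaker([0.9, 0.1], enrolled_vectors))
print(identify_speaker([0.7, 0.7], enrolled_vectors))
```

In this sketch the first call is close to the enrolled vector for "alice", while the second falls below the assumed threshold for every enrolled vector and is therefore treated as a guest.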

With respect to Claim 24, Hughes et al. in view of VanBlon et al. teach 
 	wherein:  
 	performing the speaker verification on the additional audio data comprises: 
 	 	extracting, from the additional audio data, a second speaker-discriminative vector representing characteristics of the additional utterance (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); 
 		determining whether the  extracted second speaker-discriminative vector matches the first speaker-discriminative vector representing the characteristics (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music); and 
 	 	when the first and second extracted speaker-discriminative vectors match, determining that the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words (Hughes et al. [0006] when music is playing, the system may identify the hotwords “next,” “stop,” and “back” for controlling the music. The system may request a hotword model for each of the identified hotwords. The system may use the hotword models to recognize the new hotwords by processing the audio characteristics of audio data corresponding to the user’s speech and applying the hotword models to the audio characteristics. The system recognizes a spoken hotword, and performs the corresponding operation. If the user speaks “stop” and “stop” is an active hotword because the system is playing music, then the system may stop playing music, [0029] In order to improve security, the computing device 104 may use speaker identification techniques to verify that the speaker is the user 102. In this case, a corresponding hotword model would be trained using speech of the user 102. For example, the computing device 104 may prompt the user 102 to speak “unlock” several times so that the computing device 104 or server 120 can build a hotword model specific to user 102 with the speech samples.)
	Hughes et al. in view of VanBlon et al. fail to explicitly teach 
 	when the extracted first speaker-discriminative vector does not match any of the enrolled speaker vectors, identifying the user that spoke the utterance as a guest user of the assistant-enabled device; and 
	However, De Assis et al. teach 
 	when the extracted first speaker-discriminative vector does not match any of the enrolled speaker vectors, identifying the user that spoke the utterance as a guest user of the assistant-enabled device; and (De Assis et al. [0031] Credentials authenticator 132 initially registers the voice of an individual person when he or she utters words during a voice ID registration/training session; credentials authenticator 132 identifies that the received voice input is from a non-registered user (e.g., a guest) when the received voice input does not correspond to any of the user IDs of users registry 122. Credentials authenticator 132 attaches or associates a non-registered status indicator to the received voice input (or other form of user input) from the non-registered user.)
  	Hughes et al., VanBlon et al., and De Assis et al. are analogous art because they are from a similar field of endeavor, namely signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of activating the set of hotwords based on the identified context as taught by Hughes et al., using the teaching of identifying the person who issued the initial command as taught by VanBlon et al. for the benefit of accepting subsequent commands from only that user, and using the teaching of registering the voice of an individual person as taught by De Assis et al. for the benefit of determining whether the user that spoke the utterance is a registered user or a guest user (De Assis et al. [0031] Credentials authenticator 132 initially registers the voice of an individual person when he or she utters words during a voice ID registration/training session; credentials authenticator 132 identifies that the received voice input is from a non-registered user (e.g., a guest) when the received voice input does not correspond to any of the user IDs of users registry 122. Credentials authenticator 132 attaches or associates a non-registered status indicator to the received voice input (or other form of user input) from the non-registered user.)
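
For illustration only, the speaker-verification limitations of Claim 24 (comparing a second speaker-discriminative vector against the first before acting on a warm word) can be sketched as follows. This is a hedged sketch, not the claimed implementation or the implementation of Hughes et al.: the function name `accept_warm_word`, the cosine-similarity test, and the 0.8 threshold are assumptions introduced here. The warm words “next,” “stop,” and “back” are taken from Hughes et al. [0006].

```python
import math
from typing import Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def accept_warm_word(
    word: str,
    active_warm_words: Set[str],
    first_vector: Sequence[float],   # from the utterance that started the operation
    second_vector: Sequence[float],  # from the additional utterance
    threshold: float = 0.8,          # assumed value; not given in the source
) -> bool:
    """Accept the warm word only if it is active and both vectors match."""
    if word not in active_warm_words:
        return False
    return cosine_similarity(first_vector, second_vector) >= threshold


active = {"next", "stop", "back"}
print(accept_warm_word("stop", active, [1.0, 0.0], [0.95, 0.05]))  # same speaker
print(accept_warm_word("stop", active, [1.0, 0.0], [0.0, 1.0]))    # different speaker
```

The design choice shown is that the warm word is gated on two independent conditions: membership in the currently active set, and a match between the two speaker vectors.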

Allowable Subject Matter
6.	Claims 12 and 26 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Claims 12 and 27 are objected to as being dependent upon an objected-to claim by virtue of their dependency. 
The following is an examiner’s statement of reasons for allowance: The prior art(s) taken alone or in combination fail(s) to teach the following element(s) in combination with the other recited elements in the claim(s).
	“when the additional utterance was spoken by a different user than the user that is associated with the activated set of one or more warm words:
 prompting, by the data processing hardware, the user that is associated with the activated set of one or more warm words to authorize performance of the respective action associated with the identified one of the warm words for controlling the long-standing operation; 
 receiving, at the data processing hardware, an acknowledgement from the user authorizing performance of the respective action; and” as recited in Claims 12 and 26. 

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Arkko et al. (US 2015/0199961). In this reference, Arkko et al. disclose a method for activating a set of keywords based on the context. 
b.	Trufinescu et al. (US 2022/0139391 A1). In this reference, Trufinescu et al. disclose a method for identifying a keyword and determining that the keyword is associated with a registered voice assistant. 
c. 	Gruber et al. (US 2016/0179831 A1). In this reference, Gruber et al. disclose a method for identifying a keyword in an audio stream prior to translation and transcription. 

8.	Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
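
For illustration only, the timing rule stated above can be sketched as a small date calculation. This is a simplified sketch under stated assumptions, not an authoritative deadline calculator: the function names are hypothetical, a month is added by keeping the same day of the month (clamped to the last day of a shorter month), and adjustments for weekends, federal holidays, and extension fees are not modeled.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the end of a shorter month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(
    mailed: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Expiry of the shortened statutory period under the rule quoted above."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    expiry = three_months
    # A first reply within two months, followed by an advisory action mailed
    # after the three-month date, moves expiry to the advisory mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailed, 2)
        and advisory_mailed > three_months
    ):
        expiry = advisory_mailed
    # The period never runs past six months from the final action.
    return min(expiry, six_months)


print(shortened_period_expiry(date(2024, 1, 15)))
print(shortened_period_expiry(date(2024, 1, 15), date(2024, 3, 1), date(2024, 5, 2)))
```

The dates in the usage lines are invented for the example and do not correspond to this application.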
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655