DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statements (IDS) submitted on 10/12/2020 and 02/19/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1, 3, 5, 8, 11, 13, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817 A1) in view of Arkko et al. (US 2015/0199961 A1).

 	With respect to Claim 1, VanBlon et al. disclose 
 	A method comprising: 
 	receiving, at data processing hardware of a computing device, audio data corresponding to a first utterance spoken by a user associated with the computing device, the first utterance comprising a default hotword followed by a command to playback music from the computing device (VanBlon et al. [0028] an embodiment may receive audio input at 310. The audio input may be of various types, for example, human voice in the form of command inputs. Additionally, the audio input could be produced from a media device (e.g., radio, television, computer, etc). As the audio is received, an embodiment may parse the audio to determine if it contains an activation cue or trigger phrase at 310. An activation cue or trigger phrase allows a device to “wake up” (e.g., enable a device to capture and analyze audio for performing an associated command). Currently, this wake up period is brief and only intended to allow for a single command to be entered; Fig. 3 element 310; [0037] play music based on a voice command (e.g., “Cortana, Play Tom Petty”)); 
 	detecting, by the data processing hardware, the default hotword in the audio data corresponding to the first utterance spoken by the user, the detection of the default hotword in the audio data causing a speech recognizer to perform speech recognition on the audio data to identify the command to playback the music from the computing device (VanBlon et al. Fig. 3 element 310 Receive and parse audio input to identify an activation cue and at least one command; [0037] play music based on a voice command (e.g., “Cortana, Play Tom Petty”)); 
 	in response to the speech recognizer performing speech recognition on the audio data to identify the command to playback the music, executing, by the data processing hardware, the command to playback the music from the computing device (VanBlon et al. Fig. 3 element 320 Performing an action based on the at least one command); and 
 	during playback of the music from the computing device (VanBlon et al. [0037] if a user requests an embodiment to play a specific media file (e.g., music, video, etc.) it may anticipate a subsequent request regarding the media playing application, such as volume up/down, pause, skip track/chapter, etc. By way of further example, an embodiment may play music based on a voice command (e.g., “Cortana, Play Tom Petty”)): 
 			receiving, at the data processing hardware, additional audio data corresponding to a second utterance spoken by the user, the second utterance comprising one of the additional hotwords in the activated set of additional hotwords (VanBlon et al. [0030] Once an action has been carried out at 320, an embodiment may receive additional audio input at 330. The additional audio may, similar to the first received audio input, contain at least one command); 
 	 	detecting, by the data processing hardware, the additional hotword in the additional audio data corresponding to the second utterance spoken by the user (VanBlon et al. [0042] A list to previously issued commands may be maintained and used to identify requests where a user has historically made follow up commands. Thus, an embodiment may listen to commands that are typically followed up with other command (e.g., by the general population or by a particular user). For example, a user may typically adjust the playback volume of a device shortly or immediately after requesting media to be played (e.g., music, video, etc.), for example based on the media type (e.g., hard rock, classical, etc.), a current volume setting of the device application, etc. Thus, an embodiment may anticipate the upcoming volume control command (e.g., up or down) based on the media type, etc., and extend the period of time for instruction entry); and 
 	 	based on detecting the additional hotword in the additional audio data corresponding to the second utterance, performing, by the data processing hardware, the respective action associated with the detected additional hotword for controlling the playback of the music from the computing device (VanBlon et al. [0042] an embodiment may listen to commands that are typically followed up with other command (e.g., by the general population or by a particular user). For example, a user may typically adjust the playback volume of a device shortly or immediately after requesting media to be played (e.g., music, video, etc.), for example based on the media type (e.g., hard rock, classical, etc.), a current volume setting of the device application, etc.; Fig. 3 element 360 Perform an action based on the additional input).
VanBlon et al. teach a method and a system for extending the wakeup period, based on an anticipated command, in order to listen for a potential follow-up command. With that extended wakeup period, VanBlon et al. allow certain words pertaining to music adjustment (e.g., volume) to be recognized after playback has started. VanBlon et al. fail to explicitly teach 
 		activating, by the data processing hardware, a set of additional hotwords each associated with a respective action for controlling the playback of the music from the computing device; 
	However, Arkko et al. teach 
 		activating, by the data processing hardware, a set of additional hotwords each associated with a respective action for controlling the playback of the music from the computing device (Arkko et al. [0025] When a current context of the user is detected, which context is characterized by certain context parameters, a predefined context is selected having context parameters that best matches the detected context, and the keywords that are associated with the selected context are then valid as input to the application. Thus, when any of the keywords of the selected context is recognized in speech from the user, it is used as input to the application. For example, when recognized in speech from the user, the keywords of the selected context may be used as commands, information or other input for controlling the application in some way; [0026] A predefined context may further pertain to any of:.. the type or current status of the activated application. The Examiner notes that Arkko et al. activate a set of keywords based upon various conditions, including application status. With the set of activated keywords tied to the status of the application, the system is able to recognize one or more of the activated keywords related to that status. In the analogous art, VanBlon et al. disclose that one of the voice-controlled applications is a music application); 
 	VanBlon et al. and Arkko et al. are analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using the teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al., for the benefit of spotting keywords in the user’s utterance (Arkko et al. [0008] A second example is referred to as “keyword spotting” which does not require translation of the entire speech input into text but the audio is searched only for specific words or phrases by recognition their sound, more or less, and then translating them into text. In general, keyword spotting requires less computing than speech recognition since only a limited word or phrase must be recognized for translation instead of an entire vocabulary; [0032] The keyword may be recognized by using any of the above-described techniques of...keyword spotting.)
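
For illustration only, the combination discussed above (a keyword set that becomes valid as input when the application status changes to music playback) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from either reference: the names Player, PLAYBACK_HOTWORDS, start_playback and handle_utterance, the particular keywords, and the use of plain text in place of audio are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Player:
    """Hypothetical music application state (assumption, not from the references)."""
    playing: bool = False
    volume: int = 5
    # Keywords currently valid as input, each mapped to its respective action.
    active_hotwords: Dict[str, Callable[["Player"], None]] = field(default_factory=dict)


def volume_up(p: Player) -> None:
    p.volume = min(10, p.volume + 1)


def volume_down(p: Player) -> None:
    p.volume = max(0, p.volume - 1)


def pause(p: Player) -> None:
    p.playing = False
    p.active_hotwords.clear()  # playback context ended, so the set is deactivated


# Assumed keyword set associated with the "playing" application status.
PLAYBACK_HOTWORDS: Dict[str, Callable[[Player], None]] = {
    "volume up": volume_up,
    "volume down": volume_down,
    "pause": pause,
}


def start_playback(p: Player) -> None:
    """Executes the play command and activates the playback keyword set."""
    p.playing = True
    p.active_hotwords = dict(PLAYBACK_HOTWORDS)


def handle_utterance(p: Player, text: str) -> Optional[str]:
    """Returns the matched keyword after running its action, or None if inactive."""
    key = text.strip().lower()
    action = p.active_hotwords.get(key)
    if action is None:
        return None
    action(p)
    return key
```

In this sketch an utterance such as "volume up" is ignored before start_playback is called and acted upon afterwards, which mirrors the point that the keywords are only valid while the associated application status holds.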

With respect to Claim 3, VanBlon et al. in view of Arkko et al. teach 
 	wherein the speech recognizer executes on the data processing hardware of the computing device (VanBlon et al. [0019] While various other circuits, circuitry or components may be utilized in information handling devices, with regard to smart phone and/or tablet circuitry 100, an example illustrated in FIG. 1 includes a system on a chip design found for example in tablet or other mobile computing platforms. Software and processor(s) are combined in a single chip 110. Processors comprise internal arithmetic units, registers, cache memory, busses, I/O ports, etc., as is well known in the art. Internal busses and the like depend on different vendors, but essentially all the peripheral devices (120) may attach to a single chip 110. The circuitry 100 combines the processors memory control, and I/O controller hub all into a single chip 110. Also, systems 100 of this type do not typically use SATA or PCI or LPC. Common interfaces, for example, include SDIO and I2C; [0010] Fig. 3 illustrates an example method of extending the period of voice recognition; Arkko et al. [0065] It should be noted that FIG. 6 illustrates various functional units in the application node 600 and the speech recognition node 602 in a logical sense, and the skilled person is able to implement these functional units in practice using suitable software and hardware means.)

With respect to Claim 5, VanBlon et al. in view of Arkko et al. teach   
 	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises detecting the additional hotword in the additional audio data without performing speech recognition on the additional audio data (Arkko et al. [0008] A second example is referred to as “keyword spotting” which does not require translation of the entire speech input into text but the audio is searched only for specific words or phrases by recognition their sound, more or less, and then translating them into text. In general, keyword spotting requires less computing than speech recognition since only a limited word or phrase must be recognized for translation instead of an entire vocabulary, [0032] The keyword may be recognized by using any of the above-described techniques of...keyword spotting.)

 	With respect to Claim 8, VanBlon et al. in view of Arkko et al. teach 
 	wherein the second utterance only comprises the one of the additional hotwords in the activated set of additional hotwords (VanBlon et al. [0037] allow the user to issue an additional related command (e.g., “turn it up,” “skip,” “I like this,” “pause,” “stop,” etc.) without a wakeup word (i.e., activation cue).)

 	With respect to Claim 11, VanBlon et al. disclose
 	A computing device comprising: 
 	data processing hardware (VanBlon et al. [0046] It should be noted that the various functions described herein may be implemented using instructions stored on a device readable storage medium such as a non-signal storage device that are executed by a processor); and 
A storage device may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a storage device is not a signal and "non-transitory" includes all media except signal media) comprising: 
 		receiving audio data corresponding to a first utterance spoken by a user associated with the computing device, the first utterance comprising a default hotword followed by a command to playback music from the computing device (VanBlon et al. [0028] an embodiment may receive audio input at 310. The audio input may be of various types, for example, human voice in the form of command inputs. Additionally, the audio input could be produced from a media device (e.g., radio, television, computer, etc). As the audio is received, an embodiment may parse the audio to determine if it contains an activation cue or trigger phrase at 310. An activation cue or trigger phrase allows a device to “wake up” (e.g., enable a device to capture and analyze audio for performing an associated command). Currently, this wake up period is brief and only intended to allow for a single command to be entered; Fig. 3 element 310; [0037] play music based on a voice command (e.g., “Cortana, Play Tom Petty”)); 
 		detecting the default hotword in the audio data corresponding to the first utterance spoken by the user, the detection of the default hotword in the audio data causing a speech recognizer to perform speech recognition on the audio data to identify the command to playback the music from the computing device (VanBlon et al. Fig. 3 element 310 Receive and parse audio input to identify an activation cue and at least one command; [0037] play music based on a voice command (e.g., “Cortana, Play Tom Petty”)); 
 	 	 in response to the speech recognizer performing speech recognition on the audio data to identify the command to playback the music, executing the command to playback the music from the computing device (VanBlon et al. Fig. 3 element 320 Performing an action based on the at least one command); and 
 		during playback of the music from the computing device (VanBlon et al. [0037] if a user requests an embodiment to play a specific media file (e.g., music, video, etc.) it may anticipate a subsequent request regarding the media playing application, such as volume up/down, pause, skip track/chapter, etc. By way of further example, an embodiment may play music based on a voice command (e.g., “Cortana, Play Tom Petty”)): 
 			receiving additional audio data corresponding to a second utterance spoken by the user, the second utterance comprising one of the additional hotwords in the activated set of additional hotwords (VanBlon et al. [0030] Once an action has been carried out at 320, an embodiment may receive additional audio input at 330. The additional audio may, similar to the first received audio input, contain at least one command); 
 		detecting the additional hotword in the additional audio data corresponding to the second utterance spoken by the user (VanBlon et al. [0042] A list to previously issued commands may be maintained and used to identify requests where a user has historically made follow up commands. Thus, an embodiment may listen to commands that are typically followed up with other command (e.g., by the general population or by a particular user). For example, a user may typically adjust the playback volume of a device shortly or immediately after requesting media to be played (e.g., music, video, etc.), for example based on the media type (e.g., hard rock, classical, etc.), a current volume setting of the device application, etc. Thus, an embodiment may anticipate the upcoming volume control command (e.g., up or down) based on the media type, etc., and extend the period of time for instruction entry); and 
 	 	based on detecting the additional hotword in the additional audio data corresponding to the second utterance, performing the respective action associated with the detected additional hotword for controlling the playback of the music from the computing device (VanBlon et al. [0042] an embodiment may listen to commands that are typically followed up with other command (e.g., by the general population or by a particular user). For example, a user may typically adjust the playback volume of a device shortly or immediately after requesting media to be played (e.g., music, video, etc.), for example based on the media type (e.g., hard rock, classical, etc.), a current volume setting of the device application, etc.; Fig. 3 element 360 Perform an action based on the additional input).
VanBlon et al. teach a method and a system for extending the wakeup period, based on an anticipated command, in order to listen for a potential follow-up command. With that extended wakeup period, VanBlon et al. allow certain words pertaining to music adjustment (e.g., volume) to be recognized after playback has started. VanBlon et al. fail to explicitly teach 
 		activating a set of additional hotwords each associated with a respective action for controlling the playback of the music from the computing device;
However, Arkko et al. teach
 		activating a set of additional hotwords each associated with a respective action for controlling the playback of the music from the computing device (Arkko et al. [0025] When a current context of the user is detected, which context is characterized by certain context parameters, a predefined context is selected having context parameters that best matches the detected context, and the keywords that are associated with the selected context are then valid as input to the application. Thus, when any of the keywords of the selected context is recognized in speech from the user, it is used as input to the application. For example, when recognized in speech from the user, the keywords of the selected context may be used as commands, information or other input for controlling the application in some way; [0026] A predefined context may further pertain to any of:.. the type or current status of the activated application. The Examiner notes that Arkko et al. activate a set of keywords based upon various conditions, including application status. With the set of activated keywords tied to the status of the application, the system is able to recognize one or more of the activated keywords related to that status. In the analogous art, VanBlon et al. disclose that one of the voice-controlled applications is a music application); 
 	VanBlon et al. and Arkko et al. are analogous art because they are from a similar field of endeavor in speech processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using the teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al., for the benefit of spotting keywords in the user’s utterance (Arkko et al. [0008] A second example is referred to as “keyword spotting” which does not require translation of the entire speech input into text but the audio is searched only for specific words or phrases by recognition their sound, more or less, and then translating them into text. In general, keyword spotting requires less computing than speech recognition since only a limited word or phrase must be recognized for translation instead of an entire vocabulary; [0032] The keyword may be recognized by using any of the above-described techniques of...keyword spotting.)

 	With respect to Claim 13, VanBlon et al. in view of Arkko et al. teach 
 	wherein the speech recognizer executes on the data processing hardware of the computing device (VanBlon et al. [0019] While various other circuits, circuitry or components may be utilized in information handling devices, with regard to smart phone and/or tablet circuitry 100, an example illustrated in FIG. 1 includes a system on a chip design found for example in tablet or other mobile computing platforms. Software and processor(s) are combined in a single chip 110. Processors comprise internal arithmetic units, registers, cache memory, busses, I/O ports, etc., as is well known in the art. Internal busses and the like depend on different vendors, but essentially all the peripheral devices (120) may attach to a single chip 110. The circuitry 100 combines the processors memory control, and I/O controller hub all into a single chip 110. Also, systems 100 of this type do not typically use SATA or PCI or LPC. Common interfaces, for example, include SDIO and I2C; [0010] Fig. 3 illustrates an example method of extending the period of voice recognition; Arkko et al. [0065] It should be noted that FIG. 6 illustrates various functional units in the application node 600 and the speech recognition node 602 in a logical sense, and the skilled person is able to implement these functional units in practice using suitable software and hardware means.)

 	With respect to Claim 15, VanBlon et al. in view of Arkko et al. teach   
 	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises detecting the additional hotword in the additional audio data without performing speech recognition on the additional audio data (Arkko et al. [0008] A second example is referred to as “keyword spotting” which does not require translation of the entire speech input into text but the audio is searched only for specific words or phrases by recognition their sound, more or less, and then translating them into text. In general, keyword spotting requires less computing than speech recognition since only a limited word or phrase must be recognized for translation instead of an entire vocabulary, [0032] The keyword may be recognized by using any of the above-described techniques of...keyword spotting.)

 	With respect to Claim 18, VanBlon et al. in view of Arkko et al. teach 
 	wherein the second utterance only comprises the one of the additional hotwords in the activated set of additional hotwords (VanBlon et al. [0037] allow the user to issue an additional related command (e.g., “turn it up,” “skip,” “I like this,” “pause,” “stop,” etc.) without a wakeup word (i.e., activation cue).)

5.	Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817 A1) in view of Arkko et al. (US 2015/0199961 A1) and Wanderlust (US 2018/0122372 A1).

	With respect to Claim 2, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 1 upon which Claim 2 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach
 	wherein detecting the default hotword in the audio data corresponding to the first utterance comprises detecting the default hotword in the audio data without performing speech recognition on the audio data.  
	However, Wanderlust teaches 
wherein detecting the default hotword in the audio data corresponding to the first utterance comprises detecting the default hotword in the audio data without performing speech recognition on the audio data (Wanderlust [0002] capture audio, such as through microphones, process it, and attempt to spot a specific wake-up phrase. Upon spotting the wake-up phrase, they capture a following speech utterance, and behave in a programmed responsive manner, Fig. 4 element 41 spot wake-up phrase, [0039] The process begins at step 41 when a system spots a wake-up phrase. At step 42, the system processed to select an open sound from a plurality of open sounds stored in memory or database and includes a collection of open sounds 43.)
 VanBlon et al., Arkko et al. and Wanderlust are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it spot wake-up phrase, [0039] The process begins at step 41 when a system spots a wake-up phrase. At step 42, the system processed to select an open sound from a plurality of open sounds stored in memory or database and includes a collection of open sounds 43.)

With respect to Claim 12, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 11 upon which Claim 12 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach
  	wherein detecting the default hotword in the audio data corresponding to the first utterance comprises detecting the default hotword in the audio data without performing speech recognition on the audio data.
	However, Wanderlust teaches 
wherein detecting the default hotword in the audio data corresponding to the first utterance comprises detecting the default hotword in the audio data without performing speech recognition on the audio data (Wanderlust [0002] capture audio, such as through microphones, process it, and attempt to spot a specific wake-up phrase. Upon spotting the wake-up phrase, they capture a following speech utterance, and behave in a programmed responsive manner, Fig. 4 element 41 spot wake-up phrase, [0039] The process begins at step 41 when a system spots a wake-up phrase. At step 42, the system processed to select an open sound from a plurality of open sounds stored in memory or database and includes a collection of open sounds 43.)
 VanBlon et al., Arkko et al. and Wanderlust are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in spot wake-up phrase, [0039] The process begins at step 41 when a system spots a wake-up phrase. At step 42, the system processed to select an open sound from a plurality of open sounds stored in memory or database and includes a collection of open sounds 43.)

6.	Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817 A1) in view of Arkko et al. (US 2015/0199961 A1) and Devaraj et al. (US 2018/0061403 A1).

	With respect to Claim 4, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 1 upon which Claim 4 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	further comprising, in response to detecting the default hotword in the audio data corresponding to the first utterance spoken by the user, providing, by the data processing hardware, the audio data corresponding to the first utterance to a server in communication with the data processing hardware, the server executing the speech recognizer to perform the speech recognition on the audio data to identify the command to playback the music from the computing device.
	However, Devaraj et al. teach
 	further comprising, in response to detecting the default hotword in the audio data corresponding to the first utterance spoken by the user, providing, by the data processing hardware, the audio data corresponding to the first utterance to a server in communication with the data processing hardware, the server executing the speech recognizer to perform the speech recognition on the audio data to identify the command to playback the music from the computing device (Devaraj et al. [0034] The device 110, using a wakeword detection module 220, then processes the audio, or audio data corresponding to the audio, to determine if a keyword (such as a wakeword) is detected in the audio. Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250, [0064] The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110.)
 	VanBlon et al., Arkko et al. and Devaraj et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of hybrid recognition as taught by Devaraj et al. for the benefit of detecting the wakeword at the local device and sending the audio data to the server in response to detecting the wakeword (Devaraj et al. [0034] The device 110, using a wakeword detection module 220, then processes the audio, or audio data corresponding to the audio, to determine if a keyword (such as a wakeword) is detected in the audio. Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250, [0064] The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110.)
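
For illustration only, the hybrid arrangement relied upon above (wakeword detection on the local device, with the audio data sent to a server for speech recognition only after the wakeword is detected) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited reference: the names WAKEWORD, detect_wakeword, process_audio and fake_server_asr are hypothetical, and a list of word tokens stands in for audio data.

```python
from typing import Callable, List, Optional

WAKEWORD = "computer"  # hypothetical wakeword chosen for this sketch


def detect_wakeword(frames: List[str]) -> bool:
    """Stand-in for on-device detection: checks only the leading token."""
    return bool(frames) and frames[0].lower() == WAKEWORD


def process_audio(
    frames: List[str],
    send_to_server: Callable[[List[str]], str],
) -> Optional[str]:
    """Forwards the audio to the server only when the wakeword is detected locally."""
    if not detect_wakeword(frames):
        return None  # nothing leaves the device
    return send_to_server(frames)


def fake_server_asr(frames: List[str]) -> str:
    """Stand-in for server-side recognition: returns the text after the wakeword."""
    return " ".join(frames[1:]).lower()
```

Calling process_audio(["Computer", "play", "music"], fake_server_asr) returns the recognized command text, while audio that does not begin with the wakeword is never passed to the server function.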

	With respect to Claim 14, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 11 upon which Claim 14 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	wherein the operations further comprise, in response to detecting the default hotword in the audio data corresponding to the first utterance spoken by the user, providing the audio data corresponding to the first utterance to a server in communication with the data processing hardware, the server executing the speech recognizer to perform the speech recognition on the audio data to identify the command to playback the music from the computing device.  
However, Devaraj et al. teach
 	wherein the operations further comprise, in response to detecting the default hotword in the audio data corresponding to the first utterance spoken by the user, providing the audio data corresponding to the first utterance to a server in communication with the data processing hardware, the server executing the speech recognizer to perform the speech recognition on the audio data to identify the command to playback the music from the computing device (Devaraj et al. [0034] The device 110, using a wakeword detection module 220, then processes the audio, or audio data corresponding to the audio, to determine if a keyword (such as a wakeword) is detected in the audio. Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250, [0064] The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110.)
 	VanBlon et al., Arkko et al. and Devaraj et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of hybrid recognition as taught by Devaraj et al. for the benefit of detecting the wakeword at the local device and sending the audio data to the server in response to detecting the wakeword (Devaraj et al. [0034] The device 110, using a wakeword detection module 220, then processes the audio, or audio data corresponding to the audio, to determine if a keyword (such as a wakeword) is detected in the audio. Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250, [0064] The destination command processor 290 may be determined based on the NLU output. For example, if the NLU output includes a command to play music, the destination command processor 290 may be a music playing application, such as one located on device 110.)

7.	Claims 6, 9, 10, 16, 19, 20 are rejected under 35 U.S.C.103 as being unpatentable over VanBlon et al. (US 2017/0169817 A1) in view of Arkko et al. (US 2015/0199961 A1) and Ganapathiraju et al. (US 2014/0025372 A1.)

With respect to Claim 6, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 1 upon which Claim 6 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises: 
 	extracting audio features of the additional audio data that corresponds to the second utterance; 
 	generating, using a hotword detector, hotword confidence score by processing the extracted audio features; 
 	determining, by the hotword detector, whether the hotword confidence score satisfies a hotword confidence threshold; and 
 	when the hotword confidence score satisfies the hotword confidence threshold, determining, by the hotword detector, that the additional audio data corresponding to the second utterance includes the additional hotword.  
	However, Ganapathiraju et al. teach
 	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises: 
 	extracting audio features of the additional audio data that corresponds to the second utterance (Ganapathiraju et al. [0036] The audio stream (i.e., what is spoken into the system by the user), 130, may be fed into the front end feature calculator, 135, which may convert the audio stream into a representation of the audio stream, or a sequence of spectral feature); 
 	generating, using a hotword detector, hotword confidence score by processing the extracted audio features (Ganapathiraju et al. [0037] In the multi-dimensional space constructed by the feature calculator, a spoken word may become a sequence of MFCC vectors forming a trajectory in the acoustic space. Keyword spotting may now simply become a problem of computing probability of generating the trajectory given the keyword model. This operation may be achieved by using the well-known principle of dynamic programing, specifically the Viterbi algorithm, which aligns the keyword model to the best segment of the audio signal, and results in a match score); 
 	determining, by the hotword detector, whether the hotword confidence score satisfies a hotword confidence threshold (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold, [0037] If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event); and  
 	when the hotword confidence score satisfies the hotword confidence threshold, determining, by the hotword detector, that the additional audio data corresponding to the second utterance includes the additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold, [0037] If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event).
 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of extracting the acoustic feature, aligning the keyword model to the best segment of the audio data as taught by Ganapathiraju et al. for the benefit of spotting keywords in the audio data (Ganapathiraju et al. [0037] Keyword spotting may now simply become a problem of computing probability of generating the trajectory given the keyword model. This operation may be achieved by using the well-known principle of dynamic programing, specifically the Viterbi algorithm, which aligns the keyword model to the best segment of the audio signal, and results in a match score. If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event).
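
For illustration only, the sequence of steps recited in Claim 6 (extract features, generate a score, compare the score to a threshold, declare a detection) can be sketched as follows. This is a minimal sketch and not the implementation of the application or of any cited reference: the function names, the log-energy features (a stand-in for the MFCC features described in Ganapathiraju et al. [0037]), the template-distance score (a stand-in for Viterbi alignment) and the 0.8 threshold are all assumptions introduced here.

```python
import math
from typing import List, Sequence


def extract_features(samples: Sequence[float], frame_size: int = 4) -> List[float]:
    """Hypothetical front end: one log-energy value per non-overlapping frame.

    A simplification; the cited reference describes spectral (MFCC) features.
    """
    feats = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        feats.append(math.log(energy + 1e-9))  # small offset avoids log(0)
    return feats


def confidence_score(features: Sequence[float], template: Sequence[float]) -> float:
    """Hypothetical score in (0, 1]: 1.0 on an exact match with the keyword
    template, decaying with the mean squared distance to it."""
    if not features or len(features) != len(template):
        return 0.0
    mse = sum((f - t) ** 2 for f, t in zip(features, template)) / len(features)
    return math.exp(-mse)


def detect_hotword(samples: Sequence[float], template: Sequence[float],
                   threshold: float = 0.8) -> bool:
    """Declare a detection only if the score is greater than the threshold."""
    score = confidence_score(extract_features(samples), template)
    return score > threshold


# Usage: build a template from one rendition of the keyword, then test audio.
keyword_audio = [0.5, -0.5, 0.5, -0.5, 1.0, -1.0, 1.0, -1.0]
silence = [0.0] * 8
template = extract_features(keyword_audio)
matched = detect_hotword(keyword_audio, template)
rejected = detect_hotword(silence, template)
```

The strict "greater than" comparison mirrors step e) of Ganapathiraju et al. [0008]; whether a score equal to the threshold "satisfies" it is a design choice the sketch does not settle.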

	With respect to Claim 9, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 1 upon which Claim 9 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	wherein each additional hotword in the activated set of additional hotwords is associated with a respective hotword model configured to recognize audio of the respective additional hotword.  
	However, Ganapathiraju et al. teach 
 	wherein each additional hotword in the activated set of additional hotwords is associated with a respective hotword model configured to recognize audio of the respective additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold.)
 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of keyword models as taught by Ganapathiraju et al. for the benefit of spotting predetermined keywords in an audio stream (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold.)

	With respect to Claim 10, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 1 upon which Claim 10 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	further comprising storing, by the data processing hardware, multiple hotword models on memory hardware of the computing device, each hotword model of the multiple hotword models associated with a respective one of the additional hotwords in the activated set of additional hotwords that is configured to recognize audio of the respective audio additional hotword.  
	However, Ganapathiraju et al. teach 
 	further comprising storing, by the data processing hardware, multiple hotword models on memory hardware of the computing device, each hotword model of the multiple hotword models associated with a respective one of the additional hotwords in the activated set of additional hotwords that is configured to recognize audio of the respective audio additional hotword (Ganapathiraju et al. [0031] FIG. 1 is a diagram illustrating the basic components in a keyword spotter, 100. The basic components of a keyword spotter 100 may include User Data/Keywords 105, Keyword Model 110, Knowledge Sources 115 which include an Acoustic Model 120 and a Pronunciation Dictionary/Predictor 125, an Audio Stream 130, a Front End Feature Calculator 135, a Recognition Engine (Pattern Matching) 140, and the Reporting of Found Keywords in Real-Time 145, [0032] Keywords may be defined, 105, by the user of the system according to user preference. The keyword model 110 may be formed by concatenating phoneme HMMs. This is further described in the description of FIG. 2. The Keyword Model, 110, may be composed based on the keywords that are defined by the user and the input to the keyword model based on Knowledge Sources, 115. Such knowledge sources may include an Acoustic Model, 120, and a Pronunciation Dictionary/Predictor, 125, [0037] The task of the recognition engine may be to take a set of keyword models and search through presented audio stream to find if the words were spoken.)
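 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of keyword models as taught by Ganapathiraju et al. for the benefit of taking a set of keyword models and search through presented audio stream to find if the words were spoken (Ganapathiraju et al. [0037] The task of the recognition engine may be to take a set of keyword models and search through presented audio stream to find if the words were spoken.)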
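
For illustration only, the arrangement recited in Claims 9 and 10 (one stored model per additional hotword, with only the activated set searched) can be sketched as follows. This is a minimal sketch under stated assumptions: the `HotwordModel` class, the `spot` function, the template-based score and the 0.8 threshold are hypothetical names and values introduced here, and are not taken from the application or from Ganapathiraju et al.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass(frozen=True)
class HotwordModel:
    """Hypothetical per-hotword model: a stored reference feature template."""
    hotword: str
    template: Sequence[float]

    def score(self, features: Sequence[float]) -> float:
        # Hypothetical score in (0, 1]; 1.0 means the features equal the template.
        if not features or len(features) != len(self.template):
            return 0.0
        mse = sum((f - t) ** 2 for f, t in zip(features, self.template)) / len(features)
        return 1.0 / (1.0 + mse)


def spot(features: Sequence[float], models: Dict[str, HotwordModel],
         active: Set[str], threshold: float = 0.8) -> List[str]:
    """Search the features against the models of the activated hotwords only."""
    return [name for name in sorted(active)
            if name in models and models[name].score(features) > threshold]


# Usage: three models are stored, but only "next" and "pause" are activated.
stored = {
    "next": HotwordModel("next", [1.0, 2.0]),
    "pause": HotwordModel("pause", [5.0, 5.0]),
    "stop": HotwordModel("stop", [9.0, 9.0]),
}
found = spot([1.0, 2.0], stored, active={"next", "pause"})
not_active = spot([9.0, 9.0], stored, active={"next", "pause"})
```

Keeping the stored models separate from the activated set means that a hotword with a stored model (here "stop") is simply never scored while it is not activated.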

	With respect to Claim 16, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 11 upon which Claim 16 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
  	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises: 
 	extracting audio features of the additional audio data that corresponds to the second utterance; 
 	generating, using a hotword detector, hotword confidence score by processing the extracted audio features; 
 	determining, by the hotword detector, whether the hotword confidence score satisfies a hotword confidence threshold; and 
 	when the hotword confidence score satisfies the hotword confidence threshold, determining, by the hotword detector, that the additional audio data corresponding to the second utterance includes the additional hotword.  
	However, Ganapathiraju et al. teach
 	wherein detecting the additional hotword in the additional audio data corresponding to the second utterance comprises: 
 	extracting audio features of the additional audio data that corresponds to the second utterance (Ganapathiraju et al. [0036] The audio stream (i.e., what is spoken into the system by the user), 130, may be fed into the front end feature calculator, 135, which may convert the audio stream into a representation of the audio stream, or a sequence of spectral feature); 
 	generating, using a hotword detector, hotword confidence score by processing the extracted audio features (Ganapathiraju et al. [0037] In the multi-dimensional space constructed by the feature calculator, a spoken word may become a sequence of MFCC vectors forming a trajectory in the acoustic space. Keyword spotting may now simply become a problem of computing probability of generating the trajectory given the keyword model. This operation may be achieved by using the well-known principle of dynamic programing, specifically the Viterbi algorithm, which aligns the keyword model to the best segment of the audio signal, and results in a match score); 
 	determining, by the hotword detector, whether the hotword confidence score satisfies a hotword confidence threshold (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold, [0037] If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event); and  
 	when the hotword confidence score satisfies the hotword confidence threshold, determining, by the hotword detector, that the additional audio data corresponding to the second utterance includes the additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold, [0037] If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event).
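 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of extracting the acoustic feature, aligning the keyword model to the best segment of the audio data as taught by Ganapathiraju et al. for the benefit of spotting keywords in the audio data (Ganapathiraju et al. [0037] Keyword spotting may now simply become a problem of computing probability of generating the trajectory given the keyword model. This operation may be achieved by using the well-known principle of dynamic programing, specifically the Viterbi algorithm, which aligns the keyword model to the best segment of the audio signal, and results in a match score. If the match score is significant, the keyword spotting algorithm infers that the keywords was spoken and reports a keyword spotted event).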

	With respect to Claim 19, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 11 upon which Claim 19 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	wherein each additional hotword in the activated set of additional hotwords is associated with a respective hotword model configured to recognize audio of the respective additional hotword.  
 	However, Ganapathiraju et al. teach 
 	wherein each additional hotword in the activated set of additional hotwords is associated with a respective hotword model configured to recognize audio of the respective additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold.)
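 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of keyword models as taught by Ganapathiraju et al. for the benefit of spotting predetermined keywords in an audio stream (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold.)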

	With respect to Claim 20, VanBlon et al. in view of Arkko et al. teach all the limitations of Claim 11 upon which Claim 20 depends. VanBlon et al. in view of Arkko et al. fail to explicitly teach 
 	wherein the operations further comprise storing multiple hotword models on the memory hardware of the computing device, each hotword model of the multiple hotword models associated with a respective one of the additional hotwords in the activated set of additional hotwords that is configured to recognize audio of the respective audio additional hotword.
 	However, Ganapathiraju et al. teach 
 	wherein the operations further comprise storing multiple hotword models on the memory hardware of the computing device, each hotword model of the multiple hotword models associated with a respective one of the additional hotwords in the activated set of additional hotwords that is configured to recognize audio of the respective audio additional hotword (Ganapathiraju et al. [0031] FIG. 1 is a diagram illustrating the basic components in a keyword spotter, 100. The basic components of a keyword spotter 100 may include User Data/Keywords 105, Keyword Model 110, Knowledge Sources 115 which include an Acoustic Model 120 and a Pronunciation Dictionary/Predictor 125, an Audio Stream 130, a Front End Feature Calculator 135, a Recognition Engine (Pattern Matching) 140, and the Reporting of Found Keywords in Real-Time 145, [0032] Keywords may be defined, 105, by the user of the system according to user preference. The keyword model 110 may be formed by concatenating phoneme HMMs. This is further described in the description of FIG. 2. The Keyword Model, 110, may be composed based on the keywords that are defined by the user and the input to the keyword model based on Knowledge Sources, 115. Such knowledge sources may include an Acoustic Model, 120, and a Pronunciation Dictionary/Predictor, 125, [0037] The task of the recognition engine may be to take a set of keyword models and search through presented audio stream to find if the words were spoken.)
 	VanBlon et al., Arkko et al. and Ganapathiraju et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of keyword models as taught by Ganapathiraju et al. for the benefit of taking a set of keyword models and search through presented audio stream to find if the words were spoken (Ganapathiraju et al. [0037] The task of the recognition engine may be to take a set of keyword models and search through presented audio stream to find if the words were spoken.)

8.	Claims 7, 17 are rejected under 35 U.S.C.103 as being unpatentable over VanBlon et al. (US 2017/0169817 A1) in view of Arkko et al. (US 2015/0199961 A1), Ganapathiraju et al. (US 2014/0025372 A1) and Hart et al. (US 2014/0249817 A1.)

With respect to Claim 7, VanBlon et al. in view of Arkko et al. and Ganapathiraju et al. teach 
 	further comprising, when the hotword confidence score fails to satisfy the hotword confidence threshold: 
 	determining, by the hotword detector, that the additional audio data corresponding to the second utterance does not include the additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold. The Examiner notes that the method/system in Ganapathiraju et al. discloses that if the computed probability is greater than the predetermined threshold, the system declares that the keyword is present in the audio data. It is therefore construed that if the computed probability is not greater than the predetermined threshold, the system determines that the keyword is not present in the audio data); and 
	VanBlon et al. in view of Arkko et al. fail to explicitly teach
 	bypassing, by the data processing hardware, performing the respective action for controlling the playback of the music from the computing device.  
	However, Hart et al. teach 
bypassing, by the data processing hardware, performing the respective action for controlling the playback of the music from the computing device (Hart et al. [0070] the process 300 may perform the operation if the confidence level is greater than a threshold and otherwise may refrain from the performing the operation. As used herein, a confidence score level denotes any metric for representing a likelihood that a particular user uttered a particular piece of speech, [0010] the first user may request, via a voice command, to begin playing music on the device or on the another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as “stop”, “next song”, “please turn up the volume”, and the like.)
 	VanBlon et al., Arkko et al., Ganapathiraju et al. and Hart et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of extracting the acoustic feature, aligning the keyword model to the best segment of the audio data as taught by Ganapathiraju et al. for the benefit of spotting keywords in the audio data, using teaching of the threshold as taught by Hart et al. for the benefit of refraining from the performing the operation if the confidence level is not greater than the threshold (Hart et al. [0070] the process 300 may perform the operation if the confidence level is greater than a threshold and otherwise may refrain from the performing the operation. As used herein, a confidence score level denotes any metric for representing a likelihood that a particular user uttered a particular piece of speech, [0010] the first user may request, via a voice command, to begin playing music on the device or on the another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as “stop”, “next song”, “please turn up the volume”, and the like.)

With respect to Claim 17, VanBlon et al. in view of Arkko et al. and Ganapathiraju et al. teach 
 	wherein the operations further comprise, when the hotword confidence score fails to satisfy the hotword confidence threshold: 
 	determining, by the hotword detector, that the additional audio data corresponding to the second utterance does not include the additional hotword (Ganapathiraju et al. [0008] a computer-implemented method for spotting predetermined keywords in an audio stream is disclosed, comprising the steps of : a) developing a keyword model for the predetermined keywords: b) comparing the keyword model and the audio stream to spot probable ones of the predetermined keywords; c) computing a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model; d) comparing the computed probability to a predetermined threshold; e) declaring a potential spotted word if the computed probability is greater than the predetermined threshold. The Examiner notes that the method/system in Ganapathiraju et al. discloses that if the computed probability is greater than the predetermined threshold, the system declares that the keyword is present in the audio data. It is therefore construed that if the computed probability is not greater than the predetermined threshold, the system determines that the keyword is not present in the audio data); and 
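	VanBlon et al. in view of Arkko et al. fail to explicitly teach
 	bypassing performing the respective action for controlling the playback of the music from the computing device.  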
However, Hart et al. teach
bypassing performing the respective action for controlling the playback of the music from the computing device (Hart et al. [0070] the process 300 may perform the operation if the confidence level is greater than a threshold and otherwise may refrain from the performing the operation. As used herein, a confidence score level denotes any metric for representing a likelihood that a particular user uttered a particular piece of speech, [0010] the first user may request, via a voice command, to begin playing music on the device or on the another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as “stop”, “next song”, “please turn up the volume”, and the like.)
  	VanBlon et al., Arkko et al. and Hart et al. are analogous art because they are from a similar field of endeavor in the Speech Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of extending the period of voice recognition as taught by VanBlon et al., using teaching of activating a set of keywords based upon the type or current status of the activated application as taught by Arkko et al. for the benefit of spotting keyword in the user’s utterance, using teaching of the threshold as taught by Hart et al. for the benefit of refraining from the performing the operation if the confidence level is not greater than the threshold (Hart et al. [0070] the process 300 may perform the operation if the confidence level is greater than a threshold and otherwise may refrain from the performing the operation. As used herein, a confidence score level denotes any metric for representing a likelihood that a particular user uttered a particular piece of speech, [0010] the first user may request, via a voice command, to begin playing music on the device or on the another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as “stop”, “next song”, “please turn up the volume”, and the like.)
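
For illustration only, the behavior recited in Claims 7 and 17 (perform the playback-control action when the score exceeds the threshold, otherwise bypass it) can be sketched as follows. This is a minimal sketch, not the implementation of the application or of Hart et al.: the `handle_candidate` function, the action table, the returned strings and the 0.8 threshold are hypothetical and introduced here only to show the control flow.

```python
from typing import Callable, Dict, Optional


def handle_candidate(hotword: str, score: float,
                     actions: Dict[str, Callable[[], str]],
                     threshold: float = 0.8) -> Optional[str]:
    """Run the action mapped to the hotword only if the score is greater than
    the threshold; otherwise bypass the action and return None."""
    if score <= threshold:
        return None  # bypass: no playback-control action is performed
    action = actions.get(hotword)
    return action() if action is not None else None


# Usage: hypothetical playback-control actions keyed by additional hotword.
playback_actions = {
    "next": lambda: "skipped to next track",
    "pause": lambda: "playback paused",
}
performed = handle_candidate("next", 0.93, playback_actions)
bypassed = handle_candidate("next", 0.41, playback_actions)
```

Returning None on the bypass path keeps the two outcomes distinguishable to the caller without raising an error for an ordinary low-confidence utterance.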

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892. 

b. 	Liu et al. (US 2015/0162002 A1.) In this reference, Liu et al. disclose a method/a system for analyzing an audio stream in response to detecting a keyword. 
c. 	Kim et al. (US 2014/0337031 A1.) In this reference, Kim et al. disclose a method/a system for detecting a target keyword by activating a function in an electronic device. 

10. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655