DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application is being examined under the pre-AIA first to invent provisions. 

Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/19/2021 has been entered.

Response to Arguments
3.	Applicant’s arguments with respect to claim(s) 29 and 41 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Double Patenting
4.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

5.	Claims 29, 32, 36, 41 and 46 of the pending application 16/443160 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 5, 19, 5, 5, 5 of the issued patent US 9704486 B2 in view of Murthi et al. (US 2013/0132095 A1) respectively. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the pending application are similar in scope in comparison to the issued patent in view of Murthi et al. 
 	Claim 5 of the issued patent does not teach the following limitation as recited in claim 29 of the pending application. More specifically, 
 	a second set of one or more processors, configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword; and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data.  
 	However, Murthi et al. teach 
 	a second set of one or more processors (Murthi et al. Fig. 6 element 454 Voice Recognition Engine), configured to: 
determine that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.); and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)
 Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)
 	Claim 5 of the issued patent does not teach the following limitation as recited in claim 41 of the pending application. More specifically, 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword. 
 	However, Murthi et al. teach 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

Pending Application 16/443160
Issued Patent US 9,704,486 B2
  29. (Currently amended) A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone; 
 	a first set of one or more processors, configured to: 
 	determine that the audio data likely comprises data representing voice activity; and 
 	determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword; and 
 	a second set of one or more processors, configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword; and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data.  


receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 

activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: 
determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; 
performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device. 



 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone; 
 	wherein the first set of one or more processors is further configured to: determine that the second audio data likely comprises data representing voice activity; and 
 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword; and wherein the second set of one or more processors is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword; 
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance; and 

receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: 

performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device. 

 	under control of a computing system comprising a plurality of processors, 
 		receiving audio data representing sound detected by a microphone; 
 	 	determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity; 
 		in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality processors, that the audio data likely comprises data representing a designated keyword; and
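 	  	in response to determining that the audio data likely comprises data representing the designated keyword, performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results; and 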
 	 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  


receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; 
performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device.

 	receiving second audio data representing sound detected by the microphone; 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity; 
 	in response to determining that the second audio data likely comprises data representing voice activity, determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 
 	determining, by the second subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 
 	sending at least a portion of the second audio data to a remote computing system; and
 	receiving speech recognition results from the remote computing system.  

receiving an audio input; 

a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; 

causing transmission of at least a portion of the audio input to a second computing device.


 	This is a non-provisional nonstatutory double patenting rejection because the patentably indistinct claims have in fact been patented.

6.	Claims 29, 35, 36-38, 41, 46 of the pending application 16/443160 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 1, 5-7, 7 of the issued patent US 10,325,598 B2 in view of Murthi et al. (US 2013/0132095 A1) respectively. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the pending application are similar in scope in comparison to the issued patent in view of Murthi et al. 
 	Claim 1 of the issued patent does not teach the following limitation as recited in claim 29 of the pending application. More specifically, 
 	a second set of one or more processors, configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword; and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data.  
 	However, Murthi et al. teach 
 	a second set of one or more processors (Murthi et al. Fig. 6 element 454 Voice Recognition Engine), configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.); and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)
 Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)
 	Claim 7 of the issued patent does not teach the following limitation as recited in claim 41 of the pending application. More specifically, 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword. 
 	However, Murthi et al. teach 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

Pending Application 16/443160
Issued Patent US 10,325,598 B2
29. (Currently amended) A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone; 
 	a first set of one or more processors, configured to: 
 	determine that the audio data likely comprises data representing voice activity; and 
 	determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword; and 
 	a second set of one or more processors, configured to: 
determine that the audio data likely does not comprise data representing the designated keyword; and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data.  


 a network interface component;
an audio input component configured to receive an audio input; and 
one or more processors configured to: 
 	determine that an energy level of the audio input satisfies a threshold; 
 	determine, in response to determining that the energy level satisfies the threshold, that the audio input likely comprises data representing an utterance; 
 	determine, in response to determining that the audio input likely comprises data representing the utterance, that the audio input likely comprises data representing a wakeword indicative of device-directed speech; and 
 	cause transmission of the audio input by the network interface component in response to determining that the audio input likely comprises data representing the wakeword; 
 wherein the network interface component is configured to: 
 	transmit the audio input to a remote computing system; 
 	receive speech recognition results from the remote computing system; 
 	receive confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
 	transmit a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
 	receive subsequent speech recognition results from the remote computing system. 



 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone; 
 	wherein the first set of one or more processors is further configured to: determine that the second audio data likely comprises data representing voice activity; and 
 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword; and wherein the second set of one or more processors is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword; 
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance; and receive speech recognition results from the remote computing system.  

 a network interface component;
an audio input component configured to receive an audio input; and 
one or more processors configured to: 
 	determine that an energy level of the audio input satisfies a threshold; 
 	determine, in response to determining that the energy level satisfies the threshold, that the audio input likely comprises data representing an utterance; 
 	determine, in response to determining that the audio input likely comprises data representing the utterance, that the audio input likely comprises data representing a wakeword indicative of device-directed speech; and 

 wherein the network interface component is configured to: 
 	transmit the audio input to a remote computing system; 
 	receive speech recognition results from the remote computing system; 
 	receive confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
 	transmit a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
 	receive subsequent speech recognition results from the remote computing system. 
third audio data representing an audio output, and wherein the second set of one or more processors is further configured 



6. The system of claim 1, wherein the one or more processors are further configured to determine a response to the utterance using the speech recognition results, wherein the speech recognition results comprise a transcription of the utterance. 
41. (Currently amended) A computer-implemented method comprising: 
 	under control of a computing system comprising a plurality of processors, 
 		receiving audio data representing sound detected by a microphone; 
 	 	determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity; 
 		in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality processors, that the audio data likely comprises data representing a designated keyword; and
 	  	in response to determining that the audio data likely comprises data representing the designated keyword, performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results; and 
 	 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  


under control of a computing system configured to execute specific computer-executable instructions, receiving an audio input; 
determining that an energy level of the audio input satisfies a threshold; 
in response to determining that the energy level satisfies the threshold, determining that the audio input likely comprises data representing an utterance; 
in response to determining that audio input likely comprises data representing the utterance, determining that the audio input likely comprises data representing a wakeword indicative of device-directed speech; 
in response to determining that the audio input likely comprises data representing the wakeword, transmitting the audio input to a remote computing system; 
receiving speech recognition results from the remote computing system; 
receiving confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
transmitting a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
receiving subsequent speech recognition results from the remote computing system. 

 	receiving second audio data representing sound detected by the microphone; 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity; 
in response to determining that the second audio data likely comprises data representing voice activity, determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 
 	determining, by the second subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 
 	sending at least a portion of the second audio data to a remote computing system; and
 	receiving speech recognition results from the remote computing system.  

under control of a computing system configured to execute specific computer-executable instructions, receiving an audio input; 
determining that an energy level of the audio input satisfies a threshold; 

in response to determining that audio input likely comprises data representing the utterance, determining that the audio input likely comprises data representing a wakeword indicative of device-directed speech; 
in response to determining that the audio input likely comprises data representing the wakeword, transmitting the audio input to a remote computing system; 
receiving speech recognition results from the remote computing system; 
receiving confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
transmitting a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
receiving subsequent speech recognition results from the remote computing system. 




Claim Rejections - 35 USC § 103
7.	The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

8.	Claims 29, 30, 32, 39, 41, 43, 49, 50, 51, 53 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (US 5,983,186) and Murthi et al. (US 2013/0132095 A1).

	With respect to Claim 29, Miyazawa et al. disclose
 	A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone (Miyazawa et al. col. 6 line 33 sound signal capture unit (here a microphone)); 
 	a first set of one or more processors (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode), configured to: 
 	determine that the audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); and 
 	determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); and 
	Miyazawa et al. fail to explicitly teach 
 	a second set of one or more processors, configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword; and 
 generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data.  
	However, Murthi et al. teach 
 	a second set of one or more processors (Murthi et al. Fig. 6 element 454 Voice Recognition Engine), configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.); and 
generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, an instruction regarding the audio data (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)
Miyazawa et al. and Murthi et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al., for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)
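
For illustration only, the combined flow mapped above (power detection against threshold th1 and a first keyword check per Miyazawa et al., followed by confirmation and a revert-to-standby signal per Murthi et al.) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from either reference: the names signal_power, WakeDecision and two_stage_wake, the instruction labels, and the two matcher callbacks are hypothetical stand-ins for the recognition techniques the references describe.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = Sequence[float]


def signal_power(samples: Frame) -> float:
    """Mean squared amplitude of the samples (0.0 for empty input)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


@dataclass
class WakeDecision:
    voice_activity: bool     # first stage: power exceeded th1
    keyword_candidate: bool  # first stage: keyword check passed
    confirmed: bool          # second stage: confirmation result
    instruction: str         # hypothetical label for the resulting action


def two_stage_wake(
    samples: Frame,
    th1: float,
    first_stage_match: Callable[[Frame], bool],
    second_stage_confirm: Callable[[Frame], bool],
) -> WakeDecision:
    """Run the low-power checks first; run the confirmation only if they pass."""
    # Stage 1a: power must be greater than the preset threshold th1.
    if signal_power(samples) <= th1:
        return WakeDecision(False, False, False, "stay_asleep")
    # Stage 1b: lightweight keyword check (hypothetical callback).
    if not first_stage_match(samples):
        return WakeDecision(True, False, False, "stay_asleep")
    # Stage 2: richer confirmation after activation (hypothetical callback).
    if second_stage_confirm(samples):
        return WakeDecision(True, True, True, "remain_active")
    # Negative confirmation produces an instruction to revert to standby.
    return WakeDecision(True, True, False, "revert_to_standby")
```

In this sketch the second-stage callback is never invoked unless both first-stage checks succeed, which mirrors the ordering relied upon in the rejection.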

 	With respect to Claim 30, Miyazawa et al. in view of Murthi et al. disclose 
 	wherein the designated keyword comprises a wakeword indicative of device-directed speech (Murthi et al. [0067] The activation phrase may be a simple two-word phrase such as “activate Microsoft Corporation,” the activation phrase may for example be “Xbox on.”)

 	With respect to Claim 32, Miyazawa et al. in view of Murthi et al. disclose 
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine that an energy level represented by the audio data is equal to or greater than an energy level threshold (Miyazawa et al. col. 4 lines 36-50 input signal power detection may be used for ambient noise feedback purposes to enable a speech recognition mechanism to take into account perceived noise levels in formulating the volume of response message and other audible functions. In so doing, may set an initial threshold for eliminating noise, and perform power detection for a specified duration of time using this threshold as the reference...is greater than the steady noise level.)

 	With respect to Claim 39, Miyazawa et al. in view of Murthi et al. teach 
if the input sound signal is the keyword “Good morning,” the device switches to the active mode at this point and performs speech comprehension interaction control processing for the input speech signal (step s9), col. 13 lines 25-27 the device will response in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)
	
 	With respect to Claim 41, Miyazawa et al. disclose
 	A computer-implemented method comprising: 
 	under control of a computing system comprising a plurality of processors (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode), 
 		receiving audio data representing sound detected by a microphone (Miyazawa et al. col. 6 line 33 sound signal capture unit (here a microphone)); 
 	 	determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); 
 		in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing a designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); and
 	  in response to determining that the audio data likely comprises data representing the designated keyword, performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results (Miyazawa et al. col. 10 lines 51-55 if the input sound signal is the keyword “Good morning,” the device switches to the active mode at this point and performs speech comprehension interaction control processing for the input speech signal (step s9)); and 
	Miyazawa et al. fail to explicitly teach
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword. 
 	However, Murthi et al. teach 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.)
Miyazawa et al. and Murthi et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al., for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

 	With respect to Claim 43, Miyazawa et al. in view of Murthi et al. disclose 
 wherein determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity is based at least partly on at least one of: an energy level represented by the audio data (Miyazawa et al. col. 4 lines 36-50 input signal power detection may be used for ambient noise feedback purposes to enable a speech recognition mechanism to take into account perceived noise levels in formulating the volume of response message and other audible functions. In so doing, may set an initial threshold for eliminating noise, and perform power detection for a specified duration of time using this threshold as the reference...is greater than the steady noise level), a spectral slope between two frames of the audio data, or a signal-to-noise ratio of the audio data within a spectral band.  

	With respect to Claim 49, Miyazawa et al. in view of Murthi et al. teach 
 further comprising generating an instruction in response to the determining, by the second subset of the plurality of processors, that the second audio data likely does not comprise data representing the designated keyword, wherein the instruction comprises at least one of: a first instruction to stop processing of the audio data, a second instruction to stop transmission of the audio data, or a third instruction to deactivate the second subset of the plurality of processors (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)

	With respect to Claim 50, Miyazawa et al. in view of Murthi et al. teach 
 	further comprising: 
 	activating at least one processor of the second subset of the plurality of processors in response to the determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Murthi et al. Fig. 6 steps 440, 450, Pattern match? Yes, Activate Device); and
a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase.)

 	With respect to Claim 51, Miyazawa et al. in view of Murthi et al. teach 
 	wherein the instruction regarding the audio data comprises an instruction to stop processing of the audio data (Murthi et al. Fig. 6 steps 454, 456 Confirmation by Voice Recognition Engine, No, Revert to Standby Mode, [0087] a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. The Examiner notes that, when the voice recognition engine does not confirm the activation phrase, the device reverts to standby mode, which implies that the processing of the audio data is stopped.)

 	With respect to Claim 53, Miyazawa et al. in view of Murthi et al. teach 
 	wherein the second set of one or more processors are deactivated in response to the second set of one or more processors determining that the audio data likely does not comprise data representing the designated keyword (Murthi et al. Fig. 6 steps 454, 456 Confirmation by Voice Recognition Engine, No, Revert to Standby Mode, [0087] a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase.)

9.	 Claim 33 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Murthi et al. (US 2013/0132095 A1) as applied to claim 29 above, and further in view of Kim et al. (US 2010/0110834 A1).

With respect to Claim 33, Miyazawa et al. in view of Murthi et al. teach all the limitations of Claim 29 upon which Claim 33 depends. Miyazawa et al. in view of Murthi et al. fail to explicitly teach
 wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine at 
However, Kim et al. teach 
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine at least one of a spectral slope between two frames of the audio data or a signal-to-noise ratio of the audio data within a spectral band (Kim et al. [0005] the VAD (Voice activity detection) detects the presence and/or absence of voice signals using magnitude values of the frequency spectrums of input signal, such as energy of voice signals, Zero Crossing Rate (ZCR), Level Crossing Rate (LCR), Signal to Noise Ratio (SNR), the statistical distribution of frequency components, etc.)
Miyazawa et al., Murthi et al. and Kim et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of SNR as taught by Kim et al. for the benefit of detecting the presence of a voice signal (Kim et al. [0005] the VAD (Voice activity detection) detects the presence and/or absence of voice signals using magnitude values of the frequency spectrums of input signal, such as energy of voice signals, Zero Crossing Rate (ZCR), Level Crossing Rate (LCR), Signal to Noise Ratio (SNR), the statistical distribution of frequency components, etc.)
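
For illustration only, three of the voice activity measures listed in Kim et al. [0005] (signal energy, Zero Crossing Rate and Signal to Noise Ratio) can be computed per frame as sketched below. This is a minimal sketch under stated assumptions and is not code from Kim et al.: the function names, the 6 dB default threshold, and the externally supplied noise_energy estimate are hypothetical, and the SNR here is computed over the full band rather than within a spectral band, which would additionally require a filter bank or transform that is not shown.

```python
import math
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def snr_db(frame: Sequence[float], noise_energy: float) -> float:
    """Full-band SNR in dB relative to a caller-supplied noise energy estimate."""
    if noise_energy <= 0:
        raise ValueError("noise_energy must be positive")
    energy = frame_energy(frame)
    if energy == 0.0:
        return float("-inf")  # a silent frame has no measurable signal
    return 10.0 * math.log10(energy / noise_energy)


def is_voice(
    frame: Sequence[float],
    noise_energy: float,
    snr_threshold_db: float = 6.0,  # hypothetical threshold
) -> bool:
    """Declare voice activity when the frame SNR meets the threshold."""
    return snr_db(frame, noise_energy) >= snr_threshold_db
```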

10.	 Claims 34 and 44 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Murthi et al. (US 2013/0132095 A1) as applied to claims 29 and 41 above, and further in view of Zak (US 2005/0209858 A1).

With respect to Claim 34, Miyazawa et al. in view of Murthi et al. teach all the limitations of Claim 29 upon which Claim 34 depends. Miyazawa et al. in view of Murthi et al. fail to explicitly teach
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity comprises a first digital signal 
However, Zak teaches
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity comprises a first digital signal processor, and wherein the first digital signal processor is further configured to activate a second digital processor in response to determining that the audio data likely comprises data representing the voice activity (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54. [0028] VRE 58 compares the encoded speech to a plurality of predetermined voice commands stored in memory 64. VRE 58 may recognize a limited vocabulary or may be more sophisticated as desired. The Examiner notes that the VAD, SPE and VRE are all implemented by the digital signal processor.)
Miyazawa et al., Murthi et al. and Zak are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of the VAD as taught by Zak for the benefit of enabling/disabling the SPE in accordance with the voice activity/inactivity indication output by the VAD (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54.)

With respect to Claim 44, Miyazawa et al. in view of Murthi et al. teach all the limitations of Claim 41 upon which Claim 44 depends. Miyazawa et al. in view of Murthi et al. fail to explicitly teach
 	further comprising activating, by the first subset of the plurality of processors, a digital signal processor in response to determining that the audio data likely comprises data representing the voice activity, wherein the first subset of the plurality of processors comprises the digital signal processor, and wherein the determining that the audio data likely comprises data representing the designated keyword is performed using the digital signal processor.  
 	However, Zak teaches
further comprising activating, by the first subset of the plurality of processors, a digital signal processor in response to determining that the audio data likely comprises data representing the voice activity, wherein the first subset of the plurality of processors comprises the digital signal processor, and wherein the determining that the audio data likely comprises data representing the designated keyword is performed using the digital signal processor (Zak [0024] memory 64 may store predetermined keywords or voice command recognized by speech processor 60, [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54. [0028] VRE 58 compares the encoded speech to a plurality of predetermined voice commands stored in memory 64. VRE 58 may recognize a limited vocabulary or may be more sophisticated as desired. The Examiner notes that the VAD, SPE and VRE are all implemented by the digital signal processor.)
Miyazawa et al., Murthi et al. and Zak are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of the VAD as taught by Zak for the benefit of enabling/disabling the SPE in accordance with the voice activity/inactivity indication output by the VAD (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54.)

11.	 Claim 35 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Murthi et al. (US 2013/0132095 A1) as applied to claim 29 above, and further in view of Weng et al. (US 2013/0173268 A1). 

With respect to Claim 35, Miyazawa et al. in view of Murthi et al. teach all the limitations of Claim 29 upon which Claim 35 depends. Miyazawa et al. in view of Murthi et al. fail to explicitly teach
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing a designated keyword comprises a digital signal processor, and wherein the digital signal processor is further configured to activate a microprocessor in response to determining that the audio data likely comprises data representing the designated keyword.
However, Weng et al. teach 
 	wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing a designated keyword comprises a digital signal processor, and wherein the digital signal processor is further configured to activate a microprocessor in response to determining that the audio data likely comprises data representing the designated keyword (Weng et al. [0018] The audio data processor 112 compares the generated utterance data to predetermined utterance data 134 in the memory 128 that corresponds to one or more trigger phrases. If the generated utterance data correspond to the utterance data of the predetermined trigger phrase, the controller 134 activates other components in the telemedical device 100, including a speaker verification module, [0028] one or both of the audio data processor 112 and speaker verification module 116 include specialized processing devices such as digital signal processors (DSPs). The Examiner notes that a DSP is a particular type of microprocessor.)
Miyazawa et al., Murthi et al. and Weng et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of the audio data processor as taught by Weng et al. for the benefit of activating other components (Weng et al. [0018] The audio data processor 112 compares the generated utterance data to predetermined utterance data 134 in the memory 128 that corresponds to one or more trigger phrases. If the generated utterance data correspond to the utterance data of the predetermined trigger phrase, the controller 134 activates other components in the telemedical device 100, including a speaker verification module, [0028] one or both of the audio data processor 112 and speaker verification module 116 include specialized processing devices such as digital signal processors (DSPs).)

12.	 Claims 36, 37, 38, 46, 47 and 52 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Murthi et al. (US 2013/0132095 A1) as applied to claims 29 and 41 above, and further in view of Soemo et al. (US 2013/0060571 A1). 

With respect to Claim 36, Miyazawa et al. in view of Murthi et al. teach 
 	further comprising a network interface component (Murthi et al. [0059] a router), 
 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone (Miyazawa et al. col. 6 lines 43-48 The sound signal input from the microphone is first passed through the amplifier and the lowpass filter and converted into an appropriate sound waveform. This waveform is converted into a digital signal (e.g., 12 KHz, 16 bits) by the A/D converter, which is then sent to sound signal analyzer 2); 
 	wherein the first set of one or more processors (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode) is further configured to: 
 	determine that the second audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); and 
 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); and 
 	wherein the second set of one or more processors (Murthi et al. Fig. 6 element 454 Voice Recognition Engine) is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword (Murthi et al. [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as ; 
	Miyazawa et al. in view of Murthi et al. fail to explicitly teach
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance; and 
 	receive speech recognition results from the remote computing system. 
	However, Soemo et al. teach
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance (Soemo et al. [0003] The method further includes performing local speech recognition on each of the one or more audio recordings including detecting a first utterance and detecting one or more keywords within the first utterance. The method further includes transmitting the first utterance and the one or more keywords to a second computing device); and 
 	receive speech recognition results from the remote computing system (Soemo et al. [0003] receiving a first response from the second computing device based on the first utterance.)
  	Miyazawa et al., Murthi et al. and Soemo et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of integrating local speech recognition with cloud-based speech recognition as taught by Soemo et al. for the benefit of providing an efficient natural user interface (Soemo et al. [0002] integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface.)

	With respect to Claim 37, Miyazawa et al. in view of Murthi et al. and Soemo et al. teach
third audio data representing an audio output, and wherein the second set of one or more processors is further configured to cause the speaker to present the audio output (Miyazawa et al. col. 13 lines 25-27 the device will response in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)

 	With respect to Claim 38, Miyazawa et al. in view of Murthi et al. and Soemo et al. teach 
 	comprise text data representing the utterance, and wherein the second set of one or more processors is further configured to determine an audio response to the utterance using the speech recognition results (Soemo et al. [0039] the one or more servers may return text associated with the one or more words.)

With respect to Claim 46, Miyazawa et al. in view of Murthi et al. teach 
 	further comprising:
 	receiving second audio data representing sound detected by the microphone (Miyazawa et al. col. 6 lines 43-48 The sound signal input from the microphone is first passed through the amplifier and the lowpass filter and converted into an appropriate sound waveform. This waveform is converted into a digital signal (e.g., 12 KHz, 16 bits) by the A/D converter, which is then sent to sound signal analyzer 2); 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode, col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); 
in response to determining that the second audio data likely comprises data representing voice activity, determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); 
 	determining, by the second subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Murthi et al. Fig. 6 element 454 Voice Recognition Engine, [0087] As noted above, a rich voice recognition engine may not operate on the sparse power available in standby mode. However, once the computing system 12 is activated by the standby activation unit 464 as described above, a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase. If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. The Examiner notes that as disclosed in steps 436, 440 and 450 in Fig. 6 of Murthi et al., the method in Murthi et al. determines that the audio data likely comprises data representing a designated keyword by matching the received audio data with the stored activation phrase(s) before doing the confirmation process. See more at paragraphs [0078, 0081 and 0082] of Murthi et al.); 
 	Miyazawa et al. in view of Murthi et al. fail to explicitly teach
 	sending at least a portion of the second audio data to a remote computing system; and
 	receiving speech recognition results from the remote computing system.  
	However, Soemo et al. teach
 	sending at least a portion of the second audio data to a remote computing system (Soemo et al. [0003] The method further includes performing local speech recognition on each of the one or more audio recordings including detecting a first utterance and detecting one or more keywords within the first utterance. The method further includes transmitting the first utterance and the one or more keywords to a second computing device); and 
 	receiving speech recognition results from the remote computing system (Soemo et al. receiving a first response from the second computing device based on the first utterance.)
  	Miyazawa et al., Murthi et al. and Soemo et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty, and using the teaching of integrating local speech recognition with cloud-based speech recognition as taught by Soemo et al. for the benefit of providing an efficient natural user interface (Soemo et al. [0002] integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface.)
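
For illustration only, the staged flow described in the combination above (a power-threshold check, a low-power keyword check, a confirmation by a second recognizer, and transmission to a remote system) may be sketched as follows. The sketch is not taken from Miyazawa et al., Murthi et al. or Soemo et al.; the class name, the callables and the threshold value are hypothetical placeholders, and mean squared amplitude is assumed as the power measure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class WakePipeline:
    """Hypothetical staged wake-up flow; every name here is an assumption."""
    power_threshold: float                                # stand-in for a preset threshold such as th1
    low_power_keyword: Callable[[Sequence[float]], bool]  # first-stage keyword check
    confirm_keyword: Callable[[Sequence[float]], bool]    # second-stage confirmation
    send_remote: Callable[[Sequence[float]], str]         # returns remote recognition results

    def process(self, samples: Sequence[float]) -> Optional[str]:
        if not samples:
            return None
        # Stage 1: treat the input as sound activity only if its power exceeds the threshold.
        power = sum(s * s for s in samples) / len(samples)
        if power <= self.power_threshold:
            return None
        # Stage 2: low-power keyword check, run only after activity is detected.
        if not self.low_power_keyword(samples):
            return None
        # Stage 3: confirmation by a second recognizer before staying active.
        if not self.confirm_keyword(samples):
            return None
        # Stage 4: send the audio to the remote system and return its results.
        return self.send_remote(samples)
```

Each stage runs only if the previous one succeeds, so the remote system is contacted only after both keyword checks pass.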

 	With respect to Claim 47, Miyazawa et al. in view of Murthi et al. and Soemo et al. teach
 	further comprising presenting audio output using the speech recognition results, wherein the speech recognition results comprise third audio data representing the audio output (Miyazawa et al. col. 13 lines 25-27 the device will respond in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)

 	With respect to Claim 52, Miyazawa et al. in view of Murthi et al. fail to explicitly teach
further comprising a network interface, wherein the instruction regarding the audio data comprises an instruction to stop transmission of the audio data by the network interface.  
However, Soemo et al. teach 
further comprising a network interface, wherein the instruction regarding the audio data comprises an instruction to stop transmission of the audio data by the network interface (Soemo et al. [0003] The method further includes performing local speech recognition on each of the one or more audio recordings including detecting a first utterance and detecting one or more keywords within the first utterance. The method further includes transmitting the first utterance and the one or more keywords to a second computing device, [0060] if no keywords within an utterance are detected, then subsequent cloud-based speech processing is not performed, Fig. 5A step 513, Have one or more keywords been detected within the first utterance? No. The Examiner notes that when the answer at step 513 is No, the method returns to step 510 without proceeding to the transmitting step, so the transmission stops.)
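
For illustration only, the keyword gate on transmission discussed above may be sketched as follows. The sketch is not taken from Soemo et al.; the function name and the whole-word matching rule are hypothetical simplifications of the keyword decision at step 513.

```python
from typing import Iterable, Iterator, List, Tuple


def utterances_to_transmit(
    utterances: Iterable[str], keywords: Iterable[str]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (utterance, detected keywords) only when a keyword is detected.

    Utterances without any keyword are skipped, so nothing is transmitted for them.
    """
    keyword_list = [k.lower() for k in keywords]
    for utterance in utterances:
        words = utterance.lower().split()
        found = [k for k in keyword_list if k in words]
        if found:
            yield utterance, found
```

An utterance containing no keyword produces no output, which corresponds to the flow returning to listening instead of transmitting.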

Conclusion 
13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 





/THUYKHANH LE/Primary Examiner, Art Unit 2658