DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application is being examined under the pre-AIA first to invent provisions.

Response to Amendments/Arguments
2.	With respect to Claims 29 and 41, Examiner notes that amended claims 29 and 41 are not the same as the amendments to claims 29 and 41 proposed in the interview on 06/02/2021. 
	With respect to the 102/103 rejections, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
3.	Claim 29 (the one at the end of the claim set) is objected to because of the following informality: the claim is misnumbered and should be numbered as Claim 56. For compact prosecution, Examiner interprets this claim as Claim 56. Appropriate correction is required.

Double Patenting
4.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

5.	Claims 29, 32, 36, 41 and 46 of the pending application 16/443160 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 5, 19, 5, 5, 5 of the issued patent US 9704486 B2 in view of Shenhav (US 2014/0006825 A1) respectively. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the pending application are similar in scope in comparison to the issued patent in view of Shenhav et al. 
 	Claim 5 of the issued patent does not teach the following limitation as recited in claim 29 of the pending application. More specifically, 
 	a second set of one or more processors, configured to: 
 		determine that the audio data likely does not comprise data representing the designated keyword; and 
 		generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data.  
	However, Shenhav teaches
a second set of one or more processors (Shenhav [0014] If the electronic device determines a relatively high and/or high enough likelihood that the sound signal may be representative of a wake-up phase, then the electronic device may transmit the sound signal to a remote server, such as a recognition server, to further analyze the sound signal and determine of whether the sound signal is indeed representative of wake-up phrase), configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512 [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry); and 
 	generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
 Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Shenhav for the benefit of determining whether or not to transmit the wake-up signal (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)

 	Claim 5 of the issued patent does not teach the following limitation as recited in claim 41 of the pending application. More specifically, 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword. 
 	However, Shenhav teaches 
determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Shenhav for the benefit of verifying whether the sound is indeed representative of one or more wake-up phrases (Shenhav [0016] If it is determined that the sound may be indicative of one or more wake-up phrases, then the electronic device may transmit a signal representative of the sound to the recognition server for further verification of whether the sound is indeed representative of one or more wake-up phrases. The recognition server may conduct this verification using computing and analysis resources, which in certain embodiments, may exceed the computing bandwidth of the relatively lower bandwidth processors of the electronic device.)
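
For illustration only, the two-stage determination discussed above (a first set of processors that checks for voice activity and a likely keyword, followed by a second set that may determine the keyword is likely absent and generate information related to a determination not to transmit) can be sketched as follows. This is a minimal sketch under stated assumptions: the names `first_stage`, `second_stage`, `process`, `Decision`, the thresholds, and the scoring callbacks are hypothetical and are not taken from the claims, the issued patent, or Shenhav, and the mean-energy check is only an assumed stand-in for whatever voice activity detection is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = Sequence[float]
Scorer = Callable[[Sequence[Frame]], float]  # hypothetical: returns a keyword likelihood in [0, 1]


@dataclass
class Decision:
    transmit: bool  # whether the audio data would be sent onward
    reason: str     # information related to the determination


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame (assumed stand-in for voice activity detection)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def first_stage(frames: Sequence[Frame], energy_threshold: float,
                keyword_score: Scorer, keyword_threshold: float) -> bool:
    """First set of processors: voice activity check, then a coarse keyword check."""
    voiced = any(frame_energy(f) > energy_threshold for f in frames)
    if not voiced:
        return False
    return keyword_score(frames) >= keyword_threshold


def second_stage(frames: Sequence[Frame], verify_score: Scorer,
                 verify_threshold: float) -> Decision:
    """Second set of processors: re-check the keyword and record the outcome."""
    if verify_score(frames) >= verify_threshold:
        return Decision(True, "keyword confirmed by second stage")
    return Decision(False, "keyword likely absent; audio not transmitted")


def process(frames: Sequence[Frame], energy_threshold: float,
            keyword_score: Scorer, keyword_threshold: float,
            verify_score: Scorer, verify_threshold: float) -> Decision:
    """Run the second stage only when the first stage reports a likely keyword."""
    if not first_stage(frames, energy_threshold, keyword_score, keyword_threshold):
        return Decision(False, "first stage found no voice activity or keyword")
    return second_stage(frames, verify_score, verify_threshold)
```

In this sketch a first-stage detection that the second stage does not confirm yields `Decision(transmit=False, ...)`, which loosely corresponds to the "information related to a determination not to transmit the audio data" limitation; it does not model the server-side logging of Shenhav blocks 506-512.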

Pending Application 16/443160
Issued Patent US 9,704,486 B2
29. (Currently amended) A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone; 
 	a first set of one or more processors, configured to: 
 		determine that the audio data likely comprises data representing voice activity; and 
 		determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword; and 
 	a second set of one or more processors, configured to: 
 		determine that the audio data likely does not comprise data representing the designated keyword; and 
 		generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data.  


receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: 

performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device. 

19. The computer-implemented method of claim 5, further comprising determining that the energy level of the audio input satisfies a threshold, wherein the increasing the sampling rate is performed in response to determining that the energy level of the audio input satisfies the threshold.
36. (Currently amended) The system of claim 29, further comprising a network interface component, 
 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone; 
 	wherein the first set of one or more processors is further configured to: determine that the second audio data likely comprises data representing voice activity; and 

 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword; and wherein the second set of one or more processors is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword; 
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance; and receive speech recognition results from the remote computing system.  


receiving an audio input; 

a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: 
determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; 

causing transmission of at least a portion of the audio input to a second computing device. 

41. (Currently amended) A computer-implemented method comprising: 
 	under control of a computing system comprising a plurality of processors, 
 		receiving audio data representing sound detected by a microphone; 
 		determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based at least partly on one of: a difference between two or more frames of the audio data; a classification model; or a state model; 
 	in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality processors, that the audio data likely comprises data representing a designated keyword; and 
 	in response to determining that the audio data likely comprises data representing the designated keyword: 
 		performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results; and 
 		determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  
  

receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: 
performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device.

 	receiving second audio data representing sound detected by the microphone; 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity; 
 	in response to determining that the second audio data likely comprises data representing voice activity, determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 

 	sending at least a portion of the second audio data to a remote computing system; and
 	receiving speech recognition results from the remote computing system.  


receiving an audio input; 
determining one or more values from the audio input, wherein the one or more values comprise at least one of: 
a first value indicating an energy level of the audio input; or 
a second value indicating a likelihood that the audio input comprises speech; 
increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; 
activating a first module of the first computing device based at least in part on the one or more values; 
performing an operation, by the first module, wherein the operation comprises at least one of: determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; 
performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or 
causing transmission of at least a portion of the audio input to a second computing device.


 	This is a non-provisional nonstatutory double patenting rejection because the patentably indistinct claims have in fact been patented.

6.	Claims 29, 35, 36-38, 41, 46 of the pending application 16/443160 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 1, 5-7, 7 of the issued patent US 10,325,598 B2 in view of Shenhav (US 2014/0006825 A1) respectively. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the pending application are similar in scope in comparison to the issued patent in view of Shenhav. 
 	Claim 1 of the issued patent US 10,325,598 B2 does not teach the following limitation as recited in claim 29 of the pending application. More specifically, 
 	a second set of one or more processors, configured to: 
 		determine that the audio data likely does not comprise data representing the designated keyword; and 
 		generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data.  
 	However, Shenhav teaches
a second set of one or more processors (Shenhav [0014] If the electronic device determines a relatively high and/or high enough likelihood that the sound signal may be representative of a wake-up phase, then the electronic device may transmit the sound signal to a remote server, such as a recognition server, to further analyze the sound signal and determine of whether the sound signal is indeed representative of wake-up phrase), configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512 [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry); and 
 	generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
 Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Shenhav for the benefit of determining whether or not to transmit the wake-up signal (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)

 	Claim 7 of the issued patent US 10,325,598 B2 does not teach the following limitation as recited in claim 41 of the pending application. More specifically, 
 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword. 
 	However, Shenhav teaches 
determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the issued patent to incorporate the voice recognition engine as taught by Shenhav for the benefit of verifying whether the sound is indeed representative of one or more wake-up phrases (Shenhav [0016] If it is determined that the sound may be indicative of one or more wake-up phrases, then the electronic device may transmit a signal representative of the sound to the recognition server for further verification of whether the sound is indeed representative of one or more wake-up phrases. The recognition server may conduct this verification using computing and analysis resources, which in certain embodiments, may exceed the computing bandwidth of the relatively lower bandwidth processors of the electronic device.)

Pending Application 16/443160
Issued Patent US 10,325,598 B2
29. (Currently amended) A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone; 
 	a first set of one or more processors, configured to: 
 		determine that the audio data likely comprises data representing voice activity; and 
 		determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword; and 
 	a second set of one or more processors, configured to: 

 		determine that the audio data likely does not comprise data representing the designated keyword; and 
 		generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data.  


 a network interface component;
an audio input component configured to receive an audio input; and 
one or more processors configured to: 
 	determine that an energy level of the audio input satisfies a threshold; 
 	determine, in response to determining that the energy level satisfies the threshold, that the audio input likely comprises data representing an utterance; 
 	determine, in response to determining that the audio input likely comprises data 
 	cause transmission of the audio input by the network interface component in response to determining that the audio input likely comprises data representing the wakeword; 
 wherein the network interface component is configured to: 
 	transmit the audio input to a remote computing system; 
 	receive speech recognition results from the remote computing system; 
 	receive confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
 	transmit a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
 	receive subsequent speech recognition results from the remote computing system. 

4. The system of claim 1, wherein the one or more processors comprise a digital signal processor and a microprocessor, wherein the digital signal processor is configured to activate the microprocessor in response to determining that the audio input likely comprises data representing the utterance, and wherein the microprocessor determines that the audio input likely comprises data representing the wakeword. 
36. (Currently amended) The system of claim 29, further comprising a network interface component, 
 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone; 
 	wherein the first set of one or more processors is further configured to: determine that the second audio data likely comprises data representing voice activity; and 
 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword; and wherein the second set of one or more processors is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword; 

 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance; and receive speech recognition results from the remote computing system.  



 a network interface component;
an audio input component configured to receive an audio input; and 
one or more processors configured to: 
 	determine that an energy level of the audio input satisfies a threshold; 
 	determine, in response to determining that the energy level satisfies the threshold, that the audio input likely comprises data representing an utterance; 
 	determine, in response to determining that the audio input likely comprises data representing the utterance, that the audio input likely comprises data representing a 
 	cause transmission of the audio input by the network interface component in response to determining that the audio input likely comprises data representing the wakeword; 
 wherein the network interface component is configured to: 
 	transmit the audio input to a remote computing system; 
 	receive speech recognition results from the remote computing system; 
 	receive confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
 	transmit a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
 	receive subsequent speech recognition results from the remote computing system. 



6. The system of claim 1, wherein the one or more processors are further configured to determine a response to the utterance using the speech recognition results, wherein the speech recognition results comprise a transcription of the utterance. 
41. (Currently amended) A computer-implemented method comprising: 
 	under control of a computing system comprising a plurality of processors, 
 		receiving audio data representing sound detected by a microphone; 
 		determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based at least partly on one of: a difference between two or more frames of the audio data; a classification model; or a state model; 
 	in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality processors, that the audio data likely comprises data representing a designated keyword; and 
 	in response to determining that the audio data likely comprises data representing the designated keyword: 

 		performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results; and 
 		determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  



under control of a computing system configured to execute specific computer-executable instructions, receiving an audio input; 
determining that an energy level of the audio input satisfies a threshold; 
in response to determining that the energy level satisfies the threshold, determining that the audio input likely comprises data representing an utterance; 
in response to determining that audio input likely comprises data representing the utterance, determining that the audio input likely comprises data representing a wakeword indicative of device-directed speech; 
in response to determining that the audio input likely comprises data representing the wakeword, transmitting the audio input to a remote computing system; 
receiving speech recognition results from the remote computing system; 
receiving confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
transmitting a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
receiving subsequent speech recognition results from the remote computing system. 

 	receiving second audio data representing sound detected by the microphone; 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity; 

 	determining, by the second subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; 
 	sending at least a portion of the second audio data to a remote computing system; and
 	receiving speech recognition results from the remote computing system.  



under control of a computing system configured to execute specific computer-executable instructions, receiving an audio input; 
determining that an energy level of the audio input satisfies a threshold; 
in response to determining that the energy level satisfies the threshold, determining that the audio input likely comprises data representing an utterance; 

in response to determining that audio input likely comprises data representing the utterance, determining that the audio input likely comprises data representing a wakeword indicative of device-directed speech; 
in response to determining that the audio input likely comprises data representing the wakeword, transmitting the audio input to a remote computing system; 
receiving speech recognition results from the remote computing system; 
receiving confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword; 
transmitting a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and 
receiving subsequent speech recognition results from the remote computing system. 




Claim Rejections - 35 USC § 103
7.	The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

8.	 Claims 29, 30, 32, 36, 37 and 39 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1). 
  
 	With respect to Claim 29, Miyazawa et al. disclose
 	A system comprising: 
 	an audio input component comprising a microphone, wherein the audio input component is configured to generate audio data representing sound detected by the microphone (Miyazawa et al. col. 6 lines 33 sound signal capture unit (here a microphone)); 
 	a first set of one or more processors (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode), configured to: 
 	determine that the audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); and 
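 	determine, in response to determining that the audio data likely comprises data representing the voice activity, that the audio data likely comprises data representing a designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); and 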
	Miyazawa et al. fail to explicitly teach  
a second set of one or more processors, configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword; and 
 	generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data.  
 	However, Shenhav teaches
a second set of one or more processors (Shenhav [0014] If the electronic device determines a relatively high and/or high enough likelihood that the sound signal may be representative of a wake-up phase, then the electronic device may transmit the sound signal to a remote server, such as a recognition server, to further analyze the sound signal and determine of whether the sound signal is indeed representative of wake-up phrase), configured to: 
 	determine that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512 [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry); and 
 	generate, in response to determining that the audio data likely does not comprise data representing the designated keyword, information related to a determination not to transmit the audio data (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
Miyazawa et al. and Shenhav are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of wake-up detection as taught by Shenhav, for the benefit of determining whether or not to transmit the wake-up signal (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
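
For illustration only, the combined arrangement mapped above for Claim 29 (a voice-activity check and a first keyword check on a first set of processors, followed by a second keyword check on a second set of processors that records a decision not to transmit) can be sketched as follows. The names `Decision` and `two_stage_gate` and the three detector callables are hypothetical; they are not taken from Miyazawa et al., Shenhav, or the claims, and the sketch is not a representation of any reference's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Audio = Sequence[float]  # hypothetical representation of the audio data


@dataclass
class Decision:
    transmit: bool  # whether the audio data would be sent onward
    reason: str     # information related to the determination


def two_stage_gate(
    audio: Audio,
    has_voice: Callable[[Audio], bool],
    first_stage_keyword: Callable[[Audio], bool],
    second_stage_keyword: Callable[[Audio], bool],
) -> Decision:
    """Run the checks in order and stop at the first one that fails."""
    # First set of processors: voice activity, then a first keyword check.
    if not has_voice(audio):
        return Decision(False, "no voice activity")
    if not first_stage_keyword(audio):
        return Decision(False, "keyword not detected by first stage")
    # Second set of processors: second keyword check on the same audio.
    if not second_stage_keyword(audio):
        return Decision(False, "keyword not confirmed by second stage")
    return Decision(True, "keyword confirmed by second stage")
```

In this sketch the `reason` field stands in for the "information related to a determination not to transmit the audio data"; the detectors are passed in as callables so that no particular detection technique is assumed.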

With respect to Claim 30, Miyazawa et al. in view of Shenhav disclose
wherein the designated keyword comprises a wakeword indicative of device-directed speech (Shenhav [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)

 	With respect to Claim 32, Miyazawa et al. in view of Shenhav disclose 
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine that an energy level represented by the audio data is equal to or greater than an energy level threshold (Miyazawa et al. col. 4 lines 36-50 input signal power detection may be used for ambient noise feedback purposes to enable a speech recognition mechanism to take into account perceived noise levels in formulating the volume of response message and other audible functions. In so doing, may set an initial threshold for eliminating noise, and perform power detection for a specified duration of time using this threshold as the reference...is greater than the steady noise level.)
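
For illustration only, the comparison recited in Claim 32 (an energy level that is equal to or greater than an energy level threshold) can be sketched as follows. The function names, the mean-square energy measure, and the default threshold value are assumptions made for the sketch; they are not taken from Miyazawa et al. or from the claims.

```python
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of one frame of audio samples (assumed measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def has_voice_activity(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """True when the frame energy is equal to or greater than the threshold."""
    return frame_energy(samples) >= threshold
```

A frame whose energy exactly equals the threshold satisfies the comparison, which mirrors the "equal to or greater than" wording of the claim.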

With respect to Claim 36, Miyazawa et al. in view of Shenhav teach
 	further comprising a network interface component, 
 	wherein the audio input component is further configured to generate second audio data representing sound detected by the microphone (Miyazawa col. 6 lines 43-48 The sound signal input from the microphone is first passed through the amplifier and the lowpass filter and converted into an appropriate sound waveform. This waveform is converted into a digital signal (e.g., 12 KHz, 16 bits) by the A/D converter, which is then sent to sound signal analyzer 2); 
 	wherein the first set of one or more processors (Miyazawa et al. col. 3 lines 18-20 Preferably, this power detector includes processing circuitry for forcing the mechanism to selectively enter or terminate a low-power sleep mode) is further configured to: 
 	determine that the second audio data likely comprises data representing voice activity (Miyazawa et al. col. 3 lines 12-20 7) an input sound signal power detector in communication with at least the sound signal input unit and the interaction controller for detecting the volume, magnitude or amplitude of input sound signals based on sound signal waveforms perceived by the sound signal input unit or capture device, col. 10 lines 8-13 Control begins as step s1, as shown in Fig. 4. In step s1, input sound signal power detector 9 determines whether or not the power of the input sound signal is greater than a preset threshold th1, and outputs a signal indicating that a sound signal has been input when the power of the input sound signal becomes greater than threshold th1); and 
 	determine, in response to determining that the second audio data likely comprises data representing second voice activity, that the second audio data likely comprises data representing the designated keyword (Miyazawa et al. Fig. 4 element s6 Is it a keyword? Col. 10 lines 37-41 control passes on to step s6 and accumulated phrase detection data is used to determine whether or not the input sound signal contains a preregistered recognizable keyword using the above-described recognition techniques, col. 10 lines 48-51 if the input sound signal is determined to be a keyword in step s6, control instead passed to step s8 in which sleep mode flag is cleared for shifting the device from the sleep mode to the active mode); and 
 	wherein the second set of one or more processors is further configured to: 
 	determine that the second audio data likely comprises data representing the designated keyword (Shenhav [0014] If the electronic device determines a relatively high and/or high enough likelihood that the sound signal may be representative of a wake-up phase, then the electronic device may transmit the sound signal to a remote server, such as a recognition server, to further analyze the sound signal and determine of whether the sound signal is indeed representative of wake-up phrase); 
 	cause the network interface component to send a transmission of at least a portion of the second audio data to a remote computing system, wherein the portion of the second audio data represents an utterance (Shenhav [0014] If the electronic device determines a relatively high and/or high enough likelihood that the sound signal may be representative of a wake-up phase, then the electronic device may transmit the sound signal to a remote server, such as a recognition server, to further analyze the sound signal and determine of whether the sound signal is indeed representative of wake-up phrase); and 
 	receive speech recognition results from the remote computing system (Shenhav [0015] the recognition server may receive the wake-up inquiry request from the electronic device and extract the sound signal therefrom. The recognition server may then analyze the sound signal using speech and/or voice recognition methods to determine if the sound signal is indicative of one or more wake-up phrase. If the sound signal is indicative of one or more wake-up phrases, then the recognition server may generate and transmit a wake-up signal to the electronic device. The wake-up signal may prompt the electronic device to wake up from the sleep or stand by state to a powered state.)

With respect to Claim 37, Miyazawa et al. in view of Shenhav teach
 	further comprising a speaker, wherein the speech recognition results comprise third audio data representing an audio output, and wherein the second set of one or more processors is further configured to cause the speaker to present the audio output (Miyazawa et al. col. 13 lines 25-27 the device will response in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)

With respect to Claim 39, Miyazawa et al. in view of Shenhav teach
 	wherein the second set of one or more processors is further configured to generate speech recognition results for an utterance represented by the audio data (Shenhav [0015] the recognition server may receive the wake-up inquiry request from the electronic device and extract the sound signal therefrom. The recognition server may then analyze the sound signal using speech and/or voice recognition methods to determine if the sound signal is indicative of one or more wake-up phrase. If the sound signal is indicative of one or more wake-up phrases, then the recognition server may generate and transmit a wake-up signal to the electronic device. The wake-up signal may prompt the electronic device to wake up from the sleep or stand by state to a powered state.)

9.	 Claim 33 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1) and Kim et al. (US 2010/0110834 A1).

With respect to Claim 33, Miyazawa et al. in view of Shenhav teach all the limitations of Claim 29 upon which Claim 33 depends. Miyazawa et al. in view of Shenhav fail to explicitly teach
 wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine at least one of a spectral slope between two frames of the audio data or a signal-to-noise ratio of the audio data within a spectral band.
However, Kim et al. teach 
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity is further configured to determine at least one of a spectral slope between two frames of the audio data or a signal-to-noise ratio of the audio data within a spectral band (Kim et al. [0005] the VAD (Voice activity detection) detects the presence and/or absence of voice signals using magnitude values of the frequency spectrums of input signal, such as energy of voice signals, Zero Crossing Rate (ZCR), Level Crossing Rate (LCR), Signal to Noise Ratio (SNR), the statistical distribution of frequency components, etc.)
Miyazawa et al., Shenhav and Kim et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of wake-up detection as taught by Shenhav for the benefit of determining whether or not to transmit the wake-up signal, and using the teaching of SNR as taught by Kim et al. for the benefit of detecting the presence of a voice signal (Kim et al. [0005] the VAD (Voice activity detection) detects the presence and/or absence of voice signals using magnitude values of the frequency spectrums of input signal, such as energy of voice signals, Zero Crossing Rate (ZCR), Level Crossing Rate (LCR), Signal to Noise Ratio (SNR), the statistical distribution of frequency components, etc.)

10.	 Claim 34 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1) and Zak (US 2005/0209858 A1).

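With respect to Claim 34, Miyazawa et al. in view of Shenhav teach all the limitations of Claim 29 upon which Claim 34 depends. Miyazawa et al. in view of Shenhav fail to explicitly teach
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity comprises a first digital signal processor, and wherein the first digital signal processor is further configured to activate a second digital processor in response to determining that the audio data likely comprises data representing the voice activity.
However, Zak teaches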
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing the voice activity comprises a first digital signal processor, and wherein the first digital signal processor is further configured to activate a second digital processor in response to determining that the audio data likely comprises data representing the voice activity (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54. [0028] VRE 58 compares the encoded speech to a plurality of predetermined voice commands stored in memory 64. VRE 58 may recognize a limited vocabulary or may be more sophisticated as desired. The Examiner notes that the VAD, SPE and VRE are all processed by the digital signal processor.)
Miyazawa et al., Shenhav and Zak are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of wake-up detection as taught by Shenhav for the benefit of determining whether or not to transmit the wake-up signal, and using the teaching of VAD as taught by Zak for the benefit of enabling/disabling the SPE in accordance with the voice activity/inactivity indication output by the VAD (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54.)

11.	 Claim 35 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1) and Weng et al. (US 2013/0173268 A1). 

With respect to Claim 35, Miyazawa et al. in view of Shenhav teach all the limitations of Claim 29 upon which Claim 35 depends. Miyazawa et al. in view of Shenhav fail to explicitly teach
wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing a designated keyword comprises a digital signal processor, and wherein the digital signal processor is further configured to activate a microprocessor in response to determining that the audio data likely comprises data representing the designated keyword.
However, Weng et al. teach 
 	wherein the first set of one or more processors configured to determine that the audio data likely comprises data representing a designated keyword comprises a digital signal processor, and wherein the digital signal processor is further configured to activate a microprocessor in response to determining that the audio data likely comprises data representing the designated keyword (Weng et al. [0018] The audio data processor 112 compares the generated utterance data to predetermined utterance data 134 in the memory 128 that corresponds to one or more trigger phrases. If the generated utterance data correspond to the utterance data of the predetermined trigger phrase, the controller 134 activates other components in the telemedical device 100, including a speaker verification module, [0028] one or both of the audio data processor 112 and speaker verification module 116 include specialized processing devices such as digital signal processors (DSPs). The Examiner notes that the DSP processor is a particular type of microprocessor. In other words, the DSP is a microprocessor).
Miyazawa et al., Shenhav and Weng et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would [...] (Weng et al. [0018] The audio data processor 112 compares the generated utterance data to predetermined utterance data 134 in the memory 128 that corresponds to one or more trigger phrases. If the generated utterance data correspond to the utterance data of the predetermined trigger phrase, the controller 134 activates other components in the telemedical device 100, including a speaker verification module, [0028] one or both of the audio data processor 112 and speaker verification module 116 include specialized processing devices such as digital signal processors (DSPs).)

12.	 Claim 38 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1) and Bennett et al. (US 2007/0094032 A1). 

With respect to Claim 38, Miyazawa et al. in view of Shenhav teach all the limitations of Claim 36 upon which Claim 38 depends. Miyazawa et al. in view of Shenhav fail to explicitly teach
 	wherein the speech recognition results comprise text data representing the utterance, and wherein the second set of one or more processors is further configured to determine an audio response to the utterance using the speech recognition results.  
	However, Bennett et al. teach
wherein the speech recognition results comprise text data representing the utterance, and wherein the second set of one or more processors is further configured to determine an audio response to the utterance using the speech recognition results (Bennett et al. [0042] Once the question is captured, the question is processed partially by NLQS client-side software resident in the client’s machine. The output of this partial processing is a set of speech vectors that are transported to the server via the Internet to complete the recognition of the user’s question. This recognized speech is then converted to text at the server, Fig. 3 elements 258 Receive the “Best” answer from the server, [0057] transmitting speech data for such utterances to a remote server, and receiving appropriate responses back from such server.)
Miyazawa et al., Shenhav and Bennett et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of waking up the device as taught by Miyazawa et al., using the teaching of wake-up detection as taught by Shenhav for the benefit of determining whether or not to transmit the wake-up signal, and using the teaching of the server-side system as taught by Bennett et al. for the benefit of completing the processing of the speech utterance and generating the best answer to the user’s question (Bennett et al. [0042] Once the question is captured, the question is processed partially by NLQS client-side software resident in the client’s machine. The output of this partial processing is a set of speech vectors that are transported to the server via the Internet to complete the recognition of the user’s question. This recognized speech is then converted to text at the server, Fig. 3 elements 258 Receive the “Best” answer from the server, [0057] transmitting speech data for such utterances to a remote server, and receiving appropriate responses back from such server.)

13.	 Claim 53 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Miyazawa et al. (5983186) and Shenhav (US 2014/0006825 A1) and Murthi et al. (US 2013/0132095 A1).

 	With respect to Claim 53, Miyazawa et al. in view of Shenhav teach all the limitations of Claim 29 upon which Claim 53 depends. Miyazawa et al. in view of Shenhav fail to explicitly teach
 	wherein the second set of one or more processors are deactivated in response to the second set of one or more processors determining that the audio data likely does not comprise data representing the designated keyword.  
	However, Murthi et al. teach 
wherein the second set of one or more processors are deactivated in response to the second set of one or more processors determining that the audio data likely does not comprise data representing the designated keyword (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)
The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

14.	 Claims 41 and 46 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1).

	With respect to Claim 41, Cho et al. disclose
 	A computer-implemented method comprising: 
 	under control of a computing system comprising a plurality of processors, 
 		receiving audio data representing sound detected by a microphone (Cho et al. [0027] a microphone); 
 	 	determining, by a first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based at least partly on one of: a difference between two or more frames of the audio data; a classification model; or a state model (Cho et al. [0007] there is provided a method of detecting voice activity, the method being performed in a Continuous Listening environment and comprising: extracting a feature parameter from a frame signal; determining whether the frame signal is a voice signal or a noise signal by comparing the feature parameter with model parameter  of a plurality of comparison signal, respectively; and outputting the frame signal when the frame signal is determined to be a voice signal); 
 		in response to determining that the audio data likely comprises data representing the voice activity, determining, by the first subset of the plurality processors, that the audio data likely comprises data representing a designated keyword (Cho et al. [0038] When an auto activation module (not shown) is positioned between voice activity detection apparatus 10 and the voice recognizer, VAD apparatus 100 detects a section of voice activity from an input signal, and then sends the voice activity section as a detected voice signal to the auto activation module The auto activating module performs speaker/keyword recognition using the input signal transmitted from VAD apparatus 100 and then may transmit only a signal corresponding to a recognized speaker/ keyword to the voice recognizer); and
 	  	in response to determining that the audio data likely comprises data representing the designated keyword, performing, by a second subset of the plurality of processors, speech recognition on at least a portion of the audio data to obtain speech recognition results (Cho et al. [0038] When an auto activation module (not shown) is positioned between voice activity detection apparatus 10 and the voice recognizer, VAD apparatus 100 detects a section of voice activity from an input signal, and then sends the voice activity section as a detected voice signal to the auto activation module The auto activating module performs speaker/keyword recognition using the input signal transmitted from VAD apparatus 100 and then may transmit only a signal corresponding to a recognized speaker/ keyword to the voice recognizer); and 
	Cho et al. fail to explicitly teach
 	 	determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  
	However, Shenhav teaches 
 	 determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, may process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)
Cho et al. and Shenhav are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of [...] If it is determined that the sound may be indicative of one or more wake-up phrases, then the electronic device may transmit a signal representative of the sound to the recognition server for further verification of whether the sound is indeed representative of one or more wake-up phrases. The recognition server may conduct this verification using computing and analysis resources, which in certain embodiments, may exceed the computing bandwidth of the relatively lower bandwidth processors of the electronic device.)

 	With respect to Claim 46, Cho et al. in view of Shenhav teach 
 	further comprising:
 	receiving second audio data representing sound detected by the microphone (Cho et al. [0027] a microphone); 
 	determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing voice activity (Cho et al. [0007] there is provided a method of detecting voice activity, the method being performed in a Continuous Listening environment and comprising: extracting a feature parameter from a frame signal; determining whether the frame signal is a voice signal or a noise signal by comparing the feature parameter with model parameter  of a plurality of comparison signal, respectively; and outputting the frame signal when the frame signal is determined to be a voice signal); 
 	in response to determining that the second audio data likely comprises data representing voice activity, determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Cho et al. [0038] When an auto activation module (not shown) is positioned between voice activity detection apparatus 10 and the voice recognizer, VAD apparatus 100 detects a section of voice activity from an input signal, and then sends the voice activity section as a detected voice signal to the auto activation module The auto activating module performs speaker/keyword recognition using the input signal transmitted from VAD apparatus 100 and then may transmit only a signal corresponding to a recognized speaker/ keyword to the voice recognizer); 
 	determining, by the second subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Cho et al. [0038] When an auto activation module (not shown) is positioned between voice activity detection apparatus 10 and the voice recognizer, VAD apparatus 100 detects a section of voice activity from an input signal, and then sends the voice activity section as a detected voice signal to the auto activation module The auto activating module performs speaker/keyword recognition using the input signal transmitted from VAD apparatus 100 and then may transmit only a signal corresponding to a recognized speaker/ keyword to the voice recognizer);  
 	sending at least a portion of the second audio data to a remote computing system (Shenhav Fig. 3 elements 310, 312, 314); and
 	receiving speech recognition results from the remote computing system (Shenhav Fig. 5 elements 506, 508, 510, 512, [0058] At block 506, it may be determined if the sound signal corresponds to a correct wake-up phrase, [0059] At block 506 if the correct wake-up phrase is not detected in the wake-up inquiry request, then at optional block 508, the recognition server 204 and associated processors 260 may log the results/message statistics of the inquiry, [0060] If at block 506, it is determined that the received sound signal does correspond to a wake-up phrase, then the recognition server 204 may, at block 510, process the logged results and/or statistics of the wake-up recognition. The method 500 may proceed to transmit a wake-up signal to the mobile device 200 at block 512. The wake-up signal, as described above, may enable the processors 212 to awake into an on state from a stand by state.)

15.	 Claim 44 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1) and Zak (US 2005/0209858 A1).

With respect to Claim 44, Cho et al. in view of Shenhav teach all the limitations of Claim 41 upon which Claim 44 depends. Cho et al. in view of Shenhav fail to explicitly teach
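 	further comprising activating, by the first subset of the plurality of processors, a digital signal processor in response to determining that the audio data likely comprises data representing the voice activity, wherein the first subset of the plurality of processors comprises the digital signal processor, and wherein the determining that the audio data likely comprises data representing the designated keyword is performed using the digital signal processor.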
 	However, Zak teaches
 	further comprising activating, by the first subset of the plurality of processors, a digital signal processor in response to determining that the audio data likely comprises data representing the voice activity, wherein the first subset of the plurality of processors comprises the digital signal processor, and wherein the determining that the audio data likely comprises data representing the designated keyword is performed using the digital signal processor (Zak [0024] memory 64 may store predetermined keywords or voice command recognized by speech processor 60, [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54. [0028] VRE 58 compares the encoded speech to a plurality of predetermined voice commands stored in memory 64. VRE 58 may recognize a limited vocabulary or may be more sophisticated as desired. The Examiner notes that the VAD, SPE and VRE are all processed by the digital signal processor.)
Cho et al., Shenhav and Zak are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the VAD as taught by Zak for the benefit of enabling/disabling the SPE in accordance with the voice activity/inactivity indication output by the VAD (Zak [0026] Speech processor 60 interfaces with microprocessor 62 and detects and recognizes speech input by a user via microphone 42. Generally, any speech processor known in the art may be used with the invention, for example, a digital signal processor (DSP). Speech processor 60 may include a voice activity detector (VAD) 54, a speech encoder (SPE) 56, and a voice recognition engine (VRE) 58, [0027] SPE 56 may also receive as input a signal output from VAD 54. The signal from VAD 54 may, for example, enable/disable SPE 56 in accordance with the voice activity/inactivity indication output by VAD 54.)

16.	 Claim 47 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1) and Miyazawa et al. (5983186).

	With respect to Claim 47, Cho et al. in view of Shenhav teach all the limitations of Claim 46 upon which Claim 47 depends. Cho et al. in view of Shenhav fail to explicitly teach
 	further comprising presenting audio output using the speech recognition results, wherein the speech recognition results comprise third audio data representing the audio output.  
	However, Miyazawa et al. teach 
 	further comprising presenting audio output using the speech recognition results, wherein the speech recognition results comprise third audio data representing the audio output (Miyazawa et al. col. 13 lines 25-27 the device will respond in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)
 	Cho et al., Shenhav and Miyazawa et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the audio output as taught by Miyazawa et al. for the benefit of presenting the audio output to the user (Miyazawa et al. col. 13 lines 25-27 the device will respond in a loud voice if the speaker’s voice is loud, and in a soft voice if the speaker’s voice is soft.)

17.	 Claims 49, 50 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1) and Murthi et al. (US 2013/0132095 A1).

With respect to Claim 49, Cho et al. in view of Shenhav teach all the limitations of Claim 41 upon which Claim 49 depends. Cho et al. in view of Shenhav fail to explicitly teach
 	further comprising generating an instruction in response to the determining, by the second subset of the plurality of processors, that the second audio data likely does not comprise data representing the designated keyword, wherein the instruction comprises at least one of: a first instruction to stop processing of the audio data, a second instruction to stop transmission of the audio data, or a third instruction to deactivate the second subset of the plurality of processors.  
	However, Murthi et al. teach 
 further comprising generating an instruction in response to the determining, by the second subset of the plurality of processors, that the second audio data likely does not comprise data representing the designated keyword, wherein the instruction comprises at least one of: a first instruction to stop processing of the audio data, a second instruction to stop transmission of the audio data, or a third instruction to deactivate the second subset of the plurality of processors (Murthi et al. [0087] If so, the computing device may remain activated. If not, a signal may be sent to the power supply 474 to revert back to standby mode in step 456. Fig. 6 element 454 Confirmation by Voice Recognition Engine, No, element 456 Revert to Standby Mode.)
Cho et al., Shenhav and Murthi et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

 	With respect to Claim 50, Cho et al. in view of Shenhav teach all the limitations of Claim 41 upon which Claim 50 depends. Cho et al. in view of Shenhav fail to explicitly teach

 	activating at least one processor of the second subset of the plurality of processors in response to the determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword; and 
 	deactivating the at least one processor of the second subset of the plurality of processors in response to the determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword.  
	However, Murthi et al. teach 
 	further comprising: 
 	activating at least one processor of the second subset of the plurality of processors in response to the determining, by the first subset of the plurality of processors, that the second audio data likely comprises data representing the designated keyword (Murthi et al. Fig. 6 steps 440, 450, Pattern match? Yes, Activate Device); and
 	 deactivating the at least one processor of the second subset of the plurality of processors in response to the determining, by the second subset of the plurality of processors, that the audio data likely does not comprise data representing the designated keyword (Murthi et al. Fig. 6 steps 454, 456 Confirmation by Voice Recognition Engine, No, Revert to Standby Mode, [0087] a voice recognition engine 194 (Fig. 2) may then confirm in step 454 whether the user did in fact speak the correct activation phrase.)
Cho et al., Shenhav and Murthi et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the voice recognition engine as taught by Murthi et al. for the benefit of confirming activation with a higher degree of certainty (Murthi et al. [0088] The voice recognition engine may use more sophisticated algorithms than the pattern matching performed by the standby activation unit 464 to confirm activation with a much higher degree of certainty.)

18.	 Claim 55 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1) and Dyba et al. (US 2009/0265169 A1).

 	With respect to Claim 55, Cho et al. in view of Shenhav teach all the limitations of Claim 41 upon which Claim 55 depends. Cho et al. in view of Shenhav fail to explicitly teach
wherein the determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity comprises determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based on a spectral slope between the two or more frames of the audio data. 
However, Dyba et al. teach
wherein the determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity comprises determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based on a spectral slope between the two or more frames of the audio data (Dyba et al. [0004] VAD algorithms usually formulate decision rules on a frame-by-frame basis using instantaneous measures of divergence distance between speech and noise. The different measures which are used in VAD algorithms may include spectral slope, correlation coefficients, logarithm likelihood ratio, cepstral, weighted cepstral, and modified distance measures.)
Cho et al., Shenhav and Dyba et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the spectral slope as taught by Dyba et al. for the benefit of detecting voice activity (Dyba et al. [0004] VAD algorithms usually formulate decision rules on a frame-by-frame basis using instantaneous measures of divergence distance between speech and noise. The different measures which are used in VAD algorithms may include spectral slope, correlation coefficients, logarithm likelihood ratio, cepstral, weighted cepstral, and modified distance measures.)

19.	 Claim 56 is rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cho et al. (US 2012/0022863 A1) and Shenhav (US 2014/0006825 A1) and Skarin et al. (US 2011/0072052 A1).

	With respect to Claim 56, Cho et al. in view of Shenhav teach all the limitations of Claim 29 upon which Claim 56 depends. Cho et al. in view of Shenhav fail to explicitly teach 
 	wherein the determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity comprises determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based on one of: a linear classifier, a support vector machine, a decision tree, a hidden Markov model, or Gaussian mixture model.
	However, Skarin et al. teach 
 	wherein the determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity comprises determining, by the first subset of the plurality of processors, that the audio data likely comprises data representing voice activity based on one of: a linear classifier, a support vector machine, a decision tree, a hidden Markov model, or Gaussian mixture model (Skarin et al [0084] A two-layer Hidden Markov Model (HMM) is then trained to detect voiced/unvoiced and speaking/non-speaking regions using the features. This method works very reliably even in noisy environment, with less than 2% error at 10 dB SNR.)
 	Cho et al., Shenhav and Skarin et al. are analogous art because they are from a similar field of endeavor in speech recognition techniques and applications. Thus, it would have been obvious to a person of ordinary skill in the art, at the time of invention, to modify the teaching of the voice activity detector and keyword detector as taught by Cho et al., using the teaching of the speech recognition server as taught by Shenhav for the benefit of verifying whether the sound is indeed a representation of one or more wake-up phrases, and using the teaching of the Hidden Markov Model as taught by Skarin et al. for the benefit of detecting voiced/unvoiced and speaking/non-speaking regions (Skarin et al. [0084] A two-layer Hidden Markov Model (HMM) is then trained to detect voiced/unvoiced and speaking/non-speaking regions using the features. This method works very reliably even in noisy environment, with less than 2% error at 10 dB SNR.)

Allowable Subject Matter
20.	Claim 54 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and if the nonstatutory double patenting rejection noted above is overcome.
	
Conclusion
21.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Bringert et al. (US 2013/0085753 A1). Bringert et al. teach an embedded speech recognizer and a network speech recognizer for speech recognition. 
b.  	Somemo et al. (US 2013/0060571 A1). Somemo et al. teach integrated local and cloud based speech recognition. 
c.	Koll (US 2010/0057450 A1). Koll teaches hybrid speech recognition. 

22.	Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 	

23.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2658