DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2.	 Applicant’s arguments and amendments in the Amendment filed June 13, 2022 with an RCE, with respect to the rejections of claims 1 and 13, and claims depending therefrom, under 35 U.S.C. 103 have been fully considered and are persuasive in part, as detailed below.  Therefore, the rejection has been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Li et al., U.S. Patent App. Pub. No. 20220223150. Claims 1, 13, and 19 are amended, Claims 6, 10, 21, and 22 are canceled, and new Claims 23-26 are added.  Amended independent Claims 1 and 13 have been considered as discussed below.  
3. 	Applicant argues in the Amendment that Newell would be rendered unsatisfactory for its primary purpose. This argument is rendered moot as Li is now the primary reference.
4.	Applicant argues in the Amendment that the previously cited art does not describe using 16 to 64 samples or collecting them within a 4 ms interval. However, Li is now cited for these features, as noted below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1, 4, 7, 11-13, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (U.S. Patent App. Pub. No. 20220223150, herein “Li”) in view of Newell (US 2020/0143802), Chaudhuri et al. (US 2017/0316792, herein “Chaudhuri”), and Thomsen (US 2017/0064461).
With regard to Claim 1, Li teaches:
A device for detecting one of a plurality of keywords, comprising:
a microphone; (microphone 163, paragraph 68)
a memory device, comprising instructions, which when executed by the processing unit, enable the device to: (memory 121, paragraph 68)
receive the plurality of digitized audio samples [[from the ADC]] wherein the plurality of digitized samples is between 16 and 64 samples and is collected during a time interval that is less than or equal to 4 milliseconds; (Paragraph 97 describes that Q samples are collected, where Q is an integer. Accordingly, the number of samples is recognized by Li as a result-effective variable. Thus, optimizing this variable would be a matter of routine experimentation. See MPEP 2144.05. Further, each sample may be 0.1 ms, 0.2 ms, or the like. With 16 to 39 samples of 0.1 ms each, the total sample time would be less than 4 ms.)
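The timing arithmetic in the parenthetical above can be checked with a short sketch. The 0.1 ms per-sample duration is the value cited from Li; the function name, the constant names, and the use of integer microseconds (to avoid floating-point rounding) are illustrative assumptions and are not taken from any reference.

```python
def collection_time_us(num_samples: int, sample_period_us: int) -> int:
    """Total time, in microseconds, to collect num_samples at a fixed period."""
    return num_samples * sample_period_us

SAMPLE_PERIOD_US = 100    # 0.1 ms per sample, as cited from Li
MAX_INTERVAL_US = 4000    # claimed bound: less than or equal to 4 ms

# Sample counts within the claimed 16-64 range that satisfy the time bound.
fitting = [n for n in range(16, 65)
           if collection_time_us(n, SAMPLE_PERIOD_US) <= MAX_INTERVAL_US]
```

Under these assumptions, counts of 16 through 39 finish in under 4 ms, and 40 samples take exactly 4 ms, which still meets the "less than or equal to" bound.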
Li does not explicitly describe:
“a wakeup timer;
an analog to digital converter (ADC) in communication with an output of the microphone to receive audio signals; 
a processing unit in communication with an output of the ADC to receive digitized audio samples from the ADC; 
receive the plurality of digitized audio samples from the ADC;
wake up from a sleep mode after expiration of the wakeup timer; 
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; 
return to sleep mode if no audio activity is detected; 
capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram; and
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However, Newell describes:
“a wakeup timer; (paragraph 39 describes a wakeup timer)
an analog to digital converter (ADC) in communication with an output of the microphone to receive audio signals; (ADC circuit, paragraph 42)
a processing unit in communication with an output of the ADC to receive digitized audio samples from the ADC; (computer 110, paragraph 42)
receive the plurality of digitized audio samples from the ADC; (receive sensor 130 data, paragraph 39)
wake up from a sleep mode after expiration of the wakeup timer; (paragraph 39)
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; (paragraph 52, a neural network may be used to determine the behavior pattern from audio data, where para. 53 discloses the behavior pattern is from processing of the audio data, and para. 59 discloses the behavior pattern indicates audio attributes of the audio data) and
return to sleep mode if no audio activity is detected;” (go to the sleep state, paragraph 39)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects audio activity, as described by Newell, into the device described by Li, as this would allow more accurate detection of audio activity, as described at paragraph 9 of Newell.
Li in view of Newell does not explicitly describe:
“capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram;
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However Chaudhuri describes:
“capture a plurality of additional digitized audio samples; (250 ms audio segments are collected in paragraph 41)
use the plurality of additional digitized audio samples to create a spectrogram; (a spectrogram is generated in paragraph 45)
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”  (the spectrogram is fed into a neural network classifier in paragraph 46, the output is a likelihood (confidence level))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects keywords, as described by Chaudhuri, into the device described by Li in view of Newell, as this would allow the device of Li in view of Newell to provide captions of the audio input received, as described at paragraph 16 of Chaudhuri.
While Li in view of Newell and Chaudhuri does teach the auxiliary neural network (see above rejection rationale), Li in view of Newell and Chaudhuri does not explicitly describe to “capture a plurality of additional digitized audio samples only if audio activity is detected.” However, paragraphs 21 and 33 of Thomsen describe a device using digital signal processors that acts to reduce the maximum power output of an amplifier only when the speech detector does not detect the presence of a speech signal in the input signal received by the speech detector. Thus, Thomsen describes a device that includes a speech detector that is used to save power when speech is not detected. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a speech detector that limits power usage when speech is not detected, as described by Thomsen, into the device described by Li in view of Newell and Chaudhuri including the auxiliary neural network, as this would reduce power consumption, as described in the abstract of Thomsen.
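The sequence of operations mapped above for Claim 1 can be pictured as a duty-cycled control loop. The following is only an illustrative sketch of the claimed ordering: every function name, the energy-threshold stand-in for the auxiliary neural network, and the stub standing in for spectrogram creation plus the main neural network are hypothetical, and none of them is drawn from Li, Newell, Chaudhuri, or Thomsen.

```python
from typing import Callable, Dict, List, Optional

def wake_cycle(
    read_samples: Callable[[int], List[float]],
    detect_activity: Callable[[List[float]], bool],
    classify: Callable[[List[float]], Dict[str, float]],
    short_count: int = 32,    # assumed value within the claimed 16-64 range
    long_count: int = 1000,   # assumed size of the additional capture
) -> Optional[Dict[str, float]]:
    """Run one cycle after the wakeup timer expires.

    Returns None when the device should return to sleep mode, otherwise a
    keyword -> confidence mapping produced by the main classifier.
    """
    probe = read_samples(short_count)
    if not detect_activity(probe):
        return None                     # no audio activity: back to sleep
    extra = read_samples(long_count)    # captured only after activity is found
    return classify(extra)

# Hypothetical stand-ins so the sketch runs without hardware or trained models.
def energy_detector(samples: List[float]) -> bool:
    return sum(s * s for s in samples) / len(samples) > 0.01

def stub_classifier(samples: List[float]) -> Dict[str, float]:
    return {"keyword_a": 0.9, "keyword_b": 0.1}

silent = wake_cycle(lambda n: [0.0] * n, energy_detector, stub_classifier)
active = wake_cycle(lambda n: [0.5] * n, energy_detector, stub_classifier)
```

In this sketch the silent input returns None after reading only the short probe, while the active input proceeds to the longer capture and classification.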
With regard to Claim 4, Chaudhuri describes “the processing unit performs a function or activity based on the outputs from the main neural network.” (paragraph 74 describes that caption text is created based on the output of the neural network - which are timing windows for the caption text) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects keywords, as described by Chaudhuri, into the device described by Li in view of Newell, as this would allow the device of Li in view of Newell to provide the captions described in paragraph 74 of Chaudhuri.
With regard to Claim 7, Newell describes “the wakeup timer is set to a value between 25 and 250 milliseconds.”  (101 ms, paragraph 39 of Newell)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a wakeup timer, as described by Newell, into the device described by Li, as this would allow the device of Li in view of Newell to wake up periodically to save power, as described in paragraph 39 of Newell.
With regard to Claim 11, Chaudhuri describes “the plurality of additional digitized audio samples comprises at least 1000 digitized audio samples.” (Paragraph 41) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects keywords, as described by Chaudhuri, into the device described by Li in view of Newell, as this would allow the device of Li in view of Newell to provide captions of the audio input received, as described at paragraph 16 of Chaudhuri.
With regard to Claim 12, Chaudhuri describes “the main neural network is trained using spectrograms containing keywords and truncated versions of the spectrograms.” (Paragraph 47 describes that the model may be trained using both audio streams and split (truncated) audio streams) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects keywords, as described by Chaudhuri, into the device described by Li in view of Newell, as this would allow the device of Li in view of Newell to provide captions of the audio input received, as described at paragraph 16 of Chaudhuri.
With regard to Claim 13, Li describes:
A software program, disposed on a non-transitory storage media (memory 121, paragraph 68), comprising instructions, which when executed by a processing unit (processor 110, paragraph 68) disposed on a device having a microphone, (microphone 163, paragraph 68) …, enable the device to:
receive the plurality of digitized audio samples [[from the ADC]], wherein the plurality of digitized samples is between 16 and 64 samples and is collected during a time interval that is less than or equal to 4 milliseconds. (Paragraph 97 describes that Q samples are collected, where Q is an integer. Accordingly, the number of samples is recognized by Li as a result-effective variable. Thus, optimizing this variable would be a matter of routine experimentation. See MPEP 2144.05. Further, each sample may be 0.1 ms, 0.2 ms, or the like. With 16 to 39 samples of 0.1 ms each, the total sample time would be less than 4 ms.)
Li does not explicitly describe:
“A … wakeup timer and an analog to digital converter (ADC), enable the device to:
wake up from a sleep mode after expiration of the wakeup timer; 
receive the plurality of digitized audio samples from the ADC; 
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; and
return to sleep mode if no audio activity is detected;
 capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram; and
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However, Newell describes:
“A wakeup timer (paragraph 39) and an analog to digital converter (ADC) (ADC circuit, paragraph 42), enable the device to:
wake up from a sleep mode after expiration of the wakeup timer; (paragraph 39)
receive the plurality of digitized audio samples from the ADC; (receive sensor 130 data, paragraph 39)
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; (paragraph 52, a neural network may be used to determine the behavior pattern from audio data, where para. 53 discloses the behavior pattern is from processing of the audio data, and para. 59 discloses the behavior pattern indicates audio attributes of the audio data) and
return to sleep mode if no audio activity is detected.” (go to the sleep state, paragraph 39)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects audio activity, as described by Newell, into the device described by Li, as this would allow more accurate detection of audio activity, as described at paragraph 9 of Newell.
Li in view of Newell does not explicitly describe:
“capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram;
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However, Chaudhuri describes:
“capture a plurality of additional digitized audio samples; (250 ms audio segments are collected in paragraph 41)
use the plurality of additional digitized audio samples to create a spectrogram; (a spectrogram is generated in paragraph 45)
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”  (the spectrogram is fed into a neural network classifier in paragraph 46, the output is a likelihood (confidence level))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network that detects keywords, as described by Chaudhuri, into the device described by Li in view of Newell, as this would allow the device of Li in view of Newell to provide captions of the audio input received, as described at paragraph 16 of Chaudhuri.
While Li in view of Newell and Chaudhuri does teach the auxiliary neural network (see above rejection rationale), Li in view of Newell and Chaudhuri does not explicitly describe to “capture a plurality of additional digitized audio samples only if audio activity is detected.” However, paragraphs 21 and 33 of Thomsen describe a device using digital signal processors that acts to reduce the maximum power output of an amplifier only when the speech detector does not detect the presence of a speech signal in the input signal received by the speech detector. Thus, Thomsen describes a device that includes a speech detector that is used to save power when speech is not detected. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a speech detector that limits power usage when speech is not detected, as described by Thomsen, into the device described by Li in view of Newell and Chaudhuri including the auxiliary neural network, as this would reduce power consumption, as described in the abstract of Thomsen.
With respect to Claim 19, Chaudhuri describes “the plurality of additional digitized audio samples comprises at least 1000 digitized audio samples.” (Paragraph 41) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate various numbers of input audio samples, as described by Chaudhuri, into the device described by Li in view of Newell, as this allows for greater flexibility in generating captions from the audio data, as described in paragraph 16 of Chaudhuri.
With respect to Claim 20, software program Claim 20 and device Claim 12 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 20 is similarly rejected under the same rationale as applied above with respect to Claim 12.

6.	Claims 2, 3, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of Brothers et al. (US 2016/0358069, herein “Brothers”).
With regard to Claim 2, Li in view of Newell, Chaudhuri, and Thomsen does not describe “the main neural network is a convolutional neural network, comprising a convolutional stage and a fully connected stage, wherein the convolutional stage comprises one or more convolutional layers and the fully connected stage comprises one or more fully connected layers.”
However, Brothers describes convolutional neural networks (paragraph 23) which include at least one convolutional layer and at least one fully connected layer (paragraph 80). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the convolutional neural network as described by Brothers into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to achieve better power and performance characteristics, as described in paragraph 27 of Brothers.
With regard to Claim 3, Li in view of Newell, Chaudhuri, and Thomsen does not describe “the auxiliary neural network comprises a fully connected neural network.”
However, Brothers describes convolutional neural networks (paragraph 23) which include at least one fully connected layer (paragraph 80). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the convolutional neural network as described by Brothers into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to achieve better power and performance characteristics, as described in paragraph 27 of Brothers.
	With respect to Claims 14 and 15, software program Claims 14 and 15 and device Claims 2 and 3 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claims 14 and 15 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 3, respectively.  

7.	Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of Salsbery et al. (US 2012/0066526, herein “Salsbery”).
With regard to Claim 5, Newell describes in paragraph 39 that a processing unit returns to sleep, but does not describe “the processing unit returns to sleep mode in less than 1 milliseconds.” However, Figure 1D of Salsbery describes a processing unit that falls asleep (transition 25A on Figure 1D) and then wakes back up in 0.1 ms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the quick sleep time as described by Salsbery into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to use less power, as described in paragraph 13 of Salsbery.
With respect to Claim 16, software program Claim 16 and device Claim 5 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 16 is similarly rejected under the same rationale as applied above with respect to Claim 5.  

8.	Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of Mandal et al. (US 10,726,830, herein “Mandal”).
With regard to Claim 8, Li in view of Newell, Chaudhuri, and Thomsen does not explicitly describe “the instructions to create the spectrogram enable the processing unit to: bin the plurality of additional digitized audio samples into one or more segments; perform a fast Fourier transform (FFT) of each of the one or more segments; perform Mel-cepstral conversion of the FFT for each segment to obtain Mel-cepstral information; and combine Mel-cepstral information from each segment to form the spectrogram.”
However, column 21, lines 15-26 of Mandal describes “bin the plurality of additional digitized audio samples into one or more segments.” Column 21, lines 27-47 of Mandal describes “perform a fast Fourier transform (FFT) of each of the one or more segments.” Column 23, lines 8-25 of Mandal describes “perform Mel-cepstral conversion of the FFT for each segment to obtain Mel-cepstral information” and “combine Mel-cepstral information from each segment to form the spectrogram.” (The extracted frequency bands form the spectrogram) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate FFT processing as described by Mandal into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to optimize the components of the device for automatic speech recognition, as described in column 3, lines 13-24 of Mandal.
With respect to Claim 17, software program Claim 17 and device Claim 8 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 17 is similarly rejected under the same rationale as applied above with respect to Claim 8.  

9.	Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of Kemmerer (US 10,897,663).
With regard to Claim 9, Li in view of Newell, Chaudhuri, and Thomsen does not explicitly describe “the instructions to create the spectrogram enable the processing unit to: bin the plurality of additional digitized audio samples into one or more segments; perform a fast Fourier transform (FFT) of each of the one or more segments; perform averaging of the FFT for each segment to obtain spectral information; and combine spectral information from each segment to form the spectrogram.”
However, column 6, line 62 to column 7, line 30 of Kemmerer describes dividing the audio data into 256 samples (column 6, line 64), performing an FFT on the samples (column 7, line 7), averaging the FFT for each sample (column 7, line 11), and combining spectral information to form a spectrogram (column 7, lines 20-30). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate FFT processing as described by Kemmerer into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to provide improved noise reduction, as described in column 4, lines 6-40 of Kemmerer.
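The four steps recited in Claim 9 (bin, transform, average, combine) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Kemmerer or of the application: the 256-sample segment length follows the citation above, but reading "averaging" as averaging adjacent magnitude bins into bands is an assumption, as are the function names and the `bins_per_band` parameter. A naive discrete Fourier transform is used in place of an FFT so the sketch needs only the standard library; an FFT yields the same values more quickly.

```python
import cmath
import math
from typing import List

def dft_magnitudes(segment: List[float]) -> List[float]:
    """Magnitudes of the first n/2 DFT bins, computed with a naive O(n^2) DFT."""
    n = len(segment)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(segment))
        mags.append(abs(acc))
    return mags

def averaged_spectrogram(samples: List[float], segment_len: int = 256,
                         bins_per_band: int = 8) -> List[List[float]]:
    """Bin samples into segments, transform each segment, average adjacent
    bins into bands, and stack the per-segment rows into a spectrogram."""
    rows = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        mags = dft_magnitudes(samples[start:start + segment_len])
        bands = []
        for i in range(0, len(mags), bins_per_band):
            chunk = mags[i:i + bins_per_band]
            bands.append(sum(chunk) / len(chunk))
        rows.append(bands)
    return rows

# Example input: a pure tone centered on DFT bin 20 of each 256-sample segment.
tone = [math.sin(2 * math.pi * 20 * t / 256) for t in range(512)]
spec = averaged_spectrogram(tone)
```

With this input, `spec` has one row per segment, and the energy of the tone lands in the band that covers bins 16 through 23.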
With respect to Claim 18, software program Claim 18 and device Claim 9 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 18 is similarly rejected under the same rationale as applied above with respect to Claim 9.

10.	Claims 23 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of U.S. Patent App. Pub. No. 2014/0149118 (Lee et al., hereinafter “Lee”).
With regard to Claim 23, Li in view of Newell, Chaudhuri, and Thomsen does not explicitly describe “a total time from when the network device wakes up from sleep mode after expiration of the wakeup timer to a time when the network device returns to sleep mode is less than 5 milliseconds when no audio activity is present.”
However, paragraph 99 of Lee describes a device that enters standby mode from sleep mode and waits for an audio command. If no command comes within a specified time (for example, one minute), the device returns to sleep mode. Thus, the time to return to sleep mode is recognized by Lee as a result-effective variable. Accordingly, optimizing this variable would be a matter of routine experimentation. See MPEP 2144.05. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the return to sleep mode as described by Lee into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to use less power when detecting audio activity to wake up, as described in paragraph 99 of Lee.
With respect to Claim 24, software program Claim 24 and device Claim 23 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 24 is similarly rejected under the same rationale as applied above with respect to Claim 23.

11.	Claims 25 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Newell, Chaudhuri, and Thomsen and further in view of U.S. Patent App. Pub. No. 20150326985 (Priyantha et al., hereinafter “Pri”).
With regard to Claim 25, Li in view of Newell, Chaudhuri, and Thomsen does not explicitly describe “the microphone is powered off in sleep mode.”
However, paragraph 2 of Pri describes a device that powers off a microphone in sleep mode while waiting for an audio signal. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the powered off microphone as described by Pri into the device described by Li in view of Newell, Chaudhuri, and Thomsen, as this would allow the device of Li in view of Newell, Chaudhuri, and Thomsen to use less power when detecting audio activity to wake up, as described in paragraph 2 of Pri.
With respect to Claim 26, software program Claim 26 and device Claim 25 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 26 is similarly rejected under the same rationale as applied above with respect to Claim 25.

Conclusion
12. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent App. Pub. No. 20200296236 (Kobayashi) describes returning to sleep mode after 5 seconds.
13.	 A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./           Examiner, Art Unit 2656

/BHAVESH M MEHTA/           Supervisory Patent Examiner, Art Unit 2656