DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2.	 Applicant’s arguments and amendments in the Amendment, with respect to the rejections of claims 1 and 13, and claims depending therefrom, under 35 U.S.C. 103 have been fully considered and are persuasive in part, as detailed below.  Therefore, the rejection has been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Thomsen, U.S. Patent App. Pub. No. 2017/0064461 and Elias et al., U.S. Patent App. Pub. No. 2018/0224923. Original Claims 1, 8, 13, and 17 are amended, and new Claims 21 and 22 are added.  Amended independent Claims 1 and 13 have been considered as discussed below.  
3. 	Applicant argues in the Amendment that “both references fail to disclose two neural networks.”  However, each of Newell and Chaudhuri discloses a neural network, and the rejection is based on a combination of these two networks.
4.	Applicant argues in the Amendment that the network of Newell is designed to identify a word sequence, and thus cannot be modified to, for example, use 16 to 64 samples as recited in Claims 10 and 19, or use a 4 ms sample as recited in Claims 21 and 22.  However, paragraph 39 of Newell describes that the device is designed to normally be in a sleep state until speech activity is detected.  When speech activity is detected, the device “wakes up.”  This preliminary speech detection could use 16 to 64 samples, or a sample of 4 ms, as noted below.

6.	Applicant argues in the Amendment with regard to Claims 10 and 19 that the subject matter of these claims would not be a matter of routine experimentation, because 16 to 64 samples would not be enough for the devices described by Newell and Chaudhuri, the latter of which discloses 1000 samples in an exemplary embodiment.  However, as held in In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955), “[W]here the general conditions of a claim are disclosed in the prior art, it is not inventive to discover the optimum or workable ranges by routine experimentation.”  In this case, Chaudhuri discloses that the number of samples is a variable that affects the result of the processing, and thus numbers of samples outside the range disclosed by Chaudhuri are a matter of routine experimentation.  See MPEP 2144.05.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1, 4, 6, 7, 10-13, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Newell (US 2020/0143802) in view of Chaudhuri et al. (US 2017/0316792, herein “Chaudhuri”) and further in view of Thomsen, U.S. Patent App. Pub. No. 2017/0064461.
With regard to Claim 1, Newell teaches:
“A device for detecting one of a plurality of keywords, comprising:
a microphone;  (microphone 130, paragraph 36)
an analog to digital converter (ADC) in communication with an output of the microphone to receive audio signals;  (ADC circuit, paragraph 42)
a processing unit in communication with an output of the ADC to receive digitized audio samples from the ADC;  (computer 110, paragraph 42)
a memory device, comprising instructions, which when executed by the processing unit, enable the device to:  (computer readable medium, paragraph 92)
wake up from a sleep mode;  (paragraph 39)
receive the plurality of digitized audio samples from the ADC; (receive sensor 130 data, paragraph 39)
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; (paragraph 52, a neural network may be used to determine the behavior pattern from audio data, where para. 53 discloses the behavior pattern is from processing of the audio data, and para. 59 discloses the behavior pattern indicates audio attributes of the audio data) and
return to sleep mode if no audio activity is detected;” (go to the sleep state, paragraph 39)
Newell does not describe:
“capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram;
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However, Chaudhuri describes:
“capture a plurality of additional digitized audio samples;  (250 ms audio segments are collected in paragraph 41)
use the plurality of additional digitized audio samples to create a spectrogram;  (a spectrogram is generated in paragraph 45)
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”  (the spectrogram is fed into a neural network classifier in paragraph 46, the output is a likelihood (confidence level))

While Newell in view of Chaudhuri does teach the auxiliary neural network (see above rejection rationale), Newell in view of Chaudhuri does not explicitly describe “capture a plurality of additional digitized audio samples only if audio activity is detected.”  However, paragraphs 21 and 33 of Thomsen describe a device using digital signal processors that acts to reduce the maximum power output of an amplifier only when the speech detector does not detect the presence of a speech signal in the input signal received by the speech detector.  Thus, Thomsen describes a device that includes a speech detector that is used to save power when speech is not detected.  Accordingly, it would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate a speech detector that limits power usage when speech is not detected as described by Thomsen into the device described by Newell in view of Chaudhuri including the auxiliary neural network, as this would reduce power consumption, as described in the abstract of Thomsen.
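For illustration only, the order of operations recited in Claim 1 (wake up, test a short run of samples with the auxiliary network, and capture additional samples for the spectrogram and main network only if activity is detected) can be sketched as follows.  The sketch is not taken from Newell, Chaudhuri, or Thomsen.  The function name keyword_cycle, its parameters, and the default sample counts are hypothetical, and the two networks and the spectrogram routine are passed in as placeholder callables rather than implemented.

```python
from typing import Callable, List, Optional, Sequence

Spectrogram = List[List[float]]


def keyword_cycle(
    read_samples: Callable[[int], List[float]],          # hypothetical ADC read
    aux_net: Callable[[Sequence[float]], bool],          # placeholder auxiliary network
    make_spectrogram: Callable[[Sequence[float]], Spectrogram],
    main_net: Callable[[Spectrogram], List[float]],      # placeholder main network
    n_short: int = 64,      # assumed size of the short initial capture
    n_long: int = 1000,     # assumed size of the additional capture
) -> Optional[List[float]]:
    """Run one wake-up cycle.

    Returns one confidence value per keyword, or None when no audio
    activity is detected and the device would return to sleep mode.
    """
    short = read_samples(n_short)
    if not aux_net(short):
        return None  # no activity: skip the additional capture entirely
    extra = read_samples(n_long)  # captured only after activity is detected
    return main_net(make_spectrogram(extra))
```

In this arrangement the larger capture and the main network are never reached on a quiet cycle, which is the power-saving behavior relied upon in the combination above.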
With regard to Claim 4, Chaudhuri describes “the processing unit performs a function or activity based on the outputs from the main neural network.”  (paragraph 74 describes that caption text is created based on the output of the neural network - which are timing windows for the caption text)  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate 
With regard to Claim 6, Newell describes “the device comprises a wakeup timer, and the processing unit executes the instructions each time the wakeup timer expires.” (Newell describes a wakeup timer in paragraph 39.)
With regard to Claim 7, Newell describes “the wakeup timer is set to a value between 25 and 250 milliseconds.”  (101 ms, paragraph 39 of Newell)
With respect to Claim 10, Newell does not explicitly describe “the plurality of digitized audio samples comprises between 16 and 64 digitized audio samples.”  However, paragraphs 41 and 42 of Chaudhuri describe exemplary numbers of audio samples, such as 1000.  Accordingly, the number of samples is recognized by Chaudhuri as a result-effective variable.  Thus, optimizing this variable would be a matter of routine experimentation.  See MPEP 2144.05.  Further, it would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate various numbers of input audio samples as described by Chaudhuri into the device described by Newell, as this allows for greater flexibility in generating captions from the audio data, as described in paragraph 16 of Chaudhuri.
With regard to Claim 11, Chaudhuri describes “the plurality of additional digitized audio samples comprises at least 1000 digitized audio samples.”  (Paragraph 41)  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate providing a neural network that detects keywords as described by Chaudhuri into the device described by Newell, as this would allow the 
With regard to Claim 12, Chaudhuri describes “the main neural network is trained using spectrograms containing keywords and truncated versions of the spectrograms.”  (Paragraph 47 describes that the model may be trained using both audio streams and split (truncated) audio streams.)  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate providing a neural network that detects keywords as described by Chaudhuri into the device described by Newell, as this would allow the device of Newell to provide captions of the audio input received, as described at paragraph 16 of Chaudhuri.
With regard to Claim 13, Newell describes:
“A software program, disposed on a non-transitory storage media (computer readable medium, paragraph 92), comprising instructions, which when executed by a processing unit (computer 110, paragraph 42) disposed on a device having a microphone (microphone 130, paragraph 36) and an analog to digital converter (ADC) (ADC circuit, paragraph 42), enable the device to:
wake up from a sleep mode;  (paragraph 39)
receive the plurality of digitized audio samples from the ADC; (receive sensor 130 data, paragraph 39)
use a plurality of digitized audio samples as an input to an auxiliary neural network, wherein the auxiliary neural network determines where any audio activity is detected; (paragraph 52, a neural network may be used to determine the behavior pattern from audio data, where para. 53 discloses the behavior pattern is from processing of the audio data, and para. 59 discloses the behavior pattern indicates audio attributes of the audio data) and
return to sleep mode if no audio activity is detected;”  (go to the sleep state, paragraph 39)
Newell does not describe:
“capture a plurality of additional digitized audio samples only if audio activity is detected by the auxiliary neural network;
use the plurality of additional digitized audio samples to create a spectrogram;
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”
However, Chaudhuri describes:
“capture a plurality of additional digitized audio samples;  (250 ms audio segments are collected in paragraph 41)
use the plurality of additional digitized audio samples to create a spectrogram;  (a spectrogram is generated in paragraph 45)
provide the spectrogram as an input to a main neural network, wherein the main neural network comprises a plurality of outputs, where each output is a confidence level that the spectrogram contains a respective keyword.”  (the spectrogram is fed into a neural network classifier in paragraph 46, the output is a likelihood (confidence level))
It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate providing a neural network that detects 
While Newell in view of Chaudhuri does teach the auxiliary neural network (see above rejection rationale), Newell in view of Chaudhuri does not explicitly describe “capture a plurality of additional digitized audio samples only if audio activity is detected.”  However, paragraphs 21 and 33 of Thomsen describe a device using digital signal processors that acts to reduce the maximum power output of an amplifier only when the speech detector does not detect the presence of a speech signal in the input signal received by the speech detector.  Thus, Thomsen describes a device that includes a speech detector that is used to save power when speech is not detected.  Accordingly, it would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate a speech detector that limits power usage when speech is not detected as described by Thomsen into the device described by Newell in view of Chaudhuri including the auxiliary neural network, as this would reduce power consumption, as described in the abstract of Thomsen.
With respect to Claim 19, Chaudhuri describes “the plurality of additional digitized audio samples comprises at least 1000 digitized audio samples.”  (Paragraph 41)  Further, Newell in view of Chaudhuri does not explicitly describe “the plurality of digitized audio samples comprises between 16 and 64 digitized audio samples.”  However, paragraphs 41 and 42 of Chaudhuri describe exemplary numbers of audio samples, such as 1000.  Accordingly, the number of samples is recognized by Chaudhuri as a result-effective variable.  Thus, optimizing this variable would be a 
With respect to Claim 20, software program Claim 20 and device Claim 12 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 20 is similarly rejected under the same rationale as applied above with respect to Claim 12.

8.	Claims 2, 3, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Newell in view of Chaudhuri and Thomsen and further in view of Brothers et al. (US 2016/0358069, herein “Brothers”).
With regard to Claim 2, Newell in view of Chaudhuri does not describe “the main neural network is a convolutional neural network, comprising a convolutional stage and a fully connected stage, wherein the convolutional stage comprises one or more convolutional layers and the fully connected stage comprises one or more fully connected layers.”
However, Brothers describes convolutional neural networks (paragraph 23) which include at least one convolutional layer and at least one fully connected layer (paragraph 80).  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate the convolutional neural network as described by Brothers into the device described by Newell in view of Chaudhuri and 
With regard to Claim 3, Newell in view of Chaudhuri does not describe “the auxiliary neural network comprises a fully connected neural network.”
However, Brothers describes convolutional neural networks (paragraph 23) which include at least one fully connected layer (paragraph 80).  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate the convolutional neural network as described by Brothers into the device described by Newell in view of Chaudhuri and Thomsen, as this would allow the device of Newell in view of Chaudhuri and Thomsen to achieve better power and performance characteristics, as described in paragraph 27 of Brothers.
	With respect to Claims 14 and 15, software program Claims 14 and 15 and device Claims 2 and 3 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claims 14 and 15 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 3, respectively.  

9.	Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Newell in view of Chaudhuri and Thomsen and further in view of Salsbery et al. (US 2012/0066526, herein “Salsbery”).
With regard to Claim 5, Newell describes in paragraph 39 that a processing unit returns to sleep, but does not describe “the processing unit returns to sleep mode in 
With respect to Claim 16, software program Claim 16 and device Claim 5 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 16 is similarly rejected under the same rationale as applied above with respect to Claim 5.  

10.	Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Newell in view of Chaudhuri and Thomsen and further in view of Mandal et al. (US 10,726,830, herein “Mandal”).
With regard to Claim 8, Newell in view of Chaudhuri does not explicitly describe “the instructions to create the spectrogram enable the processing unit to: bin the plurality of additional digitized audio samples into one or more segments; perform a fast Fourier transform (FFT) of each of the one or more segments; perform Mel-cepstral conversion of the FFT for each segment to obtain Mel-cepstral information; and combine Mel-cepstral information from each segment to form the spectrogram.”
However, column 21, lines 15-26 of Mandal describes “bin the plurality of additional digitized audio samples into one or more segments.”  Column 21, lines 27-47 
With respect to Claim 17, software program Claim 17 and device Claim 8 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 17 is similarly rejected under the same rationale as applied above with respect to Claim 8.  

11.	Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Newell in view of Chaudhuri and Thomsen and further in view of Kemmerer (US 10,897,663).
With regard to Claim 9, Newell in view of Chaudhuri does not explicitly describe “the instructions to create the spectrogram enable the processing unit to: bin the plurality of additional digitized audio samples into one or more segments; perform a fast Fourier transform (FFT) of each of the one or more segments; perform averaging of the 
However, column 6, line 62 to column 7, line 30 of Kemmerer describes dividing the audio data into 256 samples (column 6, line 64), performing an FFT on the samples (column 7, line 7), averaging the FFT for each sample (column 7, line 11), and combining spectral information to form a spectrogram (column 7, lines 20-30).  It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate FFT processing as described by Kemmerer into the device described by Newell in view of Chaudhuri and Thomsen, as this would allow the device of Newell in view of Chaudhuri and Thomsen to provide improved noise reduction, as described in column 4, lines 6-40 of Kemmerer.
With respect to Claim 18, software program Claim 18 and device Claim 9 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 18 is similarly rejected under the same rationale as applied above with respect to Claim 9.
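For illustration only, the processing cited from Kemmerer with regard to Claim 9 (dividing the audio data into 256-sample segments, transforming each segment, averaging, and combining the results into a spectrogram) can be sketched as follows.  The sketch is not taken from Kemmerer.  It assumes that the averaging is performed over groups of adjacent frequency bins, which is only one possible reading, and it uses a direct discrete Fourier transform in place of an FFT, which yields the same values more slowly.  The function names and the bins_per_band parameter are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(segment: Sequence[float]) -> List[float]:
    """Magnitudes of the first n/2 DFT bins of one segment (direct DFT, not an FFT)."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n // 2)
    ]


def averaged_spectrogram(samples: Sequence[float],
                         segment_len: int = 256,
                         bins_per_band: int = 8) -> List[List[float]]:
    """Bin samples into segments, transform each, and average adjacent bins.

    Each row of the result corresponds to one segment; each column is the
    mean magnitude over bins_per_band adjacent frequency bins (assumed scheme).
    """
    rows = []
    # Only complete segments are used; a trailing partial segment is dropped.
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        mags = dft_magnitudes(samples[start:start + segment_len])
        row = []
        for i in range(0, len(mags), bins_per_band):
            band = mags[i:i + bins_per_band]
            row.append(sum(band) / len(band))
        rows.append(row)
    return rows


# Example input: a pure tone that falls exactly on bin 16 of a 256-sample segment.
tone = [math.sin(2 * math.pi * 16 * t / 256) for t in range(512)]
rows = averaged_spectrogram(tone)
```

With the default parameters each 256-sample segment yields 128 magnitudes, which are reduced to 16 averaged bands per row.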

12.	Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Newell in view of Chaudhuri and Thomsen and further in view of U.S. Patent App. Pub. No. 2018/0224923 (Elias et al., hereinafter “Elias”).
With regard to Claim 21, Newell in view of Chaudhuri does not explicitly describe “the plurality of digitized samples is collected during a time interval, and wherein the time interval is less than or equal to 4 millisecond.”

With respect to Claim 22, software program Claim 22 and device Claim 21 are related as an apparatus and the method of using same, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 22 is similarly rejected under the same rationale as applied above with respect to Claim 21.

Conclusion
13.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./ Examiner, Art Unit 2656
/MICHELLE M KOETH/Primary Examiner, Art Unit 2656