DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 1/19/2021.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The Information Disclosure Statements (IDS) filed on 1/19/2021 have been accepted and considered in this Office Action and are in compliance with the provisions of 37 CFR 1.97.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 2-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-27 of U.S. Patent No. 9,697,828. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the U.S. Patent anticipate the claims of the instant application. Please see below for the mapping in the table, where the bolded or underlined limitations indicate the corresponding limitations between the U.S. Patent and the instant application.

Instant application: 17/090,716
Claim 2:
A system comprising:
computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data;
generate speech recognition result data using at least the first portion of the audio data;
generate acoustic data representing an acoustic property of a voice represented by the audio data;
generate feature data using the speech recognition result data and the acoustic data; and
determine, using a statistical wake word detection model configured to receive the feature data as input, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word.

U.S. Patent No. 9,697,828
Claim 1:
A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
obtain from a client device:
an audio signal, wherein a first portion of the audio signal comprises audio data likely corresponding to a wake word, and wherein a second portion of the audio signal does not comprise audio data likely corresponding to the wake word;
contextual information associated with the audio signal;
and information indicating the first portion of the audio signal comprises audio data likely corresponding to the wake word;
obtain acoustic information and environmental information from the first portion of the audio signal, wherein the acoustic information reflects one or more characteristics of a voice in the audio signal, and wherein the environmental information reflects one or more characteristics of an environment in which sound in the audio signal was recorded;
determine whether audio data corresponding to the wake word is present in the audio signal using a server-side detection model configured to generate a detection score using the contextual information, the environmental information, the acoustic information, and natural language understanding results generated based at least partly on at least one of the audio signal or a subsequent audio signal, wherein a detection score greater than a detection threshold indicates that audio data corresponding to the wake word is present in the audio signal;
in response to determining that audio data corresponding to the wake word is present in the audio signal, perform an action corresponding to a request in the audio signal; and
in response to determining that audio data corresponding to the wake word is not present in the audio signal, close an audio signal stream from the client device.


Independent Claim 12 is similar to Claim 2; thus, Claim 12 is rejected under the same rationale.
With respect to the dependent claims, each of the claims maps to a corresponding dependent claim of the U.S. Patent or is found within the scope of the independent claim.

Claims 2-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-15 of U.S. Patent No. 10,832,662. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the U.S. Patent anticipate the claims of the instant application. Please see below for the mapping in the table, where the bolded or underlined limitations indicate the corresponding limitations between the U.S. Patent and the instant application.

Instant application: 17/090,716
Claim 2:
A system comprising:
computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data;
generate speech recognition result data using at least the first portion of the audio data;
generate acoustic data representing an acoustic property of a voice represented by the audio data;
generate feature data using the speech recognition result data and the acoustic data; and
determine, using a statistical wake word detection model configured to receive the feature data as input, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word.

U.S. Patent No. 10,832,662
Claim 1:
A system comprising:
computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data;
generate speech recognition result data using at least the first portion of audio data and a second portion of the audio data;
generate acoustic data representing an acoustic property of a voice represented by the audio data;
generate contextual data representing a context associated with the audio data;
generate feature data using the speech recognition result data, acoustic data, and contextual data;
determine, using a statistical wake word detection model configured to receive the feature data as input, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word; and
close an audio data stream from the computing device.


Independent Claim 12 is similar to Claim 2; thus, Claim 12 is rejected under the same rationale.
With respect to the dependent claims, each of the claims maps to a corresponding dependent claim of the U.S. Patent or is found within the scope of the independent claim.
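For orientation only, the server-side check recited in Claim 2 of the instant application can be sketched as follows. This is an illustrative sketch and not the applicant's or either patentee's implementation: every name (SpeechRecognitionResult, AcousticData, build_feature_data, detection_score, fails_detection_criterion), the choice of features, the logistic model standing in for the "statistical wake word detection model", and the 0.5 threshold are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpeechRecognitionResult:
    """Hypothetical container for speech recognition result data."""
    transcript: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class AcousticData:
    """Hypothetical acoustic properties of the voice in the audio data."""
    pitch_hz: float
    energy_db: float


def build_feature_data(asr: SpeechRecognitionResult,
                       acoustic: AcousticData,
                       wake_word: str) -> List[float]:
    """Combine recognition results and acoustic data into one feature vector."""
    wake_word_in_transcript = wake_word.lower() in asr.transcript.lower()
    return [
        1.0 if wake_word_in_transcript else 0.0,
        asr.confidence,
        acoustic.pitch_hz / 1000.0,   # crude scaling, assumed for illustration
        acoustic.energy_db / 100.0,   # crude scaling, assumed for illustration
    ]


def detection_score(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Logistic model used here as a stand-in for a statistical detection model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def fails_detection_criterion(features: Sequence[float],
                              weights: Sequence[float],
                              bias: float,
                              threshold: float = 0.5) -> bool:
    """True when the score falls below the (assumed) detection threshold."""
    return detection_score(features, weights, bias) < threshold
```

Under these assumptions, a caller would build the feature vector from the received audio's recognition and acoustic data, and a True result from fails_detection_criterion corresponds to the claimed determination that the audio data fails to satisfy the detection criterion. In Claim 1 of U.S. Patent No. 10,832,662, that determination is followed by closing the audio data stream from the computing device.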

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 2 and 4-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The independent claims 2 and 12 recite, “A system comprising: computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data; generate speech recognition result data using at least the first portion of the audio data; generate acoustic data representing an acoustic property of a voice represented by the audio data; generate feature data using the speech recognition result data and the acoustic data; and determine, using a statistical wake word detection model configured to receive the feature data as input, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word.”
The recited limitations, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processors … the computing device …”, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, a person can listen to another person calling, determine whether his/her name was called, determine whether the caller is a known person based on the caller’s voice characteristics, and determine that the caller is indeed calling him/her. The limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.  Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claims only recite additional elements – “one or more processors … the computing device …”. The additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of the recited steps) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor to perform the recited steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.  The claim is not patent eligible.

Regarding the dependent claims, Claim 4 recites generating a detection score; Claim 5 recites determining an acoustic property; Claim 6 recites environmental data; Claim 7 recites a noise level of the environmental data; Claim 8 recites a user-specific wake word detection model; Claim 9 recites sending the user-specific model to a computing device; Claim 10 recites identity data; Claim 11 recites a neural network; Claim 13 recites receiving the audio data over a network; Claim 14 recites identity data; Claim 15 recites a microphone of the computing device; Claim 16 recites generating NLP understanding data; Claim 17 recites generating a detection score; Claim 18 recites generating a user-specific model; Claim 19 recites generating second feature data; Claim 20 recites sending the model to a computing device; and Claim 21 recites generating environmental data.
Even though the disclosed invention is described in the specification as improving computer technology, the claims provide no meaningful limitations such that this improvement is realized. Therefore, the claims do not amount to significantly more than the abstract idea itself.
Accordingly, the limitations of the claims, whether considered individually or as an ordered combination, are not sufficient to amount to significantly more than the abstract idea or to improve technological functionality. As such, Claims 2 and 4-21 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Most pertinent prior art:
MALLINSON (US 2014/0237277 A1) discloses a system comprising: computer-readable memory storing executable instructions (MALLINSON Fig. 1); and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions (MALLINSON Fig. 1; Par 48 – “processor”; Par 52 – “computer readable medium … memory .. instructions …”) to at least: 
receive audio data from a computing device (MALLINSON Fig. 1 – “Client Device Platform”; Fig. 3 – “Deliver Non-tactile input to an external server over the network 364”; Par 48 – “FIG. 3 is a flow diagram describing a process 300 for implementing one or more operations that are initiated by a non-tactile input signal that is detected by the client device platform 105 while the client device platform 105 is in a low-power state, according to an additional aspect of the present disclosure. Process 300 is substantially similar to process 200 while operating in the low-power state until after decision block 363. When the first confidence score is equal to or above the first threshold level, the low-power processor delivers the recorded non-tactile inputs to a cloud based server 106 over the network 160, as indicated at block 364.”; Par 14 – “Examples of such non-tactile inputs include, but are not limited to, audio inputs, which may be received, e.g., via a microphone, and optical inputs, which may be received, e.g., by an optical sensor or image capture device.”) subsequent to the computing device determining a wake word was detected (MALLINSON Fig. 3 – “1st Confidence Score > 1th threshold? 363”; Par 48 – “When the first confidence score is equal to or above the first threshold level, the low-power processor delivers the recorded non-tactile inputs to a cloud based server 106 over the network 160, as indicated at block 364.”; Par 31 – “The first analysis may be implemented through the use of one or more algorithms that are used to produce the first confidence score. The first confidence score corresponds to a degree of similarity between the recorded non-tactile inputs and the one or more reference inputs that are stored on the low-power memory 148.”; Par 16 – “Each of the one or more operations may be associated with a specific first reference signal. 
By way of example, and not by way of limitation, if the first reference signal is the phrase “Device On”, then the operation that is associated with the first reference signal may cause the client device platform to initiate a full-power state.”) in a first portion of the audio data (MALLINSON Par 17 – “The first reference signal may be shorter than the second reference signal. Therefore, less data needs to be stored on the low-power memory in order to analyze the signal. By way of example, and not by way of limitation, the first reference signal may be used to determine if a human voice has been detected, or if a short phrase, such as “device on” has been spoken by a user. The second reference signal may be longer, and may be associated with a more complex operation. By way of example, the second reference signal may be used to determine if a human voice has spoken the phrase, “device on—play video game one”.”); 
generate speech recognition result data (MALLINSON Par 45 – “By way of example, and not by way of limitation, if the recorded non-tactile input is audio data, then the second confidence score may be generated with a high quality ASR, such as one that may incorporate the use of auditory attention cues, or by breaking the recorded speech into phonemes or by using an array and AEC of multi-channel data instead of single channel data in low power mode. ..”; Par 29 – “The reference input for speech recognition can be done in a number of ways. Pure text is one possible way, but perhaps not the most reliable since it needs to be machine processed and converted to a phonetic representation.”; Par 35 – “Since an ASR algorithm is capable of determining the words that have been spoken, it will be capable of comparing the actual words spoken in the recoded non-tactile input to the words in the reference input.”) using at least the first portion of the audio data (MALLINSON Par 44 – “The second confidence score corresponds to a degree of similarity between the recorded non-tactile inputs and one or more second reference inputs that are stored on the larger memory that may be accessible to the client device platform 105 in the intermediate-power state. The second reference signals may be the same as the first reference signals and also may include additional reference signals that would occupy too much space and therefore may not have been stored in the limited memory available in the low-power state. For example, in addition to the reference signal “Device on” that may be stored in the low power memory 148, a longer reference input such as, “Device on—play video game one” may be accessible in the intermediate-power state.”; Par 18 – “According to an additional aspect of the present disclosure, the second analysis may be implemented on a cloud-based server. 
When the first analysis produces a first confidence score that is above the first threshold value, the client device platform may deliver the non-tactile input to the cloud-based server over a network. The second analysis is then performed on the cloud-based server. If the second analysis produces a second confidence score that is higher than the second threshold value, then the cloud-based server may deliver a command back to the client device platform over the network that instructs it to execute the one or more operations associated with the reference signal.”); 
generate acoustic data representing an acoustic property of a voice represented by the audio data (MALLINSON Par 44 – “The confidence score can be calculated based on all data. The second confidence score corresponds to a degree of similarity between the recorded non-tactile inputs and one or more second reference inputs that are stored on the larger memory that may be accessible to the client device platform 105 in the intermediate-power state.”); 
generate feature data using the speech recognition result data and the acoustic data (MALLINSON Par 45 – “The second confidence score may be generated through the use of one or more additional algorithms. Since there are more CPU cycles available, these algorithms may be more robust and capable of more detailed analysis of the recorded non-tactile inputs. By way of example, and not by way of limitation, if the recorded non-tactile input is audio data, then the second confidence score may be generated with a high quality ASR, such as one that may incorporate the use of auditory attention cues, or by breaking the recorded speech into phonemes or by using an array and AEC of multi-channel data instead of single channel data in low power mode. If the recorded non-tactile input is video data, then the second confidence score may be generated through the use of facial recognition algorithms, or advanced gesture recognition algorithms.”); and 
determine, [using a statistical wake word detection model configured to receive the feature data as input], that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word (MALLINSON Fig. 3 – “2nd Confidence Score > 2nd Threshold? 366”; Par 49 – “Once the second confidence score has been generated, process 300 continues on to decision block 366. If the second confidence score is below a second threshold value then process 300 returns to block 361 and continues recording non-tactile inputs. When the second confidence score is above the second threshold value, process 300 continues to block 367. At block 367 the cloud based server 106 delivers a command signal to the client device platform 105 that will direct it to execute the one or more operations that are associated with the one or more reference inputs that were matched by the recorded non-tactile inputs.”; Par 17 – “By way of example, the second reference signal may be used to determine if a human voice has spoken the phrase, “device on—play video game one”. If that phrase is matched with a sufficiently high second confidence value then a command signal may be generated that instructs the client device platform to execute a more complex operation, such as initiating a full-power state on the client device platform, and loading video game one so it is ready to be played by a user.”).  
However, MALLINSON does not explicitly teach the [square-bracketed] limitations.
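The two-threshold flow quoted above from MALLINSON (first confidence score on the low-power device, second confidence score on the cloud-based server) can be illustrated with the following sketch. It is not MALLINSON's implementation: the function names, the use of text strings in place of recorded audio, the difflib-based similarity measure, and the threshold values are all assumptions chosen for illustration. The quoted passages also do not say how a second score exactly equal to the second threshold is handled; treating it as a match here is an assumption.

```python
from difflib import SequenceMatcher


def similarity(recorded: str, reference: str) -> float:
    """Toy confidence score: textual similarity in [0, 1] (assumed stand-in)."""
    return SequenceMatcher(None, recorded.lower(), reference.lower()).ratio()


def two_stage_decision(recorded: str,
                       first_reference: str,
                       second_reference: str,
                       first_threshold: float = 0.8,
                       second_threshold: float = 0.9) -> str:
    """Return 'keep_recording' or 'execute_operation' for one recorded input."""
    # Stage 1 (client device, low-power state): compare the start of the
    # input against the short first reference, e.g. "device on".
    first_score = similarity(recorded[:len(first_reference)], first_reference)
    if first_score < first_threshold:
        return "keep_recording"

    # Stage 2 (cloud-based server): compare the whole input against the
    # longer second reference and apply the second threshold.
    second_score = similarity(recorded, second_reference)
    if second_score < second_threshold:
        return "keep_recording"  # corresponds to returning to block 361
    return "execute_operation"   # corresponds to the command at block 367
```

In this sketch, an input that passes the first stage but falls below the second threshold is handled the same way as an input that never passed the first stage, which mirrors the quoted return to block 361.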

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday 8:00 AM through 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/
Primary Examiner, Art Unit 2655