DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 7/22/2020.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The Information Disclosure Statements (IDS) filed on 8/20/2022 and 8/06/2021 have been accepted and considered in this Office Action and are in compliance with the provisions of 37 CFR 1.97.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with 
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  

Claims 1-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 1-20 of US PAT 10,586,540 in view of PASKO (US 10,777,203 B1). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are unpatentable over the claims of US PAT 10,586,540 in view of PASKO. Please see the table below for the mapping, where the bolded limitations indicate the corresponding limitations between the US PAT and the instant application.


Instant application (16/812,758):
1. (Original) A playback device comprising: a network interface; one or more microphones configured to detect sound; at least one speaker; one or more processors; data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions comprising:
monitoring an input sound-data stream representing the sound detected by the one or more microphones for (i) a wake-word event and (ii) a media playback system keyword event;
detecting a first media playback system keyword event, wherein detecting the first media playback system keyword event comprises after detecting a first sound via the one or more microphones, determining, with at least a threshold confidence, that the detected first sound includes a first media playback system keyword, wherein the first media playback system keyword is one of a plurality of command keywords supported by the playback device;
in response to detecting the first media playback system keyword event, processing, via a local voice input engine of a media playback system voice assistant, the first sound as a first voice input, wherein processing the first sound comprises:
(i) determining that one or more media playback system keyword conditions corresponding to the first media playback system keyword are satisfied; and
[(ii) determining that local voice input engine is unable to determine an intent of the first voice input, wherein determining that local voice input engine is unable to determine the intent of the first voice input comprises determining that one or more parameter slots associated with the first media playback system keyword are not matched with keywords in the first voice input;
based on (a) detecting the first media playback system keyword event and (b) determining that local voice input engine is unable to determine the intent of the first voice input, sending, via the network interface, sound data corresponding to at least a portion of the first voice input to one or more servers of the media playback system voice assistant for processing of the first voice input];
after receiving data indicating one or more first playback operations according to an intent of the first voice input as determined by the one or more servers of the media playback system voice assistant, performing the one or more first playback operations;
detecting a first wake-word event, wherein detecting the first wake-word event comprises after detecting a second sound via the one or more microphones, determining that the detected second sound includes a second voice input comprising a first wake word; and
in response to detecting the first wake-word event, streaming, via the network interface, sound data corresponding to at least a portion of the second voice input to one or more remote servers of a first voice assistant service.

US PAT 10,586,540:
1. A playback device comprising:
a network interface;
at least one microphone configured to detect sound; at least one speaker;
one or more processors;
data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions comprising:
monitoring an input sound-data stream representing the sound detected by the at least one microphone for (i) a wake-word event and (ii) a first command keyword event;
detecting the wake-word event, wherein detecting the wake-word event comprises after detecting a first sound via the one or more microphones, determining that the detected first sound includes a first voice input comprising a wake word;
in response to detecting the wake-word event, (i) outputting audible feedback that a wake-word event was detected and (ii) streaming, via the network interface, sound data corresponding to at least a portion of the first voice input to one or more remote servers of a voice assistant service;
detecting the first command keyword event, wherein detecting the first command keyword event comprises after detecting a second sound via the one or more microphones, determining that the detected second sound includes a second voice input comprising a first command keyword, wherein the first command keyword is one of a plurality of command keywords supported by the playback device;
determining that one or more command keyword conditions corresponding to the second voice input are satisfied, wherein determining that the one or more command keyword conditions corresponding to the second voice input are satisfied comprises determining that the second voice input excludes a wake word;
after detecting the first command keyword event, determining that one or more playback conditions corresponding to the detected first command keyword in the second voice input are satisfied, wherein satisfying the one or more playback conditions indicates that playback is in a state where a first playback command corresponding to the detected first command keyword could be performed; and
in response to (a) detecting the first command keyword event, (b) determining that one or more command keyword conditions corresponding to the second voice input are satisfied, and (c) determining that the one or more playback conditions corresponding to the first command keyword are satisfied, (i) outputting audible feedback that a first command keyword event was detected and (ii) performing the first playback command corresponding to the first command keyword, wherein the playback device forgoes outputting of the audible feedback when at least one of the following conditions is not satisfied: (1) the one or more command keyword conditions corresponding to the second voice input or (2) the one or more playback conditions corresponding to the first command keyword.

The claims of the US PAT do not explicitly recite the [square-bracketed] limitations. However, PASKO (US 10,777,203 B1) teaches determining that a local engine is unable to resolve the intent (Col 11:15-18 – “In the illustrative example, the NLU component 144 was unable to resolve the first local ASR result 302 into an intent, so the first local NLU result 304 represents a failure to recognize an intent”), and sending the input to a server for resolving the intent (Col 11:19-23 – “At 206 of the process 200, the speech interface device 102 may send the audio data 116 to a remote speech processing system 120 executing on a remote system 104. The audio data 116 may be send over a wide area network 118 at block 206.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of the US PAT to include resolving an intent at a remote server, as taught by PASKO.
One of ordinary skill would have been motivated to include resolving an intent at a remote server, in order to accurately parse the user input.
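As a non-limiting illustration of the [square-bracketed] limitations as combined with PASKO, the control flow may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims, the US PAT, or PASKO: the names `SLOT_VOCAB`, `Intent`, `resolve_locally`, and `handle_voice_input`, the slot vocabulary, and the `send_to_server` callback are all hypothetical. The local engine is treated as unable to determine an intent whenever a parameter slot associated with the keyword is not matched by a word in the voice input, in which case the input is forwarded to a server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical slot vocabulary per command keyword (illustrative values only).
SLOT_VOCAB: Dict[str, Dict[str, Set[str]]] = {
    "play": {"artist": {"beatles", "adele"}, "room": {"kitchen", "office"}},
}


@dataclass
class Intent:
    keyword: str
    slots: Dict[str, str]


def resolve_locally(keyword: str, words: List[str]) -> Optional[Intent]:
    """Return an Intent only if every parameter slot of the keyword is matched."""
    if keyword not in SLOT_VOCAB:
        return None
    slots: Dict[str, str] = {}
    for slot, vocab in SLOT_VOCAB[keyword].items():
        match = next((w for w in words if w in vocab), None)
        if match is None:
            # An unmatched slot is treated as a failure to determine intent.
            return None
        slots[slot] = match
    return Intent(keyword, slots)


def handle_voice_input(
    keyword: str,
    words: List[str],
    send_to_server: Callable[[List[str]], Intent],
) -> Tuple[str, Intent]:
    """Try local resolution first; fall back to the server on failure."""
    intent = resolve_locally(keyword, words)
    if intent is not None:
        return ("local", intent)
    # Assumed stand-in for sending sound data over the network interface.
    return ("remote", send_to_server(words))
```

In this sketch, nothing is sent to the server when the local engine fills every slot; the server is contacted only on the failure path, which corresponds to the fallback described in PASKO at Col 11:15-23.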

Independent claims 9 and 17 are likewise similar to independent claims 9 and 15 of the US PAT.
With respect to the dependent claims, each of the claims maps to a corresponding dependent claim of the US PAT or is found within the scope of the independent claim.


Allowable Subject Matter
Claims 1-20 are allowed over the prior art of record.  The following is the examiner’s statement of reasons for allowance:
The closest relevant prior art (which is discussed in further detail below), either taken individually or in combination, fails to explicitly teach or reasonably suggest the invention as represented by the independent claims 1, 9, and 17.
Most pertinent prior art:
GANONG (US 2014/0274203 A1) discloses a playback device comprising: a network interface (GANONG Par 42 – “Alternatively, or in addition to, wireless communication component 160 may include a wireless transceiver capable of communicating with one or more other networks or external devices. For example, wireless communication component 160 may include a component configured to communication via the IEEE 802.11 standard (Wi-Fi) to connect to a local area network (LAN), wide area network (WAN) such as the Internet, and/or may include a Bluetooth® transceiver to connect to a Bluetooth® compatible device.”); one or more microphones configured to detect sound (GANONG Par 41 – “which includes an input capable of receiving acoustic input (e.g., one or more microphones). Mobile device 100 includes one or more transducers 130 for converting acoustic energy to electrical energy and vice versa. For example, transducers 130 may include one or more speakers and/or one or more microphones arranged on the mobile device to allow input/output (I/O) of acoustic information.”); at least one speaker (GANONG Par 41 – “For example, transducers 130 may include one or more speakers and/or one or more microphones arranged on the mobile device to allow input/output (I/O) of acoustic information.”); one or more processors (GANONG Par 123 – “Exemplary system components of a mobile device may include a primary processor 115, a secondary processor 125 and an audio codec 105, all illustrated for convenience and clarity of illustration as being interconnected via a common bus 155.”); data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions (GANONG Par 185 – “To perform functionality and/or techniques described herein, the processor 1010 may execute one or more instructions stored in one or more computer-readable storage media (e.g., the memory 1020, storage media, etc.), which may serve as non-transitory computer-readable storage media storing instructions for execution by processor 1010.”) comprising:
monitoring an input sound-data stream representing the sound detected by the one or more microphones for (i) a wake-word event (GANONG Fig. 8; Par 158 – “Voice response system 850 is configured to be responsive to voice even when the mobile device 800 is operating in a low power mode. In the example shown in FIGS. 8A and 8B, user 890 has spoken the words “Hello, Dragon” to wake-up the mobile device and engage the voice response system, or otherwise utilize functionality of the mobile device. “Hello, Dragon,” in this example, represents an explicit voice trigger understood by voice response system 850.”) and (ii) a media playback system keyword event (GANONG Par 8 – “Some embodiments include a method of monitoring an acoustic environment of a mobile device for voice commands when the mobile device is operating in a low power mode,… and using at least one contextual cue to assist in detecting whether the acoustic input includes a voice command.”; Par 145 – “For example, according to some embodiments, acoustic input 705 may undergo limited vocabulary ASR to perform keyword spotting, any technique for which may be used to identify whether acoustic input 705 contains any words deemed suggestive of a voice command and/or to identify words needed to perform classification.”); 
detecting a first media playback system keyword event (GANONG Par 69 – “For example, if the acoustic input is determined to include an explicit voice trigger, the voice response system may be readied to expect one or more voice commands to act upon. If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 33 – “According to some embodiments, the acoustic environment of a mobile device is monitored to receive acoustic input when the mobile device is in a low power mode and to detect when the acoustic input includes a voice command.”; Par 145 – “For example, according to some embodiments, acoustic input 705 may undergo limited vocabulary ASR to perform keyword spotting, any technique for which may be used to identify whether acoustic input 705 contains any words deemed suggestive of a voice command and/or to identify words needed to perform classification.”), wherein detecting the first media playback system keyword event comprises after detecting a first sound via the one or more microphones (GANONG Par 47 – “For example, one or more microphones may sense acoustic activity in the environment and obtain the resulting acoustic input for further processing to assess whether the acoustic input includes a voice command.”), determining, [with at least a threshold confidence], that the detected first sound includes a first media playback system keyword (GANONG Par 150 – “Voice commands such as “next track,” “previous track,” “repeat track,” “pause music,” “decrease volume,” “increase volume,” etc. may be performed without having to exit a low power mode.”), wherein the first media playback system keyword is one of a plurality of command keywords supported by the playback device (GANONG Par 150 – “In this example, the fact that a music application or player is executing on the mobile device in a low power mode may also operate as a contextual cue to bias the evaluation of the acoustic input to assist in detecting voice commands related to the music player (e.g., the music player being operational may be used to select a processing stage that includes limited vocabulary ASR, wherein the limited vocabulary is selected to include terms frequently associated with controlling a music player such as one or any combination of “track,” “volume,” “resume,” “pause,” “repeat,” “skip,” “shuffle,” etc., or any other word or term deemed suggestive of a voice command to control the music player).”);
in response to detecting the first media playback system keyword event, processing, via a local voice input engine of a media playback system voice assistant, the first sound as a first voice input, wherein processing the first sound comprises:
(i) determining that one or more media playback system keyword conditions corresponding to the first media playback system keyword are satisfied (GANONG Par 150 – “In this example, the fact that a music application or player is executing on the mobile device in a low power mode may also operate as a contextual cue to bias the evaluation of the acoustic input to assist in detecting voice commands related to the music player (e.g., the music player being operational may be used to select a processing stage that includes limited vocabulary ASR, wherein the limited vocabulary is selected to include terms frequently associated with controlling a music player such as one or any combination of “track,” “volume,” “resume,” “pause,” “repeat,” “skip,” “shuffle,” etc., or any other word or term deemed suggestive of a voice command to control the music player).”; Par 58 – “If it is determined that the acoustic input includes a voice command, the voice response system may initiate one or more processes to respond to the voice command (act 230). For example, the voice response system may perform further language processing to understand what the voice command means and engage the necessary procedures/components required to undertake carrying out the directives of the voice command. Otherwise, the mobile device may discontinue further processing of the acoustic input and ignore it as spurious acoustic activity (e.g., non-speech sounds, background noise, speech not corresponding to a voice command or, according to some embodiments, speech from one or more people that are not the user of the mobile device, as discussed in further detail below). The voice response system may then continue to monitor the acoustic environment to obtain further acoustic input (e.g., the voice response system may return to or continue to perform act 210).”); and
[(ii) determining that local voice input engine is unable to determine an intent of the first voice input, wherein determining that local voice input engine is unable to determine the intent of the first voice input comprises determining that one or more parameter slots associated with the first media playback system keyword are not matched with keywords in the first voice input; 


based on (a) detecting the first media playback system keyword event and (b) determining that local voice input engine is unable to determine the intent of the first voice input, sending, via the network interface, sound data corresponding to at least a portion of the first voice input to one or more servers of the media playback system voice assistant for processing of the first voice input; 


after receiving data indicating one or more first playback operations according to an intent of the first voice input as determined by the one or more servers of the media playback system voice assistant], performing the one or more first playback operations (GANONG Par 69 – “If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 132 – “The voice response system may then be readied to process subsequent acoustic input expected to follow the explicit voice trigger, or to further process the acoustic input if it includes an actionable voice command in addition to the explicit voice trigger. The further processing may engage the primary processor to assist in understanding the voice command and/or to carry out the directives of the voice command.”); 

detecting a first wake-word event (GANONG Par 59 – “Initiating further processing may include evaluating or modifying the evaluation of subsequently received acoustic input, for example, when the detected voice command includes an explicit voice trigger.”), wherein detecting the first wake-word event comprises after detecting a second sound via the one or more microphones (GANONG Par 158 – “In the example shown in FIGS. 8A and 8B, user 890 has spoken the words “Hello, Dragon” to wake-up the mobile device and engage the voice response system, or otherwise utilize functionality of the mobile device. “Hello, Dragon,” in this example, represents an explicit voice trigger understood by voice response system 850. The user's speech may be detected by one or more microphones, located on mobile device, that has been kept at least partially on and enabled in order to monitor the acoustic environment of the mobile device.”), determining that the detected second sound includes a second voice input comprising a first wake word (GANONG Par 69 – “For example, if the acoustic input is determined to include an explicit voice trigger, the voice response system may be readied to expect one or more voice commands to act upon. If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 86 – “Limited vocabulary ASR may be used to perform explicit voice trigger detection. For example, an exemplary speech processing stage may include performing ASR using a vocabulary restricted to the words in the explicit voice trigger phrase (which may include as few as a single word.). For example, for the explicit voice trigger “Hello, Dragon,” the vocabulary may be restricted to the two words “Hello” and “Dragon.” By limiting the vocabulary to the words permitted in an explicit voice trigger, ASR may be performed using little processing to assess whether the acoustic input includes a voice command (e.g., whether the acoustic input includes the explicit voice trigger).”); and

in response to detecting the first wake-word event, streaming, via the network interface, sound data corresponding to at least a portion of the second voice input to one or more remote servers of a first voice assistant service (GANONG Figs. 7 and 8 – “Server(s)”; Par 173 – “According to some embodiments, acoustic input may be transmitted to ASR component 930 to be recognized. The acoustic input may be processed in any suitable manner prior to providing the acoustic input to ASR component 930. For example, the acoustic input may be pre-processed to remove information, format the acoustic input or modify the acoustic input in preparation for ASR (e.g., the acoustic input may be formatted to conform with a desired audio format and/or prepared for streaming as an audio stream or prepared as an appropriate audio file) so that the acoustic input can be provided as an audio input to ASR component 930 (e.g., transmitted over a network).”).

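However, GANONG, taken individually or in combination with the other prior art of record, fails to teach or reasonably suggest the [square-bracketed] limitations as shown above.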
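As a non-limiting illustration of the monitoring step, including the bracketed "with at least a threshold confidence" limitation, the two event types may be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any implementation from the application or from GANONG: the threshold values, the `classify_event` function, and the representation of detector output as a phrase-to-confidence mapping are hypothetical. The wake phrase and command words are borrowed from the examples quoted from GANONG above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical thresholds; the claims recite only "a threshold confidence".
WAKE_WORD_THRESHOLD = 0.9
KEYWORD_THRESHOLD = 0.8

# Example phrases taken from the GANONG passages quoted above.
WAKE_WORDS = {"hello dragon"}
COMMAND_KEYWORDS = {"pause", "skip", "resume", "repeat", "shuffle"}


def classify_event(scores: Dict[str, float]) -> Tuple[str, Optional[str]]:
    """Classify detector output as a wake-word event, a keyword event, or neither.

    `scores` maps each candidate phrase to an assumed confidence in [0, 1].
    """
    # Wake-word event: the input would be streamed to a remote assistant.
    for phrase, confidence in scores.items():
        if phrase in WAKE_WORDS and confidence >= WAKE_WORD_THRESHOLD:
            return ("wake_word", phrase)

    # Keyword event: the best-scoring supported command keyword must meet
    # the threshold confidence before local processing begins.
    candidates = [p for p in scores if p in COMMAND_KEYWORDS]
    if candidates:
        best = max(candidates, key=lambda p: scores[p])
        if scores[best] >= KEYWORD_THRESHOLD:
            return ("keyword", best)

    return ("none", None)
```

Giving the wake word precedence over a command keyword detected in the same input is a design choice of this sketch only; the claims do not specify an ordering between the two events.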

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday 8:00 AM thru 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/Primary Examiner, Art Unit 2655