DETAILED ACTION
This office action is in response to correspondence filed on 5/26/2022.  
Claims 1-21 remain pending in the application, of which Claims 1, 9, and 17 are independent.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1-21 are allowed over the prior art of record.  The following is the examiner’s statement of reasons for allowance:
The closest relevant prior art (which is discussed in further detail below), either taken individually or in combination, fails to explicitly teach or reasonably suggest the invention as represented by the independent claims 1, 9, and 17.

Most pertinent prior art:
GANONG (US 2014/0274203 A1) discloses a playback device comprising:
a network interface (GANONG Par 42 – “Alternatively, or in addition to, wireless communication component 160 may include a wireless transceiver capable of communicating with one or more other networks or external devices. For example, wireless communication component 160 may include a component configured to communication via the IEEE 802.11 standard (Wi-Fi) to connect to a local area network (LAN), wide area network (WAN) such as the Internet, and/or may include a Bluetooth® transceiver to connect to a Bluetooth® compatible device.”);
one or more microphones configured to detect sound (GANONG Par 41 – “which includes an input capable of receiving acoustic input (e.g., one or more microphones). Mobile device 100 includes one or more transducers 130 for converting acoustic energy to electrical energy and vice versa. For example, transducers 130 may include one or more speakers and/or one or more microphones arranged on the mobile device to allow input/output (I/O) of acoustic information.”);
at least one speaker (GANONG Par 41 – “For example, transducers 130 may include one or more speakers and/or one or more microphones arranged on the mobile device to allow input/output (I/O) of acoustic information.”);
one or more processors (GANONG Par 123 – “Exemplary system components of a mobile device may include a primary processor 115, a secondary processor 125 and an audio codec 105, all illustrated for convenience and clarity of illustration as being interconnected via a common bus 155.”);
data storage having instructions stored thereon that are executable by the one or more processors to cause the playback device to perform functions (GANONG Par 185 – “To perform functionality and/or techniques described herein, the processor 1010 may execute one or more instructions stored in one or more computer-readable storage media (e.g., the memory 1020, storage media, etc.), which may serve as non-transitory computer-readable storage media storing instructions for execution by processor 1010.”) comprising:
monitoring an input sound-data stream representing the sound detected by the one or more microphones for (i) a wake-word event (GANONG Fig. 8; Par 158 – “Voice response system 850 is configured to be responsive to voice even when the mobile device 800 is operating in a low power mode. In the example shown in FIGS. 8A and 8B, user 890 has spoken the words “Hello, Dragon” to wake-up the mobile device and engage the voice response system, or otherwise utilize functionality of the mobile device. “Hello, Dragon,” in this example, represents an explicit voice trigger understood by voice response system 850.”) and (ii) a media playback system keyword event (GANONG Par 8 – “Some embodiments include a method of monitoring an acoustic environment of a mobile device for voice commands when the mobile device is operating in a low power mode,… and using at least one contextual cue to assist in detecting whether the acoustic input includes a voice command.”; Par 145 – “For example, according to some embodiments, acoustic input 705 may undergo limited vocabulary ASR to perform keyword spotting, any technique for which may be used to identify whether acoustic input 705 contains any words deemed suggestive of a voice command and/or to identify words needed to perform classification.”); 
detecting a first media playback system keyword event (GANONG Par 69 – “For example, if the acoustic input is determined to include an explicit voice trigger, the voice response system may be readied to expect one or more voice commands to act upon. If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 33 – “According to some embodiments, the acoustic environment of a mobile device is monitored to receive acoustic input when the mobile device is in a low power mode and to detect when the acoustic input includes a voice command.”; Par 145 – “For example, according to some embodiments, acoustic input 705 may undergo limited vocabulary ASR to perform keyword spotting, any technique for which may be used to identify whether acoustic input 705 contains any words deemed suggestive of a voice command and/or to identify words needed to perform classification.”), wherein detecting the first media playback system keyword event comprises after detecting a first sound via the one or more microphones (GANONG Par 47 – “For example, one or more microphones may sense acoustic activity in the environment and obtain the resulting acoustic input for further processing to assess whether the acoustic input includes a voice command.”), determining, [with at least a threshold confidence], that the detected first sound includes a first media playback system keyword (GANONG Par 150 – “Voice commands such as “next track,” “previous track,” “repeat track,” “pause music,” “decrease volume,” “increase volume,” etc. may be performed without having to exit a low power mode.”), wherein the first media playback system keyword is one of a plurality of command keywords supported by the playback device (GANONG Par 150 – “In this example, the fact that a music application or player is executing on the mobile device in a low power mode may also operate as a contextual cue to bias the evaluation of the acoustic input to assist in detecting voice commands related to the music player (e.g., the music player being operational may be used to select a processing stage that includes limited vocabulary ASR, wherein the limited vocabulary is selected to include terms frequently associated with controlling a music player such as one or any combination of “track,” “volume,” “resume,” “pause,” “repeat,” “skip,” “shuffle,” etc., or any other word or term deemed suggestive of a voice command to control the music player).”);
in response to detecting the first media playback system keyword event, processing, via a local voice input engine of a media playback system voice assistant, the first sound as a first voice input, wherein processing the first sound comprises:
(i) determining that one or more media playback system keyword conditions corresponding to the first media playback system keyword are satisfied (GANONG Par 150 – “In this example, the fact that a music application or player is executing on the mobile device in a low power mode may also operate as a contextual cue to bias the evaluation of the acoustic input to assist in detecting voice commands related to the music player (e.g., the music player being operational may be used to select a processing stage that includes limited vocabulary ASR, wherein the limited vocabulary is selected to include terms frequently associated with controlling a music player such as one or any combination of “track,” “volume,” “resume,” “pause,” “repeat,” “skip,” “shuffle,” etc., or any other word or term deemed suggestive of a voice command to control the music player).”; Par 58 – “If it is determined that the acoustic input includes a voice command, the voice response system may initiate one or more processes to respond to the voice command (act 230). For example, the voice response system may perform further language processing to understand what the voice command means and engage the necessary procedures/components required to undertake carrying out the directives of the voice command. Otherwise, the mobile device may discontinue further processing of the acoustic input and ignore it as spurious acoustic activity (e.g., non-speech sounds, background noise, speech not corresponding to a voice command or, according to some embodiments, speech from one or more people that are not the user of the mobile device, as discussed in further detail below). The voice response system may then continue to monitor the acoustic environment to obtain further acoustic input (e.g., the voice response system may return to or continue to perform act 210).”); and 
[(ii) determining that local voice input engine is unable to determine an intent of the first voice input, wherein determining that local voice input engine is unable to determine the intent of the first voice input comprises determining that one or more parameter slots associated with the first media playback system keyword are not matched with keywords in the first voice input; 

based on (a) detecting the first media playback system keyword event and (b) determining that local voice input engine is unable to determine the intent of the first voice input, sending, via the network interface, sound data corresponding to at least a portion of the first voice input to one or more servers of the media playback system voice assistant for processing of the first voice input; 

after receiving data indicating one or more first playback operations according to an intent of the first voice input as determined by the one or more servers of the media playback system voice assistant], performing the one or more first playback operations (GANONG Par 69 – “If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 132 – “The voice response system may then be readied to process subsequent acoustic input expected to follow the explicit voice trigger, or to further process the acoustic input if it includes an actionable voice command in addition to the explicit voice trigger. The further processing may engage the primary processor to assist in understanding the voice command and/or to carry out the directives of the voice command.”); 

detecting a first wake-word event (GANONG Par 59 – “Initiating further processing may include evaluating or modifying the evaluation of subsequently received acoustic input, for example, when the detected voice command includes an explicit voice trigger.”), wherein detecting the first wake-word event comprises after detecting a second sound via the one or more microphones (GANONG Par 158 – “In the example shown in FIGS. 8A and 8B, user 890 has spoken the words “Hello, Dragon” to wake-up the mobile device and engage the voice response system, or otherwise utilize functionality of the mobile device. “Hello, Dragon,” in this example, represents an explicit voice trigger understood by voice response system 850. The user's speech may be detected by one or more microphones, located on mobile device, that has been kept at least partially on and enabled in order to monitor the acoustic environment of the mobile device.”), determining that the detected second sound includes a second voice input comprising a first wake word (GANONG Par 69 – “For example, if the acoustic input is determined to include an explicit voice trigger, the voice response system may be readied to expect one or more voice commands to act upon. If the acoustic input includes an actionable voice command, initiation of the processes to perform the actions needed to respond to the voice command may be invoked.”; Par 86 – “Limited vocabulary ASR may be used to perform explicit voice trigger detection. For example, an exemplary speech processing stage may include performing ASR using a vocabulary restricted to the words in the explicit voice trigger phrase (which may include as few as a single word.). For example, for the explicit voice trigger “Hello, Dragon,” the vocabulary may be restricted to the two words “Hello” and “Dragon.” By limiting the vocabulary to the words permitted in an explicit voice trigger, ASR may be performed using little processing to assess whether the acoustic input includes a voice command (e.g., whether the acoustic input includes the explicit voice trigger).”); and 

in response to detecting the first wake-word event, streaming, via the network interface, sound data corresponding to at least a portion of the second voice input to one or more remote servers of a first voice assistant service (GANONG Figs. 7 and 8 – “Server(s)”; Par 173 – “According to some embodiments, acoustic input may be transmitted to ASR component 930 to be recognized. The acoustic input may be processed in any suitable manner prior to providing the acoustic input to ASR component 930. For example, the acoustic input may be pre-processed to remove information, format the acoustic input or modify the acoustic input in preparation for ASR (e.g., the acoustic input may be formatted to conform with a desired audio format and/or prepared for streaming as an audio stream or prepared as an appropriate audio file) so that the acoustic input can be provided as an audio input to ASR component 930 (e.g., transmitted over a network).”).

However, GANONG fails to teach all the limitations recited in the independent claims. In particular, GANONG fails to teach or suggest the limitations enclosed in square brackets above.
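
For illustration only, the square-bracketed limitations can be sketched in code. The following is a minimal sketch under stated assumptions and is not taken from the application or from GANONG: the threshold value, the keyword and slot tables, and the names detect_keyword, match_slots, handle_voice_input, and send_to_server are all hypothetical. It shows a keyword accepted only at or above a threshold confidence, a local engine that treats an unmatched parameter slot as an undetermined intent, and a fallback that sends the sound data to a server in that case.

```python
# Hypothetical values: neither the application nor GANONG specifies them.
CONFIDENCE_THRESHOLD = 0.8

# Hypothetical command keywords mapped to the parameter slots each requires.
COMMAND_SLOTS = {
    "play": ["content"],
    "pause": [],
    "volume": ["level"],
}

# Hypothetical vocabulary the local engine can match to each slot.
SLOT_VOCABULARY = {
    "content": {"jazz", "news"},
    "level": {"up", "down"},
}


def detect_keyword(scores):
    """Return the best-scoring supported keyword if it meets the threshold."""
    if not scores:
        return None
    keyword = max(scores, key=scores.get)
    if keyword in COMMAND_SLOTS and scores[keyword] >= CONFIDENCE_THRESHOLD:
        return keyword
    return None


def match_slots(keyword, words):
    """Fill each slot of the keyword from the words; None marks an unmatched slot."""
    slots = {}
    for slot in COMMAND_SLOTS[keyword]:
        slots[slot] = next((w for w in words if w in SLOT_VOCABULARY[slot]), None)
    return slots


def handle_voice_input(scores, words, sound_data, send_to_server):
    """Process locally when every slot matches; otherwise defer to the server."""
    keyword = detect_keyword(scores)
    if keyword is None:
        return ("ignored", None)
    slots = match_slots(keyword, words)
    if any(value is None for value in slots.values()):
        # Local engine cannot determine the intent: send the sound data out.
        return ("remote", send_to_server(sound_data))
    return ("local", {"command": keyword, **slots})
```

In this sketch, "play jazz" resolves locally because the hypothetical "content" slot is matched, whereas "play that song" leaves the slot unmatched and is routed to the caller-supplied send_to_server function.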

Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance." 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see attached form PTO-892.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday, 8:00 AM through 4:00 PM EST.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/
Primary Examiner, Art Unit 2655