DETAILED ACTION

Remarks
Claims 16, 17, 19-24, 26-31, and 33-35 have been examined and rejected. This Office action is responsive to the amendment filed on 09/02/2021, which has been entered in the above identified application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 16, 17, 19-24, 26-31, and 33-35 are rejected under 35 U.S.C. 103 as being unpatentable over Williams et al. (US 9898250 B1, published 02/20/2018), hereinafter Williams.


Regarding claim 16, Williams teaches the claim comprising:
A media playback system comprising: a first playback device located in a zone of the media playback system; and a second playback device located in the zone, wherein the second playback device includes: at least one microphone; one or more processors; and tangible computer-readable memory storing instructions that, when executed by the one or more processors, cause the second playback device to perform operations for outputting a feedback element, the operations comprising (Williams Figs. 1-18; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; user 10 may control the output audio 30 (e.g., select an audio source 40, adjust a volume, stop or mute the output audio 30, or the like), control the output devices generating the output audio 30 (e.g., generate output audio 30 in one or more output zones), or the like using spoken commands; the device(s) 110 may be located in a house and the system 100 may generate the output audio 30 in one or more rooms of the house; the house may include multiple speaker systems (e.g., speaker(s) 20) that are not connected to the device(s) 110 and the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; the device 110c (e.g., television) may act as an input device (e.g., include a microphone array configured to receive the input audio 11) and as an output device (e.g., include speakers configured to generate the output audio 30); while devices 110a, 110b-1 and 110b-2 are included as input devices, they may generate output audio 30 without departing from the present disclosure; col. 18 [line 44], FIG. 5A illustrates output devices located in house 540a, such as device 110a in Room 1, speaker 20a-1 and speaker 20a-2 in Room 1, device 110c (e.g., television) in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG; a house 540b illustrated in FIG. 5C include the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 22 [line 13], the device 110a may be included in the master association table 602 as an input device (e.g., microphone) and as an output device (e.g., speaker); col. 36 [line 12], each of these devices (110/112) may include one or more controllers/processors (1604/1704), that may each include a central processing unit (CPU) for processing data and computer-readable instructions, and a memory (1606/1706) for storing data and instructions of the respective device; each device may also include a data storage component (1608/1708), for storing data and controller/processor-executable instructions):
synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level (Williams Figs. 1-18; col. 3 [line 37], a volume of the audio; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 19 [line 8], an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b);
receiving voice input data via the at least one microphone; determining an audible response corresponding to a command request in the voice input data, the audible response distinct from the media content (Williams Figs. 1-18; abs. the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); col. 26 [line 28], FIGS. 10A-10B illustrate communication and operations among devices to determine that a voice command is being received and lower a volume of corresponding output audio; FIG. 10A illustrates an example of the server(s) 112 sending output audio data to speaker(s) 20 directly when receiving the voice command; as illustrated in FIG. 10A, the server(s) 112 may send (1006) output audio data to the speaker(s) 20 and the speaker(s) 20 may play (1008) output audio using the output audio data; while the speaker(s) 20 are playing the output audio, a device 110 may receive (1010) input audio; the device 110 may identify a wakeword in the input audio; col. 25 [line 51] – col. 26 [line 2], the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 32 [line 4], after the system 100 interprets a command from the input audio data, the system 100 may generate the voice output data and send the voice output data to the speaker(s) 20; col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio and send (1216) input audio data to the server(s) 112; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command);
after determining the audible response, playing back, via the second playback device, the audible response via the second playback device; while playing back the audible response via the second playback device, reducing playback volume of the media content via the second playback device from the second volume level to a third, lower volume level (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 8 [line 18], the device(s) 110 may be located in a house and the system 100 may generate the output audio 30 in one or more rooms of the house; the house may include multiple speaker systems (e.g., speaker(s) 20) that are not connected to the device(s) 110 and the system 100 may control the multiple speaker systems to play music from an audio source; col. 25 [line 51] – col. 26 [line 2], the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 29 [line 55], the server(s) 112 may identify specific output devices and the speaker controller 22 can forward the command to the identified output devices; col. 30 [line 17], FIG. 11C illustrates a third example of the server(s) 112 indicating an input device 110 and/or a location of an input device 110 and the speaker controller 22 determining output devices corresponding to the input device/location; col. 31 [line 15], the server(s) 112 may send an instruction to the speaker controller 22 indicating (e.g., using an identification, location, address and/or a combination thereof) a specific speaker 20 and the speaker controller 22 may control corresponding speaker(s) 20 in response to the instruction; col. 32 [line 4], after the system 100 interprets a command from the input audio data, the system 100 may generate the voice output data and send the voice output data to the speaker(s) 20; the speaker(s) 20 may reduce a volume of the output audio from the first volume level to a second volume level while generating the voice output, then increase the volume of the output audio from the second volume level to the first volume level; col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22; col. 32 [line 44], the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to the speaker(s) 20; the speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level; the server(s) 112 may instruct the speaker(s) 20 to reduce the volume of the output audio while playing the voice output; the first command may be a query (e.g., “What is the date?”) and the speaker(s) 20 may lower a volume of the output audio (e.g., music being played), play voice output responding to the query (e.g., “Today's date is March 23rd”) and raise the volume of the output audio);
and after playing back the audible response, resuming playback volume of the media content via the second playback device at the second volume level (Williams Figs. 1-18; col. 27 [line 36], the system 100 may execute the command and resume a previous volume of the output audio; col. 32 [line 44], the speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level)
Williams does not expressly disclose playing back, via the second playback device, the audible response via the second playback device without playing back the audible response via the first playback device and while playing back the audible response via the second playback device, reducing playback volume of the media content via the second playback device without reducing playback volume of the media content via the first playback device from the first volume level.  However, Williams discloses the system 100 may control the multiple speaker systems to collectively play audio from a single audio source (col. 8 [line 18], col. 3 [line 54] – col. 4 [line 14]).  Williams further discloses the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands.  The system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output (abs.).  Williams further discloses the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume (col. 25 [line 51] – col. 26 [line 2]).  Williams further discloses the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22 (col. 32 [line 21]).  Williams further discloses the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to one or multiple speakers 20 (i.e., the command to lower the volume and the voice output may be sent to only a single device while multiple devices are playing).  The speaker 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level (col. 32 [line 44]).  Williams further discloses the speaker controller determines one or more devices to receive the command based on a location (col. 30 [line 4] - col. 30 [line 34]).  Williams further discloses that the speaker controller determining an output device based on location includes determining a single device that is closest (col. 22 [line 54] – col. 23 [line 2]).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated playing back, via the second playback device, the audible response via the second playback device without playing back the audible response via the first playback device and, while playing back the audible response via the second playback device, reducing playback volume of the media content via the second playback device without reducing playback volume of the media content via the first playback device from the first volume level (Figs. 1-18; abs. col. 8 [line 18], col. 22 [line 54] – col. 23 [line 2], col. 25 [line 51] – col. 26 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]).  Doing so would be desirable because using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands (see col. 2 [line 49]).  Additionally, when playing music in multiple zones, a zone comprising multiple rooms, or throughout an entire house (col. 8 [line 18], col. 18 [line 44], col. 19 [line 8]), sending a command to lower the volume and output a voice response on the device or devices closest to the user (abs. col. 22 [line 54] – col. 23 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]) would minimize network traffic, conserve system resources, and enable other listeners to consume audio output without interruption.  Additionally, other non-proximate users may be confused as to why their volume has been lowered or why the system is outputting a verbal response when they did not issue a spoken command.

Regarding claims 23 and 30, claims 23 and 30 contain substantially similar limitations to those found in claim 16, the only difference being wherein the first and second playback devices are located within the same zone of a listening environment (Williams Figs. 1-18; col. 18 [line 44], FIG. 5A illustrates output devices located in house 540a, such as device 110a in Room 1, speaker 20a-1 and speaker 20a-2 in Room 1, device 110c (e.g., television) in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG; a house 540b illustrated in FIG. 5C include the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b) and receiving voice input data via at least one microphone of the second playback device, the voice input data including a command request (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 8 [line 46], communication between various components illustrated in FIG. 2 may occur directly or across a network 199; an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; the device 110c (e.g., television) may act as an input device (e.g., include a microphone array configured to receive the input audio 11) and as an output device (e.g., include speakers configured to generate the output audio 30); while devices 110a, 110b-1 and 110b-2 are included as input devices, they may generate output audio 30 without departing from the present disclosure; col. 22 [line 13], the device 110a may be included in the master association table 602 as an input device (e.g., microphone) and as an output device (e.g., speaker); col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio and send (1216) input audio data to the server(s) 112; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command).  Consequently, claims 23 and 30 are rejected for the same reasons.

Regarding claim 17, Williams teaches all the limitations of claim 16, further comprising:
outputting, via the second playback device, the audible response at a fourth volume level while the media content plays back via the first playback device and via the second playback device at the third volume level (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 25 [line 51] – col. 26 [line 2], the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 32 [line 4], after the system 100 interprets a command from the input audio data, the system 100 may generate the voice output data and send the voice output data to the speaker(s) 20; the speaker(s) 20 may reduce a volume of the output audio from the first volume level to a second volume level while generating the voice output, then increase the volume of the output audio from the second volume level to the first volume level; col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22; col. 32 [line 44], the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to the speaker(s) 20; the speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level; the server(s) 112 may instruct the speaker(s) 20 to reduce the volume of the output audio while playing the voice output)
Williams does not expressly disclose outputting, via the second playback device, the audible response at a fourth volume level while the media content plays back via the first playback device at the first volume level and via the second playback device at the third volume level.  However, Williams discloses the system 100 may control the multiple speaker systems to collectively play audio from an audio source (col. 8 [line 18], col. 3 [line 54] – col. 4 [line 14]).  Williams further discloses the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands.  The system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output (abs.).  Williams further discloses the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22 (col. 32 [line 21]).  Williams further discloses the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to one or multiple speakers 20 (i.e., the command to lower the volume and the voice output may be sent to only a single device while multiple devices are playing).  Williams further discloses the speaker controller determines one or more devices to receive the command based on a location (col. 30 [line 4] - col. 30 [line 34]).  Williams further discloses that the speaker controller determining an output device based on location includes determining a single device that is closest (col. 22 [line 54] – col. 23 [line 2]).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated outputting, via the second playback device, the audible response at a fourth volume level while the media content plays back via the first playback device at the first volume level and via the second playback device at the third volume level (Figs. 1-18; abs. col. 8 [line 18], col. 22 [line 54] – col. 23 [line 2], col. 25 [line 51] – col. 26 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]).  Doing so would be desirable because using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands (see col. 2 [line 49]).  Additionally, when playing music in multiple zones, a zone comprising multiple rooms, or throughout an entire house (col. 8 [line 18], col. 18 [line 44], col. 19 [line 8]), sending a command to lower the volume and output a voice response on the device or devices closest to the user (abs. col. 22 [line 54] – col. 23 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]) would minimize network traffic, conserve system resources, and enable other listeners to consume audio output without interruption.  Additionally, other non-proximate users may be confused as to why their volume has been lowered or why the system is outputting a verbal response when they did not issue a spoken command.

Regarding claims 24 and 31, claims 24 and 31 contain substantially similar limitations to those found in claim 17.  Consequently, claims 24 and 31 are rejected for the same reasons.

Regarding claim 19, Williams teaches all the limitations of claim 17, further comprising:
wherein causing output of the audible response further comprises outputting, via the second playback device, the audible response at the fourth volume level while playing back, via the second playback device, the media content at the third volume level in synchrony with the media content playing back, via the first playback device (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 19 [line 8], a house 540b illustrated in FIG. 5C include the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 25 [line 51] – col. 26 [line 2], the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 32 [line 4], after the system 100 interprets a command from the input audio data, the system 100 may generate the voice output data and send the voice output data to the speaker(s) 20; the speaker(s) 20 may reduce a volume of the output audio from the first volume level to a second volume level while generating the voice output, then increase the volume of the output audio from the second volume level to the first volume level; col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22; col. 32 [line 44], the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to the speaker(s) 20; the speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level; the server(s) 112 may instruct the speaker(s) 20 to reduce the volume of the output audio while playing the voice output)
Williams does not expressly disclose outputting, via the second playback device, the audible response at the fourth volume level while playing back, via the second playback device, the media content at the third volume level in synchrony with the media content playing back, via the first playback device, at the first volume level.  However, Williams discloses the system 100 may control the multiple speaker systems to collectively play audio from an audio source (col. 8 [line 18], col. 3 [line 54] – col. 4 [line 14]).  Williams further discloses the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands.  The system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output (abs.).  Williams further discloses the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22 (col. 32 [line 21]).  Williams further discloses the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to one or multiple speakers 20 (i.e., the command to lower the volume and the voice output may be sent to only a single device while multiple devices are playing).  Williams further discloses the speaker controller determines one or more devices to receive the command based on a location (col. 30 [line 4] - col. 30 [line 34]).  Williams further discloses that the speaker controller determining an output device based on location includes determining a single device that is closest (col. 22 [line 54] – col. 23 [line 2]).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated outputting, via the second playback device, the audible response at the fourth volume level while playing back, via the second playback device, the media content at the third volume level in synchrony with the media content playing back, via the first playback device, at the first volume level (Figs. 1-18; abs. col. 3 [line 54] – col. 4 [line 14], col. 8 [line 18], col. 22 [line 54] – col. 23 [line 2], col. 25 [line 51] – col. 26 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]).  Doing so would be desirable because using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands (see col. 2 [line 49]).  Additionally, when playing music in multiple zones, a zone comprising multiple rooms, or throughout an entire house (col. 8 [line 18], col. 18 [line 44], col. 19 [line 8]), sending a command to lower the volume and output a voice response on the device or devices closest to the user (abs. col. 22 [line 54] – col. 23 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]) would minimize network traffic, conserve system resources, and enable other listeners to consume audio output without interruption.  Additionally, other non-proximate users may be confused as to why their volume has been lowered or why the system is outputting a verbal response when they did not issue a spoken command.

Regarding claims 26 and 33, claims 26 and 33 contain substantially similar limitations to those found in claim 19.  Consequently, claims 26 and 33 are rejected for the same reasons.

Regarding claim 20, Williams teaches all the limitations of claim 17, further comprising:
wherein synchronously playing back the media content via the first playback device at a first volume level and via the second playback device at a second volume level comprises playing back a first channel of the media content via the first playback device and playing back a second channel of the media content via the second playback device, the system further comprising: a third playback device located in the zone, wherein the instructions further include instructions for (Williams Figs. 1-18; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; user 10 may control the output audio 30 (e.g., select an audio source 40, adjust a volume, stop or mute the output audio 30, or the like), control the output devices generating the output audio 30 (e.g., generate output audio 30 in one or more output zones), or the like using spoken commands; the device(s) 110 may be located in a house and the system 100 may generate the output audio 30 in one or more rooms of the house; the house may include multiple speaker systems (e.g., speaker(s) 20) that are not connected to the device(s) 110 and the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 18 [line 44], FIG. 5A illustrates output devices located in house 540a, such as device 110a in Room 1, speaker 20a-1 and speaker 20a-2 in Room 1, device 110c (e.g., television) in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG; a house 540b illustrated in FIG. 5C include the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b)
playing back, via the third playback device, a third channel of the media content at the second volume level, and wherein causing output of the audible response further comprises outputting, via the second playback device, the audible response at the fourth volume level while the third channel of the media content plays back via the third playback device (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 25 [line 51] – col. 26 [line 2], the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 19 [line 8], a house 540b illustrated in FIG. 5C include the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; an output zone may include input devices and/or output devices in multiple rooms; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 32 [line 4], after the system 100 interprets a command from the input audio data, the system 100 may generate the voice output data and send the voice output data to the speaker(s) 20; the speaker(s) 20 may reduce a volume of the output audio from the first volume level to a second volume level while generating the voice output, then increase the volume of the output audio from the second volume level to the first volume level; col. 32 [line 21], while the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio; the server(s) 112 may determine (1218) a first command from the input audio data; the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22; col. 32 [line 44], the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to the speaker(s) 20; the speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level; the server(s) 112 may instruct the speaker(s) 20 to reduce the volume of the output audio while playing the voice output)
Williams does not expressly disclose outputting, via the second playback device, the audible response at the fourth volume level while the third channel of the media content plays back via the third playback device at the second volume level.  However, Williams discloses the system 100 may control the multiple speaker systems to collectively play audio from a single audio source (col. 8 [line 18], col. 3 [line 54] – col. 4 [line 14]), where the multiple speaker systems comprise multiple first, second, and third playback devices (col. 18 [line 44], col. 19 [line 8]).  Williams further discloses the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands.  The system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output (abs.).  Williams further discloses the server(s) 112 may generate (1220) voice output data corresponding to the first command, may generate (1222) a second command to lower a volume of the output data, and may send (1224) the second command and the voice output data to the speaker controller 22 (col. 32 [line 21]).  Williams further discloses the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to one or multiple speakers 20 (i.e., the command to lower the volume and the voice output may be sent to only a single device while multiple devices are playing).  Williams further discloses the speaker controller determines one or more devices to receive the command based on a location (col. 30 [line 4] - col. 30 [line 34]).  Williams further discloses that the speaker controller determining an output device based on location includes determining a single device that is closest (col. 22 [line 54] – col. 23 [line 2]).  
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated outputting, via the second playback device, the audible response at the fourth volume level while the third channel of the media content plays back via the third playback device at the second volume level (Figs. 1-18; abs., col. 8 [line 18], col. 22 [line 54] – col. 23 [line 2], col. 25 [line 51] – col. 26 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]).  Doing so would be desirable because, using the techniques described therein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands (see col. 2 [line 49]).  Additionally, when playing music in multiple zones, a zone comprising multiple rooms, or throughout an entire house (col. 8 [line 18], col. 18 [line 44], col. 19 [line 8]), sending a command to lower the volume and output a voice response on the device or devices closest to the user (abs., col. 22 [line 54] – col. 23 [line 2], col. 30 [line 4] - col. 30 [line 34], col. 32 [line 44]) would minimize network traffic, conserve system resources, and enable other listeners to consume audio output without interruption.  Additionally, other non-proximate users may be confused as to why their volume has been lowered or why the system is outputting a verbal response when they did not issue a spoken command.

Regarding claims 27 and 34, claims 27 and 34 contain substantially similar limitations to those found in claim 20.  Consequently, claims 27 and 34 are rejected for the same reasons.

Regarding claim 21, Williams teaches all the limitations of claim 16, further comprising:
wherein the first and second playback devices are part of a home theater system (Williams Figs. 1-18; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 18 [line 44], FIG. 5A illustrates output devices located in house 540a, such as device 110a in Room 1, speaker 20a-1 and speaker 20a-2 in Room 1, device 110c (e.g., television) in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4)

Regarding claim 28, claim 28 contains substantially similar limitations to those found in claim 21.  Consequently, claim 28 is rejected for the same reasons.

Regarding claim 22, Williams teaches all the limitations of claim 16, further comprising:
wherein the instructions further include (a) determining a category of the media content being played back by the first and second playback devices, and (b) determining a type of the command request, and wherein determining the audible response is based on the category of media content and the type of command request (Williams Figs. 1-18; col. 8 [line 18], the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 12 [line 34], to correctly perform NLU processing of speech input, the NLU process 260 may be configured to determine a “domain” of the utterance so as to determine and narrow down which services offered by the endpoint device (e.g., server(s) 112 or device 110) may be relevant; an endpoint device may offer services relating to interactions with a telephone service, a contact list service, a calendar/scheduling service, a music player service, etc; col. 13 [line 19], the device 110 may be associated with domains for different applications such as music, telephony, calendaring, contact lists, and device-specific communications; col. 13 [line 50], a query potentially implicates both communications and music; col. 13 [line 61], a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent; col. 15 [line 20], the NER modules 262 may also use contextual operational rules to fill slots; if a user had previously requested to pause a particular song and thereafter requested that the voice-controlled device to “please un-pause my music,” the NER module 262 may apply an inference-based rule to fill a slot associated with the name of the song that the user currently wishes to play—namely the song that was playing at the time that the user requested to pause the music; col. 35 [line 10], the user 10 may ask what song is playing; as the server(s) 112 are not involved in sending the audio data, to answer the question the server(s) 112 may require bidirectional communication with the speaker controller 22 enabling the server(s) 112 to already know and/or request the song title; col. 35 [line 33], the server(s) 112 may determine (1520) that the input audio data corresponds to a query of “What's playing”; the server(s) 112 may determine an artist name and song title; col. 35 [line 45], the speaker(s) 20 may optionally lower (1530) a volume of the output audio (indicated by the dotted line) and may play (1532) voice output corresponding to voice output data; col. 35 [line 56], FIG. 15 illustrates an example of the server(s) 112 responding to a query of “What's playing”)

Regarding claims 29 and 35, claims 29 and 35 contain substantially similar limitations to those found in claim 22.  Consequently, claims 29 and 35 are rejected for the same reasons.

Response to Arguments
The Examiner acknowledges the Applicant’s amendments to claims 16, 17, 19, 20, 22-24, 26, 27, 29-31, and 33-35 and the cancellation of claims 18, 25, and 32.  Applicant’s arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Mackay (US 11411763 B2) see Figs. 1-6 and abs., col. 3 [line 28] – col. 4 [line 26]. 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN T REPSHER III whose telephone number is (571)272-7487. The examiner can normally be reached Monday - Friday, 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN T REPSHER III/            Primary Examiner, Art Unit 2143