DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Claim Objections
Claims 1-20 are objected to because of the following informalities: 
Claim 1 recites “at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the playback device is configured to: …”, which should read -- at least one non-transitory computer-readable medium comprising program instructions that are executed by the at least one processor perform: …-- for clarity, because (1) the word “executable” denotes only a capability of the “instructions” to be executed, not that they are necessarily executed by the “processor”, and (2) the entirety of claim 1 falls under the phrase “such that”, which would be interpreted as a statement of intended purpose; i.e., the claimed “playback device” is merely intended to perform, or merely capable of performing, the claimed functions, rather than actually performing them. Claims 2-9 and 19 are objected to because of their dependency on claim 1.
Claim 10 is objected to for at least reasons similar to those described for claim 1 above, since claim 10 recites deficient features similar to those recited in claim 1. Claims 11-18 are objected to because of their dependency on claim 10.
Claim 20 is objected to for at least reasons similar to those described for claim 1 above, since claim 20 recites deficient features similar to those recited in claim 1.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims ? are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 1 recites “A playback device comprising: an audio input interface; an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; at least one processor; and … non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the playback device is configured to: (functions)”, which is confusing because it is unclear whether the claimed “playback device is configured to (perform functions)” refers to the “audio input interface”, the “audio processor”, the “audio amplifier”, …, or the “at least one processor” performing the claimed functions, and thus renders the claim indefinite. Claim 1 further recites “determine an nth frame of an output signal, wherein the program instructions that are … such that the playback device is configured to determine the nth frame of the output signal comprises program instructions to determine the nth frame …”, which is further confusing because it is unclear what “comprises” (underscored for emphasis) refers to, i.e., whether it refers to “the output signal”, the “playback device”, or the “at least one processor”, and thus further renders the claim indefinite. Claim 1 further recites “wherein the program instructions that are executable … is configured to determine the n+1th instance of the adaptive filter …”, which is further confusing because it is unclear whether “the program instructions” refers back to the “program instructions” recited in line 27 of claim 1 or to the “program instructions” recited in line 7 of claim 1, and thus further renders the claim indefinite.
Claim 1 further recites “for the next iteration of the AEC comprises comprises program instructions …”, which is similarly unclear as to whether “comprises” refers back to the “playback device”, “the next iteration”, “the adaptive filter”, etc., and thus further renders the claim indefinite. Claims 2-9 and 19 are rejected because of their dependency on claim 1.
Claim 2 is further rejected for at least reasons similar to those described for claim 1 above, because claim 2 recites a similarly deficient feature; e.g., claim 2 recites “the program instructions”, and it is unclear whether this refers back to one of the several “program instructions” recited in parent claim 1 or to the “program instructions” recited in claim 2, which further renders the claim indefinite.
Claim 3 is further rejected for at least reasons similar to those described for claim 2 above, since claim 3 recites similarly deficient features; e.g., claim 3 recites “the program instructions”.
Claim 4 is further rejected for at least reasons similar to those described for claim 1 above, since claim 4 recites similarly deficient features; e.g., claim 4 recites “the program instructions” and “… comprises”. Claims 5-8 are rejected because of their dependency on claim 4.
Claim 5 is further rejected for at least reasons similar to those described for claim 4 above, since claim 5 recites similarly deficient features; e.g., claim 5 recites “the program instructions” and “… comprises”.
Claim 6 is further rejected for at least reasons similar to those described for claim 4 above, since claim 6 recites similarly deficient features; e.g., claim 6 recites “the program instructions” and “… comprises”.
Claim 7 is further rejected for at least reasons similar to those described for claim 4 above, since claim 7 recites similarly deficient features; e.g., claim 7 recites “the program instructions” and “… comprises”.
Claim 9 is further rejected for at least reasons similar to those described for claim 4 above, since claim 9 recites similarly deficient features; e.g., claim 9 recites “the program instructions” and “… comprises”.
Claim 10 is rejected for at least reasons similar to those described for claim 1 above, since claim 10 recites deficient features similar to those recited in claim 1. Claims 11-18 are rejected because of their dependency on claim 10.
Claim 11 is rejected for at least reasons similar to those described for claim 2 above, since claim 11 recites deficient features similar to those recited in claim 2.
Claim 12 is rejected for at least reasons similar to those described for claim 3 above, since claim 12 recites deficient features similar to those recited in claim 3.
Claim 13 is rejected for at least reasons similar to those described for claim 4 above, since claim 13 recites deficient features similar to those recited in claim 4. Claims 14-18 are rejected because of their dependency on claim 13.
Claim 14 is rejected for at least reasons similar to those described for claim 5 above, since claim 14 recites deficient features similar to those recited in claim 5.
Claim 15 is rejected for at least reasons similar to those described for claim 6 above, since claim 15 recites deficient features similar to those recited in claim 6.
Claim 16 is rejected for at least reasons similar to those described for claim 7 above, since claim 16 recites deficient features similar to those recited in claim 7.
Claim 18 is rejected for at least reasons similar to those described for claim 9 above, since claim 18 recites deficient features similar to those recited in claim 9.
Claim 19 recites “The system of claim 1” and “wherein the system further comprises a networked-microphone device comprising a second network interface, the one or more microphones, the at least one processor, and the at least one non-transitory computer-readable medium …”, wherein “the system” lacks sufficient antecedent basis in claim 19. This causes confusion because it is unclear what “the system” is, and it is unclear whether “the one or more microphones”, “the at least one processor”, etc., are included in “the system” recited in claim 19 or in the “playback device” recited in parent claim 1, and thus renders the claim indefinite.
Claim 20 is rejected for at least reasons similar to those described for claim 1 above, since claim 20 recites deficient features similar to those recited in claim 1; e.g., claim 20 recites multiple “program instructions” and then “the program instructions” and “… comprises”.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and  In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. 
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-8 and 10 of U.S. Patent No. 11,017,789 B2 in view of Beckhardt et al. (US 20130336499 A1, hereinafter Beckhardt).
The conflicting claims 1-8 and 10 of U.S. Patent No. 11,017,789 B2 do not explicitly teach an audio input interface via which one or more audio signals are received, as recited in claims 1, 10, and 20, and do not explicitly teach the features of dependent claims 2 and 3 (similarly, claims 11 and 12), including the features “one or more first channels of multi-channel audio content”, “a group coordinator of a bonded zone that includes one or more additional playback devices”, “send, via the network interface, data representing the one or more second channels” “to one or more additional playback devices”, and “the playback device is configured to play back the one or more first channels” “in synchrony with playback of the one or more second channels” “by the one or more additional playback devices”, as recited in claims 2 and 11, and the feature “receive, via the audio input interface, the one or more audio signals from a television”, as recited in claims 3 and 12. Beckhardt teaches a similar playback device (e.g., primary zone player 600 in fig. 6) and further teaches “an audio input interface” to receive one or more audio signals (audio interface 610 in fig. 6; receive and transmit audio information, such as audio data from DVD, Blu-ray disc, etc., para 81, and television audio channels, para 28-29), “one or more first channels of multi-channel audio content” (play multichannel audio, para 25; e.g., left, right, subwoofer channels, etc., para 28), “a group coordinator of a bonded zone that includes one or more additional playback devices” (controller 500 with a primary zone player 600 in figs. 5-6; responsible for setting up low-latency channels to transmit audio signals and control signals, para 29), “send, via the network interface, data representing the one or more second channels” (via the established links to send audio channel data to zone players and satellite playback devices in figs. 8, 10-11) “to one or more additional playback devices” (the satellite playback devices and other zone players in figs. 8, 10-11), and “the playback device is configured to play back the one or more first channels” (via speakers 418 of the primary zone player 600 in fig. 6) “in synchrony with playback of the one or more second channels” “by the one or more additional playback devices” (stereo effects of a sound are reproduced and enhanced through two zone players 106, 108 in synchrony with other zone players, para 49; played in a party, para 53; synchronizing a multichannel audio environment, para 59; synchronous playback of a list of identical audio sources, para 67), as recited in claims 2 and 11, and the feature “receive, via the audio input interface, the one or more audio signals from a television” (receiving television audio channels through the audio interface 610 in fig. 6, para 28-29), for the benefits of enhancing multi-channel audio playback by reducing audio signal processing and transmission latency (providing low-latency delivery and playback of audio, abstract, and overcoming audible delays or hiccups, para 29, para 67) and improving users’ perception of a home theater presentation (para 101).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the audio input interface via which one or more audio signals are received, the one or more first channels of multi-channel audio content, the group coordinator of a bonded zone that includes one or more additional playback devices, sending, via the network interface, the data representing the one or more second channels to the one or more additional playback devices, the playback device being configured to play back the one or more first channels in synchrony with playback of the one or more second channels by the one or more additional playback devices, and receiving, via the audio input interface, the one or more audio signals from a television, as taught by Beckhardt, to the system taught by conflicting claims 1-8 and 10 of U.S. Patent No. 11,017,789 B2, for the benefits discussed above. A comparison of claims 1-20 of the instant application with the conflicting claims 1-8 and 10 of U.S. Patent No. 11,017,789 B2 is listed below for reference:
Claim(s) in the current application
Conflicting claim(s) in U.S. Patent No. 11,017,789 B2



 1. A playback device comprising: an audio input interface; an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the playback device is configured to: receive, via the audio input interface, one or more audio signals; play back at least one audio signal of the one or more audio signals via the one or more speakers and the audio stage; while playing back the at least one audio signal, capture, via the one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receive at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transform into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transform into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determine an n.sup.th frame of an output signal, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the n.sup.th frame of the output signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to: (i) generate an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generate the n.sup.th frame of the 
output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determine a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises comprises program instructions that are executable by the at least one processor such that the playback device is configured to: (i) estimate an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; (ii) convert the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivate inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generate the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and send the output signal as a voice input to one or more voice assistants for processing of the voice input.
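For orientation only, the per-iteration steps recited in claim 1 above can be sketched in code. The following is a minimal illustration and not the applicant's implementation: it assumes one complex filter coefficient per STFT frequency bin and an NLMS-style step for converting the error frame into an update filter, neither of which is specified by claim 1. The function name `aec_iteration` and the parameters `step`, `eps`, and `threshold_energy` are hypothetical.

```python
# Hypothetical sketch of one AEC iteration on a single STFT frame.
# Assumptions (not taken from the claims): one complex tap per frequency
# bin, and an NLMS-style rule for turning the error into an update filter.

def aec_iteration(reference, measured, adaptive_filter,
                  step=0.5, eps=1e-8, threshold_energy=1e-6):
    """Return (output_frame, next_filter) for the n-th iteration."""
    # Model signal: n-th reference frame passed through the n-th filter instance.
    model = [h * x for h, x in zip(adaptive_filter, reference)]

    # Output frame: difference between the measured frame and the model frame.
    output = [d - y for d, y in zip(measured, model)]

    # Error frame: the same measured-minus-model difference drives adaptation.
    error = output

    # Convert the error frame to an update filter (assumed NLMS-style step).
    update = [step * x.conjugate() * e / (abs(x) ** 2 + eps)
              for x, e in zip(reference, error)]

    # Deactivate portions of the update whose energy is below the threshold.
    update = [u if abs(u) ** 2 >= threshold_energy else 0j for u in update]

    # Next filter instance: current filter summed with the update filter.
    next_filter = [h + u for h, u in zip(adaptive_filter, update)]
    return output, next_filter
```

Calling the function repeatedly with successive frames, and feeding each returned filter into the next call, mirrors the claimed progression from the nth to the n+1th instance of the adaptive filter.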


2. The playback device of claim 1, wherein the at least one audio signal of the one or more audio signals comprises one or more first channels of multi-channel audio content, wherein the one or more audio signals comprises one or more second channels of the multi-channel audio content, wherein the playback device comprises a network interface, wherein the playback device is a group coordinator of a bonded zone that includes one or more additional playback devices, and wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the playback device is configured to: send, via the network interface, data representing the one or more second channels of the multi-channel audio content to one or more additional playback devices; and wherein the program instructions that are executable by the at least one processor such that the playback device is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to play back the one or more first channels of multi-channel audio content in synchrony with playback of the one or more second channels of the multi-channel audio content by the one or more additional playback devices.

3. The playback device of claim 1, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to: receive, via the audio input interface, the one or more audio signals from a television.

4. The playback device of claim 1, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to convert the n.sup.th frame of an error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein the program instructions that are executable by the at least one processor such that the playback device is configured to deactivate inactive portions of the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

5. The playback device of claim 4, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to apply a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

6. The playback of claim 4, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to convert the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to: convert the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filter the matrix of filter coefficients to generate the n.sup.th update filter.

7. The playback device of claim 4, wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the playback device is configured to: before determination of the NMLS of the n.sup.th frame of the error signal, apply an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine the NMLS of the n.sup.th frame of the limited error signal.

8. The playback device of claim 7, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.
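As an illustration of the clipping recited in claim 8 above, the sketch below limits any error-signal value whose magnitude exceeds a threshold to that threshold. It is a minimal example under stated assumptions, not the applicant's implementation: treating the error frame as a list of complex STFT bins and preserving each bin's phase while scaling its magnitude are assumptions, and the name `clip_error` is hypothetical.

```python
# Hypothetical sketch of a non-linear clipping function for an error frame.
# Assumption (not from the claims): bins are complex and phase is preserved.

def clip_error(error_frame, threshold):
    """Limit each bin's magnitude to at most `threshold`."""
    clipped = []
    for e in error_frame:
        magnitude = abs(e)
        if magnitude > threshold:
            # Scale the bin so its magnitude equals the threshold.
            e = e * (threshold / magnitude)
        clipped.append(e)
    return clipped
```

Values at or below the threshold pass through unchanged, so only the portions of the error signal above the threshold magnitude are affected.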

9. The playback device of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to capture audio within the acoustic environment comprises program instructions that are executable by the at least one processor such that the playback device is configured to capture audio signals representing sound produced by two or more voices.

10. A system comprising: a playback device comprising an audio input interface, one or more speakers, and an audio stage comprising an audio processor and an audio amplifier; one or more microphones; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the system is configured to: receive, via the audio input interface, one or more audio signals; play back at least one audio signal of the one or more audio signals via the one or more speakers and the audio stage; while playing back the at least one audio signal, capture, via the one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receive at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transform into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transform into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determine an n.sup.th frame of an output signal, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the n.sup.th frame of the output signal comprises program instructions that are executable by the at least one processor such that the system is configured to: (i) generate an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generate the n.sup.th frame of the 
output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determine a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises program instructions that are executable by the at least one processor such that the system is configured to: (i) estimate an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; (ii) convert the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivate inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generate the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and send the output signal as a voice input to one or more voice assistants for processing of the voice input.

11. The system of claim 10, wherein the at least one audio signal of the one or more audio signals comprises one or more first channels of multi-channel audio content, wherein the one or more audio signals comprises one or more second channels of the multi-channel audio content, wherein the system comprises a network interface, wherein the system is a group coordinator of a bonded zone that includes one or more additional playback devices, and wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: send, via the network interface, data representing the one or more second channels of the multi-channel audio content to one or more additional playback devices; and wherein the program instructions that are executable by the at least one processor such that the system is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the system is configured to play back the one or more first channels of multi-channel audio content in synchrony with playback of the one or more second channels of the multi-channel audio content by the one or more additional playback devices.

12. The system of claim 10, wherein the program instructions that are executable by the at least one processor such that the system is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the system is configured to: receive, via the audio input interface, the one or more audio signals from a television.

13. The system of claim 10, wherein the program instructions that are executable by the at least one processor such that the system is configured to convert the n.sup.th frame of an error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to determine a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein the program instructions that are executable by the at least one processor such that the system is configured to deactivate inactive portions of the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to determine a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

14. The system of claim 13, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the system is configured to apply a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

15. The playback of claim 13, wherein the program instructions that are executable by the at least one processor such that the system is configured to convert the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to: convert the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filter the matrix of filter coefficients to generate the n.sup.th update filter.

16. The system of claim 13, wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: before determination of the NMLS of the n.sup.th frame of the error signal, apply an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the system is configured to determine the NMLS of the n.sup.th frame of the limited error signal.

17. The system of claim 16, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

18. The system of claim 13, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein the program instructions that are executable by the at least one processor such that the system is configured to capture audio within the acoustic environment comprises program instructions that are executable by the at least one processor such that the system is configured to capture audio signals representing sound produced by two or more voices.

19. The system of claim 1, wherein the playback device comprises a first network interface, wherein the system further comprises a networked-microphone device comprising a second network interface, the one or more microphones, the at least one processor, and the at least one non-transitory computer-readable medium, and wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

20. A method comprising: receiving, via an audio input interface of a playback device, one or more audio signals; playing back at least one audio signal of the one or more audio signals via one or more speakers and an audio stage comprising an audio processor and an audio amplifier; while playing back the at least one audio signal, capturing, via one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receiving at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transforming into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: (i) generating an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generating the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: (i) estimating an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; (ii) converting the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivating inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice assistants for processing of the voice input.
1. A system comprising: an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; one or more processors; data storage storing instructions executable by the one or more processors that cause the system to perform functions comprising: while audio content is playing back via the one or more speakers, capturing, via the one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transforming into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and generating the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: estimating an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; converting the n.sup.th frame of an error signal to an n.sup.th update filter; deactivating inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.

8. The system of claim 1, further comprising: a playback device comprising a first network interface and the one or more speakers; and a networked-microphone device comprising a second network interface, the one or more microphones, the one or more processors, and the data storage storing instructions executable by the one or more processors, wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

2. The system of claim 1, wherein converting the n.sup.th frame of an error signal to the n.sup.th update filter comprises determining a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein deactivating inactive portions of the n.sup.th update filter comprises determining a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

5. The system of claim 2, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises: applying a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

6. The system of claim 2, wherein converting the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises: converting the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filtering the matrix of filter coefficients to generate the n.sup.th update filter.

3. The system of claim 2, wherein the data storage further includes instructions that cause the system to perform functions comprising: before determining the NMLS of the n.sup.th frame of the error signal, applying an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises determining the NMLS of the n.sup.th frame of the limited error signal.

4. The system of claim 3, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

7. The system of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein capturing audio within the acoustic environment comprises capturing audio signals representing sound produced by two or more voices.

1. A system comprising: an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; one or more processors; data storage storing instructions executable by the one or more processors that cause the system to perform functions comprising: while audio content is playing back via the one or more speakers, capturing, via the one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transforming into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and generating the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: estimating an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; converting the n.sup.th frame of an error signal to an n.sup.th update filter; deactivating inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.

8. The system of claim 1, further comprising: a playback device comprising a first network interface and the one or more speakers; and a networked-microphone device comprising a second network interface, the one or more microphones, the one or more processors, and the data storage storing instructions executable by the one or more processors, wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

2. The system of claim 1, wherein converting the n.sup.th frame of an error signal to the n.sup.th update filter comprises determining a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein deactivating inactive portions of the n.sup.th update filter comprises determining a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

5. The system of claim 2, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises: applying a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

6. The system of claim 2, wherein converting the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises: converting the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filtering the matrix of filter coefficients to generate the n.sup.th update filter.

3. The system of claim 2, wherein the data storage further includes instructions that cause the system to perform functions comprising: before determining the NMLS of the n.sup.th frame of the error signal, applying an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises determining the NMLS of the n.sup.th frame of the limited error signal.

4. The system of claim 3, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

7. The system of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein capturing audio within the acoustic environment comprises capturing audio signals representing sound produced by two or more voices.

8. The system of claim 1, further comprising: a playback device comprising a first network interface and the one or more speakers; and a networked-microphone device comprising a second network interface, the one or more microphones, the one or more processors, and the data storage storing instructions executable by the one or more processors, wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

10. A method to be performed by a system comprising a playback device, the method comprising: while audio content is playing back via one or more speakers of the playback device, capturing, via one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from an audio stage of the playback device, the playback signal representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transforming into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and generating the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: estimating an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; converting the n.sup.th frame of an error signal to an n.sup.th update filter; deactivating inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.
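
For reference only, the per-iteration AEC update recited in the independent claims compared above (model signal, output by differencing, update filter, deactivation of low-energy portions, summation into the next filter instance) may be illustrated by the following minimal sketch. It is not taken from the instant application or from any cited patent. It assumes one complex filter tap per STFT frequency bin and a conventional normalized LMS update, and the names aec_iteration, run_demo, mu, delta, clip, and energy_threshold are hypothetical. The optional clip argument loosely corresponds to the error recovery non-linearity of the dependent claims; cross-band filtering is omitted.

```python
import numpy as np


def aec_iteration(H, X, D, mu=0.5, delta=1e-6, clip=None, energy_threshold=0.0):
    """One AEC iteration on a single STFT frame (assumed: one complex tap per bin).

    H: n-th instance of the adaptive filter
    X: n-th frame of the reference signal
    D: n-th frame of the measured signal
    Returns the n-th output frame and the (n+1)-th filter instance.
    """
    Y = H * X        # model signal: estimated acoustic echo
    E = D - Y        # output/error frame: measured signal minus model signal

    E_lim = E
    if clip is not None:
        # Error-recovery clipping: cap the magnitude at `clip`, keep the phase.
        mag = np.maximum(np.abs(E), 1e-12)
        E_lim = E * np.minimum(1.0, clip / mag)

    # Normalized LMS update per bin; `delta` regularizes the division.
    U = mu * np.conj(X) * E_lim / (np.abs(X) ** 2 + delta)

    # Deactivate (zero out) bins whose update energy is below the threshold.
    U = np.where(np.abs(U) ** 2 >= energy_threshold, U, 0.0)

    # Next filter instance is the sum of the current filter and the update.
    return E, H + U


def run_demo(num_bins=8, num_frames=200, seed=0):
    """Echo-only simulation with a hypothetical fixed echo path G."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
    H = np.zeros(num_bins, dtype=complex)
    E = np.zeros(num_bins, dtype=complex)
    for _ in range(num_frames):
        X = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
        D = G * X    # measured signal contains only the echo
        E, H = aec_iteration(H, X, D)
    return G, H, E
```

In this echo-only setting the filter H approaches the assumed echo path G and the output frame approaches zero. Setting energy_threshold above the energy of every bin leaves the filter unchanged for that iteration.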


Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-7, 9 of U.S. Patent No. 10,446,165 B2 in view of Beckhardt et al. (US 20130336499 A1).
The conflicting claims 1-7, 9 of U.S. Patent No. 10,446,165 B2 do not explicitly teach an audio input interface by which receiving one or more audio signals is performed, as recited in claims 1, 10, and 20, and do not explicitly teach the features of dependent claims 2, 3 and the similar claims 11, 12, including the features “one or more first channels of multi-channel audio content”, “a group coordinator of a bonded zone that includes one or more additional playback devices”, “send, via a network interface, data representing the one or more second channels” “to one or more additional playback devices”, and “the playback device is configured to play back the one or more first channels” “in synchrony with playback of the one or more second channels” “by the one or more additional playback devices” as recited in claims 2, 11, and the features “receive, via the audio input interface, the one or more audio signals from a television” as recited in claims 3, 12.
Beckhardt teaches a similar playback device (e.g., primary zone player 600 in fig. 6) and further teaches “an audio input interface” to receive one or more audio signals (audio interface 610 in fig. 6; receive and transmit audio information, such as audio data from DVD, Blu-ray disc, etc., para 81, and television audio channels, para 28-29), “one or more first channels of multi-channel audio content” (play multichannel audio, para 25; e.g., left, right, subwoofer channels, etc., para 28), “a group coordinator of a bonded zone that includes one or more additional playback devices” (controller 500 with a primary zone player 600 in figs. 5-6; responsible for setting up low latency channels to transmit audio signals and control signals, para 29), “send, via a network interface, data representing the one or more second channels” (via the established links to send audio channel data to zone players and satellite playback devices in figs. 8, 10-11) “to one or more additional playback devices” (the satellite playback devices and other zone players in figs. 8, 10-11), and “the playback device is configured to play back the one or more first channels” (via speakers 418 of the primary zone player 600 in fig. 6) “in synchrony with playback of the one or more second channels” “by the one or more additional playback devices” (stereo effects of a sound are reproduced and enhanced through two zone players 106, 108 in synchrony with other zone players, para 49; played in a party, para 53; synchronizing a multichannel audio environment, para 59; synchronous playback of a list of identical audio sources, para 67) as recited in claims 2, 11, and the features “receive, via the audio input interface, the one or more audio signals from a television” (receiving television audio channels through the audio interface 610 in fig. 6, para 28-29) as recited in claims 3, 12, for the benefits of enhancing multi-channel audio playback by reducing audio signal processing and transmission latency (providing low-latency delivery and playback of audio, abstract, and overcoming audible delays or hiccups, para 29, para 67) and improving users’ perception of a home theater presentation (para 101).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to apply the audio input interface by which receiving one or more audio signals is performed, the one or more first channels of multi-channel audio content, the group coordinator of a bonded zone that includes one or more additional playback devices, the sending, via the network interface, of the data representing the one or more second channels to the one or more additional playback devices, the playback device being configured to play back the one or more first channels in synchrony with playback of the one or more second channels by the one or more additional playback devices, and the receiving, via the audio input interface, of the one or more audio signals from a television, as taught by Beckhardt, to the system, as taught by conflicting claims 1-8, 10 of U.S. Patent No. 10,446,165 B2, for the benefits discussed above. A comparison of claims 1-20 of the instant application with the conflicting claims 1-8, 10 in U.S. Patent No. 10,446,165 B2 is listed below for reference:
Claim(s) in the current application
Conflicting claim(s) in U.S. Patent No. 10,446,165 B2



Claim(s) in the current application
Conflicting claim(s) in U.S. Patent No. 11,017,789 B2
1. A playback device comprising: an audio input interface; an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the playback device is configured to: receive, via the audio input interface, one or more audio signals; play back at least one audio signal of the one or more audio signals via the one or more speakers and the audio stage; while playing back the at least one audio signal, capture, via the one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receive at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transform into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transform into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determine an n.sup.th frame of an output signal, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the n.sup.th frame of the output signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to: (i) generate an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generate the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determine a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises comprises program instructions that are executable by the at least one processor such that the playback device is configured to: (i) estimate an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; (ii) convert the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivate inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generate the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and send the output signal as a voice input to one or more voice assistants for processing of the voice input.

4. The playback device of claim 1, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to convert the n.sup.th frame of an error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein the program instructions that are executable by the at least one processor such that the playback device is configured to deactivate inactive portions of the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

6. The playback of claim 4, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to convert the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to: convert the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filter the matrix of filter coefficients to generate the n.sup.th update filter.


2. The playback device of claim 1, wherein the at least one audio signal of the one or more audio signals comprises one or more first channels of multi-channel audio content, wherein the one or more audio signals comprises one or more second channels of the multi-channel audio content, wherein the playback device comprises a network interface, wherein the playback device is a group coordinator of a bonded zone that includes one or more additional playback devices, and wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the playback device is configured to: send, via the network interface, data representing the one or more second channels of the multi-channel audio content to one or more additional playback devices; and wherein the program instructions that are executable by the at least one processor such that the playback device is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to play back the one or more first channels of multi-channel audio content in synchrony with playback of the one or more second channels of the multi-channel audio content by the one or more additional playback devices.

3. The playback device of claim 1, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to: receive, via the audio input interface, the one or more audio signals from a television.

5. The playback device of claim 4, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to apply a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

6. The playback of claim 4, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to convert the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the playback device is configured to: convert the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filter the matrix of filter coefficients to generate the n.sup.th update filter.

7. The playback device of claim 4, wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the playback device is configured to: before determination of the NMLS of the n.sup.th frame of the error signal, apply an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the playback device is configured to determine the NMLS of the n.sup.th frame of the limited error signal.

8. The playback device of claim 7, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

9. The playback device of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein the program instructions that are executable by the at least one processor such that the playback device is configured to capture audio within the acoustic environment comprises program instructions that are executable by the at least one processor such that the playback device is configured to capture audio signals representing sound produced by two or more voices.

10. A system comprising: a playback device comprising an audio input interface, one or more speakers, and an audio stage comprising an audio processor and an audio amplifier; one or more microphones; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the system is configured to: receive, via the audio input interface, one or more audio signals; play back at least one audio signal of the one or more audio signals via the one or more speakers and the audio stage; while playing back the at least one audio signal, capture, via the one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receive at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transform into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transform into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determine an n.sup.th frame of an output signal, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the n.sup.th frame of the output signal comprises program instructions that are executable by the at least one processor such that the system is configured to: (i) generate an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generate the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determine a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises program instructions that are executable by the at least one processor such that the system is configured to: (i) estimate an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the model signal; (ii) convert the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivate inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generate the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and send the output signal as a voice input to one or more voice assistants for processing of the voice input.

13. The system of claim 10, wherein the program instructions that are executable by the at least one processor such that the system is configured to convert the n.sup.th frame of an error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to determine a normalized least mean square (NMLS) of the n.sup.th frame of the error signal, and wherein the program instructions that are executable by the at least one processor such that the system is configured to deactivate inactive portions of the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to determine a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than the threshold energy.

11. The system of claim 10, wherein the at least one audio signal of the one or more audio signals comprises one or more first channels of multi-channel audio content, wherein the one or more audio signals comprises one or more second channels of the multi-channel audio content, wherein the system comprises a network interface, wherein the system is a group coordinator of a bonded zone that includes one or more additional playback devices, and wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: send, via the network interface, data representing the one or more second channels of the multi-channel audio content to one or more additional playback devices; and wherein the program instructions that are executable by the at least one processor such that the system is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the system is configured to play back the one or more first channels of multi-channel audio content in synchrony with playback of the one or more second channels of the multi-channel audio content by the one or more additional playback devices.

12. The system of claim 10, wherein the program instructions that are executable by the at least one processor such that the system is configured to play back the at least one audio signal comprises program instructions that are executable by the at least one processor such that the system is configured to: receive, via the audio input interface, the one or more audio signals from a television.

14. The system of claim 13, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the system is configured to apply a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

15. The playback of claim 13, wherein the program instructions that are executable by the at least one processor such that the system is configured to convert the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises program instructions that are executable by the at least one processor such that the system is configured to: convert the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filter the matrix of filter coefficients to generate the n.sup.th update filter.

16. The system of claim 13, wherein at least one non-transitory computer readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: before determination of the NMLS of the n.sup.th frame of the error signal, apply an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein the program instructions that are executable by the at least one processor such that the system is configured to determine the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises program instructions that are executable by the at least one processor such that the system is configured to determine the NMLS of the n.sup.th frame of the limited error signal.

17. The system of claim 16, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

18. The system of claim 13, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein the program instructions that are executable by the at least one processor such that the system is configured to capture audio within the acoustic environment comprises program instructions that are executable by the at least one processor such that the system is configured to capture audio signals representing sound produced by two or more voices.

19. The system of claim 1, wherein the playback device comprises a first network interface, wherein the system further comprises a networked-microphone device comprising a second network interface, the one or more microphones, the at least one processor, and the at least one non-transitory computer-readable medium, and wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

20. A method comprising: receiving, via an audio input interface of a playback device, one or more audio signals; playing back at least one audio signal of the one or more audio signals via one or more speakers and an audio stage comprising an audio processor and an audio amplifier; while playing back the at least one audio signal, capturing, via one or more microphones, audio within an acoustic environment, wherein at least a portion of the captured audio represents sound produced by the one or more speakers in playing back the at least one audio signal via the one or more speakers; receiving at least one playback signal from the audio stage representing the at least one audio signal being played back by the one or more speakers and the audio stage; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal representing actual acoustic echo; transforming into the STFT domain the received playback signal from the audio stage to generate a reference signal; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: (i) generating an n.sup.th frame of a model signal representing estimated acoustic echo by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter; and (ii) generating the n.sup.th frame of the output signal by differencing the n.sup.th frame of the model signal and an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: (i) estimating an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the measured signal and the n.sup.th frame of the 
model signal; (ii) converting the n.sup.th frame of an error signal to an n.sup.th update filter; (iii) deactivating inactive portions of the n.sup.th update filter, the inactive portions having less than a threshold energy; (iv) generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice assistants for processing of the voice input.
1. A system comprising: an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; one or more processors; data storage storing instructions executable by the one or more processors that cause the system to perform operations comprising: causing, via the audio stage, the one or more speakers to play back audio content; while the audio content is playing back via the one or more speakers, capturing, via the one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal in the STFT domain comprising a series of frames representing the captured audio within the acoustic environment; transforming into the STFT domain the received output signal from the audio stage to generate a reference signal in the STFT domain comprising a series of frames representing the audio content being played back via the one or more speakers; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter, wherein the first instance of the adaptive filter is an initial filter; and generating the n.sup.th frame of the output signal by redacting the n.sup.th frame of the model signal from an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance 
of the adaptive filter for the next iteration of the AEC comprises: determining an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the model signal and the n.sup.th frame of the reference signal less audio signals representing sound from sources other than an n.sup.th frame of the audio signals representing sound produced by the one or more speakers in playing back the n.sup.th frame of the reference signal; determining a normalized least mean square (NMLS) of the n.sup.th frame of the error signal; determining a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than a threshold energy; converting the sparse NMLS of the n.sup.th frame of the error signal to an n.sup.th update filter; and generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.

4. The system of claim 1, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises: applying a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

5. The system of claim 1, wherein converting the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises: converting the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filtering the matrix of filter coefficients to generate the n.sup.th update filter.

2. The system of claim 1, wherein the data storage further includes instructions that cause the system to perform operations comprising: before determining the NMLS of the n.sup.th frame of the error signal, applying an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises determining the NMLS of the n.sup.th frame of the limited error signal.

3. The system of claim 2, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

6. The system of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein capturing audio within the acoustic environment comprises capturing audio signals representing sound produced by two or more voices.

1. A system comprising: an audio stage comprising an audio processor and an audio amplifier; one or more speakers; one or more microphones; one or more processors; data storage storing instructions executable by the one or more processors that cause the system to perform operations comprising: causing, via the audio stage, the one or more speakers to play back audio content; while the audio content is playing back via the one or more speakers, capturing, via the one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal in the STFT domain comprising a series of frames representing the captured audio within the acoustic environment; transforming into the STFT domain the received output signal from the audio stage to generate a reference signal in the STFT domain comprising a series of frames representing the audio content being played back via the one or more speakers; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter, wherein the first instance of the adaptive filter is an initial filter; and generating the n.sup.th frame of the output signal by redacting the n.sup.th frame of the model signal from an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance 
of the adaptive filter for the next iteration of the AEC comprises: determining an n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the model signal and the n.sup.th frame of the reference signal less audio signals representing sound from sources other than an n.sup.th frame of the audio signals representing sound produced by the one or more speakers in playing back the n.sup.th frame of the reference signal; determining a normalized least mean square (NMLS) of the n.sup.th frame of the error signal; determining a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than a threshold energy; converting the sparse NMLS of the n.sup.th frame of the error signal to an n.sup.th update filter; and generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.

4. The system of claim 1, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises: applying a frequency-dependent regularization parameter to adapt an NMLS learning rate of change between AEC iterations according to a magnitude of the measured signal.

5. The system of claim 1, wherein converting the sparse NMLS of the n.sup.th frame of the error signal to the n.sup.th update filter comprises: converting the sparse NMLS of the n.sup.th frame to a matrix of filter coefficients; and cross-band filtering the matrix of filter coefficients to generate the n.sup.th update filter.

2. The system of claim 1, wherein the data storage further includes instructions that cause the system to perform operations comprising: before determining the NMLS of the n.sup.th frame of the error signal, applying an error recovery non-linearity function to the error signal to limit the error signal to a threshold magnitude, wherein determining the normalized least mean square (NMLS) of the n.sup.th frame of the error signal comprises determining the NMLS of the n.sup.th frame of the limited error signal.

3. The system of claim 2, wherein the error recovery non-linearity function comprises a non-linear clipping function that limits portions of the error signal that are above the threshold magnitude to the threshold magnitude.

6. The system of claim 1, excluding a double-talk detector that disables the AEC when a double-talk condition is detected, wherein capturing audio within the acoustic environment comprises capturing audio signals representing sound produced by two or more voices.

7. The system of claim 1, further comprising: a playback device comprising a first network interface and the one or more speakers; and a networked-microphone device comprising a second network interface, the one or more microphones, the one or more processors, and the data storage storing instructions executable by the one or more processors, wherein the first network interface and the second network interface are configured to communicatively couple the playback device and the networked-microphone device.

9. A method to be performed by a system comprising a playback device, the method comprising: causing, via an audio stage of the playback device, one or more speakers of the playback device to play back audio content, wherein the audio stage comprises an audio processor and an audio amplifier; while the audio content is playing back via the one or more speakers, capturing, via one or more microphones, audio within an acoustic environment, wherein the captured audio comprises audio signals representing sound produced by the one or more speakers in playing back the audio content; receiving a playback signal from the audio stage representing the audio content being played back by the one or more speakers; transforming into a short time Fourier transform (STFT) domain the captured audio within the acoustic environment to generate a measured signal in the STFT domain comprising a series of frames representing the captured audio within the acoustic environment; transforming into the STFT domain the received output signal from the audio stage to generate a reference signal in the STFT domain comprising a series of frames representing the audio content being played back via the one or more speakers; during each n.sup.th iteration of an acoustic echo canceller (AEC): determining an n.sup.th frame of an output signal, wherein determining the n.sup.th frame of the output signal comprises: generating an n.sup.th frame of a model signal by passing an n.sup.th frame of the reference signal through an n.sup.th instance of an adaptive filter, wherein the first instance of the adaptive filter is an initial filter; and generating the n.sup.th frame of the output signal by redacting the n.sup.th frame of the model signal from an n.sup.th frame of the measured signal; determining a n+1.sup.th instance of the adaptive filter for a next iteration of the AEC, wherein determining the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC comprises: determining an 
n.sup.th frame of an error signal, the n.sup.th frame of the error signal representing a difference between the n.sup.th frame of the model signal and the n.sup.th frame of the reference signal less audio signals representing sound from sources other than an n.sup.th frame of the audio signals representing sound produced by the one or more speakers in playing back the n.sup.th frame of the reference signal; determining a normalized least mean square (NMLS) of the n.sup.th frame of the error signal; determining a sparse NMLS of the n.sup.th frame of the error signal by applying to the NMLS of the n.sup.th frame of the error signal, a sparse partition criterion that zeroes out frequency bands of the NMLS having less than a threshold energy; converting the sparse NMLS of the n.sup.th frame of the error signal to an n.sup.th update filter; and generating the n+1.sup.th instance of the adaptive filter for the next iteration of the AEC by summing the n.sup.th instance of the adaptive filter with the n.sup.th update filter; and sending the output signal as a voice input to one or more voice services for processing of the voice input.
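
For illustration only, the per-iteration steps recited in the claims compared above (passing the reference frame through the adaptive filter, differencing the model and measured frames, limiting the error, forming a normalized least mean square update, zeroing low-energy bands, and summing the update into the filter) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from either patent: it assumes a single filter tap per frequency bin and omits cross-band filtering, and the names `aec_iteration` and `clip_magnitude` and all parameter values are hypothetical.

```python
from typing import List, Tuple

Frame = List[complex]  # one STFT frame: one complex value per frequency bin


def clip_magnitude(frame: Frame, limit: float) -> Frame:
    """Limit each bin's magnitude to `limit` while keeping its phase."""
    out = []
    for value in frame:
        mag = abs(value)
        out.append(value * (limit / mag) if mag > limit else value)
    return out


def aec_iteration(
    h: Frame,                        # n-th instance of the adaptive filter
    reference: Frame,                # n-th frame of the reference signal
    measured: Frame,                 # n-th frame of the measured signal
    step: float = 0.5,               # hypothetical learning rate
    reg: float = 1e-6,               # hypothetical regularization constant
    clip_limit: float = 10.0,        # hypothetical error-limiting threshold
    energy_threshold: float = 1e-4,  # hypothetical sparsity threshold
) -> Tuple[Frame, Frame]:
    """Return (n-th output frame, (n+1)-th filter instance)."""
    # Model signal: reference frame passed through the current filter.
    model = [hk * xk for hk, xk in zip(h, reference)]
    # Output frame: measured frame minus the estimated echo.
    output = [dk - yk for dk, yk in zip(measured, model)]
    # Error frame, limited in magnitude before the update is computed.
    error = clip_magnitude(output, clip_limit)
    # Normalized least mean square update, computed per frequency bin.
    update = [
        step * ek * xk.conjugate() / (abs(xk) ** 2 + reg)
        for ek, xk in zip(error, reference)
    ]
    # Zero out bins of the update whose energy is below the threshold.
    update = [uk if abs(uk) ** 2 >= energy_threshold else 0j for uk in update]
    # Next filter instance: current filter plus the sparse update.
    h_next = [hk + uk for hk, uk in zip(h, update)]
    return output, h_next
```

Calling `aec_iteration` once per frame, and feeding the returned filter into the next call, reproduces the iteration order recited in the claims. In this sketch, bins whose update energy falls below the threshold simply stop adapting.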


Examiner Comments

The claims present 35 U.S.C. 112(b) issues that cause confusion as to their scope, limitation by limitation. In addition, claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-8, 10 of U.S. Patent No. 11,017,789 B2 in view of Beckhardt et al. (US 20130336499 A1, hereinafter Beckhardt), claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-7, 9 of U.S. Patent No. 10,446,165 B2, and claims 1-20 are objected to. A prior art search has been conducted by the examiner and is recorded in the attached PTO-892 form.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654