DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office Action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/25/2021 has been entered.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
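As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;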
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.  The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.  The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office Action.  Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office Action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
Applicant’s specification, as published, discloses at paragraphs 0005 and 0047 that the computation device is a host and that the host may be a network-based or cloud-based host (e.g., server), or one of the client devices, for example.  The client devices may be mobile phones, such as smartphones, for example.
If Applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, Applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

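The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office Action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.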

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.  In considering patentability of the claims the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-6, 11, 13, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over LaFata et al. (U.S. Patent Application Publication No. 2016/0014373 A1) (hereinafter LaFata) in view of Virolainen et al. (U.S. Patent Application Publication No. 2009/0264114 A1) (hereinafter Virolainen).

Regarding claim 1, LaFata discloses a method of hosting a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability (Figure 1 and paragraph 0027 disclose the conference call is being held with participants at four different locations--Locale A, Locale B, Locale C and Locale D.  In the present example, locales A, B and C each has a single participant connecting to the conference call, 
grouping the plurality of client devices into two or more groups based on their belonging to respective acoustic spaces, wherein the two or more groups include a first group into which two or more client devices in the plurality of client devices are grouped, wherein the two or more client devices belong to a first acoustic space in the two or more acoustic spaces (Paragraph 0029 discloses a "locale" refers to the physical space where audio feedback from multiple active devices used by multiple conference participants to connect to the same conference call would create feedback problems or would create unsynchronized speaker signals or would inhibit the proper functioning of acoustic echo cancellation used in conventional cloud-based conferencing systems.  A "shared locale" refers to a locale or physical location occupied by two or more participants of a conference call.  Figure 2 and paragraphs 0041 and 0042 disclose a same-locale multiple-device conferencing method 200 starts by detecting the presence of two or more audio clients connecting to the conference call from the same physical location or same locale (202).  With the detection of two or more audio client connections at the same physical location, the method 200 then organizes the audio client nodes at the same physical location (204));
receiving, by a computation device over one or more networks, first audio streams from the plurality of client devices (Figure 4 and paragraphs 0053 and 0057 disclose the aggregation node 25 is configured to receive inbound microphone data packets from four shared locale audio clients.  In the present example, the aggregation node 25 is configured on the host device of audio client node 4.  Accordingly, audio client nodes 1-3 are connected to the aggregation node 25 through a local area network to provide their microphone packet streams 30 and audio client node 4 provides its microphone packet stream 30 directly to the aggregation node 25.  Meanwhile, the network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server 
generating, by the computation device from the first audio streams, second audio streams for rendering by respective client devices among the plurality of client devices, wherein the second audio streams are generated based on the grouping of the plurality of client devices into the two or more groups (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  The aggregation node 25 dealigns the separated speaker data packets 89 so that the speaker data packets will sound aligned when played out on the individual speakers of the host device of each audio client node.  
In embodiments of the present invention, the aggregation node 25 performs speaker alignment by introducing a delay to each speaker data packet 89 that is specific to the audio client node to which the speaker data packet is destined.  The aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output 
wherein the second audio streams comprise an individual second audio stream generated for rendering by each of the respective client devices arranged in the two or more acoustic spaces (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  The aggregation node 25 dealigns the separated speaker data packets 89 so that the speaker data packets will sound aligned when played out on the individual speakers of the host device of each audio client node.  
wherein the second audio streams include a specific audio stream for a specific client device in the two or more client devices in the first group (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 
wherein the specific audio stream is sent by the computation device to the specific client device (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  In operation, the aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output delay line 94.  The speaker data packets for each audio client node go through its own delay line 94.  With each delay line 94 
outputting, by the computation device over the one or more networks, the generated second audio streams to the respective client devices for rendering (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  In operation, the aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output delay line 94.  The speaker data packets for each audio client node go through its own delay line 94.  With each delay line 94 applying the audio-client-specific playout delay, the speaker signals for all the audio clients will become dealigned.  The network socket 42 pulls the delay-adjusted speaker data packets from the delay lines and sends the speaker data packets to the respective audio client as the audio client receive stream.  The delay-adjusted speaker data packets form the outbound speaker packet stream 50.  The outbound speaker packet stream 50 is then provided through the network socket 42 to the respective audio client nodes.  Figure 3 and paragraph 0049 disclose audio client 20 also processes incoming audio data packets received either from the audio conferencing server 15 or the audio client aggregation node 25 that are to be played out on the speaker 
LaFata does not explicitly disclose wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams; wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group.
In analogous art, Virolainen discloses wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and send the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site);
wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and send the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate removing speech signals from a site from a mixed signal sent back to the site, as described in Virolainen, with mixing signals from microphones in various locations for a teleconference, as described in LaFata, because 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 1.
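For illustration only, the technique cited from Virolainen at paragraphs 0006 and 0037 (removing the current site's speech signal from the mixed signal returned to that site) can be sketched as follows.  The function name mix_minus, the site labels, and the representation of each signal as an equal-length list of float samples are assumptions made for this sketch and do not appear in either reference.

```python
from typing import Dict, List


def mix_minus(site_signals: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return, for each site, the sum of every other site's signal.

    Hypothetical helper: each signal is a list of samples of equal length.
    """
    if not site_signals:
        return {}
    length = len(next(iter(site_signals.values())))
    # Full mix of every site, computed once.
    total = [0.0] * length
    for samples in site_signals.values():
        if len(samples) != length:
            raise ValueError("all signals must have the same length")
        for i, sample in enumerate(samples):
            total[i] += sample
    # Subtract each site's own contribution from the mix sent back to it.
    return {
        site: [t - s for t, s in zip(total, samples)]
        for site, samples in site_signals.items()
    }


if __name__ == "__main__":
    # Site "A" receives only the contributions of sites "B" and "C".
    print(mix_minus({"A": [1.0, 2.0], "B": [10.0, 20.0], "C": [100.0, 200.0]}))
```

Computing the full mix once and subtracting each site's own signal is one possible design; summing the other sites directly for each destination would give the same result.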

Regarding claim 2, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing wherein generating the second audio streams comprises: for an active sound source in a given acoustic space, determining the client device in the given acoustic space that is closest to the active sound source; generating a source audio stream that represents captured audio for the currently active sound source based on the first audio stream from the determined client device, disregarding the first audio streams from any other client devices in the same group as the determined client device; and generating the second audio streams from the source audio stream.
Virolainen further discloses wherein generating the second audio streams comprises: for an active sound source in a given acoustic space, determining the client device in the given acoustic space that is closest to the active sound source; generating a source audio stream that represents captured audio for the currently active sound source based on the first audio stream from the determined client device, disregarding the first audio streams from any other client devices in the same group as the determined client device; and generating the second audio streams from the source audio stream (Paragraph 0050 discloses the mixer 202 may employ a dynamic mixing algorithm.  The dynamic mixing algorithm may enable calculation of various audio features for the microphone signals T1(t), T2(t), T3(t) . . . TN(t) and, based on these features, the dynamic mixing algorithm may attempt to mix signal(s) from microphone(s) that have (or typically have) the highest energy or best signal-to-noise ratio as compared to other signals.  As such, for example, the mixer 202 (e.g., via the dynamic mixing algorithm) may be configured to select one of the microphone signals T1(t), T2(t), T3(t) . . . TN(t) at any given time for 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate the microphone closest to the speaker selected as the signal to be included in the downmixed signal, as described in Virolainen, with mixing signals from microphones, as described in LaFata, because doing so is using a known technique to improve a similar method in the same way.  Combining the microphone closest to the speaker selected as the signal to be included in the downmixed signal of Virolainen with mixing signals from microphones of LaFata was within the ordinary ability of one of ordinary skill in the art based on the teachings of Virolainen.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 2.

Regarding claim 3, as applied to claim 2 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing wherein determining the client device in the given acoustic space that is closest to the active sound source is based on at least one of: measuring sound volumes of audio events in first audio streams from client devices in a group corresponding to the given audio space; and measuring times of arrival of audio events in first audio streams from client devices in a group corresponding to the given audio space.
Virolainen further discloses wherein determining the client device in the given acoustic space that is closest to the active sound source is based on at least one of: measuring sound volumes of audio events in first audio streams from client devices in a group corresponding to the given audio space; and 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate selecting the microphone closest to the speaker as the microphone having the highest energy as compared to other signals, as described in Virolainen, with mixing signals from microphones, as described in LaFata, because doing so is using a known technique to improve a similar method in the same way.  Combining selecting the microphone closest to the speaker as the microphone having the highest energy as compared to other signals of Virolainen with mixing signals from microphones of LaFata was within the ordinary ability of one of ordinary skill in the art based on the teachings of Virolainen.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 3.
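For illustration only, the selection relied upon in the rejections of claims 2 and 3 (choosing the microphone signal having the highest energy as compared to other signals) can be sketched as follows.  The function names frame_energy and select_highest_energy, the use of mean squared amplitude as the energy measure, and the frame representation are assumptions made for this sketch; Virolainen is not cited here for any particular energy formula.

```python
from typing import Dict, List


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_highest_energy(frames: Dict[str, List[float]]) -> str:
    """Return the id of the microphone whose current frame has the most energy."""
    if not frames:
        raise ValueError("no microphone frames supplied")
    return max(frames, key=lambda mic: frame_energy(frames[mic]))


if __name__ == "__main__":
    # Hypothetical frames for three microphones in one acoustic space.
    frames = {"T1": [0.1, -0.1], "T2": [0.5, -0.4], "T3": [0.0, 0.0]}
    print(select_highest_energy(frames))
```

Under the assumption that the loudest capture corresponds to the nearest device, the returned id stands in for the client device closest to the active sound source.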

Regarding claim 4, as applied to claim 1 above, LaFata, as modified by Virolainen, further discloses wherein generating the second audio streams comprises:
for an active sound source in a given acoustic space, applying a signal processing technique to the first audio streams from client devices that are grouped in a group corresponding to the given acoustic space, to generate a source audio stream that represents captured audio for the currently active sound source (Figure 4 and paragraphs 0054 and 0056 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and 
generating the second audio streams from the source audio stream (Figure 4 and paragraph 0056 disclose the aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80).
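For illustration only, the signal processing cited from LaFata for claim 4 (passing each microphone stream through its own delay line and then mixing the aligned signals) can be sketched as follows.  The function name delay_and_mix, the expression of each delay as a whole number of samples, and the zero-padding used to model a delay line are assumptions made for this sketch and are not taken from LaFata.

```python
from typing import Dict, List


def delay_and_mix(streams: Dict[str, List[float]], delays: Dict[str, int]) -> List[float]:
    """Delay each stream by its own number of samples, then sum the results."""
    if not streams:
        return []
    delayed = []
    for client, samples in streams.items():
        d = delays.get(client, 0)
        if d < 0:
            raise ValueError("delay must be non-negative")
        # Model the delay line by prepending d samples of silence.
        delayed.append([0.0] * d + list(samples))
    length = max(len(x) for x in delayed)
    mixed = [0.0] * length
    for x in delayed:
        for i, sample in enumerate(x):
            mixed[i] += sample
    return mixed


if __name__ == "__main__":
    # The same event reaches client "c2" two samples earlier than client "c1".
    streams = {"c1": [0.0, 0.0, 1.0, 0.0], "c2": [1.0, 0.0, 0.0, 0.0]}
    print(delay_and_mix(streams, {"c1": 0, "c2": 2}))
```

In the usage example, delaying the earlier stream by two samples lines up the two captures of the event so that they add at the same position in the mixed output.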

Regarding claim 5, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing wherein for a given group of client devices, first audio streams from client devices in the given group of client devices are not used for generating second audio streams for the client devices in the given group of client devices.
Virolainen further discloses wherein for a given group of client devices, first audio streams from client devices in the given group of client devices are not used for generating second audio streams for the client devices in the given group of client devices (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and send the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate removing speech signals from a site from a mixed signal sent back to the site, as described in Virolainen, with mixing signals from microphones in various locations for a teleconference, as described in LaFata, because doing so is using a known technique to improve a similar method in the same way.  Combining removing 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 5.

Regarding claim 6, as applied to claim 1 above, LaFata, as modified by Virolainen, further discloses wherein the second audio streams are generated to be the same for all client devices in a given group of client devices (Figure 4 and paragraph 0057 disclose the inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  In particular, each audio client node receives speaker audio signals from all other conference participants).

Regarding claim 11, as applied to claim 1 above, LaFata, as modified by Virolainen, further discloses wherein grouping the plurality of client devices based on their belonging to respective acoustic spaces involves at least one of: acoustic watermarking; receiving a user input indicative of a list of client devices present in at least one acoustic space; proximity detection using Bluetooth communication between client devices; and visual inspection using one or more video cameras (Paragraph 0089 discloses the remote audio conferencing server identifies a locale by sampling over a preview period the background noise signature received from each audio client requesting to join a conference call.  In some embodiments, the remote audio conferencing server may generate an Acoustic Background Spectrum of each audio client connecting to the conference call.  The Acoustic Background Spectrum of each audio client can be thought of as a room fingerprint and would enable the remote audio conferencing server to identify if two or more audio clients may be in the same locale).

Regarding claim 13, as applied to claim 1 above, LaFata, as modified by Virolainen, further 
adding respective delays to the second audio streams for the client devices in the at least one group of client devices based on the determined transmission latencies, to time-synchronize the second audio streams for the client devices in the at least one group of client devices (Paragraph 0604 discloses that, in the present invention, the audio clients' microphone audio packets are timestamped by the respective host OS audio API.  The timing information is used by the input signal aligner 72 at the aggregation node 25 to align the audio clients' microphone audio streams.  Paragraph 0055 discloses the input delay lines 74 are controlled by an input signal aligner 72 which is configured to generate delay lines adjustment values for each of the input delay lines 74.  Paragraph 0071 discloses the input signal aligner 72 generates input delay lines adjustment values for the delay lines 74 based on the difference between the audio client timestamp and the reference timestamp).
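For illustration only, the claim 13 limitation of adding respective delays based on determined transmission latencies can be sketched as follows.  The function name equalizing_delays, the use of milliseconds, and the choice to pad every stream up to the latency of the slowest client are assumptions made for this sketch; LaFata is cited above for delay adjustment values derived from timestamp differences, not for this particular computation.

```python
from typing import Dict


def equalizing_delays(latencies_ms: Dict[str, float]) -> Dict[str, float]:
    """Delay to add per client so every stream arrives as late as the slowest one."""
    if not latencies_ms:
        return {}
    slowest = max(latencies_ms.values())
    return {client: slowest - latency for client, latency in latencies_ms.items()}


if __name__ == "__main__":
    # Hypothetical measured latencies for three client devices in one group.
    print(equalizing_delays({"a": 20.0, "b": 35.0, "c": 5.0}))
```

With these delays added, each client's total delay (transmission latency plus added delay) equals that of the slowest client, which is one way the streams within a group could be time-synchronized.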

Regarding claim 15, LaFata discloses a computation device (Paragraph 0022 discloses the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor) comprising:
a computer processor (Paragraph 0022 discloses a processor); and
a non-transitory computer-readable storage medium storing a computer program (Paragraph 
grouping the plurality of client devices into two or more groups based on their belonging to respective acoustic spaces, wherein the two or more groups include a first group into which two or more client devices in the plurality of client devices are grouped, wherein the two or more client devices belong to a first acoustic space in the two or more acoustic spaces (Paragraph 0029 discloses a "locale" refers to the physical space where audio feedback from multiple active devices used by multiple conference participants to connect to the same conference call would create feedback problems or would create unsynchronized speaker signals or would inhibit the proper functioning of acoustic echo cancellation used in conventional cloud-based conferencing systems.  A "shared locale" refers to a locale or physical location occupied by two or more participants of a conference call.  Figure 2 and paragraphs 0041 and 0042 disclose a same-locale multiple-device conferencing method 200 starts by detecting the presence of two or more audio clients connecting to the conference call from the same physical location or same locale (202).  With the detection of two or more audio client connections at the same physical location, the method 200 then organizes the audio client nodes at the same physical location (204));
receiving, by the computation device, first audio streams from the plurality of client devices (Figure 4 and paragraphs 0053 and 0057 disclose the aggregation node 25 is configured to receive inbound microphone data packets from four shared locale audio clients.  In the present example, the aggregation node 25 is configured on the host device of audio client node 4.  Accordingly, audio client nodes 1-3 are connected to the aggregation node 25 through a local area network to provide their microphone packet streams 30 and audio client node 4 provides its microphone packet stream 30 directly to the aggregation node 25.  Meanwhile, the network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated 
generating, by the computation device from the first audio streams, second audio streams for rendering by respective client devices among the plurality of client devices, wherein the second audio streams are generated based on the grouping of the plurality of client devices into the two or more groups (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  The aggregation node 25 dealigns the separated speaker data packets 89 so that the speaker data packets will sound aligned when played out on the individual speakers of the host device of each audio client node.  
In embodiments of the present invention, the aggregation node 25 performs speaker alignment by introducing a delay to each speaker data packet 89 that is specific to the audio client node to which the speaker data packet is destined.  The aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output 
wherein the second audio streams comprise an individual second audio stream generated for rendering by each of the respective client devices arranged in the two or more acoustic spaces (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  The aggregation node 25 dealigns the separated speaker data packets 89 so that the speaker data packets will sound aligned when played out on the individual speakers of the host device of each audio client node.  
wherein the second audio streams include a specific audio stream for a specific client device in the two or more client devices in the first group (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 
wherein the specific audio stream is sent by the computation device to the specific client device (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  In operation, the aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output delay line 94.  The speaker data packets for each audio client node go through its own delay line 94.  With each delay line 94 
outputting, by the computation device, the generated second audio streams to the respective client devices for rendering (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  In operation, the aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output delay line 94.  The speaker data packets for each audio client node go through its own delay line 94.  With each delay line 94 applying the audio-client-specific playout delay, the speaker signals for all the audio clients will become dealigned.  The network socket 42 pulls the delay-adjusted speaker data packets from the delay lines and sends the speaker data packets to the respective audio client as the audio client receive stream.  The delay-adjusted speaker data packets form the outbound speaker packet stream 50.  The outbound speaker packet stream 50 is then provided through the network socket 42 to the respective audio client nodes.  Figure 3 and paragraph 0049 disclose audio client 20 also processes incoming audio data packets received either from the audio conferencing server 15 or the audio client aggregation node 25 that are to be played out on the speaker 54 of the host device.  At the host device, incoming 
LaFata does not explicitly disclose wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams; wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group.
In analogous art, Virolainen discloses wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and sends the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site);
wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and sends the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site).
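It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate removing speech signals from a site from a mixed signal sent back to the site, as described in Virolainen, with mixing signals from microphones in various locations for a teleconference, as described in LaFata, because doing so is using a known technique to improve a similar method in the same way.  Combining removing speech signals from a site from a mixed signal sent back to the site of Virolainen with mixing signals from microphones in various locations for a teleconference of LaFata was within the ordinary ability of one of ordinary skill in the art based on the teachings of Virolainen.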
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 15.

Regarding claim 16, LaFata discloses a non-transitory computer-readable storage medium storing a computer program (Paragraph 0022 discloses the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor) that, when executed by a computer processor, causes the computer processor to perform operations of hosting a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, the operations comprising:
grouping the plurality of client devices into two or more groups based on their belonging to respective acoustic spaces, wherein the two or more groups include a first group into which two or more client devices in the plurality of client devices are grouped, wherein the two or more client devices belong to a first acoustic space in the two or more acoustic spaces (Paragraph 0029 discloses a "locale" refers to the physical space where audio feedback from multiple active devices used by multiple conference participants to connect to the same conference call would create feedback problems or would create unsynchronized speaker signals or would inhibit the proper functioning of acoustic echo cancellation used in conventional cloud-based conferencing systems.  A "shared locale" refers to a locale or physical location occupied by two or more participants of a conference call.  Figure 2 and paragraphs 0041 and 0042 disclose a same-locale multiple-device conferencing method 200 starts by detecting the presence of two or more audio clients connecting to the conference call from the same physical location or same locale (202).  With the detection of two or more audio client connections at the same physical location, the method 200 then organizes the audio client nodes at the same physical location (204));
receiving, by a computation device, first audio streams from the plurality of client devices (Figure 4 and paragraphs 0053 and 0057 disclose the aggregation node 25 is configured to receive inbound microphone data packets from four shared locale audio clients.  In the present example, the aggregation node 25 is configured on the host device of audio client node 4.  Accordingly, audio client nodes 1-3 are connected to the aggregation node 25 through a local area network to provide their microphone packet streams 30 and audio client node 4 provides its microphone packet stream 30 directly to the aggregation node 25.  Meanwhile, the network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  The inbound speaker packet stream 84 contains inbound speaker audio signals 86 being a mix of the audio signals of all of the conference participants.  Applicant’s “first audio streams” are interpreted to include each of the microphone packet streams and the inbound speaker packet stream 84 of LaFata);
generating, by the computation device from the first audio streams, second audio streams for rendering by respective client devices among the plurality of client devices, wherein the second audio streams are generated based on the grouping of the plurality of client devices into the two or more groups (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker 
wherein the second audio streams comprise an individual second audio stream generated for rendering by each of the respective client devices arranged in the two or more acoustic spaces (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the 
wherein the second audio streams include a specific audio stream for a specific client device in the two or more client devices in the first group (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter 
wherein the specific audio stream is sent by the computation device to the specific client device 
outputting, by the computation device, the generated second audio streams to the respective client devices for rendering (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 from the audio conferencing server destined to the locale associated with the aggregation node.  In operation, the aggregation node 25 receives the inbound speaker packet stream 84 and generates separated speaker data packets destined for each audio client.  The aggregation node 25 processes the separated speaker data packets for each audio client through the respective output delay line 94.  The speaker data packets for each audio client node go through its own delay line 94.  With each 
LaFata does not explicitly disclose wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams; wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group.
In analogous art, Virolainen discloses wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and sends the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site);
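wherein the subset of the first audio streams excludes one or more audio streams received from one or more other client devices in the two or more client devices in the first group (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and sends the mixed signal back to each site, except that the speech signal coming from the current site may be removed from the mixed signal that is sent back to the current site).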
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate removing speech signals from a site from a mixed signal sent back to the site, as described in Virolainen, with mixing signals from microphones in various locations for a teleconference, as described in LaFata, because doing so is using a known technique to improve a similar method in the same way.  Combining removing speech signals from a site from a mixed signal sent back to the site of Virolainen with mixing signals from microphones in various locations for a teleconference of LaFata was within the ordinary ability of one of ordinary skill in the art based on the teachings of Virolainen.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata and Virolainen to obtain the invention as specified in claim 16.
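
For illustration only, the combination relied upon above (mixing incoming signals and removing the signals originating from the receiving site) can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not the implementation of LaFata, Virolainen, or the application: the function name `mix_minus`, the client and site identifiers, and the representation of audio as equal-length lists of samples are hypothetical.

```python
def mix_minus(streams, site_of):
    """Build one output mix per client, excluding the client's own site.

    streams: dict mapping client id -> list of samples (equal lengths).
    site_of: dict mapping client id -> site (acoustic space) id.
    Both structures are hypothetical stand-ins for real audio streams.
    """
    length = len(next(iter(streams.values()))) if streams else 0
    outputs = {}
    for client in streams:
        mix = [0.0] * length
        for other, samples in streams.items():
            # Skip every stream captured at the receiving client's site,
            # including the receiving client's own stream.
            if site_of[other] == site_of[client]:
                continue
            for i, sample in enumerate(samples):
                mix[i] += sample
        outputs[client] = mix
    return outputs


# Two clients share "room1"; a third client is alone in "room2".
streams = {"a": [1.0, 1.0], "b": [2.0, 2.0], "c": [4.0, 4.0]}
site_of = {"a": "room1", "b": "room1", "c": "room2"}
mixes = mix_minus(streams, site_of)
```

In this sketch clients "a" and "b" each receive only the stream from "c", while "c" receives the sum of the streams from "a" and "b".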

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over LaFata in view of Virolainen as applied to claim 1 above, and further in view of Cartwright et al. (U.S. Patent Application Publication No. 2015/0244869 A1) (hereinafter Cartwright).

Regarding claim 7, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing determining a linear mapping function for mapping the first audio streams to the second audio streams based on the grouping of the plurality of client devices into the two or more groups; and generating the second audio streams from the first audio streams by applying the linear mapping function to the first audio streams.
In analogous art, Cartwright discloses determining a linear mapping function for mapping the first audio streams to the second audio streams based on the grouping of the plurality of client devices into the two or more groups; and generating the second audio streams from the first audio streams by applying the linear mapping function to the first audio streams (Paragraph 0033 discloses it is desirable to 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate mapping a soundfield using linear transformation, as described in Cartwright, with transforming audio signals, as described in 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Cartwright to obtain the invention as specified in claim 7.

Regarding claim 8, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing for at least one group of client devices, assigning client devices in other groups of client devices to respective virtual source locations in a virtual listening environment, wherein the second audio streams for the client devices in the at least one group of client devices are generated such that captured audio from the client devices in the other groups of client devices is rendered to respective virtual source locations when the second audio streams for the client devices in the at least one group of client devices are rendered by the client devices in the at least one group of client devices.
In analogous art, Cartwright discloses for at least one group of client devices, assigning client devices in other groups of client devices to respective virtual source locations in a virtual listening environment, wherein the second audio streams for the client devices in the at least one group of client devices are generated such that captured audio from the client devices in the other groups of client devices is rendered to respective virtual source locations when the second audio streams for the client devices in the at least one group of client devices are rendered by the client devices in the at least one group of client devices (Paragraphs 0054-0057 disclose the present document addresses the technical problem of building a 2D or 3D conference scene for a multi-party conference system 100 which comprises one or more soundfield endpoints 120.  The conference scene may be built within an endpoint 120 of the conference system 100 and/or within the conference server 110 of the conference system 100.  The conference scene should allow a listener to identify the different participants of the multi-party conference, including a plurality of participants at the one or more soundfield endpoints 120.  For this 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate generating conference scenes in which multiple soundfields are combined in such a way that the listener appears to be sitting in a virtual room with all of the other meeting participants, as described in Cartwright, with generating audio sounds in conference calls, as described in LaFata, as modified by Virolainen, because doing so is using a known technique to improve a similar method in the same way.  Combining generating conference scenes in which multiple soundfields are combined in such a way that the listener appears to be sitting in a virtual room with all of the other meeting participants of Cartwright with generating audio sounds in conference calls of LaFata, as modified by Virolainen, was within the ordinary ability of one of ordinary skill in the art based on the teachings of Cartwright.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Cartwright to obtain the invention as specified in claim 8.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over LaFata in view of Virolainen as applied to claim 1 above, and further in view of Hägglund et al. (U.S. Patent Application Publication No. 2012/0230509 A1) (hereinafter Hägglund).

Regarding claim 9, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing for each client device among the plurality of client devices, detecting whether the respective client device renders audio via headphone loudspeakers; and 
In analogous art, Hägglund discloses for each client device among the plurality of client devices, detecting whether the respective client device renders audio via headphone loudspeakers (Figure 2 and paragraph 0035 disclose when a use of a service is commenced, it is first detected 210 if the headset 140 is in full physical contact with the user of the device 100 that is used.  If the headset 140 is in full physical contact with the user, the application 110 will operate in the first active mode 111, 220.  If the headset 140 is in less than full physical contact with the user, the application 110 will operate in the second active mode 112, 240); and
for each client device that is determined to render audio via headphone loudspeakers, generating the second audio stream for the respective client device to include captured audio from all active sound sources (Figure 2 and paragraphs 0036 and 0037 disclose the first active mode 111, 220 corresponds to full operation of a microphone and speakers of the headset 140.  Assume the user is engaged in a conference call involving at least two other persons or participants.  The user is using a headset 140 connected to a laptop or a mobile phone, constituting the device 100 described above, for this conference call.  Also assume the headset 140 has two earpieces, both being inserted in the ears of the user.  The user is interrupted by another person needing the user's attention.  When detecting that one earpiece is removed, the application 110 in the laptop or mobile phone controlling the service switches operation mode from the first active mode 111, 220 to the second active mode 112, 240.  In the second active mode 112, 240, audio signals from the microphone of the headset 140 are discarded so as to mute the microphone of the headset 140.  This means that the user may still be able to hear the other persons talking, not missing out on anything being said, but also his/her microphone is muted so that the user may speak freely to the interrupting person without the other participants in the conference call being able to hear what the user is saying.  When detecting that the currently removed earpiece of the headset 140 is re-inserted in the user's ear, the application 110 switches operation mode to the first active mode 111, 220, corresponding to regained full operation of the microphone and speakers of the headset 140 so that the other participants may again hear what the user is saying).

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Hägglund to obtain the invention as specified in claim 9.
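
For illustration only, the mode switching described in the cited Hägglund paragraphs can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not Hägglund's implementation: the function names, the mode labels, and the use of an earpiece count to represent "full physical contact" are hypothetical.

```python
FIRST_ACTIVE = "first_active"    # full operation of microphone and speakers
SECOND_ACTIVE = "second_active"  # microphone muted, speakers still active


def select_mode(earpieces_inserted, total_earpieces=2):
    """Choose the operating mode from a hypothetical contact reading.

    Assumption: "full physical contact" is modeled as every earpiece
    being inserted; anything less selects the second active mode.
    """
    if earpieces_inserted == total_earpieces:
        return FIRST_ACTIVE
    return SECOND_ACTIVE


def outgoing_microphone_audio(mode, mic_samples):
    """Pass microphone samples in the first mode, discard them otherwise."""
    return list(mic_samples) if mode == FIRST_ACTIVE else []


# One earpiece removed: the microphone audio is discarded (muted).
muted = outgoing_microphone_audio(select_mode(1), [0.1, 0.2])
# Both earpieces inserted: the microphone audio is passed through.
live = outgoing_microphone_audio(select_mode(2), [0.1, 0.2])
```

In this sketch only the outgoing microphone audio depends on the mode; incoming speaker audio is assumed to be delivered in both modes, consistent with the cited description that the user may still hear the other participants.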

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over LaFata in view of Virolainen as applied to claim 1 above, and further in view of Zhang et al. (U.S. Patent Application Publication No. 2010/0074433 A1) (hereinafter Zhang).

Regarding claim 10, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing at least one of: performing single-channel echo cancellation for at least one client device among the plurality of client devices to suppress a representation of the second audio stream received by the at least one client device in the first audio stream output by the at least one client device; and performing multi-channel echo cancellation for at least one group of client devices to suppress representations of the second audio streams received by the client devices in the at least one group of client devices in the first audio streams output by the client devices in the at least one group of client devices.
In analogous art, Zhang discloses at least one of: performing single-channel echo cancellation for at least one client device among the plurality of client devices to suppress a representation of the second audio stream received by the at least one client device in the first audio stream output by the at least one client device; and performing multi-channel echo cancellation for at least one group of client devices to 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate using echo cancellation in an audio conferencing system, as described in Zhang, with generating audio for a conference call, as described in LaFata, as modified by Virolainen, because doing so is using a known technique to improve a similar method in the same way.  Combining using echo cancellation in an audio conferencing system of Zhang with generating audio for a conference call of LaFata, as modified by Virolainen, was within the ordinary ability of one of ordinary skill in the art based on the teachings of Zhang.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Zhang to obtain the invention as specified in claim 10.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over LaFata in view of Virolainen as applied to claim 1 above, and further in view of Ahgren et al. (U.S. Patent Application Publication No. 2016/0050491 A1) (hereinafter Ahgren).

Regarding claim 12, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing for at least one group of client devices, determining a relative spatial arrangement of the client devices in the respective group of client devices, wherein generating the second audio streams is further based on the determined relative spatial arrangement of client devices in the at least one group of client devices.
In analogous art, Ahgren discloses for at least one group of client devices, determining a relative spatial arrangement of the client devices in the respective group of client devices (Paragraph 0096 discloses a communication client application can detect that the user terminal on which it is executed is co-located with another user terminal participating in the conference call based on the communication client application determining the location of the user terminal on which it is executed and receiving location information from another terminal participating in the conference call.  Each communication client application may determine the location of the user terminal on which it is executed and report their location by transmitting location information (i.e. latitude and longitude information) to the other communication client applications executed on the other user terminals participating in the conference call.  Thus a communication client application has location information of the user terminal on which it is executed as well as location information of the other user terminals participating in the conference call, and can detect whether the user terminal on which it is executed is co-located with another user terminal participating in the conference call based on whether the other user terminal is within a predetermined range of the user terminal),
wherein generating the second audio streams is further based on the determined relative spatial arrangement of client devices in the at least one group of client devices (Paragraphs 0115 and 0116 disclose the method may further comprise encoding the output audio signal at an encoder of said network entity to produce an encoded output audio signal and transmitting the encoded output audio signal over the communications network to said user device.  In exemplary embodiments the method is performed based on detecting that the user device and the at least one further user device are located in a common acoustic space).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate determining whether 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Ahgren to obtain the invention as specified in claim 12.
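
For illustration only, the co-location test attributed to Ahgren in paragraph 0096 (comparing reported latitude and longitude information against a predetermined range) may be sketched as follows.  This is a minimal sketch under stated assumptions and not the implementation of the reference: the function names, the use of the haversine great-circle formula, the Earth radius constant, and the 10 meter default range are all hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_co_located(own: tuple, other: tuple, range_m: float = 10.0) -> bool:
    """Report whether the other terminal lies within the predetermined range.

    own and other are (latitude, longitude) pairs in degrees; range_m is a
    hypothetical threshold, since the cited passage does not state a value.
    """
    return distance_m(own[0], own[1], other[0], other[1]) <= range_m
```

A terminal would apply is_co_located to its own location and to each location reported by another terminal in the call, treating any terminal within range as sharing its acoustic space.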

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over LaFata in view of Virolainen as applied to claim 1 above, and further in view of Hirsch et al. (U.S. Patent Application Publication No. 2012/0260232 A1) (hereinafter Hirsch).

Regarding claim 14, as applied to claim 1 above, LaFata, as modified by Virolainen, discloses the claimed invention except explicitly disclosing wherein the grouping the plurality of client devices into two or more groups is further based on at least one of: operating systems of the client devices; and CPU availabilities of the client devices.
In analogous art, Hirsch discloses wherein the grouping the plurality of client devices into two or more groups is further based on at least one of: operating systems of the client devices; and CPU availabilities of the client devices (Paragraph 0085 discloses each mobile device category may be associated with the group of mobile devices that run a particular mobile operating system (e.g., the various versions of Apple's iPhone, iPad and iTouch, which run the iOS mobile operating system)).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to incorporate grouping mobile devices by operating system, as described in Hirsch, with using mobile devices in a conference call, as described in LaFata, as modified by Virolainen, because doing so is combining prior art elements 
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to combine the teachings of LaFata, Virolainen, and Hirsch to obtain the invention as specified in claim 14.
Response to Arguments
Applicant's arguments filed 3/25/2021 have been fully considered but they are not persuasive.
On pages 11 and 12 in the Remarks, Applicant argues that LaFata at least fails to disclose “receiving, by a computation device, first audio streams from the plurality of client devices; generating, by the computation device from the first audio streams, second audio streams for rendering by respective client devices among the plurality of client devices, wherein the second audio streams are generated based on the grouping of the plurality of client devices into the two or more groups; wherein the second audio streams comprise an individual second audio stream generated for rendering by each of the respective client devices arranged in the two or more acoustic spaces; wherein the second audio streams include a specific audio stream for a specific client device in the two or more client devices in the first group; wherein the specific audio stream is generated by the computation device from a subset of the first audio streams and sent by the computation device to the specific client device.”
Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.  Applicant has stated that LaFata does not disclose specific limitations, but has not addressed how the already cited portions of LaFata fail to disclose the specific limitations.  Applicant’s computation device is interpreted to include LaFata’s aggregation node.
LaFata discloses receiving, by a computation device, first audio streams from the plurality of client devices (Figure 4 and paragraphs 0053 and 0057 disclose the aggregation node 25 is configured to receive inbound microphone data packets from four shared locale audio clients.  In the present example, 
generating, by the computation device from the first audio streams, second audio streams for rendering by respective client devices among the plurality of client devices, wherein the second audio streams are generated based on the grouping of the plurality of client devices into the two or more groups (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone packet stream 80.  The inbound speaker audio signals 86 are coupled to a splitter 88 to produce copies of the inbound speaker audio signals 86 for each audio client.  The splitter 88 produces individual speaker data packets 89 destined for each audio client node (for example, audio client nodes 1-4).  The aggregation node 25 dealigns the separated speaker data packets 89 so that the speaker data packets will sound aligned when played out on the individual speakers of the host device of each audio client node.
wherein the second audio streams comprise an individual second audio stream generated for rendering by each of the respective client devices arranged in the two or more acoustic spaces (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound microphone packet stream go through its own delay line 74.  With each delay line 74 applying the audio-client-specific delay, the microphone signals from all the audio clients will be lined up after the delay lines.  The aligned microphone signals are mixed together by mixer 76 to form a mixed audio data packet 78 containing the audio signals of all of the audio clients connected to the aggregation node 25.  The mixed audio data packets form an outbound microphone 
wherein the second audio streams include a specific audio stream for a specific client device in the two or more client devices in the first group (Figure 4 and paragraphs 0054, 0056, 0057, 0059 and 0060 disclose with inbound microphone packet streams 30 being received from multiple audio client nodes 1-3 in the shared locale, the aggregation node 25 stores the inbound microphone data packets into respective jitter buffers 70.  The aggregation node 25 pulls microphone data packets out of each jitter buffer and from the local audio client node (audio client 4) and processes the microphone data packets through the respective delay line 74.  The delay lines 74 introduce delays that are specific to each audio client node to each inbound microphone packet stream.  The microphone data packets for each inbound 
wherein the specific audio stream is sent by the computation device to the specific client device (Figure 4 and paragraphs 0056, 0057, and 0060 disclose the outbound microphone packet stream 80 from the aggregation node 25 is then provided through the network socket 82 to the audio conferencing server.  The network socket 82 of the aggregation node 25 receives inbound speaker packet stream 84 .
In response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  
LaFata does not explicitly disclose wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams.
Virolainen discloses wherein the specific audio stream is generated by the computation device for the specific client device from a subset of the first audio streams (Paragraph 0006 discloses the conference switch 100, also referred to as a conference bridge, mixes incoming speech signals from each site and sends the mixed signal back to each site.  The speech signal coming from the current site is usually removed from the mixed signal that is sent back to this same site.  Figure 1 and paragraph 0037 disclose the conference switch 148 may be configured to mix incoming speech signals from each site and 
Conclusion
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to MARK G. PANNELL whose telephone number is (303) 297-4245.  The Examiner can normally be reached on Monday through Friday 8:00 am to 3:00 pm (Mountain Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, Rafael Perez-Gutierrez, can be reached at (571) 272-7915.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866) 217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.






/Mark G. Pannell/
Examiner, Art Unit 2642