Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
1.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/10/2022 has been entered.
Response to Amendment
2.	Claims 1, 2, 7, 14-15, and 19-20 have been amended.
Response to Arguments
3.	Applicant’s arguments have been considered but are moot in view of the new grounds of rejection, which are responsive to the amendments.
	Additional prior art, Singaraju, has been incorporated with Nemala to further teach the newly recited limitations of a model trained with speech and noise signal strength values (see art rejection below).

Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-5, 13-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jain et al (2008/0310398) in view of Bader et al (8,719,032) in further view of Nemala et al (9,640,194) in further view of Singaraju et al (2017/0256270).

Regarding claim 1 Jain teaches A computer-implemented method (abstract: method and system; fig 2, 19: computer system) comprising: 
receiving, by a computing device that is an audio conferencing device in a room, and from a plurality of microphones of the audio conferencing device, and each microphone providing audio data to a respective audio channel, first audio data for a first audio channel of the respective channels (abstract: receiving calls, wherein each call comprises an audio stream; fig 3-5; 13: conference calls; audio channel; 22: receiving a plurality of calls in a telephone conference call system 300 includes acquiring an audio stream in packet intervals, e.g., audio packets, of each of N call audio streams 310(N). Each call may be analyzed according to selected feature criteria in N separate audio channel analyzers 330(N)); 
transmitting, by the computing device, the first audio data (fig 3, 4, 5; para 23: selected number of speakers…are forwarded); 
while receiving and transmitting the first audio data (23-25): 
receiving, by the computing device, second audio data for a second audio channel of the respective audio channels (25: other speakers enter the conversation; 24: audio stream corresponding to a channel); 
determining, by the computing device, a first speech audio energy level of the first audio data and a first noise energy level of the first audio data by providing the first audio data as a first input to a model that is trained to determine a speech audio energy level of given audio data and a noise energy level of the given audio data (22: audio channel analyzers; 23: each of channel analyzers may determine energy content in speaker’s voice; may be accomplished using software that recognizes and distinguishes speech from non-speech sounds; energy content of non-speech sounds); 
determining, by the computing device, a second speech audio energy level of the second audio data and a second noise energy level of the second audio data by providing the second audio data as a second input to the model (22: audio channel analyzers; 23: each of channel analyzers may determine energy content in speaker’s voice; may be accomplished using software that recognizes and distinguishes speech from non-speech sounds; energy content of non-speech sounds); and 
based on the first speech audio energy level, the first noise energy level, the second speech audio energy level, and the second noise energy level, determining, by the computing device, the audio data with a highest speech energy level and whether to switch to transmitting the second audio data or continue transmitting the first audio data based on the determination (fig 3; para 23: audio ranked by voice energy content…and selected number of speakers, e.g., those with the loudest speech energy content, are forwarded; 24-25; analyzing energy features of each channel and determining which to select for transmission); and 
based on determining whether to switch to transmitting the second audio data or continue transmitting the first audio data, transmitting, by the computing device, the first audio data or the second audio data (23-25, where Jain teaches receiving multiple channels of audio data, analyzing the audio data to determine audio characteristics, and presenting certain channels based on the audio characteristics; Jain monitors the channels and, when a specific channel exhibits characteristics that are not acceptable, can instead incorporate another, more acceptable channel).  
Jain does not specifically teach, but Bader teaches:
receiving, by a computing device that is an audio conferencing device in a room, and from a plurality of microphones of the audio conferencing device, each microphone of the microphones positioned in the room (abstract: conference room with multiple microphones; figure 1; col 1 l. 37-39: microphones in the same room; col 3 l. 10-28).
Jain already teaches an audio conferencing device in a location/room and multiple microphones, while Bader teaches multiple microphones that could be in the same room.
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Bader with Jain for an improved system to allow the conference system to also function when microphones are positioned locally, and further yield the predictable results of still obtaining, analyzing, and transmitting each stream accordingly.  The results of the combination would have been predictable and would optimize the user experience during conference calls (Jain 0013).

Jain additionally teaches
[0016] Methods may be employed that can recognize and distinguish between voice and non-voice contributions to the audio stream, and thereby determine the energy content in the voice component of the audio stream. Priority access is then based on each of the meeting participant's voice energy content.
[0023] Each of channel analyzers 330(N) may determine the energy content in a speaker's voice apart from other audio energy content in the audio stream. This may be accomplished, for example, using software that recognizes and distinguishes speech from non-speech sounds. Intervals between spoken words, for example, may be used to evaluate the energy content of non-speech sounds. This energy content may be subtracted from the total energy content level of the audio stream packet, determining a net voice audio energy content.  Each audio packet may then be ranked by the voice energy content and only a selected number of speakers, e.g., those with the loudest speech energy content, are forwarded to a mixer 360 for broadcast routing 370 to conference call participants.
	However, Jain does not specifically teach:
A model that is trained on speech audio samples that have either no background noise or background noise below a threshold, speech signal strength values of the speech audio samples, noise samples, and noise signal strength values of the noise samples, to determine a speech audio energy level of given audio data and a noise energy level of the given audio data.
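
For illustration only, the computation described in Jain [0023] (subtracting non-speech energy from total packet energy and ranking the channels by the resulting net voice energy) might be sketched as follows. This is a minimal sketch and not Jain's implementation: the function names, the dictionary layout, the sample values, and the flooring of the net energy at zero are all assumptions made for the example.

```python
from typing import Dict, List


def net_voice_energy(total_energy: float, non_speech_energy: float) -> float:
    """Total packet energy minus non-speech energy (floored at zero, an assumption)."""
    return max(total_energy - non_speech_energy, 0.0)


def select_loudest(channels: Dict[str, Dict[str, float]], max_speakers: int) -> List[str]:
    """Rank channels by net voice energy and keep the loudest `max_speakers`."""
    ranked = sorted(
        channels,
        key=lambda name: net_voice_energy(
            channels[name]["total"], channels[name]["non_speech"]
        ),
        reverse=True,
    )
    return ranked[:max_speakers]


# Hypothetical per-packet measurements for three channels.
packets = {
    "channel_1": {"total": 9.0, "non_speech": 2.0},   # net 7.0
    "channel_2": {"total": 12.0, "non_speech": 8.0},  # net 4.0
    "channel_3": {"total": 6.0, "non_speech": 0.5},   # net 5.5
}

# Only the selected channels would be forwarded for mixing and broadcast.
forwarded = select_loudest(packets, max_speakers=2)
```

In this example channel_2 has the highest total energy but is not forwarded, because most of that energy is non-speech.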

Nemala teaches
a model that is trained on speech audio samples that have either no background noise or background noise below a threshold, [speech signal strength values of the speech audio samples,] noise samples, [and noise signal strength values of the noise samples,] to determine a speech audio energy level of given audio data and a noise energy level of the given audio data (col 6 l. 5-11: energy; col 8 l. 1-11: energy at channel; abstract; fig 4-5; col 8 l. 1-11: reference clean speech and noise signals may be combined by a combination module of the training system into synthetic noisy speech signals; abstract: machine learning trained on cues pertaining to clean and noisy speech signals and a synthetic noisy speech signal).  
Jain teaches determining the energy content in a speaker's voice apart from other audio energy content in the audio stream…using software that recognizes and distinguishes speech from non-speech sounds (0023), and Nemala teaches a model trained on speech and noise audio samples.
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Nemala for an improved system to allow for training a model to recognize speech and noise for more accurate speech and noise identification and classification.  This would allow for Jain to better determine the energy content in a speaker’s voice and noise by using the model of Nemala to optimize the user experience during conference calls (Jain 0013).


Nemala teaches machine-learning algorithms trained with clean speech, noise, and cues, where the cues can be channel energy and noise estimates
(Col 4 l. 37-39: machine-learning algorithms for mapping cues between predetermined, reference noise signals/clean speech signals and noisy speech signals.
Col 4 l. 60-63: The mapping may be learned from a training database, and one such mapping may exist per frequency domain tap or per group of frequency domain taps.
Col 7 l. 66 – col 8 l. 14: As follows from this figure, a frequency analysis module 450 and/or combination module 460 of the training system 410 may receive predetermined reference clean speech signals and predetermined reference noise signals from the clean speech database 420 and the noise database 430, respectively. These reference clean speech and noise signals may be combined by a combination module 460 of the training system 410 into “synthetic” noisy speech signals. The synthetic noisy speech signals may then be processed, and one or more cues may be extracted therefrom, by a Frequency Extractor (FE) module 470 of the training system 410. As discussed, these cues may refer to at least one of ILD cues, IPD cues, energy at channel cues, VAD cues, spatial cues, frequency cues, Wiener gain mask estimates, pitch-based cues, periodicity-based cues, noise estimates, context cues, and so forth.).
However, Nemala does not specifically teach training that incorporates signal strength values for the speech and noise samples.
Singaraju teaches determining speech and noise energy levels (abstract: determine energy levels for speech and noise; 0054; [0042] Alternately, the noise characteristics during, before and after the time period when the low scored trigger was said can also be used to improve the recognition model. The noise characteristics can be added to the training models, or the model be retrained or simply allow for these speech variations and noise variations into the recognition model. User specific thresholds such as speaker verification or thresholds used for detection or minimizing false accepts can also be modified using this information.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Singaraju for improved training and speech/noise determination.  Nemala already teaches obtaining noise and speech samples and energy for training models, and one could look to Singaraju to further include the energy level of each sample to obtain more accurate determinations of the speech and noise of received audio data to improve the real time audio processing (Nemala abstract) and ultimately better determine overall signal energy levels for deciding which channels to use (Jain).



Regarding claim 2, Jain does not specifically teach, but Nemala teaches, the method of claim 1, comprising: 
receiving, by the computing device, the speech audio samples (abstract clean speech signals; fig 4; col 7 l. 59-65; col 8 l. 1); 
receiving, by the computing device, the noise samples (abstract; fig 4; col 8 l. 2 reference noise signals); 
[determining, by the computing device, the noise energy level of each noise sample and the speech audio energy level of each speech audio sample]; 
generating, by the computing device, noisy speech audio samples by combining each noise sample and each speech audio sample (abstract; fig 4-5; col 8 l 1-11: reference clean speech and noise signals may be combined by a combination module of the training system into synthetic noisy speech signals); and 
training, by the computing device and using machine learning, the model using [the noise energy level of each noise sample, the speech audio energy level of each speech audio sample, and] the noisy speech audio samples to estimate the speech audio energy level and the noise energy level of the samples (abstract: machine learning trained on cues pertaining to clean and noisy speech signals and a synthetic noisy speech signal);
Nemala does not specifically teach, but Singaraju teaches,
determining, by the computing device, the noise energy level of each noise sample and the speech audio energy level of each speech audio sample (abstract: determine energy levels for speech and noise);
and Nemala, when combined with Singaraju, teaches
training, by the computing device and using machine learning, the model using the noise energy level of each noise sample, the speech audio energy level of each speech audio sample, and the noisy speech audio samples to estimate the speech audio energy level and the noise energy level of the samples.
Rejected for similar rationale and reasoning as claim 1 above.

Regarding claim 3 Nemala teaches The method of claim 2, wherein combining each noise sample and each speech audio sample comprises overlapping each noise sample and each audio sample in the time domain and summing each noise sample and each audio sample.  (col 8 l 1-11: reference clean speech and noise signals may be combined by a combination module of the training system into synthetic noisy speech signals)
Rejected for similar rationale and reasoning as claim 2.
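
For illustration only, the limitations of claims 2 and 3 discussed above (determining an energy level for each clean speech sample and each noise sample, then overlapping the two in the time domain and summing them into a noisy speech sample) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Nemala or Singaraju: mean-square amplitude is assumed as the energy measure, the shorter signal is assumed to be zero-padded, and all function names and sample values are hypothetical.

```python
from typing import List, Tuple


def mean_energy(samples: List[float]) -> float:
    """Mean-square amplitude, used here as the assumed energy measure."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def mix_time_domain(speech: List[float], noise: List[float]) -> List[float]:
    """Overlap the signals in time and sum them sample by sample.

    The shorter signal is zero-padded to the longer length (an assumption).
    """
    length = max(len(speech), len(noise))
    padded_speech = speech + [0.0] * (length - len(speech))
    padded_noise = noise + [0.0] * (length - len(noise))
    return [s + n for s, n in zip(padded_speech, padded_noise)]


def build_training_example(
    speech: List[float], noise: List[float]
) -> Tuple[List[float], float, float]:
    """Return (noisy speech input, speech energy label, noise energy label)."""
    noisy = mix_time_domain(speech, noise)
    return noisy, mean_energy(speech), mean_energy(noise)


# Hypothetical clean speech and noise samples.
clean_speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, 0.1, -0.1]
noisy, speech_level, noise_level = build_training_example(clean_speech, noise)
```

Each resulting tuple pairs a noisy input with the separately measured speech and noise energy levels, which is the form of labeled data a model would need in order to learn to estimate both levels from noisy audio.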

Regarding claim 4 Jain teaches The method of claim 1, wherein: 
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to switch to transmitting the second audio data, and transmitting the first audio data or the second audio data comprises transmitting the second audio data and ceasing to transmit the first audio data (24-25 analyzing energy features of each channel and determining which to select for transmission; 24: block passage of the audio stream from one or more of the calls).  

Regarding claim 5 Jain teaches The method of claim 1, wherein: 
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to continue transmitting the first audio data, and transmitting the first audio data or the second audio data comprises continue transmitting the first audio data (24-25 analyzing energy features of each channel and determining which to select for transmission).  

Regarding claim 13 Jain teaches The method of claim 1, wherein the computing device is configured to receive additional audio data for additional audio channels and determine whether to switch to transmitting the additional audio data from one of the additional audio channels (22-25 analyzing energy features of each channel and determining which to select for transmission).  


Regarding claim 14 Jain, Bader, Nemala, and Singaraju teach A system comprising: 
one or more computers; and 
one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the operations comprising: 
receiving, by a computing device that is an audio conferencing device in a room, and from a plurality of microphones of the audio conferencing device, each microphone of the microphones positioned in the room, and each microphone providing audio data to a respective audio channel, first audio data for a first audio channel of the respective audio channels; 
transmitting, by the computing device, the first audio data; 
while receiving and transmitting the first audio data: (Attorney Docket No. 16113-8782001) 
receiving, by the computing device, second audio data for a second audio channel of the respective audio channels; 
determining, by the computing device, a first speech audio energy level of the first audio data and a first noise energy level of the first audio data by providing the first audio data as a first input to a model that is trained on speech audio samples that have either no background noise or background noise below a threshold, speech signal strength values of the speech audio samples, noise samples, and noise signal strength values of the noise samples, to determine a speech audio energy level of given audio data and a noise energy level of the given audio data; 
determining, by the computing device, a second speech audio energy level of the second audio data and a second noise energy level of the second audio data by providing the second audio data as a second input to the model; and 
based on the first speech audio energy level, the first noise energy level, the second speech audio energy level, and the second noise energy level, determining, by the computing device, the audio data with a highest speech energy level and whether to switch to transmitting the second audio data or continue transmitting the first audio data based on the determination; and 
based on determining whether to switch to transmitting the second audio data or continue transmitting the first audio data, transmitting, by the computing device, the first audio data or the second audio data.  
Claim 14 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Claim 15 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.
Claim 16 recites limitations similar to claim 4 and is rejected for similar rationale and reasoning.
Claim 17 recites limitations similar to claim 5 and is rejected for similar rationale and reasoning.


Regarding claim 20 Jain, Bader, Nemala, and Singaraju teach A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform the operations comprising: 
receiving, by a computing device that is an audio conferencing device in a room, and from a plurality of microphones of the audio conferencing device, each microphone of the microphones positioned in the room, and each microphone providing audio data to a respective audio channel, first audio data for a first audio channel of the respective audio channels; 
transmitting, by the computing device, the first audio data; 
while receiving and transmitting the first audio data: 
receiving, by the computing device, second audio data for a second audio channel of the respective audio channels; 
determining, by the computing device, a first speech audio energy level of the first audio data and a first noise energy level of the first audio data by providing the first audio data as a first input to a model that is trained on speech audio samples that have either no background noise or background noise below a threshold, speech signal strength values of the speech audio samples, noise samples, and noise signal strength values of the noise samples, to determine a speech audio energy level of given audio data and a noise energy level of the given audio data; 
determining, by the computing device, a second speech audio energy level of the second audio data and a second noise energy level of the second audio data by providing the second audio data as a second input to the model; and 
based on the first speech audio energy level, the first noise energy level, the second speech audio energy level, and the second noise energy level, determining, by the computing device, the audio data with a highest speech energy level and whether to switch to transmitting the second audio data or continue transmitting the first audio data based on the determination; and 
based on determining whether to switch to transmitting the second audio data or continue transmitting the first audio data, transmitting, by the computing device, the first audio data or the second audio data. 
Claim 20 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.




7.	Claims 6, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jain in view of Bader in further view of Nemala in further view of Singaraju in further view of Femal (2016/0261749).

	Regarding claim 6 Jain teaches the method of claim 1, wherein 
determining a first speech audio energy level of the first audio data and a first noise energy level of the first audio data, and
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data is based further on, [for each of the multiple frequency bands,] each first speech audio energy level, each first noise energy level, each second speech audio energy level, and each second noise energy level (as discussed and rejected in claim 1).  
Jain does not specifically teach, but Femal teaches:
determining a first speech audio energy level of the first audio data and a first noise energy level of the first audio data comprises: 
for each of multiple frequency bands, determining a respective first speech audio energy level and a respective first noise energy level (31 audio signal for each channel is analyzed to determine speech or noise; noise, voice, indicator of frequency at which the energy in audio signal is concentrated), 
determining a second speech audio energy level of the second audio data and a second noise energy level of the second audio data comprises: 
for each of the multiple frequency bands, determining a respective second speech audio energy level and a respective second noise energy level (31 audio signal for each channel is analyzed).
	Femal teaches providing automatic filtering of noisy conference members to reduce or eliminate background noise from conference audio, and determining energy levels for frequency bands of the channels.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Femal for an improved system for additional audio analysis.
	Jain teaches receiving multiple channels of audio data, analyzing the audio data to determine audio characteristics, and presenting certain channels based on the audio characteristics, and one could look to Femal to allow for additional audio analysis to assist in determining audio characteristics and whether to incorporate certain channels.  Jain and Femal would therefore teach
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data is based further on, for each of the multiple frequency bands, each first speech audio energy level, each first noise energy level, each second speech audio energy level, and each second noise energy level.  

Regarding claim 12 Jain teaches The method of claim 1, comprising: 
[before] transmitting the first audio data or the second audio data and based on the first speech audio energy level, the first noise energy level, the second speech audio energy level, the second noise energy level, [performing, by the computing device, noise reduction on the first audio data or the second audio data],
but does not specifically teach, whereas Femal teaches, before transmitting, performing, by the computing device, noise reduction on the first audio data or the second audio data (abstract; 5-6; 5: filtering of noisy conference members to reduce or eliminate background noise).  
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the noise reduction of Femal to improve channel audio quality.

Claim 18 recites limitations similar to claim 6 and is rejected for similar rationale and reasoning.


8.	Claims 7-11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jain in view of Bader in further view of Nemala in further view of Singaraju in further view of He et al (2006/0287856).

Regarding claim 7 Jain teaches
first speech audio energy level, the first noise energy level, the second speech audio energy level, and the second noise energy level,
but does not specifically teach, whereas He teaches, the method of claim 1, comprising: 
[based on the first speech audio energy level, the first noise energy level, the second speech audio energy level, and the second noise energy level,] 
updating, by the computing device, a state of a state machine that includes a speech state, a noise state, a silence state, and an uncertain state (89-90: waveforms categorized into classes; speech state, noise state, silence state, onset state).  
Jain teaches determining energy content and distinguishing speech from non-speech (23).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the specific classification states of He, with a reasonable expectation of success in still allowing for sound/speech determination (classification) and subsequent processes based on the determination.  Incorporating He thus teaches determining whether to transmit the first audio data or the second audio data based on the state of the state machine.

Regarding claim 8 Jain teaches The method of claim 7, wherein: 
the first audio channel is an established speaker channel that indicates that first speech audio energy level satisfies a speech audio energy level threshold (22 channel analyzer; 23: determine energy content in a speaker’s voice; distinguishes speech from non-speech, intervals between spoken words used to evaluate energy of non-speech sounds; 24 quality ranking for each channel, threshold), 
the second audio channel is another established speaker channel that indicates that second speech audio energy level satisfies the speech audio energy level threshold (22-25 where steps (energy, quality ranking) are performed for multiple channels), 
[updating the state of the state machine comprises updating the state of the state machine to the speech state], and 
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to transmit both the first audio data and the second audio data [based on updating the state of the speech machine to the speech state and] based on the first audio channel and the second audio channel both being established speaker channels (Jain 24-25 can send multiple acceptable channels; further rejected based on rationale of claim 1).  
Jain does not specifically teach, but He teaches, updating the state of the state machine comprises updating the state of the state machine to the speech state (89-90).
Rejected for similar rationale and reasoning as claim 7 above.

Regarding claim 9 Jain and He teach The method of claim 7, wherein: 
the first audio channel is an established speaker channel that indicates that first speech audio energy level satisfies a speech audio energy level threshold (Jain), 
updating the state of the state machine comprises updating the state of the state machine to the noise state (He), and
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to continue transmitting the first audio data based on updating the state of the state machine to the noise state (Jain -determining channel for transmission; He - state).  
Rejected for similar rationale and reasoning as claims 7 and 8 above.

Regarding claim 10 Jain and He teach The method of claim 7, wherein: 
the first audio channel is an established speaker channel that indicates that first speech audio energy level satisfies a speech audio energy level threshold (Jain), 
updating the state of the state machine comprises updating the state of the state machine to the silence state (He), and 
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to continue transmitting the first audio data based on updating the state of the state machine to the silence state (Jain -determining channel for transmission; He - state).
Rejected for similar rationale and reasoning as claims 7 and 8 above.
  
Regarding claim 11 Jain and He teach The method of claim 7, wherein: 
the first audio channel is an established speaker channel that indicates that first speech audio energy level satisfies a speech audio energy level threshold (Jain), 
updating the state of the state machine comprises updating the state of the state machine to the uncertain state (He), and 
determining whether to switch to transmitting the second audio data or continue transmitting the first audio data comprises determining to continue transmitting the first audio data based on updating the state of the state machine to the uncertain state (Jain -determining channel for transmission; He - state).  
Rejected for similar rationale and reasoning as claims 7 and 8 above.
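
For illustration only, the state machine recited in claims 7-11 (speech, noise, silence, and uncertain states, with the transmission decision depending on the updated state) might be sketched as follows. This is a minimal sketch and does not reproduce Jain, He, or the application: the thresholds, the rule for choosing a state from a single pair of energy levels, and the default of continuing with the first channel in any case the claims do not address are all assumptions made for the example.

```python
from enum import Enum
from typing import List


class State(Enum):
    SPEECH = "speech"
    NOISE = "noise"
    SILENCE = "silence"
    UNCERTAIN = "uncertain"


# Hypothetical thresholds; neither the claims nor the references give values.
SILENCE_FLOOR = 0.01  # below this, both energies are treated as silence
MARGIN = 2.0          # ratio by which one energy must dominate the other


def update_state(speech_energy: float, noise_energy: float) -> State:
    """Pick a state from one channel's speech and noise energy levels (assumed rule)."""
    if speech_energy < SILENCE_FLOOR and noise_energy < SILENCE_FLOOR:
        return State.SILENCE
    if speech_energy >= MARGIN * noise_energy:
        return State.SPEECH
    if noise_energy >= MARGIN * speech_energy:
        return State.NOISE
    return State.UNCERTAIN


def channels_to_transmit(
    state: State, first_established: bool, second_established: bool
) -> List[str]:
    """Claim 8: speech state with two established speaker channels sends both.

    Claims 9-11: noise, silence, or uncertain state continues the first channel.
    Any other case also continues the first channel (an assumption).
    """
    if state is State.SPEECH and first_established and second_established:
        return ["first", "second"]
    return ["first"]


# Hypothetical energy levels for the second channel.
state = update_state(speech_energy=1.0, noise_energy=0.1)
selected = channels_to_transmit(state, first_established=True, second_established=True)
```

Keeping the state update separate from the transmission decision mirrors the structure of the claims, where claim 7 recites updating the state and claims 8-11 recite what is transmitted for each state.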


Claim 19 recites limitations similar to claim 7 and is rejected for similar rationale and reasoning.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655