Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 03/16/2020 and 03/17/2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7, 11-14, 17 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Matheja et al. (US 2016/0261951 A1) in view of Mitchell et al. (US 10,878,840 B1).

Regarding Claim 1: 
Matheja discloses a method comprising:
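receiving, by a processing device through a plurality of channels, audio data, wherein the audio data of each channel corresponds to a plurality of frequency ranges; Matheja discloses receiving sound information (audio data) channels from a plurality of microphone signals, which correspond to frequencies in the frequency sub-band domain. (Matheja ¶0005, 0028, and 0060)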
determining, based on at least one of the speech audio energy level or the noise energy level for each of the plurality of frequency ranges, a speech signal with removed noise for each channel associated with the audio data; Matheja Figs 2-4 discloses, after receiving speech signals from the plurality of microphones, estimating the peak levels for each channel via the Automatic Gain Control (AGC) module (Fig. 3) and removing noise from each channel using the estimated noise power via the Noise Reduction (NR) module (Fig. 4) to produce preprocessed speech signals. Fig. 2 also includes Voice Activity Detection (VAD)/Speaker Activity Detection (SAD) to contribute to the calculation of the target values necessary to determine the dominant speaker and calculate values for adjusting the AGC and the maximum attenuation of the NR module. (Matheja ¶0037, 0042-0043 and 0046)
for each channel, determining one or more statistical values associated with an energy level of a channel’s speech signal with the removed noise; As detailed in the Specification, the peak measurement can be defined as a statistical value associated with the energy level of a channel. Matheja discloses, in the peak level estimation module, the estimation (determination) of a peak level (statistical value) for the m-th microphone signal (channel) within the Automatic Gain Control (AGC) modules. (Matheja ¶0042)
determining a strongest channel, wherein the strongest channel has highest one or more statistical values associated with an energy level of a speech signal of a respective channel; Matheja discloses specifying (determining) the strongest (dominant) channel as the reference speech level by observing the reference/target peak level of Automatic Gain Control (AGC) modules. (Matheja ¶0044-0045).
determining that the one or more statistical values associated with the energy level of the speech signal of the strongest channel satisfy a threshold condition; Matheja discloses the dominant channel must be active for a predetermined (threshold) amount of time to control the target values necessary to regulate the background noise (Matheja ¶0038).
comparing one or more statistical values associated with an energy level of a speech signal of each channel other than the strongest channel with the corresponding one or more statistical values associated with the energy level of the speech signal of the strongest channel; Matheja discloses utilizing the dominant channel as the reference speech level to assess whether to conduct Automatic Gain Control (AGC) and noise attenuation to ensure all channels are adapted to similar levels. (Matheja ¶0045 and 0027).
depending on the comparing, determining whether to update a gain value for a respective channel based on the one or more statistical values associated with the energy level of the respective channel; Matheja discloses achieving equivalent background noise characteristics for each channel by utilizing the reference channel (dominant speaker) to conduct Automatic Gain Control (AGC) and noise attenuation techniques to adapt the speech signal power levels to approximately the same power for all channels (Matheja ¶0027, 0041-0045).   
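For illustration only, the strongest-channel, threshold, and comparison limitations recited above can be sketched as follows. This is a minimal sketch and is not taken from Matheja, Mitchell, or the Specification: the function names, the use of the peak absolute amplitude as the statistical value, the `min_peak` threshold, and the `within_ratio` comparison rule are all assumptions chosen for the example.

```python
from typing import List


def peak_level(samples: List[float]) -> float:
    """Peak absolute amplitude, used here as the per-channel statistical value."""
    return max((abs(s) for s in samples), default=0.0)


def channels_to_update(
    denoised: List[List[float]],
    min_peak: float = 0.1,      # hypothetical threshold for the strongest channel
    within_ratio: float = 0.4,  # hypothetical "close enough to strongest" ratio
) -> List[int]:
    """Return indices of channels flagged for a gain update.

    Assumed rule: a channel other than the strongest is flagged when its
    peak is at least `within_ratio` times the strongest channel's peak.
    """
    peaks = [peak_level(ch) for ch in denoised]
    if not peaks:
        return []
    # Strongest channel: the one with the highest statistical value.
    strongest = max(range(len(peaks)), key=peaks.__getitem__)
    # Threshold condition on the strongest channel.
    if peaks[strongest] < min_peak:
        return []
    # Compare every other channel against the strongest channel.
    return [
        i
        for i, p in enumerate(peaks)
        if i != strongest and p >= within_ratio * peaks[strongest]
    ]
```

In this sketch no channel is flagged when even the strongest channel is below `min_peak`, which corresponds to the threshold condition not being satisfied.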

Matheja does not explicitly disclose:
determining, for each of the plurality of frequency ranges for each channel, at least one of a speech audio energy level or a noise energy level by providing audio data corresponding to each frequency range as input to a model that is trained to determine at least one of a speech audio energy level of given audio data or a noise energy level of the given audio data; 

However, in an analogous art, Mitchell discloses:
determining, for each of the plurality of frequency ranges for each channel, at least one of a speech audio energy level or a noise energy level by providing audio data corresponding to each frequency range as input to a model that is trained to determine at least one of a speech audio energy level of given audio data or a noise energy level of the given audio data; Mitchell teaches the use of a wide variability of energy levels from audio data/signals to provide input into a trained machine learning model to aid in recognizing the audio data (sounds) as related to speech or non-speech. Mitchell further teaches the multiple energy levels, from a plurality of audio channels and frequency ranges, can be captured from a variety of devices. (Mitchell col 6:16-29, 9:63-67, 4:1-3, 18:37-39, 2:10-13, 8:38-41)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Matheja to incorporate the teachings of Mitchell to utilize the audio data as input to a trained machine learning model. Matheja discloses receiving sound information (audio data) channels from a plurality of microphone signals, which correspond to frequencies in the frequency sub-band domain. (Matheja ¶0005, 0028, and 0060). Mitchell teaches the use of energy levels from audio data to provide input into a trained machine learning model to aid in recognizing the sounds as related to speech or non-speech (Mitchell col 6:16-29, 9:63-67, 4:1-3, 18:37-39). Matheja can be modified by the teachings of Mitchell to train a machine learning model with the captured audio data. The motivation for doing so would be to improve the sound recognition ability of the system by utilizing previously captured data that has been processed and trained within a machine learning model or neural network (Mitchell col 6:42-46). Overall, this would work to produce a high level of system accuracy for the processed audio data (Mitchell col 5:44-50).

Regarding Claim 2: 
Matheja further discloses the method of claim 1, wherein determining the speech signal with removed noise for each channel comprises: for each of the plurality of frequency ranges of a channel, calculating a denoised signal based on at least one of the speech audio energy level or the noise energy level for a corresponding frequency range; and combining calculated denoised signals that each correspond to one of the plurality of frequency ranges of the channel. (View Matheja ¶0037, 0052 and 0058).

Regarding Claim 3: 
Matheja further discloses the method of claim 1, wherein the threshold condition requires that the one or more statistical values associated with the energy level of the strongest channel be above a respective threshold value for a threshold period of time. (View Matheja ¶0038-0040).

Regarding Claim 4: 
Matheja further discloses the method of claim 1, wherein determining whether to update the gain value for the respective channel comprises: determining whether the one or more statistical values associated with the energy level of the respective channel have been within a predefined range from a corresponding one or more statistical values associated with the energy level of the strongest channel for a period of time. (View Matheja ¶0038-0040).

Regarding Claim 7: 
Matheja further teaches the method of claim 1, wherein the plurality of frequency ranges is limited to a predefined set of frequencies. (View Matheja ¶0006, 0038, 0047, 0074 and 0092).

Regarding Claim 11: 
Matheja further discloses a system comprising:
a memory; Matheja Fig. 12 discloses volatile memory 1204 and non-volatile memory 1206 (e.g., hard disk). Processing may be implemented in computer programs executed on programmable computers/machines that each includes a processor, a storage medium or other article of manufacture that is readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. (Matheja ¶0121-0122)
a processing device communicably coupled to the memory, the processing device to: Matheja Fig. 12 discloses a processor 1202, volatile memory 1204, and non-volatile memory 1206 (e.g., hard disk).  Processing may be implemented in computer programs executed on programmable computers/machines that each includes a processor, a storage medium or other article of manufacture that is readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. (Matheja ¶0121-0122)
     The remaining limitations of Claim 11 contain similar limitations as Claim 1 and are therefore rejected for the same aforementioned reasons. 

Regarding Claim 12: 
     Claim 12 contains similar limitations as Claim 2 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 13: 
     Claim 13 contains similar limitations as Claim 3 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 14: 
     Claim 14 contains similar limitations as Claim 4 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 17: 
     Claim 17 contains similar limitations as Claim 7 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 21: 
Matheja discloses a system comprising:
a non-transitory machine-readable storage medium comprising instructions that cause a processing device to: Matheja Fig. 12 discloses that the computer instructions are executed by the processor out of volatile memory. Module 1220 comprises non-transitory computer-readable instructions. (Matheja ¶0007 and 0121)
    The remaining limitations of Claim 21 contain similar limitations as Claim 1 and are therefore rejected for the same aforementioned reasons. 

Regarding Claim 22: 
     Claim 22 contains similar limitations as Claim 2 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 23: 
     Claim 23 contains similar limitations as Claim 3 and is therefore rejected for the same aforementioned reasons.

Claims 5-6, 8, 15-16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Matheja et al. (US 2016/0261951 A1) in view of Mitchell et al. (US 10,878,840 B1) and further in view of Wu et al. (US 11,164,592 B1).

Regarding Claim 5: 
Matheja, in combination with Mitchell, teaches the method of claim 1, comprising:
based on the speech audio energy level and the noise energy level, updating a state of a state machine that includes a speech state, a noise state and an uncertain state. Matheja, in combination with Mitchell, teaches using sound recognition to calculate scores for sound classes related to speech, non-speech/non-verbal (noise), and uncertain events/scenes (states) based on the energy level of the audio data. (Mitchell col 3:65, 18:37-39, 7:61-63, 10:27-30, 11:13-16 and 6:56-65)
Matheja, in combination with Mitchell, does not explicitly disclose:
based on the speech audio energy level and the noise energy level, updating a state of a state machine that includes a silence state.

However, in an analogous art, Wu discloses:
based on the speech audio energy level and the noise energy level, updating a state of a state machine that includes a silence state. Wu teaches updating the state of the Voice Activity Detection (VAD) based on whether the energy levels of the audio data/frames are determined to be speech, silence, noise, and/or non-speech. (Wu col 7:20-29, 3:53-55, 4:24-26).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Matheja to incorporate the teachings of Wu to add the additional state of silence based on lack of speech. Matheja, in combination with Mitchell, teaches using sound recognition to calculate scores for sound classes related to speech, non-speech/non-verbal (noise), and uncertain events/scenes (states) based on the energy level of the audio data. (Mitchell col 3:65, 18:37-39, 7:61-63, 10:27-30, 11:13-16 and 6:56-65). Wu teaches updating the state of the Voice Activity Detection (VAD) based on whether the energy levels of the audio data/frames are determined to be speech, silence, noise, and/or non-speech. (Wu col 7:20-29, 3:53-55, 4:24-26). Matheja can be modified by the teachings of Wu to properly identify the VAD state of silence within the audio frames processed. The motivation for doing so would be to improve the various states of the current Voice Activity Detection (VAD) abilities by identifying the gaps of silence found

Regarding Claim 6: 
Matheja, in combination with Mitchell and Wu, further teaches the method of claim 5, further comprising: updating the gain value for the respective channel, wherein updating the gain value for the respective channel further comprises: determining whether the state of the state machine is speech state for a threshold amount of time; responsive to determining that the state of the state machine is speech state for the threshold amount of time, updating the gain value by no more than a first number of decibels per second; determining whether the state of the state machine is uncertain state for the threshold amount of time; and responsive to determining that the state of the state machine is uncertain state for the threshold amount of time, updating the gain value by no more than a second number of decibels per second. (View Wu Fig. 11E, col 2:43-57, 4:14-19, 7:20-29, 13:6-47, 17:61-67, 3:37-51 and Mitchell col 3:65, 18:37-39, 7:61-63, 10:27-30, 11:13-16 and 6:56-65)
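For illustration only, the state-dependent, rate-limited gain update recited in Claim 6 can be sketched as follows. This is a minimal sketch and is not taken from Wu, Mitchell, or the Specification: the `State` enum, the rates of 3.0 and 1.0 decibels per second, the `hold_seconds` threshold, and the function name `step_gain` are hypothetical values and names chosen for the example.

```python
from enum import Enum


class State(Enum):
    SPEECH = "speech"
    NOISE = "noise"
    UNCERTAIN = "uncertain"
    SILENCE = "silence"


# Hypothetical limits: a first rate for speech, a second rate for uncertain.
MAX_DB_PER_SECOND = {State.SPEECH: 3.0, State.UNCERTAIN: 1.0}


def step_gain(
    current_db: float,
    target_db: float,
    state: State,
    seconds_in_state: float,
    dt: float,
    hold_seconds: float = 0.5,  # hypothetical threshold amount of time
) -> float:
    """Move the gain toward the target, limited by the rate for the state."""
    rate = MAX_DB_PER_SECOND.get(state)
    # No update in noise/silence, or before the state has persisted long enough.
    if rate is None or seconds_in_state < hold_seconds:
        return current_db
    max_step = rate * dt
    delta = max(-max_step, min(max_step, target_db - current_db))
    return current_db + delta
```

With these assumed values, a 100 ms update in the speech state moves the gain by at most 0.3 dB, and the same update in the uncertain state moves it by at most 0.1 dB.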

Regarding Claim 8: 
Matheja, in combination with Mitchell and Wu, further teaches the method of claim 6, wherein updating the gain value comprises: ensuring that the updated gain value does not exceed a gain value threshold. (View Wu col 20:52-64, 16:39-44, 16:52-55).

Regarding Claim 15: 
     Claim 15 contains similar limitations as Claim 5 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 16: 
     Claim 16 contains similar limitations as Claim 6 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 18: 
     Claim 18 contains similar limitations as Claim 8 and is therefore rejected for the same aforementioned reasons.

Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Matheja et al. (US 2016/0261951 A1), in view of Mitchell et al. (US 10,878,840 B1) and further in view of Alvarez et al. (US 2016/0099007 A1).

Regarding Claim 9: 
Matheja teaches the method of claim 1, comprising:

Matheja does not explicitly disclose:
receiving speech audio segments and noise segment;
determining a noise energy level of each noise segment and a speech energy level of each speech audio segment;
generating noisy speech audio segments by combining each noise segment and each speech audio segment;
and training, using machine learning, the model using the noise energy level of each noise segment, a speech audio energy level of each speech audio segment, and the noisy speech audio segments.

However, in an analogous art, Alvarez discloses:
receiving speech audio segments and noise segment; Alvarez teaches receiving a stream of audio data that is segmented into a plurality of segments that do and do not include speech (Alvarez ¶0005 and 0006).
determining a noise energy level of each noise segment and a speech energy level of each speech audio segment; Alvarez teaches determining, via a plurality of audio segments that includes speech or noise only audio segments, the intensity levels (energy levels transfer rate) of each audio segment by observing the peak signal levels (Alvarez ¶0011 and 0012).
generating noisy speech audio segments by combining each noise segment and each speech audio segment; Alvarez teaches generating noisy audio data that is comprised of speech utterances, background speech (noisy speech) and other forms of noise (e.g. music and car noises) (Alvarez ¶0066 and 0067).
and training, using machine learning, the model using the noise energy level of each noise segment, a speech audio energy level of each speech audio segment, and the noisy speech audio segments; Alvarez teaches training noisy data that is comprised of speech utterances, background speech (noisy speech) and other forms of noise (e.g. music and car noises) (Alvarez ¶0066 and 0067).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Matheja, in combination with Mitchell, to incorporate the teachings of Alvarez to receive, process and train audio segments. Matheja discloses receiving sound information (audio data) channels from a plurality of microphone signals, which correspond to frequencies in the frequency sub-band domain. (Matheja ¶0005, 0028, and 0060). Alvarez

Regarding Claim 10: 
Matheja, in combination with Mitchell and Alvarez, teaches the method of claim 9, wherein combining each noise segment and each speech audio segment comprises overlapping each noise segment and each audio segment in a time domain and summing each noise segment and each audio segment. (View Alvarez ¶0005-0006 and 0066-0067, Mitchell col 2:28-32 and 17:14-19).
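For illustration only, the time-domain overlapping and summing recited in Claim 10, together with the per-segment energy level of Claim 9, can be sketched as follows. This is a minimal sketch and is not taken from Alvarez, Mitchell, or the Specification: the function names, the mean-square definition of energy, and the choice to loop a shorter noise segment are assumptions chosen for the example.

```python
from typing import List


def segment_energy(samples: List[float]) -> float:
    """Mean-square energy of a segment (an assumed definition of energy level)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def mix_segments(speech: List[float], noise: List[float]) -> List[float]:
    """Overlap the noise onto the speech from sample 0 and sum sample by sample.

    Assumption: a noise segment shorter than the speech segment is looped.
    """
    if not noise:
        return list(speech)
    return [s + noise[i % len(noise)] for i, s in enumerate(speech)]
```

Each noisy segment produced this way can be paired with the energy levels of its source speech and noise segments to form one training example for the model.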

Regarding Claim 19: 
     Claim 19 contains similar limitations as Claim 9 and is therefore rejected for the same aforementioned reasons.

Regarding Claim 20: 
     Claim 20 contains similar limitations as Claim 10 and is therefore rejected for the same aforementioned reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Dickins et al. (US 10,511,718 B2) discloses, in a teleconferencing setting, receiving audio signal data from a plurality of uplink data streams (channels), with corresponding frequencies, to recognize speech and noise signals (including the strongest signal) based on energy levels to provide updates to the output gain of each signal.
Zhou et al. (US 10,728,656 B1) teaches receiving audio data from a plurality of input channels and signals to identify what is voice data versus background sound by conducting Automatic Gain Control (AGC) techniques and further training the data with an acoustic network model/artificial neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DERRICK SCOTT JEFFERIES whose telephone number is (571)272-0923. The examiner can normally be reached 8:30a-5:00p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like 





/DERRICK SCOTT JEFFERIES/Examiner, Art Unit 2658                                                                                                                                                                                                        

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658