Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Mandy J. Song, Ph.D (Reg. No. 69,583) on 11/10/2021.
The claims have been amended as follows:
Claim 1: (Proposed Amendments) A system for audio signal processing, comprising: 
a communication interface configured to receive a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel; and 
at least one processor coupled to the communication interface and configured to: 
determine channel features based on the first audio signal and the second audio signal individually; 
determine a cross-channel feature based on the first audio signal and the second audio signal collectively; 
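concatenate the channel features and the cross-channel feature; 
estimate spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature; 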
wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and perform beamforming using the speech CPSD matrix.
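For illustration only, the STFT and speech CPSD limitations recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `stft` and `masked_cpsd`, the Hann window, the frame length and hop, the way the two per-channel masks are combined (here, their average), and the normalization by the summed mask are hypothetical choices that the claim does not specify. The all-ones masks in the usage lines are placeholders, not estimated masks.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Short Time Fourier Transform of a 1-D signal; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def masked_cpsd(Y, masks, eps=1e-8):
    """Mask-weighted CPSD matrices.

    Y:     complex STFTs, shape (channels, frames, bins)
    masks: real masks in [0, 1], same shape as Y (one mask per channel)
    Returns an array of shape (bins, channels, channels).
    """
    # Assumption: combine the per-channel masks by averaging across channels.
    m = masks.mean(axis=0)                      # (frames, bins)
    # For each bin f, sum over frames t of m[t, f] * y y^H, where y is the
    # vector of channel values Y[:, t, f].
    phi = np.einsum("tf,ctf,dtf->fcd", m, Y, Y.conj())
    return phi / (m.sum(axis=0)[:, None, None] + eps)


# Usage with synthetic signals and placeholder masks.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(2048)
x2 = 0.5 * x1 + 0.1 * rng.standard_normal(2048)
Y = np.stack([stft(x1), stft(x2)])              # (2, frames, bins)
speech_masks = np.ones(Y.shape)                 # placeholder, not estimated
phi_ss = masked_cpsd(Y, speech_masks)           # (bins, 2, 2)
```

Each resulting matrix is Hermitian by construction, which is the property a downstream beamformer typically relies on.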
Claim 8: (Proposed Amendments) The system of claim 1, wherein the spectral-spatial masks for the first channel further include further include 
Claim 9: (Proposed Amendments) The system of claim 8, wherein 


calculate a noise CPSD matrix based on the first noise mask, the second noise mask, and the time-frequency representations; and 
perform the beamforming using the speech CPSD matrix and the noise CPSD matrix.
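The claim does not say which beamformer is used. As one hedged illustration, a minimum variance distortionless response (MVDR) beamformer can be built from a speech CPSD matrix and a noise CPSD matrix per frequency bin. The function names `mvdr_weights` and `apply_beamformer`, the reference-channel formulation, and the diagonal loading term are assumptions introduced for this sketch.

```python
import numpy as np


def mvdr_weights(phi_ss, phi_nn, ref=0, loading=1e-6):
    """MVDR weights per frequency bin (reference-channel formulation).

    phi_ss, phi_nn: speech / noise CPSD matrices, shape (bins, C, C)
    Returns weights of shape (bins, C).
    """
    C = phi_nn.shape[-1]
    # Assumption: diagonal loading keeps the noise CPSD invertible.
    phi_nn = phi_nn + loading * np.eye(C)
    # Solve phi_nn @ X = phi_ss per bin, i.e. X = phi_nn^{-1} phi_ss.
    num = np.linalg.solve(phi_nn, phi_ss)          # (bins, C, C)
    trace = np.trace(num, axis1=-2, axis2=-1)      # (bins,)
    # Select the column of the reference channel and normalize by the trace.
    return num[..., ref] / (trace[:, None] + 1e-12)


def apply_beamformer(w, Y):
    """Apply weights w (bins, C) to STFTs Y (C, frames, bins): w^H y."""
    return np.einsum("fc,ctf->tf", w.conj(), Y)
```

With a rank-one speech CPSD matrix and white noise, these weights pass the speech component at the reference channel without distortion, which is the defining MVDR constraint.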
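Claim 12: (Proposed Amendments) A method for audio signal processing, comprising: 
receiving a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel;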
determining channel features based on the first audio signal and the second audio signal individually; 
determining a cross-channel feature based on the first audio signal and the second audio signal collectively;
concatenating the channel features and the cross-channel feature;
estimating spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature; 
wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and performing beamforming using the speech CPSD matrix. 
Claim 17: (Proposed Amendments) The system of claim 12, wherein the spectral-spatial masks for the first channel further include further include 
Claim 18: (Proposed Amendments) The system of claim 17, wherein 


calculate a noise CPSD matrix based on the first noise mask, the second noise mask, and the time-frequency representations; and perform the beamforming using the speech CPSD matrix and the noise CPSD matrix.
Claim 20: (Proposed Amendments) A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for audio signal processing, the method comprising: 
receiving a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel; 
determining channel features based on the first audio signal and the second audio signal individually; 
determining a cross-channel feature based on the first audio signal and the second audio signal collectively;
concatenating the channel features and the cross-channel feature;
estimating spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature;
wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations;
and performing beamforming using the speech CPSD matrix.
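The claims do not specify which channel features or cross-channel feature are used. Purely as an illustration, the sketch below assumes log-magnitude spectra as the per-channel features and the cosine of the inter-channel phase difference as the cross-channel feature, and concatenates them into one input for a mask estimator. The function name `build_mask_estimator_input` and both feature choices are hypothetical; the mask estimator itself is not shown.

```python
import numpy as np


def build_mask_estimator_input(Y1, Y2, eps=1e-8):
    """Concatenate per-channel and cross-channel features.

    Y1, Y2: complex STFTs of the two channels, shape (frames, bins)
    Returns a real array of shape (frames, 3 * bins).
    """
    # Channel features: computed from each signal individually (assumed here
    # to be log-magnitude spectra).
    feat1 = np.log(np.abs(Y1) + eps)
    feat2 = np.log(np.abs(Y2) + eps)
    # Cross-channel feature: computed from both signals together (assumed here
    # to be the cosine of the inter-channel phase difference).
    cross = np.cos(np.angle(Y1 * Y2.conj()))
    return np.concatenate([feat1, feat2, cross], axis=-1)
```

The concatenated array keeps one row per time frame, so a frame-wise estimator can consume it directly.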

Allowable Subject Matter
Examiner’s reason for Allowance
Claims 1-20 are allowed.
Claim 1, A system for audio signal processing, comprising: 
a communication interface configured to receive a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel; and 
at least one processor coupled to the communication interface and configured to: 
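determine channel features based on the first audio signal and the second audio signal individually; 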
determine a cross-channel feature based on the first audio signal and the second audio signal collectively; 
concatenate the channel features and the cross-channel feature; 
estimate spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature; wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and perform beamforming using the speech CPSD matrix.
The following is an examiner's statement of reasons for allowance: Regarding claim 1, the prior art of record, specifically Imai et al. (US Patent Application Publication US 2013/0022261), teaches that a spatial mask generator generates a spectral spatial mask or an image sensor with tunable spectral sensitivities. The spectral spatial mask indicates the spectral sensitivities that the image sensor is to be configured to detect. Also, multi-aperture/multi-lens optics are used to direct light from a scene to the image sensor. The image sensor detects the light (which may include light field information) according to the spectral spatial mask and generates a captured image. (Paragraph 0056). Goldstein et al. (US 2007/0296969) teaches that spectral encoding can be achieved by transmission, rejection, or intensity modulation of particular wavelengths to generate a dynamic series of spatial masks 111 that implement a variety of transform functions. Spectral transform functions include traditional single-slit wavelength scanning, simple multiple-slit filtering, multiplexed transform spectroscopy using a sequence of orthogonal masks, and application of a spectral template mask that matches known distinctive spectral features of the target. (Paragraph 0043).
However, none of the prior art cited alone or in combination provides the motivation to teach wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and perform beamforming using the speech CPSD matrix.
Claim 12, A method for audio signal processing, comprising: receiving a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel;
determining channel features based on the first audio signal and the second audio signal individually; 
determining a cross-channel feature based on the first audio signal and the second audio signal collectively;
concatenating the channel features and the cross-channel feature;
estimating spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature; 
wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and performing beamforming using the speech CPSD matrix. 
The following is an examiner's statement of reasons for allowance: Regarding claim 12, the prior art of record, specifically Imai et al. (US Patent Application Publication US 2013/0022261), teaches that a spatial mask generator generates a spectral spatial mask or an image sensor with tunable spectral sensitivities. The spectral spatial mask indicates the spectral sensitivities that the image sensor is to be configured to detect. Also, multi-aperture/multi-lens optics are used to direct light from a scene to the image sensor. The image sensor detects the light (which may include light field information) according to the spectral spatial mask and generates a captured image. (Paragraph 0056). Goldstein et al. (US 2007/0296969) teaches that spectral encoding can be achieved by transmission, rejection, or intensity modulation of particular wavelengths to generate a dynamic series of spatial masks that implement a variety of transform functions. Spectral transform functions include traditional single-slit wavelength scanning, simple multiple-slit filtering, multiplexed transform spectroscopy using a sequence of orthogonal masks, and application of a spectral template mask that matches known distinctive spectral features of the target. (Paragraph 0043).
However, none of the prior art cited alone or in combination provides the motivation to teach wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and perform beamforming using the speech CPSD matrix.
Claim 20, A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for audio signal processing, the method comprising: 
receiving a first audio signal acquired from an audio source through a first channel, and a second audio signal acquired from the same audio source through a second channel; 
determining channel features based on the first audio signal and the second audio signal individually; 
determining a cross-channel feature based on the first audio signal and the second audio signal collectively;
concatenating the channel features and the cross-channel feature;
estimating spectral-spatial masks for the first channel and the second channel using the concatenated channel features and the cross-channel feature;
wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations;
and performing beamforming using the speech CPSD matrix. 
The following is an examiner's statement of reasons for allowance: Regarding claim 20, the prior art of record, specifically Imai et al. (US Patent Application Publication US 2013/0022261), teaches that a spatial mask generator generates a spectral spatial mask or an image sensor with tunable spectral sensitivities. The spectral spatial mask indicates the spectral sensitivities that the image sensor is to be configured to detect. Also, multi-aperture/multi-lens optics are used to direct light from a scene to the image sensor. The image sensor detects the light (which may include light field information) according to the spectral spatial mask and generates a captured image. (Paragraph 0056). Goldstein et al. (US 2007/0296969) teaches that spectral encoding can be achieved by transmission, rejection, or intensity modulation of particular wavelengths to generate a dynamic series of spatial masks 111 that implement a variety of transform functions. Spectral transform functions include traditional single-slit wavelength scanning, simple multiple-slit filtering, multiplexed transform spectroscopy using a sequence of orthogonal masks, and application of a spectral template mask that matches known distinctive spectral features of the target. (Paragraph 0043).
However, none of the prior art cited alone or in combination provides the motivation to teach wherein the spectral-spatial masks for the first channel include a first speech mask for the first audio signal, and the spectral-spatial masks for the second channel include a second speech mask for the second audio signal;
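determine time-frequency representations by performing a Short Time Fourier Transform (STFT) to the first audio signal and the second audio signal; 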
calculate a speech Cross Power Spectral Density (CPSD) matrix based on the first speech mask, the second speech mask, and the time-frequency representations; and perform beamforming using the speech CPSD matrix.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON, can be reached at 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For 





/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
12/30/2021