DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 03/31/2021 have been fully considered but they are not persuasive. Regarding the arguments on page 8 of the Remarks, Examiner notes that separating speech from noise is a separation of two audio source signals, because noise is considered an audio source. Regarding the arguments in the second paragraph, Examiner notes that Huang teaches, on page 2139 col. 1, that the masks are used in the model itself to aid in speech separation: “a standard approach is to apply the time-frequency masks … to the magnitude spectra … of the mixture signals, and obtain the estimated separation spectra …”. The masks are further calculated in the extra layer of the neural network model, as described at the end of col. 1 of page 2139.
Regarding the arguments on pages 8-9 of the Remarks, Examiner notes that Huang teaches determining the difference between signals, and that Zaharis is relied upon to teach that the signals are spatially filtered. Applicant argues that Zaharis does not appear to teach what it is stated as teaching. However, Examiner notes that para [0049] of Applicant’s specification teaches that beamforming is spatial filtering, and thus the beamforming of Zaharis is interpreted as spatial filtering. While Zaharis does not teach labels, Zaharis does teach the error calculation for evaluation of the neural networks, which is applicable to the teachings of Huang regarding the labels.
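For illustration only, the time-frequency masking step quoted from Huang above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Huang or from the application: the function names (soft_masks, apply_mask), the small constant eps, and the sample values are all hypothetical, and the soft-mask form (each source's estimated magnitude divided by the sum of both) is assumed here as one common choice of mask.

```python
def soft_masks(est_mag_1, est_mag_2, eps=1e-12):
    """Return one soft mask per source for a single frame of magnitude spectra.

    Each mask value is the share of the estimated energy attributed to that
    source in a frequency bin; eps (an assumption) avoids division by zero.
    """
    mask_1, mask_2 = [], []
    for a, b in zip(est_mag_1, est_mag_2):
        m1 = abs(a) / (abs(a) + abs(b) + eps)
        mask_1.append(m1)
        mask_2.append(1.0 - m1)  # the two masks sum to one in every bin
    return mask_1, mask_2


def apply_mask(mask, mixture_mag):
    """Apply a time-frequency mask to the mixture magnitude spectrum."""
    return [m * z for m, z in zip(mask, mixture_mag)]


# Hypothetical one-frame example with three frequency bins.
mixture = [1.0, 2.0, 4.0]
est_speech = [0.75, 0.5, 1.0]
est_noise = [0.25, 1.5, 3.0]

m_speech, m_noise = soft_masks(est_speech, est_noise)
speech = apply_mask(m_speech, mixture)
noise = apply_mask(m_noise, mixture)
```

Because the two masks sum to one in every bin, the two separated spectra in this sketch add back up to the mixture spectrum.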

Claim Objections
Claim 15 is objected to because of the following informalities: lines 8-9 read “the mixed signal the monaural signal” which should read “the monaural signal.” Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-8 and 15-21 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2015). Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12), 2136-2147.), hereinafter referred to as Huang, in view of Zaharis et al. (Zaharis, Z. D., Skeberis, C., Xenos, T. D., Lazaridis, P. I., & Cosmas, J. (2013). Design of a novel antenna array beamformer using neural networks trained by modified adaptive dispersion invasive weed optimization based data. IEEE Transactions on Broadcasting, 59(3), 455-460.), hereinafter referred to as Zaharis.

Regarding claim 2, Huang teaches:

generating, by the model, a mask for each audio source (page 2139 col. 1, where the masks are incorporated in the neural network model);
estimating, using the masks for each audio source, two or more output layers from the mixed signal, the two or more output layers being estimates of audio source signals in the mixed signal (page 2138 Sec. B first paragraph, where output predictions of a mixed input to a neural network model are determined, and page 2139 col. 1, where the extra layer incorporates the masks); 
identifying two or more labels corresponding to the output layers (page 2138 Sec. B first paragraph, where output predictions of a mixed input to a neural network model are determined, where the name of each prediction is a label, and where there is a prediction for each output layer);
determining assignment error scores for possible assignments of the two or more labels to the two or more output layers, wherein determining the assignment error scores comprises determining a difference between the two or more of spatially filtered signals and labels of the two or more labels that are assigned to respective output layers for a possible assignment of the labels to the respective output layers (page 2138 Sec. B first paragraph, where output predictions correspond to different sources, and page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the parameters, for the predictions corresponding to the assignment order of output labels to sources); and 
adjusting parameters of the model based on the possible assignments and the assignment error scores (page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the neural network parameters, and page 2141 col. 1, where 80% of the signals are used for training, showing iterative optimization).  
Huang does not teach:
obtaining two or more spatially filtered signals;
Zaharis teaches:
obtaining two or more spatially filtered signals (page 457 col. 1 section V, where beamforming is performed to determine the weight vectors, which are interpreted as the spatially filtered signals);
determining assignment error scores for possible assignments of the two or more labels to the two or more output layers, wherein determining the assignment error scores comprises determining a difference between the two or more of spatially filtered signals and labels of the two or more labels that are assigned to respective output layers for a possible assignment of the labels to the respective output layers (page 457 col. 2 last paragraph, where mean squared error is used to evaluate the neural network training, where the neural network is trained using the MADIWO-based vector pairs of the beamformer, and where differences between the neural network and the beamformers are evaluated, as noted in page 458 col. 1 last paragraph and col. 2 first paragraph);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Huang by training the neural network of Huang (Huang page 2138 Sec. B first paragraph) using the beamformer of Zaharis (Zaharis page 456 Sec. 2) by applying a training process to the neural network using information from the beamformer, in order to achieve the efficiency of the beamforming algorithm, but with instant response instead of an iterative structure of the beamforming algorithm (Zaharis page 455 col. 1 last full paragraph).
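For illustration only, the claim 2 limitation of determining assignment error scores for possible assignments of labels to output layers can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Huang, Zaharis, or the application: the function names, the use of mean squared error as the difference measure, and the sample values are hypothetical.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared difference between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def assignment_error_scores(outputs, references):
    """Score every possible assignment of reference signals to outputs.

    An assignment is a tuple p where output i is compared against
    references[p[i]]; its score is the summed per-output error.
    """
    scores = {}
    for perm in permutations(range(len(references))):
        scores[perm] = sum(
            mse(outputs[i], references[j]) for i, j in enumerate(perm)
        )
    return scores


# Hypothetical example: two output layers and two reference signals.
outputs = [[0.9, 0.1, 0.0], [0.0, 1.1, 2.0]]
references = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.0]]

scores = assignment_error_scores(outputs, references)
best = min(scores, key=scores.get)  # assignment with the lowest error score
```

In this hypothetical example the lowest-scoring assignment pairs the first output with the second reference and the second output with the first reference; a training step would then adjust the model parameters to reduce that score.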

Regarding claim 3, Huang in view of Zaharis teaches:
The method of claim 2, wherein the two or more spatially separated signals are true versions of the audio source signals in the mixed signal (Huang page 2138 Sec. B first paragraph, and page 2139 Sec. C first paragraph, eq. 7, where the targets are the original source signals).  

Regarding claim 4, Huang in view of Zaharis teaches:
The method of claim 2, wherein adjusting the parameters reduces the error scores for the possible assignment of the labels to the respective output layers (Huang page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the parameters).

Regarding claim 5, Huang in view of Zaharis teaches:
The method of claim 2, further comprising determining the possible assignment of the labels to the respective output layers based on pairwise differences between the output layers and the two or more spatially separated signals (Huang page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the parameters, for the predictions corresponding to the assignment order of output labels to sources).  

Regarding claim 6, Huang in view of Zaharis teaches:
The method of claim 2, wherein the possible assignment of the labels to the respective output layers assigns an individual label of the two or more labels to an individual output layer of the two or more output layers to attribute the individual output layer to a source of an individual audio source signal of the audio source signals (Huang page 2138 Sec. B first paragraph, and page 2139 Sec. C, where both P21 and P12 are calculated, relating or assigning the sources to the spectra).  

Regarding claim 7, Huang in view of Zaharis teaches:
The method of claim 2, further comprising generating the two or more output layers using two or more frames of the mixed signal or two or more frames of a feature signal of the mixed signal (Huang Fig. 1 caption, Fig. 2, where the t frames of the input mixed signal are separated into the output layers).  

Regarding claim 8, Huang in view of Zaharis teaches:
The method of claim 2, further comprising selecting the possible assignment of the labels to the respective output layers based on a window sample of the mixed signal (Huang Fig. 2, where the window sample is the length of time t of the input x) by: 
determining two or more possible assignments of the labels to the respective output layers in the window (Huang page 2139 Sec. C, where both P12 and P21, relating or assigning the sources to the spectra, are calculated); and 
selecting the possible assignment from the two or more possible assignments based on differences between each assignment of the two or more possible assignments and the two or more spatially separated signals (Huang page 2139 Sec. C, where both P12 and P21, relating or assigning the sources to the spectra, are calculated, and where error is minimized to optimize the parameters).  

Regarding claim 15, Huang teaches:
A system for separating two or more audio source signals from a first monaural signal having audio source signals and noise source signals (page 2144, where a mixed signal includes speech and noise sources), the system comprising: 
one or more processors (page 2145 2nd column, where neural networks have computational costs, teaching use of a computer); and 
a memory having stored thereon computer-executable instructions that, when executed by the one or more processors, configure the processors (page 2145 2nd column, where neural networks have memory costs, teaching use of memory) to: 
generate, by the model, a mask for each audio source (page 2139 col. 1, where the masks are incorporated in the neural network model);

identify two or more labels corresponding to the output layers (page 2138 Sec. B first paragraph, where output predictions of a mixed input to a neural network model are determined, where the name of each prediction is a label, and where there is a prediction for each output layer);
determine assignment error scores for possible assignments of the two or more labels to the two or more output layers by determining a difference between the two or more of spatially filtered signals and labels of the two or more labels that are assigned to respective output layers for a possible assignment of the labels to the respective output layers (page 2138 Sec. B first paragraph, where output predictions correspond to different sources, and page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the parameters, for the predictions corresponding to the assignment order of output labels to sources); and 
adjust parameters of the model based on the possible assignments and the assignment error scores (page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the neural network parameters, and page 2141 col. 1, where 80% of the signals are used for training, showing iterative optimization).  
Huang does not teach:
obtain two or more spatially filtered signals;
Zaharis teaches:
obtain two or more spatially filtered signals (page 457 col. 1 section V, where beamforming is performed to determine the weight vectors, which are interpreted as the spatially filtered signals);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Huang by training the neural network of Huang (Huang page 2138 Sec. B first paragraph) using the beamformer of Zaharis (Zaharis page 456 Sec. 2) by applying a training process to the neural network using information from the beamformer, in order to achieve the efficiency of the beamforming algorithm, but with instant response instead of an iterative structure of the beamforming algorithm (Zaharis page 455 col. 1 last full paragraph).

Regarding claim 16, Huang in view of Zaharis teaches:
The system of claim 15, wherein the two or more spatially separated signals are true versions of the audio source signals in the mixed signal (Huang page 2138 Sec. B first paragraph, and page 2139 Sec. C first paragraph, eq. 7, where the targets are the original source signals).  

Regarding claim 17, Huang in view of Zaharis teaches:
The system of claim 15, wherein the computer-executable instructions, when executed by the one or more processors, further configure the processors to adjust the parameters by reducing the error 

Regarding claim 18, Huang in view of Zaharis teaches:
The system of claim 15, wherein the computer-executable instructions, when executed by the one or more processors, further configure the processors to determine the possible assignment of the labels to the respective output layers based on pairwise differences between the output layers and the two or more spatially separated signals (Huang page 2139 Sec. C first paragraph, eq. 7, where the error is minimized to optimize the parameters, for the predictions corresponding to the assignment order of output labels to sources).  

Regarding claim 19, Huang in view of Zaharis teaches:
The system of claim 15, wherein the possible assignment of the labels to the respective output layers assigns an individual label of the two or more labels to an individual output layer of the two or more output layers to attribute the individual output layer to a source of an individual audio source signal of the audio source signals (Huang page 2138 Sec. B first paragraph, and page 2139 Sec. C, where both P21 and P12 are calculated, relating or assigning the sources to the spectra).  

Regarding claim 20, Huang in view of Zaharis teaches:
The system of claim 15, wherein the computer-executable instructions, when executed by the one or more processors, further configure the processors to generate the two or more output layers using two or more frames of the mixed signal or two or more frames of a feature signal of the mixed signal (Huang Fig. 1 caption, Fig. 2, where the t frames of the input mixed signal are separated into the output layers).  

Regarding claim 21, Huang in view of Zaharis teaches:
The system of claim 15, wherein the computer-executable instructions, when executed by the one or more processors, further configure the processors to: 
select the possible assignment of the labels to the respective output layers based on a window sample of the mixed signal (Huang Fig. 2, where the window sample is the length of time t of the input x) by: 
determining two or more possible assignments of the labels to the respective output layers in the window (Huang page 2139 Sec. C, where both P12 and P21, relating or assigning the sources to the spectra, are calculated); and 
selecting the possible assignment from the two or more possible assignments based on differences between each assignment of the two or more possible assignments and the two or more spatially separated signals (Huang page 2139 Sec. C, where both P12 and P21, relating or assigning the sources to the spectra, are calculated, and where error is minimized to optimize the parameters).

Claims 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Huang, in view of Zaharis, and further in view of Chan et al. (US 8,898,056 B2), hereinafter referred to as Chan.

Regarding claim 9, Huang in view of Zaharis teaches:
The method of claim 8,
Huang in view of Zaharis does not teach:
wherein the window sample comprises one or more frames of the mixed signal and the one or more frames overlaps at least one frame of a previous window sample.
Chan teaches:

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Huang in view of Zaharis by using the frames/windows and reordering of Chan (Chan col. 15 line 7 - col. 16 line 52) in the speech separation system of Huang in view of Zaharis (Huang page 2139 Sec. C) by performing the training on windows, and performing reordering of assignments as needed, in order to improve the coherence of the signals (Chan col. 1 lines 50-54).
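For illustration only, the claim 9 limitation of a window sample whose frames overlap at least one frame of a previous window sample can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Chan, Huang, Zaharis, or the application: the function name, the window length, the hop size, and the use of integers as stand-ins for frames are hypothetical.

```python
def overlapping_windows(frames, window_len, hop):
    """Split a frame sequence into windows of window_len frames.

    Consecutive windows start hop frames apart, so they share
    window_len - hop frames whenever hop < window_len.
    """
    if not 0 < hop <= window_len:
        raise ValueError("hop must be between 1 and window_len")
    last_start = len(frames) - window_len
    return [frames[s:s + window_len] for s in range(0, last_start + 1, hop)]


# Hypothetical example: eight frames, windows of four frames, hop of two.
frames = list(range(8))
windows = overlapping_windows(frames, window_len=4, hop=2)
# windows == [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

With a hop of two and a window length of four, each window in this sketch shares two frames with the previous window.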

Regarding claim 14, Huang in view of Zaharis teaches:
The method of claim 2, further comprising: 
jointly optimizing the model based at least in part on the spatially filtered audio signal sources (Huang page 2139 Sec. C first paragraph, where the parameters are jointly optimized based on the source signals).  
Huang in view of Zaharis does not teach:
spatially filtering, by a microphone array, the mixed signal to obtain the audio signal sources and to identify the signal-creating audio sources; and 
Chan teaches:
spatially filtering, by a microphone array, the mixed signal to obtain the audio signal sources and to identify the signal-creating audio sources (Fig. 6, 9, col. 12 lines 53-61, where a microphone array performs spatial filtering, and col. 15 lines 30-50, where the signal sources are identified); and 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Huang in view of Zaharis by using the frames/windows  .

Allowable Subject Matter
Claims 10-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: None of Huang, Zaharis, or Chan teaches the limitations in claims 10-13. Specifically, none of the cited prior art teaches tracing a source signal attributable to a source signal through multiple frames of the mixed signal based on the limitations in the claims, in conjunction with the limitations of the claims depended upon. Hence, none of the cited prior art, either alone or in combination, teaches the limitations in the claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 9,576,583 B1 col. 3 lines 4-32, where a mask is used in the model as part of audio signal separation; US 8,712,069 B1 col. 9 lines 36-51, where the mask generator determines masks based on the models.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685.  The examiner can normally be reached on 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658