DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed December 20, 2021.  Claims 1, 4, 8-9, 11, 15-16, and 18-20 have been amended.  Claims 3, 10, and 17 are cancelled.  Claims 1-2, 4-9, 11-16, and 18-20 are pending.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 6, 8-9, 13, 15-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kjolerbakken (US Patent Application Publication No. 2015/0039314) in view of Z. Chen, T. Yoshioka, X. Xiao, L. Li, M. L. Seltzer and Y. Gong, ("Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-.
Regarding claims 1, 8, and 15, Kjolerbakken teaches [para 0013; 0069] a method, computer system and non-transitory medium for automatic speech recognition [para 0012] comprising and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code, said computer program code including procedures for [para 0069]: receiving video data and audio data corresponding to one or more speakers [para 0034-0036 -- providing a microphone array directed to the face of a person speaking; 0047-0050 -- coordinates of where the sounds are expelled can be combined with information from a camera and/or other sources, and the known positions of the nose and mouth, and the sounds can be mapped to determine from where the sounds are expelled]; applying a minimum variance distortionless response function to the received audio and video data [para 0038-0039 -- MVDR]; and generating a predicted target waveform corresponding to a target speaker from among the one or more speakers based on the output of the applied minimum variance distortionless response function [para 0045-0056 -- output coordinates, from the DOA, of where sounds are expelled can be combined with information of the position of the nose and mouth, and the sounds can be mapped to determine from where the sounds are expelled, i.e., identify the origin of the sound; 0067-0076 -- determine what kind of sound or what kind of sound class the sound is]. Kjolerbakken fails to teach backpropagating the minimum variance distortionless response function. Z. Chen discloses efficient integration of fixed beamformers and speech separation networks for multi-channel speech separation. And implements a nd col., last paragraph] and implements backpropagation learning during speech separation [sec. 2.3] where the target speaker signal is enhanced [sec. 2.3.1]. Z. Chen specifically teaches that the system outperforms conventional systems.
One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the backpropagation speech separation techniques suggested by Z. Chen in the system of Kjolerbakken, for the purpose of accurately determining the target speaker from among a plurality of speakers and enhancing only the target speaker’s speech, to ensure that the speech recognized has the best speech signal to process and correctly recognize, and thereby improve the performance of the recognizer.
Kjolerbakken fails to teach, but Markovich-Golan teaches, generating a covariance matrix [col. 3, line 66 to col. 4, line 14]. One having ordinary skill in the art would have recognized the advantages of implementing the covariance matrix processing techniques suggested by Markovich-Golan in the Kjolerbakken system for the purpose of improving signal quality, as suggested by Markovich-Golan (col. 1, lines 12-16), to ensure that the speech recognized has the best speech signal to process and correctly recognize, and thereby improve the performance of the recognizer.
Kjolerbakken fails to teach replacing a real value mask with a complex value mask. S.H. Chen teaches a speech enhancement system that directly enhances the complex-valued noisy spectrum, modifying not only the magnitude but also the phase, by estimating the complex-valued coefficients of a speech signal in the complex domain rather than separately enhancing the magnitude and phase in the real domain [page 5439 at col. 2, lines 10-26; sec. 3, Proposed Methods, at pages 5440-5441; page 5442 at col. 2, lines 9-12]. S.H. Chen specifically teaches that integrating phase estimation into a speech enhancement procedure significantly improves the quality of the enhanced speech [page 5439 at col. 2, lines 24-26]. One having ordinary skill in the art would have recognized the advantages of implementing the complex-value mask processing techniques over the real-value mask processing, as suggested by S.H. Chen, in the Kjolerbakken system for the purpose of improving the quality of enhanced speech, as suggested by S.H. Chen, to ensure that the speech recognized has the best speech signal to process and correctly recognize, and thereby improve the performance of the recognizer.
Regarding claims 2, 9, and 16, the combination of Kjolerbakken, Z. Chen, Markovich-Golan, and S.H. Chen teaches scale-invariant source-to-noise ratio [Z. Chen, page 5385, sec. 2, 3rd paragraph].
Regarding claims 6, 13, and 19, the combination of Kjolerbakken, Z. Chen, Markovich-Golan, and S.H. Chen teaches that the video data corresponds to lip movement data captured by one or more cameras and the audio data corresponds to speech captured by one or more microphones [Kjolerbakken, para 0045-0056; Fig. 1].


 
Allowable Subject Matter
Claims 4-5, 7, 11-12, 14, 18 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Response to Arguments
Applicant’s arguments with respect to claims 1-2, 4-9, 11-16, and 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter of the amended claim limitations specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659