DETAILED ACTION
Introduction
This Office Action is in response to Applicant’s submission filed on 06/03/2022 in response to the Office Action mailed on 03/07/2022. Claims 1-22 are pending in the application. As such, claims 1-22 have been reconsidered and examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The response filed on 06/03/2022 has been correspondingly accepted and considered in this Office Action. Claims 1-22 have been examined. 

Allowable Subject Matter
In view of Applicant’s amendments to independent Claims 1 and 12, the previous rejections of Claims 1-22 under 35 U.S.C. § 103 have been reconsidered and are addressed as follows.
Yoshioka fails, however, to disclose when the first speaker was speaking prior to the labeled start time of the overlapping region, applying, to the respective masked audio embedding for the first speaker, a first masking loss setting all activations after the labeled end time to zero. Yoshioka also fails to disclose training the speech recognition model based on the labeled end time using the first masking loss applied before the labeled start time to the respective masked audio embedding for the first speaker.
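For illustration only, the following is a minimal sketch of one possible reading of a masking loss that drives activations after a labeled end time to zero. It is not drawn from the claims, the specification, or the cited art. The function names `masking_loss_after` and `zero_after`, the frame-indexed list layout of the embedding, and the squared-activation penalty are all assumptions made for this example.

```python
# Hypothetical sketch: the names, the data layout (a list of frames, each a
# list of floats), and the squared-activation penalty are assumptions.

def masking_loss_after(embedding, end_frame):
    """Sum of squared activations in all frames strictly after end_frame.

    The value is zero exactly when every activation after end_frame is zero,
    so minimizing it pushes those activations toward zero.
    """
    return sum(v * v for frame in embedding[end_frame + 1:] for v in frame)


def zero_after(embedding, end_frame):
    """Return a copy of the embedding with frames after end_frame zeroed."""
    return [
        list(frame) if t <= end_frame else [0.0] * len(frame)
        for t, frame in enumerate(embedding)
    ]


if __name__ == "__main__":
    # Three frames, two channels; the speaker's labeled end time is frame 1.
    emb = [[1.0, 2.0], [0.5, 0.0], [3.0, 4.0]]
    print(masking_loss_after(emb, 1))                 # penalty on frame 2 only
    print(masking_loss_after(zero_after(emb, 1), 1))  # no penalty once zeroed
```

In this sketch the penalty depends only on frames after the labeled end time, so activations at or before that time are left unconstrained.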

Applicant’s arguments above with respect to claim 1 have been reconsidered and found persuasive.
The following is an Examiner’s statement of reasons for allowance:
Claims 1 and 12 are found allowable over the prior art of record for at least the following rationale.
At best, Settle (S. Settle et al., "End-to-End Multi-Speaker Speech Recognition," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 4819-4823) teaches a fully end-to-end, jointly trained deep learning system for separation and recognition of overlapping speech signals. The joint speech separation and recognition uses the masks output from the chimera++ network to extract each source, from which the log-mel filterbank features are computed for recognition (see Settle, pg. 4821, sect. 4, Fig. 1).
Further, Yoshioka (Yoshioka et al. (2018). Recognizing overlapped speech in meetings: A multichannel separation approach using neural network) teaches spectral masking using a BLSTM network trained with permutation invariant training (PIT) as an unmixing transducer, in order to recognize speech even when utterances of different speakers are overlapped (see Yoshioka, pg. 3039, sect. 2).
Additionally, Droppo et al. (U.S. Patent 10,460,727) teaches separating mixed speech audio using permutation invariant training (PIT), wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. Among the training criteria, separated frames belonging to the same speaker are directed to be aligned to the same output layer during training using a PIT separator (see Droppo, col. 3, lines 50-65; col. 4, lines 23-48).
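For context, the general idea of utterance-level permutation invariant training referenced above can be sketched as follows. This is a generic illustration, not the implementation of Yoshioka or Droppo: the function names `mse` and `pit_loss`, the mean-squared-error criterion, and the list-of-frames data layout are assumptions chosen for the example.

```python
# Generic sketch of an utterance-level PIT loss. Names, the MSE criterion,
# and the data layout are assumptions, not taken from the cited references.
from itertools import permutations


def mse(estimate, reference):
    """Mean squared error between two utterances (lists of frames)."""
    count = sum(len(frame) for frame in reference)
    total = sum(
        (e - r) ** 2
        for est_frame, ref_frame in zip(estimate, reference)
        for e, r in zip(est_frame, ref_frame)
    )
    return total / count


def pit_loss(estimates, references):
    """Return (loss, assignment) for the best output-to-speaker assignment.

    Each candidate assignment is scored over the whole utterance, so one
    output stream stays tied to one speaker for every frame.
    assignment[o] is the reference index matched to output stream o.
    """
    best_loss, best_perm = None, None
    for perm in permutations(range(len(references))):
        loss = sum(
            mse(estimates[out], references[spk]) for out, spk in enumerate(perm)
        ) / len(perm)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    speaker_a = [[1.0, 0.0], [1.0, 0.0]]
    speaker_b = [[0.0, 2.0], [0.0, 2.0]]
    # The outputs arrive in swapped order; the search recovers the swap.
    print(pit_loss([speaker_b, speaker_a], [speaker_a, speaker_b]))
```

Because the minimum is taken over assignments for the utterance as a whole rather than per frame, the sketch reflects the "same speaker, same output layer" alignment described above. The exhaustive search is practical only for a small number of speakers.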
Notwithstanding, the teachings of Settle, Yoshioka, and Droppo still fail to teach or fairly suggest, either individually or in a reasonable combination, the limitations of independent Claims 1 and 12 as specifically recited.
Please see the additional references listed in form PTO-892 for more details.
Similarly, dependent Claims 2-11 and 13-22 further limit allowable independent Claims 1 and 12 correspondingly, and thus said claims are also found allowable over the prior art of record by virtue of their dependency.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al., US Patent 10,957,337, teaches that time-frequency masks can be obtained as the output of the speech separation model. For example, the masks can take values between 0 and 1, with a 0 representing a speaker that is not speaking and a 1 representing a speaker that is dominant at the corresponding time-frequency point (see Chen, Fig. 3).
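As a generic illustration of how a time-frequency mask with values between 0 and 1 may be applied, the sketch below multiplies a mixture magnitude spectrogram element-wise by a mask. It is not taken from Chen: the function name `apply_mask`, the list-of-frames layout, and the clipping of mask values to the range [0, 1] are assumptions made for this example.

```python
# Generic sketch of time-frequency masking. The function name, data layout,
# and clipping behavior are assumptions, not taken from the cited reference.

def apply_mask(mixture, mask):
    """Scale each time-frequency point of the mixture by its mask value.

    mixture and mask are lists of frames, each frame a list of per-frequency
    values. A mask value of 0 suppresses the point, and 1 keeps it unchanged.
    """
    if len(mixture) != len(mask):
        raise ValueError("mixture and mask must have the same number of frames")
    separated = []
    for mix_frame, mask_frame in zip(mixture, mask):
        if len(mix_frame) != len(mask_frame):
            raise ValueError("frames must have the same number of frequency bins")
        separated.append(
            [m * min(1.0, max(0.0, w)) for m, w in zip(mix_frame, mask_frame)]
        )
    return separated


if __name__ == "__main__":
    mixture = [[2.0, 4.0], [1.0, 3.0]]        # 2 frames x 2 frequency bins
    mask_spk1 = [[1.0, 0.0], [0.5, 1.0]]      # speaker 1 dominance per point
    mask_spk2 = [[1.0 - w for w in frame] for frame in mask_spk1]
    print(apply_mask(mixture, mask_spk1))
    print(apply_mask(mixture, mask_spk2))
```

In the usage example the two masks are complementary, so the two separated outputs sum back to the mixture at every time-frequency point.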
Dong, US Patent 10,249,305, teaches that permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios (see Dong, Fig. 2A).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI, whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached at (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/NANDINI SUBRAMANI/Examiner, Art Unit 2656        

/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656