DETAILED ACTION
This action is in response to the amendment filed on 11/22/2021.

Response to Amendment
Applicant’s amendment filed on 11/22/2021 has been entered. No claims have been amended. No claims have been canceled. No claims have been added. Claims 1 – 20 are still pending in this application, with claims 1, 11 and 20 being independent.

Allowable Subject Matter
Claims 4 – 6 and 14 – 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims since a search failed to uncover prior art which teaches or suggests the following limitations in reasonable combination: wherein for decoding an acoustic utterance from which an input sequence is derived, an output of a primary network for minimizing a CTC loss is used while an output of a sub network for minimizing MSE loss is discarded; wherein an input sequence which a common encoder network receives is distorted when minimizing MSE loss of a subnetwork by swapping the former and latter parts of the input sequence; and wherein an input sequence which a common encoder network receives is distorted when minimizing MSE loss of a subnetwork by stripping a beginning portion and an end portion of the sequence for the feature reconstruction.
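For illustration only, the two input distortions recited above (swapping the former and latter parts of the input sequence, and stripping a beginning portion and an end portion) can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the function names, the midpoint split, and the strip lengths are assumptions, since the claims as summarized here do not specify them.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def swap_halves(frames: Sequence[T]) -> List[T]:
    """Swap the former and latter parts of a sequence.

    The split point (the midpoint) is an assumption made for illustration.
    """
    mid = len(frames) // 2
    return list(frames[mid:]) + list(frames[:mid])


def strip_ends(frames: Sequence[T], n_head: int, n_tail: int) -> List[T]:
    """Strip a beginning portion and an end portion of a sequence.

    n_head and n_tail are hypothetical parameters giving how many
    frames to drop from each end.
    """
    if n_head < 0 or n_tail < 0:
        raise ValueError("strip lengths must be non-negative")
    end = len(frames) - n_tail
    # Guard against stripping more frames than the sequence contains.
    return list(frames[n_head:max(end, n_head)])
```

In both cases the distorted sequence would be supplied to the common encoder network only when minimizing the MSE loss of the sub network.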

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 - 3, 7, 8, 11 - 13, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hori et al. (US 2018/0330718) (“Hori”) in view of Cui et al. (US 2020/0135174) (“Cui”), and further in view of Saon et al. (US 2015/0161522) (“Saon”).
For claims 1, 11 and 20, Hori discloses a multi-task learning system, method and computer program product for speech recognition (Abstract, [0058] [0061] [0062]) comprising: a common encoder network (Fig.2, 202); a primary network (CTC module, Fig.2, 208) for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition ([0072]); and a sub network (attention decoder, Fig.2, 204), wherein a first set of output data of the common encoder network is received by both the primary network and the sub network (Fig.3, [0031] [0032] [0053]). Yet, Hori fails to teach the following: a mean squared error is minimized for the sub network; and a second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
However, Cui discloses a system and method for performing speech recognition training (Abstract), wherein training a sequence-to-sequence speech recognition model comprises computing both a CTC loss and a cross-entropy loss ([0054 – 0056] [0058 - 0063]), wherein a set of output data (hu) of the common encoder network (Fig.2, 210; [0054] [0055]) is received only by a first network (Fig.2, 220 and Fig.3, 320; [0067] [0071 – 0073]) from among a first (Fig.2, 220 u and a new different sequence pattern trains the decoder, [0056] [0057]).
Additionally, Saon discloses a system and method for training a neural network (Abstract), wherein a neural network is trained by optimizing cross-entropy or mean-squared error ([0026]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by Hori in the same way that Cui’s invention has been improved to achieve the following predictable results for the purpose of increasing the speed and efficiency of training the Seq2Seq model (Cui, [0002 – 0004]): during training, a second set of the output data of the common encoder network is received only by the primary network (the first network, comprising the CTC network) from among the primary network and the sub network.
Additionally, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to substitute the cross-entropy function loss disclosed by the combination of Hori and Cui with the mean squared error loss disclosed by Saon to achieve the predictable results of training the neural network system disclosed above by the combination of Hori and Cui for the purpose of improving the accuracy, efficiency and performance of speech recognition by using neural networks (Hori, [0002]).
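For illustration only, the two loss functions at issue in the proposed substitution (cross-entropy and mean squared error) can be written out as follows. This is a generic textbook sketch and is not taken from Hori, Cui or Saon; the function names and the plain-list inputs are assumptions.

```python
import math
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Average of squared differences between prediction and target."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def cross_entropy(probabilities: Sequence[float], target_index: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])
```

Mean squared error compares real-valued vectors (for example, reconstructed features), while cross-entropy compares a predicted class distribution against a class label.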

For claims 2 and 12, Cui and Saon further disclose wherein, for the first set of output data, the CTC loss is minimized after the MSE loss is minimized (Cui, CTC model training and attention model training using cross-entropy loss may be independently performed by training each model at different time periods, wherein different time periods encompass adjacent and/or 
For claims 3 and 13, Cui further discloses wherein, for the second set of output data, only the CTC loss is minimized (Cui, the second set of output data, hu, is only received by the CTC model, [0055] [0056]).
For claims 7 and 17, Hori further discloses wherein the common encoder network is common to both the primary network and the sub network (Hori, Fig.2, 202 and Fig.3, [0031] [0032] [0053]).
For claims 8 and 18, Hori further discloses wherein the common encoder network is a bi-directional Long Short Term Memory network (Hori, [0037 - 0040]).
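For illustration only, the routing recited in claims 2, 3, 12 and 13 (for the first set of output data, the MSE loss is minimized and then the CTC loss; for the second set, only the CTC loss) can be sketched as a simple schedule. This is a minimal sketch: the function name and the string labels are hypothetical and do not come from the application or the cited references.

```python
from typing import List


def loss_schedule(in_first_set: bool) -> List[str]:
    """Return the losses to minimize, in order, for one set of encoder output.

    First set: received by both the sub network and the primary network,
    so the MSE loss is minimized first and the CTC loss afterwards.
    Second set: received only by the primary network, so only CTC applies.
    """
    if in_first_set:
        return ["mse", "ctc"]
    return ["ctc"]
```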

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hori et al. (US 2018/0330718) (“Hori”) in view of Cui et al. (US 2020/0135174) (“Cui”), and further in view of Saon et al. (US 2015/0161522) (“Saon”) and further in view of Zhe et al. (“An Hybrid CTC-Attention Model for Speech Recognition”) (“Zhe”).
For claims 9 and 19, the combination of Hori, Saon and Cui fails to teach, wherein the sub network is a bi-directional Long Short Term Memory (LSTM) network.
However, Zhe discloses a system and method for an improved hybrid CTC-attention model for speech recognition (Abstract) wherein a sub network (attention model network) is a bi-directional LSTM network (Loc-Attention, Fig.2, 3.1. Encoder-Decoder Architecture).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s invention to modify the combined teachings of Hori, Cui and Saon with Zhe’s teachings so that the sub network (Fu, [0047]) additionally comprises a bi-directional LSTM for .

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hori et al. (US 2018/0330718) (“Hori”) in view of Cui et al. (US 2020/0135174) (“Cui”), and further in view of Saon et al. (US 2015/0161522) (“Saon”) and further in view of Yao et al. (US 2019/0188567) (“Yao”).
For claim 10, the combination of Hori, Cui and Saon fails to teach, wherein the first set of output data of the common encoder network is randomly selected.
However, Yao discloses a system and method for training a neural network (Abstract) wherein a mini-batch of a training set, which is used as input to a neural network to generate output data from the neural network, is randomly selected, thereby generating randomly selected output data ([0064] [0065]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of Hori, Cui and Saon with Yao’s teachings so that the first set of the output data of the common encoder network is randomly selected due to randomly selecting the data input to the encoder for the purpose of increasing the efficiency and accuracy of training the learning system.
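For illustration only, random selection of a mini-batch from a training set can be sketched as follows. This is a minimal sketch using Python's standard library and is not Yao's implementation; the function name, the batch size and the optional seed are assumptions.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_mini_batch(
    training_set: Sequence[T], batch_size: int, seed: Optional[int] = None
) -> List[T]:
    """Randomly select batch_size distinct examples from the training set.

    The seed is a hypothetical convenience for reproducible selection.
    """
    rng = random.Random(seed)
    return rng.sample(list(training_set), batch_size)
```

Because the input to the encoder is randomly selected, the corresponding encoder output is randomly selected as well.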

Response to Arguments
Applicant's arguments filed on 11/22/2021 have been considered, but are not persuasive. On pg. 2 of the remarks, applicant argues that the examiner used impermissible hindsight in the combination of Hori, Cui and Saon. Hori discloses an attention based end-to-end 
Furthermore, applicant argues on pg. 2 and 3 of the remarks that there is no teaching of two sets of output data from the common encoder. First, the claims fail to recite specific limitations u) received by both the primary network and the subnetwork from the common encoder (hu is received at the transform layer of the subnetwork) (Fig.2, Fig.5, 520; [0055] [0056]), and a second set of output data (hu) is received by the primary network only from the common encoder (after the transform layer, the encoder output data received by the primary network is hu, and the encoder output data received by the subnetwork is hu’, which is different from hu) (Fig.2, Fig.5, 520; [0055 - 0058]). Therefore, the combination of Hori and Cui teaches the aforementioned limitations.
 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951.  The examiner can normally be reached on Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on 571-272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SONIA L GAY/Primary Examiner, Art Unit 2657