DETAILED ACTION
This action is in response to the initial filing of Application no. 16/297,052 on 03/08/2019.
Claims 1 – 20 are still pending in this application, with claims 1, 11 and 20 being independent.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 4 – 6 and 14 – 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Objections
Claims 1, 11 and 20 are objected to because of the following informalities: “common encode” should recite -- common encoder --. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
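A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.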

Claims 1 – 3, 7, 8, 11 – 13, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (US 2020/0219486) (“Fu”) in view of Saon et al. (US 2015/0161522) (“Saon”) and further in view of Cui et al. (US 2020/0135174) (“Cui”).
For claims 1, 11 and 20, Fu discloses a multi-task learning system, method and computer program product for speech recognition (Abstract, [0058] [0061] [0062]) comprising: a common encoder network (Fig.4, 420 and Fig.5, 520; [0039] [0045]); a primary network (CTC module, wherein the CTC module provides truncation/spike information to guide the attention model to perform attention modeling for each truncation, which realizes continuous speech recognition and ensures high accuracy, Fig.5, 540; [0030] [0031] [0046]) for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition (the head of the CTC module is trained with a CTC loss function, [0048]); and a sub network (attention decoder, Fig.5, 550) for minimizing a cross-entropy loss function ([0047] [0048]), wherein a first set of output data of the common encoder network is received by both the primary network and the sub network (Fig.5, 530; [0045 – 0047]). Yet, Fu fails to teach the following: a mean squared error (MSE) loss is minimized for the sub network; and a second set of the output data of the common encode network is received only by the primary network from among the primary network and the sub network.
However, Saon discloses a system and method for training neural network (Abstract), wherein a neural network is trained by optimizing cross-entropy or mean-squared error ([0026]).
Additionally, Cui discloses a system and method for performing speech recognition training (Abstract), wherein a set of output data of the common encode network (Fig.2, 210; [0054] [0055]) is received only by a first network (Fig.2, 220 and Fig.3, 320; [0067] [0071 – 0073]) from among a first network and a second network (Fig.2, 240, 260, 270, 280, 250; [0059 – 0063]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to substitute the cross-entropy loss function disclosed by Fu with the mean squared error loss disclosed by Saon to achieve the predictable results of training the neural network system disclosed above by Fu for the purpose of improving the accuracy, efficiency and performance of speech recognition by using neural networks (Fu, [0035]).
Additionally, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Fu and Saon in the same way that Cui’s invention has been improved to achieve the following predictable results for the purpose of increasing the speed and efficiency of training the input data (Cui, [0059 – 0052]): a second set of the output data of the common encode network is received only by the primary network from among the primary network and the sub network.
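For illustration only, the data routing recited in claims 1, 11 and 20 may be sketched as follows. The sketch is not drawn from Fu, Saon or Cui. The function names `mse_loss` and `route_encoder_outputs`, the sample values, and the use of the encoder outputs themselves as stand-in sub network outputs are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def route_encoder_outputs(
    first_set: List[Vector], second_set: List[Vector]
) -> Dict[str, List[Vector]]:
    """Return the encoder outputs that each network receives.

    The first set is received by both networks; the second set is
    received only by the primary network.
    """
    return {
        "primary": list(first_set) + list(second_set),
        "sub": list(first_set),
    }


# Hypothetical encoder outputs: two frames per set, two features per frame.
first_set = [[0.1, 0.9], [0.8, 0.2]]
second_set = [[0.5, 0.5], [0.3, 0.7]]

received = route_encoder_outputs(first_set, second_set)

# Assumption: the sub network is an identity mapping, so its outputs equal
# the frames it receives. The MSE loss is averaged over those frames.
sub_targets = [[0.0, 1.0], [1.0, 0.0]]
sub_loss = sum(
    mse_loss(output, target)
    for output, target in zip(received["sub"], sub_targets)
) / len(sub_targets)
```

In this sketch the primary network would compute its CTC loss over all four frames, while the MSE loss of the sub network is computed over the two frames of the first set only.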

For claims 2 and 12, Fu and Cui further disclose wherein, for the first set of output data, the CTC loss is minimized after the MSE loss is minimized (Fu, the truncation information is not provided by the CTC module so that the CTC module and attention decoder operate independently, [0030] [0031] [0038] [0039] [0046 – 0048]) (Cui, CTC model training and attention model training may be independently performed by training each model at different time periods, wherein different time periods encompass adjacent and/or consecutive time periods and any and all durations, [0011] [0039 – 0042] [0045] [0046] [0050]).
For claims 3 and 13, Fu and Cui further disclose wherein, for the second set of output data, only the CTC loss is minimized (Fu, [0030] [0031] [0038] [0039] [0046 – 0048]) (Cui, CTC model training and attention model training may be independently performed, e.g., a mini-batch based alternate training in which one of the CTC model training and the attention model training is randomly selected for optimization in each mini-batch, [0012] [0039 – 0042] [0045] [0046] [0051]).
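For illustration only, the ordering of loss minimization recited in claims 2, 3, 12 and 13 may be sketched as a schedule of update steps. The sketch is not drawn from Fu or Cui. The function name `loss_schedule` and the batch labels "first" and "second" are assumptions made for this example.

```python
from typing import List, Tuple


def loss_schedule(batch_sets: List[str]) -> List[Tuple[int, str]]:
    """List the (batch index, loss) update steps in the order they occur.

    A batch labeled "first" is used to minimize the MSE loss and then the
    CTC loss. A batch labeled "second" is used to minimize only the CTC loss.
    """
    steps: List[Tuple[int, str]] = []
    for index, label in enumerate(batch_sets):
        if label == "first":
            steps.append((index, "MSE"))
            steps.append((index, "CTC"))
        elif label == "second":
            steps.append((index, "CTC"))
        else:
            raise ValueError(f"unknown batch label: {label!r}")
    return steps


# Hypothetical sequence of batches drawn from the two sets of output data.
schedule = loss_schedule(["first", "second", "first"])
```

Under these assumptions, every "first" batch contributes an MSE step followed by a CTC step, and every "second" batch contributes a single CTC step.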

For claims 7 and 17, Fu further discloses wherein the common encoder network is common to both the primary network and the sub network (Fu, Fig.4, 420 and Fig.5, 520; [0039] [0045]).

For claims 8 and 18, Fu and Cui further disclose wherein the common encoder network is a bi-directional Long Short Term Memory network (Fu, [0025] [0045]) (Cui, Fig.3, 310; [0071]).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (US 2020/0219486) (“Fu”) in view of Saon et al. (US 2015/0161522) (“Saon”), and further in view of Cui et al. (US 2020/0135174) (“Cui”) and further in view of Zhe et al. (“An Hybrid CTC-Attention Model for Speech Recognition”) (“Zhe”).
For claims 9 and 19, the combination of Fu, Saon and Cui fails to teach wherein the sub network is a bi-directional Long Short Term Memory (LSTM) network.
However, Zhe discloses a system and method for an improved hybrid CTC-attention model for speech recognition (Abstract) wherein a sub network (attention model network) is a bi-directional LSTM network (Loc-Attention, Fig.2, 3.1. Encoder-Decoder Architecture).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of Fu, Saon and Cui with Zhe’s teachings so that the sub network (Fu, [0047]) is a bi-directional LSTM for the purpose of .

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (US 2020/0219486) (“Fu”) in view of Saon et al. (US 2015/0161522) (“Saon”), and further in view of Cui et al. (US 2020/0135174) (“Cui”) and further in view of Yao et al. (US 2019/0188567) (“Yao”).
For claim 10, the combination of Fu, Saon and Cui fails to teach wherein the first set of output data of the common encoder network is randomly selected.
However, Yao discloses a system and method for training a neural network (Abstract) wherein a mini-batch of a training set, which is used as input to a neural network to generate output data from the neural network, is randomly selected, thereby generating randomly selected output data ([0064] [0065]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of Fu, Saon and Cui with Yao’s teachings so that the first set of the output data of the common encoder network is randomly selected due to randomly selecting the data input to the encoder for the purpose of increasing the efficiency and accuracy of training the learning system.
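For illustration only, the random selection relied upon for claim 10 may be sketched as a random partition of the training items, where the items chosen for the first set determine which encoder outputs fall into the first set. The sketch is not drawn from Yao. The function name `split_randomly`, the seed parameter, and the item counts are assumptions made for this example.

```python
import random
from typing import List, Tuple


def split_randomly(
    item_count: int, first_count: int, seed: int
) -> Tuple[List[int], List[int]]:
    """Randomly partition item indices into a first set and a second set.

    Exactly first_count indices are drawn without replacement for the
    first set; the remaining indices form the second set.
    """
    if not 0 <= first_count <= item_count:
        raise ValueError("first_count must be between 0 and item_count")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(item_count), first_count))
    first = [i for i in range(item_count) if i in chosen]
    second = [i for i in range(item_count) if i not in chosen]
    return first, second


# Hypothetical training set of 10 items, 4 of which feed the first set.
first_indices, second_indices = split_randomly(item_count=10, first_count=4, seed=0)
```

Because the inputs to the encoder are chosen at random, the encoder outputs generated from them are likewise a randomly selected set.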

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951.  The examiner can normally be reached on Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on 571-272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.