DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/5/2022 has been entered.

Status of the Claims
This action is in response to the amendment filed on 8/5/2022 for application 16/270,681 filed on 2/8/2019. Claims 1 – 29 are pending and have been examined.

Claims 1, 3 and 28 have been amended.

Claims 30 – 33 have been canceled.

The previous claim rejection under 35 U.S.C. 112(b) has been withdrawn in light of the amendment to the claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4/17/2019, 9/25/2019, 7/29/2022 and 10/18/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

The information disclosure statement filed on 7/6/2021 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been fully considered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1, 28 and 29 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 1, 28 and 29 recite the terms Dtd, Xti, ϴDtd, Nt and yti in equations. The specification also recites these terms in equations. However, neither the claims nor the specification provides a definition or description for these terms. Thus, the scope of the claims is unclear. For examination purposes, these terms are interpreted, respectively, as any decoder function, any input data, a parameter of any decoder function, any number, and any target in the training data.

Response to Remarks
Applicant's remarks filed on 3/22/2021 have been fully considered but they are not persuasive.

Regarding independent Claims 1, 28 and 29: the claimed model and the training equation are anticipated by Caruana under the examiner’s BRI. For further detail, please see the claim rejections under 35 U.S.C. 102 below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1 – 7, 10 – 12, 18, 21, 22 and 28 – 29 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998.

Regarding Claim 1, Caruana discloses: 
A neural network-based model coupled to memory and running on one or more parallel processors (Caruana, intro, ln. 8 – 9, multi-task learning in artificial neural net; page 218, para. 3, ln. 7 – 10, where experiments run on different workstations, including SPARC [parallel processors], with different memory sizes), comprising: 
an encoder that processes an input and generates an encoding (Caruana, page 19, & fig. 1.2, where the bottom 2 layers [encoder] take the input and create an encoded representation of the input); 
numerous decoders that are grouped into sets of decoders in dependence upon a corresponding classification tasks (Caruana, page 108, para. 2, ln. 5 – 10, where one output representation for this problem is to have one output for each of the twenty persons that net is supposed to recognize [twenty decoders in a group for a task to identify person], another output representation … a set of face features … beard/no_beard, mustache/no_mustache, glasses/no_glasses, long_hair/short_hair/bald, hair_color, eye_color, male/female, etc. [multiple decoders for face feature task]),
that (i) respectively receive the encoding as input from the encoder, thereby forming encoder-decoder pairs which operate independently of each other when performing the corresponding classification tasks T (Caruana, page 19, fig. 1.1, where the top layer [decoder] receives the encoding from the bottom 2 layers [encoder] and forms encoder-decoder pairs that each operate independently of each other when performing tasks), wherein the encoder is paired with multiple decoders D including multiple decoders from a same set of decoders and multiple decoders from a different set of decoders, wherein each decoder in the same set of decoders performs a first classification task and each decoder in the different set of decoders performs a second classification task (Caruana, page 108, para. 2, ln. 5 – 10, where one output representation for this problem is to have one output for each of the twenty persons that net is supposed to recognize [twenty decoders for task to identify person], another output representation … a set of face features … beard/no_beard, mustache/no_mustache, glasses/no_glasses, long_hair/short_hair/bald, hair_color, eye_color, male/female, etc. [multiple decoders for face feature task]),
(ii) respectively process the received encoding including processing the encoding through their individual decoder layers and classification layers and produce classification scores for classes defined for the corresponding classification tasks (Caruana, fig. 1.2, where each decoder processes the received encodings and produces classification results for each classification task); 
and a trainer that jointly trains the encoder-decoder pairs over one thousand to millions of backpropagation iterations to perform the corresponding classification tasks (Caruana, page 88, fig. 3.3, where MTL of 6 tasks is trained jointly with backprop over millions of iterations).
Wherein the neural network-based model is 
[Equation image: media_image1.png]
for a dth decoder of a tth task, ϴF is parameterization of a joint model F shared across all classification tasks, Dt is parameterization of task-specific decoders for each classification task; (Examiner’s BRI: each decoder output ŷ of each task is the decoded result of the encoded input, based on the decoder parameters and the encoder parameters; Caruana, fig. 1.2, where each of the top nodes [decoder] takes the output of the lower 2 layers as input and produces classification results) 
and further wherein an overall loss for the joint model F where there are T classification task and D decoder is 
[Equation image: media_image2.png], where [Equation image: media_image3.png]
(Examiner’s BRI: Nt is not defined in the claim and is interpreted as a constant. The model parameters are optimized based on the normalized cross entropy over tasks and decoders. The optimized parameters include the parameters of the encoder and the parameters of each decoder of each task; Caruana, page 139, para. 3, ln. 3 – 4, where normalized cross entropy is a standard way of preserving probability semantics when multiple outputs code for mutually exclusive classes; i.e., as the outputs of each decoder of each task are of different classes, normalized cross entropy is a standard way to measure overall model loss to optimize the model). 
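
For illustration only, the structure mapped above (one shared encoder, several decoders per task, and a loss averaged over tasks and decoders) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Caruana or from the application, and it does not reproduce the claimed equations, which appear only as images in this action. The function names (encoder, decoder, cross_entropy, overall_loss), the single tanh encoder layer, the single sigmoid output per decoder, and the reading of Nt as the number of training samples for task t are hypothetical choices made for the example.

```python
import math

def encoder(x, w_enc):
    """Shared encoder (assumed): one tanh layer applied to input vector x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]

def decoder(h, w_dec):
    """One decoder (assumed): a single sigmoid output unit over encoding h."""
    z = sum(w * hi for w, hi in zip(w_dec, h))
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy between target y and prediction y_hat."""
    return -(y * math.log(y_hat + eps) + (1.0 - y) * math.log(1.0 - y_hat + eps))

def overall_loss(data, w_enc, decoders):
    """Average the per-sample loss over every decoder of every task.

    data[t]     : list of (x, y) pairs for task t (assumption: N_t = len(data[t]))
    decoders[t] : list of decoder weight vectors for task t
    """
    total, count = 0.0, 0
    for task_data, task_decoders in zip(data, decoders):
        n_t = len(task_data)
        for w_dec in task_decoders:
            loss = sum(cross_entropy(y, decoder(encoder(x, w_enc), w_dec))
                       for x, y in task_data)
            total += loss / n_t  # normalize by the sample count of the task
            count += 1           # one term per encoder-decoder pair
    return total / count         # equals 1/(T*D) when every task has D decoders
```

Every decoder receives the same encoding, so the encoder parameters are shared across all terms of the loss while each decoder contributes its own term.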

Regarding Claim 2, Caruana further discloses:
wherein the trainer is further configured to comprise: a forward pass stage that processes training inputs through the encoder and resulting encodings through the decoders to compute respective activations for each of the training inputs; a backward pass stage, that, over each of the one thousand to millions of backpropagation iterations determines gradient data for the decoders for each of the training inputs in dependence upon a loss function (Caruana, fig. 1.2, where during training, the training data is input into the lower two layers [forward pass through encoder] and the outputs of the lower two layers [encoder] are processed by the top layer [decoder] to compute respective outputs; page 107, sec. 4.4, ln. 3, where sigmoid output unit [activations]; page 67, para. 2, ln. 6 – 8, where backprop error gradients are summed in the hidden layer shared by the tasks),
averages the gradient data determined for the decoders, and determines gradient data for the encoder by backpropagating the averaged gradient data through the encoder (Caruana, page 98, para. 2, ln. 5 – 9, where backpropagation using shared hidden layer … error gradients summed at the shared hidden layer; page 70, para. 2, ln. 8 – 9, & footnote, where the MTL benefit may depend on the learning algorithm, including the learning rate; the effective learning delta is the backpropagated gradient times the learning rate. By summing the error gradients of multiple tasks and setting a proper learning rate, the effect of the backpropagation is a weighted average of the gradients from each task);
an update stage that modifies weights of the encoder in dependence upon the gradient data determined for the encoder (Caruana, fig. 1.2, where during training the weights of the lower layers [encoder] are updated [update stage; modify weights] based on the determined gradient); 
and a persistence stage that, upon convergence after a final backpropagation iteration, persists in the memory the modified weights of the encoder derived by the training to be applied to future classification tasks (Caruana, fig. 1.2, where the lower layer model parameters [encoder weights] are stored [persistence stage; persist] in memory during training and when training is complete [convergence after a final backpropagation iteration], and the stored model parameters can be used later for classification tasks).
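
For illustration only, the relationship between summed and averaged gradients relied on above can be checked numerically. The sketch below is an assumption-laden example, not code from Caruana: the helper names (summed_gradient, averaged_gradient, sgd_step) and the gradient values are hypothetical, and it covers only the uniformly weighted case, in which summing K gradients with learning rate lr/K gives the same weight update as averaging them with learning rate lr.

```python
def summed_gradient(grads):
    """Element-wise sum of the gradients contributed by each decoder."""
    return [sum(column) for column in zip(*grads)]

def averaged_gradient(grads):
    """Element-wise mean of the gradients contributed by each decoder."""
    k = len(grads)
    return [sum(column) / k for column in zip(*grads)]

def sgd_step(weights, grad, lr):
    """Plain gradient-descent update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Hypothetical gradients arriving at the shared layer from four decoders.
decoder_grads = [[0.2, -0.4], [0.6, 0.0], [-0.2, 0.8], [0.2, 0.4]]
weights = [1.0, 1.0]
lr = 0.1

step_avg = sgd_step(weights, averaged_gradient(decoder_grads), lr)
step_sum = sgd_step(weights, summed_gradient(decoder_grads), lr / len(decoder_grads))
```

Under these assumptions step_avg and step_sum agree up to floating-point rounding.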

Regarding Claim 3, Caruana further discloses:
further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of each decoder within a set of decoders derived by the training to perform the corresponding classification tasks on inference inputs (Examiner’s BRI: during inference mode, the model uses the trained/modified weights of both the encoder and the decoders; Caruana, page 22, ln. 1 – 3, where weights in the nets [weights of encoder and decoders] are updated [modified] each epoch … every 5000 epochs we evaluate the performance of the net [inference using trained model]), and wherein the inference inputs are processed by the encoder to produce encodings, followed by one of the decoders processing the encodings to output classification scores for classes defined for the corresponding classification tasks (Caruana, fig. 1.1, where during inference mode, the model uses the trained [modified] weights in both the lower layer [encoder] and the top layer [decoder], which is the trained model of Ronnie, to perform classification tasks. The input is processed by the lower layer [encoder] to produce encoded representations [encoding] for the top layer [decoder]).

Regarding Claim 4, Caruana further discloses:
further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of two or more of the decoders derived by the training to respectively perform two or more of the classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by the two or more of the decoders respectively processing the encodings to output classification scores for classes defined for the two or more of the classification tasks (Caruana, page 108, para. 2, & fig. 1.2, where during inference mode, the model uses the trained [modified] weights in both the lower layer [encoder] and the top layer [decoder] to produce multiple classification outputs of the classification scores defined by each task).

Regarding Claim 5, Caruana further discloses:
wherein each training input is annotated with a plurality of task-specific labels for the corresponding classification tasks (Caruana, page 111, sec. 4.6, para. 2, ln. 1 – 2, where a good example of this is in scene analysis where human expertise is often required to label [annotated] important features [task-specific labels for the corresponding classification tasks]).

Regarding Claim 6, Caruana further discloses:
wherein a plurality of training inputs for the corresponding classification tasks are fed in parallel to the encoder as input in each forward pass iteration (Caruana, fig. 1.2, where a plurality of training inputs for the corresponding classification tasks are fed in parallel to the lower layer [encoder] for each forward pass iteration),
and wherein each training input is annotated with a task-specific label for a corresponding classification task (Caruana, page 111, sec. 4.6, para. 2, ln. 1 – 2, where a good example of this is in scene analysis where human expertise is often required to label [annotated] important features [task-specific labels for the corresponding classification tasks]).

Regarding Claim 7, Caruana further discloses:
wherein a loss function is cross entropy (Caruana, page 139, para. 3, ln. 2, where use a normalized cross-entropy loss function) that uses either a maximum likelihood objective function, a policy gradient function (Caruana, page 139, para. 2, ln. 1 – 2, where train the net using conjugate gradient; by using cross-entropy loss and the gradient, the reinforcement learning is a policy gradient), or both.

Regarding Claim 10, Caruana further discloses:
wherein each decoder further comprises at least one decoder layer and at least one classification layer (Caruana, fig. 1.2, where each of the top layer nodes [decoder; decoder layer; classification layer] decodes the encoded representation from the lower layer and classifies the task of its hospital).

Regarding Claim 11, Caruana further discloses:
wherein each of the numerous decoder is a fully-connected neural network (abbreviated FCNN) and the decoder layer is a fully-connected layer (Caruana, fig. 1.1, where the top layer [decoder; fully connected neural network, decoder layer, fully connected layer] is fully connected).

Regarding Claim 12, Caruana further discloses:
wherein the at least one classification layer is a sigmoid classifier (Caruana, page. 107, sec. 4.4, ln. 3, where using sigmoid output unit).

Regarding Claim 18, Caruana further discloses:
wherein the encoder is a fully-connected neural network (abbreviated FCNN) with at least one fully-connected layer (Caruana, fig. 1.2, where the lower two layers [encoder; fully connected neural network] are fully connected).

Regarding Claim 21, Caruana further discloses:
wherein the input, the training inputs, and the inference inputs are selected from the group consisting of image data, text data and genomic data (Caruana, page. 37, para. 2, ln. 1 – 2, training set contains images).

Regarding Claim 22, Caruana further discloses:
further configured to comprise an initializer that initializes the decoders with random weights (Caruana, page. 84, sec. 3.3.6, ln. 3 – 4, where random initialization is critical to MTL back prop; i.e., the model needs initialization and is done by randomize parameters [weights]).

Regarding Claim 28, Claim 28 is the method claim corresponding to Claim 1. Claim 28 is rejected for the same reasons as Claim 1. 

Regarding Claim 29, Claim 29 is the non-transitory computer readable storage medium claim corresponding to Claim 1. Caruana further discloses: non-transitory computer readable storage medium impressed with computer program instruction, when executed on a processor implement methods (Caruana, page 248, para. 2, ln. 9 – 14, where the method is written in computer instruction code which is stored on a hard drive [non-transitory computer readable storage medium] and can be executed by a processor to perform the disclosed method). Claim 29 is rejected for the same reasons as Claim 1. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 8, 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998, in view of Loy, Facial landmark detection by deep multi-task learning. In Proceedings of ECCV’14, 2014.

Regarding Claim 8, Caruana does not explicitly disclose: 
wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.
Loy explicitly discloses: 
wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest (Loy, fig. 3, where feature extraction network [encoder] is convolutional neural network with plurality of convolutional layers).
Caruana and Loy both teach multi-task learning with neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Loy’s disclosure of a convolutional encoder module to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to gain superior accuracy at facial classification (Loy, intro., para. 2, ln. 3 – 7).

Regarding Claim 9, Caruana does not explicitly disclose: 
wherein the encoding is convolution data.
Loy explicitly discloses:
wherein the encoding is convolution data (Loy, fig. 3, where the shared feature [encoding] are based on convolved data).
The reason for combination is the same as for Claim 8.

Regarding Claim 13, Caruana does not explicitly disclose: 
wherein the classification layer is a softmax classifier.
Loy explicitly discloses:
wherein the classification layer is a softmax classifier (Loy, page. 5, para. 2, ln. 13 – 14, where softmax function at the classifier output).
Caruana and Loy both teach multi-task learning with neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Loy’s disclosure of a softmax classifier output to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to output a posterior probability (Loy, page 5, para. 2).
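
For illustration only, the posterior-probability property cited as the motivation can be seen in a minimal softmax sketch. The function below is the standard softmax definition rather than code from Loy, and the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Map raw class scores to probabilities that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a classification layer with three classes.
probs = softmax([2.0, 1.0, 0.1])
```

The outputs preserve the ordering of the raw scores and can be read as a probability distribution over the classes.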

Claims 14 – 16 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998, in view of Dong, Multi-task learning for multiple language translation. In Proc. of ACL, pp. 1723–1732, 2015.

Regarding Claim 14, Caruana does not explicitly disclose: 
wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.
Dong explicitly discloses: 
wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network (Dong, fig. 2, where lower network [encoder] is recurrent neural network; fig. 1, where gated recurrent neural network).
Caruana and Dong both teach multi-task learning with neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Dong’s disclosure of a recurrent multi-task model to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to achieve significantly higher translation quality (Dong, abs., ln. 17 – 22).

Regarding Claim 15, Caruana does not explicitly disclose: 
wherein the encoding is hidden state data
Dong explicitly discloses: 
wherein the encoding is hidden state data (Dong fig. 2, & eq. 9 where ht [encoding] is a recurrent neural network hidden state at time t). 
The reason for combination is the same as for Claim 14.

Regarding Claim 16, Caruana does not explicitly disclose: 
wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network. 
Dong explicitly discloses: 
wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network (Dong, fig. 2, & fig. 1, where each of the language models is a gated recurrent model).
The reason for combination is the same as for Claim 14.

Claims 17, 19, 20 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998, in view of Huang, US20170256254, Modular Deep Learning Model, 2017.

Regarding Claim 17, Caruana does not explicitly disclose: 
wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.
Huang explicitly discloses: 
wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest (Huang, fig. 3, & para. 0028, where decoder 381, 382, and 383 can be convolutional with multiple layers arranged in a sequence from lowest to highest).
Caruana and Huang both teach multi-task neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Huang’s disclosure of multiple types of decoder models to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to improve the accuracy of the model (Huang, para. 0024, ln. 9 – 10).

Regarding Claim 19, Caruana does not explicitly disclose: 
wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type. 
Huang explicitly discloses: 
wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type (Huang, fig. 3, & para. 0028, where multiple decoder 381, 382 and 383; technology can be used for decoding … such as deep neural network DNN, convolutional neural network CNN, long short term memory recursive neural network LSTM-RNN or a Convolutional Long Short-Term Memory Deep Neural Network CL-DNN).
The reason for combination is the same as for Claim 17.

Regarding Claim 20, Caruana does not explicitly disclose: 
wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs). 
Huang explicitly discloses: 
wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs) (Huang, fig. 3, & para. 0028, where multiple decoder 381, 382 and 383 with multiple layers arranged in sequence from low to high; technology can be used for decoding … such as deep neural network DNN, convolutional neural network CNN, long short term memory recursive neural network LSTM-RNN or a Convolutional Long Short-Term Memory Deep Neural Network CL-DNN; para. 0026, ln. 4, where fully connected layers [fully connected neural network]).
The reason for combination is the same as for Claim 17.

Regarding Claim 23, Caruana does not explicitly disclose: 
further configured to comprise the initializer that freezes weights of some decoders for certain number of backpropagation iterations while updating weights of at least one high performing decoder among the decoders over the certain number of backpropagation iterations,
and wherein the high performing decoder is identified based on performance on validation data.
Huang explicitly discloses: 
further configured to comprise the initializer that freezes weights of some decoders for certain number of backpropagation iterations while updating weights of at least one high performing decoder among the decoders over the certain number of backpropagation iterations (Huang, fig. 6, & para. 0086, where training the female voice specific sub-module … holding the value constant [freeze weight] in all other sub module [some decoders] … using female voice specific input data, during the training iterations [backpropagation iterations] of updating the female voice specific sub-module [high performing decoder]),
and wherein the high performing decoder is identified based on performance on validation data (Huang, para. 0056, ln. 9 – 11, where training data for a female voice may be used to train female voice specific sub-module to more accurately [higher performance] process female voice data; i.e., while processing female voice data, the female voice specific sub-module has the higher performance than another module).
The reason for combination is the same as for Claim 17.
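
For illustration only, holding some decoders constant while another is updated can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Huang or from the application: the function name train_step, the flat weight lists, and the set of frozen indices are hypothetical.

```python
def train_step(decoders, grads, lr, frozen):
    """Apply one gradient step to every decoder whose index is not frozen.

    decoders : list of weight lists, one per decoder
    grads    : list of gradient lists with the same shape as decoders
    frozen   : set of decoder indices whose weights are held constant
    """
    updated = []
    for index, (weights, grad) in enumerate(zip(decoders, grads)):
        if index in frozen:
            updated.append(list(weights))  # held constant for this iteration
        else:
            updated.append([w - lr * g for w, g in zip(weights, grad)])
    return updated

# Hypothetical example: decoders 0 and 2 are frozen, decoder 1 is trained.
decoders = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
grads = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5]]
after = train_step(decoders, grads, lr=0.1, frozen={0, 2})
```

Calling train_step repeatedly with the same frozen set models a fixed number of iterations during which only the selected decoder changes.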

Claims 24 – 27 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998, in view of Rennie, Annealed dropout training of deep networks, 2014 IEEE Spoken Language Technology Workshop (SLT), 2014.

Regarding Claim 24, Caruana does not explicitly disclose: 
comprise the initializer that periodically and randomly drops out weights of the decoders after certain number of backpropagation iterations
Rennie explicitly discloses: 
further configured to comprise the initializer that periodically and randomly drops out weights of the decoders after certain number of backpropagation iterations (Rennie, sec. 3.3, ln. 4 – 10, where during iteration i is initialized [periodically; certain number of backpropagation iterations] … drop out probability [randomly drop out]).
Caruana and Rennie both teach neural network learning techniques and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Rennie’s disclosure of an annealed dropout learning technique to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to improve the performance (Rennie, abs., ln. 3 – 4).

Regarding Claim 25, Caruana in view of Rennie further discloses: 
comprise the initializer that periodically perturbs weights of the decoders after certain number of backpropagation iterations by adding random noise to the weights (Rennie, sec. 3.3, ln. 4 – 10, where during iteration i is initialized [periodically; certain number of backpropagation iterations] … with a lower average number of non-zero weights, and higher variance in the number of active weights [noise to the weights of the model]).
The reason for combination is the same as for Claim 24. 

Regarding Claim 26, Caruana in view of Rennie further discloses: 
further configured to comprise the initializer that periodically perturbs hyperparameters of the decoders after certain number of backpropagation iterations by randomly changing a rate at which weights of the decoders are randomly dropped out (Rennie, sec. 4.7, ln. 3 – 11, where at each epoch [periodically; certain number of backpropagation iterations] … the model that lowers … error rate the most is then selected; i.e., at each epoch, the dropout rate is changed based on the performance of the different setups. Since the performance among the different setups cannot be projected, the next dropout rate is random). 
The reason for combination is the same as for Claim 24. 

Regarding Claim 27, Caruana in view of Rennie further discloses: 
further configured to comprise the initializer that identifies at least one high performing decoder among the decoders after every certain number of backpropagation iterations and copies weights and hyperparameters of the high performing decoder to the other decoders, and wherein the high performing decoder is identified based on performance on validation data (Rennie, sec. 4.7, ln. 3 – 11, where tree search approach … each epoch produces 4 updated models … the model that lowers the … error rate the most is then selected. The selected model is copied and used by the other 3 models in the next epoch to continue the training).
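
For illustration only, the select-and-copy step as characterized above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Rennie or from the application: the function name copy_best, the dictionary layout of a model (weights plus a dropout hyperparameter), and the validation error values are hypothetical.

```python
import copy

def copy_best(models, validation_errors):
    """Pick the model with the lowest validation error and copy it to all slots.

    models            : list of dicts holding weights and hyperparameters
    validation_errors : one error value per model, lower is better
    Returns the index of the best model and the new list of models.
    """
    best = min(range(len(models)), key=lambda i: validation_errors[i])
    # Deep copies keep later updates to one model from leaking into the others.
    return best, [copy.deepcopy(models[best]) for _ in models]

# Hypothetical population of four models after one epoch.
models = [
    {"weights": [0.1, 0.2], "dropout": 0.5},
    {"weights": [0.3, 0.1], "dropout": 0.4},
    {"weights": [0.2, 0.2], "dropout": 0.3},
    {"weights": [0.0, 0.4], "dropout": 0.2},
]
errors = [0.31, 0.27, 0.22, 0.29]
best_index, next_models = copy_best(models, errors)
```

Each copy would then be trained further under its own settings before the next selection.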

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571) 272-9354.  The examiner can normally be reached on Monday – Friday, 9 am – 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.








/S.C./Examiner, Art Unit 2122

/VIKER A LAMARDO/Primary Examiner, Art Unit 2126