DETAILED ACTION
Status of the Claims
This action is in response to the application filed on 6/12/2019 for application 16/270,681. Claims 1 – 33 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4/17/2019 and 9/25/2019 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

The information disclosure statement filed on 7/6/2021 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been fully considered.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: central processing unit (CPU) 572, user interface output devices 576, network interface subsystem 574,  Subsystem 578 and memory (ROM) 534.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: ROM 535, CPU 582, Network Interface Subsystem 585, User Interface Output devices 586 and Deep Learning Processor (GPU, FPGA) 588.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of 


Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 31 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 31 recites the term “a pseudo-task”. The specification does not define the term, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In light of the non-patent literature “Meyerson, Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing – and Back” submitted by the inventor, the examiner interprets a pseudo-task as one of the tasks trained within multiple tasks in a model (Meyerson, abs. ln. 5 – 7).

Claims 3, 11 – 13, 21, 23 – 27 and 30 – 33 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claim 3 recites the limitations “a particular one of the decoders” and “a particular one of the classification tasks”. There is insufficient antecedent basis for these limitations in the claim or its base claim. Base Claim 1 recites “numerous decoders” and “corresponding classification tasks”; however, one of ordinary skill in the art would not be 

Claim 11 recites the limitations “the decoder” and “the decoder layer”. There is insufficient antecedent basis for these limitations in the claim or its base claim. Base Claim 1 recites “numerous decoders”; however, one of ordinary skill in the art would not be able to determine which decoder the claim refers to and thus would not be reasonably apprised of the scope of the invention. For examination purposes, the examiner interprets the terms as “one of the decoders” and “the decoder layer of the one of the decoders”.

Claims 12 and 13 recite the limitation “the classification layer”. There is insufficient antecedent basis for this limitation in the claims or their base claim. One of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, the examiner interprets the term as “the classification layer of one of the decoders”.

Claim 21 recites the limitations “the training input”, “the inference input” and “the group”. There is insufficient antecedent basis for these limitations in the claim or its base claim. One of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, the examiner interprets the terms as “a training input”, “an inference input” and “a group”.



Claim 30 recites the limitation “the model”. There is insufficient antecedent basis for this limitation in the claim. One of ordinary skill in the art would not be able to determine whether the claim is referring to “the underlying multitask model”, “one of the plurality of decoder models”, “the decoder model” or “one of a plurality of task models” and thus would not be reasonably apprised of the scope of the invention. For examination purposes, the examiner interprets the term as referring to the pseudo-task augmentation system.
Dependent Claims 31 – 33 are rejected for the same reason.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –




(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1 – 7, 10 – 12, 18, 21 – 22, 28 – 31 and 33 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998.

Regarding Claim 1, Caruana discloses: 
A neural network-based model coupled to memory and running on one or more parallel processors (Caruana, intro ln. 8 – 9, multi-task learning in artificial neural nets; page 218, para. 3, ln. 7 – 10, where experiments run on different workstations, including SPARC [parallel processors], with different memory sizes), comprising: 
an encoder that processes an input and generates an encoding (Caruana, page 19 & fig. 1.2, where the bottom 2 layers [encoder] take the input and create an encoded representation of the input); 
numerous decoders that are grouped into sets of decoders in dependence upon corresponding classification tasks (Caruana, page 19 & fig. 1.2, where the top layer nodes are decoders, each forming a group of decoding tasks; page 107, sec. 4.4, ln. 8 – 9, where multiple outputs on the MTL net code the Boolean as 0/1 on one output, 0.15/0.85 on another, and 0.25/0.75 on a third [multiple classification tasks]),
that respectively receive the encoding as input from the encoder, thereby forming encoder-decoder pairs which operate independently of each other when performing the 
and that respectively process the encoding and produce classification scores for classes defined for the corresponding classification tasks (Caruana, page 107, sec. 4.4, ln. 8 – 9, where multiple outputs on the MTL net code the Boolean as 0/1 on one output, 0.15/0.85 on another, and 0.25/0.75 on a third [classification scores for classes defined for the corresponding classification tasks]); 
and a trainer that jointly trains the encoder-decoder pairs over one thousand to millions of backpropagation iterations to perform the corresponding classification tasks (Caruana, page 88, fig. 3.3, where MTL of 6 tasks is trained jointly with backprop over millions of iterations).
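
For illustration only, the mapping applied above (shared lower layers read as the encoder, each top-layer output unit read as a decoder) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Caruana or from the application: the function names, the weight values and the task names are hypothetical, and a single sigmoid hidden layer stands in for the lower two layers of Caruana fig. 1.2.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(x, enc_weights):
    # Shared encoder: one hidden layer of sigmoid units applied to the input.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in enc_weights]

def decode(encoding, dec_weights):
    # One decoder: a single sigmoid output unit that scores its own task.
    return sigmoid(sum(w * h for w, h in zip(dec_weights, encoding)))

def forward(x, enc_weights, decoders):
    # Every decoder receives the same encoding and is evaluated independently.
    encoding = encode(x, enc_weights)
    return {task: decode(encoding, w) for task, w in decoders.items()}

# Hypothetical weights and task names, chosen only for the illustration.
enc_weights = [[0.5, -0.2], [0.1, 0.3]]
decoders = {"task_a": [1.0, -1.0], "task_b": [0.2, 0.4], "task_c": [-0.5, 0.5]}
scores = forward([1.0, 2.0], enc_weights, decoders)
```

In this sketch, changing the weights of one decoder leaves the scores of the other decoders unchanged, which is the sense in which the encoder-decoder pairs operate independently of each other.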

Regarding Claim 2, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein the trainer is further configured to comprise: a forward pass stage that processes training inputs through the encoder and resulting encodings through the decoders to compute respective activations for each of the training inputs; a backward pass stage, that, over each of the one thousand to millions of backpropagation iterations determines gradient data for the decoders for each of the training inputs in dependence upon a loss function (Caruana, fig. 1.2, where during training the training data is input into the lower two layers [forward pass through encoder] and the output of the lower two layers [encoder] is processed by the top layer [decoder] to compute the respective outputs; page 107, sec. 4.4, ln. 3, where sigmoid output units are used [activations]; page 67, para. 2, ln. 6 – 8, where the backprop error gradients are summed in the hidden layer shared by the tasks),
averages the gradient data determined for the decoders, and determines gradient data for the encoder by backpropagating the averaged gradient data through the encoder (Caruana, page 98, para. 2, ln. 5 – 9, where backpropagation uses a shared hidden layer … the error gradients are summed at the shared hidden layer; page 70, para. 2, ln. 8 – 9, & footnote, where the MTL benefit may depend on the learning algorithm, including the learning rate; the effective learning delta is the backpropagated gradient times the learning rate. By summing the error gradients of multiple tasks and setting a proper learning rate, the effect of the backpropagation is a weighted average of the gradients from each task);
an update stage that modifies weights of the encoder in dependence upon the gradient data determined for the encoder (Caruana, fig. 1.2, where during training the weights of the lower layers [encoder] are updated [update stage; modify weights] based on the determined gradient); 
and a persistence stage that, upon convergence after a final backpropagation iteration, persists in the memory the modified weights of the encoder derived by the training to be applied to future classification tasks (Caruana, fig. 1.2, where the lower layer model parameters [encoder weights] are stored [persistence stage; persist] in memory during training and when training is complete [convergence after a final backpropagation iteration], and the stored model parameters can be used later for classification tasks).
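
The reasoning relied upon above, that summing the per-task error gradients and scaling the learning rate has the same effect as averaging them, can be checked with a small worked example. The sketch below is illustrative only and assumes plain gradient descent on a two-weight encoder; the gradient values, the learning rate and the function names are hypothetical and do not come from Caruana.

```python
def sgd_step(weights, grads, lr):
    # Plain gradient descent: the weight delta is the gradient times the learning rate.
    return [w - lr * g for w, g in zip(weights, grads)]

def summed(task_grads):
    # Gradient reaching the shared layer when per-task gradients are summed.
    return [sum(col) for col in zip(*task_grads)]

def averaged(task_grads):
    # Gradient reaching the shared layer when per-task gradients are averaged.
    n = len(task_grads)
    return [sum(col) / n for col in zip(*task_grads)]

# Hypothetical gradients from four decoders with respect to two encoder weights.
task_grads = [[0.5, -1.0], [1.5, 3.0], [1.0, 1.0], [1.0, 1.0]]
encoder = [1.0, 2.0]
lr = 0.5

step_avg = sgd_step(encoder, averaged(task_grads), lr)
# Summing with the learning rate divided by the number of tasks gives the same update.
step_sum = sgd_step(encoder, summed(task_grads), lr / len(task_grads))
```

Under these assumptions the two updates coincide, so the difference between summing and averaging reduces to the choice of learning rate.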


Regarding Claim 3, Caruana further discloses:
further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of a particular one of the decoders derived by the training to perform a particular one of the classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by the particular one of the decoders processing the encodings to output classification scores for classes defined for the particular one of the classification tasks (Caruana, fig. 1.1, where during inference mode the model uses the trained [modified] weights in both the lower layer [encoder] and the top layer [decoder] to perform its classification tasks. The input is processed by the lower layer [encoder] to produce encoded representations [encoding] for the top layer [decoder]).

Regarding Claim 4, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of two or more of the decoders derived by the training to respectively perform two or more of the classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by the two or more of the decoders respectively processing the encodings to output classification scores for classes defined for the two or more of the classification tasks (Caruana, fig. 1.2, where during inference mode the model uses the trained [modified] weights in both the lower layer 

Regarding Claim 5, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein each training input is annotated with a plurality of task-specific labels for the corresponding classification tasks (Caruana, page 111, sec. 4.6, para. 2, ln. 1 – 2, where a good example of this is in scene analysis, where human expertise is often required to label [annotated] important features [task-specific labels for the corresponding classification tasks]).

Regarding Claim 6, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein a plurality of training inputs for the corresponding classification tasks are fed in parallel to the encoder as input in each forward pass iteration (Caruana, fig. 1.2, where a plurality of training inputs for the corresponding classification tasks are fed in parallel to the lower layer [encoder] in each forward pass iteration),
and wherein each training input is annotated with a task-specific label for a corresponding classification task (Caruana, page 111, sec. 4.6, para. 2, ln. 1 – 2, where a good example of this is in scene analysis, where human expertise is often required to label [annotated] important features [task-specific labels for the corresponding classification tasks]).


Regarding Claim 7, Caruana further discloses:
wherein the loss function is cross entropy (Caruana, page 139, para. 3, ln. 2, where a normalized cross-entropy loss function is used) that uses either a maximum likelihood objective function, a policy gradient function (Caruana, page 139, para. 2, ln. 1 – 2, where the net is trained using conjugate gradient; by using the cross-entropy loss and the gradient, the reinforcement learning is a policy gradient), or both.

Regarding Claim 10, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein each decoder further comprises at least one decoder layer and at least one classification layer (Caruana, fig. 4.3, where each top-layer node [decoder; decoder layer; classification layer] decodes the encoded representation from the lower layer and classifies the task of its hospital).

Regarding Claim 11, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein the decoder is a fully-connected neural network (abbreviated FCNN) and the decoder layer is a fully-connected layer (Caruana, fig. 1.1, where the top layer [decoder; fully connected neural network, decoder layer, fully connected layer] is fully connected).


Regarding Claim 12, Caruana further discloses:
wherein the classification layer is a sigmoid classifier (Caruana, page 107, sec. 4.4, ln. 3, where sigmoid output units are used).

Regarding Claim 18, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein the encoder is a fully-connected neural network (abbreviated FCNN) with at least one fully-connected layer (Caruana, fig. 1.2, where the lower two layers [encoder; fully connected neural network] are fully connected).

Regarding Claim 21, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
wherein the input, the training inputs, and the inference inputs are selected from the group consisting of image data, text data and genomic data (Caruana, page 37, para. 2, ln. 1 – 2, where the training set contains images).

Regarding Claim 22, Caruana discloses the neural network-based model of Claim 1. Caruana further discloses:
further configured to comprise an initializer that initializes the decoders with random weights (Caruana, page 84, sec. 3.3.6, ln. 3 – 4, where random initialization is critical to MTL backprop).

Regarding Claim 28, Claim 28 is the method claim corresponding to Claim 1. Caruana discloses the neural network-based model of Claim 1. Claim 28 is rejected for the same reason as Claim 1. 

Regarding Claim 29, Claim 29 is the non-transitory computer readable storage medium claim corresponding to Claim 1. Caruana discloses the neural network-based model of Claim 1. Caruana further discloses: a non-transitory computer readable storage medium impressed with computer program instructions which, when executed on a processor, implement methods (Caruana, page 248, para. 2, ln. 9 – 14, where the method is written in computer instruction code which is stored on a hard drive [non-transitory computer readable storage medium] and can be executed by a processor to perform the disclosed method). Claim 29 is rejected for the same reason as Claim 1. 

Regarding Claim 30, Caruana discloses: 
A pseudo-task augmentation system, comprising:
an underlying multitask model that embeds task inputs into task embeddings (Caruana, fig. 1.2, where the lower two layers [multitask model] take input and calculate [embed] the encoded representation [task embeddings] for the top layer to perform multiple tasks); 
a plurality of decoder models that project the task embeddings into distinct classification layers (Caruana, fig. 1.2, where the top layer performs multiple classification tasks 
wherein a combination of the multitask model and a decoder model in the plurality of decoder models defines a task model, and a plurality of task models populate a model space (Caruana, fig. 1.1, where the lower layer [multitask model] and each of the top nodes [decoder] combine into a plurality of models [task model] to perform defined tasks and produce outputs [model space]);
and a traverser that traverses a model space and determines a distinct loss for each task model in the model based on a distinct gradient during training (Caruana, page 67, para. 2, ln. 6 – 8, where during training the backprop error gradients [loss for each task model; distinct gradient] are summed in the hidden layer shared by the tasks).

Regarding Claim 31, Caruana discloses the pseudo-task augmentation system of Claim 30. Caruana further discloses:
wherein a task coupled with a decoder model and its parameters defines a pseudo-task for the underlying multitask model (Caruana, fig. 1.1, where each task decoder model, including its weights, is used to train the lower layers [multitask model]).

Regarding Claim 33, Caruana discloses the pseudo-task augmentation system of Claim 30. Caruana further discloses:
the output unit for the correlated extra feature uses a linear transfer function).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


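The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.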
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 8, 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998 in view of Loy, Facial landmark detection by deep multi-task learning. In Proceedings of ECCV’14, 2014.

Regarding Claim 8, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.
Loy explicitly discloses: 
wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest (Loy, fig. 3, where the feature extraction network [encoder] is a convolutional neural network with a plurality of convolutional layers).


Regarding Claim 9, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein the encoding is convolution data.
Loy explicitly discloses:
wherein the encoding is convolution data (Loy, fig. 3, where the shared features [encoding] are based on convolved data).

Regarding Claim 13, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein the classification layer is a softmax classifier.
Loy explicitly discloses:
wherein the classification layer is a softmax classifier (Loy, page 5, para. 2, ln. 13 – 14, where a softmax function is used at the classifier output).

Claims 14 – 16 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998 in view of Dong, Multi-task learning for multiple language translation. In Proc. of ACL, pp. 1723–1732, 2015.

Regarding Claim 14, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.
Dong explicitly discloses: 
wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network (Dong, fig. 2, where the lower network [encoder] is a recurrent neural network; fig. 1, where a gated recurrent neural network is shown).
Caruana and Dong both teach multi-task learning with neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Dong’s disclosure of a recurrent multi-task model to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to achieve significantly higher translation quality (Dong, abs., ln. 17 – 22).

Regarding Claim 15, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein the encoding is hidden state data.
Dong explicitly discloses: 
wherein the encoding is hidden state data (Dong, fig. 2 & eq. 9, where h_t [encoding] is a recurrent neural network hidden state at time t). 

Regarding Claim 16, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network. 
Dong explicitly discloses: 
wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network (Dong, fig. 2 & fig. 1, where each of the language models is a gated recurrent model).

Claims 17, 19, 20 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998 in view of Huang, US20170256254, Modular Deep Learning Model, 2017.

Regarding Claim 17, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.

Huang explicitly discloses: 
wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest (Huang, fig. 3 & para. 0028, where decoders 381, 382, and 383 can be convolutional with multiple layers arranged in a sequence from lowest to highest).
Caruana and Huang both teach multi-task neural networks and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Huang’s disclosure of multiple types of decoder model to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to improve the accuracy of the model (Huang, para. 0024, ln. 9 – 10).

Regarding Claim 19, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type. 
Huang explicitly discloses: 
wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type (Huang, fig. 3 & para. 0028, where multiple decoders 381, 382 and 383 are shown; technology can be used for decoding … such as deep neural network DNN, convolutional neural network CNN, long short term memory recursive neural network LSTM-RNN or a Convolutional Long Short-Term Memory Deep Neural Network CL-DNN).

Regarding Claim 20, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs). 
Huang explicitly discloses: 
wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs) (Huang, fig. 3 & para. 0028, where multiple decoders 381, 382 and 383 have multiple layers arranged in sequence from low to high; technology can be used for decoding … such as deep neural network DNN, convolutional neural network CNN, long short term memory recursive neural network LSTM-RNN or a Convolutional Long Short-Term Memory Deep Neural Network CL-DNN; para. 0026, ln. 4, where fully connected layers [fully connected neural network] are used).


Regarding Claim 23, Caruana does not explicitly disclose: 
further configured to comprise the initializer that freezes weights of some decoders for certain number of backpropagation iterations while updating weights of at least one high performing decoder among the decoders over the certain number of backpropagation iterations,
and wherein the high performing decoder is identified based on performance on validation data.
Huang explicitly discloses: 
further configured to comprise the initializer that freezes weights of some decoders for certain number of backpropagation iterations while updating weights of at least one high performing decoder among the decoders over the certain number of backpropagation iterations (Huang, fig. 6 & para. 0086, where the female voice specific sub-module is trained … holding the values constant [freeze weight] in all other sub-modules [some decoders] … using female voice specific input data, during the training iterations [backpropagation iterations] that update the female voice specific sub-module [high performing decoder]),
and wherein the high performing decoder is identified based on performance on validation data (Huang, para. 0056, ln. 9 – 11, where training data for a female voice may be used to train the female voice specific sub-module to more accurately [higher performance] process female voice data; i.e., while processing female voice data, the female voice specific sub-module has higher performance than the other modules).
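
For illustration only, the claimed behaviour of freezing some decoders while updating a high performing decoder selected on validation data can be sketched as below. This is a minimal sketch under stated assumptions and is not code from Huang or from the application: the decoder names, the gradient values and the validation scores are hypothetical, and a fixed gradient stands in for real backpropagation.

```python
def pick_best(validation_scores):
    # Identify the high performing decoder from (hypothetical) validation accuracy.
    return max(validation_scores, key=validation_scores.get)

def train_step(decoders, grads, frozen, lr=0.1):
    # Frozen decoders keep their weights; all other decoders take a gradient step.
    updated = {}
    for name, weights in decoders.items():
        if name in frozen:
            updated[name] = list(weights)
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated

# Hypothetical decoder weights, gradients and validation scores.
decoders = {"d1": [0.2, 0.4], "d2": [0.1, -0.3], "d3": [0.5, 0.5]}
grads = {"d1": [1.0, 1.0], "d2": [1.0, -1.0], "d3": [0.5, 0.5]}
validation_scores = {"d1": 0.71, "d2": 0.88, "d3": 0.64}

best = pick_best(validation_scores)
frozen = set(decoders) - {best}
for _ in range(3):  # "certain number" of iterations, chosen arbitrarily here
    decoders = train_step(decoders, grads, frozen)
```

After the loop, only the weights of the selected decoder have changed; the weights of the frozen decoders are identical to their starting values.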

Claims 24 – 27 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, R. Multitask learning. In Learning to learn, pp. 95–133. Springer US, 1998 in view of Rennie, Annealed dropout training of deep networks, 2014 IEEE Spoken Language Technology Workshop (SLT) 2014.

Regarding Claim 24, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
further configured to comprise the initializer that periodically and randomly drops out weights of the decoders after certain number of backpropagation iterations.
Rennie explicitly discloses: 
further configured to comprise the initializer that periodically and randomly drops out weights of the decoders after certain number of backpropagation iterations (Rennie, sec. 3.3, ln. 4 – 10, where during iteration i is initialized [periodically; certain number of backpropagation iterations] … drop out probability [randomly drop out]).
Caruana and Rennie both teach neural network learning techniques and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Rennie’s disclosure of an annealed dropout training technique to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to improve performance (Rennie, abs. ln. 3 – 4).


Regarding Claim 25, Caruana does not explicitly disclose: 
further configured to comprise the initializer that periodically perturbs weights of the decoders after certain number of backpropagation iterations by adding random noise to the weights.
Rennie explicitly discloses: 
further configured to comprise the initializer that periodically perturbs weights of the decoders after certain number of backpropagation iterations by adding random noise to the weights (Rennie, sec. 3.3, ln. 4 – 10, where during iteration i is initialized [periodically; certain number of backpropagation iterations] … with a lower average number of non-zero weights, and higher variance in the number of active weights [noise to the weights of the model]).

Regarding Claim 26, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
further configured to comprise the initializer that periodically perturbs hyperparameters of the decoders after certain number of backpropagation iterations by randomly changing a rate at which weights of the decoders are randomly dropped out. 
Rennie explicitly discloses: 
further configured to comprise the initializer that periodically perturbs hyperparameters of the decoders after certain number of backpropagation iterations by randomly changing a rate at which weights of the decoders are randomly dropped out (Rennie, sec. 4.7, ln. 3 – 11, where at each epoch [periodically; certain number of backpropagation iterations] … the model that lowers … error rate the most is then selected; i.e., at each epoch, the dropout rate is changed based on the performance of the different setups. Since the relative performance of the different setups cannot be predicted, the next dropout rate is random).

Regarding Claim 27, depending on Claim 1, Caruana discloses a neural network-based model of Claim 1. Caruana does not explicitly disclose: 
further configured to comprise the initializer that identifies at least one high performing decoder among the decoders after every certain number of backpropagation iterations and copies weights and hyperparameters of the high performing decoder to the other decoders, and wherein the high performing decoder is identified based on performance on validation data. 
Rennie explicitly discloses: 
further configured to comprise the initializer that identifies at least one high performing decoder among the decoders after every certain number of backpropagation iterations and copies weights and hyperparameters of the high performing decoder to the other decoders, and wherein the high performing decoder is identified based on performance on validation data (Rennie, sec. 4.7, ln. 3 – 11, where tree search approach … each epoch produces 4 updated models … the model that lowers the … error rate the most is then selected. The selected model is copied and used by the other 3 models in the next epoch to continue the training).
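For illustration only, the select-and-copy limitation mapped above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Rennie or from the application: each decoder is assumed to be a dictionary holding `weights`, `hparams` and a precomputed validation accuracy `val_acc`, and the function name `exploit` is hypothetical.

```python
import copy

def exploit(decoders, val_score):
    # Identify the high performing decoder by its validation score.
    best = max(decoders, key=val_score)
    # Copy its weights and hyperparameters to every other decoder.
    for d in decoders:
        if d is not best:
            d["weights"] = copy.deepcopy(best["weights"])
            d["hparams"] = copy.deepcopy(best["hparams"])
    return best

# Hypothetical decoders and validation accuracies, for illustration only.
decoders = [
    {"name": "A", "weights": [0.1, 0.2], "hparams": {"dropout": 0.5}, "val_acc": 0.71},
    {"name": "B", "weights": [0.4, -0.3], "hparams": {"dropout": 0.2}, "val_acc": 0.83},
    {"name": "C", "weights": [0.0, 0.9], "hparams": {"dropout": 0.1}, "val_acc": 0.64},
]
best = exploit(decoders, lambda d: d["val_acc"])
```

A deep copy is used so that later updates to one decoder do not silently change the others; in a training loop this step would run once after every chosen number of backpropagation iterations.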

Regarding Claim 32, depending on Claim 30, Caruana discloses a system of Claim 30. Caruana does not explicitly disclose: 
further comprising a selector that selects a best performing decoder model for a given task.
Rennie explicitly discloses: 
further comprising a selector that selects a best performing decoder model for a given task (Rennie, sec. 4.7, ln. 3 – 11, where tree search approach … each epoch produces 4 updated models … the model that lowers the … error rate the most is then selected).
Caruana and Rennie both teach neural network learning techniques and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Caruana’s disclosure of a multitask learning model with Rennie’s disclosure of an annealed dropout training technique to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to improve performance (Rennie, abs. ln. 3 – 4).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached on Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.C./Examiner, Art Unit 2122

/ERIC NILSSON/Primary Examiner, Art Unit 2122