DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The use of the term JAVA®, which is a trade name or a mark used in commerce, has been noted in this application. It should be capitalized wherever it appears and be accompanied by the generic terminology. It should also include a ®, TM or SM, whichever is appropriate.
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because it recites “a computer program product comprising a computer readable medium”, which can fall into the impermissible software per se category: the claim merely describes a program product with no tangible form or structure, comprising a computer readable medium that can engage in a signal per se transmission of data (see MPEP §2106.03). Software per se and signals per se are non-statutory subject matter. Therefore, claim 15 is rejected.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.

Claims 1-3, 5, 7, 8, 10-12, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) in view of Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang).

Regarding claim 1, Mnih teaches:
A system configured for training first and second neural network models, the system comprising ([0061] and [0083]-[0084]: describing a system for training first and second neural networks (NNs) as shown in Fig. 5a.):
a memory comprising instruction data representing a set of instructions ([0030], [0032], and [0084]: describing various memory that can store codes or instructions for implementing the system and training of the neural networks.); 
a processor configured to communicate with the memory and to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to ([0030]-[0032] and [0084]: describing a processor that operates in conjunction with the memory to execute the codes or instructions stored in the memory. This is shown in Fig. 5b.):  
…;
([0019], [0060]-[0061], and [0063]-[0065]: describing the training of “the second neural network (neural network 1)” and updating/adjusting weights of the second NN, wherein the training comprises Q-value computation using input state data, i.e. first dataset.); and 
adjust the corresponding weight in the first model based on the updated weight in the second model ([0064] and [0066]-[0067]: describing that weights in the first NN model (“neural network 0”) can be adjusted based on the updated training of the second NN model, wherein the adjustment comprises updated weight values of the second NN model being copied to the first NN model. See also [0053]-[0055] and [0066]-[0067]: describing the computation that includes updating the weights and the related pseudocode, respectively.).

While the cited reference Mnih teaches the above limitations of claim 1, it does not explicitly teach: “set a weight in the second model based on a corresponding weight in the first model” on line 7. Kang teaches: setting a weight in the student NN model (i.e. the second model) based on the weight of the teacher NN model (i.e. the first model) (Kang [0045], [0092], and [0106]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the setting of the weight in the second NN model as taught in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]).
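For illustration only, the sequence recited in claim 1 (set a weight in the second model from the first, train the second model, then adjust the first model from the second) can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout, the function names `set_from`, `adjust_from`, and `train_step`, and the stand-in gradient update are all hypothetical and are not taken from Mnih, Kang, or the application.

```python
from typing import Dict, List

# Hypothetical layout: layer name -> flat list of weights.
Weights = Dict[str, List[float]]


def set_from(first: Weights, second: Weights, layers: List[str]) -> None:
    """Set weights in the second model from the corresponding weights in the first."""
    for name in layers:
        second[name] = list(first[name])


def train_step(model: Weights, grads: Weights, lr: float = 0.5) -> None:
    """Stand-in for training on the first dataset: one gradient-descent update."""
    for name, g in grads.items():
        model[name] = [w - lr * gi for w, gi in zip(model[name], g)]


def adjust_from(second: Weights, first: Weights, layers: List[str]) -> None:
    """Adjust the first model from the updated second model (assumed here: a plain copy)."""
    for name in layers:
        first[name] = list(second[name])


first = {"hidden": [1.0, 2.0], "output": [0.5]}
second = {"hidden": [0.0, 0.0], "output": [0.25]}
shared = ["hidden"]  # layers assumed to correspond between the two models

set_from(first, second, shared)              # second.hidden <- first.hidden
train_step(second, {"hidden": [1.0, -1.0]})  # second.hidden updated by training
adjust_from(second, first, shared)           # first.hidden <- updated second.hidden
```

Only the layers named in `shared` are exchanged, so each model's output layer stays independent in this sketch.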

Regarding claim 2, the rejection of claim 1 is incorporated. Kang further teaches:
A system as in claim 1 wherein the weight comprises a weight in one of: 
an input layer of the second model; and 
a hidden layer of the second model (Kang [0083], [0085]-[0086], and [0092]: describing the hidden layers including classifier layers in the student NN model and the weights in those layers as shown in Fig. 5. Wherein the student NN model denotes the second model as was previously described.).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the weight in the hidden layer of the student NN model as taught in Kang. Doing so would enable training of the student NN model via its hidden layers and classifier layers to update the various connection weights in the student NN model (Kang [0087]-[0088]).


Regarding claim 3, the rejection of claim 1 is incorporated. Mnih teaches:
A system as in claim 1 wherein causing the processor to adjust the corresponding weight in the first model comprises causing the processor to: 
copy a value of the weight from the second model to the corresponding weight in the first model ([0064]: describing that “weights from the second[] trained neural network are copied across to the first neural network”.).

Regarding claim 5, the rejection of claim 1 is incorporated. Kang further teaches:
A system as in claim 1 wherein causing the processor to adjust the corresponding weight in the first model further comprises causing the processor to: 
set a weight in an output layer of the first model to an arbitrary value (Kang [0055]: describing setting an initial weight of the teacher NN model, i.e. the first model as previously described, to a random initial value. Wherein the teacher NN model comprises an output layer (Kang [0039] and [0041]). Thus, the random initialization of the teacher NN model includes the random initialization of the output layer in the teacher NN model.). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the random weight in the teacher NN model as taught in Kang. Doing so would enable “[t]he plurality of teacher models may have different initial weights” (Kang [0054]) as part of “a process of selecting one teacher model from a plurality of teacher models to train a student model” (Kang [0051]).


Regarding claim 7, the rejection of claim 1 is incorporated. Kang further teaches:
A system as in claim 1 wherein causing the processor to set a weight in the second model comprises causing the processor to: 
copy a value of a weight from one of: an input layer of the first model; and a hidden layer of the first model, to a corresponding weight in the second model (Kang [0086] and [0092]: describing that the weights of the student NN model (i.e. the second model) at the classifier layers (i.e. hidden layers) are copied from the corresponding classifier layers (i.e. hidden layers) of the teacher NN model (i.e. the first model). This is shown in Fig. 5. See also Kang [0041] and [0046]-[0047]: further describing the NN models with the hidden/classifier layers.).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the copying of the weights from the teacher NN model to the student NN model as taught in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]).

Regarding claim 8, the rejection of claim 1 is incorporated. Kang further teaches:
A system as in claim 1 wherein causing the processor to set a weight in the second model further comprises causing the processor to:  
set at least one weight in an output layer of the second model to an arbitrary value (Kang [0086] and [0092]: describing that the weight of the output layer of the student NN model (i.e. second model) can be set to an initial weight of the teacher NN model (i.e. first model). Wherein the initial weight of the teacher NN model is random/arbitrary (Kang [0055]). This enables the weight of the output layer of the student NN model to be random/arbitrary.).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the setting of a random weight in the student NN model as taught in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]).

Regarding claim 10, the rejection of claim 1 is incorporated. Mnih teaches:
A system as in claim 1 wherein the first model comprises one of:  
a model configured to produce a single output ([0062]: describing that the first NN model determines a target Qy value, i.e. a single output value.); and
a model configured to produce a plurality of outputs; and 
wherein the second model comprises the other one of: 

a model configured to produce a plurality of outputs ([0061]: describing that the second NN model determines a set of output Q values, i.e. a plurality of output values.).

Regarding claim 11, the rejection of claim 1 is incorporated. Mnih teaches:
A system as in claim 1 wherein the set of instructions, when executed by the processor, further cause the processor to: 
adjust a weight in one of: 
the first model ([0064]: describing that weights in the first NN model are being changed/updated.); and
the second model; 
in response to further training of the other one of: 
the first model; and 
the second model ([0064]: describing that weights in the first NN model are being changed/updated as “the training of the second neural network proceeds”.).

Regarding claim 12, the rejection of claim 11 is incorporated. Kang further teaches:
A system as in claim 11 wherein the set of instructions, when executed by the processor, cause the processor to repeat the step of adjusting a weight, until one or more of the following criteria are met: 
the first model and/or the second model reach a threshold accuracy level (Kang [0044], [0062], [0088], [0110] and [0133]: describing adjusting of weights in the student NN model (i.e. second model) via error backpropagation and optimizing an objective function to update the “connection weights” accordingly such that the student NN model reaches a threshold accuracy.);
the magnitude of an adjustment falls below a threshold magnitude;
said weight in the first model and its corresponding weight in the second model converge towards one another within a predefined threshold; and 
a loss associated with the first model and/or a loss associated with the second model changes by less than a threshold amount between subsequent adjustments.
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the weight adjustment in the student NN model in correlation with an accuracy threshold as taught in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]).
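For illustration only, the four alternative stopping criteria recited in claim 12 can be expressed as a single predicate that is true when one or more of them is met. This is a minimal sketch: the `Thresholds` class, the function `should_stop`, and all default threshold values are hypothetical assumptions and do not come from Kang or the application.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical threshold values, chosen only for the example."""
    accuracy: float = 0.9      # threshold accuracy level
    magnitude: float = 1e-3    # threshold magnitude of an adjustment
    weight_gap: float = 1e-3   # convergence threshold between corresponding weights
    loss_delta: float = 1e-4   # threshold change in loss between adjustments


def should_stop(
    accuracy: float,
    adjustment: float,
    w_first: float,
    w_second: float,
    prev_loss: float,
    curr_loss: float,
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True when one or more of the four criteria is met."""
    return (
        accuracy >= t.accuracy                      # accuracy threshold reached
        or abs(adjustment) < t.magnitude            # adjustment below threshold magnitude
        or abs(w_first - w_second) < t.weight_gap   # corresponding weights converged
        or abs(prev_loss - curr_loss) < t.loss_delta  # loss change below threshold
    )
```

A training loop would call `should_stop` after each adjustment and repeat the adjusting step while it returns `False`.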

Regarding independent claim 14, claim 14 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 14 is a method claim that corresponds to system claim 1.



Regarding independent claim 15, the rejection of claim 14 is incorporated. Mnih teaches:
A computer program product comprising 
a computer readable medium, the computer readable medium having computer readable code embodied therein ([0030], [0032], and [0084]: describing various computer readable memory that can store codes or instructions for implementing the system and training of the neural networks.), 
the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method as claimed in claim 14 ([0030]-[0032] and [0084]: describing a processor that operates in conjunction with the memory to execute the codes or instructions stored in the memory. This is shown in Fig. 5b.).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) and Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang) in view of Aslan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) and Takatori et al. (U.S. Pat. No. 5,430,829, hereinafter Takatori).

Regarding claim 4, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “apply an increment to a value of the corresponding weight in the first model,….” Takatori teaches: application of an increase/increment to a weight value in the neural network whereby such increase can occur via a predetermined value, e.g. an increase of 5% (Takatori col. 4 lines 12-28 and 67; col. 5, lines 24-40; col. 6, lines 8-30; and col. 7, lines 2-30). The neural network being a first neural network (col. 3, lines 6-15 and col. 4, lines 52-55) as shown in Figs. 1 and 2.
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models along with the weight and locations of the weights in the student and teacher NN models in the combined cited references to include the increase in weights as taught in Takatori. Doing so would enable “[t]he weight increase of synapse for a learning change … with respect to the number of learnings. The learning for a whole system is gradually performed by a plurality of learnings, and a slight adjustment is performed with a little change at the end of learning. Learning speed is heightened as weight increases rapidly at the beginning of learning.” (Takatori col. 4, lines 43-50).

While the cited reference Takatori teaches the above limitations of claim 4, it does not explicitly teach: “based on the difference between the corresponding weight in the first model and the weight in the second model”. Aslan teaches: an objective function computation “with respect to weight parameters multiple models being trained” that comprises a difference determination between the first and second NN models (Aslan [0033] and [0036]-[0039]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the increase in the weights in the cited reference to include the difference determination between the two NN models as taught in Aslan. Doing so would enable “[j]oint training of the first model 100 and the second model 102 involves training the models 100 and 102 in parallel such that at least one of the models 100 and/or 102 influences the training of the other model… In this sense, the second (student) model 102 can be considered to be learning from the first (teacher) model 100 as the first model 100 learns.” (Aslan [0026]). Wherein “the second (student) model 102, is able to “see” what another model, such as the first (teacher) model 100, is learning by virtue of terms in the objective function that is optimized for training the respective models 100 and 102” (Aslan [0028]).
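For illustration only, the claim 4 limitation (applying an increment to the first model's weight based on the difference between the corresponding weights) can be sketched as a single update rule. This is a minimal sketch: the function name `increment_toward` and the proportional `rate` parameter are hypothetical assumptions and are not taken from Takatori, Aslan, or the application.

```python
def increment_toward(first_w: float, second_w: float, rate: float = 0.5) -> float:
    """Move the first model's weight toward the second model's weight.

    The increment is proportional to the difference between the two weights;
    the proportional form and the default rate are assumptions for the example.
    """
    increment = rate * (second_w - first_w)
    return first_w + increment


halfway = increment_toward(1.0, 2.0, rate=0.5)    # increment of 0.5
full_copy = increment_toward(1.0, 2.0, rate=1.0)  # increment of 1.0
```

With `rate=1.0` the update reduces to copying the second model's value, which is the claim 3 alternative; smaller rates apply a partial increment.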

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) and Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang) in view of Go et al. “Multigradient: A New Neural Network Learning Algorithm for Pattern Classification” (hereinafter Go).

Regarding claim 6, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “maintain a value of at least one weight in an output layer of the first model at the same value”. Go teaches: the computation for adjusting weights in the output neurons of the neural network, wherein output neurons reside in the output layer of the neural network (Go Section II). Based on this computation, “we ignore the output neurons that exceed the target values and concentrate on the output neurons that do not meet the target values, updating weights accordingly” (Go Section II). That is, the weights in the ignored output neurons of the output layer are not being changed and are being maintained, while the other weights in the non-ignored output neurons are being changed accordingly. This results in at least one weight value in the output layer being maintained. The neural network denotes a first model. 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models along with the weight and locations of the weights in the student and teacher NN models in the combined cited references to include the maintaining of the weights in the output layer as taught in Go. Doing so would enable a “new learning algorithm for multilayer feedforward neural networks, which converges faster and achieves a better classification accuracy than the conventional backpropagation learning algorithm for pattern classification…. In the proposed learning algorithm, we view each term of the output layer as a function of weights and adjust the weights directly so that the output neurons produce the desired outputs” (Go Abstract). That is, the new learning algorithm comprises “adjust[ing] each weight so that the output neurons can produce the desired outputs. This adjustment is accomplished by taking gradients.” (Go Section I).
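For illustration only, the claim 6 limitation (maintaining at least one output-layer weight of the first model at the same value while other weights are adjusted) can be sketched as an adjustment that skips a set of held layers. This is a minimal sketch: the dictionary layout, the function name `adjust_keeping`, and the `frozen` parameter are hypothetical assumptions and are not taken from Go or the application.

```python
from typing import Dict, List, Set

# Hypothetical layout: layer name -> flat list of weights.
Weights = Dict[str, List[float]]


def adjust_keeping(first: Weights, second: Weights, frozen: Set[str]) -> Weights:
    """Return adjusted weights for the first model.

    Layers named in `frozen` keep the first model's current values; every
    other layer takes the corresponding values from the second model.
    """
    return {
        name: list(w) if name in frozen else list(second[name])
        for name, w in first.items()
    }


first_model = {"hidden": [1.0], "output": [9.0]}
second_model = {"hidden": [2.0], "output": [3.0]}
adjusted = adjust_keeping(first_model, second_model, frozen={"output"})
```

Here the hidden-layer weight follows the second model while the output-layer weight stays at its original value.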

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) and Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang) in view of Dijkman et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0169314, hereinafter Dijkman).




Regarding claim 9, the rejection of claim 1 is incorporated. Mnih teaches:
A system as in claim 1 wherein the first model comprises one of: 
an object detection model ([0024] and [0080]-[0081]: describing that the first NN model can be used to recognize/detect local structures in the image data via filtering of the image.); and 
an object localisation model; and ….

While the cited reference Mnih teaches the above limitations of claim 9, it does not explicitly teach: “wherein the second model comprises the other one of: an object detection model; and an object localisation model.” Dijkman teaches: 
“wherein the second model comprises the other one of: 
an object detection model; and 
an object localisation model (Dijkman [0044], [0084], [0095], [0097], and [0102]: describing the NN model for object localization using bounding boxes. Wherein the NN model can operate on its respective processing unit (Dijkman [0058]), i.e. the second NN operating on the second processing unit. The processing units are shown in Fig. 2.).”
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the object localization as taught in Dijkman. Doing so would enable “[a] method of training for image classification includes labelling a crop from an image including an object of interest. The crop may be labelled with an indication of whether the object of interest is framed, partially framed or not present in the crop.” (Dijkman Abstract). Wherein the training comprises “high-quality object localization process [that] may include a bounding box proposal, a bounding box classification and bounding box regression” (Dijkman [0045]).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) and Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang) in view of Ura et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0122078, hereinafter Ura).

Regarding claim 13, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset, wherein the size of the second dataset alone is insufficient ….” Ura teaches: 
“wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset (Ura [0032] and [0040]: describing learning processes for training machine learning model (i.e. the first model), wherein the learning processes can be on a second sample data set that is larger than the first sample data set. That is, the first data set is smaller with less data than the second dataset.), 
wherein the size of the second dataset alone is insufficient (Ura [0034] and [0081]: describing that the respective machine learning model is dependent on not just training data set alone, but on other metrics, e.g. hyperparameters. Wherein the size of the second sample data set was previously described.)….”
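Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited references to include the relative sizes of the training data sets as taught in Ura. Doing so would enable “a model to be built by machine learning and its prediction performance” (Ura [0034] and [0053]).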

While the cited reference Ura teaches the above limitations of claim 13, it does not explicitly teach: “to train the second model to a predefined accuracy with arbitrarily initiated weights”. Kang teaches: training a student NN model (i.e. second model) to reach a predetermined accuracy condition (Kang [0048] and [0078]-[0080]). Wherein the weights of the student NN model can be initialized based on the initial weights of the teacher NN model, with the initial weight of the teacher NN model being random/arbitrary (Kang [0055]). This enables the weights of the student NN model to be random/arbitrary.
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the sizes of the data sets for training machine learning models in the cited reference to include the accuracy condition and random initialization of the student NN model as taught in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]).
 
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Andoni et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0130277): describing training of an ensemble of neural networks. Wherein the ensemble of neural networks can set its topology, activation functions, and connection weights. That is, the various parameters of the neural network can be configured and set. The training of the neural networks can include a backpropagation trainer to optimize the parameters of the neural networks.
Kamiya et al. (U.S. Pat. No. 5,195,169): describing learning of a neural network, wherein weight values for the synaptic connections between the neuron units of the neural network can be monitored and updated in an “optimum manner”. A weight is updated when it meets a preset condition, with the weight value then being set to a predetermined value. Weight values can be set to either 0 or some “proper value” as determined by the error backpropagation computations and the convergence of the neural network learning based on the input and output data values.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762. The examiner can normally be reached M-F 11 AM - 7 PM EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2128



/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128