DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
Acknowledgement is made of Applicant's claim amendments filed on 5/24/2022. The claim amendments are entered. Claims 1-3 and 5-21 are now pending. Claim 4 has been cancelled. Claims 1-3 and 5-15 have been amended. Claims 16-21 have been newly added.

Applicant seeks entry of claim amendments changing “A system” in the preamble of various claims to “The system”. These amendments are approved and entered.

Applicant has sufficiently amended the specification to include the requisite trademark designation. Accordingly, the specification objection is withdrawn. 

Applicant has sufficiently amended claim 15 to overcome the non-statutory subject matter issue. Accordingly, the §101 rejection against this claim is withdrawn. 

Response to Arguments
Applicant's arguments filed on 5/24/2022 have been fully considered but they are not persuasive.

Applicant argues that Aslan allegedly does not teach the claim limitations because it allegedly does not teach the weight adjustment and weight difference (Applicant’s reply pgs. 9-11). This argument is not persuasive. Aslan is not being used to teach the weight adjustment. Regarding the weight difference, the objective function in Aslan teaches the incorporation of a penalty term that computes a difference value between the student and teacher neural network models and their outputs, along with weight sets. The penalty term considers a difference in the outputs from the two neural network models. It is understood that the outputs from a neural network model are based at least in part on the parameters of the neural network model, which include weight parameters. As such, the penalty term intrinsically considers weight parameters and their respective differences for the two neural network models as part of the difference computation between their outputs. The objective function includes the penalty term, with its intrinsic consideration of weight parameters, and also weight sets. The inclusion of the weight sets acting as multipliers does not preclude the intrinsic consideration of the weight parameters as part of the penalty term. As such, Aslan teaches the claim limitation.
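
The reasoning above can be illustrated with a minimal sketch. It is not taken from Aslan; the one-weight linear models and the names `model_output` and `output_penalty` are assumptions chosen only to show that a penalty computed on the difference between two models' outputs varies with the difference between their weights.

```python
# Illustrative only: hypothetical one-weight "models", not Aslan's objective function.

def model_output(weight, x):
    """Output of a hypothetical one-weight linear model."""
    return weight * x


def output_penalty(student_weight, teacher_weight, inputs):
    """Sum of squared differences between the two models' outputs."""
    return sum(
        (model_output(student_weight, x) - model_output(teacher_weight, x)) ** 2
        for x in inputs
    )


inputs = [1.0, 2.0]
same_weights = output_penalty(0.5, 0.5, inputs)       # identical weights, no output difference
different_weights = output_penalty(0.5, 1.5, inputs)  # weights differ, so outputs differ
print(same_weights, different_weights)
```

In this simplified setting the penalty is zero when the two weights are equal and grows as they diverge, although the penalty is written purely in terms of outputs.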

Applicant argues that Ura allegedly does not teach that the size of the second dataset alone is insufficient to train the neural network model (Applicant’s reply pgs. 10-12). This argument is not persuasive. The claim limitation as a whole recites that for training the second model, the size of the second dataset alone would not be sufficient to train the second model to a predefined accuracy with arbitrarily initiated weights. That is, the dataset size is not the only factor in training the second model and other factors are considered as well. Ura was used to teach the various sample sizes and the insufficiency of the second dataset alone limitations. Ura discloses in the mapped citations that the prediction model is based on the dataset (which comprises various sizes) and on the hyperparameters of the model. That is, Ura teaches that the datasets and their sizes alone are insufficient for the prediction model because another factor (e.g., hyperparameters) is also considered. As such, Ura does teach the claim limitations. Ura was not being used to teach the remaining limitation regarding the training of the second model to a predefined accuracy with arbitrarily initiated weights. Ura was used to teach the insufficiency of the second dataset alone limitation, and as such, it does teach the claim limitation for the reasons stated above.

Applicant argues that the various dependent claims should be allowable since the independent claims are now allegedly allowable (Applicant’s reply pgs. 10-12). This argument is not persuasive because the independent claims remain rejectable and thus, so are the dependent claims. 

Claim Objections
Claims 2, 3, 5-13 and 20 are objected to because of the following informality: there should be a comma between the claim number and “wherein” on line 1. Thus, each of these claims should be amended like so: “claim X, wherein”, with X denoting the respective claim number. These changes would ensure that these claims are consistent with the new claims. Appropriate corrections are required.

Claim 15 is objected to because of the following informality: there is a typo on line 1. The “a” should be removed. The claim should be amended like so: “A non-transitory [[a]] computer”. Appropriate correction is required.

Claim 16 is objected to because of the following informality: there is a typo in the word “wherien” on line 1. The claim should be amended like so: “wherein”. Appropriate correction is required.

Claim 21 is objected to because of the following informality: a hyphen is not needed in the word “co-ordinates”. The term should be amended like so: “coordinates”. Appropriate correction is required. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14-16, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claim 14 recites the following limitations that lack sufficient antecedent basis:
Line 8: “the first dataset”. 
Line 9: “the size”.

Claim 15 recites the following limitations that lack sufficient antecedent basis:
Line 6: “the second model” and “the first model”.
Lines 7 and 8: “the second model”.
Line 9: “the first model”.
Lines 9-10: “the second model”.
Line 11: “the first model” and “the first dataset”. 
Line 12: “the size”.
Line 13: “the second model”.

Claim 16 recites the limitation “the other” on line 3. There is insufficient antecedent basis for this limitation in the claim.

Claim 19 recites the limitation “the other” on line 2. There is insufficient antecedent basis for this limitation in the claim.

Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being incomplete for omitting essential elements, such omission amounting to a gap between the elements. See MPEP § 2172.01. The omitted element is (bolded for emphasis): “the corresponding weight in the first model may be adjusted by a percentage of the difference between the corresponding weight in the first model and the weight in the second model”. See specification paragraph [0066] in the PG-Pub for this application. Claim 20 is currently incomplete because it is not clear why the percentage difference between those two values is needed and what that percentage difference is used for. The elements in bold are the essential elements that enable the claim to be definite and to make sense. Thus, the essential elements shown in bold must be added to claim 20. Appropriate correction is required.
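
As a worked illustration of the omitted element, a minimal sketch follows. The function name `adjust_by_percentage`, the sign convention (moving the first-model weight toward the second-model weight), and the numeric values are assumptions made for illustration and are not taken from the specification.

```python
# Illustrative only: one possible reading of "adjusted by a percentage of the difference".

def adjust_by_percentage(first_weight, second_weight, percentage):
    """Move the first-model weight toward the second-model weight.

    `percentage` is the share (0 to 100) of the weight difference that is
    applied as the adjustment; the direction of movement is an assumption.
    """
    difference = second_weight - first_weight
    return first_weight + (percentage / 100.0) * difference


# Hypothetical values: the first-model weight moves 25% of the way to the second-model weight.
adjusted = adjust_by_percentage(first_weight=1.0, second_weight=2.0, percentage=25)
print(adjusted)  # 1.25
```

Under this reading, the percentage is what turns the computed difference into an actual adjustment of the first-model weight, which is the role the claim currently leaves unstated.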

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 5, 7, 8, and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih) in view of Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan).

Regarding claim 1, Mnih teaches:
A system configured for training first and second neural network models, the system comprising ([0061] and [0083]-[0084]: describing a system for training first and second neural networks (NNs) as shown in Fig. 5a.): 
a memory comprising instruction data representing a set of instructions ([0030], [0032], and [0084]: describing various memory that can store codes or instructions for implementing the system and training of the neural networks.); 
a processor configured to communicate with the memory and to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to ([0030]-[0032] and [0084]: describing a processor that operates in conjunction with the memory to execute the codes or instructions stored in the memory. This is shown in Fig. 5b.):  
…; 
train the second model on a first dataset, wherein the training comprises updating the weight in the second model ([0019], [0060]-[0061], and [0063]-[0065]: describing the training of “the second neural network (neural network 1)” and updating/adjusting weights of the second NN, wherein the training comprises Q-value computation using input state data, i.e. first dataset.); and 
adjust the corresponding weight in the first model based on the updated weight in the second model ([0064] and [0066]-[0067]: describing that weights in the first NN model (“neural network 0”) can be adjusted based on the updated training of the second NN model, wherein the adjustment comprises updated weight values of the second NN model being copied to the first NN model. See also [0053]-[0055] and [0066]-[0067]: describing the computation that includes updating the weights and the related pseudocode, respectively.)….

While the cited reference Mnih teaches the above limitations of claim 1, it does not explicitly teach: “set a weight in the second model based on a corresponding weight in the first model” on lines 7-8. Kang teaches: setting a weight in the student NN model (i.e. the second model) based on the weight of the teacher NN model (i.e. the first model) (Kang [0045], [0092], and [0106]). 
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the setting of the weight in the second NN model in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]). 

While the cited references in combination teach the above limitations of claim 1, they do not explicitly teach: “by applying an increment to a value of the corresponding weight in the first model” on lines 12-13. Takatori teaches: application of an increase/increment to a weight value in the neural network, whereby such increase can occur via a predetermined value, e.g. an increase of 5% (Takatori col. 4, lines 12-28 and 67; col. 5, lines 24-40; col. 6, lines 8-30; and col. 7, lines 2-30). The neural network is a first neural network (col. 3, lines 6-15 and col. 4, lines 52-55) as shown in Figs. 1 and 2.
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models and the setting of the weight in the second NN model in the combined cited references to include the increase in weights in Takatori. Doing so would enable “[t]he weight increase of synapse for a learning change … with respect to the number of learnings. The learning for a whole system is gradually performed by a plurality of learnings, and a slight adjustment is performed with a little change at the end of learning. Learning speed is heightened as weight increases rapidly at the beginning of learning.” (Takatori col. 4, lines 43-50). 

While the cited references in combination teach the above limitations of claim 1, they do not explicitly teach: “based on a difference between the corresponding weight in the first model and the weight in the second model” on lines 13-14. Aslan teaches: an objective function computation “with respect to weight parameters multiple models being trained” that comprises a difference determination between the first and second NN models (Aslan [0033] and [0036]-[0039]).
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting of the weight in the second NN model, and the increase in weights in the combined cited references to include the difference determination between the two NN models in Aslan. Doing so would enable “[j]oint training of the first model 100 and the second model 102 involves training the models 100 and 102 in parallel such that at least one of the models 100 and/or 102 influences the training of the other model… In this sense, the second (student) model 102 can be considered to be learning from the first (teacher) model 100 as the first model 100 learns.” (Aslan [0026]). Wherein “the second (student) model 102, is able to “see” what another model, such as the first (teacher) model 100, is learning by virtue of terms in the objective function that is optimized for training the respective models 100 and 102” (Aslan [0028]).
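
For orientation, the sequence of operations recited in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions and is not drawn from the application or from any cited reference: each model is reduced to a single weight, training is plain gradient descent on squared error, and the names `train_one_weight` and `fraction` are hypothetical.

```python
# Illustrative only: one-weight stand-ins for the first and second models.

def train_one_weight(weight, dataset, learning_rate=0.1, epochs=50):
    """Fit y = weight * x by gradient descent on squared error (assumed training method)."""
    for _ in range(epochs):
        for x, y in dataset:
            gradient = 2.0 * (weight * x - y) * x
            weight -= learning_rate * gradient
    return weight


first_weight = 1.0                        # corresponding weight in the first model
second_weight = first_weight              # set the second-model weight based on the first
first_dataset = [(1.0, 3.0), (2.0, 6.0)]  # hypothetical data consistent with y = 3x

# Train the second model on the first dataset, which updates its weight.
second_weight = train_one_weight(second_weight, first_dataset)

# Adjust the first-model weight by an increment based on the weight difference.
fraction = 0.5                            # assumed share of the difference to apply
increment = fraction * (second_weight - first_weight)
first_weight += increment

print(second_weight, first_weight)
```

With these assumed values the second-model weight converges to approximately 3.0 and the first-model weight moves halfway toward it, to approximately 2.0.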

Regarding claim 2, the rejection of claim 1 is incorporated. The other cited references in combination do not explicitly teach: “wherein the weight comprises a weight in one of: an input layer of the second model; and a hidden layer of the second model”. Kang further teaches:
“wherein the weight comprises a weight in one of: 
an input layer of the second model; and 
a hidden layer of the second model (Kang [0083], [0085]-[0086], and [0092]: describing the hidden layers including classifier layers in the student NN model and the weights in those layers as shown in Fig. 5. Wherein the student NN model denotes the second model as was previously described.).”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the weight in the hidden layer in the student NN model in Kang. Doing so would enable training of the student NN model via its hidden layers and classifier layers to update the various connection weights in the student NN model (Kang [0087]-[0088]).

Regarding claim 3, the rejection of claim 1 is incorporated. Mnih teaches:
The system as in claim 1 wherein causing the processor to adjust the corresponding weight in the first model comprises causing the processor to: 
copy a value of the weight from the second model to the corresponding weight in the first model ([0064]: describing that “weights from the second[] trained neural network are copied across to the first neural network”.).

Regarding claim 5, the rejection of claim 1 is incorporated. The other cited references in combination do not explicitly teach: “wherein causing the processor to adjust the corresponding weight in the first model further comprises causing the processor to: set a weight in an output layer of the first model to an arbitrary value.” Kang further teaches:
“wherein causing the processor to adjust the corresponding weight in the first model further comprises causing the processor to: set a weight in an output layer of the first model to an arbitrary value (Kang [0055]: describing setting an initial weight of the teacher NN model, i.e. the first model as previously described, to a random initial value. Wherein the teacher NN model comprises an output layer (Kang [0039] and [0041]). Thus, the random initialization of the teacher NN model includes the random initialization of the output layer in the teacher NN model.).” 
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the random weight in the teacher NN model in Kang. Doing so would enable “[t]he plurality of teacher models may have different initial weights” (Kang [0054]) as part of “a process of selecting one teacher model from a plurality of teacher models to train a student model” (Kang [0051]).

Regarding claim 7, the rejection of claim 1 is incorporated. The other cited references in combination do not explicitly teach: “wherein causing the processor to set a weight in the second model comprises causing the processor to: copy a value of a weight from one of: an input layer of the first model; and a hidden layer of the first model, to a corresponding weight in the second model.” Kang further teaches:
“wherein causing the processor to set a weight in the second model comprises causing the processor to: copy a value of a weight from one of: an input layer of the first model; and a hidden layer of the first model, to a corresponding weight in the second model (Kang [0086] and [0092]: describing that the weights of the student NN model (i.e. the second model) at the classifier layers (i.e. hidden layers) are being copied from the corresponding classifier layers (i.e. hidden layers) of the teacher NN model (i.e. the first model). This is shown in Fig. 5. See also Kang [0041] and [0046]-[0047]: further describing the NN models with the hidden/classifier layers.).”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the copying of the weights from the teacher NN model to the student NN model in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]). 

Regarding claim 8, the rejection of claim 1 is incorporated. The other cited references in combination do not explicitly teach: “wherein causing the processor to set a weight in the second model further comprises causing the processor to: set at least one weight in an output layer of the second model to an arbitrary value.” Kang further teaches:
“wherein causing the processor to set a weight in the second model further comprises causing the processor to: set at least one weight in an output layer of the second model to an arbitrary value (Kang [0086] and [0092]: describing that the weight of the output layer of the student NN model (i.e. second model) can be set to an initial weight of the teacher NN model (i.e. first model). Wherein the initial weight of the teacher NN model is random/arbitrary (Kang [0055]). This enables the weight of the output layer of the student NN model to be random/arbitrary.).”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the setting of a random weight in the student NN model in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120 … [and] training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]).

Regarding claim 10, the rejection of claim 1 is incorporated. Mnih teaches:
The system as in claim 1 wherein the first model comprises one of:  
a model configured to produce a single output ([0062]: describing that the first NN model determines a target Qy value, i.e. a single output value.); and 
a model configured to produce a plurality of outputs; and 
wherein the second model comprises the other one of: 
a model configured to produce a single output; and 
a model configured to produce a plurality of outputs ([0061]: describing that the second NN model determines a set of output Q values, i.e. a plurality of output values.).

Regarding claim 11, the rejection of claim 1 is incorporated. Mnih teaches:
The system as in claim 1 wherein the set of instructions, when executed by the processor, further cause the processor to: 
adjust a weight in one of: 
the first model ([0064]: describing that weights in the first NN model are being changed/updated.); and
the second model; 
in response to further training of the other one of: 
the first model; and 
the second model ([0064]: describing that weights in the first NN model are being changed/updated as “the training of the second neural network proceeds”.).

Regarding claim 12, the rejection of claim 11 is incorporated. The other cited references in combination do not explicitly teach: “The system as in claim 11 wherein the set of instructions, when executed by the processor, cause the processor to repeat the step of adjusting a weight, until one or more of the following criteria are met: the first model and/or the second model reach a threshold accuracy level; the magnitude of an adjustment falls below a threshold magnitude; said weight in the first model and its corresponding weight in the second model converge towards one another within a predefined threshold; and a loss associated with the first model and/or a loss associated with the second model changes by less than a threshold amount between subsequent adjustments.” Kang further teaches:
“The system as in claim 11 wherein the set of instructions, when executed by the processor, cause the processor to repeat the step of adjusting a weight, until one or more of the following criteria are met: 
the first model and/or the second model reach a threshold accuracy level (Kang [0044], [0062], [0088], [0110] and [0133]: describing adjusting of weights in the student NN model (i.e. second model) via error backpropagation and optimizing an objective function to update the “connection weights” accordingly such that the student NN model reaches a threshold accuracy.);  
the magnitude of an adjustment falls below a threshold magnitude; 
said weight in the first model and its corresponding weight in the second model converge towards one another within a predefined threshold; and 
a loss associated with the first model and/or a loss associated with the second model changes by less than a threshold amount between subsequent adjustments.”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the weight adjustment in the student NN model in correlation with an accuracy threshold in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120 … [and] training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]).
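
The “one or more of the following criteria” language of claim 12 can be read as a logical OR over four stopping conditions. The sketch below illustrates that reading only; the function name `should_stop`, its parameters, and every threshold value are assumptions and do not come from the application or from Kang.

```python
# Illustrative only: hypothetical thresholds for the four criteria listed in claim 12.

def should_stop(accuracy, adjustment, first_weight, second_weight, loss_change,
                accuracy_threshold=0.95, adjustment_threshold=1e-4,
                convergence_threshold=1e-3, loss_change_threshold=1e-5):
    """Return True when at least one of the four stopping criteria is met."""
    return (
        accuracy >= accuracy_threshold                                # threshold accuracy reached
        or abs(adjustment) < adjustment_threshold                     # adjustment magnitude is small
        or abs(first_weight - second_weight) < convergence_threshold  # weights have converged
        or abs(loss_change) < loss_change_threshold                   # loss has stopped changing
    )


# No criterion met: adjusting would be repeated.
print(should_stop(accuracy=0.50, adjustment=0.1, first_weight=1.0,
                  second_weight=2.0, loss_change=0.1))  # False
# Accuracy criterion alone is met: adjusting would stop.
print(should_stop(accuracy=0.96, adjustment=0.1, first_weight=1.0,
                  second_weight=2.0, loss_change=0.1))  # True
```

Because the criteria are alternatives, a showing that any single criterion is met (here, the accuracy threshold) is enough for the loop to terminate.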

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Go et al. “Multigradient: A New Neural Network Learning Algorithm for Pattern Classification” (hereinafter Go).

Regarding claim 6, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “maintain a value of at least one weight in an output layer of the first model at the same value”. Go teaches: the computation for adjusting weights in the output neurons of the neural network, wherein output neurons reside in the output layer of the neural network (Go Section II). Based on this computation, “we ignore the output neurons that exceed the target values and concentrate on the output neurons that do not meet the target values, updating weights accordingly” (Go Section II). That is, the weights in the ignored output neurons of the output layer are not being changed and are being maintained, while the other weights in the non-ignored output neurons are being changed accordingly. This results in at least one weight value in the output layer being maintained. The neural network denotes a first model. 
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting of the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the maintaining of the weights in the output layer in Go. Doing so would enable a “new learning algorithm for multilayer feedforward neural networks, which converges faster and achieves a better classification accuracy than the conventional backpropagation learning algorithm for pattern classification…. In the proposed learning algorithm, we view each term of the output layer as a function of weights and adjust the weights directly so that the output neurons produce the desired outputs” (Go Abstract). That is, the new learning algorithm comprises “adjust[ing] each weight so that the output neurons can produce the desired outputs. This adjustment is accomplished by taking gradients.” (Go Section I). 

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Dijkman et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0169314, hereinafter Dijkman).

Regarding claim 9, the rejection of claim 1 is incorporated. Mnih teaches:
The system as in claim 1 wherein the first model comprises one of: 
an object detection model ([0024] and [0080]-[0081]: describing that the first NN model can be used to recognize/detect local structures in the image data via filtering of the image.); and 
an object localisation model; and ….

While the cited reference Mnih teaches the above limitations of claim 9, it does not explicitly teach: “wherein the second model comprises the other one of: an object detection model; and an object localisation model.” Dijkman teaches: 
“wherein the second model comprises the other one of: 
an object detection model; and 
an object localisation model (Dijkman [0044], [0084], [0095], [0097], and [0102]: describing the NN model for object localization using bounding boxes. Wherein the NN model can operate on its respective processing unit (Dijkman [0058]), i.e. the second NN operating on the second processing unit. The processing units are shown in Fig. 2.).”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models in the cited reference to include the object localization in Dijkman. Doing so would enable “[a] method of training for image classification includes labelling a crop from an image including an object of interest. The crop may be labelled with an indication of whether the object of interest is framed, partially framed or not present in the crop.” (Dijkman Abstract). Wherein the training comprises a “high-quality object localization process [that] may include a bounding box proposal, a bounding box classification and bounding box regression” (Dijkman [0045]).

Claims 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Ura et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0122078, hereinafter Ura).

Regarding claim 13, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset, wherein the size of the second dataset alone is insufficient ….” Ura teaches: 
“wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset (Ura [0032] and [0040]: describing learning processes for training machine learning model (i.e. the first model), wherein the learning processes can be on a second sample data set that is larger than the first sample data set. That is, the first data set is smaller with less data than the second dataset.), 
wherein the size of the second dataset alone is insufficient (Ura [0034] and [0081]: describing that the respective machine learning model is dependent on not just training data set alone, but on other metrics, e.g. hyperparameters. Wherein the size of the second sample data set was previously described.)….”
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the sizes of the data sets for training machine learning models in Ura. Doing so would enable “a model to be built by machine learning and its prediction performance” (Ura [0034] and [0053]). 

While the cited reference Ura teaches the above limitations of claim 13, it does not explicitly teach: “to train the second model to a predefined accuracy with arbitrarily initiated weights”. Kang teaches: training a student NN model (i.e. second model) to reach a predetermined accuracy condition (Kang [0048] and [0078]-[0080]). Wherein the weights of the student NN model can be initialized based on the initial weights of the teacher NN model, and the initial weights of the teacher NN model are random/arbitrary (Kang [0055]). Thus, the weights of the student NN model can likewise be random/arbitrary.
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the sizes of the data sets for training machine learning models in the cited reference to include the predetermined accuracy condition and random initialization of the student NN model in Kang. Doing so would enable training of a plurality of NN models comprising the teacher and student NN models, wherein “[a] plurality of teacher models 110 may be used to train the student model 120. At least one teacher model may be selected from the plurality of teacher models 110 and the student model 120 may be trained using the selected at least one teacher model. A process of selecting at least one teacher model from the plurality of teacher models 110 and training the student model 120 may be performed iteratively until the student model 120 satisfies a predetermined condition.” (Kang [0048]). The training comprises “[e]rror back-propagation learning” for “updating connection weights to reduce a loss” (Kang [0044]). 

Regarding independent claim 14, claim 14 is substantially similar to a combination of independent claim 1 and dependent claim 13. Therefore, claim 14 is rejected on similar grounds as claims 1 and 13. Claim 14 is a method claim that corresponds to system claims 1 and 13. 

Regarding independent claim 15, claim 15 is substantially similar to independent claim 14 and therefore is rejected on similar grounds as claim 14. Claim 15 is a medium claim that corresponds to method claim 14. A mapping is shown below for the limitations of claim 15 that differ from claim 14.
Mnih teaches:
“A non-transitory a computer readable medium, the computer readable medium comprising computer readable code embodied therein ([0030], [0032], and [0084]: describing various non-transitory computer readable memory that can store codes or instructions for implementing the system and training of the neural networks.), 
the computer readable code being configured such that, on execution by a computer or processor, the computer or processor ([0030]-[0032] and [0084]: describing a processor that operates in conjunction with the memory to execute the codes or instructions stored in the memory. This is shown in Fig. 5b.): ….”
 
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et. al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Chen et. al. “Diagnosis of Breast Tumors with Sonographic Texture Analysis Using Wavelet Transform and Neural Networks” (hereinafter Chen). 

Regarding claim 16, the rejection of claim 1 is incorporated. The other cited references in combination do not explicitly teach: “wherein the processor trains one of the first or second neural network models to detect a presence of a particular object in an image, and the processor trains the other of the first or second neural network models ….” Kang further teaches: 
“wherein the processor trains one of the first or second neural network models to detect a presence of a particular object in an image (Kang [0052] and [0113]: describing a data recognition process of objects that is performable by a processor. Wherein the processor can train the neural network models to recognize data objects, i.e., neural network models are “capable of detecting objects from the target data, or classifying or clustering the objects” (Kang [0113]-[01158).), and the processor trains the other of the first or second neural network models (Kang [0122]-[0124]: describing that the processor can train the other neural network models.)….” 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the object detection using neural network models in Kang. Doing so would enable “data recognizing method [that] may be performed by a processor” (Kang [0113]).

While the cited reference Kang teaches the above limitations of claim 16, it does not explicitly teach: “to measure a length of a particular type of object in an image”. Chen teaches: the use of neural networks to measure radial line distances from a region of interest (ROI) in a medical image (e.g., medical image of a segmented tissue image with tumor) to determine the boundaries of the ROI (e.g., tumor) (Chen pgs. 1303-1305). Where the radial line distance denotes a length. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the object detection using neural network models in Kang to include the radial line distance measurement in Chen. Doing so would enable techniques “[t]o increase the ability of ultrasonographic technology for the differential diagnosis of solid breast tumors, we describe a novel computer-aided diagnosis (CADx) system using neural networks for classification of breast tumors. Tumor regions and surrounding tissues are segmented from the physician-located region-of interest (ROI) images by applying our proposed segmentation algorithm…. A multilayered perceptron (MLP) neural network trained using error back-propagation algorithm with momentum was then used for the differential diagnosis of breast tumors on sonograms.” (Chen Abstract).

Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mnih et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et. al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Islam et. al., “A Constructive Algorithm for Training Cooperative Neural Network Ensembles” (hereinafter Islam).

Regarding claim 17, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein at least one of the first model and second model is a partially trained model”. Islam teaches: a cooperative neural network ensemble (CNNE) wherein each of the individual neural network (NN) models in the CNNE is partially trained (Islam Section III). This is shown in Fig. 1 and described at Step 2. Wherein the individual NN models in the CNNE comprise a plurality of NN models that includes a first NN model and a second NN model, which are partially trained (see previous citations). See also Islam Section IV(A): describing various example applications of the CNNE, wherein an average number of the NN models in the CNNE comprises values like 6.5, 3.9, 4.7, etc. depending on a particular experiment and parameters of the particular experiment. The numbers of individual NN models in the CNNE are tabulated in Islam Tables II-VIII. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the partially trained NN models in Islam. Doing so would enable “a new constructive algorithm, called constructive NN ensemble (CNNE), for training cooperative NN ensembles. It is the first algorithm, to our best knowledge, that combines ensemble’s architecture design with cooperative training of individual NNs in an ensemble. It determines automatically not only the number of NNs in an ensemble, but also the number of hidden nodes in individual NNs. It uses incremental training based on negative correlation learning … in training individual NNs. The main advantage of negative correlation learning is that it encourages different individual NNs to learn different aspects of the training data so that the ensemble can learn the whole training data better.” (Islam Section I). Wherein the training of the CNNE includes partial training (Islam Section III). 

Regarding claim 18, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein the first model is a partially trained model”. Islam teaches: a cooperative neural network ensemble (CNNE) wherein each of the individual neural network (NN) models in the CNNE is partially trained (Islam Section III). This is shown in Fig. 1 and described at Step 2. Wherein the individual NN models in the CNNE comprise a plurality of NN models that includes a first NN model, which is partially trained (see previous citations). See also Islam Section IV(A): describing various example applications of the CNNE, wherein an average number of the NN models in the CNNE comprises values like 6.5, 3.9, 4.7, etc. depending on a particular experiment and parameters of the particular experiment. The numbers of individual NN models in the CNNE are tabulated in Islam Tables II-VIII, which includes the first NN model. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the partially trained first NN model in Islam. Doing so would enable “a new constructive algorithm, called constructive NN ensemble (CNNE), for training cooperative NN ensembles. It is the first algorithm, to our best knowledge, that combines ensemble’s architecture design with cooperative training of individual NNs in an ensemble. It determines automatically not only the number of NNs in an ensemble, but also the number of hidden nodes in individual NNs. It uses incremental training based on negative correlation learning … in training individual NNs. The main advantage of negative correlation learning is that it encourages different individual NNs to learn different aspects of the training data so that the ensemble can learn the whole training data better.” (Islam Section I). Wherein the training of the CNNE includes partial training (Islam Section III). 

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et. al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Islam et. al., “A Constructive Algorithm for Training Cooperative Neural Network Ensembles” (hereinafter Islam) and Kabir et. al. “A New Wrapper Feature Selection Approach using Neural Network” (hereinafter Kabir).

Regarding claim 19, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein both the first and the second models are partially trained models and ….” Islam teaches: 
“wherein both the first and the second models are partially trained models (Islam Section III: describing a cooperative neural network ensemble (CNNE) wherein each of the individual neural network (NN) models in the CNNE is partially trained. This is shown in Fig. 1 and described at Step 2. Wherein the individual NN models in the CNNE comprise a plurality of NN models that includes a first NN model and a second NN model, which are both partially trained (see previous citations). See also Islam Section IV(A): describing various example applications of the CNNE, wherein an average number of the NN models in the CNNE comprises values like 6.5, 3.9, 4.7, etc. depending on a particular experiment and parameters of the particular experiment. The numbers of individual NN models in the CNNE are tabulated in Islam Tables II-VIII, which includes the first and second NN models.) and….” 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the partially trained first and second NN models in Islam. Doing so would enable “a new constructive algorithm, called constructive NN ensemble (CNNE), for training cooperative NN ensembles. It is the first algorithm, to our best knowledge, that combines ensemble’s architecture design with cooperative training of individual NNs in an ensemble. It determines automatically not only the number of NNs in an ensemble, but also the number of hidden nodes in individual NNs. It uses incremental training based on negative correlation learning … in training individual NNs. The main advantage of negative correlation learning is that it encourages different individual NNs to learn different aspects of the training data so that the ensemble can learn the whole training data better.” (Islam Section I). Wherein the training of the CNNE includes partial training (Islam Section III).

While the cited reference Islam teaches the above limitations of claim 19, it does not explicitly teach: “one of the first and the second models is trained more than the other of the first and the second models”. Kabir teaches: partial training of neural networks, wherein each NN model is partially trained for τ epochs, and if the terminating criterion is met, then no further training is needed (Kabir Section 3). However, if the terminating criterion is not met for a NN model, then that NN model is additionally trained (see previous citation). This is shown in Fig. 1. Thus, one NN model can be trained more than another NN model.
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the partially trained first and second NN models in the cited reference to include the additional training for each of the NN models in Kabir. Doing so would enable an “algorithm [that] uses a constructive approach involving correlation information in selecting features and determining NN architectures. We call this algorithm as constructive approach for FS (CAFS). The aim of using correlation information in CAFS is to encourage the search strategy for selecting less correlated (distinct) features if they enhance accuracy of NNs. Such an encouragement will reduce redundancy of information resulting in compact NN architectures.” (Kabir Abstract). Wherein the CAFS involves “computation … for a partial training consisting of τ epochs. In general, CAFS needs several, say M, such partial trainings” (Kabir Section 3). 
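
For illustration only, the partial-training behaviour relied upon above (train for τ epochs, check a terminating criterion, and train further only if the criterion is not met) can be sketched as follows. This is a minimal sketch under stated assumptions, not the CAFS algorithm of Kabir: `ToyModel`, its loss-halving epoch, the loss-threshold criterion, and `train_in_partial_rounds` are hypothetical stand-ins chosen only to show how two models can end up trained for different numbers of epochs.

```python
class ToyModel:
    """Hypothetical stand-in for an NN model; each epoch halves its loss."""

    def __init__(self, loss: float = 1.0) -> None:
        self.loss = loss
        self.epochs_trained = 0

    def train_one_epoch(self) -> None:
        self.loss *= 0.5
        self.epochs_trained += 1


def train_in_partial_rounds(model: ToyModel, tau: int, target_loss: float,
                            max_rounds: int = 100) -> int:
    """Run partial trainings of tau epochs each until the criterion is met.

    Assumed terminating criterion (hypothetical): loss <= target_loss.
    Returns the total number of epochs the model was trained.
    """
    for _ in range(max_rounds):
        for _ in range(tau):
            model.train_one_epoch()
        if model.loss <= target_loss:
            break  # criterion met, so no further training is needed
    return model.epochs_trained


# Two models with different criteria end up trained for different amounts.
first = ToyModel()
second = ToyModel()
first_epochs = train_in_partial_rounds(first, tau=2, target_loss=0.3)
second_epochs = train_in_partial_rounds(second, tau=2, target_loss=0.05)
```

In this toy setting the first model stops after a single partial training, while the second requires additional partial trainings, so the second is trained more than the first.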

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et. al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Sarkar et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0349788, hereinafter Sarkar).

Regarding claim 20, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “a percentage of the difference between the corresponding weight in the first model and the weight in the second model”. Sarkar teaches: a weight evolution that is measured by a difference between initial and final weight values (Sarkar [0022]). The final weight value can be a value at training step t, also referred to as a jump step since it represents “when the introspection network 114 is used to train another neural network (e.g. neural network 154) at step t” (Sarkar [0025]). 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the weight percentage in Sarkar. Doing so would enable “[a]n introspection network is a machine-learned neural network that accelerates training of other neural networks. The introspection network receives a weight history for each of a plurality of weights from a current training step for a target neural network. A weight history includes at least four values for the weight that are obtained during training of the target neural network up to the current step. The introspection network then provides, for each of the plurality of weights, a respective predicted value, based on the weight history. The predicted value for a weight represents a value for the weight in a future training step for the target neural network. Thus, the predicted value represents a jump in the training steps of the target neural network, which reduces the training time of the target neural network. The introspection network then sets each of the plurality of weights to its respective predicted value.” (Sarkar Abstract).
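
For illustration only, the relationship between a weight difference and a percentage of that difference can be sketched numerically. This is a minimal sketch under stated assumptions: neither Sarkar nor the claim language quoted above is asserted to use these exact formulas, and the function names `weight_evolution` and `adjust_by_fraction` are hypothetical.

```python
def weight_evolution(initial: float, final: float) -> float:
    """Difference between a final and an initial weight value."""
    return final - initial


def adjust_by_fraction(w_first: float, w_second: float, fraction: float) -> float:
    """Move w_second toward w_first by a fraction of their difference.

    Assumption (hypothetical): fraction is expressed in [0, 1], so a
    value of 0.25 corresponds to 25 percent of the difference.
    """
    return w_second + fraction * (w_first - w_second)


# Example: the first model holds 1.0 and the second model holds 0.0 for
# the corresponding weight; applying 25 percent of the difference.
new_w_second = adjust_by_fraction(w_first=1.0, w_second=0.0, fraction=0.25)
```

With a fraction of 0 the second weight is unchanged, and with a fraction of 1 it is set equal to the corresponding weight in the first model.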

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Mnih et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0278018, hereinafter Mnih), Kang et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0083829, hereinafter Kang), Takatori et. al. (U.S. Pat. No. 5,430,829, hereinafter Takatori), and Aslan et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0132528, hereinafter Aslan) in view of Zhou et. al., “Automated Assessment of Breast Tissue Density in Non-Contrast 3d CT Images Without Image Segmentation Based on a Deep CNN” (hereinafter Zhou).

Regarding claim 21, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein second dataset comprises medical images annotated with x,y co-ordinates of a center of a bounding box drawn around tissue of interest”. Zhou teaches: a plurality of medical computed tomography (CT) images of an organ that is labeled/annotated using a bounding box with (x,y,z) coordinates around the organ of interest (Zhou Sections 2 and 3). Wherein it is known that an organ contains tissues and thus, the bounding box is also around the tissue of interest. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method for training the first and second NN models, the setting the weight in the second NN model, the increase in weights, and the difference determination between the two NN models in the combined cited references to include the labeling/annotating of the organ containing the tissues of interest in Zhou. Doing so would enable “a fast and robust segmentation scheme that automatically identifies and extracts a massive-organ region on torso CT images. In contrast to the conventional algorithms that are designed empirically for segmenting a specific organ based on traditional image processing techniques, the proposed scheme uses a fully data-driven approach to accomplish a universal solution for segmenting the different massive-organ regions on CT images. Our scheme includes three processing steps: machine-learning-based organ localization, content-based image (reference) retrieval, and atlas-based organ segmentation techniques.” (Zhou Abstract). 
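
For illustration only, the x,y co-ordinates of the center of a bounding box follow directly from the co-ordinates of its corners. This is a minimal sketch under stated assumptions: the corner-based `(x_min, y_min, x_max, y_max)` representation and the function name `bbox_center` are hypothetical, and the sketch ignores the z co-ordinate that Zhou's (x,y,z) boxes also carry.

```python
from typing import Tuple


def bbox_center(x_min: float, y_min: float,
                x_max: float, y_max: float) -> Tuple[float, float]:
    """Return the (x, y) center of an axis-aligned bounding box."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


# Example annotation: a box drawn around tissue of interest with
# corners (10, 20) and (30, 60) has its center at (20.0, 40.0).
center = bbox_center(10, 20, 30, 60)
```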

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Nakano et. al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0350069): describing first and second neural networks (NNs) for classification of medical computed tomography (CT) images, e.g., lung images. The NNs each having its own weight value and the values are adjusted based on error rate of a last prediction. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762. The examiner can normally be reached M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128