DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-15 are pending and are examined in this office action.
Claims 1-15 are rejected under 35 U.S.C. 103.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/10/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

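This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.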

Claims 1, 3-6, 8-11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over “Eads” (US 2014/0337269 A1) in view of “Cheng” (US 2016/0063396 A1), further in view of “Yan” (US 2011/0173116 A1).

	Regarding claim 1, Eads teaches
	A multi-sampling model training method, comprising: (Eads, Title and Abstract indicate that Eads is directed to techniques for training ensembles of decision trees. Figures 1A-1B provide an overview of a method. The sampling is performed at steps 140-142.)
	performing multi-sampling on samples to obtain a training set and a validation set in each sampling; (Eads, Figures 1A-1B, step 140 shows creating a variable D_inbag by sampling possibly with replacement from a dataset D provided at Step 104. This step is further described at [0072]. The set of samples identified by D_inbag corresponds to a training set. Step 142 shows creating a variable D_outbag using samples in D but not in D_inbag. This is further described at [0073]. The set of samples identified by D_outbag corresponds to a validation set. As indicated in Figures 1A-1B, steps 140-142 are performed as part of a loop which generates the desired number of decision trees num_trees (see step 138 and [0071]).)
	using the training set and the validation set obtained in each sampling as a group, and (Eads, Figures 1A-1B, the training set D_inbag and the validation set D_outbag correspond to each other since D_outbag is defined in terms of what is not in D_inbag. These two sets are used through the loop starting at decision step 138. That is, each iteration defines a D_inbag and a D_outbag which are used “as a group”.)
	performing model training and obtaining a trained model using the training set in each group; (Eads, Figures 1A-1B, step 144 shows learning a decision tree with the training set D_inbag as an input via the function “learn_tree” and setting the variable “tree” equal to the resulting decision tree. That is, a decision tree is both trained (in the “learn_tree” function) and obtained (since the variable “tree” is set equal to the output of the function). This step is further described at [0073]. This is performed using the D_inbag for the particular iteration. A particular method for training a decision tree is illustrated in Figures 2A-2B and described at [0074-0080]. The broadest reasonable interpretation of this limitation in view of the specification encompasses generating a trained model for each group.)
	…obtaining prediction results… (Eads, Figures 1A-1B, step 148 shows obtaining predictions from the tree. These are accumulated at steps 130 and 132 (depending on whether the task is a classification or regression task). Since this is performed for each model, it is also performed for any subset of the models, including any models remaining after some were dropped.)
	obtaining an output model (Eads, Figures 1A-1B, element 102 indicates that the output of the algorithm is a forest (i.e., an ensemble of decision trees).)
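	For illustration only, the Eads loop mapped above (steps 138-144) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Eads' code: the toy data are (feature, label) pairs, and `learn_stump` is a hypothetical depth-1 stand-in for Eads' `learn_tree`. Each iteration draws D_inbag with replacement, forms D_outbag from the samples not drawn, and trains one model on that iteration's D_inbag, so each (D_inbag, D_outbag) pair is kept together as a group.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)


def learn_stump(train: List[Sample]) -> Tuple[float, int, int]:
    """Hypothetical stand-in for Eads' learn_tree: a depth-1 tree.

    Tries each observed feature value as a split threshold and keeps the one
    with the fewest training errors. Returns (threshold, left_label, right_label).
    """
    best = None
    for threshold, _ in train:
        left = [y for x, y in train if x <= threshold]
        right = [y for x, y in train if x > threshold]
        # Majority label on each side (0 for an empty side).
        left_label = max(set(left), key=left.count) if left else 0
        right_label = max(set(right), key=right.count) if right else 0
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, (threshold, left_label, right_label))
    return best[1]


def bag(dataset: List[Sample], num_trees: int, seed: int = 0):
    """Return one (model, D_inbag, D_outbag) group per loop iteration."""
    rng = random.Random(seed)
    groups = []
    for _ in range(num_trees):
        # Step 140 analogue: sample indices with replacement.
        inbag_idx = [rng.randrange(len(dataset)) for _ in dataset]
        d_inbag = [dataset[i] for i in inbag_idx]
        # Step 142 analogue: samples in D that were never drawn into D_inbag.
        drawn = set(inbag_idx)
        d_outbag = [s for i, s in enumerate(dataset) if i not in drawn]
        # Step 144 analogue: train one model on this group's training set.
        groups.append((learn_stump(d_inbag), d_inbag, d_outbag))
    return groups
```

	Because D_outbag is defined as the complement of D_inbag, the two sets in each group are disjoint and together cover the dataset.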
	 Eads does not appear to explicitly teach 
	evaluating the trained model using the training set and the validation set in each group separately; 
	eliminating or retaining the trained model based on the evaluation results and a predetermined elimination criterion;
	… obtaining an output model by performing combined model training on the retained models using the prediction results.
	However, Cheng—directed to analogous art—teaches
	evaluating the trained model …eliminating or retaining the trained model based on [an evaluation result] and a predetermined elimination criterion; (Cheng, Abstract describes using an ensemble of classifiers to obtain a prediction result. [0085-0089] describes removing a classifier based on a third or fourth weight value. The third (corresponding to first as indicated at [0086]) weight value computation is described at [0070-0075], where it is indicated that a classification accuracy for each model is computed. This in turn is used to compute the weight, which may be compared to a threshold (see, e.g., [0087]) to determine whether or not the classifier is removed.)
	obtaining prediction results of the samples using retained models; and (Cheng, [0069] describes obtaining accuracies of the classifiers by applying the classifier to some of the samples. [0068] indicates that this may be performed for the classifiers to determine weights for the classifiers. [0090] describes obtaining fifth weights for the classifiers after any classifiers have been removed as described at [0085-0089].)
	obtaining an output model by performing combined model training on the retained models using the prediction results. (Cheng, [0062] indicates that the ensemble of models is used to determine a prediction based on prediction weights. That is, the prediction weights are parameters of the ensemble model. [0090-0091] describes obtaining the weights using the classification accuracies (i.e., using the prediction results). Since the determination of the weights is for the whole ensemble and is dependent on the classification accuracies of all of the models remaining in the ensemble (see [0071] for how the weights may be determined), it is a combined model training. The resulting model corresponds to an output model. This process of determining weights based on model accuracies and removing models based on the weights may be iterated as described at [0091-0096].)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify Eads to remove models from the ensemble and perform combined training as taught by Cheng as described above because “by dynamically updating the number of candidate classifiers, i.e. removing a candidate classifier which does not satisfy the classification requirement, or adding a new candidate classifier, a classification system capable of functioning properly, i.e. M target classifiers, is obtained; in this way, the problem in the prior art that using re-labeled training samples to re-construct a target classifier to replace an original target classifier makes it impossible to make full use of the original target classifier can be avoided, and the utilization rate of target classifiers can be effectively improved” (Cheng, [0046]) and because “by means of the technical solutions provided in the present invention, the classification result of data is not solely dependent on the predicted result of any one target classifier any more, but the predicted result of each 
	Eads and Cheng do not appear to explicitly teach 
	evaluating the trained model using the training set and the validation set in each group separately;
	eliminating or retaining the trained model based on the evaluation results and a predetermined elimination criterion;
	However, Yan—directed to analogous art--teaches
	evaluating the trained model using the training set and the validation set in each group separately; (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. ) 
	eliminating or retaining the trained model based on the evaluation results (Yan, [0086] describes adjusting the model as necessary (step 328 of Figure 3) based on the evaluation result. In the combination with Eads and Cheng, Yan is relied upon to teach the difference between performance on a training set and performance on a validation set as a measure of model goodness. Cheng already teaches performing some model evaluation and making an elimination or retention decision on the basis of that evaluation. In the combination, the particular measure taught by Yan would be used in place of or in addition to the measures taught by Cheng used for the elimination/retention decision.)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify Eads and Cheng to consider the difference between performance on the training and testing datasets because this difference “demonstrates how robust the model is and how much the model is able to generalize to other datasets. The closer the two performances are, the more robust the model is.” (Yan, [0084]). Making a model elimination/retention decision based on this measure would therefore result in more robust models being retained, resulting in a more robust ensemble. 
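	The proposed combination can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not code from Eads, Cheng, or Yan: each trained model is evaluated separately on its group's training set and validation set, and is retained only if a hypothetical predetermined criterion holds, namely validation accuracy of at least `min_acc` (threshold-based removal in the manner attributed to Cheng) and a train/validation accuracy gap of at most `max_gap` (the robustness measure attributed to Yan at [0084]). The names `prune`, `min_acc`, and `max_gap` and their default values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]          # (feature value, class label)
Model = Callable[[float], int]      # trained model as a callable classifier


def accuracy(model: Model, data: Sequence[Sample]) -> float:
    """Fraction of samples the model labels correctly (0.0 for empty data)."""
    if not data:
        return 0.0
    return sum(model(x) == y for x, y in data) / len(data)


def prune(groups: List[Tuple[Model, Sequence[Sample], Sequence[Sample]]],
          min_acc: float = 0.5, max_gap: float = 0.2) -> List[Model]:
    """Retain models whose validation accuracy and train/validation gap pass."""
    retained = []
    for model, train, valid in groups:
        train_acc = accuracy(model, train)   # first, separate evaluation
        valid_acc = accuracy(model, valid)   # second, separate evaluation
        gap = abs(train_acc - valid_acc)     # Yan-style robustness measure
        if valid_acc >= min_acc and gap <= max_gap:
            retained.append(model)
    return retained
```

	A model that fits its training set perfectly but loses accuracy on its validation set is eliminated by the gap test even if its validation accuracy alone clears `min_acc`.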

	Regarding claim 3, the rejection of claim 1 is incorporated herein. Eads does not appear to explicitly teach 
	wherein evaluating the trained model using the training set and the validation set in each group separately, and eliminating or retaining the trained model based on the evaluation results and a predetermined elimination criterion, comprises: 
	obtaining a performance index corresponding to the trained model; 
	obtaining a feature value of the trained model based on the performance index; and 
	eliminating the trained model in response to determining that the feature value is less than a predetermined threshold. 
	However, Cheng—directed to analogous art—teaches
	wherein evaluating the trained model …eliminating or retaining the trained model based on [an evaluation result] and a predetermined elimination criterion, comprises: (Cheng, Abstract describes using an ensemble of classifiers to obtain a prediction result. [0085-0089] describes removing a classifier based on a third or fourth weight value. The third (corresponding to first as indicated at [0086]) weight value computation is described at [0070-0075], where it is indicated that a classification accuracy for each model is computed. This in turn is used to compute the weight, which may be compared to a threshold (see, e.g., [0087]) to determine whether or not the classifier is removed.)
	obtaining a performance index corresponding to the trained model; (Cheng, Abstract describes using an ensemble of classifiers to obtain a prediction result. [0085-0089] describes removing a classifier based on a third or fourth weight value. The third (corresponding to first as indicated at [0086]) weight value computation is described at [0070-0075], where it is indicated that a classification accuracy for each model is computed. The classification accuracy corresponds to a performance index corresponding to the model for which it was computed.)
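	obtaining a feature value of the trained model based on the performance index; and (Cheng, Abstract describes using an ensemble of classifiers to obtain a prediction result. [0085-0089] describes removing a classifier based on a third or fourth weight value. The third (corresponding to first as indicated at [0086]) weight value computation is described at [0070-0075], where it is indicated that a classification accuracy for each model is computed. This in turn is used to compute the weight. The weight corresponds to a feature value of the trained model obtained based on the performance index.)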
	eliminating the trained model in response to determining that the feature value is less than a predetermined threshold. (Cheng, Abstract describes using an ensemble of classifiers to obtain a prediction result. [0085-0089] describes removing a classifier based on a third or fourth weight value. The third (corresponding to first as indicated at [0086]) weight value computation is described at [0070-0075], where it is indicated that a classification accuracy for each model is computed. This in turn is used to compute the weight, which may be compared to a threshold (see, e.g., [0087]) to determine whether or not the classifier is removed. In the given example, it may be removed if the value is less than 0.5. The use of the weight for evaluating may be used in addition to the difference taught by Yan as described above. As written, the claim does not tie the performance index or feature value to the evaluation results. Cheng also teaches multiple criteria (e.g., third and fourth weight as described at [0089]) for determining model elimination.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1.
	Cheng does not appear to explicitly teach 
	wherein evaluating the trained model using the training set and the validation set in each group separately and eliminating or retaining the trained model based on the evaluation results and a predetermined elimination criterion, comprises:
	However, Yan—directed to analogous art—teaches
	wherein evaluating the trained model using the training set and the validation set in each group separately and (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. )
	eliminating or retaining the trained model based on the evaluation results and a predetermined elimination criterion, comprises: (Yan, [0086] describes adjusting the model as necessary (step 328 of Figure 3) based on the evaluation result.)
	obtaining a performance index corresponding to the trained model; (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. The performance on, say, the training data could correspond to the performance index. In the combination with Cheng, this performance index may be used in addition to the performance index described above.)
	obtaining a feature value of the trained model based on the performance index; and (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. The performance on, say, the training data could correspond to the performance index. The difference between the two would correspond to the feature value. In the combination with Cheng, this feature value may be used in addition to the feature value described above.)
	eliminating the trained model [based on the feature value] (Yan, [0086] describes adjusting the model as necessary (step 328 of Figure 3) based on the evaluation result. In the combination with Eads and Cheng, Yan is relied upon to teach the difference between performance on a training set and performance on a validation set as a measure of model goodness. Cheng already teaches performing some model evaluation and making an elimination or retention decision on the basis of that evaluation. In the combination, the particular measure taught by Yan may be used in addition to the measures taught by Cheng used for the elimination/retention decision.)
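	The three steps recited in claim 3 can be illustrated with a minimal Python sketch, assuming the performance indices are a model's training and validation accuracies. Both feature-value definitions are hypothetical: `weight` simply reuses validation accuracy as a stand-in for Cheng's weight (the actual computation at [0070-0075] is not reproduced), and `robustness` is 1 minus the absolute train/validation difference, built on the Yan measure so that a smaller gap yields a larger feature value. A model is eliminated when any feature value is less than its predetermined threshold; the 0.5 weight threshold used in practice here merely echoes the example the rejection cites from Cheng [0087].

```python
from typing import Dict, List

Perf = Dict[str, Dict[str, float]]  # model name -> {"train": acc, "valid": acc}


def feature_values(perf: Perf) -> Dict[str, Dict[str, float]]:
    """Derive hypothetical feature values from per-model performance indices."""
    out = {}
    for name, p in perf.items():
        out[name] = {
            # Stand-in for a Cheng-style weight: the validation accuracy itself.
            "weight": p["valid"],
            # Built on Yan's train/test difference: larger means more robust.
            "robustness": 1.0 - abs(p["train"] - p["valid"]),
        }
    return out


def eliminate(perf: Perf, thresholds: Dict[str, float]) -> List[str]:
    """Return names of retained models; a model is eliminated when any
    feature value is less than its predetermined threshold."""
    feats = feature_values(perf)
    return [name for name, f in feats.items()
            if all(f[k] >= t for k, t in thresholds.items())]
```

	Checking two feature values mirrors the combination described above, in which the Yan-derived measure is applied in addition to the Cheng-style weight.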


	Regarding claim 4, the rejection of claim 1 is incorporated herein. Furthermore, Eads teaches
	wherein using the training set and the validation set obtained in each sampling as a group, and (Eads, Figures 1A-1B, the training set D_inbag and the validation set D_outbag correspond to each other since D_outbag is defined in terms of what is not in D_inbag. These two sets are used through the loop starting at decision step 138. That is, each iteration defines a D_inbag and a D_outbag which are used “as a group”.)
	performing model training and obtaining a trained model using the training set in each group comprises:  (Eads, Figures 1A-1B, step 144 shows learning a decision tree with the training set D_inbag as an input via the function “learn_tree” and setting the variable “tree” equal to the resulting decision tree. That is, a decision tree is both trained (in the “learn_tree” function) and obtained (since the variable “tree” is set equal to the output of the function). This step is further described at [0073]. This is performed using the D_inbag for the particular iteration. A particular method for training a decision tree is illustrated in Figures 2A-2B and described at [0074-0080])
	obtaining, by training, a first set of model parameters of the trained model. (Eads, Figures 1A-1B, step 144 shows learning a decision tree. A particular method for training a decision tree is illustrated in Figures 2A-2B and described at [0074-0080]. [0028-0029] indicates that a decision tree consists of a set of nodes, each of which performs a test on a particular feature. The information encoded by a node is described at [0030-0038]. The collection of nodes, along with the node thresholds (i.e., splits), constitutes parameters that are determined by training (see especially [0075], [0077], and [0079-0080] for the determination of these parameters in the training).)
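	To illustrate what the rejection treats as a "first set of model parameters," the following minimal Python sketch represents a trained decision tree as a list of nodes, where each internal node holds a feature index and split threshold and each leaf holds a prediction value. The `Node` layout and `predict` function are hypothetical and are not Eads' node encoding at [0030-0038]; training itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a trained tree (hypothetical layout).

    Internal nodes hold a feature index and a split threshold; leaves hold a
    prediction value. Children are referenced by position in the node list.
    """
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional[int] = None
    right: Optional[int] = None
    value: Optional[float] = None


def predict(nodes: List[Node], x: Sequence[float]) -> float:
    """Walk from the root (index 0) to a leaf using the learned splits."""
    i = 0
    while nodes[i].value is None:
        n = nodes[i]
        i = n.left if x[n.feature] <= n.threshold else n.right
    return nodes[i].value
```

	For example, a single split on feature 0 at threshold 5.0 would be `[Node(feature=0, threshold=5.0, left=1, right=2), Node(value=0.0), Node(value=1.0)]`; that node list is the parameter set produced by training.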

	Regarding claim 5, the rejection of claim 4 is incorporated herein. Furthermore, Eads teaches
	wherein obtaining prediction results of the samples using the retained models, and (Eads, Figures 1A-1B, step 148 shows obtaining predictions from the tree. These are accumulated at steps 130 and 132 (depending on whether the task is a classification or regression task).)
	Eads does not appear to explicitly teach 
	obtaining an output model by performing combined model training on the retained models using the prediction results comprises:
	obtaining first prediction values of each sample of the samples based on the first set of model parameters of each retained model;
	using the first prediction values of each sample of the samples to perform another model training to obtain a second set of model parameters; and 
	obtaining a second prediction value of each sample of the samples based on the second set of model parameters and the first prediction values of each sample of the samples, and using the second prediction value as a final output of the output model.
	However, Cheng—directed to analogous art—teaches
	obtaining an output model by performing combined model training on the retained models using the prediction results comprises: (Cheng, [0062] indicates that the ensemble of models is used to determine a prediction based on prediction weights. That is, the prediction weights are parameters of the ensemble model. [0068-0069] describes obtaining the weights using the classification accuracies (i.e., using the prediction results). Since the determination of the weights is for the whole ensemble and is dependent on the classification accuracies of all of the models in the ensemble, it is a combined model training. The resulting model corresponds to an output model.)
	obtaining first prediction values of [some of] the samples based on the first set of model parameters of each retained model; (Cheng, [0069] describes obtaining accuracies of the classifiers by applying the classifier to some of the samples. [0068] indicates that this may be performed for the classifiers to determine weights for the classifiers. [0090] describes obtaining fifth weights for the classifiers after any classifiers have been removed as described at [0085-0089].)
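	using the first prediction values of [some of] the samples to perform another model training to obtain a second set of model parameters; and (Cheng, [0062] indicates that the ensemble of models is used to determine a prediction based on prediction weights. That is, the prediction weights are parameters of the ensemble model and correspond to a second set of model parameters. [0068-0069] describes obtaining the weights using the classification accuracies (i.e., using the first prediction values).)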
	obtaining [prediction values of an ensemble model] based on the second set of model parameters and the [prediction values of the constituent models], and using [prediction values of an ensemble model] as a final output of the output model. (Cheng, [0062] indicates that the ensemble of models is used to determine a prediction based on prediction weights. [0097] clarifies that the ensemble classifier operates by inputting data into each of the classifiers in the ensemble to obtain results for these and then determining the output of the ensemble classifier (i.e., the final output of the output model) using the weights (i.e., second parameters). To clarify the distinction with the claims, Cheng teaches an ensemble model which would operate to determine an ensemble output as claimed, but does not explicitly teach applying the model to “the samples” which were used to train the model. The only difference is the data to which the model is applied.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have performed this combination for the reasons given above with respect to claim 1.
	Eads and Cheng do not appear to explicitly teach 
	…obtaining first prediction values of each sample of the samples based on the first set of model parameters of each retained model;
	using the first prediction values of each sample of the samples to perform another model training to obtain a second set of model parameters; and
	…obtaining a second prediction value of each sample of the samples based on the second set of model parameters and the first prediction values of each sample of the samples, and using the second prediction value as a final output of the output model.
	However, Yan—directed to analogous art—teaches 
	obtaining first prediction values of each sample of the samples based on the first set of model parameters of each retained model; using the first prediction values of each sample of the samples to perform another model training (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. [0086] describes adjusting the model as necessary (step 328 of Figure 3) based on the evaluation result. In the combination with Eads and Cheng, Yan is relied upon to teach the difference between performance on a training set and performance on a validation set as a measure of model goodness. Cheng already teaches performing some model evaluation and making an elimination or retention decision on the basis of that evaluation. In the combination, the particular measure taught by Yan would be used in place of or in addition to the measures taught by Cheng used for the elimination/retention decision. Since the ensemble composition may change based on the result of the elimination or retention decision, the prediction values for each of the samples are used as part of the training to determine the second set of model parameters (since the second set of model parameters taught by Cheng depends on all of the remaining models, as described above).)
	obtaining a second prediction value of each sample of the samples based on the second set of model parameters and the first prediction values of each sample of the samples, and using the second prediction value as a final output of the output model. (Yan, Abstract describes determining a combined model based on a plurality of models (i.e., an ensemble). [0084-0085] describes determining a difference between the performance of a model on training data (i.e., a first evaluation) and the performance of the model on testing data (i.e., a second separate evaluation). This is step 326 in Figure 3. That is, Yan teaches validating the model by applying the trained model to both the training and validation sets (i.e., to each sample of the samples). In the combination with Cheng, Cheng teaches that applying the trained model includes using the second set of model parameters and prediction values determined by the models in the ensemble to determine the output. Yan is relied upon to teach applying the trained model to the training and testing data.)
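	The claim 5 flow, as mapped to the combination, resembles a two-level (stacked) ensemble and can be sketched in Python as follows. This is a minimal sketch under assumptions: binary labels, retained models given as callables, and a hypothetical second-level training rule that weights each retained model by its accuracy on the samples, normalized to sum to 1. Cheng's actual weight computation (see [0071]) is not reproduced. The first prediction values form a sample-by-model matrix, the weights play the role of the second set of model parameters, and the weighted vote is the second prediction value.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]          # (feature value, binary label)
Model = Callable[[float], int]      # retained model as a callable classifier


def first_predictions(models: Sequence[Model], samples: Sequence[Sample]) -> List[List[int]]:
    """One row per sample, one column per retained model (first prediction values)."""
    return [[m(x) for m in models] for x, _ in samples]


def fit_weights(first: List[List[int]], samples: Sequence[Sample]) -> List[float]:
    """Hypothetical second-level training: weight each model by its accuracy
    on the samples, normalized so the weights sum to 1."""
    n_models = len(first[0])
    acc = [sum(row[j] == y for row, (_, y) in zip(first, samples)) / len(samples)
           for j in range(n_models)]
    total = sum(acc) or 1.0
    return [a / total for a in acc]


def second_predictions(first: List[List[int]], weights: Sequence[float]) -> List[int]:
    """Weighted vote over binary first predictions: the final output per sample."""
    return [int(sum(w * p for w, p in zip(weights, row)) >= 0.5) for row in first]
```

	Applying `second_predictions` to every sample, including those used to fit the weights, corresponds to the application of the trained model to both training and testing data attributed to Yan above.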

	
	Regarding claim 6, Eads teaches
	A multi-sampling model training device, comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the multi-sampling model training device to perform: (Eads, Title and Abstract indicate that Eads is directed to techniques for training ensembles of decision trees. Figures 1A-1B provide an overview of a method. The sampling is performed at steps 140-142. [0301-0303] indicates that the techniques taught by Eads may be implemented using a memory storing instructions which may be executed by a processor.)
	The remainder of claim 6 is substantially similar to claim 1 and is rejected with the same rationale, mutatis mutandis.
	
	Claims 8-10 are substantially similar to claims 3-5 and are rejected with the same rationale in view of the rejection of claim 6, mutatis mutandis.

	Regarding claim 11, Eads teaches
	A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer to cause the computer to perform a multi-sampling model training method, comprising: (Eads, Title and Abstract indicate that Eads is directed to techniques for training ensembles of decision trees. Figures 1A-1B provide an overview of a method. The sampling is performed at steps 140-142. [0301-0303] indicates that the techniques taught by Eads may be implemented using a memory storing instructions which may be executed by a processor.)
	The remainder of claim 11 is substantially similar to claim 1 and is rejected with the same rationale, mutatis mutandis.

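	Claims 13-15 are substantially similar to claims 3-5 and are rejected with the same rationale in view of the rejection of claim 11, mutatis mutandis.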

	Claims 2, 7, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over “Eads” (US 2014/0337269 A1) in view of “Cheng” (US 2016/0063396 A1), further in view of “Yan” (US 2011/0173116 A1), and further in view of “Mayle” (US 2016/0012317 A1).

	Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, Eads teaches
	wherein performing multisampling on samples to obtain a training set and a validation set in each sampling comprises: 
	…performing repeated sampling…to obtain n training sets and n validation sets. (Eads, Figures 1A-1B, step 140 shows creating a variable D_inbag by sampling possibly with replacement from a dataset D provided at Step 104. This step is further described at [0072]. The set of samples identified by D_inbag corresponds to a training set. Step 142 shows creating a variable D_outbag using samples in D but not in D_inbag. This is further described at [0073]. The set of samples identified by D_outbag corresponds to a validation set. As indicated in Figures 1A-1B, steps 140-142 are performed as part of a loop which generates the desired number of decision trees num_trees (see step 138 and [0071]).)
	The combination of Eads, Cheng and Yan does not appear to explicitly teach 
	dividing the samples into m subsets; and performing repeated sampling on the m subsets
	However, Mayle—directed to analogous art—teaches
	dividing the samples into m subsets; and performing repeated sampling on the m subsets (Mayle, [0127-0133] describes performing stratified sampling to determine training and test sets. The stratification includes dividing the samples into two sets (see [0128]) and then sampling from these sets using the algorithm described at [0129-0133].)
	It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which the invention pertains to modify the combination of Eads, Cheng, and Yan to use stratified sampling as taught by Mayle and described above because this ensures an approximately equal proportion of classes represented in the training and testing set, while a substantially 

	Claims 7 and 12 are substantially similar to claim 2 and are rejected with the same rationale in view of the rejections of claims 6 and 11.
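	The Mayle-based modification of claim 2 can be sketched in Python as follows, assuming the m subsets are formed by class label and each of the n samplings splits every subset by a hypothetical `train_fraction`. This is a minimal illustration of stratified repeated sampling, not Mayle's algorithm at [0129-0133]; `stratify` and `repeated_split` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)


def stratify(samples: List[Sample]) -> Dict[int, List[Sample]]:
    """Divide the samples into m subsets, one per class label (assumed strata)."""
    strata: Dict[int, List[Sample]] = defaultdict(list)
    for s in samples:
        strata[s[1]].append(s)
    return dict(strata)


def repeated_split(samples: List[Sample], n: int, train_fraction: float = 0.5,
                   seed: int = 0) -> List[Tuple[List[Sample], List[Sample]]]:
    """Perform n stratified samplings, each yielding a (training, validation) pair."""
    rng = random.Random(seed)
    strata = stratify(samples)
    splits = []
    for _ in range(n):
        train, valid = [], []
        for members in strata.values():
            shuffled = members[:]          # shuffle a copy, keep strata intact
            rng.shuffle(shuffled)
            k = round(len(shuffled) * train_fraction)
            train.extend(shuffled[:k])
            valid.extend(shuffled[k:])
        splits.append((train, valid))
    return splits
```

	Because each split draws from every subset in the same proportion, each of the n training sets and n validation sets keeps roughly the class balance of the full sample set, which is the benefit the rejection attributes to stratified sampling.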

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Cichosz (US 2015/0032674 A1) – Also teaches determining data bags to train a plurality of trees to be used in an ensemble. See Abstract.
Narsky (US 9,501,749 B1) – Abstract describes generating a model. Column 3, lines 4-14 indicate that the model may be an ensemble. Column 14, lines 31-59 describe shrinking the model based on a determination of weights of the constituent models.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Markus A Vasquez whose telephone number is (303)297-4432. The examiner can normally be reached Monday to Friday 9AM to 2PM MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, 





/MARKUS A. VASQUEZ/Examiner, Art Unit 2121