DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are deemed acceptable for the purpose of examination.
Specification
The specification is deemed acceptable for the purpose of examination.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 04/27/2016. It is noted, however, that applicant has not filed a certified English translation of the CN201610269127.2 application as required by 37 CFR 1.55.
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 36 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 36 recites “A computer storage medium.” However, the phrase “a computer storage medium” is broad enough to encompass transitory embodiments, such as propagating signals per se. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter). Therefore, claim 36 is non-statutory. The applicant should amend the claim to recite a non-transitory computer storage medium.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 13, 29, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20140279760 A1 to Aliferis, et al. (hereinafter, “Aliferis”) in view of WIPO No. WO 2013067337 A1 to Donaldson, et al. (hereinafter, “Donaldson”).
As per claim 1, Aliferis teaches a method for presenting a prediction model, comprising:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para. [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (the prediction model is M1, the prediction sample is dataset D, and the model-estimated outputs are the prediction results))
acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result (Aliferis, Para. [0090] discloses “Generate input patterns from B1 or D using statistical sampling” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (new data D1 is the decision tree training sample, which is based on prediction samples (the generated input patterns) and prediction results (the model-estimated outputs)))
wherein the decision tree model is used to fit the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1” (M1 is the prediction model to which the decision tree models are fit))
training the decision tree model using the at least one decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (input space refers to dataset D1 which is the decision tree training sample))
Aliferis fails to explicitly teach:
visually presenting the trained decision tree model
However, Donaldson (Donaldson addresses the issue of visually displaying decision trees) teaches:
visually presenting the trained decision tree model (Donaldson, FIG. 2 discloses a decision tree visualization system and Page 4 discloses “a relatively large amount of sample data 110 may be used for generating or training a first portion of decision tree 7 and a relatively small amount of sample data 110 may be used for generating a second portion of decision tree 17” (a trained decision tree may be visually displayed))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the representation of a prediction model in the form of a decision tree, as disclosed by Aliferis, to use the decision tree visualization system disclosed by Donaldson. The combination would have been obvious because a person of ordinary skill in the art would have been motivated to visually inspect the decision tree in order to understand how and why the predictions are made. Additionally, visually displaying the decision tree may allow one to further refine the prediction model to improve prediction results.
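For illustration only, the technique relied upon from Aliferis above (labeling sampled inputs with the outputs of a learned model M1 and fitting a decision tree to the resulting data D1) can be sketched in Python as follows. The model `m1`, the uniform sampling, and the depth-1 tree learner are hypothetical simplifications assumed for brevity; the sketch is not taken from Aliferis, Donaldson, or the present application.

```python
import random

def m1(x):
    # Hypothetical stand-in for the learned prediction model M1:
    # predicts class 1 when the first feature exceeds 0.6.
    return 1 if x[0] > 0.6 else 0

def build_d1(model, n_samples, n_features, seed=0):
    # Generate input patterns by uniform sampling and pair each one with the
    # model-estimated output, yielding the surrogate training data D1.
    rng = random.Random(seed)
    d1 = []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n_features)]
        d1.append((x, model(x)))
    return d1

def majority(labels):
    # Most common label; ties are resolved toward the smaller label.
    return max(sorted(set(labels)), key=labels.count)

def fit_stump(d1):
    # Fit a depth-1 decision tree (stump) to D1 by exhaustive search over
    # feature index and threshold, minimizing misclassified samples.
    best = None
    n_features = len(d1[0][0])
    for f in range(n_features):
        for x, _ in d1:
            t = x[f]
            left = [y for xx, y in d1 if xx[f] <= t]
            right = [y for xx, y in d1 if xx[f] > t]
            if not left or not right:
                continue
            lp, rp = majority(left), majority(right)
            errors = sum(y != lp for y in left) + sum(y != rp for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lp, rp)
    errors, f, t, lp, rp = best
    return {"feature": f, "threshold": t, "left": lp, "right": rp, "errors": errors}

d1 = build_d1(m1, n_samples=200, n_features=3)
stump = fit_stump(d1)
print(stump)
```

Because the tree is trained on M1's outputs rather than on ground-truth labels, it fits the prediction model itself; here the stump recovers the split on the first feature that M1 uses.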

As per claim 13, the combination of Aliferis and Donaldson as shown above teaches the method according to claim 1. Donaldson further teaches:
wherein the visually presenting of the trained decision tree model comprises visually presenting the trained decision tree model through a pruning process, wherein a node which is cut in the pruning process is not presented, or is presented implicitly (Donaldson, Abstract discloses “A visualization system may automatically prune the decision tree model based on characteristics of nodes or branches in the decision tree or based on artifacts associated with model generation”)
The same motivation to combine Aliferis and Donaldson as for claim 1 applies.
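For illustration only, presenting a decision tree "through a pruning process, wherein a node which is cut in the pruning process is not presented, or is presented implicitly" can be sketched as a text rendering that omits, or collapses to a placeholder, any node whose sample count falls below a threshold. The nested-dict tree, its sample counts, and the `min_samples` criterion are hypothetical assumptions; Donaldson's pruning criteria are described only at the level quoted above.

```python
def render(node, min_samples, depth=0, implicit=False):
    # Return the lines of an indented text view of a decision tree.
    # Children holding fewer than min_samples samples are treated as pruned:
    # omitted entirely, or shown as a single "..." line when implicit=True.
    indent = "  " * depth
    lines = [f"{indent}{node['label']} (n={node['samples']})"]
    for child in node.get("children", []):
        if child["samples"] >= min_samples:
            lines.extend(render(child, min_samples, depth + 1, implicit))
        elif implicit:
            lines.append(f"{indent}  ...")
    return lines

# Hypothetical toy tree; sample counts are invented for illustration.
tree = {
    "label": "x0 <= 0.6", "samples": 200,
    "children": [
        {"label": "class 0", "samples": 118},
        {"label": "x1 <= 0.1", "samples": 82,
         "children": [
             {"label": "class 1", "samples": 3},
             {"label": "class 1", "samples": 79},
         ]},
    ],
}

print("\n".join(render(tree, min_samples=5)))
print("\n".join(render(tree, min_samples=5, implicit=True)))
```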

As per claim 29, Aliferis teaches a computing device for presenting a prediction model, wherein the following steps are performed:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para. [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (the prediction model is M1, the prediction sample is dataset D, and the model-estimated outputs are the prediction results))
acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result (Aliferis, Para. [0090] discloses “Generate input patterns from B1 or D using statistical sampling” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (new data D1 is the decision tree training sample, which is based on prediction samples (the generated input patterns) and prediction results (the model-estimated outputs)))
wherein the decision tree model is used to fit the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1” (M1 is the prediction model to which the decision tree models are fit))
training the decision tree model using the at least one decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (input space refers to dataset D1, which is the decision tree training sample))
Aliferis fails to explicitly teach:
visually presenting the trained decision tree model and comprising a storage component in which a set of computer-executable instructions is stored, and a processor, wherein when the set of the computer-executable instructions is executed by the processor
However, Donaldson teaches:
visually presenting the trained decision tree model (Donaldson, FIG. 2 discloses a decision tree visualization system and Page 4 discloses “a relatively large amount of sample data 110 may be used for generating or training a first portion of decision tree 7 and a relatively small amount of sample data 110 may be used for generating a second portion of decision tree 17” (a trained decision tree may be visually displayed))
comprising a storage component in which a set of computer-executable instructions is stored, and a processor, wherein when the set of the computer-executable instructions is executed by the processor (Donaldson, Page 17 discloses “Processors 1004 may execute instructions or "code" 1006 stored in any one of memories 1008, 1010, or 1020.”)
The same motivation to combine Aliferis and Donaldson as for claim 1 applies.



As per claim 36, Aliferis teaches:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para. [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (the prediction model is M1, the prediction sample is dataset D, and the model-estimated outputs are the prediction results))
acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result (Aliferis, Para. [0090] discloses “Generate input patterns from B1 or D using statistical sampling” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (new data D1 is the decision tree training sample, which is based on prediction samples (the generated input patterns) and prediction results (the model-estimated outputs)))
wherein the decision tree model is used to fit the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1” (M1 is the prediction model to which the decision tree models are fit))
training the decision tree model using the at least one decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (input space refers to dataset D1 which is the decision tree training sample))
Aliferis fails to explicitly teach:
visually presenting the trained decision tree model, and a computer storage medium storing instructions that when executed by a processor cause the processor to perform operations comprising
However, Donaldson teaches:
visually presenting the trained decision tree model (Donaldson, FIG. 2 discloses a decision tree visualization system and Page 4 discloses “a relatively large amount of sample data 110 may be used for generating or training a first portion of decision tree 7 and a relatively small amount of sample data 110 may be used for generating a second portion of decision tree 17” (a trained decision tree may be visually displayed))
a computer storage medium storing instructions that when executed by a processor cause the processor to perform operations comprising (Donaldson, Page 17 discloses “Processors 1004 may execute instructions or "code" 1006 stored in any one of memories 1008, 1010, or 1020.” and “"Computer-readable storage medium" (or alternatively, "machine-readable storage medium") may include all of the foregoing types of memory, as well as new technologies that may arise in the future, as long as they may be capable of storing digital information in the nature of a computer program or other data, at least temporarily, in such a manner that the stored information may be "read" by an appropriate processing device”)
The same motivation to combine Aliferis and Donaldson as for claim 1 applies.

Claims 2-6 and 31-33 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Donaldson, further in view of JP Pub. No. JP 2012053880 A to Aaron, et al. (hereinafter, “Aaron”).
As per claim 2, the combination of Aliferis and Donaldson as shown above teaches the method according to claim 1. Aliferis further teaches wherein in the acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result:
using at least one portion of features of the prediction sample as features of the decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (features from the prediction sample are used in the decision tree training sample))
The combination of Aliferis and Donaldson fails to explicitly teach:
and acquiring a label of the decision tree training sample based on correspondingly obtained prediction result
However, Aaron (Aaron addresses the issue of creating a system for modeling and visualization of empirical data) teaches:
and acquiring a label of the decision tree training sample based on correspondingly obtained prediction result (Aaron, Para. [0038] discloses “The simplest quantization method is based on a fixed-size subrange, i.e., bin width (sometimes known as “fixed binning”), where the entire range of values associated with each input is equally spaced or divided into equal-sized subranges or bins” (binning assigns each value to a labeled bin; binning the model-estimated outputs in Aliferis would therefore yield labels for the decision tree training sample))
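For illustration only, fixed-width binning of the kind quoted from Aaron Para. [0038] can be sketched as follows, applied here to continuous model-estimated outputs so that each output receives a discrete bin label. The score values and the number of bins are hypothetical assumptions, and the function is not taken from Aaron.

```python
def fixed_width_bins(values, n_bins):
    # Divide the entire range of values into n_bins equal-width subranges
    # ("fixed binning") and return the bin index of each value as its label.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            # All values identical: place everything in a single bin.
            labels.append(0)
        else:
            # Clamp so that the maximum value falls into the last bin.
            labels.append(min(int((v - lo) / width), n_bins - 1))
    return labels

# Hypothetical prediction results (e.g., scores output by a model).
predicted_scores = [0.05, 0.32, 0.48, 0.55, 0.97]
print(fixed_width_bins(predicted_scores, 4))  # -> [0, 1, 1, 2, 3]
```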


As per claim 3, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 2. Aliferis further teaches:
wherein the at least one portion of the features of the prediction sample comprise a feature that plays a main role of prediction, and/or a feature that is easy to be understood by a user, among the features of the prediction sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning. The Markov Boundary of a variable is typically a very small subset of the original input variables but is mathematically guaranteed to contain all predictive information about the variable that is contained in the full data” (Using the Markov boundary of features would result in features that play a main role of prediction as features are selected that contain all of the predictive information))

As per claim 4, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 2. Aaron further teaches:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least square method (PLS method)” (a portion of the features is transformed; a person of ordinary skill in the art would recognize that features can be transformed to reduce their number and improve accuracy, which would also improve the interpretability of a node in a decision tree))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the feature transformation method disclosed by Aaron. The combination would have been obvious because a person of ordinary skill in the art would have been motivated to improve the accuracy of the model, as feature transformation methods reduce the number of features, which in turn can allow a more accurate model to be created. Feature transformation additionally allows a simpler model to be created, which would be easier to understand visually.

As per claim 5, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 4. Aaron further teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one feature subset among the at least one portion of the features of the prediction sample into at least one corresponding transformation feature subset respectively (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least square method (PLS method)” (a subset of the features is transformed))
The same motivation to combine Aliferis, Donaldson, and Aaron as for claim 4 applies.

As per claim 6, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 5. Aaron further teaches:
wherein a number of the features of the transformation feature subset is less than or equal to a number of the features of a corresponding feature subset before being transformed (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least square method (PLS method)” (taking a subset of the features before they are transformed results in a subset having the same number of features or fewer))
The same motivation to combine Aliferis, Donaldson, and Aaron as for claim 4 applies.
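For illustration only, one of the transformations named in Aaron Para. [0029], principal component analysis, can be sketched as replacing a two-feature subset with its projection onto the first principal component, so that the transformed subset has fewer features than before (claims 4-6). The data rows, the choice of subset, and the power-iteration eigenvector routine are hypothetical assumptions made to keep the sketch dependency-free.

```python
import math

def first_principal_component(rows, iters=100):
    # Center the data, form the sample covariance matrix, and find its
    # leading eigenvector by power iteration.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def transform_subset(rows, cols):
    # Replace the feature subset `cols` of each row with its projection onto
    # the first principal component: len(cols) features become 1 feature.
    subset = [[r[c] for c in cols] for r in rows]
    means, v = first_principal_component(subset)
    return [sum((s[j] - means[j]) * v[j] for j in range(len(cols)))
            for s in subset]

# Hypothetical prediction samples with three features; the first two are
# strongly correlated and form the subset to be transformed.
rows = [[1.0, 2.0, 7.0], [2.0, 4.1, 3.0], [3.0, 5.9, 8.0], [4.0, 8.0, 1.0]]
z = transform_subset(rows, [0, 1])
print([[round(p, 3), r[2]] for p, r in zip(z, rows)])  # 3 features -> 2
```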

As per claim 31, the combination of Aliferis and Donaldson teaches the computing device according to claim 29. Aliferis further teaches wherein in the acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result:
using at least one portion of features of the prediction sample as features of the decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (features from the prediction sample are used in the decision tree training sample))
The combination of Aliferis and Donaldson fails to explicitly teach:
and acquiring a label of the decision tree training sample based on correspondingly obtained prediction result
However, Aaron teaches:
and acquiring a label of the decision tree training sample based on correspondingly obtained prediction result (Aaron, Para. [0038] discloses “The simplest quantization method is based on a fixed-size subrange, ie bin width (sometimes known as “fixed binning”), where the entire range of values associated with each input is equally spaced or Divided into equal-sized subranges or bins” (binning acquires labels, thus binning the appropriate data in Aliferis would result in labels for the decision tree training sample))
The same motivation to combine Aliferis, Donaldson, and Aaron as for claim 2 applies.

As per claim 32, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the computing device according to claim 31. Aliferis further teaches:
wherein the at least one portion of the features of the prediction sample comprise a feature that plays a main role of prediction, and/or a feature that is easy to be understood by a user, among the features of the prediction sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning. The Markov Boundary of a variable is typically a very small subset of the original input variables but is mathematically guaranteed to contain all predictive information about the variable that is contained in the full data” (Using the Markov boundary of features would result in features that play a main role of prediction as features are selected that contain all of the predictive information))

As per claim 33, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the computing device according to claim 31. Aaron further teaches:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least square method (PLS method)” (a portion of the features is transformed; a person of ordinary skill in the art would recognize that features can be transformed to reduce their number and improve accuracy, which would also improve the interpretability of a node in a decision tree))
The same motivation to combine Aliferis, Donaldson, and Aaron as for claim 4 applies.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Donaldson, further in view of Aaron, and further in view of U.S. Pub. No. US 20100312727 A1 to Pottenger, et al. (hereinafter, “Pottenger”).
As per claim 7, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 5, but fails to explicitly teach:
wherein the feature subset before being transformed indicates attribute information of the prediction sample, and the corresponding transform feature subset indicates statistical information or weight information of the attribute information
However, Pottenger (Pottenger addresses the issue of transforming data in vector form) teaches:
wherein the feature subset before being transformed indicates attribute information of the prediction sample, and the corresponding transform feature subset indicates statistical information or weight information of the attribute information (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form. The vectors may or may not fall into categories assigned by a subject matter expert (SME). If categories exist, the categorical labels divide the vectors into subsets. The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors. The second transformation computes a new numeric value for each attribute based on the links between attributes in each subset of the vectors. The third transformation operates on vectors that have not been categorized. Based on the automatic selection of categories from the attributes, this transformation computes a new numeric value for each attribute based on the links between attributes in each subset of the vectors” (the discrete vectors contain attribute information and, after being transformed, contain numeric information (statistical information) about those attributes))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the vector feature transformation disclosed by Pottenger. The combination would have been obvious because a person of ordinary skill in the art would have been motivated to reduce the dimensionality of the data and capture its information in a different format, so that the scale of the model being created can be further reduced for easier understanding and visualization.

Claims 8-9 and 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Donaldson, further in view of Aaron, further in view of Pottenger, and further in view of U.S. Pub. No. US 20140337096 A1 to Bilenko, et al. (hereinafter, “Bilenko”).
As per claim 8, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the method according to claim 2. Pottenger teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one discrete feature subset among the at least one portion of the features of the prediction sample into at least one corresponding continuous feature (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” and “The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors”, FIG. 1 discloses input data consisting of a set of vectors, and Para. [0022] discloses “input data 110 is composed of vectors of attributes with categorical labels that divide the vectors into subsets” (a subset of discrete feature vectors is to be transformed))
The combination of Aliferis, Donaldson, Aaron, and Pottenger fails to explicitly teach:
at least one corresponding continuous feature
However, Bilenko (Bilenko addresses the issue of using statistical features to make predictions) teaches:
at least one corresponding continuous feature (Bilenko, Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (plural instances of statistical information correspond to continuous features))


As per claim 9, the combination of Aliferis, Donaldson, Aaron, Pottenger, and Bilenko teaches the method according to claim 8. Pottenger further teaches:
wherein the discrete feature subset indicates attribute information of the prediction sample, (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” (vectors contain Boolean attributes that make the vector discrete))
Bilenko further teaches:
wherein, the corresponding continuous feature indicates statistical information of the attribute information about a prediction target of the prediction model; or, the corresponding continuous feature indicates a prediction weight of the attribute information about the prediction target of the prediction model (Bilenko, Para. [0038] discloses “The statistical information can also include various averages, ratios, etc. The statistical information can also include information which represents the output of a prediction module…The statistical information can provide one or more statistical measures that are based on these predicted values” and Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (the statistical information corresponds to continuous features, which include statistical information of attributes about the prediction target; the prediction target can be equated to the averages, ratios, or output of a prediction model))
The same motivation to combine Aliferis, Donaldson, Aaron, Pottenger, and Bilenko as for claim 8 applies.
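For illustration only, transforming a discrete attribute into a continuous feature that indicates statistical information about a prediction target (claims 8-9) can be sketched with mean target encoding, a commonly used technique in which each category value is replaced by the average target over the samples sharing it. This is a named stand-in technique, not the specific method of Pottenger or Bilenko; the `city` and `clicked` data are hypothetical.

```python
from collections import defaultdict

def mean_target_encode(categories, targets):
    # Map each discrete attribute value to the mean of the prediction target
    # over the samples sharing that value, producing a continuous feature
    # that carries statistical information about the target.
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]

# Hypothetical discrete attribute and binary prediction target.
city = ["A", "B", "A", "C", "B", "A"]
clicked = [1, 0, 1, 1, 1, 0]
print(mean_target_encode(city, clicked))
```

In practice the statistics would typically be computed on held-out data to avoid leaking each sample's own target into its encoded feature; that refinement is omitted here.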

As per claim 34, the combination of Aliferis, Donaldson, and Aaron as shown above teaches the computing device according to claim 31. Pottenger teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one discrete feature subset among the at least one portion of the features of the prediction sample into at least one corresponding continuous feature (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” and “The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors”, FIG. 1 discloses input data consisting of a set of vectors, and Para. [0022] discloses “input data 110 is composed of vectors of attributes with categorical labels that divide the vectors into subsets” (a subset of discrete feature vectors is to be transformed))
The combination of Aliferis, Donaldson, Aaron, and Pottenger fails to explicitly teach:
at least one corresponding continuous feature
However, Bilenko teaches:
at least one corresponding continuous feature (Bilenko, Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (statistical information corresponds to continuous features))
The same motivation to combine Aliferis, Donaldson, Aaron, Pottenger, and Bilenko as for claim 8 applies.

As per claim 35, the combination of Aliferis, Donaldson, Aaron, Pottenger, and Bilenko teaches the computing device according to claim 34. Pottenger further teaches:
wherein the discrete feature subset indicates attribute information of the prediction sample, (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” (vectors contain Boolean attributes that make the vector discrete))
Bilenko further teaches:
wherein, the corresponding continuous feature indicates statistical information of the attribute information about a prediction target of the prediction model; or, the corresponding continuous feature indicates a prediction weight of the attribute information about the prediction target of the prediction model (Bilenko, Para. [0038] discloses “The statistical information can also include various averages, ratios, etc. The statistical information can also include information which represents the output of a prediction module…The statistical information can provide one or more statistical measures that are based on these predicted values” and Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (continuous feature includes statistical information of attributes that indicate prediction targets. Prediction targets can be equated to the averages, ratios, output of a prediction model))
The same motivation to combine Aliferis, Donaldson, Aaron, Pottenger, and Bilenko as for claim 8 applies.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Donaldson, further in view of U.S. Pub. No. US 20150379426 A1 to Steele, et al. (hereinafter, “Steele”).
As per claim 10, the combination of Aliferis and Donaldson as shown above teaches the method according to claim 1, but fails to explicitly teach:
obtaining the at least one prediction sample based on at least one prediction model training sample on a basis of which the prediction model is trained, and inputting the at least one prediction sample into the prediction model
However, Steele (Steele addresses the issue of training of machine learning models) teaches wherein prior to acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample, the method further comprises:
obtaining the at least one prediction sample based on at least one prediction model training sample on a basis of which the prediction model is trained, and inputting the at least one prediction sample into the prediction model (Steele, Para. [0151] discloses “FIG. 27 illustrates an example of data set splits that may be used for cross-validation of a machine learning model, according to at least some embodiments. In the depicted embodiment, a data set comprising labeled observation records 2702 is split five different ways to obtain respective training sets 2720 (e.g., 2720A-2720E) each comprising 80% of the data, and corresponding test sets 2710 (e.g., 2710A-2710E) comprising the remaining 20% of the data. Each of the training sets 2720 may be used to train a model, and the corresponding test set 2710 may then be used to evaluate the model” (the prediction sample is equated to a test set, which is obtained from the same labeled data set as the training set and is input into the model))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the method of obtaining sample data based on training data disclosed by Steele. The combination would have been obvious because a person of ordinary skill in the art would have been motivated to generate sample data that can be input into a model without needing to obtain additional sample data.

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Donaldson, further in view of Evaluating Machine Learning Models by Zheng (hereinafter, “Zheng”).
As per claim 11, the combination of Aliferis and Donaldson as shown above teaches the method according to claim 1, and Aliferis further teaches:
wherein in training the decision tree model using the at least one decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (the input space refers to dataset D1, which is the decision tree training sample))
The combination of Aliferis and Donaldson fails to explicitly teach:
the training of the decision tree model is performed under a preset regularization term about an expected scale of the decision tree model 
However, Zheng (Zheng addresses the evaluation of machine learning models) teaches:
the training of the decision tree model is performed under a preset regularization term about an expected scale of the decision tree model (Zheng, Hyperparameter Tuning chapter discloses “A regularization hyperparameter controls the capacity of the model, i.e., how flexible the model is, how many degrees of freedom it has in fitting the data” and “Decision trees have hyperparameters” and “Training a machine learning model often involves optimizing a loss function (the training metric)” (a person of ordinary skill in the art would recognize that regularization is required when the scale of a model must be controlled, for example to prevent overfitting))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the regularization as disclosed by Zheng. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the performance of a model, as using regularization would allow one to control the scale of a model such that, for example, overfitting is minimized.

As per claim 12, the combination of Aliferis and Donaldson as shown above teaches the method according to claim 1, and Zheng further teaches:
(Zheng, Hyperparameter Tuning chapter discloses “Decision trees have hyperparameters such as the desired depth and number of leaves in the tree” (regularization hyperparameters control the number of leaf or child nodes, the depth of the tree, etc.))
The same motivation to combine Aliferis, Donaldson, and Zheng applies as set forth for claim 11.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/H.R.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123