DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims are pending in this office action: 1-13, 29 and 31-36
The following claims are amended: 1-2, 29, 31, and 36
The following claims are new: None
The following claims are cancelled: None
The following claims are rejected: 1-13, 29 and 31-36
Response to Arguments
Applicant filed amendments on 08/04/2021 to address the 35 U.S.C. 101 rejection. In response to the Applicant’s amendments, the 35 U.S.C. 101 rejection has been withdrawn.
Applicant’s arguments with respect to claims 1-13, 29 and 31-36 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 29, 31-32, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20140279760 A1 to Aliferis, et al. (hereinafter, “Aliferis”) in view of U.S. Pub. No. US 20160071017 A1 to Adjaoute (hereinafter, “Adjaoute”), and further in view of Decision trees to Scikit-learn (hereinafter, “Scikit”).
As per claim 1, Aliferis teaches a method for presenting a prediction model, comprising:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
at least one prediction sample and the at least one prediction result (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1” and Abstract discloses “The invention also converts a predictive model into a functionally equivalent model into a form that can be implemented and deployed more easily or efficiently in practice. The benefits include: model understandability and defensibility of modeling. A particularly interesting application is that of understanding the decision making of humans, comparison of the behavior of a human or computerized decision process against another and use to enhance education and guideline compliance/adherence detection and improvement.” (M1 is the prediction model that is fit to a decision tree))
Aliferis fails to explicitly teach:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]]
training the decision tree model using the at least one decision tree training sample
However, Adjaoute teaches:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]] (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” (Decision trees are used to fit predictive data naturally))
training the decision tree model using the at least one decision tree training sample (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” And Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given.” (Decision tree that is built/trained using data that represents the predictive data with supervised learning))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify representing a prediction model in the form of a decision tree as disclosed by Aliferis to build a decision tree using predictive data as disclosed by Adjaoute. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “create a decision tree based on records in a training 
Aliferis fails to explicitly teach:
and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result.
However, Scikit teaches:
and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result. (Scikit, “Once trained, we can export the tree in Graphviz format using the export_graphviz exporter….The export_graphviz exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. IPython notebooks can also render these plots inline using the Image() function:” and 
[Figure media_image1.png: rendered decision tree from the Scikit reference]
 visually displays the trained decision tree where intermediate nodes and endpoints are displayed, each intermediate node containing judgement for conditions, and separate paths representing different conditions such as 1, 0 (true, false) and the endpoints additionally indicating prediction results)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify representing a prediction model in the form of a decision tree as disclosed by Aliferis to visually display trained decision trees as disclosed by Scikit. The combination would have been obvious because a person of ordinary skill in the art would be motivated to visually understand the decision tree to grasp how and 
Aliferis is directed towards converting a predictive model into its equivalent decision tree counterpart by fitting the predictive model into the decision tree. Adjaoute is directed towards a predictive model wherein decision trees are used to fit predictive data that is output via the predictive model, and Scikit is directed towards a framework which enables one to train decision trees using data and subsequently visually display the decision tree so that one can understand the different paths and nodes within the decision tree. Further, one of ordinary skill in the art knows that decision trees are used to fit data and also that they are easy to read and interpret. The combination of the three references teaches claim 1 as currently claimed: Aliferis teaches the predictive model and its data, Adjaoute teaches training a decision tree model using predictive data wherein the decision tree fits the predictive data, and Scikit teaches visually displaying the decision tree to enable one to easily understand it. Thus, the combination of the three cited references sufficiently teaches claim 1 as currently claimed.
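For illustration only, the mapping applied to claim 1 can be sketched in code. The sketch is not taken from Aliferis, Adjaoute, or Scikit. It substitutes a small hand-written greedy tree learner and a plain-text renderer for the library trainer and Graphviz exporter described in Scikit, and every name in it (black_box, fit_tree, render, predict, the age and car features) is a hypothetical stand-in chosen to echo the insurance example of Adjaoute FIG. 23.

```python
# Illustrative sketch only; all names and data below are assumptions.

def black_box(sample):
    """Stand-in prediction model: flags young drivers of sports cars."""
    return "high" if sample["age"] < 25 and sample["car"] == "sports" else "low"

def candidate_tests(samples):
    """Enumerate the conditions (feature, operator, value) seen in the data."""
    tests = set()
    for s in samples:
        tests.add(("age", "<", s["age"]))
        tests.add(("car", "==", s["car"]))
    return sorted(tests)

def holds(test, sample):
    """Evaluate one condition against one sample."""
    feature, op, value = test
    return sample[feature] < value if op == "<" else sample[feature] == value

def impurity(labels):
    """Number of samples that disagree with the majority label."""
    if not labels:
        return 0
    return len(labels) - max(labels.count(l) for l in set(labels))

def fit_tree(samples, labels):
    """Greedy learner: a leaf is a label, an inner node is (test, yes, no)."""
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for test in candidate_tests(samples):
        yes = [i for i, s in enumerate(samples) if holds(test, s)]
        no = [i for i, s in enumerate(samples) if not holds(test, s)]
        if not yes or not no:
            continue  # a split must send samples down both paths
        cost = impurity([labels[i] for i in yes]) + impurity([labels[i] for i in no])
        if best is None or cost < best[0]:
            best = (cost, test, yes, no)
    if best is None:
        return max(set(labels), key=labels.count)  # no usable split: majority leaf
    _, test, yes, no = best
    return (test,
            fit_tree([samples[i] for i in yes], [labels[i] for i in yes]),
            fit_tree([samples[i] for i in no], [labels[i] for i in no]))

def render(node, indent=""):
    """Text rendering: satisfied and unsatisfied paths get distinct markers."""
    if not isinstance(node, tuple):
        return [f"{indent}=> predict {node}"]
    (feature, op, value), yes, no = node
    lines = [f"{indent}{feature} {op} {value}?"]
    lines += [f"{indent}  [yes]"] + render(yes, indent + "    ")
    lines += [f"{indent}  [no]"] + render(no, indent + "    ")
    return lines

def predict(node, sample):
    """Walk from the root to an endpoint and return its prediction result."""
    while isinstance(node, tuple):
        test, yes, no = node
        node = yes if holds(test, sample) else no
    return node

samples = [{"age": a, "car": c} for a in (19, 23, 30, 45) for c in ("sports", "sedan")]
results = [black_box(s) for s in samples]   # prediction results of the model
tree = fit_tree(samples, results)           # tree trained on (sample, result) pairs
print("\n".join(render(tree)))
```

In this sketch the intermediate nodes are the condition lines, the [yes] and [no] markers distinguish the path that satisfies a condition from the path that does not, and each "=> predict" line is an endpoint carrying a prediction result.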

As per claim 2, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 1, Aliferis further teaches wherein in the acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result:
using at least one portion of features of the prediction sample as features of the decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (features from the prediction sample are used in the decision tree training sample))
obtained prediction result (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
Adjaoute further teaches:
and acquiring a label of the decision tree training sample based on corresponding [[obtained prediction result]] (Adjaoute, Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guesses future prices. Each example used for training is labeled with the value of interest—in this case the stock price. A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.” And Fig 23 discloses building a decision tree from predictive data which already contains assigned labels of the data. (Prediction results contain classifications or labels thus one may simply acquire classification for non classified data based off of a classification label that has been output prior))
and wherein the label of the decision tree training sample corresponds to the [[obtained prediction result]] (Adjaoute, Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guesses future prices. Each example used for training is labeled with the value of interest—in this case the stock price. A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.” And Fig 23 discloses building a decision tree from predictive data which already contains assigned labels of the data. (Prediction results contain classifications or labels thus one may simply acquire classification for non classified data based off of a classification label that has been output prior))
Same motivation to combine Aliferis and Adjaoute as claim 1
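For illustration only, the claim 2 limitations as mapped above (a portion of the prediction sample's features is reused, and the label is taken from the obtained prediction result) can be sketched as follows. The function and field names (model_predict, to_tree_training_samples, policy_id) are hypothetical and do not appear in any cited reference.

```python
# Illustrative sketch only; all names and data below are assumptions.

def model_predict(sample):
    """Stand-in for the prediction model (M1 in the Aliferis mapping)."""
    return "high" if sample["age"] < 25 else "low"

def to_tree_training_samples(prediction_samples, predict, kept_features):
    """Build (features, label) pairs for training the decision tree."""
    rows = []
    for sample in prediction_samples:
        # Keep only a portion of the prediction sample's features.
        features = {name: sample[name] for name in kept_features}
        # The label is the model's prediction result, not a ground-truth value.
        label = predict(sample)
        rows.append((features, label))
    return rows

prediction_samples = [
    {"age": 22, "car": "sports", "policy_id": 101},
    {"age": 40, "car": "sedan", "policy_id": 102},
]
training = to_tree_training_samples(prediction_samples, model_predict, ["age", "car"])
print(training)
```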

As per claim 3, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 2, Aliferis further teaches:
wherein the at least one portion of the features of the prediction sample comprise a feature that plays a main role of prediction, and/or a feature that is easy to be understood by a user, among the features of the prediction sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning. The Markov Boundary of a variable is typically a very small subset of the original input variables but is mathematically guaranteed to contain all predictive information about the variable that is contained in the full data” (Using the Markov boundary of features would result in features that play a main role of prediction as features are selected that contain all of the predictive information))

As per claim 29, Aliferis teaches a computing device for presenting a prediction model, comprising a storage component in which a set of computer-executable instructions is stored, and a processor, wherein when the set of the computer-executable instructions is executed by the processor, following steps are performed:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
at least one prediction sample and the at least one prediction result (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1”  and  Abstract discloses “The invention also converts a predictive model into a functionally equivalent model into a form that can be implemented and deployed more easily or efficiently in practice. The benefits include: model understandability and defensibility of modeling. A particularly interesting application is that of understanding the decision making of humans, comparison of the behavior of a human or computerized decision process against another and use to enhance education and guideline compliance/adherence detection and improvement.” (M1 is the prediction model that is fit to the decision tree))
Aliferis fails to explicitly teach:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]]
training the decision tree model using the at least one decision tree training sample
However, Adjaoute teaches:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]] (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” (Decision trees fit predictive data))
training the decision tree model using the at least one decision tree training sample (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” And Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given.” (Decision tree that is built/trained using data that represents the predictive data with supervised learning))
Same motivation to combine Aliferis and Adjaoute as claim 1
Aliferis fails to explicitly teach:
and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result.
However, Scikit teaches:
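and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result. (Scikit, “Once trained, we can export the tree in Graphviz format using the export_graphviz exporter….The export_graphviz exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. IPython notebooks can also render these plots inline using the Image() function:” and 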
[Figure media_image1.png: rendered decision tree from the Scikit reference]
 visually displays the trained decision tree where intermediate nodes and endpoints are displayed, each intermediate node containing judgement for conditions, and separate paths representing different conditions such as 1, 0 (true, false) and the endpoints additionally indicating prediction results)
Same motivation to combine Aliferis and Scikit as claim 1

As per claim 31, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the computing device according to claim 29, Aliferis further teaches wherein in the acquiring at least one decision tree training sample for training a decision tree model based on the at least one prediction sample and the at least one prediction result:
using at least one portion of features of the prediction sample as features of the decision tree training sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning” (features from the prediction sample are used in the decision tree training sample))
Adjaoute further teaches:
and acquiring a label of the decision tree training sample based on corresponding obtained prediction result (Adjaoute, Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guesses future prices. Each example used for training is labeled with the value of interest—in this case the stock price. A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.” And Fig 23 discloses building a decision tree from predictive data which already contains assigned labels of the data. (Prediction results contain classifications or labels thus one may simply acquire classification for non classified data based off of a classification label that has been output prior))
and wherein the label of the decision tree training sample corresponds to the obtained prediction result (Adjaoute, Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given. For example, historical stock prices are used to guesses future prices. Each example used for training is labeled with the value of interest—in this case the stock price. A supervised learning algorithm learns from the labeled values using information such as the day of the week, the season, the company's financial data, the industry, etc. After the algorithm has found the best pattern it can, it uses that pattern to make predictions.” And Fig 23 discloses building a decision tree from predictive data which already contains assigned labels of the data. (Prediction results contain classifications or labels thus one may simply acquire classification for non classified data based off of a classification label that has been output prior))
Same motivation to combine Aliferis and Adjaoute as claim 1

As per claim 32, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the computing device according to claim 31. Aliferis further teaches:
wherein the at least one portion of the features of the prediction sample comprise a feature that plays a main role of prediction, and/or a feature that is easy to be understood by a user, among the features of the prediction sample (Aliferis, Para. [0103] discloses “Step 5 performs feature selection to reduce the dimensionality of the input space for decision tree learning. The Markov Boundary of a variable is typically a very small subset of the original input variables but is mathematically guaranteed to contain all predictive information about the variable that is contained in the full data” (Using the Markov boundary of features would result in features that play a main role of prediction as features are selected that contain all of the predictive information))

As per claim 36, Aliferis teaches a non-transitory computer storage medium storing instructions that when executed by a processor causes the processor to perform operations comprising:
acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
at least one prediction sample and the at least one prediction result (Aliferis, Para [0090] discloses “Learn a model M1 from dataset D” and “Create new data D1 that comprises of the generated inputs followed by the corresponding M1 model-estimated outputs” (The prediction model is M1, the prediction sample is dataset D, model-estimated outputs are prediction results))
the prediction model (Aliferis, Para. [0104] discloses “In step 7, the decision tree models can be combined to provide an explanation of model M1”  and  Abstract discloses “The invention also converts a predictive model into a functionally equivalent model into a form that can be implemented and deployed more easily or efficiently in practice. The benefits include: model understandability and defensibility of modeling. A particularly interesting application is that of understanding the decision making of humans, comparison of the behavior of a human or computerized decision process against another and use to enhance education and guideline compliance/adherence detection and improvement.” (M1 is the prediction model that is fit to the decision tree))
Aliferis fails to explicitly teach:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]]
training the decision tree model using the at least one decision tree training sample
However, Adjaoute teaches:
acquiring at least one decision tree training sample for training a decision tree model based on the [[at least one prediction sample and the at least one prediction result]], wherein the decision tree model is used to fit [[the prediction model]] (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” (Decision trees fit predictive data))
training the decision tree model using the at least one decision tree training sample (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” And Para. [0004] discloses “Machine learning can use various techniques such as supervised learning, unsupervised learning and Reinforcement learning. In supervised learning the learner is supplied with labeled training instances (set of examples), where both the input and the correct output are given.” (Decision tree that is built/trained using data that represents the predictive data with supervised learning))
Same motivation to combine Aliferis and Adjaoute as claim 1
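Aliferis fails to explicitly teach:
and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result.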
However, Scikit teaches:
and visually presenting the trained decision tree model, wherein nodes presented in the trained decision tree model include intermediate nodes and endpoints, each of the intermediate nodes represents judgment for a certain condition, and a path that satisfies the condition of the intermediate node and a path that does not satisfy the condition of the intermediate node are differentially displayed, and each of the endpoints indicates the prediction result. (Scikit, “Once trained, we can export the tree in Graphviz format using the export_graphviz exporter….The export_graphviz exporter also supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. IPython notebooks can also render these plots inline using the Image() function:” and 
[Figure media_image1.png: rendered decision tree from the Scikit reference]
 visually displays the trained decision tree where intermediate nodes and endpoints are displayed, each intermediate node containing judgement for conditions, and separate paths representing different conditions such as 1, 0 (true, false) and the endpoints additionally indicating prediction results)
Same motivation to combine Aliferis and Scikit as claim 1

Claims 4-6 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, and further in view of JP Pub. No. JP 2012053880 A to Aaron, et al. (hereinafter, “Aaron”).
As per claim 4, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 2, the combination of Aliferis, Adjaoute, and Scikit fails to explicitly teach:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model
However, Aaron teaches:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least squares method (PLS method)” (Portion of features are transformed. It should be clear to a person of ordinary skill in the art that transformation of features can be done to reduce features to improve accuracy, which would thus also improve interpretability of a node in a decision tree))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the feature transformation method as disclosed by Aaron. The combination would have been obvious 

As per claim 5, the combination of Aliferis, Adjaoute, Scikit, and Aaron as shown above teaches the method according to claim 4. Aaron further teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one feature subset among the at least one portion of the features of the prediction sample into at least one corresponding transformation feature subset respectively (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least squares method (PLS method)” (A subset of the features is transformed))
Same motivation to combine Aliferis and Aaron as in claim 4.

As per claim 6, the combination of Aliferis, Adjaoute, Scikit, and Aaron as shown above teaches the method according to claim 5. Aaron further teaches:
(Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least squares method (PLS method)” (Taking a subset of the features before they are transformed results in a subset having the same number of elements or fewer))
Same motivation to combine Aliferis and Aaron as in claim 4.

As per claim 33, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 31, but the combination of Aliferis, Adjaoute, and Scikit fails to explicitly teach:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model
However, Aaron teaches:
wherein, the at least one portion of the features of the prediction sample are transformed, in consideration of an expected scale of the decision tree model and/or node interpretability of the decision tree model (Aaron, Para. [0029] discloses “The determination of a subset of the feature data set that most accurately predicts the system output from the system input” and “In other cases, this may not be the case and it may be necessary to transform the data to create more appropriate “eigenvectors” that represent the data. Commonly used transformations include singular value decomposition (SVD), principal component analysis (PCA), partial least squares method (PLS method)” (A portion of the features is transformed. It should be clear to a person of ordinary skill in the art that features can be transformed to reduce their number and improve accuracy, which would also improve the interpretability of a node in a decision tree))
Same motivation to combine Aliferis and Aaron as in claim 4.
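For illustration only, the kind of transformation named in the cited passage (principal component analysis) can be sketched as follows. This is a minimal sketch under stated assumptions, not code taken from Aaron or from the application: the function names `leading_component` and `project` and the sample values are hypothetical, and power iteration is used here merely as one simple way to find the direction of largest variance. It shows a three-feature subset being transformed into a one-feature subset, i.e. a transformed subset with fewer elements than the original.

```python
import math

def leading_component(rows, iters=200):
    """Return (mean, unit vector) for the direction of largest variance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, v):
    """Replace each d-feature row with a single transformed feature."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]

# Hypothetical feature subset: three original features per prediction sample.
samples = [[2.0, 4.1, 1.0], [3.0, 6.0, 1.1], [4.0, 8.2, 0.9], [5.0, 9.9, 1.0]]
mean, direction = leading_component(samples)
reduced = project(samples, mean, direction)  # one transformed feature per sample
```

A decision tree fitted on `reduced` would split on one feature rather than three, which is the sense in which such a transformation can reduce the scale of the tree.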

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, further in view of Aaron, and further in view of U.S. Pub. No. US 20100312727 A1 to Pottenger, et al. (hereinafter, “Pottenger”)
As per claim 7, the combination of Aliferis, Adjaoute, Scikit, and Aaron as shown above teaches the method according to claim 5, but the combination of Aliferis, Adjaoute, Scikit, and Aaron fails to explicitly teach:
wherein the feature subset before being transformed indicates attribute information of the prediction sample, and the corresponding transform feature subset indicates statistical information or weight information of the attribute information
However, Pottenger (Pottenger addresses the issue of transforming data in vector form) teaches:
wherein the feature subset before being transformed indicates attribute information of the prediction sample, and the corresponding transform feature subset indicates statistical information or weight information of the attribute information (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form. The vectors may or may not fall into categories assigned by a subject matter expert (SME). If categories exist, the categorical labels divide the vectors into subsets. The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors. The second transformation computes a new numeric value for each attribute based on the links between attributes in each subset of the vectors. The third transformation operates on vectors that have not been categorized. Based on the automatic selection of categories from the attributes, this transformation computes a new numeric value for each attribute based on the links between attributes in each subset of the vectors” (Discrete vectors contain attribute information, and subsequently contain numeric information (statistical information) after being transformed))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the vector feature transformation as disclosed by Pottenger. The combination would have been obvious because a person of ordinary skill in the art would be motivated to reduce the dimensionality of data to capture information in a different format so that a scale of a model being created can be further reduced for easier understanding and visualization.

Claims 8-9 and 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, further in view of Pottenger, and further in view of U.S. Pub. No. US 20140337096 A1 to Bilenko, et al. (hereinafter, “Bilenko”).
As per claim 8, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 2. Pottenger further teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one discrete feature subset among the at least one portion of the features of the prediction sample into [[at least one corresponding continuous feature]] (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” and “The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors”, Fig. 1 discloses input data consisting of a set of vectors, and Para. [0022] discloses “input data 110 is composed of vectors of attributes with categorical labels that divide the vectors into subsets” (A subset of discrete feature vectors is transformed))
However, the combination of Aliferis, Adjaoute, Scikit, and Pottenger fails to explicitly teach:
at least one corresponding continuous feature
However, Bilenko (Bilenko addresses the issue of using statistical features to make predictions) teaches:
at least one corresponding continuous feature (Bilenko, Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (plural instances of statistical information correspond to continuous features))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the continuous feature as disclosed by Bilenko. The combination would have been obvious because a person of ordinary skill in the art would be motivated to transform data into a separate format to reduce the dimensionality of data. Reducing the dimensionality would allow for a more accurate model to be created.

As per claim 9, the combination of Aliferis, Adjaoute, Scikit, Pottenger, and Bilenko teaches the method according to claim 8. Pottenger further teaches:
wherein the discrete feature subset indicates attribute information of the prediction sample, (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” (vectors contain Boolean attributes that make the vector discrete))
Bilenko further teaches:
wherein, the corresponding continuous feature indicates statistical information of the attribute information about a prediction target of the prediction model; or, the corresponding continuous feature indicates a prediction weight of the attribute information about the prediction target of the prediction model (Bilenko, Para. [0038] discloses “The statistical information can also include various averages, ratios, etc. The statistical information can also include information which represents the output of a prediction module…The statistical information can provide one or more statistical measures that are based on these predicted values” and Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (statistical information corresponds to continuous features) (continuous feature includes statistical information of attributes that indicate prediction targets. Prediction targets can be equated to the averages, ratios, output of a prediction model))
Same motivation to combine Aliferis and Bilenko as in claim 8.

As per claim 34, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the computing device according to claim 31. Pottenger further teaches wherein the transforming of the at least one portion of the features of the prediction sample comprises:
transforming at least one discrete feature subset among the at least one portion of the features of the prediction sample into [[at least one corresponding continuous feature]] (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” and “The first transformation calculates a prior probability for each attribute based on the links between attributes in each subset of the vectors”, Fig. 1 discloses input data consisting of a set of vectors, and Para. [0022] discloses “input data 110 is composed of vectors of attributes with categorical labels that divide the vectors into subsets” (A subset of discrete feature vectors is transformed))
However, the combination of Aliferis, Adjaoute, Scikit, and Pottenger fails to explicitly teach:
at least one corresponding continuous feature
However, Bilenko teaches:
at least one corresponding continuous feature (Bilenko, Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (statistical information corresponds to continuous features))
Same motivation to combine Aliferis and Bilenko as in claim 8.

As per claim 35, the combination of Aliferis, Adjaoute, Scikit, Pottenger, and Bilenko teaches the computing device according to claim 34. Pottenger further teaches:
wherein the discrete feature subset indicates attribute information of the prediction sample, (Pottenger, Abstract discloses “Each vector is composed of a set of attributes that are either boolean or have been mapped to boolean form” (vectors contain Boolean attributes that make the vector discrete))
Bilenko further teaches:
wherein, the corresponding continuous feature indicates statistical information of the attribute information about a prediction target of the prediction model; or, the corresponding continuous feature indicates a prediction weight of the attribute information about the prediction target of the prediction model (Bilenko, Para. [0038] discloses “The statistical information can also include various averages, ratios, etc. The statistical information can also include information which represents the output of a prediction module…The statistical information can provide one or more statistical measures that are based on these predicted values” and Para. [0003] discloses “…generates plural instances of statistical information for the respective subsets of data” (continuous feature includes statistical information of attributes that indicate prediction targets. Prediction targets can be equated to the averages, ratios, output of a prediction model))
Same motivation to combine Aliferis and Bilenko as in claim 8.
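For illustration only, the mapping from a discrete attribute to a continuous statistical feature discussed for claims 8-9 and 34-35 can be sketched as a per-category mean of the prediction target. This is a minimal sketch under stated assumptions and is not the method of Pottenger, Bilenko, or the application: the function names `fit_target_means` and `transform`, the attribute `car_type`, and the labels in `high_risk` are all hypothetical.

```python
from collections import defaultdict

def fit_target_means(values, targets):
    """For each discrete value, compute the mean of the prediction target."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}

def transform(values, means, default):
    """Replace each discrete value with its continuous statistic.

    Values not seen when the statistics were computed fall back to `default`.
    """
    return [means.get(v, default) for v in values]

# Hypothetical discrete attribute and binary prediction target.
car_type = ["sports", "family", "sports", "truck", "family", "sports"]
high_risk = [1, 0, 1, 0, 0, 0]

means = fit_target_means(car_type, high_risk)
overall = sum(high_risk) / len(high_risk)
encoded = transform(car_type + ["van"], means, overall)
```

Each category is replaced by one number that summarizes how the attribute relates to the prediction target, so a tree can split on a single numeric threshold instead of one branch per category.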

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, and further in view of U.S. Pub. No. US 20150379426 A1 to Steele, et al. (hereinafter, “Steele”)
As per claim 10, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 1, but the combination of Aliferis, Adjaoute, and Scikit fails to explicitly teach:
obtaining the at least one prediction sample based on at least one prediction model training sample on a basis of which the prediction model is trained, and inputting the at least one prediction sample into the prediction model
However, Steele (Steele addresses the issue of training of machine learning models) teaches wherein prior to acquiring at least one prediction result obtained by the prediction model with respect to at least one prediction sample, the method further comprises:
obtaining the at least one prediction sample based on at least one prediction model training sample on a basis of which the prediction model is trained, and inputting the at least one prediction sample into the prediction model (Steele, Para. [0151] discloses “FIG. 27 illustrates an example of data set splits that may be used for cross-validation of a machine learning model, according to at least some embodiments. In the depicted embodiment, a data set comprising labeled observation records 2702 is split five different ways to obtain respective training sets 2720 (e.g., 2720A-2720E) each comprising 80% of the data, and corresponding test sets 2710 (e.g., 2710A-2710E) comprising the remaining 20% of the data. Each of the training sets 2720 may be used to train a model, and the corresponding test set 2710 may then be used to evaluate the model” (prediction sample is equated to a test set which is based off a training set, where the prediction sample is input into a model))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the method of obtaining sample data based on training data as disclosed by Steele. The combination would have been obvious because a person of ordinary skill in the art would be motivated to generate sample data that can be input into a model such that additional sample data does not need to be obtained.
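For illustration only, the five-way 80%/20% split described in the quoted passage of Steele can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `k_fold_splits` and the stand-in records are hypothetical, and the interleaved assignment of records to folds is one simple choice rather than the scheme Steele describes.

```python
def k_fold_splits(records, k=5):
    """Yield (training_set, test_set) pairs; each record is in exactly one test set."""
    # Interleaved assignment: record i goes to fold i % k.
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Twenty hypothetical labeled observation records, represented by their indices.
records = list(range(20))
splits = list(k_fold_splits(records, k=5))

for train, test in splits:
    # With k=5, each training set holds 80% of the data and each test set 20%.
    print(len(train), len(test))
```

Each test set is drawn from the same labeled data set as its training set, and is then supplied to the model trained on the remaining records.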

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, and further in view of Evaluating Machine Learning Models to Zheng (hereinafter, “Zheng”).
As per claim 11, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 1. Adjaoute further teaches:
wherein in training the decision tree model using the at least one decision tree training sample (Adjaoute, Para. [0227] discloses “FIG. 23 represents a decision tree 2300 in an example for a database 2301 maintained by an insurance company to predict a risk of an insurance contract based on a type of a car and an age of its driver. Database 2301 has three fields: (1) age, (2) car type, and (3) risk. The risk field is the output class that needs to be predicted for any new incoming data record. The age and the car type fields are used as inputs. The data mining technology builds a decision tree, e.g., one that can ease a search of cases in case-based reasoning to determine if an incoming transaction fits any profiles of similar cases existing in its database.” (Decision tree that is built/trained using data that represents the predictive data))
The combination of Aliferis, Adjaoute, and Scikit fails to explicitly teach:
the training of the decision tree model is performed under a preset regularization term about an expected scale of the decision tree model 
However, Zheng (Zheng addresses the issue of evaluation of machine learning models) teaches:
the training of the decision tree model is performed under a preset regularization term about an expected scale of the decision tree model (Zheng, Hyperparameter Tuning chapter discloses “A regularization hyperparameter controls the capacity of the model, i.e., how flexible the model is, how many degrees of freedom it has in fitting the data” and “Decision trees have hyperparameters” and “Training a machine learning model often involves optimizing a loss function (the training metric)” (It is clear to a person of ordinary skill in the art that regularization is required if the scale of a model must be controlled to prevent overfitting, etc.))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the regularization as disclosed by Zheng. The combination would have been obvious because a 

As per claim 12, the combination of Aliferis, Adjaoute, Scikit, and Zheng as shown above teaches the method according to claim 11. Zheng further teaches:
wherein the regularization term is used to limit a number of nodes, a number of layers, and/or a node sample minimum threshold, of the decision tree model (Zheng, Hyperparameter Tuning chapter discloses “Decision trees have hyperparameters such as the desired depth and number of leaves in the tree” (Regularization hyperparameters control the number of leaf or child nodes, depth of tree, etc.))
Same motivation to combine Aliferis and Zheng as in claim 11.
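For illustration only, the effect of limiting the number of layers and the minimum number of samples per node on the scale of a decision tree can be sketched as follows. This is a minimal sketch under stated assumptions, not the training procedure of Zheng, Adjaoute, or the application: the functions `gini`, `best_split`, `build`, `count_nodes`, and `tree_depth`, the parameters `max_depth` and `min_samples_leaf`, and the age/car-type/risk records are all hypothetical, and the limits are applied as hard stopping rules rather than as a penalty term in a loss function.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, min_samples_leaf):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Node sample minimum threshold (assumed >= 1): skip small children.
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth, min_samples_leaf, depth=0):
    """Grow a tree, stopping at max_depth layers or when no split is allowed."""
    split = best_split(rows, labels, min_samples_leaf) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    go_left = [r[f] <= t for r in rows]
    parts = []
    for side in (True, False):
        sub_rows = [r for r, g in zip(rows, go_left) if g == side]
        sub_labels = [y for y, g in zip(labels, go_left) if g == side]
        parts.append(build(sub_rows, sub_labels, max_depth, min_samples_leaf, depth + 1))
    return {"feature": f, "threshold": t, "left": parts[0], "right": parts[1]}

def count_nodes(node):
    if "leaf" in node:
        return 1
    return 1 + count_nodes(node["left"]) + count_nodes(node["right"])

def tree_depth(node):
    if "leaf" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Hypothetical records: [driver age, car type (1 = sports, 0 = other)] -> risk.
rows = [[20, 1], [22, 0], [28, 1], [33, 1], [36, 0],
        [41, 1], [47, 0], [55, 1], [60, 0], [65, 0]]
labels = ["high", "high", "high", "high", "low",
          "high", "low", "low", "low", "low"]

shallow = build(rows, labels, max_depth=1, min_samples_leaf=1)
deep = build(rows, labels, max_depth=10, min_samples_leaf=1)
coarse = build(rows, labels, max_depth=10, min_samples_leaf=5)
```

Tightening either limit yields a tree with fewer nodes and fewer layers, at the cost of fitting the training records less exactly.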

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Aliferis in view of Adjaoute, further in view of Scikit, and further in view of WO 2013067337 to Donaldson, et al. (hereinafter, “Donaldson”)
As per claim 13, the combination of Aliferis, Adjaoute, and Scikit as shown above teaches the method according to claim 1, but the combination of Aliferis, Adjaoute, and Scikit fails to explicitly teach:
wherein the visually presenting of the trained decision tree model comprises visually presenting the trained decision tree model through a pruning process, wherein a node which is cut in the pruning process is not presented, or is presented implicitly
However, Donaldson teaches:
wherein the visually presenting of the trained decision tree model comprises visually presenting the trained decision tree model through a pruning process, wherein a node which is cut in the pruning process is not presented, or is presented implicitly (Donaldson, Abstract discloses “A visualization system may automatically prune the decision tree model based on characteristics of nodes or branches in the decision tree or based on artifacts associated with model generation”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Aliferis as modified to use the pruning process as disclosed by Donaldson. The combination would have been obvious because a person of ordinary skill in the art would be motivated to reduce the overall complexity of the decision tree that is displayed so that it is easier to understand by an individual. Pruning removes sections of the tree that may not provide any benefit to the tree, thus the decision tree will then have improved efficacy in being able to explain a model.
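For illustration only, presenting a pruned tree so that a cut node is either not presented or is presented implicitly can be sketched as follows. This is a minimal sketch under stated assumptions, not the visualization system of Donaldson or of the application: the function `render`, the `pruned` flag, the `show_pruned` parameter, the `[+]` marker, and the example tree are all hypothetical, and the sketch takes the pruning decision as given rather than computing it.

```python
def render(node, depth=0, show_pruned=True):
    """Return indented text lines for a tree, hiding or collapsing pruned nodes."""
    indent = "  " * depth
    if node.get("pruned"):
        # A cut node is either omitted or shown as a collapsed marker,
        # and its children are never shown.
        return [indent + "[+] " + node["label"]] if show_pruned else []
    lines = [indent + node["label"]]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1, show_pruned))
    return lines

# Hypothetical trained tree in which one subtree has been marked as pruned.
tree = {
    "label": "age <= 33",
    "children": [
        {"label": "risk: high"},
        {
            "label": "car type = sports",
            "pruned": True,
            "children": [{"label": "risk: high"}, {"label": "risk: low"}],
        },
    ],
}

hidden = render(tree, show_pruned=False)    # cut node not presented
collapsed = render(tree, show_pruned=True)  # cut node presented implicitly
print("\n".join(collapsed))
```

In both modes the children of the cut node are left out, so the displayed tree is smaller than the trained tree.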
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/H.R.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123