Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 112
Claims 11 and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The term “substantially a similar number of cases” in claims 11 and 19 is a relative term which renders the claims indefinite. The term is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the Examiner applies the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (Jan. 7, 2019), which is now largely incorporated into MPEP 2106.
Step 1: Claims 1-13 are directed to a method (i.e., process), claims 14-19 are directed to a system (i.e., machine/apparatus), and claim 20 is directed to a computer program product comprising a non-transitory computer readable medium (i.e., product/article of manufacture); therefore, all pending claims are directed to one of the four statutory categories of invention.
	Step 2A, Prong 1: Claim 1 recites “creating a model representing underperforming cases” (Mental processes – observation, evaluation, judgment, opinion); “from a case collection associated with a total performance parameter, and which comprises for each of a multiplicity of case records” (Mental processes – observation, evaluation, judgment, opinion); “a value for each feature from a collection of features, a ground truth label and a prediction of a machine learning engine, obtaining at least one feature” (Mental processes – observation, evaluation, judgment, opinion); “dividing the case records into at least two groups, based on values of the at least one feature in each of the case records, such each of the at least two groups has a portion of the case records associated therewith” (Mental processes – observation, evaluation, judgment, opinion); “for at least one group of the at least two groups, calculating a performance parameter of the machine learning engine over the portion of the case records associated with the group” (Mental processes – observation, evaluation, judgment, opinion; something that can be performed by the human mind with the aid of pen and paper); “subject to the performance parameter of the at least one group being below the total performance parameter in at least a predetermined threshold: determining a characteristic for the at least one group; and adding the characteristic for the at least one group to the model” (Mental processes – observation, evaluation, judgment, opinion; something that can be performed by the human mind with the aid of pen and paper); and “providing the model to a user, thus indicating under-performing parts of the test collection” (Mental processes – something that a person could accomplish mentally with the aid of pen and paper). Therefore, the claims are not patent eligible under 35 U.S.C. 101.
	The claim, under its broadest reasonable interpretation, covers performance of the limitations in the mind. Thus, the claim falls within the mental processes enumerated category of abstract ideas.
	Independent claim 14 recites limitations similar to those found in claim 1, and a similar analysis applies.
	Independent claim 20 recites limitations similar to those found in claim 1, and a similar analysis applies.
	Accordingly, the claims recite an abstract idea.
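	For illustration only, the steps recited in claim 1 can be sketched in a few lines of Python. The sketch assumes that accuracy serves as both the total and the per-group performance parameter, that case records are grouped by the exact value of a single feature, that the "characteristic" of a group is simply its feature value, and that the predetermined threshold is an absolute difference in accuracy. The names `accuracy` and `find_underperforming_groups`, the record layout, and the toy data are hypothetical and are not taken from the claims or the specification.

```python
from collections import defaultdict


def accuracy(records):
    """Fraction of case records whose prediction equals the ground-truth label."""
    return sum(r["prediction"] == r["label"] for r in records) / len(records)


def find_underperforming_groups(records, feature, threshold):
    """Hypothetical sketch of the claim 1 steps for a single feature."""
    total = accuracy(records)  # total performance parameter of the case collection
    groups = defaultdict(list)
    for r in records:  # divide the case records by the value of the feature
        groups[r["features"][feature]].append(r)
    model = []  # stands in for the "model representing underperforming cases"
    for value, members in groups.items():
        group_acc = accuracy(members)  # per-group performance parameter
        if total - group_acc >= threshold:  # below the total by at least the threshold
            model.append({"feature": feature, "value": value, "accuracy": group_acc})
    return model, total


# Toy case collection (hypothetical data): errors concentrate where color == "red".
records = [
    {"features": {"color": "red"}, "label": 1, "prediction": 1},
    {"features": {"color": "red"}, "label": 0, "prediction": 1},
    {"features": {"color": "red"}, "label": 1, "prediction": 0},
    {"features": {"color": "blue"}, "label": 1, "prediction": 1},
    {"features": {"color": "blue"}, "label": 0, "prediction": 0},
    {"features": {"color": "blue"}, "label": 1, "prediction": 1},
]
model, total = find_underperforming_groups(records, "color", threshold=0.2)
print(total, model)
```

With this toy data, only the "red" group (accuracy 1/3 against a total accuracy of 2/3) is added to the model.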
	Step 2A, Prong 2: Claim 1 does not recite additional elements. Regarding claims 14 and 20, the judicial exception is not integrated into a practical application because the claim language recites only elements tied to the types of data collected for use in the mental processes, does not include claim language demonstrating a practical application, and does not impose any meaningful limitations on practicing the abstract idea. Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Simply implementing the abstract idea on a generic computer is not a practical application of the abstract idea. See Applicant’s specification, paragraphs [0054]-[0078], for a description of generic computer components. Accordingly, the claims as a whole do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
	Independent claims 14 and 20 include computer components. However, each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Simply implementing the abstract idea on a generic computer for storing information in memory or sending information over a network is not a practical application of the abstract idea. Such elements do not integrate the abstract idea into a practical application; they are conventional functions of a generically claimed computer or insignificant extra-solution activity. See MPEP 2106.05(d).
	After considering all claim elements, both individually and as an ordered combination, it has been determined that the claims do not integrate the abstract idea into a practical application.
	Step 2B: Claim 1 does not recite additional elements. Claims 14 and 20 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above in the Step 2A, Prong 2 analysis, the additional elements “A system having a processor, the processor being adapted to perform the steps of: creating a Cartesian model representing underperforming cases” (claim 14) and “A computer program product comprising a non-transitory computer readable medium retaining program instructions, which instructions when read by a processor, cause the processor to perform: creating a Cartesian model representing underperforming cases” (claim 20) are construed as generic or conventional computer components used to perform the mental process, and they amount to no more than mere instructions to apply the exception using a generic computer component.
	The same conclusion is reached for dependent claims 2-13 and 15-19. See below for details.
	Claims 2 and 15 are dependent on independent claims 1 and 14, respectively.  The dependent claims recite “wherein said dividing, said calculating, said determining the characteristics and said adding are performed independently for at least a first feature and a second feature.” This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper. 
	Claim 3 is dependent on dependent claim 2.  The dependent claim recites “wherein said calculating, said determining the characteristics and said adding are performed at overlapping time periods for at least the first feature and the second feature.”  This is a recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
	Claims 4 and 16 are dependent on independent claims 1 and 14, respectively.  The dependent claims recite “wherein the at least one feature comprises at least a first feature and a second feature, and wherein the case records are divided into at least two groups, based on combinations of values of the first feature and the second feature assigned to each case record.”  This is a recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.  
	Claims 5 and 17 are dependent on dependent claims 4 and 16, respectively. The dependent claims recite “wherein dividing the case records into at least two groups is performed using a decision tree, and wherein training the decision tree is performed using Iterative Dichotomiser 3 (ID3), C4.5 classification, or Classification and Regression Tree (CART).” Merely mentioning known algorithms does not preclude the limitation from being performed by a person in their mind or with the aid of a pen and paper. Similarly, the mention of training without details of how the training is performed, beyond listing known algorithms, amounts to mere instructions to apply the exception using a generic computer component. Therefore, the claims fall within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas. See MPEP 2106.05(f).
	Claim 6 is dependent on dependent claim 5. The dependent claim recites “wherein a target label when training the decision tree is a binary indication of whether a prediction of the ML engine is the same or different than a ground truth of each case.” The binary indication of whether the prediction matches the ground truth is a mental process (observation, evaluation, judgment, opinion) capable of being performed by a person in their mind or with the aid of a pen and paper. The mention of training without details of how the training is performed amounts to mere instructions to apply the exception using a generic computer component. Therefore, the claim falls within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas. See MPEP 2106.05(f).
	Claim 7 is dependent on dependent claim 4. The dependent claim recites “wherein dividing the case records into at least two groups is performed by determining a subgroup of a group of cases having combinations of values of the first feature and the second feature assigned, such that more wrong decisions are associated with the subgroup than with the group, relatively to sizes of the subgroup and the group.” This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
	Claim 8 is dependent on dependent claim 6. The dependent claim recites “wherein determining the subgroup is performed using genetic algorithms or linear programming.” Merely mentioning known algorithms does not preclude the limitation from being performed by a person in their mind or with the aid of a pen and paper. Therefore, the claim falls within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas.
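	To make the claim 6 limitation concrete, the following sketch derives the binary target label (prediction differs from ground truth) and performs a single ID3-style split selection by information gain. It is a simplified illustration assuming categorical features and a hypothetical record layout; it selects one split rather than training a full tree, and it is not represented as the applicant's implementation.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def error_target(records):
    """Binary target: True where the ML engine's prediction differs from the ground truth."""
    return [r["prediction"] != r["label"] for r in records]


def best_id3_split(records, features):
    """One ID3 step: return (feature, information gain) for the categorical feature
    whose split best separates correct predictions from incorrect ones."""
    target = error_target(records)
    base = entropy(target)
    best = None
    for f in features:
        parts = defaultdict(list)
        for r, t in zip(records, target):
            parts[r["features"][f]].append(t)
        remainder = sum(len(p) / len(target) * entropy(p) for p in parts.values())
        gain = base - remainder
        if best is None or gain > best[1]:
            best = (f, gain)
    return best


# Hypothetical case records: every "red" case is mispredicted, every "blue" case is correct.
records = [
    {"features": {"color": "red", "size": "S"}, "label": 1, "prediction": 0},
    {"features": {"color": "red", "size": "L"}, "label": 0, "prediction": 1},
    {"features": {"color": "blue", "size": "S"}, "label": 1, "prediction": 1},
    {"features": {"color": "blue", "size": "L"}, "label": 0, "prediction": 0},
]
print(best_id3_split(records, ["color", "size"]))
```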
	Claim 9 is dependent on independent claim 1. The dependent claim recites “wherein dividing the case records into the at least two groups, is performed based on discrete values of the at least one feature in each of the case records.”  This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
	Claims 10 and 19 are dependent on independent claims 1 and 14, respectively.  The dependent claims recite “wherein dividing the case records into at least two groups is performed based on clustering the multiplicity of values of the at least one feature in the case records.”  This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
	Claim 11 is dependent on independent claim 1. The dependent claim recites “wherein dividing the case records into the at least two groups, is based on binning the multiplicity of values into a predetermined number of bins, such that all bins have substantially a similar number of cases associated with it.” This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
	Claim 12 is dependent on independent claim 1. The dependent claim recites “wherein dividing the case records into at least two groups is performed based on applying Highest Posterior Density (HPD) method, such that one group of the at least two groups comprises cases for which the values of the at least one feature falls within a HPD interval.” Merely mentioning known algorithms does not preclude the limitation from being performed by a person in their mind or with the aid of a pen and paper. Therefore, the claim falls within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas.
	Claim 13 is dependent on independent claim 1. The dependent claim recites “wherein the model is a Cartesian model.” Merely mentioning the format for developing a model does not preclude the limitation from being performed by a person in their mind or with the aid of a pen and paper. Therefore, the claim falls within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas.
	Claim 18 is dependent on dependent claim 16. The dependent claim recites “wherein dividing the case records into at least two groups is performed by determining a subgroup of a group of cases having combinations of values of the first feature and the second feature assigned, such that more wrong decisions are associated with the subgroup than with the group, relatively to sizes of the subgroup and the group, and wherein determining the subgroup is performed using genetic algorithms or linear programming.” This is a further recitation of a limitation within the Mental processes (observation, evaluation, judgment, opinion) enumerated category of abstract ideas and is capable of being performed by a person in their mind or with the aid of pen and paper.
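	As a hedged illustration of the claim 12 limitation, an HPD interval can be approximated from sample values by the shortest interval that contains a given fraction of them; for a unimodal distribution the HPD region is exactly such a shortest interval. The function names, the `mass` parameter, the record layout, and the sample values below are assumptions for illustration only.

```python
import math


def empirical_hpd(values, mass=0.9):
    """Shortest interval [lo, hi] covering at least `mass` of the sample values.
    For a unimodal distribution this approximates the Highest Posterior Density interval."""
    xs = sorted(values)
    n = len(xs)
    k = math.ceil(mass * n)  # number of sample points the interval must cover
    if not 1 <= k <= n:
        raise ValueError("mass must yield between 1 and len(values) points")
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):  # slide a window of k consecutive sorted values
        if xs[i + k - 1] - xs[i] < best_hi - best_lo:
            best_lo, best_hi = xs[i], xs[i + k - 1]
    return best_lo, best_hi


def split_by_hpd(records, feature, mass=0.9):
    """Divide case records into those whose feature value falls inside the HPD interval
    and those whose value falls outside it (hypothetical record layout)."""
    lo, hi = empirical_hpd([r["features"][feature] for r in records], mass)
    inside = [r for r in records if lo <= r["features"][feature] <= hi]
    outside = [r for r in records if not lo <= r["features"][feature] <= hi]
    return (lo, hi), inside, outside


records = [{"features": {"age": a}} for a in (31, 33, 35, 90)]
interval, inside, outside = split_by_hpd(records, "age", mass=0.75)
print(interval, len(inside), len(outside))
```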
	Therefore, claims 1-20 are not patent eligible.	

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4, 7, 9-11, 13-16, 19, and 20 are rejected under 35 U.S.C. § 102 as being anticipated by Amershi et al. (ModelTracker: Redesigning Performance Analysis Tools for Machine Learning; herein Amershi).

Regarding claim 1,
	Amershi teaches a method comprising: creating a model representing underperforming cases (Amershi, Figure 1, and, page 338, column 1, paragraph 2, line 1 “In this paper, we present ModelTracker (Figure 1), an interactive visualization designed to encourage a more informed approach to model building in machine learning.”

[Amershi, Figure 1]

In other words, ModelTracker is a method of generating a model representing underperforming cases.);
	from a case collection associated with a total performance parameter (Amershi, page 337, column 2, paragraph 2, line 1 “Performance inspection typically begins with an assessment of a model’s overall ability to correctly predict labels on data, often represented with summary statistics or graphs of common metrics (e.g., accuracy values, precision-recall curves).” In other words, the data is the case collection, and the accuracy value is the total performance parameter.), and which comprises for each of a multiplicity of case records:
	a value for each feature from a collection of features, a ground truth label and a prediction of a machine learning engine, obtaining at least one feature (Amershi, page 337, column 2, paragraph 1, line 1 “Machine learning is an iterative process. In supervised machine learning, practitioners iteratively collect and label a sample of data, create features to represent the data, train a model with the data and features, and then inspect the model’s performance to determine how to proceed in the next iteration (e.g., collecting more data, adding/editing features, experimenting with a different learning algorithm.” The Examiner notes that, in the field of machine learning, the term “model” typically refers to the machine learning model, e.g., a decision tree, neural network, or genetic algorithm. However, the claimed invention defines “model” as a “model representing underperforming cases” (Specification, paragraph [0006], line 2), which is distinguished from the “machine learning engine” that generates the original predictions and would typically be referred to as a “machine learning model”. Amershi uses “model” in the more typical sense of a machine learning model that generates predictions. In other words, Amershi’s features are the claimed features, supervised learning (which requires accurately labeled data) implies ground truth labels, adding features is obtaining at least one feature, Amershi’s model is the machine learning engine, and the model’s performance is the prediction.);
	dividing the case records into at least two groups, based on values of the at least one feature in each of the case records, such each of the at least two groups has a portion of the case records associated therewith (Amershi, page 338, column 1, paragraph 1, line 1 “Debugging model performance typically requires a disruptive cognitive switch from the primary task of building a model to the task of analyzing prediction errors (i.e., examples whose user-provided labels are predicted incorrectly by the model). For example, performance analysis may involve first locating errors within large datasets via sorting and filtering (e.g., sorting by model prediction scores, filtering by errors types) and then inspecting raw data to form hypotheses about potential causes of errors.” In other words, features are features (from the above mapping), locating errors within large datasets via sorting and filtering is dividing the case records into at least two groups, and sorting by model prediction scores is based on values of the at least one feature in each of the records.);
	for at least one group of the at least two groups, calculating a performance parameter of the machine learning engine over the portion of the case records associated with the group (Amershi, page 339, column 1, paragraph 2, line 1 “Dimensionality reduction techniques such as principal components analysis, multidimensional scaling, and clustering can also be used for model debugging [20]. Dimensionality reduction projects high-dimensional data onto fewer dimensions to enable visual inspection of relationships between individual examples, often via two-dimensional scatterplots. While these techniques facilitate deeper model analysis, they can also be complex and difficult to extract insight from [7]. Multidimensional scaling, for example, requires determining a distance function to represent the similarity between data, choosing a scaling mechanism, and then interpreting results.” In other words, clustering is at least two groups, each box (in Figure 1) located based on the model’s prediction score is calculating a performance parameter, and the model is the machine learning engine.);
	subject to the performance parameter of the at least one group being below the total performance parameter in at least a predetermined threshold: determining a characteristic for the at least one group; and adding the characteristic of the at least one group to the model (Amershi, Figure 1, Figure 3, and left panel of Figure 3; see mapping above.

[Amershi, Figure 3, and left panel of Figure 3 (enlarged)]

In other words, clustering is at least two groups, each box (in Figure 1) located based on the model’s prediction score is the performance parameter, the features (from the left panel of Figure 3) are the characteristic, and the interactive Prediction threshold (Figure 1) is the predetermined threshold.); and
	providing the model to a user, thus indicating under-performing parts of the test collection (Amershi, Figure 2,

[Amershi, Figure 2]

In other words, Figure 2 shows providing the model to a user, and the confusion matrices show underperforming parts of the test collection.).
Regarding claim 2,
	Amershi teaches the method of Claim 1, wherein 
	said dividing, said calculating, said determining the characteristics and said adding are performed independently for at least a first feature and a second feature (Amershi, page 339, column 2, paragraph 2, line 1 “Scatterplots are also used for displaying other properties of the data during model building. Scatterplots and scatterplot matrices, for example, are often used to display correlation between features over the current data set [20].” In other words, identifying correlations between features over the current data set is performing the dividing, calculating, and determining of characteristics independently for a first feature and a second feature.).
Regarding claim 3,
	Amershi teaches the method of Claim 2, wherein 
	said calculating, said determining the characteristics and said adding are performed at overlapping time periods for at least the first feature and the second feature (Amershi, page 339, column 2, paragraph 2, line 1 “Scatterplots are also used for displaying other properties of the data during model building. Scatterplots and scatterplot matrices, for example, are often used to display correlation between features over the current data set [20].” In other words, determining correlations between features over the current data set is determining characteristics of a first feature and a second feature at overlapping time periods.).
Regarding claim 4,
	Amershi teaches the method of Claim 1, wherein 
	the at least one feature comprises at least a first feature and a second feature (Amershi, Figure 3, left panel, and page 339, column 2, paragraph 2, line 1 “Scatterplots are also used for displaying other properties of the data during model building. Scatterplots and scatterplot matrices, for example, are often used to display correlation between features over the current data set [20].” In other words, correlation between features is the at least one feature comprising a first feature and a second feature.), and wherein
	the case records are divided into at least two groups, based on combinations of values of the first feature and the second feature assigned to each case record (Amershi, page 339, column 2, paragraph 2, line 1 “Scatterplots are also used for displaying other properties of the data during model building. Scatterplots and scatterplot matrices, for example, are often used to display correlation between features over the current data set [20]. This can help identify feature dependencies as well as potential clusters and outliers.” In other words, data set is case records, clusters is divided into at least two groups, and correlation between features is based on combinations of values of the first and second feature assigned to each case record.).  

Regarding claim 7,
	Amershi teaches the method of Claim 4, wherein 
	dividing the case records into at least two groups is performed by determining a subgroup of a group of cases having combinations of values of the first feature and the second feature assigned, such that more wrong decisions are associated with the subgroup than with the group, relatively to sizes of the subgroup and the group (Amershi, Figure 1, Figure 3, and, page 339, column 2, paragraph 2, line 2 “Scatterplots and scatterplot matrices, for example, are often used to display correlation between features over the current data set [20]. This can help identify feature dependencies as well as potential clusters and outliers.” and, page 341, column 2, paragraph 2, line 1 “ModelTracker automatically updates as a user iterated model building, adding boxes as more data is provided and rearranging boxes as prediction score change (e.g., with new data or features).” As shown in the left panel of Figure 3, features can be selected or deselected, alone or as a group, and the prediction results of these changes are automatically shown in the bottom panel. In other words, the features are a first feature and a second feature, a cluster is a subgroup, identifying clusters is determining a subgroup, and potential clusters and outliers in the red area of Figure 1 are subgroups associated with wrong decisions in a higher proportion than the overall group.).
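	The subgroup determination in claim 7 (more wrong decisions associated with the subgroup than with the group, relative to their sizes) amounts to comparing error rates. A minimal sketch follows, assuming subgroups are formed from exact combinations of two categorical feature values; the function names, record layout, and data are hypothetical.

```python
from collections import defaultdict


def error_rate(records):
    """Fraction of records whose prediction differs from the ground-truth label."""
    return sum(r["prediction"] != r["label"] for r in records) / len(records)


def error_dense_subgroups(records, f1, f2):
    """Return the (f1, f2) value combinations whose error rate exceeds that of the whole
    group, i.e. more wrong decisions relative to subgroup size than relative to group size."""
    overall = error_rate(records)
    combos = defaultdict(list)
    for r in records:
        combos[(r["features"][f1], r["features"][f2])].append(r)
    rates = {key: error_rate(members) for key, members in combos.items()}
    return {key: rate for key, rate in rates.items() if rate > overall}


def rec(color, size, label, prediction):
    """Build one hypothetical case record."""
    return {"features": {"color": color, "size": size}, "label": label, "prediction": prediction}


records = [
    rec("red", "S", 1, 0), rec("red", "S", 0, 1), rec("red", "L", 1, 1),
    rec("blue", "S", 0, 0), rec("blue", "L", 1, 1), rec("blue", "L", 0, 0),
]
print(error_dense_subgroups(records, "color", "size"))
```

Here the whole group has 2 wrong decisions out of 6 cases, while the ("red", "S") subgroup has 2 out of 2, so only that combination is returned.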
Regarding claim 9,
	Amershi teaches the method of Claim 1, wherein 
	dividing the case records into the at least two groups, is performed based on discrete values of the at least one feature in each of the case records (Amershi, Figure 3, left panel. The features are identified as true or false, which represent discrete values of the at least one feature.).
Regarding claim 10,
	Amershi teaches the method of Claim 1, wherein 
	dividing the case records into at least two groups is performed based on clustering the multiplicity of values of the at least one feature in the case records (Amershi, Figure 1, Figure 3. In Figure 1, the red boxes indicate individual records that are labeled as false, and the green boxes indicate records labeled as true. The vertical prediction threshold can be adjusted by moving it to the left or right. A red box to the right of the prediction threshold indicates a false positive. A green box to the left of the threshold indicates a false negative. In other words, the vertical threshold divides the case records into two groups based on the multiplicity of values of the at least one feature of the case records.).
Regarding claim 11,
	Amershi teaches the method of Claim 1, wherein 
	dividing the case records into the at least two groups, is based on binning the multiplicity of values into a predetermined number of bins, such that all bins have substantially a similar number of cases associated with it (Amershi, Figure 1, and, page 341, column 1, paragraph 2, line 1 “As labeled examples accumulate, boxes with the same prediction score are binned and stacked away from the horizontal line (e.g., examples in the top section are stacked upwards). Boxes within each bin are sorted such that examples potentially needing attention (e.g., errors) appear closer to the horizontal line.” The Examiner notes that there is a limited number of bins per vertical line. In other words, binning the boxes with the same prediction score is binning the multiplicity of values into a predetermined number of bins.).
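	For context on the binning limitation of claim 11, equal-frequency (quantile) binning is one common way to assign values to a predetermined number of bins with near-equal counts. The sketch below is an assumption offered for illustration; it is not a characterization of Amershi or of the applicant's disclosure, and the function name is hypothetical. When the number of cases is not a multiple of the number of bins, the bin sizes necessarily differ; for example, 10 cases in 3 bins yields sizes 4, 3, and 3.

```python
from collections import Counter


def equal_frequency_bins(values, num_bins):
    """Assign each value a bin index in [0, num_bins) so that bin sizes differ by at most one.
    Values are ranked by sort order; tied values may be split across adjacent bins."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices in ascending value order
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * num_bins // n
    return bins


sizes = Counter(equal_frequency_bins(list(range(10)), 3))
print(sorted(sizes.items()))
```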
Regarding claim 13,
	Amershi teaches the method of Claim 1, wherein 
	the model is a Cartesian model (Amershi, Figure 1. The Examiner notes that the specification of the instant application does not explicitly define a Cartesian model. The closest description is “One technical solution comprises the generation of a Cartesian model, designed to comprise a description of areas in which the engine provides deficient results.” (Specification, paragraph [0018].) Therefore, the Examiner interprets a Cartesian model as simply a way to present a description or display of areas in which the model provides inaccurate results. In other words, Figure 1, which displays the evaluation of each of the cases on a two-dimensional graph, is a Cartesian model.).
Claims 14, 15, and 16 are system claims corresponding to method claims 1, 2, and 4, respectively. Otherwise, they are the same. It is implicit that a computer-implemented method requires a system with a processor in order to execute. Therefore, claims 14, 15, and 16 are rejected for the same reasons as claims 1, 2, and 4, respectively.
Claim 19 is a system claim that combines the limitations of method claims 10 and 11. Otherwise, it is the same. Therefore, claim 19 is rejected for the same reasons as claims 10 and 11.
Claim 20 is directed to a computer program product comprising a non-transitory computer readable medium and corresponds to method claim 1. Otherwise, they are the same. It is implicit that a computer-implemented method requires a computer program product comprising at least one non-transitory computer readable medium in order to execute. Therefore, claim 20 is rejected for the same reasons as claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5, 6, and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over Amershi in view of Piao et al. (Discovery of Significant Classification Rules from Incrementally Inducted Decision Tree Ensemble for Diagnosis of Disease; herein Piao).
Regarding claim 5,
	Amershi teaches the method of Claim 4, wherein
	Thus far, Amershi does not explicitly teach dividing the case records into at least two groups is performed using a decision tree, and wherein training the decision tree is performed using Iterative Dichotomiser 3 (ID3), C4.5 classification, or Classification and Regression Tree (CART).  
	Piao teaches dividing the case records into at least two groups is performed using a decision tree, and wherein training the decision tree is performed using Iterative Dichotomiser 3 (ID3), C4.5 classification, or Classification and Regression Tree (CART) (Piao, Equation (3), and, page 588, paragraph 7, line 1 “The decision tree algorithm C4.5 [2] is developed from ID3 in the following ways: Handling missing data, handling continuous data, and pruning, generating rules, and splitting. For splitting purpose, C4.5 uses the Gain Ratio instead of Information Gain. C4.5 uses the largest Gain Ratio that ensures a larger than average information gain. Given a data set D, and it is split into s subsets S = {D1, D2, …, Ds}:

[Piao, Equation (3)]

In other words, splitting into s subsets is dividing the case records into at least two groups, Piao’s decision tree is the decision tree, and algorithm C4.5 is used for training the decision tree.).
	Both Amershi and Piao are directed to classification systems. Amershi teaches a method for improving the performance of machine learning classification models but does not explicitly teach using a decision tree for dividing the dataset into subgroups. Piao teaches using a decision tree for identifying underperforming subgroups for the purpose of classification, but does not explicitly teach a method for improving the performance of machine learning models. In view of the teaching of Amershi, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Piao with Amershi. This would result in a method for improving the performance of a machine learning system that can use a decision tree for identifying underperforming subgroups of the classified data.
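	The content of Piao's Equation (3) is not reproduced in the text of this action. For reference, the standard C4.5 quantities for a split of D into subsets D1, …, Ds are the information gain, the split information SplitInfo(D, S) = -Σ (|Di|/|D|) log2(|Di|/|D|), and their ratio, GainRatio = Gain / SplitInfo. The sketch below computes the gain ratio under those textbook definitions; whether it matches Piao's Equation (3) term by term is an assumption, and the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, partition):
    """C4.5 gain ratio of splitting data set D (given by its class labels) into the
    subsets D_1..D_s listed in `partition` as lists of indices into `labels`."""
    n = len(labels)
    gain = entropy(labels)  # starts as Info(D), then subtracts weighted subset entropies
    split_info = 0.0
    for idx in partition:
        if not idx:
            continue  # an empty subset contributes nothing to either term
        w = len(idx) / n  # |D_i| / |D|
        gain -= w * entropy([labels[i] for i in idx])
        split_info -= w * math.log2(w)
    return gain / split_info if split_info > 0 else 0.0


labels = ["sick", "sick", "healthy", "healthy"]
print(gain_ratio(labels, [[0, 1], [2, 3]]))      # two pure subsets
print(gain_ratio(labels, [[0], [1], [2], [3]]))  # four singleton subsets
```

Both splits have the same information gain, but the four-way split has a larger split information and therefore a lower gain ratio; this penalty on many-valued splits is the reason C4.5 uses the gain ratio rather than the raw information gain.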
	One of ordinary skill in the art would be motivated to do this because decision trees generalize well, are efficient, and are easy to understand.  (Piao, page 587, paragraph 1, line 1 “Decision trees are commonly used for gaining information for the purpose of decision making.  For inductive learning, decision tree is attractive for 3 reasons: (1) Decision tree is a good generalization for unobserved instance, only if the instances are described in terms of features that are correlated with the target concept. (2) The methods are efficient in computation that is proportional to the number of observed training instances. (3) The result of decision tree provides a representation of the concept that is explainable to a human.”)
Regarding claim 6,
	The combination of Amershi and Piao teaches the method of Claim 5, wherein
	a target label when training the decision tree is a binary indication of whether a prediction of the ML engine is the same or different than a ground truth of each case (Piao, Table 2; page 587, paragraph 1, line 2: “Decision tree is a good generalization for unobserved instance, only if the instances are described in terms of features that are correlated with the target concept.”; and page 590, paragraph 2, line 4: “The multiple decision trees are formed by bootstrap aggregating which repeatedly samples from a data set and the sampling is done with replacement.  It is that some instances may appear several times in the same training set, while others may be omitted from the training set.”

    [Image: Piao, Table 2]

The Examiner notes that decision trees are trained using supervised learning, meaning that the dataset used for training is labeled (i.e., ground truth).  In other words, the target is the target label; (from the mapping above) the C4.5 algorithm is used to train the tree on the training dataset; the training dataset includes ground-truth labels; and the confusion matrix (i.e., malignant or benign) is a binary indication of whether a prediction is the same as or different from the ground truth.)
Claim 17 is a system claim that corresponds to the combination of method claims 5 and 6 and is otherwise the same.  Therefore, claim 17 is rejected for the same reasons as claims 5 and 6.
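For illustration only, the claim 6 limitation discussed above (a binary target label marking whether a prediction matches the ground truth, used to train a decision tree that divides case records into groups) can be sketched as follows. This is a minimal Python sketch, assuming a single numeric attribute and a one-level tree (decision stump) split by information gain in place of a full C4.5 tree; the attribute, data, and function names are hypothetical and are not drawn from Amershi, Piao, or the application.

```python
import math


def binary_entropy(flags):
    """Entropy (base 2) of a list of booleans; 0.0 for an empty or pure list."""
    n = len(flags)
    if n == 0:
        return 0.0
    p = sum(flags) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def error_labels(predictions, ground_truth):
    """Binary target label: True where the prediction differs from ground truth."""
    return [p != g for p, g in zip(predictions, ground_truth)]


def best_threshold_split(feature_values, targets):
    """Decision stump: choose the threshold on one numeric attribute that
    maximizes information gain on the binary target, dividing the case
    records into two groups (value <= threshold, value > threshold)."""
    base = binary_entropy(targets)
    n = len(targets)
    best = (0.0, None)  # (information gain, threshold)
    for t in sorted(set(feature_values))[:-1]:
        left = [y for x, y in zip(feature_values, targets) if x <= t]
        right = [y for x, y in zip(feature_values, targets) if x > t]
        gain = (base - len(left) / n * binary_entropy(left)
                - len(right) / n * binary_entropy(right))
        if gain > best[0]:
            best = (gain, t)
    return best


if __name__ == "__main__":
    # Hypothetical case records: one attribute, model predictions, ground truth.
    ages = [25, 30, 35, 60, 65, 70]
    preds = ["benign", "benign", "malignant", "benign", "benign", "malignant"]
    truth = ["benign", "benign", "malignant", "malignant", "malignant", "benign"]
    print(best_threshold_split(ages, error_labels(preds, truth)))  # → (1.0, 35)
```

In this toy data every misclassified case falls in the group above the threshold, so that group would be flagged as the underperforming subgroup.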
Claims 8 and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over Amershi, Piao, and Shaffer, J. (WO 01/59610 A2, herein Shaffer).
Regarding claim 8,
	The combination of Amershi and Piao teaches the method of Claim 6, wherein
	Thus far, the combination of Amershi and Piao does not explicitly teach that determining the subgroup is performed using genetic algorithms or linear programming.
	Shaffer teaches that determining the subgroup is performed using genetic algorithms or linear programming (Shaffer, Figure 1, and page 14, line 9: “selecting (120) at least one subsequent plurality of feature sets from the pool of features (110) based on the measure of effectiveness of each evaluated feature set (131).”

    [Image: Shaffer, Figure 1]

In other words, the evolutionary algorithm is a genetic algorithm, and selecting a subsequent plurality of feature sets is determining a subgroup using genetic algorithms.).
	Both Shaffer and the combination of Amershi and Piao are directed to classification systems.  The combination of Amershi and Piao teaches a method for improving the performance of machine learning classification models but does not explicitly teach using a genetic algorithm to identify subgroups of the dataset.  Shaffer teaches using a genetic algorithm to identify underperforming subgroups for the purpose of classification but does not explicitly teach a method for improving the performance of machine learning models.  In view of the teaching of the combination of Amershi and Piao, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Shaffer into the combination of Amershi and Piao.  This would result in a method for improving the performance of a machine learning system that can use a genetic algorithm to identify underperforming subgroups in the data classification.
	One of ordinary skill in the art would be motivated to do this because evolutionary algorithms, such as genetic algorithms, show promise for identifying key features for classification.  (Shaffer, page 3, line 29 “Evolutionary algorithms hold the promise of providing an identification of the most effective words, or features, to include in a classification system having limited processing and storage capabilities, and this invention addresses a method and apparatus that further enhance the used of evolutionary algorithms for identifying effective features subsets.”)
Claim 18 is a system claim that corresponds to the combination of claims 7 and 8 and is otherwise the same.  Therefore, claim 18 is rejected for the same reasons as claims 7 and 8.
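For illustration only, the kind of evolutionary feature-set selection described in the quoted passage of Shaffer (selecting subsequent feature sets based on a measure of effectiveness of each evaluated set) can be sketched with a basic genetic algorithm. This is a minimal Python sketch; the bit-mask encoding, truncation selection, one-point crossover, mutation rate, and toy fitness function are all assumptions chosen for illustration and are not Shaffer's method.

```python
import random


def evolve_feature_subset(n_features, fitness, pop_size=20, generations=30,
                          mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm over feature subsets. Each individual is a
    list of booleans (True = feature selected); n_features must be >= 2."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: rank by fitness, keep the top half as parents.
        # Parents survive unchanged, so the best fitness never decreases.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [not g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical fitness: features 0 and 2 help (+1 each), all others cost 1.
    useful = {0, 2}

    def toy_fitness(mask):
        return sum(1 if i in useful else -1 for i, g in enumerate(mask) if g)

    best = evolve_feature_subset(6, toy_fitness)
    print(best, toy_fitness(best))
```

Keeping the parents in each new generation (elitism) is a design choice that guarantees the best feature set found so far is never lost.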
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Assegie et al., “Breast cancer prediction model with decision tree and adaptive boosting,” teaches adaptively improving the decision tree model based on the results of prediction using adaptive boosting.
Azar et al., “Decision tree classifiers for automated medical diagnosis,” teaches a decision support tool for medical diagnosis using a decision tree, a boosted decision tree, and a decision tree forest.
Kokol et al., “Evolutionary design of decision trees for medical application,” teaches using a genetic algorithm to induce a decision tree that is optimized according to a given fitness function.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124
/YING YU CHEN/Primary Examiner, Art Unit 2125