DETAILED ACTION
Response to Amendment
The amendment was received 9/9/21. Claims 1-20 are pending.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 
Accordingly, 35 USC 112(f) is NOT invoked in claims 1-20. 

Response to Arguments
OBJECTION
101 REJECTION

Applicant’s arguments, see remarks, page 10, filed 9/9/21, with respect to the claim objection of claims 7 and 16-19 and the 35 USC 101 rejection of claims 11-15 have been fully considered and are persuasive. The objection of claims 7 and 16-19 has been withdrawn. The 35 USC 101 rejection of claims 11-15 has been withdrawn. 
103 REJECTIONS
Applicant’s arguments, see remarks, pages 10-12, filed 9/9/21, with respect to the rejection(s) of claim(s) 1-3, 5-13, 15-18 and 20 under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of Yan et al. (US 2011/0289025 A1), which teaches filtering out misclassified labels with respect to bias, corresponding to Yan’s fig. 3:310: “USE THE CONFIDENCE LEVEL TO REMOVE ANY NOISE AND BIAS FOR THE TRAINING DATA FOR EACH RULE AND ANY UNLABLED DTATA TO CREATE DENOISED AND DEBIASED TRAINING DATA SETS FOR EACH RULE”. Thus, the IDS-cited Silberman et al. (US Patent App. Pub. No.: US 2017/0330058 A1) is removed from the 35 USC 103 rejection as redundant to Yan et al. (US 2011/0289025 A1).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Regarding inquiry 4, see Suggestions.
Claims 1-3, 5-13, 15-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vabalas et al. (Machine learning algorithm validation with a limited sample size) in view of Guyon et al. (Gene Selection for Cancer Classification using Support Vector Machines) and Yan et al. (US Patent App. Pub. No.: US 2011/0289025 A1).
Regarding claim 1, Vabalas teaches a computer-implemented method comprising: 


obtaining, in connection with execution of a first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance”, cited below: Vabalas, page 9, section Validation and performance evaluation, 3rd and 4th paragraphs) model (via fig. 2: “Model” or as fig. 2D: “ a) Parameter tuning b) Feature selection” shown three times), one or more class (via “classification errors”, page 2, 3rd para) designations (or “labels”, pg. 7: Overview of procedures, 1st sentence) attributed to data points (via fig. 2: “All data”) from a dataset (said via fig. 2: “All data”) used to train (via fig. 2: “Train data”) the first model;
identifying (or selecting or picking out via said fig. 2: “Model” or as fig. 2D: “ a) Parameter tuning b) Feature selection” shown three times) any of the data points (said via fig. 2: “All data”) associated (via being higher than or right on) with at least one of (i) an inaccurate (via “theoretical” “error”: pg. 2, 3rd para, 2nd S) class designation (via said “labels”) and (ii) a class designation associated with a confidence value below a given threshold; 
generating an updated version of the dataset by filtering, from the dataset, (said via fig. 2: “All data”) the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
training (said via fig. 2: “Train data”) a second (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”, page 7, Methods, penultimate S) model (via fig. 2: “Model” or as fig. 2D: “ a) Parameter tuning b) Feature selection” shown three times) using the data points (said via fig. 2: “All data”) from the updated version of the dataset (said via fig. 2: “All data”), the second model is related (as shown in fig. 2) to the first model;
determining (via “testing…biased results” comprising the means by which the presence, quality, or genuineness of biased results is determined as indicated by the graphs in figures 3-8 in the context of “investigating…bias” to find a motive, cause or culprit) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) related to at least a portion of those data points (said via fig. 2: “All data”) used to train (said via fig. 2: “Train data”) the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), wherein said determining (said via “testing…biased results” comprising the means by which the presence, quality, or genuineness of biased results is determined as indicated by the graphs in figures 3-8) comprises: 
modifying (or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) one or more of the data points (said via fig. 2: “All data”) used to train (said via fig. 2: “Train data”) the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”);
executing (finally) the first (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the one or more modified (or reduced via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) data points (said via fig. 2: “All data”); and
identifying (via “label” “examples”), subsequent to said (finally) executing the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) by observing a change (via “relationship between features and labels changes”) to one or more class (via “classification errors”, cited below: Vabalas: page 2, 3rd paragraph) designations (said “labels”) attributed to the one or more modified (said reduced via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) data points (said via fig. 2: “All data”) as compared (via said changed example labels) to before said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction); and 
outputting (as shown by the arrows in fig. 2), to at least one user, identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%); 
wherein the method is carried out by at least one computing device (via:

page 2, 3rd paragraph:
“Despite small sample sizes being common, and the fact that limited data is problematic for pattern recognition [1, 5, 6], only a limited number of papers have systematically investigated how the ML validation process should be designed to help avoid optimistic performance estimates. Previous papers [5, 7] used synthetic Gaussian noise data to investigate how far experimental classification error is from the expected theoretical chance level. Varma and Simon [7] used a fixed sample size dataset (40 samples), and investigated the change from theoretical chance performance when using two different Cross-Validation (CV) approaches for selecting the data used for model development and model validation (Different CV methods are introduced in detail in section Validation strategies). They showed that the nested CV approach which avoided pooling training and testing data produced an “almost unbiased estimate [of performance]” [7]. In comparison, Combrisson and Jerbi [5] used only a K-fold CV approach and varied sample size. They found that with small sample sizes empirical accuracies overshot theoretical chance level and were more variable.”;

page 3:
“In the Results section we show that while certain validation methods produce significantly overoptimistic performance estimates (> 50%), especially when sample size is small, others are robust regardless of sample size. We also show that the feature selection process, if performed on pooled training and testing data, is contributing to bias considerably more than parameter tuning. Results for other factors apart from sample size influencing overfitting and results on different validation approaches with discriminable data are also included. After the results section we have graphically illustrated why models, developed on pooled training and testing data, can produce overoptimistic performance estimates. The same concepts as in our main simulations are exemplified in a simpler and more intuitive way, as we are aware that some readers may be less familiar with ML. Program code used for the main simulations performed in this study is provided with this article in S1 File.”;

page 7:
“Methods

In this study, by using Gaussian noise as data, we have simulated a situation in which robust validation should produce two-class classification accuracy approaching theoretical chance level of 50%. We tested five validation approaches: Train/Test Split, K-fold, Nested, and two types of partially nested CV. Importantly, we performed these simulations using different sample sizes to provide an insight into whether the tendency to report higher performance estimates with smaller sample size could be due to insufficiently reliable validation. In addition, we have tested what other factors, apart from sample size, influence overfitting and how different validation methods perform with discriminable data. To show that simulation results generalise to algorithms differing in complexity, two algorithms were used. One, computationally demanding and complex, where Support Vector Machine (SVM) [12] classifier with Radial Basis Function (RBF) kernel was coupled with Support Vector Machine Recursive Feature Elimination (SVM-RFE) [13] feature selection. Another, simpler, where the logistic regression classifier was coupled with two-sample t-test feature selection.

Overview of procedures

Typically, ML algorithm development starts with data cleaning and outlier removal, then the data is normalised to ensure that separate features have a balanced influence on the labels. Then if number of features is large, which is especially true for neuroimaging and gene expression studies [3, 14, 15], feature selection is performed. This is done because ML algorithms tend to achieve optimal performance in a reduced feature space [6, 16]. Many of the ML models include hyper-parameters which can be fine-tuned. This process is commonly coupled with CV to not only achieve optimal algorithm performance, but also to control overfitting. Finally, the model is validated to ensure that it generalises to “unseen” data. Below the development stages of ML algorithms which were used in this study are described with a particular emphasis on validation.”

page 8:
	“SVM separates the classes by maximising the gap between training examples from each class. The examples in the test data are when assigned a label based on which side of the gap they fall. The SVM algorithm assumes linear separability of classes, however in reality this assumption in rarely realistic. Therefore, a regularisation parameter C is introduced which weighs the importance of misclassification and allows SVM to fit a linear separating hyperplane with some of the examples being misclassified. Another method to deal with non-linearly separable classes is to use kernel functions. Kernel functions project features to a higher dimensional space. This enables the separation of classes which are non-linearly separable in the original space with a linear hyperplane in a higher dimensional space. In this study SVM with RBF kernel was used. RBF kernel has a regularisation parameter γ, which regulates the spread of the kernel function and in turn determines the flexibility of the separating hyperplane. SVM was implemented with Libsvm library [17].”;

page 9, Feature selection, 2nd paragraph:
“SVM-RFE algorithm selects features based on how important they are for an SVM classifier to separate classes. SVM-RFE starts with a full feature set and in a number of iterations eliminates a set number of features which are deemed least important for separating classes by an SVM algorithm, using weight vector of dimension length(s) as a ranking criterion [13]. The algorithm removes least important features in iterations because in each iteration the relationship between features and labels changes. Top-ranked features are not necessarily the most relevant individually; they are, however, optimized by considering interdependencies with other features and the class. The final feature set is selected from the iteration in which SVM achieves best classification performance. In this study a single feature was eliminated in each SVM-RFE iteration and the final feature set was selected based on the highest classification accuracy by linear SVM with C set to 1.”; 
page 9, Validation and performance evaluation, 3rd paragraph:
“K-Fold CV. First, a single well-defined model was developed by selecting features and tuning parameters, Fig 2B. Then the model was validated by separating one-tenth of the data for validation and the rest for training. CV process was iteratively repeated ten times. In each fold, a different one-tenth of the data was selected for validation. In such way, in the end, all the data was used for training and also for validation. The final performance of the model was then calculated as a mean of classification performances in each of the ten validation folds.”; and

page 16, 3rd paragraph:
“Our results demonstrate the importance of separating training and testing data to avoid optimistically biased performance estimates. K-Fold CV was not sufficient to control overfitting. By simply testing the performance of the algorithm with the data which was also involved in algorithm training was enough to produce biased results with small sample sizes. However, a substantial bias still remained even with sample size of 1000. On the other hand, similar to [7] we have found that Nested CV gave unbiased performance estimates. Furthermore, Nested CV results were unbiased regardless of the sample size.”
page 17:
“Combrisson and Jerbi [5] have shown that K-fold CV is more biased with small sample size, while Varma and Simon [7] have shown that K-fold CV is biased, and Nested CV is not at a fixed sample size. We have filled the gap by investigating both factors associated with bias, namely validation method and sample size together. We have demonstrated that validation methods which do not separate training and testing data at model development stage lead to overoptimistic performance estimates. Moreover, the bias is strongest with small sample sizes. This gives a good indication why in our own and other surveys [3, 4] there was a negative relationship between reported performance estimates and sample size.”).
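For illustration only, and not as evidence relied upon in this rejection, the bias determination recited in claim 1 (modifying data points, executing the first model on the modified points, and identifying bias from a change in class designation as compared to before the modification) can be sketched in Python. The toy model, the `group` and `score` attributes, the swap rule, and all function names are hypothetical assumptions and are not taken from Vabalas, Guyon, Yan, or the specification.

```python
from typing import Callable, Dict, List

DataPoint = Dict[str, object]


def find_bias_instances(
    model: Callable[[DataPoint], str],
    points: List[DataPoint],
    modify: Callable[[DataPoint], DataPoint],
) -> List[dict]:
    """Return the points whose class designation changes after modification."""
    instances = []
    for idx, point in enumerate(points):
        before = model(point)            # class designation before modifying
        modified = modify(dict(point))   # modify a copy; the original point is untouched
        after = model(modified)          # re-execute the same (first) model
        if before != after:              # a changed designation is flagged as bias
            instances.append({"index": idx, "before": before, "after": after})
    return instances


# Hypothetical first model that (improperly) conditions on a group attribute.
def toy_model(p: DataPoint) -> str:
    if p["group"] == "B" and p["score"] < 60:
        return "deny"
    return "approve" if p["score"] >= 40 else "deny"


# Hypothetical modification: swap the group attribute, leaving all else unchanged.
def swap_group(p: DataPoint) -> DataPoint:
    p["group"] = "A" if p["group"] == "B" else "B"
    return p


data = [
    {"score": 50, "group": "B"},
    {"score": 50, "group": "A"},
    {"score": 30, "group": "B"},
]

# "Outputting to at least one user" is reduced here to printing a report.
for inst in find_bias_instances(toy_model, data, swap_group):
    print(f"point {inst['index']}: {inst['before']} -> {inst['after']}")
```

In this toy run only points 0 and 1 are reported; point 2 is denied under either group value, so no change in designation is observed for it.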

Thus, Vabalas does not teach the following claimed limitations:
A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
A.	a second model;
B.	outputting, to at least one user; and
C.	wherein the method is carried out by at least one computing device.
Accordingly, Guyon teaches the claimed “second model”, i.e., “multiple classifiers” involving more than one classifier, via page 413:
“Recursive Feature Elimination (RFE) requires training multiple classifiers on subsets of features of decreasing size. The training time scales linearly with the number of classifiers to be trained. Part of the calculations can be reused. Matrix H does not need to be re-computed entirely. The partial scalar products of the eliminated features can be subtracted. Also, the coefficients α can be initialized to their previous value. Our Matlab implementation of SVM RFE on a Pentium processor returns a gene ranking in about 15 minutes for the entire Colon dataset (2000 genes, 62 patients) and 3 hours on the Leukemia dataset (7129 genes, 72 patients). Given that the data collection and preparation may take several months or years, it is quite acceptable that the data analysis takes a few hours.”
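As an illustrative sketch only, the RFE loop Guyon describes (training a new classifier on each successively smaller feature subset and eliminating the feature with the smallest squared weight) might look as follows. A simple perceptron is swapped in for the SVM so the sketch needs only the Python standard library; the function names, the toy data, and the epoch count are assumptions, not Guyon’s implementation.

```python
from typing import Dict, List, Sequence


def train_linear(X: Sequence[Sequence[float]], y: Sequence[int],
                 features: List[int], epochs: int = 50) -> Dict[int, float]:
    """Perceptron on the given feature subset (labels in {-1, +1}).

    Stand-in for the SVM of SVM-RFE; returns one weight per surviving feature.
    """
    w = {f: 0.0 for f in features}
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(w[f] * xi[f] for f in features) + b
            if yi * score <= 0:  # misclassified or on the boundary: update
                for f in features:
                    w[f] += yi * xi[f]
                b += yi
    return w


def rfe_elimination_order(X: Sequence[Sequence[float]], y: Sequence[int],
                          n_features: int, step: int = 1) -> List[int]:
    """Recursive feature elimination: retrain, rank by w_i**2, drop the weakest."""
    surviving = list(range(n_features))
    eliminated: List[int] = []
    while surviving:
        w = train_linear(X, y, surviving)  # a new classifier on a smaller subset
        weakest = sorted(surviving, key=lambda f: w[f] ** 2)[:step]
        eliminated.extend(weakest)
        surviving = [f for f in surviving if f not in weakest]
    return eliminated  # last entry = most important feature


# Feature 0 tracks the label, feature 1 is constant, feature 2 is uncorrelated.
X = [[2, 0, 1], [2, 0, -1], [-2, 0, 1], [-2, 0, -1]]
y = [1, 1, -1, -1]
print(rfe_elimination_order(X, y, n_features=3))  # [1, 2, 0]
```

Setting `step` above 1 removes a chunk of features per iteration, which corresponds loosely to Guyon’s remark about eliminating chunks of features in the first few iterations.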

Thus, one of ordinary skill in the art of Recursive Feature Elimination (RFE) could modify Vabalas’ teaching, as shown by the duplicates in fig. 2 (e.g., “the model was…calculated…10” times being “the final performance”) and by “Support Vector Machine…coupled with Support Vector Machine”, by:
a)	coupling a support vector machine (SVM) to Vabalas’ teaching of said “the model was…calculated…10” times;
b)	performing training 10 times as a coupled SVM pair; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in “better features…than by using…a single classifier” via Guyon, page 413, 2nd paragraph:
“All our feature selection experiments using various classifiers (SVM, LDA, MSE) indicated that better features are obtained by using RFE than by using the weights of a single classifier (see Section 6.2 for details). Similarly, better results are obtained by eliminating one feature at a time than by eliminating chunks of features. However, there are only significant differences for the smaller subset of genes (less than 100). This suggests that, without trading accuracy for speed, one can use RFE by removing chunks of features in the first few iterations and then remove one feature at a time once the feature set reaches a few hundreds. This may become necessary if the number of genes increases to millions, as is expected to happen in the near future.”

	Thus, the combination does not teach the remaining limitations:
A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
B.	outputting, to at least one user; and
C.	wherein the method is carried out by at least one computing device.

Accordingly, Yan teaches all of the above differences:
A’.	generating an updated version of the dataset by filtering (via “update the training sets at each round by filtering”: [0045]), from the dataset, the data points identified (or “labeled” [0037], 2nd S) with at least one of (Markush limitation follows: A and B) (A) (i) an inaccurate class designation (via “mislabeled”: [0015], penultimate S) and (B) (ii) a class designation (via said labeling) associated with a confidence value (via fig. 1:116: “CONFIDENT RESULTS OF MODEL”) below the given threshold (via fig. 1:120: “STOP CRITERION SATISFIED”);
A.	a second model (via fig. 1:112: “CLASSIFICATION MODEL N”);
B.	outputting (via fig. 4:416: “Output Device(s)”), to at least one user; and
C.	wherein the method is carried out by at least one computing device (as shown in said fig. 4).
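For illustration only, the filtering of limitation A’ (dropping data points with (i) an inaccurate class designation or (ii) a designation whose confidence is below the given threshold, and keeping the rest as the updated dataset) can be sketched as a single pass. Yan’s “update the training sets at each round by filtering” is iterative; this one-round sketch, its toy predictor, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: returns (predicted class, confidence in [0, 1]).
Predictor = Callable[[float], Tuple[str, float]]


def filter_dataset(points: Sequence[float], labels: Sequence[str],
                   predict: Predictor, threshold: float) -> Tuple[List[float], List[str]]:
    """Drop points whose label looks inaccurate or is held with low confidence."""
    kept_points: List[float] = []
    kept_labels: List[str] = []
    for x, label in zip(points, labels):
        predicted, confidence = predict(x)
        inaccurate = predicted != label          # (i) designation disagrees with the model
        low_confidence = confidence < threshold  # (ii) confidence below the threshold
        if not (inaccurate or low_confidence):
            kept_points.append(x)
            kept_labels.append(label)
    return kept_points, kept_labels


def toy_predict(x: float) -> Tuple[str, float]:
    # Hypothetical first model: sign of x; confidence grows with distance from 0.
    return ("pos" if x >= 0 else "neg", min(1.0, abs(x)))


points = [2.0, -1.5, 0.1, 3.0]
labels = ["pos", "neg", "pos", "neg"]
updated = filter_dataset(points, labels, toy_predict, threshold=0.5)
print(updated)  # ([2.0, -1.5], ['pos', 'neg'])
```

A second model would then be trained on the returned points only.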

Thus, one of ordinary skill in the art of training biased classifiers and searching or querying can modify Vabalas’ said fig. 2: “All data” with Yan’s teaching of said “update the training sets at each round by filtering” by:
a)	making said Vabalas’ fig. 2: “All data” be as Yan’s fig. 1: “RULE BASED TRANING SET”;
b)	making Vabalas’ fig. 2D: “a) Feature selection b) Parameter tuning”, as already modified via the combination of Guyon’s coupled SVM, be as Yan’s fig. 1:112: “CLASSIFICATION MODEL”; and
c)	recognizing that the modification is predictable or looked forward to because Yan’s teaching of fig. 1 “tackles the problem of classifier learning from biased and noisy rule-generated training data to learn a user's intent when submitting a search query”, Yan, [0015].

Regarding claim 2, Vabalas as combined teaches the computer-implemented method of claim 1, comprising: 
training (said via fig. 2: “Train data”) the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%).
Regarding claim 3, Vabalas as combined teaches the computer-implemented method of claim 1, wherein said determining (said determining test results comprising bias) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining (via said feature selection) one or more attributes (comprised by said via “classification errors”) of each of the one or more data points (said via fig. 2: “All data”) responsible (“responsible” comprising being the agent or cause (of some action)) for one or more corresponding class (said via “classification errors”) designations (via said “labels”).

Regarding claim 5, Vabalas as combined teaches the computer-implemented method of claim 3, wherein said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) the one or more data points (said via fig. 2: “All data”) comprises modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) the one or more attributes (comprised by said via “classification errors”) of each of the one or more data points (said via fig. 2: “All data”) responsible for the one or more corresponding class (said via “classification errors”) designations (via said “labels”).
Regarding claim 6, Vabalas as combined teaches the computer-implemented method of claim 5, wherein said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction, as modified via the combination) the one or more attributes (comprised by said via “classification errors”) comprises, for each of the one or more attributes (comprised by said via “classification errors”) that is a categorical attribute, substituting (via “replacement”) a first set of one or more categorical values for a second set of one or more categorical values (via page 8:
“Data normalisation, cleaning

As the data was drawn from standard normal distribution data normalisation was not necessary and was omitted. Data cleaning was also not necessary in this case; however, with real datasets, missing value replacement/removal and outlier replacement/removal are usually necessary steps.”).

Thus, the combination already teaches the claimed “substituting a first set of one or more categorical values for a second set of one or more categorical values”. Accordingly, Yan teaches:
substituting (via said “update the training sets at each round by filtering” comprising “something that is substituted”) a first set of one or more categorical values (comprising “binary…vector…1”, [0032], 3rd & 4th S, being substituted) for a second set of one or more categorical values (wherein “update” is defined via Dictionary.com:
update
verb (used with object), up·dat·ed, up·dat·ing.
1	to bring (a book, figures, or the like) up to date as by adding new information or making corrections:
to update a science textbook.

wherein “corrections” is defined:
correction
noun
1	something that is substituted or proposed for what is wrong or inaccurate; emendation.).
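For illustration only, claim 6’s substitution of a first set of categorical values for a second set can be sketched as a value mapping applied to copies of the data points. The `region` attribute, its values, and the function name are hypothetical.

```python
from typing import Dict, List


def substitute_categorical(points: List[dict], attribute: str,
                           mapping: Dict[str, str]) -> List[dict]:
    """Return copies of points with categorical values replaced per mapping.

    Values not in mapping are left as they are; the input is not mutated.
    """
    substituted = []
    for p in points:
        q = dict(p)
        value = q.get(attribute)
        if value in mapping:
            q[attribute] = mapping[value]  # lookup uses the original value, so swaps work
        substituted.append(q)
    return substituted


rows = [{"region": "urban"}, {"region": "rural"}, {"region": "suburban"}]
swap = {"urban": "rural", "rural": "urban"}  # hypothetical first/second value sets
print([r["region"] for r in substitute_categorical(rows, "region", swap)])
# ['rural', 'urban', 'suburban']
```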

Regarding claim 7, Vabalas as combined teaches the computer-implemented method of claim 5, wherein said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction) the one or more attributes (comprised by said via “classification errors”) comprises, for each of the one or more attributes (comprised by said via “classification errors”) that is a numerical attribute, utilizing a given number of closest numerical values in lieu of one or more original numerical values attributed to the numerical attribute.
Thus, the combination already teaches the claimed “utilizing a given number of closest numerical values in lieu of one or more original numerical values attributed to the numerical attribute”. Accordingly, Guyon teaches:
utilizing (“only”) a given number of closest numerical values (or “closest” “vectors”) in lieu (“lieu” comprising: place; stead via said “closest”) of one or more original numerical values (comprised by “the training examples”) attributed to the numerical attribute (via Guyon:
“Although SVMs handle non-linear decision boundaries of arbitrary complexity, we limit ourselves, in this paper, to linear SVMs because of the nature of the data sets under investigation. Linear SVMs are particular linear discriminant classifiers (see Eq. (1)). An extension of the algorithm to the non-linear case can be found in the discussion section (Section 6). If the training data set is linearly separable, a linear SVM is a maximum margin classifier. The decision boundary (a straight line in the case of a two-dimensional separation) is positioned to leave the largest possible margin on either side. A particularity of SVMs is that the weights wi of the decision function D(x) are a function only of a small subset of the training examples, called “support vectors”. Those are the examples that are closest to the decision boundary and lie on the margin. The existence of such support vectors is at the origin of the computational properties of SVM and their competitive classification performance. While SVMs base their decision function on the support vectors that are the borderline cases, other methods such as the method used by Golub et al. (1999) base their decision function on the average case. As we shall see in the discussion section (Section 6), this has also consequences on the feature selection process.”).
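For illustration only, one hypothetical reading of claim 7 (using a given number k of the closest observed values of a numerical attribute in lieu of the original value) is sketched below. Treating “closest” as the nearest distinct observed values of the same attribute, the tie-break rule, and the names are assumptions; this is neither a claim construction adopted in this action nor Guyon’s support-vector computation.

```python
from typing import List


def closest_values(value: float, observed: List[float], k: int) -> List[float]:
    """The k distinct observed values nearest to value (value itself excluded).

    Ties in distance are broken toward the smaller value.
    """
    candidates = sorted({v for v in observed if v != value},
                        key=lambda v: (abs(v - value), v))
    return candidates[:k]


def numeric_variants(point: dict, attribute: str,
                     observed: List[float], k: int) -> List[dict]:
    """One modified copy of point per closest value, used in lieu of the original."""
    variants = []
    for v in closest_values(point[attribute], observed, k):
        q = dict(point)
        q[attribute] = v
        variants.append(q)
    return variants


ages = [25, 30, 31, 40, 52, 30]  # hypothetical observed values of the attribute
print(numeric_variants({"age": 30}, "age", ages, k=2))
# [{'age': 31}, {'age': 25}]
```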

Regarding claim 8, Vabalas as combined teaches the computer-implemented method of claim 1, wherein said determining (via said selecting via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection” in the context of “investigating…bias”) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining plausibility (via fig. 2: “Validation”) of the one or more modified data points (said via fig. 2: “All data”) based at least in part on one or more constraints (via “limited data”, cited above in the rejection of claim 1) related to the one or more data points (said via fig. 2: “All data”).
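For illustration only, claim 8’s determination of plausibility of modified data points based on constraints can be sketched as named predicates evaluated on each modified point. The constraint names, thresholds, and attributes are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Constraint = Callable[[dict], bool]


def check_plausibility(point: dict,
                       constraints: Dict[str, Constraint]) -> Tuple[bool, List[str]]:
    """Return (is_plausible, names of violated constraints) for a modified point."""
    violated = [name for name, holds in constraints.items() if not holds(point)]
    return (not violated, violated)


# Hypothetical domain constraints on the data points.
constraints: Dict[str, Constraint] = {
    "age_in_range": lambda p: 18 <= p["age"] <= 100,
    "income_nonnegative": lambda p: p["income"] >= 0,
    "retiree_age": lambda p: (not p["retired"]) or p["age"] >= 50,
}

print(check_plausibility({"age": 30, "income": -5, "retired": True}, constraints))
# (False, ['income_nonnegative', 'retiree_age'])
```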

Regarding claim 9, Vabalas as combined teaches the computer-implemented method of claim 1, wherein said determining (via said testing for performance regarding bias results) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining sufficiency (via “evaluate… sufficient…fitting”) of the one or more modified data points (said via fig. 2: “All data”) based at least in part on an input domain related to the dataset (via pages 10-11:
“Comparison of different validation methods
The effect of sample size on how close the empirical classification result is to theoretical chance level was examined. The sample size was manipulated and ranged from 20 to 1000. To evaluate classification results K-Fold, Nested, Train/Test Split and two types of partially nested validation were used. Fig 3 shows that by using both a complex algorithm, where SVM-RFE feature selection was coupled with a SVM-RBF classifier (SVM algorithm hereafter, Fig 3A), and a simpler algorithm, with t-test feature selection and a logistic regression classifier (logistic regression algorithm hereafter, Fig 3B), accuracies given by K-Fold CV were considerably higher than the theoretical chance level of 50%. The highest difference was observed with smaller sample sizes; however, the difference was still evident even at the sample size of N = 1000. In contrast accuracy distributions produced by using Nested CV and Train/Test Split did not statistically significantly differ from 50% chance level with SVM and logistic regression algorithms at 96.5% sample size points (p ranged from 4.3 × 10^−4 to 0.997, a small number of significant differences is expected by chance with 95% confidence level).
Two types of partially nested validation were also performed. In the first instance, only parameter tuning was nested while feature selection was performed on the pooled training and testing data in non-nested fashion. Fig 3 shows that nesting parameter tuning only was not sufficient to control overfitting. By using both SVM and logistic regression algorithms, empirical accuracies at each sample point were significantly higher than the 50% chance level (p ranged from 1.1 × 10^−42 to 6.3 × 10^−20 and from 1.7 × 10^−33 to 1.7 × 10^−22 respectively). The results were considerably different when feature selection was nested and only parameter tuning was performed on the pooled training and testing data. The curves approached 50% chance level, Fig 3. One sample t-tests showed that for the SVM algorithm empirical accuracy distributions were significantly higher than 50% chance level on 56% sample size points (p ranged from 2.8 × 10^−5 to 0.879). For a logistic regression algorithm accuracy distribution using this type of partially nested validation was higher than the chance level only at one sample size point (2%, p ranged from 0.039 to 1.0).”).
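The overfitting mechanism described in the Vabalas passage quoted above (feature selection performed on the pooled training and testing data inflates K-Fold accuracy above the 50% chance level, whereas nesting the feature selection does not) can be illustrated with the following minimal sketch. It is not Vabalas' code: a mean-difference feature ranking stands in for t-test or SVM-RFE feature selection, a nearest-centroid classifier stands in for the SVM and logistic regression classifiers, and all names and parameter values (N = 20, 50 noise features, 5 selected features, 5 folds, 30 repetitions) are illustrative assumptions.

```python
import random
import statistics


def make_noise_data(n, n_features, rng):
    """Gaussian-noise features with balanced binary labels, so true chance level is 50%."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def select_top_features(X, y, k):
    """Rank features by |mean(class 1) - mean(class 0)| (a crude stand-in for a t-test)."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 0)
        m1 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Fit class centroids on the training rows and score the held-out rows."""
    centroids = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_tr, y_tr) if l == lab]
        centroids[lab] = [statistics.fmean(r[j] for r in rows) for j in feats]
    correct = 0
    for row, lab in zip(X_te, y_te):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, cen))
                for c, cen in centroids.items()}
        correct += min(dist, key=dist.get) == lab
    return correct / len(y_te)


def cv_accuracy(X, y, k_feats, n_folds, nested):
    """K-fold accuracy with feature selection either pooled (leaky) or nested in each fold."""
    n = len(y)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]  # interleaved folds
    pooled = None if nested else select_top_features(X, y, k_feats)  # sees the test rows
    accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        feats = select_top_features(X_tr, y_tr, k_feats) if nested else pooled
        accs.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, feats))
    return statistics.fmean(accs)


def compare(n=20, n_features=50, k_feats=5, n_folds=5, repeats=30, seed=0):
    """Mean accuracy of non-nested vs nested feature selection on pure noise."""
    rng = random.Random(seed)
    leaky, nested = [], []
    for _ in range(repeats):
        X, y = make_noise_data(n, n_features, rng)
        leaky.append(cv_accuracy(X, y, k_feats, n_folds, nested=False))
        nested.append(cv_accuracy(X, y, k_feats, n_folds, nested=True))
    return statistics.fmean(leaky), statistics.fmean(nested)


if __name__ == "__main__":
    print(compare())
```

Under these assumptions the non-nested estimate should sit well above 0.5 while the nested estimate stays near it, mirroring the pattern the passage reports for K-Fold versus Nested CV; the exact values depend on the seed.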



Regarding claim 10, Vabalas as combined teaches the computer-implemented method of claim 1, wherein each of the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) and the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe” as modified via the combination) comprises a machine learning (or “machine learning (ML)” or “machine learning…Support Vector Machines (SVMs)”) model (said “Model” and “(SVMs)” via:

Vabalas, page 2:

“Unfortunately, such techniques and large databases are of less use for traditional hypothesis driven research. Advances in neuroimaging, genomic, motion-tracking, eye-tracking and many other technology-based data collection methods have led to many datasets, which frequently have a small number of samples. Small samples are common because tasks and experimental protocols which maximally discriminate between different conditions are still under development and because of the costs associated with data collection involving human participants. For example, in our work with autistic adults, running an experiment to generate one sample of high dimensional data may require 1.5–4 hours of experimenter time (for running the experiment including set up and set down) and 3.5–6 hours of participant time (including travel time). In addition, it is difficult to recruit large numbers of autistic adults due to difficulties accessing participants and encouraging participation. Collecting samples from thousands of subjects is thus not feasible with the resources available for early stage work. However, there is still a critical need for robust and reliable machine learning (ML) methods using these smaller datasets.”; and

Guyon, page 391:

“2.2. Space dimensionality reduction and feature selection

A known problem in classification specifically, and machine learning in general, is to find ways to reduce the dimensionality n of the feature space F to overcome the risk of “overfitting”. Data overfitting arises when the number n of features is large (in our case thousands of genes) and the number of training patterns is comparatively small (in our case a few dozen patients). In such a situation, one can easily find a decision function that separates the training data (even a linear decision function) but will perform poorly on test data. Training techniques that use regularization (see e.g. Vapnik, 1998) avoid overfitting of the data to some extent without requiring space dimensionality reduction. Such is the case, for instance, of Support Vector Machines (SVMs) (Boser, 1992; Vapnik, 1998; Cristianini, 1999). Yet, as we shall see from experimental results (Section 5), even SVMs benefit from space dimensionality reduction.”).






	Regarding claim 11, claim 11 is rejected the same as claim 1. Thus, the argument presented in claim 1 is equally applicable to claim 11. Accordingly, Vabalas teaches claim 11 of a computer program product (or “Program code”, cited in the rejection of claim 1) comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to: 
obtain, in connection with execution of a first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), one or more class (said via “classification errors”) designations (said labels) attributed to data points (said via fig. 2: “All data”) from a dataset used to train the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”);
identify (via said feature selection) any of the data points (said via fig. 2: “All data”) associated with at least one of (i) an inaccurate (via said “theoretical” “error”) class designation (via said labels) and (ii) a class designation associated with a confidence value below a given threshold; 
generate an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below a given threshold;


train a second model using the data points (said via fig. 2: “All data”) from the updated version of the dataset, the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”) is related to the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”); 
determine (or testing for) bias related to at least a portion of those data points (said via fig. 2: “All data”) used to train the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”), wherein said determining (or said investigative testing) comprises: 
modifying (via said feature selection) one or more of the data points (said via fig. 2: “All data”) used to train the second (said as shown by duplicates in fig. 2 such as “the model was… calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”); 


executing the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the one or more modified (via said feature selection) data points (said via fig. 2: “All data”); and 
identifying (via said label), subsequent to said executing the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) by observing a (label) change to one or more class (said via “classification errors”) designations (via said labels) attributed to the one or more modified (via said feature selection) data points (said via fig. 2: “All data”) as compared (via said changed label) to before said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction); and 
output, to at least one user, identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%).
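The sequence recited in claim 11 (filtering data points whose class designation is inaccurate or below a confidence threshold, then modifying retained data points, re-executing the first model, and flagging any change in class designation as an instance of bias) can be pictured with the minimal sketch below. It is a hypothetical illustration, not code from the application or from any cited reference: `toy_model`, `flip_group`, the `income` and `group` attributes, the confidence formula, and the 0.6 threshold are all assumptions, and training of the second model on the filtered points is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# Hypothetical model interface: maps features to (predicted class, confidence in [0, 1]).
Model = Callable[[Features], Tuple[int, float]]


@dataclass
class DataPoint:
    features: Features
    label: int  # class designation recorded in the dataset


def filter_dataset(points: List[DataPoint], model: Model, threshold: float) -> List[DataPoint]:
    """Drop points whose designation is (i) inaccurate or (ii) below the confidence threshold."""
    kept = []
    for p in points:
        predicted, confidence = model(p.features)
        if predicted != p.label or confidence < threshold:
            continue
        kept.append(p)
    return kept


def probe_bias(points: List[DataPoint], first_model: Model,
               modify: Callable[[Features], Features]) -> List[Tuple[DataPoint, int, int]]:
    """Modify each point, re-execute the first model, and flag changed class designations."""
    instances = []
    for p in points:
        before, _ = first_model(p.features)
        after, _ = first_model(modify(p.features))
        if after != before:
            instances.append((p, before, after))
    return instances


def toy_model(f: Features) -> Tuple[int, float]:
    """Hypothetical first model that (undesirably) gives weight to a 'group' attribute."""
    score = f["income"] / 100.0 + 0.3 * f["group"]
    predicted = 1 if score >= 0.5 else 0
    confidence = min(1.0, 0.5 + 2 * abs(score - 0.5))
    return predicted, confidence


def flip_group(f: Features) -> Features:
    """Hypothetical modification: toggle the binary 'group' attribute, keep the rest."""
    return {**f, "group": 1 - f["group"]}


data = [
    DataPoint({"income": 30, "group": 1}, 1),
    DataPoint({"income": 30, "group": 0}, 0),
    DataPoint({"income": 90, "group": 0}, 1),
    DataPoint({"income": 48, "group": 0}, 0),  # low confidence, filtered out
    DataPoint({"income": 80, "group": 0}, 0),  # inaccurate designation, filtered out
]

kept = filter_dataset(data, toy_model, threshold=0.6)
# Training the second model on `kept` is omitted; any trainer could be plugged in here.
bias_instances = probe_bias(kept, toy_model, flip_group)
for point, before, after in bias_instances:
    print(point.features, f"class {before} -> {after}")
```

In this toy setting, flipping only the `group` attribute changes the designation of two of the three retained points, and those two points are what would be output to the user as instances of bias.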






Thus, Vabalas does not teach, as indicated in bold above, the claimed:
A’.	generate an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below a given threshold;
A.	computing device;
B.	second model; and
C.	output, to at least one user.
Accordingly as discussed above, Guyon teaches the claimed “second model” or “multiple classifiers” or involving more than one classifier.
Thus as discussed above, one of skill in the art of Recursive Feature Elimination (RFE) can modify Vabalas’ teaching, as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”, by:
a)	coupling a support vector machine (SVM) to Vabalas’ teaching of said “the model was…calculated…10” times;
b)	performing training 10 times as a coupled SVM pair; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in “better features…than by using…a single classifier” via Guyon.
	




Thus as discussed above, the combination does not teach the remaining limitations:
A’.	generate an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below a given threshold;
B.	outputting, to at least one user;
C.	wherein the method is carried out by at least one computing device.
Accordingly as discussed above, Yan teaches all of the above differences:
A’.	generating an updated version of the dataset by filtering (via “update the training sets at each round by filtering”: [0045]), from the dataset, the data points identified (or “labeled” [0037], 2nd S) with at least one of (Markush limitation follows: A and B) (A) (i) an inaccurate class designation (via “mislabeled”: [0015], penultimate S) and (B) (ii) a class designation (via said labeling) associated with a confidence value (via fig. 1:116: “CONFIDENT RESULTS OF MODEL”) below the given threshold (via fig. 1:120: “STOP CRITERION SATISFIED”);
A.	a second model (via fig. 1:112: “CLASSIFICATION MODEL N”);
B.	outputting (via fig. 4:416: “Output Device(s)”), to at least one user; and
C.	wherein the method is carried out by at least one computing device (as shown in said fig. 4).




Thus as discussed above, one of ordinary skill in the art of training biased classifiers and searching or querying can modify Vabalas’ said fig. 2: “All data” with Yan’s teaching of said “update the training sets at each round by filtering” by:
a)	making said Vabalas’ fig. 2: “All data” be as Yan’s fig. 1: “RULE BASED TRAINING SET”;
b)	making Vabalas’ fig. 2D: “a) Feature selection b) Parameter tuning”, as already modified via the combination of Guyon’s coupled SVM, be as Yan’s fig. 1:112: “CLASSIFICATION MODEL”; and
c)	recognizing that the modification is predictable or looked forward to because Yan’s teaching of fig. 1 “tackles the problem of classifier learning from biased and noisy rule-generated training data to learn a user's intent when submitting a search query”, Yan, [0015].
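The round-by-round filtering relied on from Yan (“update the training sets at each round by filtering”, [0045]; fig. 1:116 and 1:120) can be pictured with the minimal sketch below. It is not Yan's algorithm: the one-dimensional nearest-centroid model, the distance-based confidence score, the 0.7 threshold, and the stop criterion (no point removed in a round) are assumptions chosen only to make the loop concrete.

```python
from statistics import fmean
from typing import Dict, List, Tuple

Point = Tuple[float, int]  # (feature value, class designation)


def train_centroids(points: List[Point]) -> Dict[int, float]:
    """Toy 1-D nearest-centroid classifier standing in for a classification model."""
    return {c: fmean(x for x, y in points if y == c) for c in (0, 1)}


def predict(centroids: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (class, confidence); confidence runs from 0.5 (midpoint) toward 1.0."""
    d0, d1 = abs(x - centroids[0]), abs(x - centroids[1])
    label = 0 if d0 <= d1 else 1
    confidence = max(d0, d1) / (d0 + d1) if d0 + d1 > 0 else 0.5
    return label, confidence


def iterative_filtering(points: List[Point], threshold: float,
                        max_rounds: int = 10) -> Tuple[List[Point], int]:
    """Retrain, drop mislabeled or low-confidence points, repeat until nothing is dropped."""
    current = list(points)
    for round_no in range(1, max_rounds + 1):
        if {y for _, y in current} != {0, 1}:
            return current, round_no  # a class vanished; stop rather than train on one class
        model = train_centroids(current)
        kept = []
        for x, y in current:
            label, confidence = predict(model, x)
            if label == y and confidence >= threshold:
                kept.append((x, y))
        if len(kept) == len(current):  # assumed stop criterion: no point filtered this round
            return current, round_no
        current = kept
    return current, max_rounds


training_set = [(0.0, 0), (0.2, 0), (-0.1, 0), (9.9, 0),   # 9.9 is mislabeled
                (10.0, 1), (9.8, 1), (10.3, 1), (5.0, 1)]   # 5.0 is ambiguous
cleaned, rounds = iterative_filtering(training_set, threshold=0.7)
print(cleaned, rounds)
```

With this sample data, the first round removes the mislabeled point at 9.9 and the ambiguous point at 5.0; retraining on the remaining six points removes nothing in the second round, so the loop stops.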

Regarding claim 12, claim 12 is rejected the same as claim 2. Thus, the argument presented in claim 2 is equally applicable to claim 12. Accordingly, Vabalas as combined teaches the computer program product of claim 11, wherein the program instructions executable by a computing device further cause the computing device to:
train the first (as shown by duplicates in fig. 2 such as “the model was… calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%).
Regarding claim 13, claim 13 is rejected the same as claim 3. Thus, the argument presented in claim 3 is equally applicable to claim 13. Accordingly, Vabalas as combined teaches claim 13 of the computer program product of claim 11, wherein said determining (via said investigating results of bias) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining (via said feature selection) one or more attributes (comprised by said via “classification errors”) of each of the one or more data points (said via fig. 2: “All data”) responsible for one or more corresponding class (said via “classification errors”) designations (via said labels).



Regarding claim 15, claim 15 is rejected the same as claim 8. Thus, the argument presented in claim 8 is equally applicable to claim 15. Accordingly, Vabalas as combined teaches claim 15 of the computer program product of claim 11, wherein said determining (said determining presence of tested bias effects) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining plausibility (via said fig. 2: “Validation”) of the one or more modified data points (said via fig. 2: “All data”) based at least in part on one or more constraints (via “limited data”) related to the one or more data points (said via fig. 2: “All data”).

Regarding claim 16, claim 16 is rejected the same as claims 1 and 11. Thus, the arguments presented in claims 1 and 11 are equally applicable to claim 16. Accordingly, Vabalas as combined teaches claim 16 of a system comprising: 
a memory; and 
at least one processor operably coupled to the memory and embodying at least one program configured for:
obtaining, in connection with execution of a first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), one or more class (said via “classification errors”) designations (said labeled examples) attributed to data points (said via fig. 2: “All data”) from a dataset used to train the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”);
identifying (via said feature selection) any of the data points (said via fig. 2: “All data”) associated (via being above or at) with at least one of (i) an inaccurate (via said 50% error) class designation (via a feature election or feature selection procedure resulting in said label) and (ii) a class designation associated with a confidence value below a given threshold; 
generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;

training (said via fig. 2: “Train data”) a second model using the data points (said via fig. 2: “All data”) from the updated version of the dataset, the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”) is related to the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”); 
determining (via said selecting via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection” in the context of “investigating…bias” for said testing performance of results) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) related to at least a portion of those data points (said via fig. 2: “All data”) used to train the second (said as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”), wherein said determining (via said selecting and testing) comprises: 
modifying (via said dimensional feature characteristic selection) one or more of the data points (said via fig. 2: “All data”) used to train the second (said as shown by duplicates in fig. 2 such as “the model was… calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Fe…”); 
executing (for the last time) the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the one or more modified (via said attribute selection) data points (said via fig. 2: “All data”); and 
identifying (via said category label), subsequent to said executing the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”), one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) by observing a change (of classification labels) to one or more class (said via “classification errors”) designations (said labels) attributed to the one or more modified (via said quality selection) data points (said via fig. 2: “All data”) as compared (via said changing identifiers) to before said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction); and 
outputting (as shown by the arrows in fig. 2), to at least one user, identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%).





Thus, Vabalas does not teach, as indicated in bold above, the claimed:
A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
A.	at least one processor operably coupled to the memory;
B.	second model; and
C.	outputting, to at least one user.
Accordingly as discussed above, Guyon teaches the claimed “second model” or “multiple classifiers” or involving more than one classifier.
Thus as discussed above, one of skill in the art of Recursive Feature Elimination (RFE) can modify Vabalas’ teaching, as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”, by:
a)	coupling a support vector machine (SVM) to Vabalas’ teaching of said “the model was…calculated…10” times;
b)	performing training 10 times as a coupled SVM pair; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in “better features…than by using…a single classifier” via Guyon.
	




Thus as discussed above, the combination does not teach the remaining limitations:
A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
B.	outputting, to at least one user;
C.	wherein the method is carried out by at least one computing device.
Accordingly as discussed above, Yan teaches all of the above differences:
A’.	generating an updated version of the dataset by filtering (via “update the training sets at each round by filtering”: [0045]), from the dataset, the data points identified (or “labeled” [0037], 2nd S) with at least one of (Markush limitation follows: A and B) (A) (i) an inaccurate class designation (via “mislabeled”: [0015], penultimate S) and (B) (ii) a class designation (via said labeling) associated with a confidence value (via fig. 1:116: “CONFIDENT RESULTS OF MODEL”) below the given threshold (via fig. 1:120: “STOP CRITERION SATISFIED”);
A.	a second model (via fig. 1:112: “CLASSIFICATION MODEL N”);
B.	outputting (via fig. 4:416: “Output Device(s)”), to at least one user; and
C.	wherein the method is carried out by at least one computing device (as shown in said fig. 4).




Thus as discussed above, one of ordinary skill in the art of training biased classifiers and searching or querying can modify Vabalas’ said fig. 2: “All data” with Yan’s teaching of said “update the training sets at each round by filtering” by:
a)	making said Vabalas’ fig. 2: “All data” be as Yan’s fig. 1: “RULE BASED TRAINING SET”;
b)	making Vabalas’ fig. 2D: “a) Feature selection b) Parameter tuning”, as already modified via the combination of Guyon’s coupled SVM, be as Yan’s fig. 1:112: “CLASSIFICATION MODEL”; and
c)	recognizing that the modification is predictable or looked forward to because Yan’s teaching of fig. 1 “tackles the problem of classifier learning from biased and noisy rule-generated training data to learn a user's intent when submitting a search query”, Yan, [0015].

Regarding claim 17, claim 17 is rejected the same as claim 2. Thus, the argument presented in claim 2 is equally applicable to claim 17. Accordingly, Vabalas as combined teaches claim 17 of the system of claim 16, wherein the at least one program is further configured for: 
training (said via fig. 2: “Train data”) the first (as shown by duplicates in fig. 2 such as “the model was…calculated…10” times) model (said via fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”) using the identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%).
Regarding claim 18, claim 18 is rejected the same as claim 3. Thus, the argument presented in claim 3 is equally applicable to claim 18. Accordingly, Vabalas as combined teaches claim 18 of the system of claim 16, wherein said determining (via said selecting features and testing of bias) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) comprises determining (via said feature selection) one or more attributes (comprised by said via “classification errors”) of each of the one or more data points (said via fig. 2: “All data”) responsible for one or more corresponding class (said via “classification errors”) designations (said change of labels).



Regarding claim 20, claim 20 is rejected the same as claims 1, 11, and 16. Thus, the arguments presented in claims 1, 11, and 16 are equally applicable to claim 20. Accordingly, Vabalas teaches claim 20 of a computer-implemented method comprising: 
obtaining, in connection with execution of a first machine learning model, one or more class (said via “classification errors”) designations (said labels) attributed to data points (said via fig. 2: “All data”) from a dataset used to train the first machine learning model; 
identifying (via said quality selection) any of the data points (said via fig. 2: “All data”) associated with a class designation (via said label) associated with a confidence value (or “confidence level”, cited below) below (via overshot) a given threshold (or overshooting via said targeted theoretical error at 50% as indicated in fig. 3, wherein two curves overshot due to bias and “noise”, cited below, and thus need to be zeroed to the 50% target); 
generating an updated version of the dataset by filtering, from the dataset, the data points identified as associated with a class designation associated with a confidence value below the given threshold;
training (said via fig. 2: “Train data”) a second machine learning model using the data points (said via fig. 2: “All data”) from the updated version of the dataset, 


determining (via said testing for) bias (or “bias” represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) related to at least a portion of those data points (said via fig. 2: “All data”) used to train the second machine learning model, wherein said determining (or said testing) comprises: 
modifying (via said picking) one or more of the data points (said via fig. 2: “All data”) used to train the second machine learning model; 
executing the first machine learning model using the one or more modified (said picked) data points (said via fig. 2: “All data”); and 
identifying (via said category labels), subsequent to said executing the first machine learning model, one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%) by observing a change (via said label change) to one or more class (said via “classification errors”) designations (said labels) attributed to the one or more modified data points (said via fig. 2: “All data”) as compared (via said label change) to before said modifying (said or reducing via said fig. 2: “Model” or as “ a) Parameter tuning b) Feature selection”: dimensionality reduction); 
outputting (as shown by the arrows in fig. 2), to at least one user, identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%); and 


training (said via fig. 2: “Train data”) the first machine learning model using the identifying (said via “label” “examples”) information (said via fig. 2: “All data”) pertaining to the one or more instances (said “examples”) of bias (said represented in fig. 3: “K-Fold”, the highest curve starting at 100% and going to 60%); 
wherein the method is carried out by at least one computing device (via:
page 10:
“Implementation
The results section is organised as follows. First, we compare five different validation methods using Gaussian noise data as features. Data was split into two equally sized subsets for each class. The feature set started with 50 features and was reduced by using feature selection. Sample size was manipulated, and classification results were evaluated using Train/Test Split, Kfold CV, Nested CV and two types of partially nested CV. The performance estimate was accuracy. To obtain accuracy distributions models were trained 50 times at each sample point, then validation results were compared against theoretical chance level using confidence level of 95%.”).

	Thus, Vabalas does not teach, as indicated in bold above, the claimed:

A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
A.	a second machine learning model;
B.	outputting, to at least one user; and
C.	wherein the method is carried out by at least one computing device.
	Accordingly as discussed above, Guyon teaches the claimed “second model” or “multiple classifiers” or involving more than one classifier.




Thus as discussed above, one of skill in the art of Recursive Feature Elimination (RFE) can modify Vabalas’ teaching, as shown by duplicates in fig. 2 such as “the model was…calculated…10” times being “the final performance” and “Support Vector Machine…coupled with Support Vector Machine”, by:
a)	coupling a support vector machine to Vabalas’ teaching of said “the model was…calculated…10” times;
b)	performing training 10 times as a coupled SVM pair; and
c)	recognizing that the modification is predictable or looked forward to because the modification results in “better features…than by using…a single classifier” via Guyon.
	Thus, the combination does not teach the remaining limitations:
A’.	generating an updated version of the dataset by filtering, from the dataset, the data points identified with at least one of (i) an inaccurate class designation and (ii) a class designation associated with a confidence value below the given threshold;
B.	outputting, to at least one user; and
C.	wherein the method is carried out by at least one computing device.

Accordingly as discussed above, Yan teaches all of the above differences:
A’.	generating an updated version of the dataset by filtering (via “update the training sets at each round by filtering”: [0045]), from the dataset, the data points identified (or “labeled” [0037], 2nd S) with at least one of (Markush limitation follows: A and B) (A) (i) an inaccurate class designation (via “mislabeled”: [0015], penultimate S) and (B) (ii) a class designation (via said labeling) associated with a confidence value (via fig. 1:116: “CONFIDENT RESULTS OF MODEL”) below the given threshold (via fig. 1:120: “STOP CRITERION SATISFIED”);
A.	a second model (via fig. 1:112: “CLASSIFICATION MODEL N”);
B.	outputting (via fig. 4:416: “Output Device(s)”), to at least one user; and
C.	wherein the method is carried out by at least one computing device (as shown in said fig. 4).

Thus as discussed above, one of ordinary skill in the art of training biased classifiers and searching or querying can modify Vabalas’ said fig. 2: “All data” with Yan’s teaching of said “update the training sets at each round by filtering” by:
a)	making said Vabalas’ fig. 2: “All data” be as Yan’s fig. 1: “RULE BASED TRAINING SET”;
b)	making Vabalas’ fig. 2D: “a) Feature selection b) Parameter tuning”, as already twice modified via the combination of Guyon’s coupled SVM, be as Yan’s fig. 1:112: “CLASSIFICATION MODEL”; and
c)	recognizing that the modification is predictable or looked forward to because Yan’s teaching of fig. 1 “tackles the problem of classifier learning from biased and noisy rule-generated training data to learn a user's intent when submitting a search query”, Yan, [0015].

Claims 4, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Vabalas et al. (Machine learning algorithm validation with a limited sample size) in view of Guyon et al. (Gene Selection for Cancer Classification using Support Vector Machines) and Yan et al. (US Patent App. Pub. No.: US 2011/0289025 A1) as applied above, further in view of Zavesky et al. (US Patent App. Pub. No.: US 2019/0188605 A1).
Regarding claim 4, Vabalas as combined teaches the computer-implemented method of claim 3, wherein said determining the one or more attributes (comprised by said via “classification errors”) comprises utilizing at least one local interpretable model-agnostic explanation algorithm.
	Thus, Vabalas does not teach, as indicated in bold, the claimed “local interpretable model-agnostic explanation algorithm”. Accordingly, Zavesky teaches “LIME” via:

“[0033] The service response 134 also can help a user (e.g., a machine learning model developer) understand the deeper insights of the machine learning model(s) 112 created by the machine learning system 106 under their management.  Specifically, the model understanding service can help determine the importance and the tolerance of a specific input feature that keeps the same output feature.  The user system 108 can create, based upon the service response 134, one or more visualizations (e.g., graphs, charts, spreadsheets, and/or the like) and can present the visualization(s) to the user.  Alternatively or additionally, the user system 108 can create, based upon the service response 134, one or more statistical comparisons of model performance by comparing one or more different machine learning models 112 with variants of the features 118 and samples selected from the output data set 126 that are produced and discovered by the model understanding as-a-service system 108.  For example, if the machine learning models 112 trained on a slightly varying dataset for a year showed different output characteristics, but the removal of one or more of the features 118 had no impact on those characteristics, a conclusion can be made that either other overwhelming feature conditions exist or that the feature(s) 118 removed function to offset or "cancel each other out" in terms of output effects.  Through this exploration and discovery process, the model understanding service can learn to prune unnecessary input features based on their influence on the model output (e.g., the output data set 126 for a given input data set 124).  Moreover, beyond singular feature importance, the model understanding service can help to map outputs from machine learning models 112 back to traditional external data (e.g., demographics) by selecting only the most important feature sets--even if there multiple features are deemed important.  For example, selecting only "approval" outputs could allow the machine learning model(s) 112 to find positively correlated features, which can then be mapped to external datasets.”

; and

“[0049] The model understanding as-a-service system 110 can observe and build correlations to detect features biases. In some embodiments, the model understanding as-a-service system 110 can build and update a parallel model with its output used as training for the machine learning model 112 under analysis. In some embodiments, the model understanding as-a-service system 110 can distort input and correlate output after distortion to identify and verify feature bias. In some embodiments, the model understanding as-a-service system 110 can suppress certain input and correlate output after suppression to identify and verify feature bias. In some embodiments, the model understanding as-a-service system 110 can weight certain features 118 (i.e., perturbing) to measure the feature importance. In these embodiments, local interpretable model-agnostic explanations or "LIME" open source technologies can be utilized to measure feature importance.”

	

Thus, one of ordinary skill in the art of features or dimensions can modify the combination’s feature selection with LIME by:
a)	weighting the features under SVM paired feature selection with a “weight”, Zavesky, cited above;
b)	measuring the weighted features with LIME “to measure the feature importance”, Zavesky, cited above;
c)	developing the biased SVM pair based on the feature importance; and
d)	recognizing that the modification is predictable or expected because LIME is used to “help” “a user” or “a machine learning model developer” “determine the importance and the tolerance of a specific input feature”.

Regarding claim 14, claim 14 is rejected for the same reasons as claim 4. Thus, the argument presented for claim 4 is equally applicable to claim 14. Accordingly, Vabalas as combined teaches claim 14 of the computer program product of claim 13, wherein said determining (via said picking out qualities) the one or more attributes (comprised by said via “classification errors”) comprises utilizing at least one local interpretable model-agnostic explanation algorithm.
Thus, as discussed above, Vabalas does not teach, as indicated in bold, the claimed “local interpretable model-agnostic explanation algorithm”. Accordingly, Zavesky teaches “‘LIME’”.
Thus, as discussed above, one of ordinary skill in the art of features or dimensions can modify the combination’s feature selection with LIME by:
a)	weighting the features under SVM paired feature selection with a “weight”, Zavesky, cited above;
b)	measuring the weighted features with LIME “to measure the feature importance”, Zavesky, cited above;
c)	developing the SVM pair; and
d)	recognizing that the modification is predictable or expected because LIME is used to “help” “a user” or “a machine learning model developer” “determine the importance and the tolerance of a specific input feature”.

Regarding claim 19, claim 19 is rejected for the same reasons as claim 4. Thus, the argument presented for claim 4 is equally applicable to claim 19. Accordingly, Vabalas as combined teaches claim 19 of the system of claim 18, wherein said determining (said recognizing features) the one or more attributes (comprised by said via “classification errors”) comprises utilizing at least one local interpretable model-agnostic explanation algorithm.
Thus, as discussed above, Vabalas does not teach, as indicated in bold, the claimed “local interpretable model-agnostic explanation algorithm”. Accordingly, Zavesky teaches “‘LIME’”.
Thus, as discussed above, one of ordinary skill in the art of features or dimensions can modify the combination’s feature selection with LIME by:
a)	weighting the features under SVM paired feature selection with a “weight”, Zavesky, cited above;
b)	measuring the weighted features with LIME “to measure the feature importance”, Zavesky, cited above;
c)	developing the biased SVMs; and
d)	recognizing that the modification is predictable or expected because LIME is used to “help” “a user” or “a machine learning model developer” “determine the importance and the tolerance of a specific input feature”.

Suggestions
Applicant’s disclosure states:
“[0002] Model providers commonly aim to create a model which does not produce any discriminatory or biased behavior.  For example, existing data management approaches include attempting to identify bias-aware or discrimination-aware portions of data.  However, using such approaches, challenges exist in distinguishing bias-aware portions of data from the related notion of adversarial robustness.  Additionally, such approaches also face challenges with respect to generating realistic and/or plausible inputs with sufficient coverage of an input domain.”

	Thus, distinguishing data from errors, noise, disturbance, agitation, interruption, or perturbation encountered by a computer is not apparent in claim 1. Applicant’s [0015], such as filtering by confidence, appears to be the solution to the disclosed “challenges”. In contrast, Yan (US 2011/0289025 A1), as applied in the above 35 USC 103 rejection, teaches filtering out data and adding new data ([0045]). Thus, said [0015] and applicant’s associated fig. 1 “Low Confidence and/or Misclassified Data Points”, being input to or used in parallel with applicant’s fig. 1:112 “Explanation System”, appear to be an indication of non-obviousness in view of the cited art.
Note that these suggestions are not provided with respect to overcoming 35 USC 101, 112, 102 and/or 103. These suggestions are provided mainly to seek out advantages in the disclosure regardless of 35 USC 101, 112, 102 and/or 103.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS ROSARIO/Examiner, Art Unit 2667

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667