Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-6, 8-12, 14-15, 17-18, and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Elkington (US 2014/0279739) in view of Ragavan (US 2017/0293666). 
With respect to claims 1 and 10, Elkington (US 2014/0279739) teaches “(a) at a plurality of times, [analyzing] a data source to determine whether data relating to a person has updated” in Fig. 4, items 401A-C (any one of the records in Fig. 4 represents an update to John Johnson’s data as it relates to that source; 401A, for example, represents an update to the “referral” source); see also ¶ 167 (updated information from any particular source used in further training);
 “(b) when data for the person has been updated, storing the updated data in a database such that the database includes a running log specifying how the person's data has changed over time” in Fig. items 401A-C; in ¶¶ 68-69 and ¶ 167 (classifiers use new data that changes over time and 
“wherein the person's data includes values for a plurality of properties relating to the person” in Fig. 4, items 401A-C (John Johnson is a person, and ID, F.name, C.name, email, phone, title, and properties are all types of properties);
“(c) receiving an indication that a value for the particular property in the person's data was verified as accurate or inaccurate at a particular time” in ¶ 78, ¶¶ 81-84, ¶ 88 and ¶ 97 (“phone number” and/or “email” verified as accurate for “most recent” record); 
 “(d) retrieving, from the database based on the particular time, the person's data, including values for the plurality of properties, that were up-to-date at the particular time” in ¶ 78, ¶¶ 81-84, ¶ 88 and ¶ 97 (“most recent” is up-to-date; values include phone number or email); 
“(e) training a plurality of models, each model utilizing a different type of machine learning algorithm” in ¶¶ 39-41, ¶ 78, ¶ 131;


“. . . predict whether another person's value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data's significance in training the model” in ¶ 160 (confidence score predicts that resolved record 603A is accurate, i.e., phone number and title values are accurate; this determination is used in training the model and maintains significance; see ¶ 78, ¶ 86, and ¶ 167).
It appears Elkington fails to explicitly teach “monitoring.”  However, Ragavan teaches monitoring in ¶ 5. Elkington and Ragavan are analogous art because they are from the same field of endeavor. It would have been obvious to one skilled in the art before the effective filing date of the 
It appears Elkington fails to explicitly teach 
“(f) evaluating accuracy of the plurality of models using available training data; and” 
“(g) selecting a model from the plurality of models determined based on the evaluated accuracy” 
However, Examiner takes official notice that “(f) evaluating accuracy of the plurality of models using available training data; and (g) selecting a model from the plurality of models determined based on the evaluated accuracy” were well known in the art before the effective filing date of the invention. Stated another way, it was well known and common sense in the computing arts before the effective filing date of the invention to test models to determine which model was the most accurate by looking at each respective model's output.
The motivation to combine the determining step with these well-known elements would have been to provide reliable data to the user. 
With respect to claims 2 and 11, Elkington teaches “2. The method of claim 1, further comprising: (f) determining, based on the person's data retrieved in (d), a plurality of features, each of the plurality of features describing a fact about the person's data retrieved in (d)” in ¶ 78 (features of a person’s data used in training);
“wherein the training (e) comprises training the model using the determined features” in ¶ 78 (features of a person’s data used in training). 
With respect to claim 3, 12, Elkington teaches “3. The method of claim 2, wherein the determining (f) comprises determining the features 
With respect to claims 5 and 14, Elkington teaches “5. The method of claim 1, further comprising: (f) applying the model to predict whether the other person's value in the plurality of properties is accurate” in ¶ 41, ¶ 72, ¶ 86 and ¶ 160 (each record has a prediction in the form of a confidence score).
With respect to claims 6 and 15, Elkington teaches “6. The method of claim 1, wherein the applying (f) comprises: (i) for respective values in a plurality of values for the particular property of the other person, applying the model to the respective value to determine a score” in ¶ 160 (values selected based on confidence scores);
“and (ii) selecting at least one value from the plurality of values based on the respective scores determined in (i)” in ¶ 160 (values selected based on confidence scores). 
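For illustration only, the two steps recited in claim 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `select_best_value`, `demo_model`, and the sample phone numbers and scores are hypothetical and are not taken from Elkington or from the application; `demo_model` merely stands in for a trained model that returns a confidence score.

```python
from typing import Callable, Dict, List, Tuple


def select_best_value(
    candidates: List[str],
    score_value: Callable[[str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate value, then pick the one with the highest score."""
    if not candidates:
        raise ValueError("at least one candidate value is required")
    # Step (i): apply the (hypothetical) model to each candidate value.
    scores = {value: score_value(value) for value in candidates}
    # Step (ii): select a value based on the scores from step (i).
    best = max(candidates, key=lambda value: scores[value])
    return best, scores


# Hypothetical stand-in for a trained model: fixed confidence scores.
_DEMO_SCORES = {"555-0100": 0.35, "555-0101": 0.80, "555-0102": 0.55}


def demo_model(value: str) -> float:
    """Return the illustrative confidence score for a candidate value."""
    return _DEMO_SCORES.get(value, 0.0)
```

Calling `select_best_value(["555-0100", "555-0101", "555-0102"], demo_model)` returns the candidate with the highest score together with the full score table, so a caller could also apply a threshold instead of taking only the maximum.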
With respect to claims 8 and 17, Elkington fails to explicitly teach “The method of claim 1, wherein the person and the other person are health care providers and the person's and the other person's data includes demographic information.”
However, Examiner takes official notice that doctors and other health care providers and demographic information were well known in the art before the effective filing date of the invention.
It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the person and other person 
With respect to claims 9 and 18, it appears Elkington fails to explicitly teach “9. The method of claim 1, wherein the person and the other person are health care providers and the person's data includes an indication of whether the person has engaged in fraud.”
However, Examiner takes official notice that doctors, nurses, and other health care providers were well known in the art before the effective filing date of the invention.  Examiner also takes official notice that data that includes indications of whether a person has been engaged in fraud was well known in the art before the effective filing date of the invention. 
It would have been obvious to one skilled in the art to modify the person and the other person in Elkington to include health care providers and to modify the data in Elkington to include “an indication of whether the person has engaged in fraud.”
The motivation would have been data integrity and data security.  
With respect to claim 19, Elkington teaches "19. A system for training a machine learning algorithm with temporally variant personal data, comprising: a computing device; a database that includes a running log specifying how a person's data has changed over time" in ¶ 78, ¶¶ 81-84, ¶ 
"wherein the person's data includes values for a plurality of properties relating to the person" in Fig. 4, items 401A-C (John Johnson is a person, and ID, F.name, C.name, email, phone, title, and properties are all types of properties);
"a data ingestion process implemented on the computing device and configured to: (i) at a plurality of times, [analyzing] a data source to determine whether data relating to the person has updated" in Fig. 4, items 401A-C (any one of the records in Fig. 4 represents an update to John Johnson’s data as it relates to that source; 401A, for example, represents an update to the “referral” source); see also ¶ 167 (updated information from any particular source used in further training);
"and (ii) when data for the person has been updated, storing the updated data in the database" in Fig. 4, items 401A-C; in ¶¶ 68-69 and ¶ 167 (classifiers use new data that changes over time and historical data; both the historical and new data are used from any and all sources; Examiner finds that using historical and new data from any and all sources teaches “a running log specifying how the person's data has changed over time”); ¶ 111 (historical log explicitly taught);
"an API monitor implemented on the computing device and configured to receive an indication that a value for the particular property in the person's data was verified as accurate or inaccurate at a particular time" in ¶ 78, ¶¶ 81-84, ¶ 88 and ¶ 97 (“phone number” and/or “email” verified as accurate for “most recent” record);
"a querier implemented on the computing device and configured to retrieve, from the database based on the particular time, the person's data, 
"and a trainer implemented on the computing device and configured to train a model using the retrieved data and the indication such that the model can predict whether another person's value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data's significance in training the model" in ¶ 160 (confidence score predicts that resolved record 603A is accurate, i.e., phone number and title values are accurate; this determination is used in training the model and maintains significance; see ¶ 78, ¶ 86, and ¶ 167).
It appears Elkington fails to explicitly teach “monitoring.”  However, Ragavan teaches monitoring in ¶ 5. Elkington and Ragavan are analogous art because they are from the same field of endeavor. It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the analyzing in Elkington to include “monitoring” as taught by Ragavan.  The motivation would have been to save time by minimizing manual user intervention. 
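For illustration only, the running log and the time-based retrieval recited in claim 19 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `RunningLog` class, its method names, the integer timestamps, and the sample values are hypothetical and do not describe the implementation in Elkington, Ragavan, or the application.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class RunningLog:
    """Append-only log of a person's property values, keyed by update time."""

    def __init__(self) -> None:
        # person_id -> list of (timestamp, snapshot), kept sorted by timestamp
        self._log: Dict[str, List[Tuple[int, Dict[str, Any]]]] = defaultdict(list)

    def store_update(self, person_id: str, timestamp: int, values: Dict[str, Any]) -> None:
        """Record a new snapshot only if it differs from the latest one."""
        entries = self._log[person_id]
        if entries and entries[-1][1] == values:
            return  # nothing changed, so nothing is logged
        entries.append((timestamp, dict(values)))
        entries.sort(key=lambda entry: entry[0])

    def as_of(self, person_id: str, timestamp: int) -> Optional[Dict[str, Any]]:
        """Return the snapshot that was current at `timestamp`, if any."""
        entries = self._log.get(person_id, [])
        times = [t for t, _ in entries]
        index = bisect_right(times, timestamp)
        return dict(entries[index - 1][1]) if index else None
```

In this sketch, a verification received at time 15 would be paired with `log.as_of(person_id, 15)`, that is, the values that were up to date at that time rather than the latest values in the log.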
With respect to claim 20, Elkington teaches “20. The system of claim 19, further comprising: a featurizer configured to determine, based on the person's data retrieved in (d), a plurality of features, each of the plurality of features describing a fact about the person's data retrieved in (d)” in ¶ 78 (features of a person’s data used in training); 
“wherein the training (e) comprises training the model using the determined features” in ¶ 78 (features of a person’s data used in training). 
With respect to claim 21, Elkington teaches “21. The system of claim 19, wherein the model predicts whether the other person's value in the plurality of properties is accurate” in ¶ 41, ¶ 72, ¶ 86 and ¶ 160 (each record has a prediction in the form of a confidence score).
Claims 4, 13, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Elkington US 2014/0279739 in view of Ragavan US 2017/0293666 as applied to claims 1, 10, and 19 above and further in view of Aggarwal, Data Classification Algorithms and Applications, 6-11-2014.
With respect to claims 4, 13, and 22, Elkington teaches a plurality of models. See above. Additionally, Examiner has established that a plurality of models was well known in the art before the effective filing date of the invention. See response to arguments below. It appears Elkington et al. fails to explicitly teach “a plurality of models comprises two or more of logistic regression, naive Bayes, elastic nets, neural networks, Bernoulli naive Bayes, multimodal naive Bayes, nearest neighbor classifiers, support vector machines.”
However, Aggarwal teaches 
“a plurality of models comprises two or more of logistic regression” on p. 7 (“Logistic regression is a popular discriminative classifier, and its goal is to directly estimate the posterior probability P(Y(T) = i|X) from the training data”); 
“naive Bayes” on p. 7 (“This is referred to as conditional independence, and therefore the Bayes method is referred to as ‘naive.’ This simplification is crucial, because these individual probabilities can be estimated from the training data in a more robust way. The naive 
“elastic nets” on p. 49 (“Elastic net regularization [81]: In practice, it is common that a few features are highly correlated. In this situation, the Lasso tends to select only one of the correlated features [81]. To handle features with high correlations, elastic net regularization is proposed. . .”);
“neural networks” on pp. 175-6:
Radial-basis function networks (RBF) are designed in a similar way to regular nearest neighbor classifiers, except that a set of N centers are learned from the training data. In order to classify a test instance, a distance is computed from the test instance to each of these centers x1 . . . xN, and a density function is computed at the instance using these centers. The combination of functions computed from each of these centers is computed with the use of a neural network
“Bernoulli naive Bayes” on p. 300 (emphasis added): 
Two classes of models are commonly used for naive Bayes classification. Both models essentially compute the posterior probability of a class, based on the distribution of the words in the document. These models ignore the actual position of the words in the document, and work with the “bag of words” assumption. The major difference between these two models is the assumption in terms of taking (or not taking) word frequencies into account, and the corresponding approach for sampling the probability space:
Multivariate Bernoulli Model: In this model, we use the presence or absence of words in a text document as features to represent a document. Thus, the frequencies of the words are not used for the modeling a document, and the word features in the text are assumed to be binary, with the two values indicating presence or absence of a word in text. Since the features to be modeled are binary, the model for documents in each class is a multivariate Bernoulli model.
“multimodal naive Bayes” on p. 300 (emphasis added): 
Two classes of models are commonly used for naive Bayes classification. Both models essentially compute the posterior probability of a class, based on the distribution of the words in the document. These models ignore the actual position of the words in the document, and work with the “bag of words” assumption. The major difference 
• Multinomial Model: In this model, we capture the frequencies of terms in a document by representing a document with a bag of words. The documents in each class can then be modeled as samples drawn from a multinomial word distribution. As a result, the conditional probability of a document given a class is simply a product of the probability of each observed word in the corresponding class.
“nearest neighbor classifiers” on p. 158 (“The classical example of an instance-based learning algorithm is the k-nearest neighbor classification algorithm, in which the k nearest neighbors of a classifier are used in order to create a local model for the test instance”); 
“support vector machines” on p. 144 (“It is well-known that Support Vector Machines (SVM) [57], Naive Bayesian (NB) [58], and Rocchio’s algorithm [60] are among the most popular techniques for text categorization, also called text classification”); p. 6 (“For example, an SVM would tend to prefer features in which the two classes separate out using a linear model”). 
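For illustration only, the difference between the multivariate Bernoulli model and the multinomial model quoted from Aggarwal above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the probability tables are hypothetical, every token is assumed to be in the vocabulary, and every probability is assumed to lie strictly between 0 and 1.

```python
import math
from collections import Counter
from typing import Dict, List


def bernoulli_log_likelihood(tokens: List[str], presence_prob: Dict[str, float]) -> float:
    """Multivariate Bernoulli model: only presence or absence of each vocabulary word matters."""
    present = set(tokens)
    total = 0.0
    for word, p in presence_prob.items():
        # Each vocabulary word contributes once, whether it is present or absent.
        total += math.log(p) if word in present else math.log(1.0 - p)
    return total


def multinomial_log_likelihood(tokens: List[str], word_prob: Dict[str, float]) -> float:
    """Multinomial model: each occurrence of a word contributes, so frequencies matter.

    The multinomial coefficient is omitted because it does not depend on the class.
    """
    counts = Counter(tokens)
    return sum(n * math.log(word_prob[word]) for word, n in counts.items())
```

Under these assumptions, repeating a word in a document leaves the Bernoulli log-likelihood unchanged but changes the multinomial log-likelihood, which is the distinction the quoted passages draw.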
It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the plurality of models in Elkington et al. to include “a plurality of models comprises two or more of logistic regression, naive Bayes, elastic nets, neural networks, Bernoulli naive Bayes, multimodal naive Bayes, nearest neighbor classifiers, support vector machines” as taught by Aggarwal. The motivation would have been 
“to handle features with high correlations”; to “perform the product-wise simplification”; “directly estimate the posterior probability”; and to categorize text in which “two classes separate out using a linear model.”  See citations above. 
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Elkington US 2014/0279739 in view of Ragavan US 2017/0293666 as applied to claims 6 and 15 above and further in view of Cook US 20150161210.
With respect to claims 7 and 16, Elkington fails to teach, but Ragavan (US 2017/0293666) teaches “7. The method of claim 6, wherein the monitoring (a) comprises monitoring a plurality of data sources to determine whether data relating to a person has updated” in ¶ 5;
Elkington teaches “and wherein the applying (f) further comprises (iii) determining which of the plurality of data sources the at least one value selected in (ii) originated from” in Fig. 4, items 401A-C (each value in each record originated from referral, trade show, and web form, respectively);
It appears Elkington et al. fails to teach, but Cook US 20150161210 A1 teaches 
“(iv) determining whether a client has permission to the data source determined in (iii)” in ¶ 76 
“(v) if the client lacks permission to the data source determined in (iii), filtering the at least one value from results before the results are presented to the client” in ¶ 76 
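For illustration only, steps (iv) and (v) quoted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name, the `source` and `value` keys, and the sample source names are hypothetical and are not taken from Cook or Elkington.

```python
from typing import Dict, List, Set


def filter_by_permission(
    results: List[Dict[str, str]],
    permitted_sources: Set[str],
) -> List[Dict[str, str]]:
    """Drop any result whose originating data source the client may not access."""
    # Step (iv): check the client's permission for each result's source.
    # Step (v): keep only results from permitted sources before presentation.
    return [row for row in results if row["source"] in permitted_sources]
```

For example, a client permitted to see only the "referral" source would receive only values that originated from that source; values from other sources are removed before the results are presented.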
Cook and Elkington et al. are analogous art because they are in the same field of endeavor. 
It would have been obvious to one skilled in the art before the effective filing date of the invention to modify the applying in Elkington et al. to include determining whether a client has permission to the data source determined in “(v) if the client lacks permission to the data source determined in (iii), filtering the at least one value from results before the 
Response to Arguments
 
Applicant argues  
The Specification provides sufficient detail that one skilled in the art can reasonably conclude that the inventors had possession of the claimed invention. MPEP 2163. For example, the Specification teaches that "[u]sing the retrieved data and the indication, a model is trained such that the model can predict whether another person's value for the particular property is accurate." Specification, [0011]. The Specification further discloses that "using the features, trainer 1015 trains model 1022 such that model 1022 can predict whether another person's value for the particular property is accurate." Specification, [0118]. A person of ordinary skill in the art ("POSA") would have understood that "the training involves inputting a set of parameters, called features, and known correct or incorrect values for the input features... [and a]fter the model is trained, it may be applied to new features for which the appropriate solution is unknown." Specification, [0009]. A POSA would have understood that "predict[ing] whether another person's value for the particular property is accurate" would be determined using new features, that describe the "another person," in the trained model. Specification, [0009], [0011], [0018]. Thus, it would have been clear to a POSA that the inventors were in possession of the claimed invention.
These arguments are persuasive. 
	Applicant further argues 
Claims 5-7 and 14-16 stand rejected under 35 USC 112(b) for allegedly lacking an antecedent basis. Applicant respectfully disagrees with the rejections of claims 5 and 14. Neither of the claims recite "the applying," as alleged. Office Action, 5. Applicant respectfully requests the Office to reconsider the claim 5 and 14 rejections. Claims 6 and 15 are amended herein to correct the alleged insufficiencies, thereby rendering this rejection moot. 
The rejections under 112(b) are withdrawn in view of Applicant’s amendments. 
	Applicant further argues 

Elkington in view of Ragavan: 
Claims 1-6, 8-15 and 17-22 stand rejected under 35 U.S.C. § 103 as allegedly being unpatentable over U.S. Pat. Publ. No. 2014/0279739 A1 to Elkington et al. ("Elkington") in view of U.S. Pat. Publ. No. 2017/0293666 A1 to Ragavan et al. ("Ragavan"). Applicant respectfully disagrees and traverses.
The Elkington-Ragavan combination fails to teach or suggest "(e) training a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm; (f) evaluating accuracy of the plurality of models using available training data; and (g) selecting a model from the plurality of models determined based on the evaluated accuracy," as recited in amended claim 1. 
The Office agrees that Elkington fails to explicitly teach the foregoing feature. Office Action, 8-9 ("Elkington fails to explicitly teach '(f) evaluating accuracy of the plurality of models using available training data; and' '(g) selecting a model from the plurality of models determined based on the evaluated accuracy'").
The Office takes "official notice" that the foregoing features are allegedly "well known and common sense." Office Action, 9. However, the Office's official notice is improper because it is not "clear and unmistakable" and is not capable of "instant and unquestionable demonstration as being well-known." MPEP §2144.03. 
Examiner respectfully disagrees and offers the following references as an unquestionable demonstration that this feature was well known in the art before the effective filing date of the invention (emphasis added): 
US 20090157573 A1
[0042] Appendix A lists exemplary sets of attributes in named categories (i.e., Compositional, Electrical Design, Past Outage History, Derived, and Dynamic) that were used as training and test data in the trial applications. There were more than 400 different data attributes, which were investigated using different types of machine learning algorithms, to determine the most effective combination of attributes that predict future failures of feeders (e.g., OA). It will be understood that the set of attributes and the main categories listed in Appendix A are exemplary and can be modified or changed in practice, for example, in response to training results.
As an example, it may be practical to train failure-by-failure to produce a real-time ROC curve of prediction accuracy of the trained models. FIG. 9 shows Daily Area Under the ROC Curve (AUC) numbers that are calculated and plotted failure-by-failure over the year. In trials, the performance of IDSF models that were trained or re-trained on daily, weekly, and monthly basis was evaluated. In general, the models had very similar results, suggesting that the AUC is being controlled by dynamic attributes rather than by static attributes. Specifically, Load Pocket Weights summed over each feeder was selected as the most predictive attribute consistently throughout the hottest parts of the summer of 2005 (FIG. 10).
US 20190138946 A1
[0038] In other examples, the database server 210 may automatically generate a predictive learning model based in part on the plurality of features. For example, the database server 210 may generate the predictive learning model at learning model generator 260. In other examples, the database server 210 may generate the predictive learning model based on training a plurality of candidate machine learning models. The plurality of candidate machine learning models may include any number of machine learning algorithms used in predictive model building. The database server 210 may then evaluate the plurality of candidate machine learning models. This evaluation may be based in part on a predictive accuracy of each of the machines. The database server 210 may subsequently select the predictive machine learning model based in part on the evaluation (e.g., based on which model is most accurate, or is otherwise best suited for the selected data set or the desired predictive value). 
[0050] In some examples, the database server may then generate a predictive learning model 420 based in part on the plurality of features. The predictive learning model may utilize the received data set 405, the received selection of the prediction field 410, and the plurality of features 415 to generate the model 420. In some instances, upon generating the learning model 420, a plurality of candidate machine learning models may be trained 450. The plurality of candidate machine learning models 450 may be trained, for example, based on one or more of the data set or the prediction field received by the database server. Upon training the candidate machine models 450, the models may be evaluated based in part on a predictive accuracy of each of the plurality of models 455. Stated alternatively, the models may be evaluated to determine a threshold level of accuracy given the input received by the database server. Thus, in some examples, the predictive machine learning model that would result in a most-accurate prediction given the input received by the database server may be selected.
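For illustration only, the train, evaluate, and select sequence described in the paragraphs quoted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the two toy learners, the single numeric feature, and the use of a separate held-out set for scoring are hypothetical choices and are not taken from US 20190138946 A1 or from the application.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

Predictor = Callable[[float], int]
Trainer = Callable[[Sequence[float], Sequence[int]], Predictor]


def train_majority(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """Baseline: always predict the most common training label."""
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority


def train_nearest_neighbor(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """1-nearest-neighbor on a single numeric feature."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(model: Predictor, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def select_most_accurate(
    trainers: Dict[str, Trainer],
    train: Tuple[Sequence[float], Sequence[int]],
    test: Tuple[Sequence[float], Sequence[int]],
) -> Tuple[str, Dict[str, float]]:
    """Train every candidate, score each on the evaluation data, return the best name."""
    scores = {name: accuracy(fit(*train), *test) for name, fit in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Each candidate is trained on the same data and scored on the same evaluation data, so the accuracies are directly comparable and the candidate with the highest accuracy is selected.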
US 20150317563 A1
[0030] In one embodiment, the tuning process (116, 114, 106) explores models with more or fewer features, and also considers different machine learning algorithms to build the model. The training and tuning process 102 produces a predictive model 108 that can be used to forecast hardware (e.g., GPU) performance outcomes for new applications. In one embodiment of the present disclosure, all aspects of the training and tuning process 102 run fully automatically without user intervention.
[0059] Cross-validation is a known standard technique to estimate how well a statistical model built using a limited data set will generalize to an independent data set. Cross-validation may be used to evaluate the accuracy of predictive models, which indicates how well models are expected to perform in practice. Cross-validation involves several computation rounds. One round of cross-validation splits the data available in two disjoint sets: a training set and a testing/validation set. The training set is used to build the model, while the accuracy of the model is evaluated on the testing set. To reduce variability, multiple rounds of cross-validation may be performed using different partitions of the data set.
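For illustration only, the multi-round cross-validation described in the paragraph quoted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the interleaved fold assignment, the function names, and the toy threshold learner are hypothetical and are not taken from US 20150317563 A1.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], int]
Trainer = Callable[[Sequence[float], Sequence[int]], Predictor]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k disjoint (train, test) partitions."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_validated_accuracy(fit: Trainer, xs: Sequence[float], ys: Sequence[int], k: int) -> float:
    """Average test-set accuracy over k rounds of cross-validation."""
    round_scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        # Build the model on the training set only.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Evaluate accuracy on the disjoint testing set.
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        round_scores.append(correct / len(test_idx))
    return sum(round_scores) / len(round_scores)


def train_threshold(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """Hypothetical toy learner: threshold at the midpoint of the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, sum(1 for y in ys if y == 0))
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, sum(1 for y in ys if y == 1))
    cut = (mean0 + mean1) / 2
    return lambda x: int(x > cut)
```

Averaging over several partitions is what reduces the variability of the accuracy estimate; a single train/test split corresponds to one round of the loop.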
US 20170330099 A1
[0010] In an embodiment, the training the machine learning model comprises selecting a first machine learning algorithm, and the re-training the machine learning model comprises selecting a second machine learning algorithm that is different from the first machine learning algorithm.
[0034] In certain embodiments, a subset of transactions can be selected for training the machine learning model, and a subset of transactions can be selected for testing the machine learning model. For example, 20% of transactions in a given time period (e.g., from a particular day) can be selected to train the machine learning model. In certain embodiments, the machine learning model comprises a plurality of decision trees input with bootstrap samples of the training set, and the plurality of decision trees are aggregated into a single decision function. A separate set of transactions can be selected as a test set to test the machine learning model after it has been trained. The test set The accuracy of the machine learning model can be determined by providing the machine learning model with the initial versions of transactions contained in the test set and comparing the data input errors detected by the machine learning model to those found by the human auditors. The machine learning model can be revised and or further trained based on the results of such testing. For example, if a first machine learning algorithm is used to train the machine learning model (e.g., a Random Forest algorithm, a CART algorithm, a SVM-C algorithm, etc.), but the test reveals that the machine learning algorithm is not particularly effective, a different machine learning algorithm can be selected to re-train the machine learning algorithm. In another example, if it is determined that the machine learning algorithm is not effectively detecting errors in a particular data field, the data field can be specified during additional training so that during training the machine learning algorithm pays particular attention to errors in that data field as it is further trained.
US 20150379430 A1
[0095] . . . A wide variety of machine learning algorithms may be supported natively by the MLS libraries, including for example random forest algorithms, neural network algorithms, stochastic gradient descent algorithms, and the like. In at least one embodiment, the MLS may be designed to be extensible--e.g., clients may provide or register their own modules (which may be defined as user-defined functions) for input record handling, feature processing, or for implementing additional machine learning algorithms than are supported natively by the MLS. In some embodiments, some of the intermediate results (e.g., summarized statistics produced by the input record handlers) of a machine learning workflow may be stored in MLS artifact repository 120.
[0208] To evaluate the model after it has been trained, a test set Tst1 may be determined using the consistency metadata (element 3216) (e.g., using a set of pseudo-random numbers obtained from the same source, or from a source whose state has been synchronized with that of the source used for selecting Trn1). In one implementation, for example, the consistency metadata may indicate a seed Seed1 and a count N1 of pseudo-random numbers that are obtained from a PRNG for generating Trn1. If the original PRNG is not available to provide pseudo-random numbers for selecting Tst1 (e.g., if the test set is being identified at a different server than the server used for identifying Trn1, and local PRNGs have to be used at each server), an equivalent PRNG may be initialized with Seed1, and the first N1 pseudo-random numbers Model M1 may be tested/evaluated (e.g., the accuracy/quality of the model's predictions may be determined) using test set Tst1.

[0285] One challenge with quantile binning is that it may not be straightforward to select, in advance, the bin counts (i.e., the number of bins to which a given input variable's raw values should be mapped) that will eventually lead to the most accurate and most general predictions from the model being trained or generated. Consider an example scenario in which a model generator has a choice of a bin count of 10, or a bin count of 1000, for a given input variable. With a bin count of 10, approximately 10 percent of the observation records would be mapped to each of the 10 bins, while with a bin count of 1000, only roughly 0.1% of the observation records would be mapped to each bin. In one approach to determining which bin count is the superior choice, two versions of the model may have to be fully trained separately and then evaluated. A first version M1 of the model may be trained with features obtained from the 10-bin transformation (as well as other features, if any are identified by the model generator), and a second version M2 may be trained using features obtained from the 1000-bin transformation (as well as the other features). M1's predictions on test data may be compared to M2's predictions on the same test data to determine which approach is better. Such an approach, in which different bin counts are used for training respective versions of a model, may be less than optimal for a number of reasons. First, training multiple models with respective groups of binned features may be expensive even for a single input variable. When several different binnable variables have to be considered for the same model, as is usually the case, the number of possible combinations to try may become extremely large. 
Second, it may not be possible to capture subtle non-linear relationships with any single bin-count setting (even for one input variable) in some cases--e.g., features obtained using several different bin-counts for the same variable may be useful for some predictions, depending on the nature of the nonlinear relationships. Thus, in some scenarios, for at least some variables, any 
US 20110307422 A1
[0005] For example, programmers incorporating machine learning algorithms (or models) into their code often like to explore how well the models classify different data and which features might impact the models that are build. Too often, however, programmers vary the parameters for one specific model and do not thoroughly explore the space of algorithms, data, and features. A typical machine learning formulation first extracts features from labeled data and then subsequently splits the labeled data into a training and a testing set. The training set is used to create a model, and the testing set is used to evaluate the model. While there are techniques that combine the results of several classifiers into a single joint classifier, there is little or no work that attempts to combine hundreds of classifiers and visualize the results so that users can interpret the results.
[0027] Embodiments of the system 100 and method then simultaneously run different classifier training and evaluation experiments using the multiple models (box 220). In some embodiments the multiple models and associated parameters are systematically varied during generation of the multiple models. In some embodiments this systematic variance yields an optimally-differentiated set of results. In other embodiments this systematic variance yields a set of results that matches constraints previously set by a user. A different set of predicted labels then is generated for each of the multiple models (box 230). Embodiments of the system 100 and method then aggregate each set of predicted labels (box 240). From these aggregated labels embodiments of the system 100 and method compute summary statistics (box 250).
6. The method of claim 5, further comprising: computing summary statistics for sets of examples, trials, and sets of trials; computing a percentage of correctly-classified tuples to determine accuracies for each of the multiple models; comparing accuracies across the multiple models; and computing label entropy.
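For illustration only, the accuracy comparison and label-entropy computation recited in the quoted claim 6 can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function names (accuracy, label_entropy, summarize), the model names, and the sample labels are all hypothetical, and Shannon entropy in bits is an assumed reading of "label entropy."

```python
import math
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of examples whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def label_entropy(labels):
    """Shannon entropy, in bits, of the labels the models assigned to one example.

    Assumption: "label entropy" is read here as disagreement among models.
    """
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize(predictions_by_model, actual):
    """Return per-model accuracies and per-example label entropies."""
    accuracies = {
        name: accuracy(preds, actual)
        for name, preds in predictions_by_model.items()
    }
    # Aggregate the predicted labels example by example across all models.
    per_example = zip(*predictions_by_model.values())
    entropies = [label_entropy(labels) for labels in per_example]
    return accuracies, entropies


if __name__ == "__main__":
    # Hypothetical predicted labels from two models on four test examples.
    predictions = {
        "model_a": ["x", "y", "x", "y"],
        "model_b": ["x", "x", "x", "y"],
    }
    truth = ["x", "y", "y", "y"]
    accs, ents = summarize(predictions, truth)
    best = max(accs, key=accs.get)
    print(accs, ents, best)
```

Under these assumptions, the model with the highest fraction of correctly classified examples is the most accurate, and an entropy of zero for an example means every model assigned it the same label.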
	Applicant further argues:
That is, training "a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm," was not well known or common sense. 
Examiner respectfully disagrees.  One skilled in the art before the effective filing date of the invention would recognize that one of the fundamental 
	Applicant further argues:
Training a single model "can quickly become cumbersome to a computing device." Specification, [0029]. The training tasks "can conflict with one another and compete inefficiently for computing resources, such as processor power and memory capacity." Id. These issues are compounded when training a plurality of models. Therefore, it would not be common sense to expend such resources to train "a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm," as recited in amended claim 1 and similarly recited in independent claims 10 and 19. 
Examiner has established, by the art cited above, that the claimed training was well known in the art before the effective filing date of the invention. The provided motivation to combine is not negated by Applicant’s assertions of possible obstacles. Based on the cited prior art of record, Examiner finds that testing for accuracy in general, and testing machine learning algorithms for accuracy in particular, was well known and common sense in the computing arts before the effective filing date of the invention.
	Applicant’s remaining arguments have been addressed above. 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALBERT M PHILLIPS, III whose telephone number is (571)270-3256.  The examiner can normally be reached on 10a-6:30pm EST M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes, can be reached on (571)270-1006.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/ALBERT M PHILLIPS, III/ Primary Examiner, Art Unit 2159

        1 Examiner finds that all the language after the words “such that” does not cause any steps to be performed.  Thus, everything after “such that” has no patentable weight. See MPEP 2111.04 (“Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed. . .”) (emphasis added).  Examiner nevertheless finds that the prior art teaches this language.
        2 Examiner finds that “another person” includes the same person named in a future duplicate record.  
        3 Even though Examiner finds that the prior art teaches the whereby clause, Examiner also finds that the whereby clause has no patentable weight because it does not cause any steps to be performed or limit the claim to any particular structure. See MPEP 2111.04:
        Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure. However, examples of claim language, although not exhaustive, that may raise a question as to the limiting effect of the language in a claim are: 
        (A) “adapted to” or “adapted for” clauses; 
        (B) “wherein” clauses; and 
        (C) “whereby” clauses. 
        
        4 “Demographic information” has no patentable weight because it is non-functional descriptive material that merely conveys meaning to the human reader rather than assisting in performing a function.  See MPEP 2111.05 (I) (B) (III).
        5 The language “the person's data includes an indication of whether the person has engaged in fraud” has no patentable weight because it is non-functional descriptive material that merely conveys meaning to the human reader rather than assisting in performing a function.  See MPEP 2111.05 (I) (B) (III).