DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2019-09-26, 2019-11-18, 2021-12-02, and 2022-03-03 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Status
Claims 1-30 are pending in the application.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
For the double patenting rejections below, please see the following mapping of instant Claims 1, 2, and 3 against Claims 1 and 9 of previously allowed U.S. Patent No. 11,347,718:
Instant Application (Claims 1 + 2 + 3)
receive a plurality of data records from a data source, each data record of the plurality of data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values; 

output an indication that the plurality of data records contains at least one anomalous data record.



for each data record P of the plurality of data records, apply a first model to the data record P

and wherein the first model and the subsequent models are ranked in an order from most predictive variables to least predictive variables.

wherein an output of the first model represents a probability that the data record P belongs to a distribution represented by a true data set, 


wherein the first model is generated based in part on the true data set and an adversarial data set generated based on the true data set; 
receiving the true data set comprising a plurality of true data records organized according to a plurality of true data set columns and a plurality of true data set rows, each true data record of the plurality of first data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values, wherein each predictor variable represents a unique true data set column of the plurality of true data set columns and wherein each feature vector represents a unique true data set row of the plurality of true data set rows; 

generating, based on the true data set, the adversarial data set comprising a plurality of adversarial data records organized according to a plurality of adversarial data set columns and a plurality of adversarial data set rows, wherein each predictor variable represents a adversarial data set column of the plurality of adversarial data set columns, and wherein the plurality of adversarial data set rows is generated by, for each predictor variable, randomly shuffling the corresponding predictor variable values; 


adding a first data set target column to the true data set, wherein each corresponding first data set target row comprises a first classification value; 

adding a second data set target column to the second data set, wherein each corresponding second data set target row comprises a second classification value; 

building the first model to distinguish the true data set from the adversarial data set, wherein a first model output represents a probability of a point P belonging to a sub- space represented by the true data set; 

removing a true data set column from the true data set and a corresponding adversarial data set column from the adversarial data set that are both associated with a predictor variable of the first model identified as having a highest feature importance for distinguishing the true data set from the adversarial data set; 

and until determining, based on a subsequent model, that the true data set can no longer be distinguished from the adversarial data set, 


iteratively building a plurality of subsequent models to distinguish the true data set from the adversarial data set, wherein each subsequent model is built without a true data set column and adversarial data set column associated with a predictive variable of an immediately preceding subsequent model as having a highest feature importance for distinguishing the true data set from the adversarial data set, and wherein the first model and the subsequent models are ranked in an order from most predictive variables to least predictive variables.


and upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, output an indication that the plurality of data records contains at least one anomalous data record.

upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, 
sequentially apply subsequent models of a plurality of subsequent models to the data record P until a successful output is obtained; 


determine a predictor variable of the plurality of predictor variables that had been excluded from a final subsequent model from which the successful output was obtained; 
and identify a corresponding predictor variable value associated with the predictor variable as an anomalous value associated with the data record P.
U.S. Patent No. 11,347,718 (Claims 1 + 9)
receive a plurality of data records from a data source, each data record of the plurality of data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values

wherein the plurality of data records comprises at least one anomalous data record; 
outputting an indication that the plurality of data records contains at least one anomalous data record.

and for each data record P of the plurality of data records, sequentially apply each ranked model of the plurality of ranked models to the data record P 
the first model and the ranked models are ranked in an order from most predictive variables to least predictive variables;

building the first model to distinguish the true data set from the adversarial data set, wherein a first model output represents a probability of a point P belonging to a sub-space represented by the true data set; 

generate a first model of a plurality of ranked models by receiving a true data set comprising a plurality of true data records organized according to a plurality of true data set columns and a plurality of true data set rows, wherein (a) each true data record of the plurality of first data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values, (b) each predictor variable represents a unique true data set column of the plurality of true data set columns, and each feature vector represents a unique true data set row of the plurality of true data set rows; 



generating, based at least in part on the true data set, an adversarial data set comprising a plurality of adversarial data records organized according to a plurality of adversarial data set columns and a plurality of adversarial data set rows, wherein (a) each predictor variable represents a adversarial data set column of the plurality of adversarial data set columns, and (b) the plurality of adversarial data set rows is generated by, for each predictor variable, randomly shuffling the corresponding predictor variable values; 

adding a first data set target column to the true data set, wherein each corresponding first data set target row comprises a first classification value; 

adding a second data set target column to the second data set, wherein each corresponding second data set target row comprises a second classification value; 

building the first model to distinguish the true data set from the adversarial data set, wherein a first model output represents a probability of a point P belonging to a sub-space represented by the true data set; 

removing a true data set column from the true data set and a corresponding adversarial data set column from the adversarial data set that are both associated with a predictor variable of the first model identified as having a highest feature importance for distinguishing the true data set from the adversarial data set; 

and until determining, based at least in part on a subsequent model, that the true data set can no longer be distinguished from the adversarial data set, 

iteratively building a plurality of ranked models to distinguish the true data set from the adversarial data set, wherein (a) each ranked model is built without a true data set column and adversarial data set column associated with a predictive variable of an immediately preceding subsequent model as having a highest feature importance for distinguishing the true data set from the adversarial data set, and (b) the first model and the ranked models are ranked in an order from most predictive variables to least predictive variables; 


and upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, outputting an indication that the plurality of data records contains at least one anomalous data record.

upon determining a successful output has been obtained from a ranked model, 

sequentially apply each ranked model of the plurality of ranked models to the data record P until a successful output is obtained from a ranked model of the plurality of ranked models;

determine a predictor variable of the plurality of predictor variables that has been excluded from the ranked model from which the successful output was obtained; 
and identify a corresponding predictor variable value associated with the predictor variable as an anomalous value associated with the data record P
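For illustration only, the data-set construction and iterative model-building steps mapped above can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's or the patentee's implementation: rows are dictionaries keyed by predictor variable, the classifier is supplied through a hypothetical `train` callable that returns a separability score and per-column feature importances, and an assumed `threshold` of 0.55 stands in for "can no longer be distinguished".

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical trainer interface (an assumption, not claim language): given the
# labeled rows and the predictor columns it may use, return a separability score
# (e.g. held-out accuracy) and a feature-importance value for each column.
Trainer = Callable[[List[Row], List[str]], Tuple[float, Dict[str, float]]]


def make_adversarial(true_rows: Sequence[Row], columns: Sequence[str], seed: int = 0) -> List[Row]:
    """Build the adversarial set by shuffling each predictor column independently.

    Every column keeps its original values, but the row-wise relationships
    between columns are broken.
    """
    rng = random.Random(seed)
    shuffled: Dict[str, List[float]] = {}
    for col in columns:
        values = [row[col] for row in true_rows]
        rng.shuffle(values)
        shuffled[col] = values
    return [{col: shuffled[col][i] for col in columns} for i in range(len(true_rows))]


def add_target(rows: Sequence[Row], label: int, target: str = "target") -> List[Row]:
    """Add a target column holding the same classification value on every row."""
    return [{**row, target: label} for row in rows]


def build_ranked_models(
    true_rows: Sequence[Row],
    columns: Sequence[str],
    train: Trainer,
    threshold: float = 0.55,  # assumed cut-off for "can still be distinguished"
    seed: int = 0,
) -> List[Tuple[List[str], float]]:
    """Train true-vs-adversarial models, dropping the most important column each round.

    Returns (columns used, score) per model, from most to least predictive variables.
    """
    adversarial = make_adversarial(true_rows, columns, seed)
    data = add_target(true_rows, 1) + add_target(adversarial, 0)
    remaining = list(columns)
    ranked: List[Tuple[List[str], float]] = []
    while remaining:
        score, importance = train(data, remaining)
        if score < threshold:  # true and adversarial sets no longer separable
            break
        ranked.append((list(remaining), score))
        top = max(remaining, key=lambda c: importance[c])
        # Both sets live in `data`, so removing the column from `remaining`
        # drops it from the true and the adversarial columns alike.
        remaining.remove(top)
    return ranked


def stub_train(data: List[Row], cols: List[str]) -> Tuple[float, Dict[str, float]]:
    """Toy stand-in for a real classifier: 'a' separates best, then 'b'."""
    score = 0.9 if "a" in cols else 0.7 if "b" in cols else 0.5
    weights = {"a": 3.0, "b": 2.0, "c": 1.0}
    return score, {c: weights[c] for c in cols}


rows = [{"a": float(i), "b": float(i % 3), "c": float(10 - i)} for i in range(10)]
print(build_ranked_models(rows, ["a", "b", "c"], stub_train))
# [(['a', 'b', 'c'], 0.9), (['b', 'c'], 0.7)]
```

The stub trainer only exercises the control flow; any real classifier that reports feature importances could be substituted for it.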


Claims 1-3, 11-13, and 21-23 are rejected on the nonstatutory double patenting grounds detailed below.
Claim 3 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 9 of U.S. Patent No. 11,347,718. Although the claims at issue are not identically worded, they are not patentably distinct from each other because they consist of an equivalent set of limitations, as shown in the mapping provided by the Examiner above.  There is no discernible difference between the claims.
Claim 13 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 18 of U.S. Patent No. 11,347,718. Although the claims at issue are not identically worded, they are not patentably distinct from each other because they consist of an equivalent set of limitations, as shown in the mapping provided by the Examiner above.  There is no discernible difference between the claims.
Claim 23 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 27 of U.S. Patent No. 11,347,718. Although the claims at issue are not identically worded, they are not patentably distinct from each other because they consist of an equivalent set of limitations, as shown in the mapping provided by the Examiner above.  There is no discernible difference between the claims.
Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 9 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 1 consists of a subset of the limitations of Instant Claim 3 / Anticipatory Claim 9, and is therefore broader in scope than Anticipatory Claim 9.
Claim 2 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 9 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 2 consists of a subset of the limitations of Instant Claim 3 / Anticipatory Claim 9, and is therefore broader in scope than Anticipatory Claim 9.
Claim 11 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 18 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 11 consists of a subset of the limitations of Instant Claim 13 / Anticipatory Claim 18, and is therefore broader in scope than Anticipatory Claim 18.
Claim 12 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 18 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 12 consists of a subset of the limitations of Instant Claim 13 / Anticipatory Claim 18, and is therefore broader in scope than Anticipatory Claim 18.
Claim 21 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 27 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 21 consists of a subset of the limitations of Instant Claim 23 / Anticipatory Claim 27, and is therefore broader in scope than Anticipatory Claim 27.
Claim 22 is rejected on the ground of nonstatutory double patenting as being unpatentable over Anticipatory Claim 27 of U.S. Patent No. 11,347,718. Although the claims at issue are not identical, they are not patentably distinct from each other because Instant Claim 22 consists of a subset of the limitations of Instant Claim 23 / Anticipatory Claim 27, and is therefore broader in scope than Anticipatory Claim 27.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9, 19, and 29 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 9, 19, and 29 recite the limitation “wherein the subsequent models are sequentially applied to the data record P in an order according to their respective rankings”.  There is insufficient antecedent basis for this limitation: the “respective rankings” of the “models” are not recited in respective parent claims 1, 11, and 21, and are first recited in respective claims 3, 13, and 23:  “wherein the first model and the subsequent models are ranked in an order from most predictive variables to least predictive variables.”  The examiner interprets the limitation as “wherein the subsequent models are sequentially applied to the data record P in an order according to rankings of most predictive to least predictive”.
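As a non-authoritative illustration of the interpretation adopted above, the following Python sketch applies ranked models to one record in rank order. The following are assumptions not found in the claims: each model is a hypothetical callable returning a probability that the record belongs to the true data set, a "successful output" is a probability at or above an assumed cutoff of 0.5, and the reported variable is the one newly excluded from the first successful model relative to the model ranked before it.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical representation: (predict, excluded), where predict returns the
# probability that a record belongs to the true data set and excluded lists the
# predictor columns that model was built without.
RankedModel = Tuple[Callable[[Row], float], List[str]]


def locate_anomaly(
    record: Row, ranked: Sequence[RankedModel], cutoff: float = 0.5
) -> Tuple[bool, Optional[str]]:
    """Return (flagged, variable) for one data record P.

    flagged is False when the first model already gives a successful output
    (probability >= cutoff, an assumed criterion). Otherwise the subsequent
    models are tried in rank order, and variable names the column newly
    excluded in the first model that succeeds (None if no model succeeds).
    """
    first_predict, previously_excluded = ranked[0]
    if first_predict(record) >= cutoff:
        return False, None
    for predict, excluded in ranked[1:]:  # list is assumed to be in rank order
        if predict(record) >= cutoff:
            newly_excluded = [c for c in excluded if c not in previously_excluded]
            return True, (newly_excluded[-1] if newly_excluded else None)
        previously_excluded = excluded
    return True, None


# Toy models: in the "true" data, y is roughly 2 * x and x lies in [0, 10].
def model_all(r: Row) -> float:
    return 0.9 if abs(r["y"] - 2 * r["x"]) < 1 else 0.1


def model_without_y(r: Row) -> float:
    return 0.9 if 0 <= r["x"] <= 10 else 0.1


ranked = [(model_all, []), (model_without_y, ["y"])]
print(locate_anomaly({"x": 3.0, "y": 6.5}, ranked))   # (False, None)
print(locate_anomaly({"x": 3.0, "y": 40.0}, ranked))  # (True, 'y')
```

In the second call the full model rejects the record, the model built without `y` accepts it, and `y` is therefore reported as the variable carrying the anomalous value (40.0).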

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-2, 7-12, 17-22, and 27-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mental process, without significantly more.  
Step 1:
Claims 1-10 are directed to an apparatus, Claims 11-20 are directed to a computer program product, and Claims 21-30 are directed to a method, and thus all the claims are directed to one of the four statutory categories of patent eligible subject matter.
Step 2A, Prong One:
Claims 1, 11, and 21 recite:
“for each data record P of the plurality of data records, apply a first model to the data record P, wherein an output of the first model represents a probability that the data record P belongs to a distribution represented by a true data set, wherein the first model is generated based in part on the true data set and an adversarial data set generated based on the true data set”; “generating” and “applying” a model are examples of “evaluating” mathematical formulae, which can be performed by a human being using pen and paper, and are therefore mental processes under MPEP § 2106.04(a)(2)(III):  “concepts performed in the human mind (including an observation, evaluation, judgment, opinion)”.
“and upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records”; “determining” is an “observation, evaluation, judgment, opinion” and is therefore a mental process.
Step 2A, Prong Two:
This judicial exception is not integrated into a practical application because additional elements “receive a plurality of data records from a data source, each data record of the plurality of data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values” and “output an indication that the plurality of data records contains at least one anomalous data record” amount to insignificant extra solution activity (necessary data gathering and outputting; see MPEP 2106.05(g)(3)).
Step 2B:
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above, additional elements “receive a plurality of data records from a data source, each data record of the plurality of data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values” and “output an indication that the plurality of data records contains at least one anomalous data record” amount to insignificant extra solution activity (necessary data gathering and outputting; see MPEP 2106.05(g)(3)).
	Dependent claims 2, 7-10, 12, 17-20, 22, and 27-30 are also determined to be directed to a mental process.
	Claims 2, 12, and 22 recite:  
“upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records”; “determining” is a mental process
“sequentially apply subsequent models of a plurality of subsequent models to the data record P until a successful output is obtained”; “applying” models is a mental process that can be performed with pen and paper
“determine a predictor variable of the plurality of predictor variables that had been excluded from a final subsequent model from which the successful output was obtained”; “determining” is a mental process
“and identify a corresponding predictor variable value associated with the predictor variable as an anomalous value associated with the data record P”; “identifying” is a mental process
Claims 7, 17, and 27 recite:  “generate an anomaly free data records set comprising a subset of the plurality of data records by removing each data record associated with an anomalous value from the plurality of data records”; “generating” records is a mental process that can be performed with pen and paper
Regarding Step 2A, Prong Two and Step 2B, additional element “transmit the anomaly free data records set to a downstream data consumer”; “transmit” amounts to insignificant extra solution activity (necessary data gathering and outputting; see MPEP 2106.05(g)(3)).
Claims 8, 18, and 28 recite:  “transmit instructions for rendering an indication that a corresponding predictor variable value associated with the predictor variable is an anomalous value associated with the data record P”; “transmit” amounts to insignificant extra solution activity (necessary data gathering and outputting; see MPEP 2106.05(g)(3)).
Claims 9, 19, and 29 recite:  “wherein the subsequent models are sequentially applied to the data record P in an order according to their respective rankings”; “applying” models in order is a mental process that can be performed with pen and paper.
Claims 10, 20, and 30 recite:  “wherein the input data set comprises insurance claims data”; this merely describes the data, and the claims are still directed to a mental process.
Examiner notes that Claims 3, 13, and 23, as well as their respective dependent claims 4-6, 14-16, and 24-26, recite limitations of such complexity that they cannot be practically performed in the human mind.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8-9, 11, 18-19, 21, and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over Achin et al. (US 2018/0060738 A1; hereinafter “Achin”) in view of Hou et al. (“Shuffle based Anomaly Detection in Multi-state System”; hereinafter “Hou”).
As per Claim 1, Achin teaches an apparatus for detecting anomalous data in an input data set, the apparatus comprising at least one processor and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to (Achin, Para [0026], discloses:  “According to another aspect of the present disclosure, a predictive modeling apparatus is provided, including a memory configured to store processor-executable instructions; and a processor configured to execute the processor-executable instructions, wherein executing the processor-executable instructions causes the apparatus to perform steps including”)
receive a plurality of data records from a data source, each data record of the plurality of data records comprising a feature vector comprising a plurality of predictor variables and a plurality of corresponding predictor variable values (Achin, Para [0174], discloses:  “Service-based prediction may occur either interactively or via API. For interactive predictions, the user may enter the values of features for each new observation or upload a file containing the data for one or more observations.”  Here, Achin discloses receiving a plurality of data records (“observations”) from a data source (“upload a file”).  Each observation includes a feature vector with a plurality of variables with values (“values of features”)).
for each data record P of the plurality of data records, apply a first model to the data record P, [wherein an output of the first model represents a probability that the data record P belongs to a distribution represented by a true data set], wherein the first model is generated based in part on the true data set and an adversarial data set generated based on the true data set (Achin, Para [0298], discloses:  “Given a dataset (or any sample thereof) and a modeling technique, the exploration engine 110 may calculate the importance of any feature using universal partial dependence. First, the engine 110 obtains the accuracy metric for a predictive model fitted on the sample using the modeling technique. The engine 110 can either perform this fitting from scratch or use a previous fitting. Then, for a given feature, the engine 110 takes all its values across all observations, shuffles them, and reassigns them (e.g., randomly reassigns them) to the observations. This random shuffling may reduce (e.g., destroy) any predictive value for that feature. The engine may then rescore the model on the dataset with the shuffled feature values, producing a new value for the accuracy metric. (Optionally before rescoring the fitted model on the shuffled dataset, the engine may refit the model to the shuffled dataset.) The decrease in the accuracy of the model indicates how much predictive value was lost and thus the feature's importance to the model and/or within the scope of the modeling technique.”  Here, Achin discloses applying a first model to each data record (“rescore the model on the dataset”), wherein the first model is generated based in part on the true data set (“First, the engine 110 obtains the accuracy metric for a predictive model fitted on the sample”) and an adversarial data set generated from the true data set (“the engine may refit the model to the shuffled dataset”).
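For reference, the shuffle-and-rescore feature importance described in Achin, Para [0298], can be sketched as follows. This is a minimal sketch assuming accuracy as the metric and a toy, already-fitted model; it rescores without refitting (which Achin describes as optional), and none of the function names below come from Achin.

```python
import random
from typing import Callable, Dict, Sequence

Row = Dict[str, float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(
    model: Model, rows: Sequence[Row], labels: Sequence[int], feature: str, seed: int = 0
) -> float:
    """Accuracy lost after randomly reassigning one feature's values across rows.

    The fitted model is rescored without refitting; Achin describes refitting
    on the shuffled data as optional.
    """
    baseline = accuracy(model, rows, labels)
    values = [r[feature] for r in rows]
    random.Random(seed).shuffle(values)  # shuffle, then reassign to observations
    shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
    return baseline - accuracy(model, shuffled, labels)


def toy_model(r: Row) -> int:
    """Toy 'fitted' model that uses x and ignores z."""
    return int(r["x"] > 5)


rows = [{"x": float(i), "z": float((7 * i) % 10)} for i in range(10)]
labels = [toy_model(r) for r in rows]
print(permutation_importance(toy_model, rows, labels, "z"))  # 0.0: z is unused
print(permutation_importance(toy_model, rows, labels, "x"))  # >= 0, seed-dependent
```

A feature the model never reads has zero importance by construction, which is the "decrease in the accuracy" signal the cited paragraph relies on.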
While Achin teaches output an indication that the plurality of data records contains at least one anomalous data record (Achin, Para [0298] above, discloses feature importance:  “The decrease in the accuracy of the model indicates how much predictive value was lost and thus the feature's importance to the model”.  Achin also suggests in [0193]:  “Predictive modeling system 100 may include automated procedures for outlier detection and replacement, missing value imputation, and the detection and treatment of other data anomalies, requiring less skill and effort by the user.”  Achin also suggests a binary classifier to indicate the similarity of two predictive modeling techniques in [0055]:  “In some embodiments, the library 130 of modeling techniques includes tools for assessing the similarities (or differences) between predictive modeling techniques. Such tools may express the similarity between two predictive modeling techniques as a score (e.g., on a predetermined scale), a classification (e.g., “highly similar”, “somewhat similar”, “somewhat dissimilar”, “highly dissimilar”), a binary determination (e.g., “similar” or “not similar”).”), Achin does not appear to teach wherein an output of the first model represents a probability that the data record P belongs to a distribution represented by a true data set; and upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, output an indication that the plurality of data records contains at least one anomalous data record.
Hou explicitly teaches for each data record P of the plurality of data records, apply a first model to the data record P, wherein an output of the first model represents a probability that the data record P belongs to a distribution represented by a true data set, wherein the first model is generated based in part on the true data set and an adversarial data set generated based on the true data set (Hou, Page 876 Section B Para 2, discloses an adversarial data set generated based on the true data set:  “In order to reduce the effects of the temporal characteristic of the multi-state systems, we shuffle the dataset with randomly generated permutation matrixes.”  Hou, Page 876 bottom left paragraph, discloses:  “Given a shuffled dataset Xi = {xi1, xi2, ..., xiT}, the first 2tw items are given the labels that the first tw points regarded as normal are labeled as class 0, the second points are labeled as class 1 and selected to learn a classifier f with one-class SVM method.”  Here, Hou discloses applying a model (“one-class SVM”) to the data record P (“shuffled dataset”).  Hou, Page 876 Section C, discloses:  “The strategy of the one-class SVM method [25], [26] is to map the dataset in to a feature space, then surround most of the positive points with a minimal hypersphere. The points out the “ball” are regarded as negative.”  Here, Hou discloses that an output of the model represents a probability that P belongs to a distribution of a true data set, as it determines whether the result lies inside or outside the “hypersphere”.  Hou, Page 877 just before Section IV, discloses: “The testing samples with the highest abnormal scores are distinguished as outliers.”)
and upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, output an indication that the plurality of data records contains at least one anomalous data record. (Hou, Page 877 just before Section IV, discloses: “The testing samples with the highest abnormal scores are distinguished as outliers.”  Here, “abnormal score” is a non-successful output.)
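The reading of Hou above turns on whether a point lies inside or outside an enclosing hypersphere. The sketch below is a deliberately simplified stand-in and not Hou's method: it replaces the kernel-based one-class SVM with a ball centered on the mean of the normal points whose radius is the largest training distance, omits Hou's permutation-matrix shuffling, and treats the distance beyond the radius as a hypothetical "abnormal score".

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Ball = Tuple[List[float], float]


def fit_ball(normal: Sequence[Point]) -> Ball:
    """Center = mean of the normal points; radius = largest training distance."""
    dim = len(normal[0])
    center = [sum(p[i] for p in normal) / len(normal) for i in range(dim)]
    radius = max(math.dist(p, center) for p in normal)
    return center, radius


def abnormal_scores(ball: Ball, points: Sequence[Point]) -> List[float]:
    """Distance beyond the ball's surface: negative inside, positive outside."""
    center, radius = ball
    return [math.dist(p, center) - radius for p in points]


def top_outliers(ball: Ball, points: Sequence[Point], k: int = 1) -> List[int]:
    """Indices of the k points with the highest abnormal scores."""
    scores = abnormal_scores(ball, points)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:k]


ball = fit_ball([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
print(top_outliers(ball, [(0.5, 0.5), (5.0, 5.0), (0.9, 0.2)]))  # [1]
```

Only the point at (5.0, 5.0) falls outside the ball, so it receives the highest score and is the one "distinguished as an outlier" in this toy setting.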
Achin and Hou are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the shuffling-based feature importance of Achin with the shuffling-based anomaly detection of Hou.  Hou’s one-class SVM classifier would perform what Achin suggests in [0193]:  “Predictive modeling system 100 may include automated procedures for outlier detection and replacement, missing value imputation, and the detection and treatment of other data anomalies, requiring less skill and effort by the user,” and [0055]:  “a binary determination (e.g., “similar” or “not similar”)”. One of ordinary skill in the art would be motivated to do so in order to effectively identify anomalous data despite variability in the dataset (Hou, Page 879, Conclusion:  “In this paper, we aim at to solve the anomalies detection in multi-state system which has several special characteristics: the states of the systems cannot be ensure, the data of the systems have obvious step changes, and the transition of the system are random. In order to reduce the effects of the random step changes of the datasets, we randomly shuffle the given testing dataset for several times.”)

As per Claim 8, the combination of Achin and Hou teaches the apparatus of Claim 2.  Achin teaches an indication that a corresponding predictor variable value associated with the predictor variable is an [anomalous] important value associated with the data record P.  (Achin, Para [0299], discloses:  “Using the above-described technique for calculating the importance of one feature for one model and/or modeling technique, the engine can iterate over features to determine the relative importance of features within a model and/or modeling technique, iterate over models and/or modeling techniques to determine the relative importance of a feature across models and/or modeling techniques, or both.”  Here, Achin discloses that the most important or influential features can be identified.)
However, Achin does not teach wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: transmit instructions for rendering an indication that a corresponding predictor variable value associated with the predictor variable is an anomalous value associated with the data record P.
Hou teaches wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: transmit instructions for rendering an indication that [a corresponding predictor variable value associated with the predictor] variable is an anomalous value associated with the data record P.  (Hou, Page 877 just above Section IV, discloses:  “The testing samples with the highest abnormal scores are distinguished as outliers”.  Here, Hou teaches transmitting instructions for rendering an indication, as the computer code instructions are an indication that the samples with the highest abnormal scores are outliers, and they are transmitted to the processor.  Here, Hou discloses an indication that a data record is an anomalous data record.  In combination with Achin, who discloses identifying the most important feature, the combination results in an indication that a predictor variable is an anomalous value associated with the data record.  Examiner notes that the entire “data record” may be considered anomalous, including all of its constituent variables.)
It would have been obvious to combine the teachings of Achin and Hou for at least the reasons recited in Claim 1.

As per Claim 9, the combination of Achin and Hou teaches the apparatus of Claim 1.  Achin suggests wherein the subsequent models are sequentially applied to the data record P in an order according to their respective rankings.  (Achin, Para [0299], discloses:  “Equivalently, the exploration engine 110 may select the fraction of the modeling procedures having the highest suitability ranks (e.g., in cases where the suitability scores for the modeling procedures are not available, but the ordering (ranking) of the modeling procedures' suitabilities is available). The fraction may be provided by a user or determined by exploration engine 110. In some embodiments, exploration engine 110 may adjust the fraction to increase or decrease the number of modeling procedures selected for execution, depending on the amount of processing resources available for execution of the modeling procedures.”  Here, Achin discloses choosing the highest ranked models to be applied to the data.  Achin’s “suitability” is a measure of predictiveness, as stated in [0078]:  “The “suitability” of a predictive modeling procedure for a prediction problem may include data indicative of the expected performance on the prediction problem of predictive models generated using the predictive modeling procedure. In some embodiments, a predictive model's expected performance on a prediction problem includes one or more expected scores (e.g., expected values of one or more objective functions) and/or one or more expected ranks (e.g., relative to other predictive models generated using other predictive modeling techniques)”, and thus the top of Achin’s ranking is the most predictive model.)

Claims 11 and 18-19 are computer program product claims corresponding to apparatus claims 1 and 8-9, respectively, and are rejected for the same reasons.

Claims 21 and 28-29 are method claims corresponding to apparatus claims 1 and 8-9, respectively, and are rejected for the same reasons.

Claims 2, 12, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Achin and Hou, further in view of McMahon et al. (US 2016/0048766 A1; hereinafter “McMahon”).
As per Claim 2, the combination of Achin and Hou teaches the apparatus of Claim 1 as well as an anomalous value (see the rejection of Claim 1).  Achin teaches wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: 
upon determining that the first model cannot provide a successful output for any data record P of the plurality of data records, sequentially apply subsequent models of a plurality of subsequent models to the data record P until a successful output is obtained; (Achin, Para [0114], discloses:  “Two or more models may be blended by combining the outputs of the constituent models. In some embodiments, the blended model may comprise a weighted, linear combination of the outputs of the constituent models. A blended predictive model may perform better than the constituent predictive models, particularly in cases where different constituent models are complementary.”  Here, Achin discloses that two models may be “complementary”, meaning that they may not perform well individually but, when blended, perform “better” than the constituent models.  Therefore, the output of the first model is not “successful”, the remaining models of a plurality (“two or more”) of models are applied, and in combination they “perform better” and thus produce a “successful output”.)
identify a corresponding predictor variable value associated with the predictor variable as an anomalous value associated with the data record P.  (As discussed above, Hou discloses an anomalous value in the dataset, identified based on shuffling features in the dataset.  Achin suggests that shuffling features in a dataset can identify a corresponding predictor variable value associated with the predictor variable, as Achin states that the shuffling can identify “feature importance” in [0299]:  “The engine may then rescore the model on the dataset with the shuffled feature values, producing a new value for the accuracy metric. (Optionally before rescoring the fitted model on the shuffled dataset, the engine may refit the model to the shuffled dataset.) The decrease in the accuracy of the model indicates how much predictive value was lost and thus the feature's importance to the model and/or within the scope of the modeling technique.”)
However, Achin does not explicitly teach determine a predictor variable of the plurality of predictor variables that had been excluded from a final subsequent model from which the successful output was obtained.
McMahon teaches determine a predictor variable of the plurality of predictor variables that had been excluded from a final subsequent model from which the successful output was obtained (McMahon, Para [0047], discloses:  “Testing and cross-validation 310 may include testing each model on the dataset by utilizing a test set of data points held out or omitted from the training dataset to determine accuracy, discarding models with insufficient predictive power, and determining overall weighting of the models within each dataset. In the initial training of ensemble model 308, a set of features may be removed from a given sub-dataset, thereby removing a subset of data bearing features, and additional models trained using the remaining features. Training additional models of the ensemble against these subsets of the total feature set allows for a broader set of models to be created and evaluated.”  Here, McMahon discloses excluding predictor variables from the final model (“a set of features may be removed from a given sub-dataset, thereby removing a subset of data bearing features, and additional models trained using the remaining features”).)
McMahon and the combination of Achin and Hou are analogous art because they are both in the field of endeavor of machine learning.  Moreover, the combination of Achin and Hou itself suggests removing data from model classifiers; for example, Achin states in [0301]:  “In addition to assisting the user in understanding the prediction problem, the engine 110 may use the results from this application of feature importance to guide additional automated analysis. For example, for features that score highly in importance across the board, the engine may allocate more resources to exploring interactions of those features. For features that score poorly in importance across the board, the engine may drop these features from the dataset entirely.”
	It would have been obvious before the effective filing date of the claimed invention to combine the feature importance and plurality of models of Achin and Hou with the removal of features from the plurality of models of McMahon.  One of ordinary skill in the art would be motivated to do so in order to gain insight into how predictions change based on the removal of various important features, and to make more informed decisions on high-stakes data such as insurance data (McMahon [0047]:  “Training additional models of the ensemble against these subsets of the total feature set allows for a broader set of models to be created and evaluated. According to another embodiment, random subsets of a feature set can be eliminated and iterative feature addition may be repeated to obtain a diverse set of models”, and [0048]:  “The prediction engine may provide a classifier comprising the ensemble of models to aid in underwriting life insurance applications including the monitoring of underwriting decisions quality and the updating of the classifier over time. The classifier is operable to estimate or predict outcomes related to insurance claim frauds, medical issues, investment risk, accident likeliness, etc.”).

Claim 12 is a computer program product claim corresponding to apparatus claim 2 and is rejected for the same reasons.

Claim 22 is a method claim corresponding to apparatus claim 2 and is rejected for the same reasons.


Claims 7, 17, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Achin, Hou, and McMahon, further in view of Schierz et al. (US 2021/0103580 A1; hereinafter “Schierz”).
	As per Claim 7, the combination of Achin, Hou, and McMahon teaches the apparatus of Claim 2.  However, the combination of Achin, Hou, and McMahon does not teach wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: generate an anomaly free data records set comprising a subset of the plurality of data records by removing each data record associated with an anomalous value from the plurality of data records; and transmit the anomaly free data records set to a downstream data consumer.
	Schierz teaches wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: generate an anomaly free data records set comprising a subset of the plurality of data records by removing each data record associated with an anomalous value from the plurality of data records; and transmit the anomaly free data records set to a downstream data consumer. (Schierz, Paras [0009]-[0013], discloses that its method for anomaly detection is superior because it does not suffer from the problem of existing methods, which are not sufficient for removing anomalous data:  “As yet another example, many current anomaly detection processes rely upon supervised machine learning, which can be unsuitable for anomaly detection in a variety of use cases such as, for example… Use cases in which training data quality is low, requiring the removal of outliers before building predictive models”.  Examiner notes that this same rationale is recited on Page 1 of the related Provisional Application 62779172 filed on 2018-12-13, and thus has support as prior art.  In the final application, Schierz also confirms this in [0060]:  “Responsive to the correlation being less than a threshold correlation, the method further comprises removing the set of anomalous training data samples from the plurality of training data samples for training the supervised machine learning model.”  Thus, Schierz discloses removing anomalies from the data set, thereby generating an anomaly free data set.  This data set is transmitted to a downstream consumer (“supervised machine learning model”), which consumes the anomaly free data set for “training”.)
	Schierz and the combination of Achin, Hou, and McMahon are analogous art because they are both in the field of endeavor of machine learning and anomaly detection.
	It would have been obvious before the effective filing date of the claimed invention to combine the anomaly detection of Achin, Hou, and McMahon with the anomaly removal of Schierz.  One of ordinary skill in the art would be motivated to do so in order to maintain the accuracy of a machine learning model by not skewing the training with anomalous data (Schierz, Para [0013]:  “Use cases in which training data quality is low, requiring the removal of outliers before building predictive models”).

Claim 17 is a computer program product claim corresponding to apparatus claim 7 and is rejected for the same reasons.

Claim 27 is a method claim corresponding to apparatus claim 7 and is rejected for the same reasons.

Claims 10, 20, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Achin and Hou, further in view of Schierz et al. (US 2021/0103580 A1; hereinafter “Schierz”).
	As per Claim 10, the combination of Achin and Hou teaches the apparatus of Claim 1.  However, the combination of Achin and Hou does not teach wherein the input data set comprises insurance claims data.
	Schierz teaches wherein the input data set comprises insurance claims data. (Schierz, Para [0007] discloses:  “For example, predictive models can be used to identify anomalous insurance claims for further review to determine whether the anomalous claims are fraudulent and should be denied.”  Examiner notes that evaluating insurance claims for fraud is also recited in related Provisional Application 62779172 filed on 2018-12-13.)
	Schierz and the combination of Achin and Hou are analogous art because they are both in the field of endeavor of machine learning and anomaly detection.
	It would have been obvious before the effective filing date of the claimed invention to combine the anomaly detection of Achin and Hou with the insurance claim data of Schierz.  One of ordinary skill in the art would be motivated to do so in order to save the insurance company money by avoiding paying out fraudulent claims (Schierz, Para [0007]:  “For example, predictive models can be used to identify anomalous insurance claims for further review to determine whether the anomalous claims are fraudulent and should be denied”).

Claim 20 is a computer program product claim corresponding to apparatus claim 10 and is rejected for the same reasons.

Claim 30 is a method claim corresponding to apparatus claim 10 and is rejected for the same reasons.

Prior Art Not Applied
Claims 3-6, 13-16, and 23-26 are not rejected over any known prior art.  Shuffling the columns of a dataset in order to identify outliers is taught by Achin and Hou, as shown above.  As shown below in the Conclusion, this is also taught by Nourian et al. (US 2021/0049503 A1), Moriyama et al. (US 2019/0205234 A1), Merrill et al. (US 2019/0043070 A1), Balamurali et al. (“Detection of Outliers in Geochemical Data Using Ensembles of Subsets of Variables”), Elghazel et al. (“Unsupervised feature selection with ensemble learning”), Hasan et al. (“Feature Selection for Intrusion Detection Using Random Forest”), and Huang et al. (“A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest”).
McMahon, as shown above, discloses using a plurality of classifiers, where the classifiers consist of different subsets of features.
As shown below in the Conclusion, Vatamanu et al. (US 2016/0335432 A1) discloses a model cascade where a plurality of models are iterated through until a successful classification is achieved.
As shown below in the Conclusion, Baikalov et al. (US 2020/0097858 A1) discloses ranking features by their importance and then removing features from the model based on their importance.
However, Examiner has not found any combination of prior art that renders obvious the complete set of limitations set forth in independent Claims 3, 13, and 23.  Therefore, the claims are not rejected over prior art.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yoon et al. (US 2020/0019852 A1) discloses a system of anomaly detection using a plurality of anomaly detection sub-models.
Nourian et al. (US 2021/0049503 A1) discloses in [0074]:  “For example, permutation importance method may be used to randomly permute a feature to determine how the model performs in the presence of perturbed data. This approach may be implemented based on brute-force, and thereby may involve a substantial level of resources for complex models. However, such approach may perform better than model-dependent counterparts for specific methods, such as Gini index in Random Forest.”
Zawadzki et al. (US 2021/0035235 A1) discloses in [0083]:  “The machine learning training module 118 can generate the perturbed historical data 130 by changing a very small number of data values in the historical professional data 123…This can help the machine learning training process to better identify anomalies.”
Yuan et al. (US 2020/0334492 A1) discloses, in subsection “Feature Importance”, permuting input in [0066]:  “At the next step, a subgroup is selected and is adjusted 105 relative to the other subgroups in the input. The adjustment might involve the blanking or deletion of the subgroup from the input, the application of a weighting to the subgroup (e.g. increasing or decreasing information values within the subgroup), or the permutation of values within the subgroup”, and identifying anomalies in [0073]:  “For instance, if a classifier is producing classifications that appear to be erroneous (or at least anomalous), identifying the influential components that caused these erroneous classifications can help a user to assess whether the data is indeed erroneous”.
Moriyama et al. (US 2019/0205234 A1) discloses, in [0071], applying permutation importance to random forest:  “Next, the values of the i-th variables are permutated in the oob data. Then, each oob data after the permutation is applied to the corresponding tree and the ratio of correct classifications is calculated. Moreover, the difference in the ratio of correct classifications of the oob data before and after the permutation is calculated for each tree. The permutation importance of the i-th variable is defined as the mean of difference in the ratio for all the trees in the forest”
Merrill et al. (US 2019/0043070 A1) discloses in [0088]:  “for each variable v in S, determining an importance of the variable v by comparing the model accuracy metric AO with a model accuracy metric AV generated by shuffling values of v in S”, and in [0190]:  “In other embodiments, the modeling engine is constructed of multiple ensembles of models”
Baikalov et al. (US 2020/0097858 A1) discloses in [0016]:  “Then the features are ranked by their importance (significance) to the predicted results for a target observation of interest, and the features with low importance are removed from the model. The feature ranking indicates the relative importance of each feature to the prediction that was made, and helps explain the reasons for that prediction.”
Vatamanu et al. (US 2016/0335432 A1) discloses iterating through models in a cascade until success is achieved in [0074]:  “A sequence of steps 374-394 is repeated in a loop until a successful classification of the target object is achieved, each instance of the loop corresponding to a consecutive level of the cascade.”
Balamurali et al. (“Detection of Outliers in Geochemical Data Using Ensembles of Subsets of Variables”) discloses on Pages 371-372 Section 2.2, “At each round, the Tree Bagger algorithm is applied to a dataset containing N variables and M observations. The Tree Bagger algorithm has multiple training iterations. Within each training iteration, the value of a single variable is randomly swapped with its value at another observation while keeping the rest of the dataset unchanged. The difference in model error when permuting the values of a specific variable is the measure of the importance of that variable.”
Elghazel et al. (“Unsupervised feature selection with ensemble learning”) discloses in Page 162 Section 3.2: “These subsets, called out-of-bag (oob for short), can be used to give unbiased measures of feature importance. RCE estimates the relevance of features entering the clustering in the following way. After each clustering is constructed, one at a time, each feature f = 1, 2,...,M is shuffled (randomly permuted) in the oob examples and the oob data are re-assigned into clusters. At the end of the run, the oob cluster assignments for x with the f th variable noised up is compared with the original cluster assignment of x. Intuitively, irrelevant features will not change the classification of x when altered in this way. The relative difference in classification between the original and shuffled data sets is therefore related to the relevance of the shuffled feature.”
Hasan et al. (“Feature Selection for Intrusion Detection Using Random Forest”) discloses in Page 134 Para 2:  “That leaves a set of out of bag (OOB) samples, which can be used to measure the forest’s classification accuracy. In order to measure a specific feature’s importance in the tree, randomly shuffle the values of this feature in the OOB samples and compare the classification accuracy between the intact OOB samples and the OOB samples with the particular feature permutated.”
Huang et al. (“A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest”) discloses on Page 5 Section 3.2.1:  “When calculating the importance value of feature Fj based on the ith tree, OOBErrori is first calculated based on Equation (3). Then, the values of feature Fj in the OOB dataset are randomly rearranged and those of the other features are unchanged, thereby forming a new OOB dataset OOBi.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126