Detailed Action

AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims
Claims 1-20 are pending and rejected in the application. 

Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 5, 7, 8, 10, 11, 12, 14, 15, 17, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Dirac et al. U.S. Patent Publication (2015/0379425; hereinafter: Dirac) in view of Tran Non-Patent Publication (“Improving Random Forest Algorithm through Automatic Programming”, May 15, 2015; hereinafter: Tran, in IDS dated August 18, 2021) and further in view of Raghavrv Non-Patent Publication (“[RFC] Missing values in RandomForest #5870”; November 2015, hereinafter: Raghavrv).

Claims 1, 8, and 15
As to claims 1, 8, and 15, Dirac discloses a system comprising: 
a processor configured to execute instructions (paragraph[0169], “In the illustrated embodiment, computing device 9000 includes one or more processors 9010 coupled to a system memory…etc.”); 
a computer-readable medium containing instructions for execution on the processor, the instructions, when executed, causing the processor to perform steps of (paragraph[0174], “In some embodiments, system memory 9020 may be one embodiment of a computer-accessible medium configured to store program instructions and data as described above for FIG. 1 through FIG. 32 for implementing embodiments of the corresponding methods and apparatus…etc.”):
accessing an external data entry comprising a set of external values, each external value associated with a corresponding attribute from a set of attributes relating to fraudulent behavior in a web application (figure 2, paragraph[0168], “A logically centralized repository of machine learning objects corresponding to numerous types of entities (such as models, data sources, or recipes) may enable multiple users or collaborators to share and re-use feature-processing recipes on a variety of data sets…The MLS may be used for, and may incorporate techniques optimized for, a variety of problem domains covering both supervised and unsupervised learning, such as, fraud detection…etc.”); 
applying the classifier to the transformed external data entry to generate a classification label estimate configured to indicate whether a user is engaging in fraudulent behavior (paragraph[0094], “In some embodiments, quantitative measures of model predictive effectiveness such as the area under receiver operating characteristic (ROC) curves for various classifiers may also be collected. In one embodiment, some of the information regarding quality may be deduced or observed implicitly by the MLS instead of being obtained via explicit client feedback, e.g., by keeping track of the set of parameters that are changed during training iterations before a model is finally used for a test data set…etc.”); and 
storing the classification label estimate in association with the external data entry in a data store (paragraph[0292], “The method as recited in any of clauses 19-21, wherein the plurality of parameter value options comprise one or more of: (a) respective lengths of n-grams to be derived from a language processing data set, (b) respective quantile bin boundaries for a particular variable, (c) image processing parameter values, (d) a number of clusters into which a data set is to be classified…etc.”).

Dirac does not appear to explicitly disclose accessing a classifier trained using entries of a training database, wherein each entry is associated with a classification label from a set of two or more classification labels; 
wherein each entry comprises a set of transformed values, each transformed value being associated with a corresponding transformed attribute from a set of transformed attributes; 
wherein each of the transformed values of a given entry was generated from a transformation and interpolation applied to values associated with an attribute from the set of attributes of that given entry to decluster the values, wherein a majority of the values are clustered by being concentrated within a sub-range of a range of the values, the sub-range range being smaller than the range and constituting a percentage of the range, and wherein the transformed values are declustered by being distributed across a new range, a majority of the transformed values not being within a sub-range of the new range that constitutes the percentage of the new range; 
applying the transformation to an external value associated with the attribute in the external data entry to generate a transformed external data entry; 

However, Tran discloses accessing a classifier trained using entries of a training database (page 27, “Random forest is one of the most well-known ensemble algorithms that uses decision tree as base classifier…etc.”), wherein each entry is associated with a classification label from a set of two or more classification labels (page 5, “The class label in the leaf node indicates the class to which the instance should belong…etc.”, page 29, “On the other hand, if the mth variable is categorical it is replaced…etc.”, the reference describes the decision tree having classification labels which is used as training data.); 
wherein each entry comprises a set of transformed values, each transformed value being associated with a corresponding transformed attribute from a set of transformed attributes (pages 28-29, “The proximity matrix from the random forests is used to update the imputations of the missing values. For numerical variable, the imputed value is the weighted average of the nonmissing cases, where the weights are the proximities. For categorical variable, the imputed value is the category with the largest proximity…etc.”, the reference describes transforming missing data into random forest training data.); 
wherein each of the transformed values of a given entry was generated from a transformation and interpolation applied to values associated with an attribute from the set of attributes of that given entry to decluster the values (pages 28-31, “Proximity matrix can be used to replace missing values for training and test set. It can also be employed to detect outliers. The following sections will illustrate how missing values are replaced and outliers are detected using the proximity matrix…etc.”),
applying the transformation to an external value associated with the attribute in the external data entry to generate a transformed external data entry (page 13, pages 28-29, “Proximity matrix can be used to replace missing values for training and test set…etc.”, the examiner interprets the test set as external data). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Dirac with the teachings of Tran to transform missing data, which would result in the claimed invention. The skilled artisan would have been motivated to combine the teachings of Dirac with the teachings of Tran to improve the steps by which classifiers are generated for Random Forest models (Tran: Abstract).

The combination of Dirac and Tran does not appear to explicitly disclose wherein a majority of the values are clustered by being concentrated within a sub-range of a range of the values, the sub-range range being smaller than the range and constituting a percentage of the range, and wherein the transformed values are declustered by being distributed across a new range, a majority of the transformed values not being within a sub-range of the new range that constitutes the percentage of the new range.


However, Raghavrv discloses wherein a majority of the values are clustered by being concentrated within a sub-range of a range of the values (Raghavrv: “(-1-1) Add an optional imputation variable, where we can either Specify the strategy 'mean', 'median', 'most_frequent' (or missing_value?) and let the clf construct the Imputer on the fly...etc.”), the sub-range range being smaller than the range and constituting a percentage of the range, and wherein the transformed values are declustered by being distributed across a new range, a majority of the transformed values not being within a sub-range of the new range that constitutes the percentage of the new range (“(+1+1) Find the best split by sending the missing-valued samples either side and choosing the direction that brings about a maximum reduce in the entropy (impurity)…etc.”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Dirac with the teachings of Tran and Raghavrv to organize missing data, which would result in the claimed invention. The skilled artisan would have been motivated to combine the teachings of Dirac with the teachings of Tran and Raghavrv to organize missing data before fitting the Random Forest model (Raghavrv).
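
For illustration only, the clustering and declustering limitation recited above may be sketched as follows. The sketch is not drawn from Dirac, Tran, or Raghavrv. The sample values, the base-10 logarithm used as the transformation, the 10% sub-range, and the helper name fraction_in_subrange are all assumptions chosen to make the limitation concrete.

```python
import math


def fraction_in_subrange(values, lo_frac, hi_frac):
    """Return the share of values lying in the sub-range that spans
    [lo_frac, hi_frac] of the full range (min to max) of the values."""
    lo, hi = min(values), max(values)
    span = hi - lo
    lower = lo + lo_frac * span
    upper = lo + hi_frac * span
    return sum(lower <= v <= upper for v in values) / len(values)


# Hypothetical attribute values: most of them sit near the low end of the range.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 100, 1000]

# Assumed transformation (log10); any monotonic spreading function could be used.
transformed = [math.log10(v) for v in raw]

# Share of values inside the lowest 10% of the range, before and after.
clustered_share = fraction_in_subrange(raw, 0.0, 0.10)
declustered_share = fraction_in_subrange(transformed, 0.0, 0.10)
```

Under these assumptions, a majority of the raw values falls within the lowest 10% of the original range, while only a minority of the transformed values falls within the lowest 10% of the new range.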



Claims 3, 10, and 17
As to claims 3, 10, and 17, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1, 8, and 15, as noted above, and Tran further discloses wherein units of the transformed values associated with the transformed attribute are different from units of the values associated with the attribute (page 29, “To be specific, if the m’th variable of case n is missing and it is numeric, it is replaced with the median of all values of this variable in the same class, say j, with case n. On the other hand, if the mth variable is categorical, it is replaced with the most frequent non-missing value in class j…etc.”).

Claims 4, 11, and 18
As to claims 4, 11, and 18, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1, 8, and 15, as noted above, and Tran further discloses wherein at least one of the entries in the training database includes an interpolated value associated with a transformed attribute, wherein the interpolated value is determined based on an interpolation function associated with the interpolation applied to a subset of transformed values associated with the transformed attribute (pages 28-29, “The proximity matrix from the random forests is used to update the imputations of the missing values. For numerical variable, the imputed value is the weighted average of the non-missing cases, where the weights are the proximities. For categorical variable, the imputed value is the category with the largest proximity…etc.”, the reference describes replacing the missing (i.e., interpolated) value with a value determined by the system.).


Claims 5, 12, and 19
As to claims 5, 12, and 19, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1, 8, and 15, as noted above, and Tran further discloses wherein an interpolation function of the interpolation is a median, mode, or weighted average of the subset of transformed values (pages 28-29, “the imputed value is the weighted average…etc.”).
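
For illustration only, the three interpolation functions named in these claims (median, mode, and weighted average) may be sketched as follows. This is not code from Tran. The observed values, the proximity weights, and the helper name weighted_average are hypothetical, with the weights standing in for the proximities described in the quoted passage.

```python
from statistics import median, mode


def weighted_average(values, weights):
    """Weighted average of the non-missing values; the weights play the
    role of proximities (assumed here, not computed from a forest)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical non-missing values of one attribute and their proximities
# to the entry whose value is missing.
observed = [2.0, 4.0, 4.0, 10.0]
proximities = [0.5, 0.25, 0.25, 0.0]

fill_median = median(observed)                             # middle of the sorted values
fill_mode = mode(observed)                                 # most frequent value
fill_weighted = weighted_average(observed, proximities)    # proximity-weighted mean
```

With these assumed inputs, the median and the mode both give 4.0, while the proximity-weighted average gives 3.0 because the closest entry carries half of the total weight.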

Claims 7 and 14
As to claims 7 and 14, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1 and 8, as noted above, and Dirac further discloses wherein the set of external values are numerical or categorical (paragraph[0081], “variables may include instances of any of a variety of data types, such as, for example text, a numeric data type (e.g., real or integer), Boolean, a binary data type, a categorical data type…etc.”).

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Dirac et al. U.S. Patent Publication (2015/0379425; hereinafter: Dirac) in view of Tran Non-Patent Publication (“Improving Random Forest Algorithm through Automatic Programming”, May 15, 2015; hereinafter: Tran) and further in view of Raghavrv Non-Patent Publication (“[RFC] Missing values in RandomForest #5870”; November 2015, hereinafter: Raghavrv) and further in view of Bach et al. Non-Patent Publication (“Beyond Independent Components: Trees and Cluster”, 2003; hereinafter: Bach, in IDS dated August 18, 2021).

Claims 2, 9, and 16
As to claims 2, 9, and 16, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1, 8, and 15, as noted above, but does not appear to explicitly disclose wherein the transformation is invertible.

However, Bach discloses wherein the transformation is invertible (“Since the KL divergence is invariant under an invertible transformation...etc.”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Dirac with the teachings of Tran, Raghavrv, and Bach to create an invertible transformation, which would result in the claimed invention. The skilled artisan would have been motivated to combine the teachings of Dirac with the teachings of Tran, Raghavrv, and Bach to efficiently create an invertible transformation.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dirac et al. U.S. Patent Publication (2015/0379425; hereinafter: Dirac) in view of Tran Non-Patent Publication (“Improving Random Forest Algorithm through Automatic Programming”, May 15, 2015; hereinafter: Tran) and further in view of Raghavrv Non-Patent Publication (“[RFC] Missing values in RandomForest #5870”; November 2015, hereinafter: Raghavrv) and further in view of Cánovas-García et al. Non-Patent Publication (“Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery”, April 2015; hereinafter: Cánovas-García, in IDS dated August 18, 2021).

Claims 6, 13, and 20
As to claims 6, 13, and 20, the combination of Dirac, Tran, and Raghavrv discloses all the elements in claims 1, 8, and 15, as noted above, but does not appear to explicitly disclose wherein a distance metric between the entry comprising the interpolated value and each entry associated with the subset of transformed values is below a predetermined threshold.

However, Cánovas-García discloses wherein a distance metric between the entry comprising the interpolated value and each entry associated with the subset of transformed values is below a predetermined threshold (Section 2.5.2. Weighted k-Nearest Neighbors, “When used for classification, k-nearest neighbors [53] estimates the class for every new observation using the k-closest observations, according to a distance metric, from the training set. Class probabilities for the new observation are estimated as the proportion of training set neighbors in each class…etc.”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Dirac with the teachings of Tran, Raghavrv, and Cánovas-García to determine a distance metric between entity neighbors, which would result in the claimed invention. The skilled artisan would have been motivated to combine the teachings of Dirac with the teachings of Tran, Raghavrv, and Cánovas-García to efficiently determine a distance metric between entity neighbors.
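
For illustration only, the distance-threshold limitation addressed above may be sketched as follows. This is not code from Cánovas-García. The Euclidean metric, the threshold value, the sample entries, and the helper name entries_within_threshold are assumptions; the sketch simply keeps those entries whose distance to a target entry is below a predetermined threshold.

```python
import math


def entries_within_threshold(target, entries, threshold):
    """Return the entries whose Euclidean distance to the target entry
    is strictly below the predetermined threshold."""
    return [e for e in entries if math.dist(target, e) < threshold]


# Hypothetical entries, each described by two known (non-missing) attributes.
target_entry = (0.0, 0.0)
candidate_entries = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]

# Assumed threshold; only entries closer than this would contribute to
# the interpolated value.
selected = entries_within_threshold(target_entry, candidate_entries, 2.5)
```

Under these assumptions, the first two candidate entries (at distances 1 and 2) are selected, and the third (at distance 5) is excluded.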
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAWAUNE A CONYERS whose telephone number is (571) 270-3552.  The examiner can normally be reached on M-F 8:00am-4:30pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on (571) 270-0474.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAWAUNE A CONYERS/Primary Examiner, Art Unit 2152  
October 6, 2021