Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The following is a Non-Final Office Action. Claims 1-20 are rejected below.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Specifically, Claims 1-20 are directed to an abstract idea without additional elements amounting to significantly more than the abstract idea.
Step 1 of the Alice/Mayo analysis is directed to determining whether or not the claims fall within a statutory class.  Based on a facial reading of the claim elements, Claims 1-20 fall within a statutory class of process, machine, manufacture, or composition of matter.  
With respect to Step 2A Prong One of the framework, the claims recite an abstract idea. Claims 1, 11, and 16 include limitations reciting functionality for an algorithm for predicting missing values, including:
Pre-processing a set of structured data comprising applying a plurality of cleaning policies…
Filtering the set of pre-processed features using correlation-based filtering…
Performing feature subset selection comprising applying one or more algorithms to the set of filtered features…wherein a subset of the set of filtered features is selected…
which is an abstract idea reasonably categorized as mental processes, as each of the steps can be performed in the human mind (including an observation, evaluation, judgment, or opinion).
Similarly, Claims 2-10, 12-15, and 17-20 further recite operations that can be practically performed in the human mind and descriptive data that further narrows the abstract idea.  
With respect to Step 2A Prong Two, the claims do not include additional elements that integrate the abstract idea into a practical application. Claims 1, 11, and 16 include various elements that are not directed to the abstract idea under Step 2A Prong One of the framework. These additional elements include computing devices, a processor, memory, instructions, computer-readable storage media, and supervised machine learning. When considered in view of the claim as a whole, Examiner submits that these additional elements do not integrate the abstract idea into a practical application because they are generic computing elements performing generic computing functions and amount to mere instructions to apply the abstract idea on a computer under MPEP 2106.05(f). The "supervised machine learning" generally links the use of the abstract idea to a particular technological environment or field of use under MPEP 2106.05(h). The elements for receiving and outputting the data amount to insignificant extra-solution data gathering and data presenting activity under MPEP 2106.05(g).
As a result, Claims 1, 11, and 16 do not include additional elements that would integrate the abstract idea into a practical application under Step 2A Prong Two.
Similarly, Claims 2-10, 12-15, and 17-20 do not include any additional elements beyond those recited with respect to claim 1. As a result, Claims 2-10, 12-15, and 17-20 do not include additional elements that would integrate the abstract idea into a practical application under Step 2A Prong Two for the same reasons as stated above with respect to claim 1.
With respect to Step 2B of the framework, the claims do not include additional elements amounting to significantly more than the abstract idea. As noted above, Claims 1, 11, and 16 include various elements that are not directed to the abstract idea under Step 2A Prong One of the framework. These additional elements include computing devices, a processor, memory, instructions, computer-readable storage media, and supervised machine learning. Examiner submits that these additional elements do not amount to significantly more than the abstract idea because they are generic computing elements performing generic computing functions and amount to mere instructions to apply the abstract idea on a computer under MPEP 2106.05(f), and/or recite generic computer structure that performs generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. The "supervised machine learning" generally links the use of the abstract idea to a particular technological environment or field of use under MPEP 2106.05(h), and is a well-understood, routine, and conventional computer function in view of Spec, 0034, which describes this additional element in a manner indicating that it is sufficiently well known in the art. The outputting is a well-understood, routine, and conventional computer function in view of Spec, 0071.
The elements for receiving data amount to well-understood, routine, and conventional computer functions in view of MPEP 2106.05(d)(II).
Further, considering the additional elements as an ordered combination adds nothing that is not already present when looking at the additional elements individually. As a result, Claims 1, 11, and 16 do not include additional elements amounting to significantly more than the abstract idea under Step 2B.
As noted above, Claims 2-10, 12-15, and 17-20 do not include any additional elements beyond those recited with respect to claim 1. As a result, Claims 2-10, 12-15, and 17-20 do not include additional elements amounting to significantly more than the abstract idea under Step 2B for the same reasons as stated above with respect to Claims 1, 11, and 16.
Accordingly, Claims 1-20 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (US 2022/0351087) in view of Wu (US 2015/0378975).

Regarding Claim 1, Zheng discloses:  
A method, performed by one or more computing devices, for identifying features for predicting missing attribute values, (Abstract, Fig 1, 0028-0029 – method, device; processors, instructions) the method comprising: 
receiving a set of structured data comprising a plurality of features, which are identified by at least feature name, feature data type, and feature value, and one or more labels; (0030-0031, 0043, Table 1 – feature name (age, gender, etc.), feature data type characteristics (string, Boolean, integer, or float; numerical or non-numerical), feature value (e.g., Owen, 22, male), label (feature to be predicted; whether the car buyer purchased an Electric Vehicle: Yes or No))
filtering a set of pre-processed features using correlation-based filtering, wherein the correlation-based filtering applies one or more correlation estimation techniques to the set of pre-processed features to remove at least some highly correlated features and produce a set of filtered features; (0033 – pruning one or more correlated features (correlation determined based on the Pearson correlation coefficient (PCC)); Figure 4(404) – remaining features)
performing feature subset selection comprising applying one or more supervised machine learning algorithms to the set of filtered features to determine relative importance values among the set of filtered features in relation to the one or more labels, wherein a subset of the set of filtered features is selected based at least in part on the determined relative importance values; (0034, 0046, Figure 3B – filtering out features based on their importance to the performance of the machine learning model, as measured by their univariate receiver operating characteristic (ROC) area under the curve (AUC) score compared to a threshold; 0019 – ML algorithms used include logistic regression, decision trees, random forests, extreme gradient boosting, and neural networks; Figure 4(404) – remaining features)
and outputting the subset of the set of filtered features. (0035, 0047, Figure 4-outputting remaining features) 
Zheng does not explicitly disclose the following limitation, which Wu, in analogous art, discloses: pre-processing the set of structured data comprising applying a plurality of cleaning policies, wherein different cleaning policies, of the plurality of cleaning policies, are applied to different feature data types, and wherein the pre-processing produces a set of pre-processed features;
[0016] "Once a list of the possible attribute values for an item is extracted from similar items, the values may be cleaned and/or normalized. This part of the process may be used in order to put each of the possible values into a comparable state. In this part of the process, the attribute values may be converted into alternative unit types. For example, 12 inches, 18″, and 2 feet may be converted into 1 foot, 1.5 feet, and 2 feet respectively. Additionally, in some examples "feet" may be converted to "ft," "ft.," or any other measurement designation (e.g., "inches" or "in.") as desired." (includes textual values)
[0057] "Candidate values may also be chosen as maximum or minimum values. For example, in the text: "compatible with 16 GB, 32 GB, and 64 GB SD cards," the process 600 may choose "64 GB" as the appropriate value for the available external memory attribute" (includes numerical values)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate Wu's cleaning into Zheng's structured data, in order to place all values into a comparable state (Wu, 0016), and since the claimed invention is merely a combination of old elements, in which each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
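For illustration only, and without limiting the claim interpretation above, the type-specific pre-processing recited in claim 1 (mapped above to Wu, 0016 and 0057) can be sketched as a dispatch from feature data type to cleaning policy. This is a minimal Python sketch under stated assumptions: the policy functions, the `CLEANING_POLICIES` table, and the unit-conversion rules are hypothetical and are not drawn from the application, Zheng, or Wu.

```python
import re

# Hypothetical divisor table for normalizing length units to feet
# (mirrors the 12 inches / 18" / 2 feet example quoted from Wu, 0016).
_FEET_DIVISOR = {"in": 12, "inch": 12, "inches": 12, '"': 12, "\u2033": 12,
                 "ft": 1, "foot": 1, "feet": 1}


def clean_text(value):
    """Hypothetical textual policy: collapse whitespace and lower-case."""
    return " ".join(str(value).split()).lower()


def clean_numeric(value):
    """Hypothetical numerical policy: parse a number and normalize length units to feet.

    Returns None (treated as missing) when the value or its unit is not recognized.
    """
    match = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?)\s*(\S*)\s*", str(value))
    if match is None:
        return None
    number, unit = float(match.group(1)), match.group(2).lower().rstrip(".")
    if not unit:
        return number
    divisor = _FEET_DIVISOR.get(unit)
    return number / divisor if divisor else None


# One cleaning policy per feature data type.
CLEANING_POLICIES = {"text": clean_text, "numerical": clean_numeric}


def preprocess(records, feature_types):
    """Apply to each value the cleaning policy registered for its feature's data type."""
    return [{name: CLEANING_POLICIES[feature_types[name]](value)
             for name, value in row.items()}
            for row in records]
```

Keying the policies by data type, rather than by feature name, is what makes different policies apply to different feature data types in this sketch.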

Regarding Claim 2, Zheng in view of Wu discloses: The method of claim 1, wherein one or more of the plurality of cleaning policies are data type specific cleaning policies, wherein a data type specific cleaning policy is specific to a particular feature data type. (See Claim 1 Above)
Regarding Claim 3, Zheng in view of Wu discloses: The method of claim 1, wherein a first cleaning policy, of the plurality of cleaning policies, is applied to features having a feature data type of textual, and a second cleaning policy, of the plurality of cleaning policies, is applied to features having a feature data type of numerical. (See Claim 1 Above)
Regarding Claim 4, Zheng discloses: The method of claim 1, wherein the correlation-based filtering comprises: retaining features that are relatively uncorrelated in the set of filtered features. (0033-special features (to be removed) based on correlation are only those that are “correlated” to another feature based on PCC of the pair exceeding a threshold)
Regarding Claim 5, Zheng discloses: The method of claim 1, wherein the correlation-based filtering comprises: calculating pairwise correlation measures between pairs of features of the set of pre-processed features; and based on the correlation measures, determining which pairs of features are highly correlated. (0033-special features (to be removed) based on correlation are those that are “correlated” to another feature based on PCC of the pair exceeding a threshold)
Regarding Claim 6, Zheng discloses: The method of claim 5, wherein determining which pairs of features are highly correlated comprises comparing the pairwise correlation measures to a threshold value. (0033-special features (to be removed) based on correlation are those that are “correlated” to another feature based on PCC of the pair exceeding a threshold)	
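As a non-authoritative illustration of the pairwise comparison mapped to Zheng, 0033 for Claims 5 and 6, the following sketch computes the PCC for every pair of features and flags the pairs whose absolute coefficient exceeds a threshold. The 0.9 default threshold, the treatment of constant features, and the feature names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / (sd_x * sd_y)


def highly_correlated_pairs(features, threshold=0.9):
    """Return every pair of feature names whose |PCC| exceeds the threshold."""
    return [(a, b) for a, b in combinations(sorted(features), 2)
            if abs(pearson(features[a], features[b])) > threshold]
```

For example, with `age` and `age_months` (an exact linear rescaling of each other) the pair is flagged, while a weakly related `income` feature is not.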
Regarding Claim 9, Zheng discloses: The method of claim 1, wherein performing feature subset selection further comprises: retaining features, of the set of filtered features, that have relative importance above a threshold value; and filtering out features, of the set of features, that have relative importance at or below the threshold value. (0037- The feature pruning engine 150 may generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. … In some implementations, the feature pruning engine 150 may classify a correlated feature as less important if its ROC AUC score is below a threshold score.; Figure 3A, 0045(bottom) – “Remaining Correlated” based on ROC AUC equal to or above threshold)
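For Claim 9, the claimed threshold comparison (retain above, filter at or below) can be sketched with a univariate ROC AUC score standing in for the relative importance value, loosely following Zheng's use of univariate ROC AUC scores (0034, 0045). This is an illustrative sketch only: the Mann-Whitney formulation, the max(auc, 1 - auc) orientation rule, and all names are assumptions. Note that Zheng retains a score equal to the threshold, whereas this sketch follows the claim language and filters it out.

```python
def univariate_auc(values, labels):
    """ROC AUC of one feature used directly as a score, in Mann-Whitney form:
    the fraction of (positive, negative) pairs ranked correctly, ties counting 1/2.
    Labels are assumed to be 0/1 with both classes present."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)  # assumption: orientation-agnostic score


def select_by_importance(features, labels, threshold):
    """Retain features scoring above the threshold; filter out those at or below it."""
    scores = {name: univariate_auc(vals, labels) for name, vals in features.items()}
    retained = [name for name, score in scores.items() if score > threshold]
    return retained, scores
```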
Regarding Claim 10, Zheng discloses: The method of claim 1, wherein the subset of the set of filtered features is usable to predict missing values of the one or more labels. (0053, Figure 7- the machine learning system 700 may be used for training an ML model 710 based on the ML algorithm 218 and the pruned input dataset 220 produced by the pre-processing system 100…. In one example, the ML model 710 may be trained to predict a potential car buyer's preference for electric cars. With reference for example to Table 1, the pruned input dataset 220 may include only a subset of the values of the original input dataset. For example, the pruned input dataset 220 may include only the feature sets associate with the "name," "age," "gender," "residence," and "EV" features).
Claims 11-20 stand rejected based on the same citations and rationale as applied to Claims 1-6 and 9-10 above. 

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (US 2022/0351087) in view of Wu (US 2015/0378975), and further in view of Jiang (US 2006/0161403).

Regarding Claim 7, Zheng discloses: The method of claim 5, further comprising: for one or more pairs of highly correlated features: determining which feature, of the pair of features, has a "ROC AUC score" below a threshold; filtering out the feature that has a "ROC AUC score" below a threshold; and retaining the other feature of the pair of features (0045(bottom) – features associated with ROC AUC scores equal to or above a threshold score may be classified as remaining correlated features 309. On the other hand, features associated with ROC AUC scores below the threshold score may be classified as less important correlated features 308).
However, Zheng does not explicitly disclose the following limitation, which Jiang, in analogous art, discloses: "for one or more pairs of highly correlated features: determining which feature, of the pair of features, has more populated values; filtering out the feature that has more populated values; and retaining the other feature of the pair of features" (0029, 0073 – when a categorical variable and a continuous variable are correlated, the categorical variable is dropped because categorical variables are expanded into multiple dummy variables (has more populated values), which require greater processing time and system resources when building the statistical model;
[0029] "…categorical variables contained in the data set are expanded into dummy variables and added to the design matrix along with continuous variables….if a categorical variable is highly correlated with any continuous one, the categorical variable is discarded. In this embodiment, the categorical variables are dropped rather than continuous variables because categorical variables are expanded into multiple dummy variables, which require greater processing time and system resources when building the statistical model."
0073(bottom) – dummy variables are described as "variables that take the value 1 (one) for a particular category of the variable and the value 0 (zero) for all other categories of the variable.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Jiang's filtering of the feature having more populated values to Zheng's filtering, in order to save processing time and system resources when building a statistical model (Jiang, 0029).
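To illustrate the Jiang mapping for Claim 7 (a categorical variable, which expands into several dummy variables, is dropped in favor of a correlated continuous variable; Jiang, 0029, 0073), the following sketch treats "more populated values" as a wider dummy-variable expansion. The one-dummy-per-category width rule, the tie-break, and the function names are hypothetical.

```python
def design_width(values, is_categorical):
    """Columns a feature would occupy in the design matrix.

    Assumption: a categorical feature expands into one 0/1 dummy column per
    distinct category; a continuous feature occupies a single column.
    """
    return len(set(values)) if is_categorical else 1


def drop_more_populated(pairs, features, categorical):
    """For each highly correlated pair, filter out the feature with more populated
    values (wider dummy expansion) and retain the other one."""
    dropped = set()
    for a, b in pairs:
        if a in dropped or b in dropped:
            continue  # pair already resolved by an earlier drop
        width_a = design_width(features[a], a in categorical)
        width_b = design_width(features[b], b in categorical)
        # Hypothetical tie-break: on equal widths, the second feature is dropped.
        dropped.add(a if width_a > width_b else b)
    return [name for name in features if name not in dropped]
```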
Regarding Claim 8, Zheng discloses: The method of claim 1 wherein the correlation-based filtering comprises: calculating pairwise correlation measures between pairs of features; and for each pairwise correlation measure above a threshold value, filtering out one of the pair of features associated with the pairwise correlation measure. (0037 – "Still further, in some aspects, the feature pruning engine 150 may generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. As described above, the importance of a feature may depend on its contribution to the performance of a machine learning model. A less important correlated feature is defined as any correlated feature that contributes very little (if at all) to the performance of a machine learning model. In some implementations, the feature pruning engine 150 may classify a correlated feature as less important if its ROC AUC score is below a threshold score. For example, the threshold may be defined or otherwise indicated by one or more special feature parameters stored in the database 120. In some other aspects, the feature pruning engine 150 may generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.")
Zheng does not explicitly state that the correlating and filtering are performed with respect to "each of a plurality of feature type groupings based on feature data type"; however, Jiang discloses this limitation (0065 – calculating different correlation measures for continuous-continuous variable pairs versus continuous-binary variable pairs (different feature type groups)):
[0065] "Univariate statistics pertain to a single variable. An exception is the sample correlation, which measures the degree of linear association between a pair of variables…. The correlation measure depends upon the underlying type of variables (i.e., it differs for a continuous-continuous pair and for a continuous-binary pair)."
[0082] "In one embodiment, if two variables exhibit a high pairwise correlation estimate, one of the two variables is dropped. The choice of which of the pair is dropped is governed by univariate correlation with the target."
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Jiang's feature type groupings to Zheng's correlating and filtering, in order to customize the correlation measure for different feature type groups (Jiang, 0065), and since the claimed invention is merely a combination of old elements, in which each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
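For Claim 8, the combination of per-type groupings (Jiang, 0065) with pairwise threshold filtering (Zheng, 0037; Jiang, 0082) might be sketched as below. The sketch assumes two hypothetical groupings: numerical features compared by Pearson r, and binary features compared by the phi coefficient (which equals Pearson r on 0/1-coded data). Cross-group pairs are not compared, and dropping the later-listed feature is a simplification, not Jiang's target-correlation rule.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson r; a constant sequence is treated as uncorrelated (assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)


def phi(xs, ys):
    """Phi coefficient: Pearson r on 0/1-coded binary values."""
    return pearson([float(bool(x)) for x in xs], [float(bool(y)) for y in ys])


# Hypothetical per-grouping correlation measures, keyed by feature data type.
MEASURES = {"numerical": pearson, "binary": phi}


def grouped_correlation_filter(features, feature_types, threshold=0.9):
    """Within each feature type grouping, compute pairwise correlations and, for each
    pair above the threshold, filter out one feature (here, the later-listed one)."""
    groups = defaultdict(list)
    for name in features:
        groups[feature_types[name]].append(name)
    dropped = set()
    for data_type, names in groups.items():
        measure = MEASURES[data_type]
        for a, b in combinations(names, 2):
            if a in dropped or b in dropped:
                continue  # pair already resolved by an earlier drop
            if abs(measure(features[a], features[b])) > threshold:
                dropped.add(b)
    return [name for name in features if name not in dropped]
```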
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
"Explanation Analysis for Records With Missing Values" (US 2022/0083519)
Figure 9, [0093] a relative importance for each candidate predictor 904 is generated, where such analysis is presented herein in the form of a bar chart 908 for illustrative purposes. As shown in the bar chart 908, a plurality of candidate predictors 910 (substantially similar to the candidate predictors 904/420) labeled “cholesterol,” “blood pressure,” “Na-to-K,” “gender,” and “age” as well as nondescript “A” through “E” are presented as a bar 912 (only one labeled) to indicate a relative importance with reference to the relative importance scale 914 in unitless values extending from 0.0 to +0.6 and 0.0 to −0.4.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Scott Ross whose telephone number is (571) 270-1555. The examiner can normally be reached on Monday-Friday 8:00 AM - 5:00 PM E.S.T.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rutao Wu, can be reached on (571) 272-6045.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Scott Ross/
Examiner - Art Unit 3623
/RUTAO WU/Supervisory Patent Examiner, Art Unit 3623