DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 USC 101 as being drawn to non-statutory subject matter. 
The independent claims are rejected under 35 U.S.C. 101 because the claimed invention is drawn to an abstract idea without significantly more. Independent claim 1 and the other independent claims are drawn to (a) select a feature set that includes multiple features from of the plurality of datasets; (b) train a model using data values of the multiple features of the feature set; (c) calculate metagradient data for each of the features in the feature set based on the trained model; (d) change, by taking into account the calculated metagradient data, which features, from among the features of the plurality of datasets, are included in the feature set; and repeat at least (b)-(d) until a convergence criteria is satisfied for the feature set.
The claims fall within the “Mental Processes” grouping of abstract ideas. Specifically, the limitations of (a) select a feature set that includes multiple features from of the plurality of datasets; (b) train a model using data values of the multiple features of the feature set; (c) calculate metagradient data for each of the features in the feature set based on the trained model; (d) change, by taking into account the calculated metagradient data, which features, from among the features of the plurality of datasets, are included in the feature set; and repeat at least (b)-(d) until a convergence criteria is satisfied for the feature set, discussed above, as claimed, recite a process that covers performance of the limitations in the mind, or with pen and paper, but for the recitation of generic computer components (e.g., computer, storage, computer readable medium). A user can mentally, or with pen and paper, select a feature set that includes multiple features from of the plurality of datasets (e.g., verbally or on paper), train a model using data values of the multiple features of the feature set (e.g., train their brain over time), calculate metagradient data for each of the features in the feature set based on the trained model (e.g., calculate on paper metagradient data based on the trained brain), change, by taking into account the calculated metagradient data, which features, from among the features of the plurality of datasets, are included in the feature set (e.g., change features on paper), and repeat at least (b)-(d) until a convergence criteria is satisfied for the feature set (e.g., mentally repeat steps over time and make notes on paper of changes in order to train the brain).
For example, the claim can be accomplished mentally or with the aid of pen and paper by a user hearing or seeing a request for information from another person and making a mental note of the request and determining from the person that they wish to cancel the request and stop the request mentally or with pen and paper.    
This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements (e.g., computer, database storage, computer readable medium) that are recited at a high level of generality (e.g., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(d)(II). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component are not significantly more than the judicial exception.
The dependent claims depend on a rejected parent claim and do not cure its deficiencies. Similar to the above discussion, each of the dependent claims is drawn to an abstract idea within the “Mental Processes” grouping of abstract ideas. The claims are drawn to subject matter that covers performance of the claimed limitations in the mind, or with pen and paper, but for the recitation of generic computer components as discussed above. The judicial exception is not integrated into a practical application. The claims only recite additional elements that are recited at a high level of generality (e.g., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. For example, claims 2-4 are drawn to determining and adjusting probability and changing features within a hierarchical graph. Claims 6-8 are drawn to selecting features, assigning probabilities, and adding changes to a feature set. The other dependent claims contain similar issues. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception using a generic computer component. Therefore, the claims are not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 5-13, 15, 19 and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Phillipps et al (US 2014/0372346 A1).

Regarding claim 1, Phillipps discloses a computer system (systems and computer program products; system 100 for data intelligence; Fig. 1 & para [0031], [0035]) comprising: a non-transitory storage medium configured to store a plurality of datasets that each include a plurality of features (a data storage device, a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; data repository 406, stores initialization data; data set including one or more features of the initialization data; Figs. 1, 4 & para [0035], [0134], [0145]); a processing system that includes at least one hardware processor (a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; para [0035]), the processing system configured to (a processor and memory, or another entity in communication with a data intelligence module; para [0035]): (a) select a feature set that includes multiple features from of the plurality of datasets (a feature selector module selects subsets of features; training data subset may include a subset of features of initialization data; para [0106], [0112]-[0115]); (b) train a model using data values of the multiple features of the feature set (function generator module 301 may generate learned functions from multiple different machine learning classes, models; a learned function to actual values from initialization data; data set including one or more features of the initialization data; para [0100], [0133], [0135]); (c) calculate metagradient data for each of the features in the feature set based on the trained model (based on the evaluation metadata, the machine learning compiler module may use learned functions (trained model) that utilize the selected features to build the machine learning ensemble; para [0112], [0113], [0116], [0117], [0136]); (d) change, by taking into account the calculated metagradient data, which features, from among the features of the plurality of datasets, are included in the feature set (the associated features which are most accurate or effective based on the evaluation metadata for the different machine learning ensembles; the machine learning compiler module may generate tens, hundreds, thousands, millions, or more different machine learning ensembles so that the feature selector module 304 may select an optimal set of features (e.g. the most accurate, most effective); para [0112], [0116]-[0119], [0136]); and repeat at least (b)-(d) until a convergence criteria is satisfied for the feature set (repeat generating, evaluating learned functions while iterating through permutations of feature sets; identify a set of relationships, distances, and/or confidences that satisfy a simplicity threshold; para [0077], [0117]).
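For illustration only, the control flow recited in limitations (a)-(d) of claim 1 can be sketched as below. The sketch is not taken from the application or from Phillipps. The function names (run_selection_loop, toy_train, toy_metagradient, toy_change), the USEFULNESS table, and the convergence test (the feature set stops changing) are assumptions, and the per-feature score merely stands in for metagradient data, which this action does not define.

```python
from typing import Callable, Dict, FrozenSet, List

def run_selection_loop(
    all_features: List[str],
    initial: FrozenSet[str],
    train: Callable,
    metagradient: Callable,
    change: Callable,
    max_rounds: int = 50,
) -> FrozenSet[str]:
    """Repeat train -> score -> change until the feature set stops changing."""
    feature_set = initial                                    # limitation (a)
    for _ in range(max_rounds):
        model = train(feature_set)                           # limitation (b)
        grads = metagradient(model, feature_set)             # limitation (c)
        new_set = change(feature_set, grads, all_features)   # limitation (d)
        if new_set == feature_set:   # assumed convergence criterion
            break
        feature_set = new_set
    return feature_set

# Hypothetical stand-ins so the loop can be executed end to end.
USEFULNESS = {"f1": 0.9, "f2": 0.1, "f3": 0.8, "f4": 0.2, "f5": 0.7}
dropped = set()  # features already swapped out, never re-added

def toy_train(fs: FrozenSet[str]) -> Dict[str, float]:
    # The "model" is just a lookup of the assumed usefulness values.
    return {f: USEFULNESS[f] for f in fs}

def toy_metagradient(model: Dict[str, float], fs: FrozenSet[str]) -> Dict[str, float]:
    # Stand-in score per feature; NOT a real metagradient computation.
    return dict(model)

def toy_change(fs, grads, all_features):
    worst = min(sorted(fs), key=lambda f: grads[f])
    if grads[worst] >= 0.5:
        return fs  # every feature scores well enough; leave the set alone
    candidates = [f for f in all_features if f not in fs and f not in dropped]
    if not candidates:
        return fs
    dropped.add(worst)
    return (fs - {worst}) | {candidates[0]}

result = run_selection_loop(
    list(USEFULNESS), frozenset({"f1", "f2"}),
    toy_train, toy_metagradient, toy_change,
)
```

With these stand-ins the loop swaps the low-scoring f2 for f3 in the first pass and stops on the second pass because the set no longer changes.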

Regarding claim 5, Phillipps discloses the computer system of claim 1. In addition, Phillipps discloses wherein change of the features that are included in the feature set is further based on applying at least one of a penalty function or at least one constraint function (integrity constraints; para [0073], [0123]).

Regarding claim 6, Phillipps discloses the computer system of claim 1. In addition, Phillipps discloses wherein change of the features that are included in the feature set is further based on selection of datasets from among the plurality of datasets (feature selector module evaluates the effectiveness of various features, based on evaluation metadata from the metadata library; generates a plurality of learned functions for various combinations of features, and the machine learning compiler module 302 may evaluate the learned functions and generate evaluation metadata; based on the evaluation metadata, the feature selector module 304 may select a subset of features that are most accurate or effective; para [0112]) and then selection of features from within the selected datasets for the selected feature set (the feature selector module may select an optimal set of features; para [0112]).

Regarding claim 7, Phillipps discloses the computer system of claim 6. In addition, Phillipps discloses wherein each of the plurality of datasets is assigned a selection probability (heterogeneous data sets based on probabilistic relationships derived from machine learning; para [0073]) and selection of the datasets is further based on the selection probability (the feature selector module may select the associated features which are most accurate or effective; the feature selector module to determine one or more features, instances of features that correlate with higher confidence metrics; para [0112], [0113], [0117], [0145]).

Regarding claim 8, Phillipps discloses the computer system of claim 6. In addition, Phillipps discloses wherein each of the plurality of features within each of the plurality of datasets is assigned a selection probability (the feature selector module may select the associated features which are most accurate or effective; the feature selector module to determine one or more features, instances of features that correlate with higher confidence metrics; para [0112], [0113], [0117], [0145]) and the change in which features are included in the feature set is further based on the selection probability (the associated features which are most accurate or effective based on the evaluation metadata for the different machine learning ensembles; the machine learning compiler module may generate tens, hundreds, thousands, millions, or more different machine learning ensembles so that the feature selector module 304 may select an optimal set of features (e.g. the most accurate, most effective); para [0112], [0116]-[0119], [0136]).

Regarding claim 9, Phillipps discloses the computer system of claim 1. In addition, Phillipps discloses wherein the processing system is further configured to (a processor and memory, or another entity in communication with a data intelligence module; para [0035]): calculate, prior to (a), a relevancy value for each one of the plurality of datasets that indicates the relevancy of a corresponding dataset to a target signal (identify relationships between data elements in the unstructured data set; calculate the maximum and minimum values in a column/row, the average column length, and the number of distinct values in a column; these statistics can assist the unsupervised learning module to identify the likelihood that two or more columns/row are related; two data sets that have a maximum value of 10 and 10,000, respectively, may be less likely to be related than two data sets that have identical maximum values; generates learned functions pseudo-randomly, without prior knowledge regarding the suitability of the generated learned functions for the associated training data; para [0085], [0086], [0101]).
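For illustration only, the column statistics quoted from Phillipps above (maximum and minimum values, average column length, number of distinct values) can be computed as sketched below. The function names column_stats and max_similarity and the ratio-based similarity score are assumptions made for this sketch; the cited passages do not specify how the statistics are combined into a likelihood that two columns are related.

```python
def column_stats(values):
    """Summary statistics for one non-empty numeric column."""
    lengths = [len(str(v)) for v in values]
    return {
        "min": min(values),
        "max": max(values),
        "avg_len": sum(lengths) / len(lengths),  # average printed length
        "distinct": len(set(values)),
    }

def max_similarity(col_a, col_b):
    """Assumed score in [0, 1]: 1.0 when the maxima match, near 0 when they differ widely."""
    ma = abs(column_stats(col_a)["max"])
    mb = abs(column_stats(col_b)["max"])
    if ma == mb:
        return 1.0
    return min(ma, mb) / max(ma, mb)

stats = column_stats([1, 5, 10, 10])
same_max = max_similarity([1, 5, 10], [3, 10])      # identical maxima
far_apart = max_similarity([1, 10], [5, 10000])     # maxima of 10 and 10,000
```

Under this assumed score, two columns with identical maxima rate 1.0, while columns with maxima of 10 and 10,000 rate far lower, which mirrors the example quoted from para [0085]-[0086].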

Regarding claim 10, Phillipps discloses the computer system of claim 9. In addition, Phillipps discloses wherein selection of the feature set in (a) is further based on the relevancy value that is associated with each of the plurality of datasets (the unsupervised learning module may establish a confidence value, a confidence metric between features of data sets; the confidence metric of a predicted result by attempting different combinations of features and subsets of instances within an individual feature’s dataset; para [0074], [0090], [0118]-[0120]).

Regarding claim 11, Phillipps discloses the computer system of claim 1. In addition, Phillipps discloses wherein the model is trained with a machine learning process that includes an expectation learner and a policy learner (supervised learning methods to generate one or more machine learning ensembles; machine learning ensembles may be used to provide machine learning results, such as a confidence metric, an inferred function (expectation), a prediction, a rule (policy), a recommendation, or other results; predictive analytics is the study of past performance accomplished using a variety of techniques including statistics, modeling, machine learning; para [0040], [0044]).

Regarding claim 12, Phillipps discloses a method performed on a computer system that includes at least one hardware processor (a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; para [0035]), the method (the method; Abstract) comprising: storing, to a data storage device that is coupled to the computer system, a plurality of datasets that each include a plurality of features (a data storage device, a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; data repository 406, stores initialization data; data set including one or more features of the initialization data; Figs. 1, 4 & para [0035], [0131], [0145]); selecting a feature set from among the plurality of features of the plurality of datasets (a feature selector module selects subsets of features; training data subset may include a subset of features of initialization data; para [0106], [0112]-[0115]); and performing a process that loops (a)-(d) until a convergence criteria (e) is satisfied for the feature set (repeat generating, evaluating learned functions while iterating through permutations of feature sets; iteratively increasing the number of features used to generate machine learning ensembles until an increase in effectiveness or usefulness of the results of the generated machine learning ensembles fails to satisfy a feature effectiveness threshold (convergence criteria); para [0114], [0115], [0117]), the process including: (a) training a model using data values of those features that are included in the feature set (function generator module 301 may generate learned functions from multiple different machine learning classes, models; a learned function to actual values from initialization data; data set including one or more features of the initialization data; para [0100], [0133], [0135]), (b) calculating metagradient data for the feature set with respect to the trained model (based on the evaluation metadata, the machine learning compiler module may use learned functions (trained model) that utilize the selected features to build the machine learning ensemble; para [0112], [0113], [0116], [0117], [0136]), (c) adjusting selection probability of at least one of the features of the plurality of features based on the calculated metagradient data (based on the evaluation metadata, the feature selector module may select a subset of features that are most accurate or effective; establish a confidence value, a confidence metric that a certain field belongs to a feature; a confidence metric may include a percentage or another indicator of accuracy, effectiveness, and/or confidence; para [0112], [0113], [0138]), (d) replacing at least one of the features in the feature set by taking into account the selection probability that was adjusted according to the metagradient data (selecting a subset of features that complement each other to provide best prediction accuracy; based on the evaluation metadata, the feature selector module may select a subset of features that are most accurate or effective; para [0112], [0113]), and (e) determining whether the convergence criteria is satisfied for the feature set in (d) (repeat generating, evaluating learned functions while iterating through permutations of feature sets; iteratively increasing the number of features used to generate machine learning ensembles until an increase in effectiveness or usefulness of the results of the generated machine learning ensembles fails to satisfy a feature effectiveness threshold (convergence criteria); identify a set of relationships, distances, and/or confidences that satisfy a simplicity threshold; para [0077], [0114], [0115], [0117]).
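For illustration only, steps (c) and (d) of claim 12 (adjusting a selection probability and then replacing a feature according to the adjusted probability) can be sketched as below. Neither the application nor Phillipps is the source of this sketch. The function names adjust_probabilities and replace_one, the multiplicative update rule, the sign convention (a positive value means including the feature raised the loss), and the deterministic swap are all assumptions.

```python
import math

def adjust_probabilities(probs, metagrads, step=1.0):
    """Step (c), assumed rule: scale each probability by exp(-step * g), then renormalize."""
    raw = {f: p * math.exp(-step * metagrads.get(f, 0.0)) for f, p in probs.items()}
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()}

def replace_one(feature_set, probs):
    """Step (d), assumed rule: swap the least probable member for the most probable outsider."""
    outside = [f for f in probs if f not in feature_set]
    if not outside or not feature_set:
        return set(feature_set)
    weakest = min(sorted(feature_set), key=lambda f: probs[f])
    strongest = max(sorted(outside), key=lambda f: probs[f])
    if probs[strongest] <= probs[weakest]:
        return set(feature_set)  # no outsider is more probable; keep the set
    return (set(feature_set) - {weakest}) | {strongest}

# Hypothetical values: feature "a" helped (negative), feature "b" hurt (positive).
start = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
updated = adjust_probabilities(start, {"a": -1.0, "b": 2.0})
new_set = replace_one({"a", "b"}, updated)
```

A sampling-based replacement (for example, drawing the incoming feature with random.choices weighted by probability) would be an equally plausible reading; the deterministic swap is used here only so the outcome is reproducible.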

Regarding claim 13, Phillipps discloses the method of claim 12. In addition, Phillipps discloses storing a selection probability value for each of the plurality of features (store a confidence metric representing a likelihood (probability) that a different confidence value that the field belongs in a feature; stores initialization data; initialization data indexed by feature, by instance, by training data subset, by test data subset; para [0075], [0145]), wherein application of the calculated metagradient data adjusts the selection probability value of the at least one feature (based on the evaluation metadata, the feature selector module may select a subset of features that are most accurate or effective; establish a confidence value, a confidence metric that a certain field belongs to a feature; a confidence metric may include a percentage or another indicator of accuracy, effectiveness, and/or confidence; para [0112], [0113], [0138]).

Regarding claim 15, Phillipps discloses the method of claim 12. In addition, Phillipps discloses wherein change of the features that are included in the feature set is further based on applying at least one of a penalty function or at least one constraint function (integrity constraints; para [0073], [0123]).

Regarding claim 19, Phillipps discloses a non-transitory computer readable storage medium storing computer executable instructions for use with a computer system that includes at least one hardware processor (a data storage device, a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; data repository 406, stores initialization data; data set including one or more features of the initialization data; Figs. 1, 4 & para [0035], [0131], [0145]), the computer system coupled to a storage device storing a plurality of datasets that each include a plurality of features (a data storage device, a hardware computing device with a processor and memory, or another entity in communication with a data intelligence module; data repository 406, stores initialization data; data set including one or more features of the initialization data; Figs. 1, 4 & para [0035], [0131], [0145]), the computer executable instructions comprising instructions that cause the computer system to (computer program instructions may also be stored in a computer readable storage medium that can direct a computer to function in a particular manner, such that the instructions stored in the computer readable storage medium produce an article of manufacture including instructions which implement the function/act; para [0029]): (a) select a feature set that includes multiple features from of the plurality of datasets (a feature selector module selects subsets of features; training data subset may include a subset of features of initialization data; para [0106], [0112]-[0115]); (b) train a model using data values of the multiple features of the feature set (function generator module 301 may generate learned functions from multiple different machine learning classes, models; a learned function to actual values from initialization data; data set including one or more features of the initialization data; para [0100], [0133], [0135]); (c) calculate metagradient data based on the trained model (based on the evaluation metadata, the machine learning compiler module may use learned functions (trained model) that utilize the selected features to build the machine learning ensemble; para [0112], [0113], [0116], [0117], [0136]); (d) change, by taking into account the calculated metagradient data, which features, from among the features of the plurality of datasets, are included in the feature set (the associated features which are most accurate or effective based on the evaluation metadata for the different machine learning ensembles; the machine learning compiler module may generate tens, hundreds, thousands, millions, or more different machine learning ensembles so that the feature selector module 304 may select an optimal set of features (e.g. the most accurate, most effective); para [0112], [0116]-[0119], [0136]); and repeat at least (b)-(d) until a convergence criteria is satisfied for the feature set (repeat generating, evaluating learned functions while iterating through permutations of feature sets; iteratively increasing the number of features used to generate machine learning ensembles until an increase in effectiveness or usefulness of the results of the generated machine learning ensembles fails to satisfy a feature effectiveness threshold (convergence criteria); identify a set of relationships, distances, and/or confidences that satisfy a simplicity threshold; para [0077], [0114], [0115], [0117]).
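For illustration only, the stopping rule quoted from Phillipps above (iteratively increasing the number of features until the increase in effectiveness fails to satisfy a feature effectiveness threshold) can be sketched as below. The function name grow_until_plateau, the GAINS table, the additive scoring function, and the threshold value are assumptions and do not appear in the reference.

```python
def grow_until_plateau(ranked_features, evaluate, min_gain):
    """Add features one at a time; stop when the gain drops below min_gain."""
    chosen = []
    best = evaluate(chosen)
    for feature in ranked_features:
        score = evaluate(chosen + [feature])
        if score - best < min_gain:   # increase fails the assumed threshold
            break
        chosen.append(feature)
        best = score
    return chosen, best

# Hypothetical per-feature contributions to an effectiveness score.
GAINS = {"f1": 0.30, "f2": 0.20, "f3": 0.02, "f4": 0.10}

def toy_evaluate(features):
    # Stand-in for generating and evaluating an ensemble on these features.
    return sum(GAINS[f] for f in features)

chosen, best = grow_until_plateau(["f1", "f2", "f3", "f4"], toy_evaluate, min_gain=0.05)
```

In this toy run, f1 and f2 are kept and the loop stops at f3, whose gain of 0.02 falls below the assumed threshold of 0.05; f4 is never evaluated.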

Regarding claim 20, Phillipps discloses the non-transitory computer readable storage medium of claim 19. In addition, Phillipps discloses wherein change of the features that are included in the feature set is further based on applying at least one of a penalty function or at least one constraint function (integrity constraints; para [0073], [0123]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 2-4, 14, and 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Phillipps et al (US 2014/0372346 A1) in view of Guyon et al (US 2008/0087938 A1).

Regarding claim 2, Phillipps discloses the computer system of claim 1. However, Phillipps fails to disclose wherein the plurality of datasets and the plurality of features are stored in a hierarchical graph data structure and each one of the plurality of features has a probability value of being selected for the selected feature set.
Guyon discloses wherein the plurality of datasets and the plurality of features are stored in a hierarchical graph data structure (processing heterogeneous data sets in a hierarchical structure; a single graph a structure of features previously obtained, including ranked (hierarchical) lists of subsets of features, ranked lists of features, or trees of features; para [0069] & claim 27) and each one of the plurality of features has a probability value of being selected for the selected feature set (conversion of scores into a quantity that can be interpreted as a probability or a degree of belief that a given feature or feature subset is "good"; selecting a subset of features that complement each other to provide best prediction accuracy; para [0102]-[0107]).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to include wherein the plurality of datasets and the plurality of features are stored in a hierarchical graph data structure and each one of the plurality of features has a probability value of being selected for the selected feature set as taught by Guyon into the system of Phillipps for the purpose of providing a data analysis engine that includes a pre-processing function for feature selection, for reducing the amount of data to be processed by selecting the optimum number of attributes, or "features", relevant to the information to be discovered.

Regarding claim 3, Phillipps in view of Guyon discloses the computer system of claim 2. In addition, Phillipps discloses wherein the processing system is further configured to (a processor and memory, or another entity in communication with a data intelligence module; para [0035]): adjust the probability value for at least one feature of the plurality of features based on the calculated metagradient data (based on the evaluation metadata, the feature selector module may select a subset of features that are most accurate or effective; establish a confidence value, a confidence metric that a certain field belongs to a feature; a confidence metric may include a percentage or another indicator of accuracy, effectiveness, and/or confidence; para [0112], [0113], [0138]).

Regarding claim 4, Phillipps in view of Guyon discloses the computer system of claim 2. Phillipps fails to disclose wherein the hierarchical graph data structure includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets, wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset, wherein changing which features are included is performed by using the hierarchical graph data structure.
Guyon discloses wherein the hierarchical graph data structure includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets (processing heterogeneous data sets in a hierarchical structure; FIG. 14 illustrates the gene tree (observation graph); the children of the root node represent alternate choices for the first feature; the children of the children of the root are alternate choices for the second features of the data sets; Figure 14 & para [0069], [0112]-[0114], [0278]), wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset (the children of the children of the root are alternate choices for the second features of the data sets; Figure 14 & para [0112]-[0114]), wherein changing which features are included is performed by using the hierarchical graph data structure (adjusting kernel parameters; processing heterogeneous data sets in a hierarchical structure; FIG. 14 illustrates the gene tree (observation graph); para [0054], [0069], [0278]).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to include wherein the hierarchical graph data structure includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets, wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset, wherein changing which features are included is performed by using the hierarchical graph data structure as taught by Guyon into the system of Phillipps for the purpose of providing a data analysis engine that includes a pre-processing function for feature selection, for reducing the amount of data to be processed by selecting the optimum number of attributes, or “features”, relevant to the information to be discovered.
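For illustration only, the hierarchical graph data structure recited in claim 4 (a root node, first child nodes for datasets, second child nodes for features) can be sketched as below, together with a two-stage selection that picks a dataset and then a feature within it. The class name Node, the functions build_tree and sample_feature, the uniform starting probabilities, and the dataset and feature names are assumptions; the sketch is not drawn from the application, Phillipps, or Guyon.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    prob: float = 1.0                       # assumed selection probability
    children: list = field(default_factory=list)

def build_tree(datasets):
    """Root -> one child per dataset -> one grandchild per feature (non-empty inputs assumed)."""
    root = Node("root")
    for ds_name, features in datasets.items():
        ds_node = Node(ds_name, prob=1.0 / len(datasets))
        ds_node.children = [Node(f, prob=1.0 / len(features)) for f in features]
        root.children.append(ds_node)
    return root

def sample_feature(root, rng):
    """Pick a dataset by its probability, then a feature within that dataset."""
    ds = rng.choices(root.children, weights=[c.prob for c in root.children])[0]
    ft = rng.choices(ds.children, weights=[c.prob for c in ds.children])[0]
    return ds.name, ft.name

# Hypothetical datasets and features.
tree = build_tree({"prices": ["open", "close"], "weather": ["temp"]})
picked = sample_feature(tree, random.Random(0))
```

Changing which features are included would then amount to editing the prob values stored on the nodes and sampling again, which is one plausible reading of performing the change "by using the hierarchical graph data structure".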

Regarding claim 14, Phillipps discloses the method of claim 12. Phillipps fails to disclose wherein the plurality of datasets and plurality of features are stored in a hierarchical graph data structure includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets, wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset, wherein the hierarchical graph data structure includes at least one value that is used for selection probability, wherein the at least one value is adjusted based on the calculated metagradient data. 
Guyon discloses wherein the plurality of datasets and plurality of features are stored in a hierarchical graph data structure includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets (processing heterogeneous data sets in a hierarchical structure; FIG. 14 illustrates the gene tree (observation graph); the children of the root node represent alternate choices for the first feature; the children of the children of the root are alternate choices for the second features of the data sets; Figure 14 & para [0069], [0112]-[0114], [0278]), wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset (the children of the children of the root are alternate choices for the second features of the data sets; Figure 14 & para [0112]-[0114]), wherein the hierarchical graph data structure includes at least one value that is used for selection probability (adjusting kernel parameters; processing heterogeneous data sets in a hierarchical structure; FIG. 14 illustrates the gene tree (observation graph); selection scheme where f.sub.b can be added once f.sub.a has been selected with the probability P(f.sub.af.sub.b|f.sub.a) of making a good choice; para [0054], [0069], [0108], [0278]), wherein the at least one value is adjusted based on the calculated metagradient data (conversion of scores into a quantity that can be interpreted as a probability or a degree of belief that a given feature or feature subset is "good"; para [0102]-[0107]).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to include wherein the plurality of datasets and plurality of features are stored in a hierarchical graph data structure that includes a root node with a plurality of first child nodes that each correspond to one of the plurality of datasets, wherein each of the plurality of first child nodes has a plurality of second child nodes that each correspond to one of the plurality of features of a corresponding dataset, wherein the hierarchical graph data structure includes at least one value that is used for selection probability, wherein the at least one value is adjusted based on the calculated metagradient data as taught by Guyon into the system of Phillipps for the purpose of providing a data analysis engine that includes a pre-processing function for feature selection, for reducing the amount of data to be processed by selecting the optimum number of attributes, or “features”, relevant to the information to be discovered.
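
For illustration only, the hierarchical arrangement recited in claim 14 (root node, dataset child nodes, feature grandchild nodes, and a per-node value used for selection probability) can be sketched as follows. This is a minimal sketch and not the applicant's or Guyon's implementation: the class name FeatureTree, the additive update rule, the step size, and the sign convention for the metagradient are all assumptions, since neither the claim language nor the cited paragraphs fix them.

```python
class FeatureTree:
    """Implicit root -> dataset nodes -> feature nodes, each feature carrying a weight."""

    def __init__(self, datasets):
        # datasets: dict mapping a dataset name to its list of feature names.
        # Every feature starts with the same (hypothetical) selection weight.
        self.weights = {
            ds: {feat: 1.0 for feat in feats} for ds, feats in datasets.items()
        }

    def adjust(self, dataset, feature, metagradient, step=0.1, floor=1e-6):
        # Assumed update rule: add a scaled metagradient to the weight and
        # clamp it to a small positive floor so it stays a valid weight.
        w = self.weights[dataset][feature] + step * metagradient
        self.weights[dataset][feature] = max(w, floor)

    def selection_probabilities(self, dataset):
        # Normalize the weights of one dataset's features into probabilities.
        feats = self.weights[dataset]
        total = sum(feats.values())
        return {f: w / total for f, w in feats.items()}


tree = FeatureTree({"sales": ["price", "units"], "weather": ["temp"]})
tree.adjust("sales", "price", metagradient=10.0)
probs = tree.selection_probabilities("sales")
```

Under these assumptions, a positive metagradient raises the chance that the feature is picked on the next iteration, which corresponds to the claimed value being "adjusted based on the calculated metagradient data".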

Regarding claim 16, Phillipps discloses the method of claim 12. Phillipps fails to disclose wherein replacement of features in the feature set is further based on selection of datasets from among the plurality of datasets and then selection of features from within the selected datasets for the selected feature set. 
Guyon discloses wherein replacement of features in the feature set is further based on selection of datasets from among the plurality of datasets and then selection of features from within the selected datasets for the selected feature set (selected kernels may be replaced or modified; after the kernel selection is adjusted, the pre-processed training data is input into the SVM for training purposes; the live data is pre-processed in the same manner as was the training data and the test data; ranked lists of subsets of features can be represented, with the features identifiers being replaced by the identifiers of all features of the subset; para [0054], [0055], [0271], [0275]).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to include wherein replacement of features in the feature set is further based on selection of datasets from among the plurality of datasets and then selection of features from within the selected datasets for the selected feature set as taught by Guyon into the system of Phillipps for the purpose of providing a ranked list of features or subsets of features to select a subset of features that complement each other to provide best prediction accuracy.

Regarding claim 17, Phillipps discloses the method of claim 12. Phillipps fails to disclose wherein replacement of the at least one of the features is performed without replacement.
Guyon discloses wherein replacement of the at least one of the features is performed without replacement (selection of the kernel may be adjusted and the support vector machine may be retrained and retested; once it is determined that the optimal solution has been identified, a live data set may be collected and pre-processed in the same manner as was the training data set to select the features that best represent the data; selecting one or more new kernels or adjusting kernel parameters; in the case where multiple SVMs were trained and tested simultaneously, selected kernels may be replaced or modified (without replacement); para [0023], [0054]). 
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to include wherein replacement of the at least one of the features is performed without replacement as taught by Guyon into the system of Phillipps for the purpose of allowing kernels to be re-used for control purposes.
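
For illustration only, one reading of the claim 17 limitation is that features swapped into the feature set are drawn by sampling without replacement, so that no feature appears twice in the set. The sketch below follows that reading. The function name replace_features, its parameters, and the choice to exclude every current feature from the candidate pool are assumptions and are not taken from the claims, Phillipps, or Guyon.

```python
import random


def replace_features(feature_set, candidate_pool, n_replace, rng=None):
    """Swap out n_replace features, drawing the new ones without replacement."""
    rng = rng or random.Random()
    current = list(feature_set)
    # Choose which positions to drop; random.sample never repeats an index.
    drop = set(rng.sample(range(len(current)), n_replace))
    kept = [f for i, f in enumerate(current) if i not in drop]
    # Candidates exclude everything currently in the set, so the result
    # cannot contain duplicates and a dropped feature is not re-added.
    available = [f for f in candidate_pool if f not in current]
    added = rng.sample(available, n_replace)
    return kept + added


new_set = replace_features(
    ["a", "b", "c"], ["a", "b", "c", "d", "e", "f"], 2, random.Random(0)
)
```

random.sample raises ValueError when fewer candidates remain than are requested, which makes an exhausted pool visible rather than silently reusing a feature.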

Regarding claim 18, Phillipps discloses the method of claim 12. In addition, Phillipps discloses wherein the model is trained with a machine learning process that includes an expectation learner and a policy learner (supervised learning methods to generate one or more machine learning ensembles; machine learning ensembles may be used to provide machine learning results, such as a confidence metric, an inferred function, a prediction (expectation), a rule (policy), a recommendation, or other results; predictive analytics is the study of past performance accomplished using a variety of techniques including statistics, modeling, machine learning; para [0040], [0041]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JARED M BIBBEE whose telephone number is (571)270-1054. The examiner can normally be reached Monday-Thursday 8AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, APU MOFIZ, can be reached on (571)272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JARED M BIBBEE/          Primary Examiner, Art Unit 2161