DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application is 06/03/2019.
This action is in response to the amendment and/or arguments filed on 11/22/2022. Claims 1-5, 7, 10-15, 17 and 20 have been amended, claims 8-9 and 18 have been cancelled, and claims 21-23 have been added. Claims 1-7, 10-17 and 19-23 are currently pending and have been examined.
In response to amendments and/or remarks filed on 11/22/2022, the claim objections made in the previous Office Action have been withdrawn.
In response to amendments and/or remarks filed on 11/22/2022, the 35 U.S.C. 112(f) claim interpretations made in the previous Office Action have been withdrawn.
In response to amendments and/or remarks filed on 11/22/2022, the 35 U.S.C. 112(b) rejections made in the previous Office Action have been withdrawn.


Response to Arguments
Applicant’s arguments with respect to claim(s) 11/29/2022 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7, 10-17 and 19-23 are rejected under 35 U.S.C. 103 as being unpatentable over US20200090075A1 to Achin et al. (hereinafter, Achin), in view of “Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning” to Raschka et al. (hereinafter, Raschka) and further in view of “Model selection for support vector machines via uniform design” to Huang et al. (hereinafter, Huang).
Regarding claim 1 (Currently Amended)
Achin teaches a system, comprising: a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory; wherein the computer executable components comprise (para [0025] “According to another aspect of the present disclosure, a predictive modeling apparatus is provided, including a memory configured to store processor-executable instructions; and a processor configured to execute the processor-executable instructions,”):
a selection component that selects a metric of performance evaluation accuracy; a configuration component that configures a performance evaluation scheme for a set of machine learning algorithms (Figure 3, Box 310. [0040] “A machine-executable template may include one or more predictive modeling algorithms… The algorithm(s), pre-processing steps, and/or post-processing steps may be parameterized. A machine-executable template may be applied to a user dataset to generate potential predictive modeling solutions for the prediction problem represented by the dataset.”), 
wherein the configuration component: generates a set of data points for performance evaluation schemes, (para [0132] “To facilitate cross-validation, predictive modeling system 100 may partition the dataset (or suggest a partitioning of the dataset) into K "folds”. Cross-validation comprises fitting a predictive model to the partitioned dataset K times, such that during each fitting, a different fold serves as the test set and the remaining folds serve as the training set. Cross - validation can generate useful information about how the accuracy of a predictive model varies with different training data. In steps 418 and 422, predictive modeling system may partition the dataset into K folds, where the number of folds K is a default parameter” also see para [0133] “To facilitate rigorous testing of the predictive models, predictive modeling system 100 may partition the data set (or suggest a partitioning of the dataset) into a training set and a “holdout” test set. In some embodiments, the training set is further partitioned into K folds for cross- validation. The training set may then be used to train and evaluate the predictive models, but the holdout test set may be reserved strictly for testing the predictive models.”)
wherein each data point comprises: a respective performance evaluation configuration parameter for a corresponding performance evaluation scheme of the performance evaluation schemes, (para [0132] “To facilitate cross-validation, predictive modeling system 100 may partition the dataset (or suggest a partitioning of the dataset) into K “folds”. Cross-validation comprises fitting a predictive model to the partitioned dataset K times, such that during each fitting, a different fold serves as the test set and the remaining folds serve as the training set. Cross-validation can generate useful information about how the accuracy of a predictive model varies with different training data. In steps 418 and 422, predictive modeling system may partition the dataset into K folds, where the number of folds K is a default parameter. In step 426, the user may change the number of folds K or cancel the use of cross-validation altogether.”)
and a value for the metric of performance evaluation accuracy of the corresponding performance evaluation scheme using the respective performance evaluation configuration parameter; (para [0120] “At step 406 of method 400, exploration engine 110 prompts the user to identify which of the variables are targets and / or which are features. In some embodiments, exploration engine 110 also prompts the user to identify the metric of model performance to be used for scoring the models (e.g., the metric of model performance to be optimized, in the sense of statistical optimization techniques, by the statistical learning algorithm implemented by exploration engine 110 )”) 
…
and an optimization component that optimizes accuracy of the set of machine learning algorithms as a second function of a size of a training data set relative to a size of a validation data set through selection of values associated with configuration parameters of the set of machine learning algorithms (para [0100] “Fitting the predictive models to the prediction problem's dataset(s) may include tuning one or more hyper-parameters of the predictive modeling procedure that generates the predictive model, tuning one or more parameters of the generated predictive model, and/or other suitable model-fitting steps.” [0134] “In some embodiments, predictive modeling system 100 partitions the dataset to facilitate efficient use of computing resources during the evaluation of the modeling search space. For example, predictive modeling system 100 may partition the cross-validation folds of the dataset into smaller samples. Reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to evaluate the relative performance of different modeling techniques… Hyper-parameters include variable settings for a modeling technique that can affect the speed, efficiency, and/or accuracy of model fitting process.”).  
Achin does not teach identifies a first function that best fits the set of data points; 
and selects the respective performance evaluation configuration parameter for the corresponding performance evaluation scheme that maximizes the first function as a configuration of the performance evaluation scheme for the set of machine learning algorithms; 
…
a characterization component that employs a supervised learning-based approach to characterize a relationship between the configuration of the performance evaluation scheme and fidelity of performance estimates.
Raschka teaches a characterization component that employs a supervised learning-based approach to characterize relationship between the configuration of the performance evaluation scheme and fidelity of performance estimates (p.45-46, “Mainly, we can think of model selection as another training procedure, and hence, we would need a decently-sized, independent test set that we have not seen before to get an unbiased estimate of the models’ performance... Varma and Simon found that the nested cross-validation approach can reduce the bias, compared to regular k-fold cross-validation when used for both hyperparameter tuning and evaluation, can be considerably be reduced. As the researchers state, "A nested CV procedure provides an almost unbiased estimate of the true error”” Figure 22. Examiner Note: Performing a nested training loop as described by Raschka includes both the inner loop parameter tuning for accuracy in a particular data fold and a second, outer training function used to estimate the generalization performance (e.g., fidelity of the inner performance estimate). When applied to Achin’s system including selection of a variety of performance evaluation scheme configurations, the resulting system would apply a particular performance evaluation scheme configuration to the inner loop evaluation, and characterize the relationship between the configuration and the generalization performance of the inner loop in a supervised learning approach.); 
Achin and Raschka are analogous art because they are both directed to machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Achin’s performance evaluation with Raschka’s confidence intervals. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to increase reliability of the system, which can be accomplished by optimizing its certainty (Raschka p.14 “Let us assume that we would like to compute a confidence interval around a performance estimate to judge its certainty – or uncertainty.”).
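For illustration only, the nested procedure relied upon from Raschka (an inner loop that tunes a parameter and an outer loop that estimates generalization performance) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Raschka or Achin: the toy "shrunken mean" model, the mean-squared-error metric, and all function names are hypothetical and were chosen only so the example is self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle positions 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_shrunken_mean(ys, shrink):
    """Hypothetical toy model: a constant prediction, the training mean scaled by (1 - shrink)."""
    return (1.0 - shrink) * mean(ys)


def mse(prediction, ys):
    """Mean squared error of a constant prediction on the given targets."""
    return mean([(y - prediction) ** 2 for y in ys])


def cv_error(ys, rows, k, shrink, seed):
    """Plain k-fold cross-validation error of one hyper-parameter value, using only `rows`."""
    folds = k_fold_indices(len(rows), k, seed)
    errors = []
    for held_out in folds:
        test = [rows[i] for i in held_out]
        train = [rows[i] for f in folds if f is not held_out for i in f]
        model = fit_shrunken_mean([ys[i] for i in train], shrink)
        errors.append(mse(model, [ys[i] for i in test]))
    return mean(errors)


def nested_cv(ys, candidates, outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates generalization error; inner loop tunes `shrink` on outer-training rows only."""
    outer_folds = k_fold_indices(len(ys), outer_k, seed)
    outer_errors = []
    for held_out in outer_folds:
        train = [i for f in outer_folds if f is not held_out for i in f]
        # Inner loop: the held-out outer fold is never seen during tuning.
        best = min(candidates, key=lambda s: cv_error(ys, train, inner_k, s, seed + 1))
        model = fit_shrunken_mean([ys[i] for i in train], best)
        outer_errors.append(mse(model, [ys[i] for i in held_out]))
    return mean(outer_errors)


# Hypothetical data: 30 noisy observations around 5.0.
rng = random.Random(42)
targets = [5.0 + rng.gauss(0, 1) for _ in range(30)]
estimate = nested_cv(targets, candidates=[0.0, 0.1, 0.5])
```

In this sketch the outer-loop average plays the role of the generalization estimate, while the inner loop corresponds to parameter tuning within a particular data fold.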
Achin in view of Raschka does not teach identifies a first function that best fits the set of data points; and selects the respective performance evaluation configuration parameter for the corresponding performance evaluation scheme that maximizes the first function as a configuration of the performance evaluation scheme for the set of machine learning algorithms.
Huang teaches identifies a first function that best fits the set of data points; (pg. 340 “We perform the UD-based method in two stages. At the first stage, we use a 13-run UD sampling pattern (see Fig. 3) in the appropriate search range proposed above. At the second stage, we halve the search range for each parameter coordinate in the log-scale and let the best point from the first stage be the center point of the new search box. We do allow the second-stage UD points to fall outside the prescribed search box.”)
and selects the respective performance evaluation configuration parameter for the corresponding performance evaluation scheme that maximizes the first function as a configuration of the performance evaluation scheme for the set of machine learning algorithms (pg. 338 first paragraph “In this article, we consider the parameter space consisting of the regularization parameter C and the Gaussian kernel width parameter. For e-insensitive support vector regression, we leave the parameter e as user pre-specified. That is, our search region is a two-dimensional box. It is obvious that the exhaustive grid search cannot do automatic model selection effectively due to its high computational cost. For example, for a grid search with 20 × 20 mesh parameter combinations (400 trials) in a 5-fold cross-validation, it will take 2000 times of SVM trainings to select the best parameter combination.” Section 5 “Many numerical experiments and past experience have indicated that the width parameter is the key factor in SVMs model selection. Hence, the appropriate range must be made prior to parameter search. We note that the function value of the Gaussian kernel not only depends on but also on the distance between two data points. The magnitude of the distance between a pair of data points also depends on the input space dimension.”).
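For illustration only, the trial count quoted from Huang pg. 338 follows from multiplying the number of parameter combinations by the number of cross-validation folds. The sketch below reproduces that arithmetic. The comparison with a single 13-run stage assumes the same 5-fold cross-validation is applied to each sampled point, which is an assumption made here and is not stated in the quoted passages; the function name is hypothetical.

```python
def svm_trainings(parameter_combinations, folds):
    """Number of model fits needed to cross-validate every parameter combination."""
    return parameter_combinations * folds


# Exhaustive grid from the quoted passage: 20 x 20 mesh, 5-fold cross-validation.
grid_trainings = svm_trainings(20 * 20, 5)

# One 13-run sampling stage, assuming (hypothetically) the same 5 folds per point.
first_stage_trainings = svm_trainings(13, 5)

print(grid_trainings, first_stage_trainings)  # prints "2000 65"
```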
Achin, Raschka and Huang are analogous art because they are all directed to machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the performance evaluation of Achin in view of Raschka with Huang’s model selection using a best fit to the data points. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to “dramatically cut down the number of parameter trials and also provide the flexibility to adjust the candidate set size under computational time constraint” as disclosed by Huang.
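For illustration only, the limitations "identifies a first function that best fits the set of data points" and "selects the respective performance evaluation configuration parameter ... that maximizes the first function" can be sketched as a least-squares fit followed by an argmax over candidate parameters. This is a minimal sketch under stated assumptions: the quadratic form of the fitted function, the use of a training fraction as the configuration parameter, the data values, and all function names are hypothetical, and none of them is taken from the claims, Achin, Raschka, or Huang.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c to (x, y) points; returns (a, b, c)."""
    rows = [[x * x, x, 1.0] for x, _ in points]
    ys = [y for _, y in points]
    # Normal equations (X^T X) beta = X^T y, stored as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return tuple(beta)


def select_maximizer(points, candidates):
    """Return the candidate parameter value at which the fitted function is largest."""
    a, b, c = fit_quadratic(points)
    return max(candidates, key=lambda x: a * x * x + b * x + c)


# Hypothetical data points: (training fraction, value of the accuracy metric).
data_points = [(0.5, 0.86), (0.6, 0.89), (0.7, 0.90), (0.8, 0.89), (0.9, 0.86)]
best_fraction = select_maximizer(data_points, [x for x, _ in data_points])
```

Any other function family could be substituted for the quadratic; the sketch only shows the two steps (fit, then maximize) in sequence.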

Regarding claim 2
As per claim 2, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 1.
Achin teaches further comprising a validation component that adapts a validation configuration decision to the machine learning algorithms and adjusts ratio of the size of training data set relative to the size of validation data set ([0082] “A predictive modeling procedure may be eliminated from consideration based on the results of applying one or more deductive rules to the attributes of the predictive modeling procedure and the characteristics of the prediction problem… (5) if the width of the dataset exceeds a threshold width, select or prioritize techniques that provide dimension reduction; (6) if the dataset is large and sparse (e.g., the size of the dataset exceeds a threshold size and the sparseness of the dataset exceeds a threshold sparseness), select or prioritize techniques that execute efficiently on sparse data structures; and/or any rule for selecting, prioritizing, or eliminating a modeling technique wherein the rule can be expressed in the form of an if-then statement.” [0133] “To facilitate rigorous testing of the predictive models, predictive modeling system 100 may partition the dataset (or suggest a partitioning of the dataset) into a training set and a “holdout” test set. In some embodiments, the training set is further partitioned into K folds for cross-validation. The training set may then be used to train and evaluate the predictive models, but the holdout test set may be reserved strictly for testing the predictive models.” [0134] “In some embodiments, predictive modeling system 100 partitions the dataset to facilitate efficient use of computing resources during the evaluation of the modeling search space. For example, predictive modeling system 100 may partition the cross-validation folds of the dataset into smaller samples. Reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to evaluate the relative performance of different modeling techniques.”).  
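For illustration only, the partitioning quoted from Achin para [0133] (a training set and a "holdout" test set, with the training set further partitioned into K folds) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `train_fraction` parameter, and the example sizes are hypothetical and do not appear in Achin.

```python
import random


def holdout_split(rows, train_fraction, seed=0):
    """Shuffle a copy of the rows and split it into (training set, holdout test set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_folds(rows, k):
    """Deal the rows into k roughly equal folds."""
    return [rows[i::k] for i in range(k)]


def cross_validation_splits(rows, k):
    """Yield (train, test) pairs so that each fold serves once as the test set."""
    folds = k_folds(rows, k)
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        yield train, test


# Hypothetical dataset of 20 rows: 80% training, 20% holdout, then 4 folds.
dataset = list(range(20))
training_set, holdout_set = holdout_split(dataset, train_fraction=0.8)
splits = list(cross_validation_splits(training_set, k=4))
```

Changing `train_fraction` changes the size of the training set relative to the size of the held-out set, while the holdout rows stay out of every cross-validation split.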

Regarding claim 3
As per claim 3, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 1.
Achin teaches further comprising a determining component that respectively determines use of same or different configuration for the machine learning algorithms (Figure 3, Box 310. [0040] “A machine-executable template may include one or more predictive modeling algorithms… The algorithm(s), pre-processing steps, and/or post-processing steps may be parameterized. A machine-executable template may be applied to a user dataset to generate potential predictive modeling solutions for the prediction problem represented by the dataset.” Examiner Note: Selecting a template including one parameterization step instead of selecting a template including a second parameterization step is seen as equivalent to determining the use of a different configuration.).  
Regarding claim 4
As per claim 4, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 1.
Achin teaches further comprising a determination component that determines configuration for each machine learning algorithm or subset of machine learning algorithms (Figure 3, Box 310. [0040] “A machine-executable template may include one or more predictive modeling algorithms… The algorithm(s), pre-processing steps, and/or post-processing steps may be parameterized. A machine-executable template may be applied to a user dataset to generate potential predictive modeling solutions for the prediction problem represented by the dataset.”).  


Regarding claim 5 (Currently Amended)
As per claim 5, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 2.
Achin teaches wherein the configuration component generates a set of samples of the ratio and the metric of performance evaluation accuracy ([0134] “For example, predictive modeling system 100 may partition the cross-validation folds of the dataset into smaller samples. Reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to evaluate the relative performance of different modeling techniques. In some embodiments, the smaller samples may be generated by taking random samples of a fold's data. Likewise, reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to tune the parameters of a predictive model or the hyper-parameters of a modeling technique.” [0203] “4. Generate input features, fit models, optimize model-specific tuning parameters, and evaluate performance: In some embodiments, feature generating may include scaling for numerical covariates, Box-Cox transformations, principal components, etc. Tuning parameters for the models may be optimized via cross-validation. Validation set performance measures may be computed and presented for each model, along with other summary characteristics (e.g., model parameters for regression models, variable importance measures for boosted trees or random forests)”). 


Regarding claim 6
As per claim 6, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 5.
Achin teaches further comprising a supervision component that evaluates the set of samples of the ratio ([0134] “In some embodiments, predictive modeling system 100 partitions the dataset to facilitate efficient use of computing resources during the evaluation of the modeling search space. For example, predictive modeling system 100 may partition the cross-validation folds of the dataset into smaller samples. Reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to evaluate the relative performance of different modeling techniques. In some embodiments, the smaller samples may be generated by taking random samples of a fold's data. Likewise, reducing the size of the data samples to which the predictive models are fitted may reduce the amount of computing resources needed to tune the parameters of a predictive model or the hyper-parameters of a modeling technique.”).  

Regarding claim 7 (Currently Amended)
As per claim 7, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 6.
Achin further teaches wherein the supervision component uses a subset of the set of data points (para [0289] “To improve execution speed and reduce resource consumption, the engine 110 may suggest a default down-sampling. There are two types of down-sampling, chronological and cross-sectional. In chronological down-sampling, the total number of observations within each partition may be reduced by a fixed percentage or aggregated to longer time interval resolution.” [0291] “One challenge in producing accurate predictions from time series models is that they may be sensitive to the choice of training and validation time windows. In some embodiments, the engine automatically evaluates this sensitivity. For example, the engine 110 may evaluate the sensitivity to time window choice for every modeling technique as it is executing. As another example, the engine 110 may evaluate this sensitivity after a model exceeds a certain threshold of predictive accuracy. A third option is to evaluate sensitivity of the top models based on their relative predictive accuracy.”).

Regarding claim 10 (Currently Amended)
As per claim 10, the combination of Achin in view of Raschka and Huang thus far teaches the system of claim 2.
Raschka further teaches wherein the validation component uses a procedure for selecting holdout size and a scheme to reduce variance in a generalization error estimate in holdout validation via bootstrapping (p.16, “The bootstrap method is a resampling technique for estimating a sampling distribution, and in the context of this article, we are particularly interested in estimating the uncertainty of a performance estimate – the prediction accuracy or error…Walking through it step by step, the bootstrap method works like this: 1. We are given a dataset of size n. 2. For b bootstrap rounds: We draw one single instance from this dataset and assign it to the jth bootstrap sample. We repeat this step until our bootstrap sample has size n – the size of the original dataset. Each time, we draw samples from the same original dataset such that certain examples may appear more than once in a bootstrap sample and some not at all. 3. We fit a model to each of the b bootstrap samples and compute the resubstitution accuracy. 4. We compute the model accuracy as the average over the b accuracy estimates (Equation 23).”).  
Achin, Huang and Raschka are analogous art because they are all directed to machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Achin’s performance evaluation with Raschka’s confidence intervals. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to increase reliability of the system, which can be accomplished by optimizing its certainty (Raschka p.14 “Let us assume that we would like to compute a confidence interval around a performance estimate to judge its certainty – or uncertainty.”).
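For illustration only, the four bootstrap steps quoted from Raschka p.16 (draw n instances with replacement, repeat for b rounds, fit a model to each bootstrap sample and compute its resubstitution accuracy, then average) can be sketched as follows. This is a minimal sketch under stated assumptions: the majority-class "model", the example data, and all function names are hypothetical and are not taken from Raschka.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement, so some repeat and some are left out."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def fit_majority_class(sample):
    """Hypothetical toy model: always predict the most frequent label in the sample."""
    labels = [label for _, label in sample]
    return max(set(labels), key=labels.count)


def accuracy(predicted_label, sample):
    """Fraction of instances in the sample whose label equals the constant prediction."""
    return mean([1.0 if label == predicted_label else 0.0 for _, label in sample])


def bootstrap_accuracy(data, rounds=200, seed=0):
    """Average resubstitution accuracy over `rounds` bootstrap samples; also returns each score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        sample = bootstrap_sample(data, rng)
        model = fit_majority_class(sample)
        # Resubstitution: the model is scored on the same sample it was fitted to.
        scores.append(accuracy(model, sample))
    return mean(scores), scores


# Hypothetical labelled dataset of (feature, label) pairs: seven 1s and three 0s.
examples = [(i, 1) for i in range(7)] + [(i, 0) for i in range(7, 10)]
average_accuracy, per_round = bootstrap_accuracy(examples)
```

The spread of the per-round scores is what a confidence interval around the performance estimate would summarize; the sketch stops at the average described in the quoted step 4.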

Regarding claim 11 
Claim 11 is a method claim corresponding to system claim 1. Claim 11 is rejected for at least the same reasons as claim 1. 

Regarding claim 12
Claim 12 is a method claim corresponding to system claim 2. Claim 12 is rejected for at least the same reasons as claim 2. 
Regarding claim 13
Claim 13 is a method claim corresponding to system claim 3. Claim 13 is rejected for at least the same reasons as claim 3.

Regarding claim 14
Claim 14 is a method claim corresponding to system claim 4. Claim 14 is rejected for at least the same reasons as claim 4.   

Regarding claim 15
Claim 15 is a method claim corresponding to system claim 5. Claim 15 is rejected for at least the same reasons as claim 5.

Regarding claim 16
Claim 16 is a method claim corresponding to system claim 6. Claim 16 is rejected for at least the same reasons as claim 6.

Regarding claim 17
Claim 17 is a method claim corresponding to system claim 7. Claim 17 is rejected for at least the same reasons as claim 7.

Regarding claim 19
Claim 19 is a method claim corresponding to system claim 10. Claim 19 is rejected for at least the same reasons as claim 10.
Regarding claim 20
Claim 20 is a computer program product claim corresponding to system claim 1. Claim 20 is rejected for at least the same reasons as claim 1.

Regarding claim 21
Claim 21 is a computer program product claim corresponding to system claim 2. Claim 21 is rejected for at least the same reasons as claim 2.

Regarding claim 22
Claim 22 is a computer program product claim corresponding to system claim 3. Claim 22 is rejected for at least the same reasons as claim 3.

Regarding claim 23
Claim 23 is a computer program product claim corresponding to system claim 4. Claim 23 is rejected for at least the same reasons as claim 4.




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/V.M./
Examiner, Art Unit 2126  
/ANN J LO/
Supervisory Patent Examiner, Art Unit 2126