Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The following claim(s) is/are pending in this Office action: 1-20
The following claim(s) is/are amended: 1, 6, 7, 12, 18, and 19
The following claim(s) is/are new: None
The following claim(s) is/are cancelled: None
Claim(s) rejected: 1-14 and 16-20
Claim(s) objected to: 15

Previous Rejections Withdrawn
The rejections of claims 1-11 under 35 U.S.C. 101 and 112(b) are withdrawn in view of the amendments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 6-9, 12-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Figueroa et al. (“Predicting sample size required for classification performance”) in view of Nowozin et al. (US20140172753A1).

Regarding claim 1, Figueroa teaches to determine an approximate best machine-learning configuration among a set of machine learning configurations by performing operations comprising (Page 4 Section Model fitting: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Page 2 Section Learning curve fitting: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances.” Multiple classifiers (i.e., classifiers with different configurations) are tested at different sample sizes to find the best sample size for a given configuration or the best classifier for a given sample size. Classification accuracy is the performance measure used. The best classifier for a given sample size can be selected by applying Equation (1).)
selecting a machine-learning configuration within the set for training (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}” Classifiers yj are created (each classifier contains a given set of model configurations) for training using a determined training sample size.)
and determining an associated sample size for training (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}” Classifiers yj are created (each classifier contains a given set of model configurations) for training using a determined training sample size. Equation 1 
causing a training dataset to be sampled in accordance with the determined sample size to obtain a sampled training dataset (Page 2 Section Learning curve fitting Para 1: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances” Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” The training set size is determined. Classifiers are tested at the determined training set sizes.)
(Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve” Page 5 Section Results Para 2: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red).” Page 4 Section Evaluation: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each while 2,500 instances were set apart as test set in D3” Dataset D2 is split into a training set and a test set. A confidence interval of a quality metric (in this case accuracy) is calculated for each dataset, including D2 (which contains both training and test data). Figueroa further states that cross validation is used for testing or validation and that the confidence interval is calculated accordingly, as mentioned on Page 7 Para 4: “In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. 1 round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.”)

Nowozin, however, teaches one or more machine-readable media storing instructions for execution by one or more hardware processors, execution of the instructions causing the one or more hardware processors (Para 0112: “The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 504.”)
pruning the set based on comparisons between the estimated confidence interval of the trained selected machine-learning configuration and estimated confidence intervals of other machine learning configurations within the set (Para 0034: “The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric of Figueroa with the pruning (Nowozin, Para 0002).
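For illustration only, the learning-curve construction quoted from Figueroa in the rejection of claim 1 can be sketched as follows. The functional form accuracy = (1 - a) - b·x^c is an assumption made here, because Equation (1) itself is not reproduced in this action; the function names and parameter values are hypothetical and are not taken from either reference.

```python
def batch_sizes(k, m):
    """Training set sizes x_j = k*j for j = 1..m, as in the quoted passage."""
    return [k * j for j in range(1, m + 1)]


def inverse_power_law(x, a, b, c):
    """Assumed inverse power law learning curve.

    a: minimum achievable error, b: learning rate, c: decay rate
    (negative, so that accuracy rises as the sample size x grows).
    """
    return (1.0 - a) - b * (x ** c)


# Hypothetical parameter values, chosen only to show the shape of the curve.
sizes = batch_sizes(k=50, m=6)
curve = [inverse_power_law(x, a=0.1, b=1.0, c=-0.5) for x in sizes]
```

Under these assumed values, the predicted accuracy rises with the training set size and approaches, but does not reach, 1 - a.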

Regarding claim 2, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.
Nowozin also teaches wherein the selecting, causing, estimating, and pruning operations are performed iteratively (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Per Fig. 3 selecting the features (options), estimation of confidence interval, and removing process is repeated until the desired result is achieved.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 3, Figueroa and Nowozin teach the one or more machine-readable media of claim 2.
Nowozin also teaches wherein the approximate best machine-learning configuration is a last machine-learning configuration remaining within the set upon iterative pruning (Para 0034: “The racing logic 108 is used to calculate 308 a confidence interval for the score for each option. The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” The last option that remains is the desired or optimal configuration.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.
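As a non-authoritative sketch of the pruning described in Nowozin Para 0034, the following removes every option whose confidence interval does not overlap that of the highest-scoring option and repeats until one option remains. The data layout (a dict of name to lower bound, upper bound, and score) and the function names are assumptions for illustration, not Nowozin's implementation.

```python
def prune_non_overlapping(intervals):
    """Keep only options whose CI overlaps the CI of the highest-scoring option.

    intervals: dict mapping option name -> (lower, upper, score).
    """
    best = max(intervals, key=lambda name: intervals[name][2])
    best_lower, best_upper, _ = intervals[best]
    return {
        name: (lower, upper, score)
        for name, (lower, upper, score) in intervals.items()
        if upper >= best_lower and lower <= best_upper
    }


def race(rounds):
    """Apply the pruning round by round until a single option remains.

    rounds: list of dicts, each holding updated intervals for one round
    (hypothetical input; in practice the intervals would be recomputed
    as more training examples arrive).
    """
    remaining = None
    for intervals in rounds:
        if remaining is not None:
            # Options removed in an earlier round stay removed.
            intervals = {n: v for n, v in intervals.items() if n in remaining}
        remaining = set(prune_non_overlapping(intervals))
        if len(remaining) == 1:
            break
    return remaining


round_1 = {"A": (0.60, 0.70, 0.65), "B": (0.72, 0.80, 0.76), "C": (0.68, 0.78, 0.73)}
round_2 = {"A": (0.62, 0.68, 0.65), "B": (0.74, 0.78, 0.76), "C": (0.69, 0.73, 0.71)}
survivors = race([round_1, round_2])
```

In this example, option A is removed in the first round and option C in the second, leaving option B as the last remaining option.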

Regarding claim 4, Figueroa and Nowozin teach the one or more machine-readable media of claim 2.
Figueroa also teaches wherein, for each of the machine-learning configurations, the associated sample sizes increase progressively over repeated training iterations (Page 2 Last Para: “Another research area related to our work is progressive sampling. Both active learning and progressive sampling start with a very small batch of instances and progressively increase the training data size until a termination criteria is met [31-36].”)

Regarding claim 6, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.
Figueroa also teaches wherein the test value of the quality metric is determined for the trained machine-learning configuration based on a sampled test dataset (Page 3 Last Para: “Classification accuracy points (yj), i.e. the proportion of correctly classified samples, can be calculated at each training sample size xj using an independent test set or through n-fold cross validation” The quality metric (classification accuracy) of a machine learning model is calculated using a test set).

Regarding claim 7, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.
Nowozin also teaches wherein an upper bound of the estimated confidence interval is greater than the training value (Per the upper-bound formula in Para 0039, the upper bound of the CI will be higher than the training score (or training value) if the CI is calculated using the training dataset.)
and a lower bound of the confidence interval is smaller than the test value (Per the formula in Para 0039, the lower bound of the CI will be lower than the test score (or test value or information) if the CI is calculated using the test dataset.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 8, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.
Figueroa also teaches wherein the quality metric measures an accuracy of predictions made by the trained machine-learning configuration (Page 2 Para 1: “The published criteria are generally based on target accuracy, classifier confidence, uncertainty estimation, and minimum expected error.” Page 2 Para 2: “Within this category we find methods that predict the sample size required for a classifier to reach a particular accuracy [2,4,26].” Page 3 Last Para: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created.” Classification accuracy of machine learning model is used as quality metric in Figueroa.).

Regarding claim 9, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.
Nowozin also teaches the one or more machine-readable media of claim 1, wherein pruning the set of machine learning configurations comprises determining, among lower bounds of the confidence intervals of the machine-learning configurations within the set, a (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302.” Para 0032: “The racing logic may use the confidence intervals to decide whether enough training examples have been received to give an option selection which is accurate within a specified error tolerance.” Options (or configurations) whose confidence interval falls outside the CI of the highest-scoring option are identified. This is only possible when the lower bound of the CI of the highest-scoring option is higher than the upper bound of the CI of the given option, with no error or with an error within the specified error tolerance. Those options (or configurations) are removed (pruned), and the remaining options are selected.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 12, Figueroa teaches training the selected machine-learning configuration based on the sampled training dataset (Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” A given classifier (containing a machine learning configuration) is trained on a training set size. A learning curve is generated as a result of training the classifier.) and determining a training accuracy associated with the trained selected machine learning configuration (Page 4 Para 1: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Training accuracy is calculated using Equation (1).)
evaluating the trained selected machine-learning configuration based on the sampled test dataset to determine a test accuracy associated with the trained selected machine learning configuration (Page 4 Para 1: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Training accuracy is calculated as a function of the sampled dataset.)
determining a confidence interval associated with the trained selected machine learning configuration based at least in part on the training and test accuracies (Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve.” Page 8: “Figure 5 Progression of confidence interval widths for the observed values (training set) and the predicted values.” Confidence intervals are generated for the estimated accuracy from the training and test sets.)
the confidence interval providing estimated bounds of a real test accuracy if the selected machine-learning configuration were trained and tested on the training and test (Fig. 2 shows the lower and upper bounds of the confidence intervals. The confidence interval was also generated for the D2 dataset, which contains both a training set and a test set. Page 5 Results Para 2: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red).” Page 4 Section Evaluation Para 2: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each.”)
Nowozin teaches A method comprising: iteratively pruning a set of machine-learning configurations based on a training dataset and a test dataset by using one or more hardware processors to perform operation comprising, in each of a plurality of iterations (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Para 0024: “Each of the option selector 102, scoring logic 106 and racing logic 108 are computer implemented using software and/or hardware.”).
pruning the set of machine-learning configurations based on comparisons between the determined confidence interval and confidence intervals associated with other machine learning configurations within the set (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”)
selecting one of the machine-learning configurations remaining within the pruned set for a next iteration (Para 0034: “The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” The last option that remains is the desired or optimal configuration until new data is available).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 13, Figueroa and Nowozin teach the method of claim 12.
Nowozin also teaches wherein pruning the set of machine-learning configurations comprises comparing, among the confidence intervals associated with the machine-learning configurations within the set, a confidence interval having a highest lower bound against all other confidence intervals (Para 0034: “The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Fig. 2 also shows that feature B has a higher lower bound than feature A (Para 0031). When feature A and feature B are compared, feature B is selected as having the highest information gain.) and removing from the set of machine-learning configurations any machine-learning configuration whose associated confidence interval overlaps by no more than a prescribed loss (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302.” Para 0032: “The racing logic may use the confidence intervals to decide whether enough training examples have been received to give an option selection which is accurate within a specified error tolerance.” Options (or configurations) whose confidence interval falls outside the CI of the highest-scoring option are identified. This is only possible when the lower bound of the CI of the highest-scoring option is higher than the upper bound of the CI of the given option, with no error or with an error within the specified error tolerance. Those options (or configurations) are removed (pruned), and the remaining options are selected.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.
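The comparison recited in claim 13 can be illustrated with the following sketch, which offers one possible reading of the limitation and is not drawn from either reference. It takes the interval with the highest lower bound as the leader and removes any other configuration whose interval extends above that lower bound by no more than a prescribed loss. The name `prescribed_loss`, the data layout, and the example values are hypothetical.

```python
def prune_by_highest_lower_bound(intervals, prescribed_loss=0.0):
    """Remove configurations whose CI overlaps the leader's by <= prescribed_loss.

    intervals: dict mapping configuration name -> (lower, upper).
    The leader is the configuration whose interval has the highest lower bound.
    """
    leader = max(intervals, key=lambda name: intervals[name][0])
    leader_lower = intervals[leader][0]
    kept = {}
    for name, (lower, upper) in intervals.items():
        # Amount by which this interval reaches above the leader's lower bound;
        # a negative value means the two intervals do not overlap at all.
        overlap = upper - leader_lower
        if name == leader or overlap > prescribed_loss:
            kept[name] = (lower, upper)
    return kept


configs = {"cfg1": (0.80, 0.90), "cfg2": (0.70, 0.82), "cfg3": (0.60, 0.79)}
strict = prune_by_highest_lower_bound(configs, prescribed_loss=0.0)
tolerant = prune_by_highest_lower_bound(configs, prescribed_loss=0.05)
```

With a prescribed loss of zero only the non-overlapping configuration (cfg3) is removed; with a prescribed loss of 0.05 the slightly overlapping configuration (cfg2) is removed as well.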

Regarding claim 14, Figueroa and Nowozin teach the method of claim 12.
Nowozin also teaches wherein the sampling schedules associated with the machine learning configurations increase at least a sample size of the sampled training dataset over repeated training of a same machine-learning configuration (Para 0031 and 0032: Per Fig. 2, the sample size of training examples increases for the machine learning features or options; once it reaches 500, the information gain settles down. Para 0034: The option/feature evaluation process is repeated until only one option remains.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 16, Figueroa and Nowozin teach the method of claim 12.
Figueroa also teaches wherein the confidence interval is determined based further on sample sizes of the sampled training dataset and the sampled test dataset (“Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red).” D2 is split into training and test dataset and confidence interval is created for D2 dataset for classification accuracy).

Regarding claim 17, Figueroa and Nowozin teach the method of claim 12.
Nowozin also teaches wherein the set of machine-learning configurations is iteratively pruned until it consists of only one remaining machine-learning configuration (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”).
Same motivation to combine the teachings of Figueroa and Nowozin as in claim 1.

Regarding claim 18, Figueroa teaches for determining an approximate best machine-learning configuration among a set of machine-learning configurations, the processing components comprising: a training and test component configured to, upon selection of one of the machine learning configurations within the set and determination of an associated sample (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}. Classification accuracy points (yj), i.e. the proportion of correctly classified samples, can be calculated at each training sample size xj using an independent test set or through n-fold cross validation” Classifiers (containing machine learning configurations) are tested against classification accuracy to find the best configuration for a given sample size.)
train the selected machine learning configuration on the sampled training dataset to optimize a training value of a quality metric (Page 2 Section Learning curve fitting Para 1: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances” Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” The training set size is determined. Classifiers are tested at the determined training set sizes. A learning curve results from training the classifiers. Classification accuracy is the quality metric used.)
(Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” Classifiers (or machine learning configurations) are tested on training set sizes for a quality metric (which is classification accuracy).)
and a sampling and scheduling component configured to: compute from the training and test values of the quality metric, a real value of the quality metric if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety (Page 4 Section Evaluation Para 2: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each while 2,500 instances were set apart as test set in D3” Dataset D2 is split into a training set and a test set. A confidence interval of a quality metric (in this case accuracy) is calculated for each dataset, including D2 (which contains both training and test data). Figueroa further states that cross validation is used for testing or validation and that the confidence interval is calculated accordingly, as mentioned on Page 7 Para 4: “In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. 1 round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.”)
Figueroa does not teach A system comprising: one or more hardware processors configured to implement a plurality of processing components; and iteratively prune the set of machine-learning configurations based on the confidence intervals, select machine-learning configurations for training by the training and test component, and determine, for the selected machine-learning configurations, associated sample sizes for the sampled training dataset
Nowozin teaches A system comprising: one or more hardware processors configured to implement a plurality of processing components (Para 0112: “The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 504.”)
iteratively prune the set of machine-learning configurations based on the confidence intervals, select machine-learning configurations for training by the training and test component (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”)
and determine, for the selected machine-learning configurations, associated sample sizes for the sampled training dataset (Para 0031: “In the example of FIG. 2 there are two options which are feature A and feature B. although in practice many more options may be present (two are shown for clarity… Once more than 500 samples have been received the information gain scores settled down and are clearly separated for features A and B with feature B having a higher information gain score.” Associated sample size is determined for machine learning features/configurations from training dataset.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 19, Figueroa and Nowozin teach the system of claim 18.
Nowozin also teaches wherein the processing components further comprise: a data sampler configured to sample the training dataset based on the sample sizes determined by the sampling and scheduling component for the selected machine-learning configurations (Fig. 1 shows samples are derived from datasets for training and providing scores to features/option. Para 0031: “Once more than 500 samples have been received the information gain scores settled down and are clearly separated for features A and B with feature B having a higher information gain score.” Fig. 2 also plots information gain for multiple features of training samples at different sample sizes.).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.

Regarding claim 20, Figueroa and Nowozin teach the system of claim 18.
Figueroa also teaches wherein the sampling and scheduling component determines the sample sizes for each of the machine-learning configurations based on a predetermined progressive sampling schedule associated with that machine-learning configuration (Page 2 Last Para: “Another research area related to our work is progressive sampling. Both active learning and progressive sampling start with a very small batch of instances and progressively increase the training data size until a termination criteria is met [31-36].”)

Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Figueroa (“Predicting sample size required for classification performance”) in view of Nowozin (US 2014/0172753A1 ~ IDS) further in view of Bergstra et al. (“Random search for hyper-parameter optimization.” ~ IDS).

Regarding claim 5, Figueroa and Nowozin teach the one or more machine-readable media of claim 4.
Neither Figueroa nor Nowozin explicitly teaches wherein the associated sample sizes increase geometrically over repeated training iterations.
Bergstra, however, teaches wherein the associated sample sizes increase geometrically over repeated training iterations (Fig. 6 shows that the numbers of trials/samples increase geometrically (by multiples of 2). Section 1 Paras 1 and 2: training is performed iteratively on a training set.).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric of Figueroa as modified by Nowozin with the sampling method of Bergstra to find the right sampling size that determines the best or better model/configuration at a given computational budget (Bergstra, Fig. 6).
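For clarity, a geometrically increasing sample-size schedule of the kind recited in claim 5 can be sketched as below. This is an illustrative assumption only: the starting size, growth factor, and function name are hypothetical and are not taken from Bergstra, Figueroa, or Nowozin.

```python
def geometric_schedule(initial_size, factor, dataset_size):
    """Sample sizes that grow by a constant factor, capped at the dataset size."""
    if initial_size <= 0 or factor <= 1:
        raise ValueError("initial_size must be positive and factor must exceed 1")
    sizes = []
    size = initial_size
    while size < dataset_size:
        sizes.append(size)
        size *= factor
    # The final training iteration uses the full dataset.
    sizes.append(dataset_size)
    return sizes


schedule = geometric_schedule(initial_size=100, factor=2, dataset_size=1000)
```

With a growth factor of 2, each training iteration doubles the sample size until the full dataset is used.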

Regarding claim 10, Figueroa and Nowozin teach the one or more machine-readable media of claim 1.

Bergstra, however, teaches wherein selection of a machine learning configuration for training is based at least in part on training costs associated with reducing the confidence intervals of the machine-learning configurations within the set (Fig. 6 shows that the confidence intervals, which are based on computational budget/cost, shrink as the number of samples/trials increases. The method used to calculate the confidence intervals is described in Fig. 2.).
Same motivation to combine the teachings of Figueroa, Nowozin and Bergstra as claim 5.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Figueroa (“Predicting sample size required for classification performance”) in view of Nowozin (US 2014/0172753A1) further in view of Luo (“A review of automatic selection methods for machine learning algorithms and hyper-parameter values.”).

Regarding claim 11, Figueroa and Nowozin teach the method of claim 1.
Neither Figueroa nor Nowozin explicitly teaches wherein the approximate best machine-learning configuration is one of one or more machine-learning configurations remaining within the pruned set when a time limit has been reached.
Luo, however, teaches wherein the approximate best machine-learning configuration is one of one or more machine-learning configurations remaining within the pruned set when a time limit has been reached (Page 3 Para 1: “Third, the researcher trains the machine learning model to automatically optimize the ordinary parameters of the chosen algorithm…This process is repeated until the researcher obtains a model with satisfactory accuracy, runs out of time, or thinks that the model’s accuracy cannot be improved much further any more.” When the time limit is reached, or the evaluation process runs out of time, the search for the best configuration stops and a remaining configuration can be selected.).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric of Figueroa as modified by Nowozin with the time constraint of Luo, in order to optimize the parameters of the algorithm and determine model accuracy within the available time (Luo, Page 3 Para 1).
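For illustration only, the claim 11 limitation of selecting a configuration "remaining within the pruned set when a time limit has been reached" can be sketched as follows. This is a minimal sketch under stated assumptions and not the method of any cited reference: `select_within_time_limit`, the `evaluate` callback returning a (lower, upper) confidence interval, and the pruning rule (drop a configuration whose upper bound falls below the best lower bound) are all hypothetical.

```python
import time


def select_within_time_limit(configs, evaluate, sample_sizes, time_limit_s,
                             clock=time.monotonic):
    """Prune configurations until a time limit is reached.

    `evaluate(config, n)` is a hypothetical callback that trains `config`
    on `n` samples and returns a (lower, upper) confidence interval for
    its quality metric. Configurations must be hashable.
    Returns (approximate_best, remaining_configs).
    """
    deadline = clock() + time_limit_s
    remaining = list(configs)
    bounds = {}
    for n in sample_sizes:
        if clock() >= deadline or len(remaining) <= 1:
            break  # time limit reached, or nothing left to prune
        bounds = {c: evaluate(c, n) for c in remaining}
        best_lower = max(lower for lower, _ in bounds.values())
        # Keep only configurations whose interval still overlaps the leader.
        remaining = [c for c in remaining if bounds[c][1] >= best_lower]
    # Approximate best: highest interval midpoint among the survivors.
    best = max(remaining, key=lambda c: sum(bounds.get(c, (0.0, 0.0))))
    return best, remaining


# Example with made-up quality scores and intervals that narrow as n grows.
scores = {"a": 0.9, "b": 0.75, "c": 0.5}

def fake_evaluate(config, n):
    half_width = 1.0 / n ** 0.5
    return scores[config] - half_width, scores[config] + half_width

best, survivors = select_within_time_limit(
    ["a", "b", "c"], fake_evaluate, [100, 400, 1600], time_limit_s=60.0)
```

If the time limit expires before pruning finishes, more than one configuration may remain, and any of them is a candidate for the approximate best.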

Response to Arguments
Applicant’s arguments filed on 09/02/2021 with respect to the 35 U.S.C. 102 and 103 rejections have been fully considered. The 103 rejections have been revised in view of the amendments, and relevant citations have been provided. Examiner adds a new reference (Figueroa et al.) owing to the amendments made to the independent and dependent claims. Applicant’s arguments are addressed below.
Applicant Argument: Applicant argues that none of the prior art teaches claim limitations after they are amended. For example, “based at least in part on the training and test values of a quality metric”, “confidence interval of a real value of the quality metric if the 
Examiner’s Response: Examiner has added the reference Figueroa et al. to teach these amended limitations. Examiner has provided the relevant citations in the 103 rejection section.

Allowable Subject Matter
Claim 15 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Q.I/ 
Examiner 
Art unit 2123
02/02/2021

/BRIAN M SMITH/Primary Examiner, Art Unit 2122