Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/28/2022 has been entered.

Status of Claims
This action is in reply to the amendments and remarks filed on 04/28/2022.
Claims 1-20 are pending.
Claims 1, 7, 12, and 18 have been amended.  

Response to Arguments
Applicant’s arguments with respect to the rejection(s) of claim(s) 1, 12, and 18 under 35 U.S.C. 103 have been considered but are not persuasive. More specifically, the applicant argues that no art of record teaches the limitations of amended claims 1, 12, and 18, since (1) “one of ordinary skill in the art will understand, the selection of a machine-learning configuration…is separate and distinct from the training of the machine-learning model on training data to determine free parameters…once a configuration has been selected”. Due to the broadness of the claim language, the examiner respectfully disagrees.
While structural parameters of a machine-learning model (e.g., hyperparameters) are known to be different from “free parameters” as argued (e.g., algorithm internal node weights), the settings of both kinds of parameters make up unique configurations of an algorithm, upon which the algorithm operates to produce a specific output, and both sets are determined in training/learning procedures (see applicant’s spec paragraph 0018). Applicant is encouraged to amend the claims to narrow their scope so that they cannot be read as broadly.
See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant’s amendments.

(2) Applicant next argues that no art of record teaches the amended limitations of claims 1, 12, and 18, since “Figueroa does not pertain to selecting a machine-learning configuration among a set of multiple configurations…[and Figueroa’s] confidence interval is computed in a fundamentally different manner” than claimed. Figueroa determines a “single confidence interval” without using the “training accuracies…, and the test accuracies are used only statistically”; whereas the claims calculate a confidence interval “in each iteration”. Due to the broadness of the claim language, the examiner respectfully disagrees.
Figueroa, page 2 teaches creating and testing multiple models to find which is the most accurate on the data (selecting a machine-learning configuration among a set of multiple configurations as argued), thus meeting the claimed limitation language. Next, Figueroa page 7 and Figs. 2-3 teach determining each model’s output curve over “repeated” experiments and adjusting the “confidence interval” each time; and page 5 Section Results Para 2 and 5: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red) (confidence interval providing upper and lower bounds)…the predicted confidence interval narrows dramatically as more samples are used and the prediction becomes more accurate”; here the confidence interval bounds are taught as being dependent on the sets’ accuracies (the upper bound being computed from the training value and the lower bound being computed from the test value).
Further, Nowozin was previously cited for teaching computing confidence intervals at each iteration: Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”
Further still, new reference Poh has been added for teaching the amended concepts.
See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant’s amendments.

(3) Applicant next argues that no art of record teaches the amended limitations of claims 1, 12, and 18, since Nowozin’s “confidence intervals…are not computed based on training and test values of a quality metric determined for a selected option”. The examiner respectfully disagrees.
It is noted that Figueroa (and new reference Poh, cited in the alternative) was cited as teaching the amended confidence bound calculations and Nowozin was cited as teaching pruning model configurations based on confidence intervals.
Nonetheless, Nowozin, paragraph 0032 teaches computing “confidence intervals” for a model’s “scores” from “training examples” (based on training and test values of a quality metric). See the updated claim 7 mappings, with Poh cited in the alternative.
Applicant is additionally directed to MPEP section 2141.III stating "Prior art is not limited just to the references being applied, but includes the understanding of one of ordinary skill in the art. The prior art reference (or references when combined) need not teach or suggest all the claim limitations, however, Office personnel must explain why the difference(s) between the prior art and the claimed invention would have been obvious to one of ordinary skill in the art. The 'mere existence of differences between the prior art and an invention does not establish the invention’s nonobviousness.' Dann v. Johnston, 425 U.S. 219, 230, 189 USPQ 257, 261 (1976)."
See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant’s amendments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 6-9, 12-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Figueroa et al. (“Predicting sample size required for classification performance”), in view of Nowozin et al. (US20140172753A1), in view of Poh et al. (“Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap”, 2007).

Regarding claim 1, Figueroa teaches one or more non-transitory machine-readable media storing instructions for execution by one or more hardware processors (Page 7: “we intend to integrate the function to predict sample size into our NLP software”. Software is well known to be executed on one or more processors communicatively connected to memory included in a computer for performing the embodiments of the disclosure), execution of the instructions causing the one or more hardware processors to determine an approximate best machine-learning configuration among a set of machine learning configurations by performing operations comprising (Page 4 Section Model fitting: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Page 2 Section Learning curve fitting: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances.” Multiple classifiers (meaning classifiers with different configurations) are tested at different sample sizes to find the best sample size for a given configuration or the best classifier for a given sample size. Classification accuracy is the performance measure used. The best classifier for a given size can be selected when Equation 1 is applied)
selecting a machine-learning configuration within the set for training (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}” Classifiers yj are created (each classifier contains a given set of model configurations) for training using a determined training sample size.)
and determining an associated sample size for training (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}” Classifiers yj are created (each classifier contains a given set of model configurations) for training using a determined training sample size. Equation 1 
causing a training dataset to be sampled in accordance with the determined sample size to obtain a sampled training dataset; causing the selected machine-learning configuration to be trained on the sampled training dataset to optimize a training value of a quality metric; causing the trained machine-learning configuration to be tested on at least a sample of a test dataset to determine a test value of the quality metric (Page 2 Section Learning curve fitting Para 1: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances” Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” The training set size is determined. Classifiers are tested at the determined training set sizes.);
estimating, based at least in part on the training and test values of the quality metric, a confidence interval providing upper and lower bounds for a real value of the quality metric if the trained selected machine learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve”. Page 5 Section Results Para 2 and 5: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red) (confidence interval providing upper and lower bounds)…the predicted confidence interval narrows dramatically as more samples are used and the prediction becomes more accurate”; here the confidence interval bounds are taught as being dependent on the sets’ accuracies (the upper bound being computed from the training value and the lower bound being computed from the test value). Page 4 Section Evaluation: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each while 2,500 instances were set apart as test set in D3”. Dataset D2 is split into a training set and a test set. A confidence interval of the quality metric (in this case accuracy) is calculated for each dataset, including D2 (which contains both training and test data).
Figueroa further states that cross validation is used for testing or validation and confidence interval is calculated accordingly as mentioned on Page 7 Para 4: “In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. 1 round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.”); and 


Figueroa does not explicitly teach pruning the set of machine-learning configurations based on comparisons between the estimated confidence interval of the trained selected machine-learning configuration and estimated confidence intervals of other machine learning configurations within the set.
Nowozin, however, teaches pruning the set of machine-learning configurations based on comparisons between the estimated confidence interval of the trained selected machine-learning configuration and estimated confidence intervals of other machine learning configurations within the set (Para 0034: “the options may be randomly generated decision tree split functions from a decision tree training process (machine-learning configurations)…The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Para 002: “an option selector 102 which is a component of a machine learning system”.).
Further, Figueroa at least implies estimating; however, Nowozin teaches one or more machine-readable media storing instructions for execution by one or more hardware processors, execution of the instructions causing the one or more hardware processors (Para 0112: “The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 504.”).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the use of a training sample size for a given classifier’s configuration and the confidence interval generation for a quality metric of Figueroa with the mechanism for pruning options or configurations of Nowozin to ensure better allocation of resources by selecting appropriate options (Nowozin, Para 0002).
Further still, Figueroa at least implies estimating, based at least in part on the training and test values of the quality metric, a confidence interval providing upper and lower bounds for a real value of the quality metric if the trained selected machine learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (see mappings above); however Poh teaches estimating, based at least in part on the training and test values of the quality metric, a confidence interval providing upper and lower bounds for a real value of the quality metric if the trained selected machine learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (sections 3.2 and 5 teach using a “31-user data set” training set and a “64-user data set” testing set for establishing and validating a confidence region with calculated “upper and lower bounds” (upper/lower bound being computed) from “sample variability” and “coverage” (training and test values of the quality metric)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the use of a training sample size for a given classifier’s configuration and the confidence interval generation for a quality metric, as taught by Figueroa as modified by the mechanism for pruning options or configurations as taught by Nowozin, to include the upper and lower confidence region calculations for a machine learning algorithm as taught by Poh in order to “put realistic upper and lower bounds on a priori performance evaluation based on EPC[, and be] better than the bootstrap subset technique in terms of coverage” (Poh, section 6).
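For illustration only, and not as part of the cited teachings, the inverse power law learning curve referenced in the Figueroa mapping above can be sketched as follows. The functional form y = (1 - a) - b·x^c is an assumption about Figueroa's Equation (1), which is not reproduced in this action; the function name and the parameter values are hypothetical.

```python
def learning_curve_accuracy(x, a, b, c):
    """Predicted classification accuracy at training sample size x.

    Assumed form (not quoted in this action): y = (1 - a) - b * x**c, where
    a is the minimum achievable error, b the learning rate and c the decay rate.
    """
    return (1.0 - a) - b * (x ** c)


# Hypothetical parameter values, chosen only to show the shape of the curve.
a, b, c = 0.1, 0.5, -0.5
sizes = [100, 400, 1600]
predicted = [learning_curve_accuracy(x, a, b, c) for x in sizes]
```

With a negative decay rate c, the predicted accuracy rises toward 1 - a as the training sample size grows.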

Regarding claim 2, Figueroa, Nowozin, and Poh teach the method of claim 1.
Nowozin also teaches wherein the selecting, causing, estimating, and pruning operations are performed iteratively (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Per Fig. 3 selecting the features (options), estimation of confidence interval, and removing process is repeated until the desired result is achieved.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.

Regarding claim 3, Figueroa, Nowozin, and Poh teach the method of claim 2.
Nowozin also teaches wherein the approximate best machine-learning configuration is a last machine-learning configuration remaining within the set upon iterative pruning (Para 0034: “The racing logic 108 is used to calculate 308 a confidence interval for the score for each option. The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” The last option that remains is the desired or optimal configuration.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.
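For illustration only, the repeat-until-one-remains behavior quoted from Nowozin Para 0034 can be sketched as below. This is a minimal sketch under stated assumptions, not Nowozin's implementation: the function names, the data layout (a mapping from option name to score, lower bound, and upper bound), and the example values are all hypothetical, and the confidence intervals are supplied as inputs rather than computed.

```python
def prune_non_overlapping(options):
    """Keep options whose interval overlaps that of the highest scoring option.

    options maps a name to (score, lower, upper); the layout is hypothetical.
    """
    best = max(options, key=lambda name: options[name][0])
    _, best_lo, best_hi = options[best]
    return {
        name: (score, lo, hi)
        for name, (score, lo, hi) in options.items()
        if lo <= best_hi and hi >= best_lo  # closed intervals overlap
    }


def race(rounds):
    """Apply pruning round by round until at most one option remains."""
    survivors = None
    for options in rounds:
        if survivors is not None:
            # Only options that survived earlier rounds stay in the race.
            options = {n: v for n, v in options.items() if n in survivors}
        survivors = set(prune_non_overlapping(options))
        if len(survivors) <= 1:
            break
    return survivors


# Hypothetical intervals that narrow from one round to the next.
rounds = [
    {"A": (0.80, 0.70, 0.90), "B": (0.60, 0.55, 0.65), "C": (0.75, 0.68, 0.82)},
    {"A": (0.81, 0.78, 0.84), "B": (0.61, 0.58, 0.64), "C": (0.74, 0.71, 0.77)},
]
winner = race(rounds)
```

In this example option B is removed in the first round and option C in the second, leaving option A as the last remaining option.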

Regarding claim 4, Figueroa, Nowozin, and Poh teach the method of claim 2.
Figueroa also teaches wherein, for each of the machine-learning configurations, the associated sample sizes increase progressively over repeated training iterations (Page 2 Last Para: “Another research area related to our work is progressive sampling. Both active learning and progressive sampling start with a very small batch of instances and progressively increase the training data size until a termination criteria is met [31-36].”)
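For illustration only, the progressively increasing training set sizes quoted from Figueroa in the claim 1 mapping (xj = k·j for j = 1, ..., m) can be sketched as below. The function names, the seed, and the toy dataset are hypothetical, and uniform random sampling without replacement is an assumption.

```python
import random


def progressive_sample_sizes(k, m):
    """Training set sizes x_j = k * j for j = 1..m."""
    return [k * j for j in range(1, m + 1)]


def draw_samples(dataset, k, m, seed=0):
    """Draw one random sample per size, skipping sizes larger than the dataset."""
    rng = random.Random(seed)
    sizes = [s for s in progressive_sample_sizes(k, m) if s <= len(dataset)]
    return [rng.sample(dataset, s) for s in sizes]


# Toy dataset of 100 hypothetical instances, sampled at sizes 20, 40, ..., 100.
dataset = list(range(100))
samples = draw_samples(dataset, k=20, m=5)
```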

Regarding claim 6, Figueroa, Nowozin, and Poh teach the method of claim 1.
Figueroa also teaches test value of the quality metric is determined for the trained machine-learning configuration based on a sampled test dataset (Page 3 Last Para: “Classification accuracy points (yj), i.e. the proportion of correctly classified samples, can be calculated at each training sample size xj using an independent test set or through n-fold cross validation” Quality metric (classification accuracy) of a machine learning model is calculated using a test set).

Regarding claim 7, Figueroa, Nowozin, and Poh teach the method of claim 1.
Nowozin also teaches wherein the upper bound of the estimated confidence interval is greater than the training value (Per the upper bound formula in para 0039, the upper bound of the CI will be higher than the training score (or training value) if the CI is calculated using the training dataset.)
and the lower bound of the estimated confidence interval is smaller than the test value (Per the formula in para 0039, the lower bound of the CI will be lower than the test score (or test value or information) if the CI is calculated using the test dataset.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.
Nowozin at least implies wherein the upper bound of the estimated confidence interval is greater than the training value and the lower bound of the estimated confidence interval is smaller than the test value (see mappings above); however Poh teaches wherein the upper bound of the estimated confidence interval is greater than the training value and the lower bound of the estimated confidence interval is smaller than the test value (sections 3.2 and 5 teach using a “31-user data set” training set and a “64-user data set” testing set for establishing and validating a confidence region with calculated “upper and lower bounds” (upper/lower bound being computed) from “sample variability” and “coverage” (training and test values of the quality metric) and taught as being within the bounds).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.

Regarding claim 8, Figueroa, Nowozin, and Poh teach the method of claim 1.
Figueroa also teaches wherein the quality metric measures an accuracy of predictions made by the trained machine-learning configuration (Page 2 Para 1: “The published criteria are generally based on target accuracy, classifier confidence, uncertainty estimation, and minimum expected error.” Page 2 Para 2: “Within this category we find methods that predict the sample size required for a classifier to reach a particular accuracy [2,4,26].” Page 3 Last Para: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created.” Classification accuracy of machine learning model is used as quality metric in Figueroa.).

Regarding claim 9, Figueroa, Nowozin, and Poh teach the method of claim 1.
Nowozin also teaches the one or more machine-readable media of claim 1, wherein pruning the set of machine learning configurations comprises determining, among lower bounds of the confidence intervals of the machine-learning configurations within the set, a highest lower bound, and removing any machine-learning configuration from the set whose confidence interval has an upper bound that exceeds the highest lower bound by no more than a prescribed loss tolerance (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302.” Para 0032: “The racing logic may use the confidence intervals to decide whether enough training examples have been received to give an option selection which is accurate within a specified error tolerance.” Options (or configurations) whose confidence interval falls outside the CI of the highest scoring option are identified. This is only possible when the lower bound of the CI of the highest scoring option is higher than the upper bound of the CI of a given option, if there is no error or the error is within the specified error tolerance. Those options (or configurations) are removed or pruned and the remaining options are selected.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.
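For illustration only, the pruning rule recited in claim 9 (find the highest lower bound, then remove any configuration whose upper bound exceeds it by no more than a loss tolerance) can be sketched as below. The function name, data layout, and values are hypothetical. Keeping the configuration that holds the highest lower bound is an assumption of this sketch, made so that the set is never emptied; it is not stated in the claim or in the cited art.

```python
def prune_with_tolerance(intervals, tolerance):
    """Remove configurations that cannot beat the best by more than tolerance.

    intervals maps a configuration name to (lower, upper); the layout is
    hypothetical.
    """
    best = max(intervals, key=lambda name: intervals[name][0])
    highest_lower = intervals[best][0]
    return {
        name: (lo, hi)
        for name, (lo, hi) in intervals.items()
        # Assumption: the holder of the highest lower bound is always kept.
        if name == best or hi - highest_lower > tolerance
    }


# Hypothetical confidence intervals for four configurations.
intervals = {
    "A": (0.78, 0.84),
    "B": (0.55, 0.65),
    "C": (0.70, 0.80),
    "D": (0.72, 0.90),
}
remaining = prune_with_tolerance(intervals, tolerance=0.05)
```

Here B and C are removed because their upper bounds exceed the highest lower bound (0.78) by no more than 0.05, while D is kept because its upper bound exceeds it by 0.12.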

Regarding claim 12, Figueroa teaches 


training the selected machine-learning configuration based on the sampled training dataset and determining a training accuracy associated with the trained selected machine learning configuration (Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, xj = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” A given classifier (containing a machine learning configuration) is trained on a training set size. A learning curve is generated as a result of training the classifier) determining a training accuracy associated with the trained selected machine learning configuration (Page 4 Para 1: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Training accuracy is calculated using Equation 1.)
evaluating the trained selected machine-learning configuration based on the sampled test dataset to determine a test accuracy associated with the trained selected machine learning configuration (Page 4 Para 1: “Learning curves can generally be represented using inverse power law functions [1,27,37,38]. Equation (1) describes the classifier’s accuracy (Yacc) as function of the training sample size x with the parameters a, b, and c representing the minimum achievable error, learning rate and decay rate respectively.” Training accuracy is calculated as a function of the sample dataset.)
determining a confidence interval associated with the trained selected machine learning configuration based at least in part on the training and test accuracies (Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve.” Page 8: “Figure 5 Progression of confidence interval widths for the observed values (training set) and the predicted values.” Confidence intervals are generated for the estimated accuracy from the training and test sets), the confidence interval providing estimated upper and lower bounds of a real test accuracy if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training accuracy and the lower bound being computed from the test accuracy (Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve”. Page 5 Section Results Para 2 and 5: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red) (confidence interval providing upper and lower bounds)…the predicted confidence interval narrows dramatically as more samples are used and the prediction becomes more accurate”; here the confidence interval bounds are taught as being dependent on the sets’ accuracies (the upper bound being computed from the training value and the lower bound being computed from the test value).
Page 4 Section Evaluation: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each while 2,500 instances were set apart as test set in D3”. Dataset D2 is split into a training set and a test set. A confidence interval of the quality metric (in this case accuracy) is calculated for each dataset, including D2 (which contains both training and test data). Figueroa further states that cross validation is used for testing or validation and the confidence interval is calculated accordingly, as mentioned on Page 7 Para 4: “In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. 1 round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.”);



However, Figueroa does not explicitly teach a method comprising: iteratively pruning a set of machine-learning configurations based on a training dataset and a test dataset by using one or more hardware processors to perform operation comprising, in each of a plurality of iterations; pruning the set of machine-learning configurations based on comparisons between the determined confidence interval and confidence intervals associated with other machine learning configurations within the set; and selecting one of the machine-learning configurations remaining within the pruned set for a next iteration.
Nowozin teaches a method comprising: iteratively pruning a set of machine-learning configurations based on a training dataset and a test dataset by using one or more hardware processors to perform operation comprising, in each of a plurality of iterations (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Para 0024: “Each of the option selector 102, scoring logic 106 and racing logic 108 are computer implemented using software and/or hardware.”).
pruning the set of machine-learning configurations based on comparisons between the determined confidence interval and confidence intervals associated with other machine learning configurations within the set (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”)
selecting one of the machine-learning configurations remaining within the pruned set for a next iteration (Para 0034: “The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” The last option that remains is the desired or optimal configuration until new data is available).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.
Further, Figueroa at least implies determining a confidence interval associated with the trained selected machine learning configuration based at least in part on the training and test accuracies, the confidence interval providing estimated upper and lower bounds of a real test accuracy if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training accuracy and the lower bound being computed from the test accuracy (see mappings above); however Poh teaches determining a confidence interval associated with the trained selected machine learning configuration based at least in part on the training and test accuracies, the confidence interval providing estimated upper and lower bounds of a real test accuracy if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training accuracy and the lower bound being computed from the test accuracy (sections 3.2 and 5 teach using a “31-user data set” training set and a “64-user data set” testing set for establishing and validating a confidence region with calculated “upper and lower bounds” (upper/lower bound being computed) from “sample variability” and “coverage” (training and test values of the quality metric)).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.

Regarding claim 13, Figueroa, Nowozin, and Poh teach the method of claim 12.
Nowozin also teaches wherein pruning the set of machine-learning configurations comprises comparing, among the confidence intervals associated with the machine-learning configurations within the set, a confidence interval having a highest lower bound against all other confidence intervals (Para 0034: “The racing logic 108 identifies those options which meet confidence interval conditions. For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.” Para 0031 and Fig. 2 also show that feature B has a higher lower bound than feature A; when feature A and feature B are compared, feature B is selected as having the highest information gain) and removing from the set of machine-learning configurations any machine-learning configuration whose associated confidence interval overlaps by no more than a prescribed loss tolerance with the confidence interval having the highest lower bound (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302.” Para 0032: “The racing logic may use the confidence intervals to decide whether enough training examples have been received to give an option selection which is accurate within a specified error tolerance.” Options (or configurations) whose confidence interval falls outside the CI of the highest scoring option are identified. This is only possible when the lower bound of the CI of the highest scoring option is higher than the upper bound of the CI of a given option, either with no error or with error within the specified error tolerance. Those options (or configurations) are removed or pruned, and the remaining options are selected.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.
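
For illustration only, the interval comparison mapped above for claim 13 can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Nowozin or from the application: the function name `prune_by_interval`, the `(lower, upper)` tuple representation, and the `tolerance` parameter are hypothetical, and the removal test (a candidate's upper bound exceeds the highest lower bound by no more than the tolerance) is one plausible reading of the claim language.

```python
def prune_by_interval(intervals, tolerance=0.0):
    """Return the configurations that survive one pruning pass.

    intervals: dict mapping a configuration name to a (lower, upper) tuple.
    tolerance: hypothetical loss tolerance; 0.0 means "remove only when the
               intervals do not overlap at all".
    """
    # Reference configuration: the one whose interval has the highest lower bound.
    best = max(intervals, key=lambda name: intervals[name][0])
    best_lower = intervals[best][0]

    kept = {}
    for name, (lower, upper) in intervals.items():
        # Amount by which this candidate's upper bound reaches past the
        # highest lower bound; zero or negative means no overlap.
        overlap = upper - best_lower
        if name == best or overlap > tolerance:
            kept[name] = (lower, upper)
    return kept
```

With `tolerance=0.0` the sketch removes only candidates whose intervals are disjoint from the reference interval, which corresponds to the non-overlap condition quoted from Para 0034; a positive tolerance removes more aggressively.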

Regarding claim 14, Figueroa, Nowozin, and Poh teach the method of claim 12.
Nowozin also teaches wherein the sampling schedules associated with the machine learning configurations increase at least a sample size of the sampled training dataset over repeated training of a same machine-learning configuration (Para 0031 and 0032: Per Fig. 2, the sample size of training examples increases for machine learning features or options; once it reaches 500, the information gain settles down. Para 0034: The options/features evaluation process is repeated until only one option remains.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.

Regarding claim 16, Figueroa, Nowozin, and Poh teach the method of claim 12.
Figueroa also teaches wherein the confidence interval is determined based further on sample sizes of the sampled training dataset and the sampled test dataset (“Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red).” D2 is split into training and test dataset and confidence interval is created for D2 dataset for classification accuracy).

Regarding claim 17, Figueroa, Nowozin, and Poh teach the method of claim 12.
Nowozin also teaches wherein the set of machine-learning configurations is iteratively pruned until it consists of only one remaining machine-learning configuration (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as in claim 1.

Regarding claim 18, Figueroa teaches a system comprising: one or more hardware processors configured to implement a plurality of processing components for determining an approximate best machine-learning configuration among a set of machine-learning configurations (Page 7: “we intend to integrate the function to predict sample size into our NLP software”. Software is well known to be executed on one or more processors communicatively connected to memory included in a computer for performing the embodiments of the disclosure), the processing components comprising: a training and test component configured to, upon selection of one of the machine learning configurations within the set and determination of an associated sample size: cause a training dataset to be sampled in accordance with the sample size to obtain a sampled training dataset (Page 3 Section Learning curve creation: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, x j = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m} . Classification accuracy points (yj), i.e. the proportion of correctly classified samples, can be calculated at each training sample size xj using an independent test set or through n-fold cross validation” Classifiers (containing machine learning configurations) are tested against classification accuracy for best configuration for a given sample size.)
train the selected machine learning configuration on the sampled training dataset to optimize a training value of a quality metric (Page 2 Section Learning curve fitting Para 1: “A learning curve is a collection of data points (xj, yj) that in this case describe how the performance of a classifier (yj) is related to training sample sizes (xj), where j = 1 to m, m being the total number of instances” Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, x j = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” The training set size is determined. Classifiers are tested at the determined training set sizes. A learning curve results from the training of the classifiers. Classification accuracy is the quality metric used)
cause the trained machine-learning configuration to be tested on at least a sample of a test dataset to determine a test value of the quality metric (Page 3 Section Learning curve creation Para 1: “Assuming the target performance measure is classification, a learning curve that characterizes classification accuracy (Yacc), as a function of the training set size (X) is created. To obtain the data points (xj, yj), classifiers are created and tested at increasing training set sizes xj. With a batch size k, x j = k·j, j = 1, 2,...,m, i.e. xj = {k, 2k, 3k, ..., k · m}.” Classifiers (or machine learning configurations) are tested on training set sizes for a quality metric (which is classification accuracy)); and
a sampling and scheduling component configured to: compute from the training and test values of the quality metric, confidence interval providing upper and lower bounds for a real value of the quality metric if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (Page 4 Section Performance Prediction: “Additionally, the 95% confidence interval of the estimated accuracy ŷs is also calculated by using Hessian matrix and the second-order derivatives on the function describing the curve”. Page 5 Section Results Para 2 and 5: “Figure 2 shows the curve fitting and prediction results for the random sampling learning curve using D2 data at different sample sizes. In Figure 2a the curve was fitted using 6 data points; the predicted curve (blue) deviates slightly from the actual data points (black), though the actual data points do fall in the relatively large confidence interval (red) (confidence interval providing upper and lower bounds)…the predicted confidence interval narrows dramatically as more samples are used and the prediction becomes more accurate”; here the confidence interval bounds are taught as being dependent on the sets’ accuracies (the upper bound being computed from the training value and the lower bound being computed from the test value). Page 4 Section Evaluation: “Each dataset was randomly split into a training set and a testing set. Test sets for D1 and D2 contained 1,000 instances each while 2,500 instances were set apart as test set in D3.” Dataset D2 is split into a training set and a test set. A confidence interval of a quality metric (in this case accuracy) is calculated for each dataset, including D2 (which contains both training and test data sets).
Figueroa further states that cross validation is used for testing or validation and confidence interval is calculated accordingly as mentioned on Page 7 Para 4: “In many studies, as well as ours, the learning curves appear to be smooth because each data point on the curve is assigned the average value from multiple experiments (e.g. 10-fold cross validation repeated 100 times). With fewer experiments (e.g. 1 round of training and testing per data point), the curve will not be as smooth. We expect the model fitting to be more accurate and the confidence interval to be narrower on smoother curves, though the fitting process remains the same for the less smooth curves.”), and
iteratively prune the set of machine-learning configurations based on the confidence intervals, select machine-learning configurations for training by the training and test component, and determine, for the selected machine-learning configurations, associated sample sizes for the sampled training dataset.

However, Figueroa does not teach iteratively prune the set of machine-learning configurations based on the confidence intervals, select machine-learning configurations for training by the training and test component, and determine, for the selected machine-learning configurations, associated sample sizes for the sampled training dataset.
Nowozin teaches iteratively prune the set of machine-learning configurations based on the confidence intervals, select machine-learning configurations for training by the training and test component (Para 0034: “For example, options whose confidence intervals do not overlap with the confidence interval of the highest scoring option are identified. The identified options are removed 314 from the list of potential options at step 302 and the process may repeat until only one option remains 312 or only a specified number of options remains.”)
and determine, for the selected machine-learning configurations, associated sample sizes for the sampled training dataset (Para 0031: “In the example of FIG. 2 there are two options which are feature A and feature B. although in practice many more options may be present (two are shown for clarity… Once more than 500 samples have been received the information gain scores settled down and are clearly separated for features A and B with feature B having a higher information gain score.” Associated sample size is determined for machine learning features/configurations from training dataset.).
Further, Figueroa at least implies a system comprising: one or more hardware processors configured to implement a plurality of processing components; however, Nowozin teaches a system comprising: one or more hardware processors configured to implement a plurality of processing components (Para 0112: “The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 504.”).
Same motivation to combine the teachings of Figueroa and Nowozin as claim 1.
Further still, Figueroa at least implies compute from the training and test values of the quality metric, confidence interval providing upper and lower bounds for a real value of the quality metric if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (see mappings above); however Poh teaches compute from the training and test values of the quality metric, confidence interval providing upper and lower bounds for a real value of the quality metric if the selected machine-learning configuration were trained and tested on the training and test datasets in their entirety, the upper bound being computed from the training value and the lower bound being computed from the test value (sections 3.2 and 5 teach using a “31-user data set” training set and a “64-user data set” testing set for establishing and validating a confidence region with calculated “upper and lower bounds” (upper/lower bound being computed) from “sample variability” and “coverage” (training and test values of the quality metric)).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.
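
For illustration only, the learning-curve procedure quoted from Figueroa in the claim 18 mapping above (classifiers created and tested at increasing training set sizes xj = k·j) can be sketched as follows. This is a minimal sketch under stated assumptions, not Figueroa's implementation: the names `learning_curve` and `fit_majority`, the `(feature, label)` data layout, and the toy majority-label classifier are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter


def fit_majority(sample):
    """Hypothetical toy classifier: always predicts the most common label."""
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda feature: label


def learning_curve(train, test, batch_size, fit, seed=0):
    """Return a list of (x_j, training accuracy, test accuracy) points,
    where x_j = batch_size * j for j = 1, 2, ..., m.

    train, test: lists of (feature, label) pairs.
    fit: callable that takes a list of pairs and returns predict(feature).
    """
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)  # random sampling of the training dataset

    def accuracy(predict, data):
        # Proportion of correctly classified samples.
        return sum(predict(x) == y for x, y in data) / len(data)

    points = []
    for size in range(batch_size, len(shuffled) + 1, batch_size):
        sample = shuffled[:size]   # sampled training dataset of size x_j
        predict = fit(sample)      # train on the sample
        points.append((size, accuracy(predict, sample), accuracy(predict, test)))
    return points
```

Each returned point pairs a training value and a test value of the quality metric at one sample size, which is the kind of input from which a confidence interval could then be computed; the sketch deliberately does not attempt that computation.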

Regarding claim 19, Figueroa, Nowozin, and Poh teach the system of claim 18.
Nowozin also teaches wherein the processing components further comprise: a data sampler configured to sample the training dataset based on the sample sizes determined by the sampling and scheduling component for the selected machine-learning configurations (Fig. 1 shows samples are derived from datasets for training and providing scores to features/option. Para 0031: “Once more than 500 samples have been received the information gain scores settled down and are clearly separated for features A and B with feature B having a higher information gain score.” Fig. 2 also plots information gain for multiple features of training samples at different sample sizes.).
Same motivation to combine the teachings of Figueroa, Nowozin, and Poh as claim 1.

Regarding claim 20, Figueroa, Nowozin, and Poh teach the system of claim 18.
Figueroa also teaches wherein the sampling and scheduling component determines the sample sizes for each of the machine-learning configurations based on a predetermined progressive sampling schedule associated with that machine-learning configuration (Page 2 Last Para: “Another research area related to our work is progressive sampling. Both active learning and progressive sampling start with a very small batch of instances and progressively increase the training data size until a termination criteria is met [31-36].”)

Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Figueroa et al. (“Predicting sample size required for classification performance”), in view of Nowozin et al. (US20140172753A1), in view of Poh et al. (“Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap”, 2007), further in view of Bergstra et al. (“Random search for hyper-parameter optimization.” ~ IDS).

Regarding claim 5, Figueroa, Nowozin, and Poh teach the method of claim 4.
Neither Figueroa nor Nowozin explicitly teaches wherein the associated sample sizes increase geometrically over repeated training iterations.
Bergstra, however, teaches wherein the associated sample sizes increase geometrically over repeated training iterations (Fig. 6 shows that the number of trials/samples increases geometrically (by multiples of 2). Section 1 Para 1 and 2: training is performed iteratively on a training set.).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric of Figueroa as modified by Nowozin with the sampling method of Bergstra to find the right sampling size that determines the best or better model/configuration at a given computational budget (Bergstra, Fig. 6).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric, as taught by Figueroa as modified by pruning mechanism of options or configurations as taught by Nowozin, as modified by machine learning algorithm upper and lower confidence region calculations as taught by Poh, to include the sampling method of Bergstra to find the right sampling size that determines the best or better model/configuration at a given computational budget (Bergstra, Fig. 6).
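
For illustration only, sample sizes that increase geometrically over repeated training iterations can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Bergstra or from the application: the function name `geometric_schedule` and its parameters are hypothetical, and capping the final size at the full dataset size is an assumption made for the example.

```python
def geometric_schedule(initial_size, factor, max_size):
    """Return sample sizes that grow by `factor` per iteration, ending at max_size."""
    if initial_size < 1 or factor <= 1:
        raise ValueError("initial_size must be >= 1 and factor must be > 1")
    sizes = []
    size = initial_size
    while size < max_size:
        sizes.append(size)
        # Multiply by the growth factor; force progress of at least one
        # sample so that rounding down can never stall the loop.
        size = max(size + 1, int(size * factor))
    sizes.append(max_size)  # assumed: the last iteration uses the full dataset
    return sizes
```

For example, `geometric_schedule(100, 2, 1000)` yields `[100, 200, 400, 800, 1000]`, a doubling schedule of the kind the Fig. 6 mapping above refers to.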

Regarding claim 10, Figueroa, Nowozin, and Poh teach the method of claim 1.
Neither Figueroa nor Nowozin explicitly teaches wherein selection of a machine learning configuration for training is based at least in part on training costs associated with reducing the confidence intervals of the machine-learning configurations within the set.
Bergstra, however, teaches wherein selection of a machine learning configuration for training is based at least in part on training costs associated with reducing the confidence intervals of the machine-learning configurations within the set (Fig. 6 shows that the confidence intervals, which are based on computational budget/cost, shrink as the number of samples/trials increases. The method used to calculate the confidence intervals is described in Fig. 2.).
Same motivation to combine the teachings of Figueroa, Nowozin, Poh, and Bergstra as claim 5.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Figueroa et al. (“Predicting sample size required for classification performance”), in view of Nowozin et al. (US20140172753A1), in view of Poh et al. (“Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap”, 2007), in view of Luo (“A review of automatic selection methods for machine learning algorithms and hyperparameter values.”).

Regarding claim 11, Figueroa, Nowozin, and Poh teach the method of claim 1.
Neither Figueroa nor Nowozin explicitly teaches wherein the approximate best machine-learning configuration is one of one or more machine-learning configurations remaining within the pruned set when a time limit has been reached.
Luo, however, teaches wherein the approximate best machine-learning configuration is one of one or more machine-learning configurations remaining within the pruned set when a time limit has been reached (Page 3 Para 1: “Third, the researcher trains the machine learning model to automatically optimize the ordinary parameters of the chosen algorithm…This process is repeated until the researcher obtains a model with satisfactory accuracy, runs out of time, or thinks that the model’s accuracy cannot be improved much further any more.” When the time limit is reached, or the evaluation process runs out of time, the search for the best configuration is stopped and a remaining configuration can be selected.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the use of training sample size for a given classifier’s configuration and confidence interval generation for quality metric, as taught by Figueroa as modified by pruning mechanism of options or configurations as taught by Nowozin, as modified by machine learning algorithm upper and lower confidence region calculations as taught by Poh, to include the time constraint as taught by Luo in order to determine model accuracy to optimize the parameters of algorithm (Luo, Page 3 Para 1).
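
For illustration only, stopping an iterative pruning search when a time limit is reached and returning whatever configurations remain can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Luo, Nowozin, or the application: the function name `search_with_time_limit`, the `step` callback (standing in for one train, evaluate, and prune round), and the injectable `clock` parameter are hypothetical.

```python
import time


def search_with_time_limit(configs, step, time_limit_s, clock=time.monotonic):
    """Apply `step` repeatedly until one configuration remains or the time
    limit is reached, then return the configurations still in the set.

    step: callable taking the current list and returning the pruned list.
    clock: callable returning the current time in seconds (injectable for tests).
    """
    deadline = clock() + time_limit_s
    remaining = list(configs)
    while len(remaining) > 1 and clock() < deadline:
        remaining = step(remaining)
    return remaining
```

The time check happens only between rounds, so a round that is already running is allowed to finish; a caller would then pick one of the returned configurations as the approximate best.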

Allowable Subject Matter
Claim 15 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Paik et al. (US Pub 20170046839) teaches machine learning models with tuned parameters, in which split training and test data are used to obtain model confidence intervals.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123


/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123