DETAILED ACTION
Response to Arguments
Applicant’s arguments, see Applicant Arguments/Remarks, filed 05/12/2022, with respect to the objection to the specification and the rejection under 35 USC § 112(b) have been fully considered and are persuasive. They have been withdrawn. 
Applicant’s arguments, see Applicant Arguments/Remarks, filed 05/12/2022, with respect to 35 USC § 103 have been fully considered but they are not persuasive. 
Applicant argues that neither Wang nor Mueller teaches, suggests, or discloses executing software under test at all, or with test configurations selected based on test results predicted by a trained predictive model in which the test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test, as recited in independent claims 1, 29, and 30. See pages 16-22 of Applicant Arguments/Remarks filed 05/12/2022.
Examiner respectfully disagrees. As a preliminary matter, in response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies in the arguments (i.e., test configurations selected based on test results predicted by a trained predictive model) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Nevertheless, the prior art of Wang does teach executing software under test (Wang, paras. 0017-0018, “Each machine-learning configuration (i.e., under test) within the set 102 may specify a mathematical model (e.g., an equation or algorithm) (i.e., software) for predicting output data from input data, in conjunction with a learning algorithm for setting adjustable parameters of the model to fit the training dataset 107 (i.e., executing).”) with test configurations selected based on test results predicted by a trained predictive model (Wang, para. 0027, see also fig. 3A, “FIG. 3A shows the upper and lower confidence bounds for all five configurations (i.e., with test configurations) at a time within the iterative training and testing process when all configurations have been trained and tested on respective sampled datasets (i.e., selected based on test results predicted by a trained predictive model).”), and the newly added prior art of Groce teaches that the test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test (Groce, pg. 313, left-column, “Fig. 3 graphs these differences: the efficiencies shown are averages of rates over all suites (whose sizes ranged from five to 25 test cases) and all classifiers at each training set size…[e]ven given this extremely accurate classifier, 63 percent of the CONFIDENCE-generated test cases detected failures (i.e., the test configurations are selected to result in failed execution of the software under test).”).1 Accordingly, Wang in view of Groce and in view of Mueller teaches, suggests, or discloses the limitations of independent claims 1, 29, and 30. The rejection under 35 USC § 103 has not been withdrawn.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4, 6-7, 11-14, 21, and 27-30 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US 2020/0065712 A1 (“Wang”), in view of Groce et al., "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems," IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”), and in view of Mueller et al., US 2021/0326717 A1 (“Mueller”).
Regarding claim 1, Wang teaches a non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to: (A) execute software under test with a first plurality of test configurations to generate a test result for each test configuration of the first plurality of test configurations, wherein each test configuration of the first plurality of test configurations includes a value for each test parameter of the plurality of test parameters, wherein each test parameter of the plurality of test parameters is an input to the software under test (Wang, para. 0028, see also fig. 4 and fig. 1, “With reference to FIG. 4, a method 400 for the efficient automated determination of an approximate best machine-learning configuration, in accordance with various example embodiments, will now be described in more detail. The method 400 takes, at 402, an initial set C of |C| = n candidate configurations (corresponding to set 102), training and test datasets (107, 108), and a prescribed loss tolerance (e.g., accuracy-loss tolerance) ϵ as inputs” & see also Wang, paras. 0017-0018, “Each machine-learning configuration within the set 102 may specify a mathematical model (e.g., an equation or algorithm) for predicting output data from input data, in conjunction with a learning algorithm for setting adjustable parameters of the model to fit the training dataset 107. The model and/or learning algorithm may further include hyperparameters that are fixed for a given configuration, but can differ between potentially multiple configurations for a model and learning algorithm of a given type…[t]he machine-learning configurations forming the candidate set 102, including the kinds of models and algorithms contained therein, generally depend on the particular machine-learning task and type of data they pertain to. For a task involving the prediction of a dependent quantitative variable from an independent quantitative variable, for instance, the candidate set 102 may include a decision tree and/or one or more regression models specifying candidate functional relationships between the variables. As another example, for a classification task, the models within the candidate set 102 may include, without limitation, a naïve Bayes classifier, a decision tree or random forest, a logistic regression model, and/or one or more artificial neural networks (with possibly various network architectures).
Machine-learning configurations for neural-network models, in turn, may specify various associated learning algorithms (e.g., backpropagation of errors, or reinforcement learning with various rewards), and differ in hyperparameters such as the number of layers within a network, or the step size used when adjusting network weights in the learning process.”); (B) train a predictive model using each test configuration of the first plurality of test configurations and a target variable value that is the test result generated for each test configuration of the first plurality of test configurations based on an objective function value (Wang, para. 0027, see also fig. 3A, “FIG. 3A shows the upper and lower confidence bounds for all five configurations at a time within the iterative training and testing process when all configurations have been trained and tested on respective sampled datasets. The best-performing configuration at this time is configuration C1. As can be seen, the upper bound 300 of the confidence interval for configuration C5 is below the lower bound 302 of the confidence interval for configuration C1. Thus, configuration C5 can be removed from the candidate set” & see also Wang, para. 0019, “In the context of predicting dependent variables from independent variables, the output items are the actual values of the dependent variables for given independent-variable inputs. The prediction accuracy of a trained model can, in this case, be determined as a function of the discrepancy between actual and predicted output values (e.g., as measured by the sum of squared errors)” & see also Wang, paras. 0017-0018, “Each machine-learning configuration within the set 102 may specify a mathematical model (e.g., an equation or algorithm) for predicting output data from input data, in conjunction with a learning algorithm for setting adjustable parameters of the model to fit the training dataset 107.
The model and/or learning algorithm may further include hyperparameters that are fixed for a given configuration, but can differ between potentially multiple configurations for a model and learning algorithm of a given type…[t]he machine-learning configurations forming the candidate set 102, including the kinds of models and algorithms contained therein, generally depend on the particular machine-learning task and type of data they pertain to. For a task involving the prediction of a dependent quantitative variable from an independent quantitative variable, for instance, the candidate set 102 may include a decision tree and/or one or more regression models specifying candidate functional relationships between the variables. As another example, for a classification task, the models within the candidate set 102 may include, without limitation, a naïve Bayes classifier, a decision tree or random forest, a logistic regression model, and/or one or more artificial neural networks (with possibly various network architectures). Machine-learning configurations for neural-network models, in turn, may specify various associated learning algorithms (e.g., backpropagation of errors, or reinforcement learning with various rewards), and differ in hyperparameters such as the number of layers within a network, or the step size used when adjusting network weights in the learning process.”); (C) execute the trained predictive model with a second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations, wherein each test configuration of the second plurality of test configurations includes the value for each test parameter of the plurality of test parameters (Wang, para. 0027, see also fig. 3B, “FIG. 3B shows the remaining candidate set a few iterations later with updated confidence intervals.
Now, configuration C2 has surpassed the performance of configuration C1 and has the highest associated lower bound 304, and configuration C3 has fallen below that lower bound 304 with its upper bound 306. Accordingly, configuration C3 is removed at this stage.”); (D) select a predefined number of test configurations from the second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations, wherein the selected predefined number of test configurations define a third plurality of test configurations; (E) execute the software under test with the defined third plurality of test configurations to generate the test result for each test configuration of the third plurality of test configurations, wherein each test configuration of the third plurality of test configurations includes the value for each test parameter of the plurality of test parameters (Wang, para. 0027, see also fig. 3C, “Still a few iterations later, as shown in FIG. 3C, the upper bounds 308, 310 for configurations C1 and C4 are below the updated lower bound 304 of configuration C2. Thus, configurations C1 and C4 can be pruned, leaving only configuration C2 as the approximate best configuration in the candidate set.”).
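For illustration only, the sequence recited in limitations (A) through (E) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the application, Wang, or Groce: the function software_under_test, the two integer test parameters, the use of execution time as the test result, and the nearest-neighbour predictor standing in for the trained predictive model are all hypothetical.

```python
import random

def software_under_test(config):
    """Hypothetical stand-in for the software under test; returns an execution time."""
    x, y = config
    return 1.0 + x * y  # toy execution time that grows with both parameters

def predict(training, config):
    """1-nearest-neighbour prediction (stand-in for a trained predictive model)."""
    nearest = min(
        training,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], config)),
    )
    return nearest[1]

def select_and_run(first, second, k):
    # (A) execute the software under test with the first plurality of configurations
    training = [(cfg, software_under_test(cfg)) for cfg in first]
    # (B) "training" here just stores the (configuration, result) pairs for 1-NN
    # (C) predict the test result for each configuration of the second plurality
    predicted = [(cfg, predict(training, cfg)) for cfg in second]
    # (D) select the k configurations with the largest predicted execution time
    ranked = sorted(predicted, key=lambda pair: pair[1], reverse=True)
    third = [cfg for cfg, _ in ranked[:k]]
    # (E) execute the software under test with the selected third plurality
    return [(cfg, software_under_test(cfg)) for cfg in third]

rng = random.Random(0)  # fixed seed so the illustration is repeatable
first = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(20)]
second = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(50)]
results = select_and_run(first, second, k=5)
```

The sketch selects on predicted execution time only; selecting on predicted memory consumption or predicted failure would change the sort key in step (D) and nothing else.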
Wang does not teach: wherein the test result includes at least one of a severity code, a memory consumption value, or an execution time value, wherein the severity code indicates whether or not execution of the software under test failed using a respective test configuration, wherein the memory consumption value indicates an amount of computer memory used for execution of the software under test using the respective test configuration, wherein the execution time value indicates an amount of time used for execution of the software under test using the respective test configuration; wherein the third plurality of test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test; and for each test configuration of the third plurality of test configurations to identify errors in the software under test. 
However, Groce teaches: wherein the test result includes at least one of a severity code, a memory consumption value, or an execution time value, wherein the severity code indicates whether or not execution of the software under test failed using a respective test configuration, wherein the memory consumption value indicates an amount of computer memory used for execution of the software under test using the respective test configuration, wherein the execution time value indicates an amount of time used for execution of the software under test using the respective test configuration (Groce, pg. 311, left-column, “From a software testing perspective, CONFIDENCE, a method based on prioritizing test cases in ascending order of p in (i, l, p) (such that cases where the label has the lowest probability are tested first), is analogous to asking the software’s original programmer to prioritize testing code most likely to fail—but in our case, the “programmer” is also software. Thus, the CONFIDENCE approach is a prioritization method that capitalizes on the ability of classifiers to “find their own bugs” by selecting cases where they have low confidence… [c]onfidence can be measured in a variety of ways. We compute confidence as the magnitude of the probability assigned to the most likely labeling, and prioritize test cases according to those with the lowest probabilities. CONFIDENCE selects ambiguous test instances—instances that fall on or close to decision boundaries.”)2; wherein the third plurality of test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test (Groce, pg. 313, left-column, “Fig. 3 graphs these differences: the efficiencies shown are averages of rates over all suites (whose sizes ranged from five to 25 test cases) and all classifiers at each training set size. For all but the smallest training sets (200-500 instances), differences between all pairs of methods, except where data points coincide, are significant at the 95 percent confidence level. Fig. 3g shows 95 percent confidence intervals at three training set sizes of Fig. 3a’s configuration. As the Fig. 3 illustrates, the best methods were very efficient at identifying failures. For example, consider the RANDOM line in Fig. 3c. RANDOM is statistically guaranteed to detect failures at the rate they occur, and thus is also a statistical representative of the classifier’s accuracy. This indicator shows that the Reuters SVM classifier was extremely accurate when trained on 2,000 instances (the rightmost point on the x-axis), with a failure rate of only 3.5 percent.
Even given this extremely accurate classifier, 63 percent of the CONFIDENCE-generated test cases detected failures.”)3; and for each test configuration of the third plurality of test configurations to identify errors in the software under test(Groce, pg. 316, left-column, “Our prototype prioritizes the classifier’s topic predictions that are most likely to be wrong, and communicates these prioritizations using saturated green squares to draw a user’s eye (e.g., Fig. 1, fourth message). The prioritizations may not be perfect, but they are only intended to be advisory; users are free to test any messages they want, not just ones the system suggests.”).	
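For illustration only, the CONFIDENCE prioritization quoted above (confidence computed as the probability assigned to the most likely label, with the lowest-confidence cases tested first) can be sketched as follows. This is a minimal sketch, not Groce's implementation; the message identifiers, topic labels, and probabilities are hypothetical.

```python
def confidence_order(predictions):
    """Return test-case ids in ascending order of confidence.

    predictions maps a test-case id to a dict of label -> probability.
    Confidence is the probability of the most likely label.
    """
    return sorted(predictions, key=lambda case_id: max(predictions[case_id].values()))

# Hypothetical classifier outputs for three messages.
predictions = {
    "msg1": {"sports": 0.95, "politics": 0.05},
    "msg2": {"sports": 0.55, "politics": 0.45},
    "msg3": {"sports": 0.20, "politics": 0.80},
}
order = confidence_order(predictions)
```

Here msg2 is ordered first because its most likely label has the lowest probability, which places it nearest a decision boundary in the sense of the quoted passage.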
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with the above teachings of Groce. The motivation to do so would be to test machine learning classifiers just as typical software is tested before being put into production (Groce, pgs. 307-308, “A classifier can mislabel an input (fail) even if the algorithm that generated the classifier is correctly implemented. Failures occur for a variety of reasons such as noise in the training data, overfitting due to a limited amount of training data, underfitting because the classifier’s decision boundaries are not expressive enough to correctly capture the concept to be learned, and sample selection bias, where training instances are biased towards inputs uncommonly seen in real-world usage. Further, many machine learning systems are interactive and continually learn from their users’ behavior. An interactive machine learning system may apply different output labels to the same input from one week to the next, based on how its user has recently interacted with the system… [t]his paper encapsulates our previous work, which introduced a user interface (WYSIWYT/ML)…to support end users testing classifiers, and makes the following additional contributions… [p]roposes a methodology for experimentally evaluating classifier testing methods for humans as a filtering process before costly user studies… [i]nvestigates our test selection methods in large-scale automated experiments, over a larger variety of benchmarks than possible with humans… [e]valuates which found failures are unexpected, making them both especially challenging and important to detect, a problem difficult to explore in human studies.”).
	Wang does not teach: and (F) output the generated test result for each test configuration of the first plurality of test configurations.  
	However, Mueller teaches: and (F) output the generated test result for each test configuration of the first plurality of test configurations (Mueller, paras. 0078-0081, see also fig. 8, “FIG. 8 is a diagram illustrating one exemplary interactive code exploration user interface 800 for viewing and/or modifying an automated machine learning pipeline exploration according to some embodiments. As shown in FIG. 8, an interactive code exploration may be presented to a user (e.g., via a web browser as a web application) that allows the user to explore code, run code, modify and run code, etc. In some embodiments, code for performing ML pipeline exploration may be presented to users to provide the users visibility into what particular pipelines are recommended to be tested, what preprocessing operations will be used, etc… The interactive code exploration user interface 800 may also include a section 815 to display results of the pipelines being run, and finally, a code section 820 to define different combinations of machine learning models and pipelines, each including values for a name, an ML algorithm to use, a set of hyperparameters, an identifier of a storage location storing a particular set of input values generated by one of the feature processing pipelines, and the like.”).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with the above teachings of Mueller. The motivation to do so would be to enable code-free machine learning in which only the data has to be provided (Mueller, paras. 0018-0019, “According to some embodiments, a code-free machine learning ("CML") service of a service provider network enables users to easily train high-quality custom machine learning (ML) models and/or pipelines for without necessarily needing to write code or have significant knowledge of ML concepts or techniques. In some embodiments, the CML service allows users to easily construct optimized ML pipelines by simply providing a training dataset—and possibly, nothing more.”).
Regarding claim 2, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein a model type of the predictive model is a random forest model type (Wang, para. 0018, “The machine-learning configurations forming the candidate set 102, including the kinds of models and algorithms contained therein, generally depend on the particular machine-learning task and type of data they pertain to. For a task involving the prediction of a dependent quantitative variable from an independent quantitative variable, for instance, the candidate set 102 may include a decision tree and/or one or more regression models specifying candidate functional relationships between the variables. As another example, for a classification task, the models within the candidate set 102 may include, without limitation, a naïve Bayes classifier, a decision tree or random forest, a logistic regression model, and/or one or more artificial neural networks (with possibly various network architectures).”).
Regarding claim 4, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein after (E), the computer-readable instructions further cause the computing device to: determine an importance value for each test parameter of the plurality of test parameters using a second predictive model trained using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations (Wang, paras. 0029-0031, see also fig. 4, “In each loop of the iterative process, the training and test datasets are sampled (e.g., by data sampler 114), at 410, based on the determined sample sizes. At 412, the probe configuration C_prob is trained on the sampled training dataset, and then evaluated on the sampled test dataset (or, in some embodiments, on the full test dataset) (e.g., by training and test component 112). In the course of training and testing, a quality metric characterizing the performance of the trained probe configuration C_prob is evaluated on the sampled training and test datasets. For example, if predictive accuracy is used as the quality metric, training and test accuracies are computed. At 414, the estimated confidence interval associated with the probe configuration C_prob is updated (e.g., by scheduling and pruning component 116) based on the training and test accuracies (or training and test values of some other quality metric), optionally in conjunction with other parameters.”), and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations (Wang, paras. 0029-0031, see also fig. 4, “At 416, the updated lower bound C_prob.l is compared against the lower bound C_i'.l of the current presumed best configuration C_i' and if C_prob.l > C_i'.l the presumed best configuration C_i' is updated to the probe configuration C_prob (and the lower bound C_i'.l is, accordingly, updated to C_prob.l). The remaining configurations set Ω is then pruned, at 418, based on comparisons between the lower bound C_i'.l of the new presumed best configuration (which, by virtue of the iterative updating of the presumed best configuration, is the configuration with the highest lower bound) and the upper bounds C.u… Following pruning (at 418), a configuration for the next probe is selected from the remaining configurations set Ω, and the associated sample sizes for sampling the training and test datasets are determined (e.g., by scheduling and pruning component 116) at 420.”); and output the determined importance value for each test parameter of the plurality of test parameters (Mueller, para. 0077, see also fig. 7 and fig. 6, “For additional detail, FIG. 7 is a diagram illustrating one exemplary user interface 700 for viewing trial results of an automated machine learning pipeline exploration according to some embodiments. This user interface 700 may be displayed, for example, when a user selects a particular trial in the second panel 610 of FIG. 6 and presents additional detail about a particular trial… a fourth panel 720 allows the user to view/download other pipeline artifacts (e.g., model code/weights, hyperparameter values).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Wang with the above teachings of Mueller for the same rationale stated at Claim 1.  
Regarding claim 6, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein each test configuration of the first plurality of test configurations is unique relative to other test configurations of the first plurality of test configurations (Wang, para. 0026, “FIGS. 3A-3C illustrate the progressive, confidence-interval-based pruning according to various embodiments with an example sequence of estimated confidence intervals for an example candidate set of initially five machine-learning configurations, labeled C1 through C5. For ease of illustration, pruning in the depicted example is performed with an accuracy-loss tolerance of zero, meaning that a configuration is deleted from the set only when its upper bound is at or falls below the lower bound of the highest-confidence interval among all the configurations (such that there is no longer a range of overlap between the two configurations).”).
Regarding claim 7, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein each test configuration of the third plurality of test configurations is unique relative to other test configurations of the third plurality of test configurations and is unique relative to the first plurality of test configurations (Wang, para. 0026, “FIGS. 3A-3C illustrate the progressive, confidence-interval-based pruning according to various embodiments with an example sequence of estimated confidence intervals for an example candidate set of initially five machine-learning configurations, labeled C1 through C5. For ease of illustration, pruning in the depicted example is performed with an accuracy-loss tolerance of zero, meaning that a configuration is deleted from the set only when its upper bound is at or falls below the lower bound of the highest-confidence interval among all the configurations (such that there is no longer a range of overlap between the two configurations).”).
Regarding claim 11, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein before (A), the computer-readable instructions further cause the computing device to define the first plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters (Wang, para. 0028, see also fig. 4, “At 404, the remaining configurations set Ω is initialized to C, the probe configuration $C_{prob}$ is initialized to a candidate configuration $C_1$ selected (e.g., randomly) from C, and a presumed best configuration $C_{i'}$ is initialized to the same candidate configuration $C_1$. Further, estimated confidence intervals for all configurations within the set C may be initialized, e.g., to the full possible range that the selected quality metric can assume (e.g., to a range from 0 to 1 for confidence intervals based on accuracy). At 406, an initial training sample size for configuration $C_1$ is determined, e.g., based on a predetermined sampling schedule associated with the configuration. An initial test sample size may also be determined at 406 in accordance with a sampling schedule for $C_1$.”).
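For illustration only, the random selection recited in claim 11 (one value per test parameter, drawn between that parameter's minimum and maximum) may be sketched as follows. The sketch is not taken from Wang, Groce, or Mueller; the function name `sample_configurations`, the parameter names, the bounds, and the choice of a uniform distribution are all hypothetical assumptions.

```python
import random


def sample_configurations(bounds, count, seed=0):
    """Draw `count` distinct configurations, one uniform value per parameter."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    seen = set()
    configurations = []
    while len(configurations) < count:
        # One value per test parameter, drawn between its minimum and maximum.
        values = tuple(rng.uniform(low, high) for low, high in bounds.values())
        if values in seen:  # keep each configuration unique
            continue
        seen.add(values)
        configurations.append(dict(zip(bounds.keys(), values)))
    return configurations


# Hypothetical test parameters with (minimum, maximum) bounds.
example_bounds = {"num_threads": (1.0, 8.0), "batch_size": (16.0, 256.0)}
first_plurality = sample_configurations(example_bounds, count=5)
```

Each returned dictionary holds one value per parameter, and the duplicate check keeps every configuration distinct from the others in the set.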
Regarding claim 12, Wang in view of Groce and in view of  Mueller teaches the non-transitory computer-readable medium of claim 11, wherein the value is selected based on a sampling method defined for each test parameter of the plurality of test parameters(Wang, para. 0022, see also fig. 1, “The data sampler 114 is configured to generate the sampled training and test datasets 118, 119 by sampling (e.g., randomly) from the full training and test datasets 107, 108, respectively, using sample sizes 122 determined by the scheduling and pruning component 116 and communicated to the data sampler 114, e.g., via the training and test component 112.”).  
Regarding claim 13, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein before (A), the computer-readable instructions further cause the computing device to define the plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters, wherein the first plurality of test configurations are randomly selected from the defined plurality of test configurations (Wang, para. 0028, see also fig. 4, “At 404, the remaining configurations set Ω is initialized to C, the probe configuration $C_{prob}$ is initialized to a candidate configuration $C_1$ selected (e.g., randomly) from C, and a presumed best configuration $C_{i'}$ is initialized to the same candidate configuration $C_1$. Further, estimated confidence intervals for all configurations within the set C may be initialized, e.g., to the full possible range that the selected quality metric can assume (e.g., to a range from 0 to 1 for confidence intervals based on accuracy). At 406, an initial training sample size for configuration $C_1$ is determined, e.g., based on a predetermined sampling schedule associated with the configuration. An initial test sample size may also be determined at 406 in accordance with a sampling schedule for $C_1$.”).
Regarding claim 14, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 13, wherein the second plurality of test configurations are the defined plurality of test configurations after excluding the selected first plurality of test configurations (Wang, para. 0027, see also fig. 3B, “FIG. 3B shows the remaining candidate set a few iterations later with updated confidence intervals. Now, configuration C2 has surpassed the performance of configuration C1 and has the highest associated lower bound 304, and configuration C3 has fallen below that lower bound 304 with its upper bound 306. Accordingly, configuration C3 is removed at this stage.”).
Regarding claim 21, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein the predicted test result is a probability that execution of the software under test with a test configuration of the third plurality of test configurations has the severity code that indicates execution of the software under test failed (Wang, para. 0027, see also fig. 3C and fig. 4, “Still a few iterations later, as shown in FIG. 3C, the upper bounds 308, 310 for configurations C1 and C4 are below the updated lower bound 304 of configuration C2. Thus, configurations C1 and C4 can be pruned, leaving only configuration C2 as the approximate best configuration in the candidate set” & see also Wang, para. 0036, “From the two above inequality relations, it follows that, with a probability of at least $1 - \frac{\delta}{n^2}$ the true accuracy $A(H_{tr}, D_{te})$ of a trained configuration is within the confidence interval $[l, u]$ with the above expressions for the lower and upper bounds of the confidence interval. It can further be shown that, with a probability of at least $1 - \delta$, the method 400 of FIG. 4, using these expressions for $l$ and $u$, returns, as the approximate best configuration $C_{i'}$, a configuration whose real test accuracy $A_i$ is within the accuracy-loss tolerance $\epsilon$ of the test accuracy $A_{i^*}$ of the true best configuration $C_{i^*}$: $A_{i^*} - A_{i'} \leq \epsilon$.”).
Regarding claim 27, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein the predefined number of test configurations are selected based on having maximum values for the predicted test result (Wang, para. 0026, see also figs. 3A-3C, “FIGS. 3A-3C illustrate the progressive, confidence-interval-based pruning according to various embodiments with an example sequence of estimated confidence intervals for an example candidate set of initially five machine-learning configurations, labeled C1 through C5. For ease of illustration, pruning in the depicted example is performed with an accuracy-loss tolerance of zero, meaning that a configuration is deleted from the set only when its upper bound is at or falls below the lower bound of the highest-confidence interval among all the configurations.”).
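For illustration only, the zero-tolerance pruning rule quoted from Wang, para. 0026 (a configuration is deleted once its upper bound is at or below the highest lower bound among all configurations) may be sketched as follows. The function name `prune` and the numeric bounds are hypothetical; they are not values disclosed in Wang, and the sketch does not reproduce Wang's expressions for the bounds.

```python
def prune(intervals):
    """Keep configurations whose upper bound exceeds the highest lower bound."""
    # The presumed best configuration is the one with the highest lower bound.
    best_name = max(intervals, key=lambda name: intervals[name][0])
    best_lower = intervals[best_name][0]
    return {
        name: (lower, upper)
        for name, (lower, upper) in intervals.items()
        # Delete a configuration when its upper bound is at or below best_lower;
        # the presumed best configuration itself is always retained.
        if name == best_name or upper > best_lower
    }


# Hypothetical (lower bound, upper bound) accuracy intervals.
example_intervals = {
    "C1": (0.60, 0.80),
    "C2": (0.70, 0.85),
    "C3": (0.50, 0.65),
    "C4": (0.55, 0.75),
}
remaining = prune(example_intervals)
```

With these assumed numbers, C2 holds the highest lower bound (0.70), so C3 (upper bound 0.65) is removed while C1, C2, and C4 remain, mirroring the stage described for FIG. 3B.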
Regarding claim 28, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, wherein the predefined number of test configurations are further selected from the second plurality of test configurations based on a second predicted test result (Wang, para. 0027, see also figs. 3A and 3B, “FIG. 3A shows the upper and lower confidence bounds for all five configurations at a time within the iterative training and testing process when all configurations have been trained and tested on respective sampled datasets… FIG. 3B shows the remaining candidate set a few iterations later with updated confidence intervals. Now, configuration C2 has surpassed the performance of configuration C1 and has the highest associated lower bound 304, and configuration C3 has fallen below that lower bound 304 with its upper bound 306. Accordingly, configuration C3 is removed at this stage.”).
Referring to independent claims 29-30, they are rejected on the same basis as independent claim 1 since they are analogous claims.
Claims 3 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. US 2020/0065712 A1 (“Wang”) in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and in view of Mueller et al. US 2021/0326717 A1 (“Mueller”) and further in view of Adams et al. US 2014/0358831 A1 (“Adams”).
Regarding claim 3, Wang in view of Groce and in view of Mueller does teach the non-transitory computer-readable medium of claim 1, but does not teach: wherein a model type of the predictive model is a Bayesian optimization model type.  
However, Adams teaches: wherein a model type of the predictive model is a Bayesian optimization model type (Adams, para. 0078, “Bayesian optimization techniques described herein involve generating a probabilistic model of an objective function for a particular task (e.g., an objective function relating hyperparameters of a machine learning system to its performance). Any suitable type of probabilistic model of the objective function may be used. In some embodiments, the probabilistic model may comprise a Gaussian process, which is a stochastic process that specifies a distribution over functions. A Gaussian process may be specified by a mean function $m: X \rightarrow R$ and a covariance function (sometimes termed "kernel" function). For example, when the objective function relates hyper-parameters of a machine learning system to its performance, the Gaussian process is defined on the space of hyperparameters such that the mean function maps sets of hyperparameter values (each set of hyper-parameter values corresponding to values of one or more hyper-parameters of the machine learning system) to real numbers and the covariance function represents correlation among sets of hyperparameter values.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller with the teachings of Adams. The motivation to do so would be to have an optimization approach for learning machine learning hyperparameters with minimal computations (Mueller, para. 0062, “Accordingly, optimization techniques that require a closed-form analytic representation of the objective function (e.g., techniques that require calculation of gradients) and/or a large number of objective function evaluations (e.g., interior point methods) are generally not viable approaches to identifying hyper-parameter values of machine learning systems. On the other hand, Bayesian optimization techniques require neither exact knowledge of the objective function nor a large number of objective function evaluations. Although Bayesian optimization techniques rely on evaluations of the objective function, they are designed to reduce the number of such evaluations.”).
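For illustration only, the two ingredients of the Gaussian process described in the Adams passage quoted for claim 3 (a mean function mapping a set of hyperparameter values to a real number, and a covariance function expressing correlation between two such sets) may be sketched as follows. The constant zero mean, the squared-exponential kernel, the `length_scale` parameter, and the example hyperparameter values are assumptions chosen for the sketch; Adams states that any suitable probabilistic model may be used and the quoted passage does not prescribe these choices.

```python
import math


def mean_function(hyperparameters):
    """Map a set of hyperparameter values to a real number (assumed constant)."""
    return 0.0


def covariance_function(first, second, length_scale=1.0):
    """Squared-exponential kernel: correlation decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.exp(-squared_distance / (2.0 * length_scale ** 2))


# Hypothetical sets of hyperparameter values (learning rate, batch size).
hp_a = (0.1, 32.0)
hp_b = (0.1, 33.0)
hp_c = (0.9, 128.0)

near = covariance_function(hp_a, hp_b)  # similar settings, higher correlation
far = covariance_function(hp_a, hp_c)   # dissimilar settings, lower correlation
```

Under these assumptions, nearby hyperparameter sets are treated as strongly correlated, which is what lets a Bayesian optimizer predict performance at untested settings from a small number of evaluations.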
Regarding claim 5, Wang in view of Groce and in view of  Mueller and further in view of Adams teaches the non-transitory computer-readable medium of claim 4, wherein a model type of the second predictive model is a random forest model type(Wang, para. 0018, “The machine-learning configurations forming the candidate set 102, including the kinds of models and algorithms contained therein, generally depend on the particular machine-learning task and type of data they pertain to. For a task involving the prediction of a dependent quantitative variable from an independent quantitative variable, for instance, the candidate set 102 may include a decision tree and/or one or more regression models specifying candidate functional relationships between the variables. As another example, for a classification task, the models within the candidate set 102 may include, without limitation, a naïve Bayes classifier, a decision tree or random forest, a logistic regression model, and/or one or more artificial neural networks (with possibly various network architectures).”).
Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. US 2020/0065712 A1 (“Wang”) in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and in view of Mueller et al. US 2021/0326717 A1 (“Mueller”) and further in view of Patel, Dhaval, et al. "Smart-ML: A System for Machine Learning Model Exploration using Pipeline Graph." 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020 (“Patel”).
Regarding claim 8, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach: wherein after (E), the computer-readable instructions further cause the computing device to: (AA) train the predictive model using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations; (BB) update the second plurality of test configurations by removing the third plurality of test configurations from the second plurality of test configurations; (CC) execute the predictive model trained in (AA) with the updated second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations; (DD) select the predefined number of test configurations from the updated second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations in (CC), wherein the selected predefined number of test configurations define a fourth plurality of test configurations, wherein each test configuration of the fourth plurality of test configurations includes the value for each test parameter of the plurality of test parameters, wherein each test configuration of the fourth plurality of test configurations is unique and is unique relative to the first plurality of test configurations and relative to the third plurality of test configurations; (EE) execute the software under test with the fourth plurality of test configurations to generate the test result for each test configuration of the fourth plurality of test configurations; and 
(FF) output the generated test result for each test configuration of the fourth plurality of test configurations. 
However, Patel teaches: wherein after (E), the computer-readable instructions further cause the computing device to: (AA) train the predictive model using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations; (BB) update the second plurality of test configurations by removing the third plurality of test configurations from the second plurality of test configurations (Patel, pgs. 1608-1609, right-column, see also Fig. 3, “In the first round, only the last layer of pipeline graph is executed. The execution is limited to the default parameter and does not involve any hyperparameter tuning. Note that, the last layer of pipeline graph is composed of machine learning models and it is compulsory component of any modeling activity. The execution can use spark or celery or cloud engine for speedup. The model performance obtained at the end of the first round act as an baseline for subsequent operation. We select around 50% of the top-performing models to become a candidate for the second round. The first round result is also available to the user… [i]n the third round, we yet again initiated a random search based hyperparameter tuning on the models that are selected in previous round. Compared to previous round, the number of parameters to be tried out for each models in this round is more and adjustable. At present, we run nearly 5+ different models with 30 different parameter values in this round. Note that, some model does not have many parameters to be tuned.”); (CC) execute the predictive model trained in (AA) with the updated second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations; (DD) select the predefined number of test configurations from the updated second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations in (CC) (Patel, pgs. 1608-1609, see also Fig. 3, “In the second round, we initiated a random search based hyperparameter tuning on the models that are selected in previous round. A randomized hyperparameter tuning is highly parallel activity. The number of parameters to be tried out for each models in the current round is adjustable but we keep it a small value such as 10. In early stage of execution, we like to control the exploration search space. In this round, we run nearly 10+ different models with 10 different randomly generated parameter values. 
Out of 10+ models, we select nearly 50% of the top-performing models…”), wherein the selected predefined number of test configurations define a fourth plurality of test configurations, wherein each test configuration of the fourth plurality of test configurations includes the value for each test parameter of the plurality of test parameters, wherein each test configuration of the fourth plurality of test configurations is unique and is unique relative to the first plurality of test configurations and relative to the third plurality of test configurations; (EE) execute the software under test with the fourth plurality of test configurations to generate the test result for each test configuration of the fourth plurality of test configurations; and (FF) output the generated test result for each test configuration of the fourth plurality of test configurations(Patel, pgs. 1608-1609, see also Fig.3, “In round 4 and 5, we focus on discovering length 2 pipeline
paths for each top-performing models. Given a pipeline graph $G_k$ for a $k^{th}$ top-performing model, we decompose $G_k$ into multiple pipeline graphs of depth 2. Note that, the last layer
in each decomposed graph is same. Next, we process each decomposed graph in two stages (i.e., round 4 and round 5) to discover a length-2 pipeline paths that perform better than the
pipeline path from which it got enlarged. Round 4 is similar to Round 1, where pipelines with default parameter is tried, whereas, round 5 is similar to round 2, where randomized hyper parameter tuning is conducted on the top-performing paths outputted by round 4. After round 4 and round 5, we know what are the nodes other than nodes from last stages also helps to improve the performance. In Round 6, we use these nodes to grow a longer length patterns. Up until round 6, we use highly parallelized randomized search operations. In round 7, we apply intelligent search mechanisms for promising pipeline paths that have been discovered. In particular, given a path, we apply evolS, hbandS, bayesS, and rbfOptS on each path to discover a hyper parameter
tuning that can help to improve the performance. Instead of applying intelligent method on each and every pipeline paths, we apply it on top-performing pipeline-paths to improve the execution time.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller and further in view of Patel, the motivation to do so would be to have an automated approach for optimizing machine learning pipelines without exploring the entire configuration space(Patel, pg. 1604, right-column, see also fig. 1, “The term [‘]predictive modeling[’] in the field of data science refers to the process of analysis to discover the best data transformations and modeling forms for ultimately drawing accurate and meaningful inferences from new realizations of (scoring) data. Predictive modeling is of fundamental interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization. In the machine learning community, researchers and or data scientists build “Pipeline” to define a series of steps to be performed on an input dataset for building model. As supported by the literature, it is not an easy task to know beforehand what options will work well for a given dataset. As a result, user needs to build and manage many pipelines manually…[t]o make the model exploration task easier, we introduce a concept of [‘]Pipeline Graph[’]. A pipeline graph defines the nature of and ordering of the operations to perform when exploring predictive models for tasks such as classification, regression, or clustering.”)
Regarding claim 9, Wang in view of Groce and in view of  Mueller and further in view of  Patel teaches the non-transitory computer-readable medium of claim 8, wherein before (A), the computer-readable instructions further cause the computing device to define the plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters, wherein the first plurality of test configurations are randomly selected from the defined plurality of test configurations(Wang, para. 0028, see also fig. 4, “At 404, the remaining configurations set                         
$\Omega$ is initialized to C, the probe configuration $C_{prob}$ is initialized to a candidate configuration $C_1$ selected (e.g., randomly) from C, and a presumed best configuration $C_{i'}$ is initialized to the same candidate configuration $C_1$. Further, estimated confidence intervals for all configurations within the set C may be initialized, e.g., to the full possible range that the selected quality metric can assume (e.g., to a range from 0 to 1 for confidence intervals based on accuracy). At 406, an initial training sample size for configuration $C_1$ is determined, e.g., based on a predetermined sampling schedule associated with the configuration. An initial test sample size may also be determined at 406 in accordance with a sampling schedule for $C_1$.”).
Regarding claim 10, Wang in view of Groce and in view of Mueller and further in view of Patel teaches the non-transitory computer-readable medium of claim 9, wherein the second plurality of test configurations are the defined plurality of test configurations after excluding the selected first plurality of test configurations(Wang, para. 0027, see also fig. 3A, “FIG. 3A shows the upper and lower confidence bounds for all five configurations at a time within the iterative training and testing process when all configurations have been trained and tested on respective sampled datasets. The best-performing configuration at this time is configuration C1. As can be seen, the upper bound 300 of the confidence interval for configuration C5 is below the lower bound 302 of the confidence interval for configuration C1. Thus, configuration C5 can be removed from the candidate set.”).
Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. US 2020/0065712 A1 (“Wang”) in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and in view of Mueller et al. US 2021/0326717 A1 (“Mueller”) and further in view of Doniwa et al. US 2017/0169329 A1 (“Doniwa”). 
Regarding claim 15, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein in (A), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the first plurality of test configurations.  
However, Doniwa teaches: wherein in (A), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the first plurality of test configurations(Doniwa, para. 0016, see also fig. 1 and fig. 5, “FIG. 1 is a block diagram showing a specific configuration of a hyper-parameter search system according to the embodiment. This system is a server system of a cluster configuration, wherein a server (hereinafter, referred to as a manager) 11 called a manager, and a plurality (four in the embodiment) of servers (hereinafter, referred to as workers) 12-i (i is any one of 1 to 4), are connected to a network 13” & see also Doniwa, para. 0028-0029, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter
candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15)… [i]n contrast, if there is no other search, subsequent hyper-parameter candidates that reflect the results of learning
collected in the steps up to step S16 are generated (step S17). Since past search results are prepared for candidate generation at this time, the Bayesian method is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S18), and the end of the tasks is waited for (step S19). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives therefrom a result of learning (step
S20).” ).  
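For illustration only, and not as part of the cited disclosure, the first phase of the manager/worker flow quoted from Doniwa (random candidate generation within a search range, issuing each candidate as a task, and collecting results) can be sketched as follows. The names `generate_candidates`, `learn`, and `run_round`, the use of a thread pool in place of networked worker servers, and the stand-in objective inside `learn` are all assumptions made for this sketch; the later Bayesian candidate generation described in Doniwa is not modeled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate_candidates(search_range, count, seed=0):
    """Randomly draw `count` hyper-parameter candidates inside the search range.

    `search_range` maps a parameter name to a (minimum, maximum) pair.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in search_range.items()}
        for _ in range(count)
    ]


def learn(candidate):
    """Hypothetical stand-in for one worker's learning task.

    Returns a score for the candidate; a real worker would train a model.
    """
    return sum((value - 0.5) ** 2 for value in candidate.values())


def run_round(candidates, num_workers=4):
    """Issue each candidate to a worker and wait for every result."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order, so results line up with candidates.
        results = list(pool.map(learn, candidates))
    return list(zip(candidates, results))


if __name__ == "__main__":
    search_range = {"learning_rate": (0.0, 1.0), "momentum": (0.0, 1.0)}
    for candidate, score in run_round(generate_candidates(search_range, 8)):
        print(candidate, score)
```

Each worker receives a different candidate, which corresponds to the claimed assignment of a different test configuration to each session.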
Accordingly, it would have been obvious to one of ordinary skill in the art before the
effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller and further in view of Doniwa, the motivation to do so would be to speed up the search for optimal machine learning hyperparameters for deep learning through the use of cluster based computing(Doniwa, paras. 0026-0027, “FIG. 4 shows the structure of a deep neural network, and the types of hyper-parameters processed by the respective layers of the deep neural network. In the deep neural network, if the number of network layers is small, and there are three types of hyper-parameters, each of which can assume three values, the combinations of the hyper-parameters is $3^3 = 27$. However, if the number of layers of the deep neural network is 7 as shown in FIG. 4, and each hyperparameter can assume three values, the combinations of the hyper-parameters is $3^7$ = 2,187. Supposing that one hour is required for one-time learning of this deep neural network, 2,187 hours (about 91 days) are required for obtaining an optimal combination. Thus, it is very difficult to obtain the optimal combination. In light of the above, the server system of the embodiment is made to have a cluster structure comprising one server 11 called a manager, and a plurality of servers 12-i called workers, thereby realizing an efficient and fast search for an optimal combination of hyper-parameters.”).  
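For illustration only, the arithmetic in the quoted Doniwa passage can be checked with a short sketch. The function names, the default of one hour per run, and the assumption that runs are dispatched to workers in equal-length rounds are hypothetical and are not drawn from Doniwa.

```python
import math


def grid_size(values_per_param, num_params):
    """Number of combinations when each parameter can take the same number of values."""
    return values_per_param ** num_params


def search_days(num_configs, hours_per_run=1.0, workers=1):
    """Wall-clock days for an exhaustive search, assuming runs proceed in rounds."""
    rounds = math.ceil(num_configs / workers)
    return rounds * hours_per_run / 24.0


if __name__ == "__main__":
    print(grid_size(3, 3))                            # three parameters, three values each
    print(grid_size(3, 7))                            # seven layers, three values each
    print(search_days(grid_size(3, 7)))               # a single machine
    print(search_days(grid_size(3, 7), workers=4))    # four workers, as in the embodiment
```

Under these assumptions a single machine needs 2,187 hours, or about 91 days, which matches the figure quoted from Doniwa, while four workers reduce the search to 547 rounds, or roughly 23 days.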
Regarding claim 16, Wang in view of Groce and in view of Mueller and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 15, wherein each session of the plurality of sessions includes a plurality of worker computing devices(Doniwa, para. 0028, see also fig. 5, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15).”).  
Regarding claim 17, Wang in view of Groce and in view of  Mueller and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 16, wherein in (A), the software under test is executed with a test dataset input to the software under test, wherein the test dataset is distributed across the plurality of worker computing devices of each session of the plurality of sessions(Doniwa, para. 0041, “It is known that deep learning utilizing a neural network requires a long learning period. In order to shorten the learning period, the amount of learning data used by each worker 12-i during learning may be halved.”).  
Regarding claim 18, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein in (E), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the third plurality of test configurations. 
 However, Doniwa teaches: wherein in (E), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the third plurality of test configurations(Doniwa, para. 0016, see also fig. 1 and fig. 5, “FIG. 1 is a block diagram showing a specific configuration of a hyper-parameter search system according to the embodiment. This system is a server system of a cluster configuration, wherein a server (hereinafter, referred to as a manager) 11 called a manager, and a plurality (four in the embodiment) of servers (hereinafter, referred to as workers) 12-i (i is any one of 1 to 4), are connected to a network 13” & see also Doniwa, para. 0028-0029, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter
candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15)… [i]n contrast, if there is no other search, subsequent hyper-parameter candidates that reflect the results of learning
collected in the steps up to step S16 are generated (step S17). Since past search results are prepared for candidate generation at this time, the Bayesian method is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S18), and the end of the tasks is waited for (step S19). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives therefrom a result of learning (step
S20).” ).  
Accordingly, it would have been obvious to one of ordinary skill in the art before the
effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller and further in view of Doniwa, the motivation to do so would be to speed up the search for optimal machine learning hyperparameters for deep learning through the use of cluster based computing(Doniwa, paras. 0026-0027, “FIG. 4 shows the structure of a deep neural network, and the types of hyper-parameters processed by the respective layers of the deep neural network. In the deep neural network, if the number of network layers is small, and there are three types of hyper-parameters, each of which can assume three values, the combinations of the hyper-parameters is $3^3 = 27$. However, if the number of layers of the deep neural network is 7 as shown in FIG. 4, and each hyperparameter can assume three values, the combinations of the hyper-parameters is $3^7$ = 2,187. Supposing that one hour is required for one-time learning of this deep neural network, 2,187 hours (about 91 days) are required for obtaining an optimal combination. Thus, it is very difficult to obtain the optimal combination. In light of the above, the server system of the embodiment is made to have a cluster structure comprising one server 11 called a manager, and a plurality of servers 12-i called workers, thereby realizing an efficient and fast search for an optimal combination of hyper-parameters.”).  
Regarding claim 19, Wang in view of Groce and in view of Mueller and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 18, wherein each session of the plurality of sessions includes a plurality of worker computing devices(Doniwa, para. 0028, see also fig. 5, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15).”).  
Regarding claim 20, Wang in view of Groce and in view of Mueller and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 19, wherein in (A), the software under test is executed with a test dataset input to the software under test, wherein the test dataset is distributed across the plurality of worker computing devices of each session of the plurality of sessions(Doniwa, para. 0041, “It is known that deep learning utilizing a neural network requires a long learning period. In order to shorten the learning period, the amount of learning data used by each worker 12-i during learning may be halved.”).  
Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. US 2020/0065712 A1 (“Wang”) in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and in view of Mueller et al. US 2021/0326717 A1 (“Mueller”) and further in view of B'Far et al. US 8,954,309 B2 (“B'Far”).
Regarding claim 22, Wang in view of Groce and in view of  Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predicted test result is a memory consumption by the software under test with a test configuration of the third plurality of test configurations. 
However B'Far teaches: wherein the predicted test result is a memory consumption by the software under test with a test configuration of the third plurality of test configurations(B'Far, col. 13, lines 6-19, see also fig. 9, “[T]he performance of the system is measured 906. As discussed, measuring the system's performance while it is being simulated may include measuring one or more performance characteristics of the system. Performance characteristics may include latency, throughput, memory usage, processor usage, and, generally, any characteristic of the system's performance. Because of the complexity of systems being tested, simulating the system and measuring the system's performance may be performed several times to improve the accuracy of the measurements. In an embodiment, the system is simulated between six and ten times for each configuration, although the system may be simulated a different number of times, which may depend on the cost and resources available for simulating the system.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the
effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller and further in view of B'Far, the motivation to do so would be to combine simulations with traditional parameter optimization  methods (B'Far, col. 5, lines 46-65, “[T]he first model is used, along with design of experiment techniques, to find an optimal set of simulations to employ in order to collect data that is used to create the second model. The second model, in an embodiment, predicts performance based on configuration settings, and can also be used, via optimization techniques, to recommend configuration settings to tune an application's performance so that it meets given goals. The system may be configured as a generalized off-line automated process for creating a statistical model of the performance characteristics of a system, which may be then coupled with an optimization algorithm to create a decision support system (DSS) that recommends configuration settings that will best achieve desired performance characteristics. The process and system may combine and adapt a number of algorithms for optimization, machine learning, and statistical analysis. In an embodiment, the system is configured to model and optimize enterprise applications, but can be adapted to tune any configurable system, such as configurable IT applications or, generally, automated processes.”). 
Regarding claim 23, Wang in view of Groce and in view of Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predicted test result is an execution time required by the software under test with a test configuration of the third plurality of test configurations.
However B'Far teaches:  wherein the predicted test result is an execution time required by the software under test with a test configuration of the third plurality of test configurations(B'Far, col. 13, lines 6-19, see also fig. 9, “[T]he performance of the system is measured 906. As discussed, measuring the system's performance while it is being simulated may include measuring one or more performance characteristics of the system. Performance characteristics may include latency, throughput, memory usage, processor usage, and, generally, any characteristic of the system's performance. Because of the complexity of systems being tested, simulating the system and measuring the system's performance may be performed several times to improve the accuracy of the measurements. In an embodiment, the system is simulated between six and ten times for each configuration, although the system may be simulated a different number of times, which may depend on the cost and resources available for simulating the system.”). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the
effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of  Mueller and further in view of B'Far, the motivation to do so would be to combine simulations with traditional parameter optimization  methods (B'Far, col. 5, lines 46-65, “[T]he first model is used, along with design of experiment techniques, to find an optimal set of simulations to employ in order to collect data that is used to create the second model. The second model, in an embodiment, predicts performance based on configuration settings, and can also be used, via optimization techniques, to recommend configuration settings to tune an application's performance so that it meets given goals. The system may be configured as a generalized off-line automated process for creating a statistical model of the performance characteristics of a system, which may be then coupled with an optimization algorithm to create a decision support system (DSS) that recommends configuration settings that will best achieve desired performance characteristics. The process and system may combine and adapt a number of algorithms for optimization, machine learning, and statistical analysis. In an embodiment, the system is configured to model and optimize enterprise applications, but can be adapted to tune any configurable system, such as configurable IT applications or, generally, automated processes.”). 
Claims 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. US 2020/0065712 A1 (“Wang”) in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and in view of Mueller et al. US 2021/0326717 A1 (“Mueller”) and further in view of (2019, April 7). Conditional entropy. Wikipedia. Retrieved February 17, 2022, from https://web.archive.org/web/20190428121511/https://en.wikipedia.org/wiki/Conditional_entropy (“Wikipedia”). 
Regarding claim 24, Wang in view of Groce and in view of  Mueller teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predefined number of test configurations are selected from the second plurality of test configurations based on an entropy score value computed from the predicted test result.  
However Wikipedia teaches: wherein the predefined number of test configurations are selected from the second plurality of test configurations based on an entropy score value computed from the predicted test result(Wikipedia, sec. Motivation,  “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation:                         
$H(Y \mid X = x) = -\sum_{y \in \mathcal{Y}} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x).$”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang in view of Groce and in view of Mueller and further in view of Wikipedia. The motivation to do so would be to have a test result that takes into consideration prior information (Wikipedia, top-page, “In information theory, the conditional entropy…quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random [variable] X is known.”).
Regarding claim 25, Wang in view of Groce and in view of Mueller and further in view of Wikipedia teaches the non-transitory computer-readable medium of claim 24, wherein the predefined number of test configurations are selected based on having maximum values for the entropy score value computed from the predicted test result (Wikipedia, sec. Motivation, “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation: H(Y \mid X = x) = -\sum_{y \in Y} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x).”).
Regarding claim 26, Wang in view of Groce and in view of Mueller and further in view of Wikipedia teaches the non-transitory computer-readable medium of claim 24, wherein the entropy score is computed using H(x) = -\sum_{y \in Y} \Pr(y \mid x) \log \Pr(y \mid x), where H(x) is an entropy score value, P(y|x) is a posterior probability of a predicted test result y given a respective test configuration x of the second plurality of test configurations, and Y is a set of possible test results (Wikipedia, sec. Motivation, “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation: H(Y \mid X = x) = -\sum_{y \in Y} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x).”).
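For illustration only, and not as part of the record or of any cited reference, the entropy score recited in claim 26 and the maximum-entropy selection recited in claim 25 can be sketched as follows. The function names, the configuration names, and the posterior values are hypothetical. The natural logarithm is an assumption, since the claim does not state a base; the base scales every score by the same constant, so it does not change which configurations rank highest.

```python
import math
from typing import Dict, List


def entropy_score(posterior: Dict[str, float]) -> float:
    """H(x) = -sum over y of P(y|x) * log P(y|x).

    `posterior` maps each possible test result y to P(y|x).
    Zero-probability results are skipped, following the usual
    convention that 0 * log 0 is treated as 0.
    """
    return -sum(p * math.log(p) for p in posterior.values() if p > 0.0)


def select_max_entropy(posteriors: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the names of the k configurations with the largest entropy scores."""
    ranked = sorted(
        posteriors,
        key=lambda name: entropy_score(posteriors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted posteriors for three test configurations.
predicted = {
    "config_a": {"pass": 0.5, "fail": 0.5},  # most uncertain prediction
    "config_b": {"pass": 0.9, "fail": 0.1},
    "config_c": {"pass": 1.0, "fail": 0.0},  # fully certain prediction
}

selected = select_max_entropy(predicted, k=2)
print(selected)
```

With these assumed values, the configuration whose predicted result is least certain receives the highest score, and the fully certain configuration receives a score of zero.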

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129

        1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
        2 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
        3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.