DETAILED ACTION
Response to Arguments
Applicant’s arguments, see Remarks, filed 08/29/2022, with respect to the rejection of claims 1-30 under 35 U.S.C. § 103 have been fully considered and are persuasive.  Accordingly, the Final Rejection of 06/20/2022 has been withdrawn. 
Applicant’s arguments submitted on 08/29/2022 with respect to independent claims 1, 29 and 30 have been considered but are moot because the new ground of rejection does not rely on the previous references applied in the prior rejection of record for any teaching or matter specifically challenged by Applicant’s arguments.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/27/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 28 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 28 is further rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential elements, such omission amounting to a gap between the elements. See MPEP § 2172.01. The omitted element is a first predicted test result: claim 28 recites a second predicted test result without a first predicted test result having been recited beforehand. Hence claim 28 is indefinite.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-7, 11-14, 21, and 27-30 are rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018) (“Chakkrit”) in view of Goth, et al. DE 102019210562 A1 (“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”).
Regarding claim 1, Chakkrit teaches a non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to:
(A) execute software under test with a first plurality of test configurations to generate a test result for each test configuration of the first plurality of test configurations, wherein each test configuration of the first plurality of test configurations includes a value for each test parameter of the plurality of test parameters, wherein each test parameter of the plurality of test parameters is an input to the software under test (Chakkrit, pg. 689, right-column, see also figs. 1, 2, and table 1, “Generate candidate parameter settings. The train function will generate candidate parameter settings based on a given budget threshold…for evaluation. The budget threshold indicates the number of different values to be evaluated for each parameter… [t]able 1 shows the candidate parameter settings for each of the examined parameters. The default settings are shown in bold typeface.”);
(B) train a predictive model using each test configuration of the first plurality of test configurations and a target variable value that is the test result generated for each test configuration of the first plurality of test configurations based on an objective function value (Chakkrit, pg. 690, left-column, “To measure the impact of automated parameter optimization on defect prediction models, we train our models using the optimized… settings.”);
(C) execute the trained predictive model with a second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations, wherein each test configuration of the second plurality of test configurations includes the value for each test parameter of the plurality of test parameters (Chakkrit, pg. 690, left-column, see also Table 3, “We apply the defect prediction models that we train using the training corpus to the testing corpus in order to measure their performance… [t]able 3 provides the definitions and descriptions of our 9 threshold-dependent and 3 threshold-independent performance measures. In total, we study 12 performance measures that are commonly-used in defect prediction studies.”);
(D) select a predefined number of test configurations from the second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations, wherein the selected predefined number of test configurations define a third plurality of test configurations (Chakkrit, pgs. 691-692, see also fig. 3 (5.5.1 Calculate a generic variable importance score), “For each testing dataset, we first randomly permute the values of the variable, producing a dataset with that one variable permuted, while all other variables as is.”); and
(F) output the generated test result for each test configuration of the first plurality of test configurations (Chakkrit, pg. 693, left-column, see also fig. 4 (AUC Performance Improvement), “Fig. 4 shows the performance improvement for each of the 18 studied datasets and for each of the classification techniques. The boxplots show that optimization can improve the AUC performance by up to 40 percentage points.”) and for each test configuration of the third plurality of test configurations to identify errors in the software under test (Chakkrit, pgs. 691-692; as detailed by fig. 3, after the original testing dataset has been randomly permuted in Step 1, the randomly permuted testing dataset is output before being input into Step 2 of computing the misclassification rate).
Chakkrit does not teach: (E) execute the software under test with the defined third plurality of test configurations to generate the test result for each test configuration of the third plurality of test configurations, wherein each test configuration of the third plurality of test configurations includes the value for each test parameter of the plurality of test parameters. 
However, Goth teaches (E) execute the software under test with the defined third plurality of test configurations to generate the test result for each test configuration of the third plurality of test configurations, wherein each test configuration of the third plurality of test configurations includes the value for each test parameter of the plurality of test parameters (Goth, pg. 3, “[T]he generated regression model is used to select the most promising test cases from the cluster data $D_{obs}$. This selection of test cases could then be used - in terms of time and computing power - more complex checks for runtime errors that require instrumentation of the source code. Therefore, only test cases that penetrate the software to the maximum are executed.”).1
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit with those of Goth. The motivation to do so would be to speed up the software testing process through the use of clustering analysis before the tests are executed (Goth, pg. 2, “Conventional fuzzers such as AFL are dependent on the execution of many test cases (sometimes hundreds or thousands per second), which is not feasible for slow software - especially software systems that require some form of data transmission. It is currently up to the developer of the fuzz wrapper to find a fast enough solution to this problem… [a]gainst this background, a main idea of the proposed approach is to subject the input queue or the corpus to a clustering analysis before the actual test execution in order to select an optimal input from each cluster and in this way to accelerate the fuzz testing or a test metric (e.g. coverage) to maximize.”).2
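For illustration only, limitations (A) through (F) of claim 1 as mapped above can be pictured as a surrogate-guided test selection loop. The sketch below is not drawn from the application or from any cited reference. The `software_under_test` function, the `threads` and `batch` parameters, and the 1-nearest-neighbour predictor are all hypothetical stand-ins chosen so that the example runs with the Python standard library alone.

```python
import random


def software_under_test(config):
    """Hypothetical stand-in: returns an execution-time-like test result."""
    return config["threads"] * 0.5 + config["batch"] * 0.01


def random_config(rng):
    """Draw one test configuration (a value for every test parameter)."""
    return {"threads": rng.randint(1, 16), "batch": rng.randint(1, 1000)}


def predict(training, config):
    """1-nearest-neighbour stand-in for the trained predictive model."""
    def dist(a, b):
        return ((a["threads"] - b["threads"]) / 16) ** 2 + \
               ((a["batch"] - b["batch"]) / 1000) ** 2
    nearest = min(training, key=lambda pair: dist(pair[0], config))
    return nearest[1]


rng = random.Random(0)

# (A) execute the software with the first plurality of configurations
first = [random_config(rng) for _ in range(30)]
results_first = [(c, software_under_test(c)) for c in first]

# (B) "training" a 1-NN model amounts to keeping the labelled configurations
# (C) predict the test result for the second plurality of configurations
second = [random_config(rng) for _ in range(200)]
predicted = [(c, predict(results_first, c)) for c in second]

# (D) select a predefined number k with the largest predicted result
k = 5
ranked = sorted(predicted, key=lambda p: p[1], reverse=True)
third = [c for c, _ in ranked[:k]]

# (E) execute the software with the selected third plurality
results_third = [(c, software_under_test(c)) for c in third]

# (F) output the results for the first and third pluralities
report = results_first + results_third
```

Only the `k` configurations ranked highest by the model are actually executed in step (E), which is the cost saving that the cited Goth passage attributes to selecting test cases before execution.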
Chakkrit also does not teach: wherein the test result includes at least one of a severity code, a memory consumption value, or an execution time value, wherein the severity code indicates whether or not execution of the software under test failed using a respective test configuration, wherein the memory consumption value indicates an amount of computer memory used for execution of the software under test using the respective test configuration, wherein the execution time value indicates an amount of time used for execution of the software under test using the respective test configuration; wherein the third plurality of test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test.
However, Groce teaches: wherein the test result includes at least one of a severity code, a memory consumption value, or an execution time value, wherein the severity code indicates whether or not execution of the software under test failed using a respective test configuration, wherein the memory consumption value indicates an amount of computer memory used for execution of the software under test using the respective test configuration, wherein the execution time value indicates an amount of time used for execution of the software under test using the respective test configuration (Groce, pg. 311, left-column, “From a software testing perspective, CONFIDENCE, a method based on prioritizing test cases in ascending order of p in $(i, l, p)$ (such that cases where the label has the lowest probability are tested first), is analogous to asking the software’s original programmer to prioritize testing code most likely to fail—but in our case, the “programmer” is also software. Thus, the CONFIDENCE approach is a prioritization method that capitalizes on the ability of classifiers to “find their own bugs” by selecting cases where they have low confidence… [c]onfidence can be measured in a variety of ways. We compute confidence as the magnitude of the probability assigned to the most likely labeling, and prioritize test cases according to those with the lowest probabilities. CONFIDENCE selects ambiguous test instances—instances that fall on or close to decision boundaries.”)3; wherein the third plurality of test configurations are selected to maximize the memory consumption value or the execution time value or to result in failed execution of the software under test (Groce, pg. 313, left-column, “Fig. 3 graphs these differences: the efficiencies shown are averages of rates over all suites (whose sizes ranged from five to 25 test cases) and all classifiers at each training set size. For all but the smallest training sets (200-500 instances), differences between all pairs of methods, except where data points coincide, are significant at the 95 percent confidence level. Fig. 3g shows 95 percent confidence intervals at three training set sizes of Fig. 3a’s configuration. As the Fig. 3 illustrates, the best methods were very efficient at identifying failures. For example, consider the RANDOM line in Fig. 3c. RANDOM is statistically guaranteed to detect failures at the rate they occur, and thus is also a statistical representative of the classifier’s accuracy. This indicator shows that the Reuters SVM classifier was extremely accurate when trained on 2,000 instances (the rightmost point on the x-axis), with a failure rate of only 3.5 percent. Even given this extremely accurate classifier, 63 percent of the CONFIDENCE-generated test cases detected failures.”)4; (Groce, pg. 316, left-column, “Our prototype prioritizes the classifier’s topic predictions that are most likely to be wrong, and communicates these prioritizations using saturated green squares to draw a user’s eye (e.g., Fig. 1, fourth message). The prioritizations may not be perfect, but they are only intended to be advisory; users are free to test any messages they want, not just ones the system suggests.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit with the above teachings of Groce. The motivation to do so would be to test machine learning classifiers just as typical software is tested before being put into production (Groce, pgs. 307-308, “A classifier can mislabel an input (fail) even if the algorithm that generated the classifier is correctly implemented. Failures occur for a variety of reasons such as noise in the training data, overfitting due to a limited amount of training data, underfitting because the classifier’s decision boundaries are not expressive enough to correctly capture the concept to be learned, and sample selection bias, where training instances are biased towards inputs uncommonly seen in real-world usage. Further, many machine learning systems are interactive and continually learn from their users’ behavior. An interactive machine learning system may apply different output labels to the same input from one week to the next, based on how its user has recently interacted with the system… [t]his paper encapsulates our previous work, which introduced a user interface (WYSIWYT/ML)…to support end users testing classifiers, and makes the following additional contributions… [p]roposes a methodology for experimentally evaluating classifier testing methods for humans as a filtering process before costly user studies… [i]nvestigates our test selection methods in large-scale automated experiments, over a larger variety of benchmarks than possible with humans… [e]valuates which found failures are unexpected, making them both especially challenging and important to detect, a problem difficult to explore in human studies.”).
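The CONFIDENCE prioritization quoted from Groce above (test first the cases whose most likely label has the lowest probability) can be sketched as follows. This is a minimal illustration, not Groce's implementation; the function name, the message identifiers, and the label probabilities are hypothetical.

```python
def prioritize_by_confidence(predictions):
    """Order test cases so the least-confident predictions come first.

    `predictions` is a list of (case_id, label_probabilities) pairs, where
    label_probabilities maps each label to its predicted probability.
    """
    scored = []
    for case_id, probs in predictions:
        # Confidence is the probability of the most likely label.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        scored.append((case_id, label, confidence))
    # Ascending confidence: cases near a decision boundary are tested first.
    return sorted(scored, key=lambda item: item[2])


# Hypothetical classifier output for three messages.
predictions = [
    ("msg-1", {"cars": 0.95, "hockey": 0.05}),
    ("msg-2", {"cars": 0.55, "hockey": 0.45}),
    ("msg-3", {"cars": 0.20, "hockey": 0.80}),
]
ranked = prioritize_by_confidence(predictions)
```

Here `msg-2` is ranked first because its top label carries the lowest probability (0.55), so it is the case the classifier itself regards as most likely to be mislabeled.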
Regarding claim 2, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein a model type of the predictive model is a random forest model type (Chakkrit, pg. 693, “Fortunately, random forest that is popularly-used in defect prediction studies tends to have negligible to small impact on the AUC performance of defect prediction models. Fig. 4 shows that optimization improves the performance of random forest classifiers as little as 2 percent of Brier and 5 percent of AUC. However, the random forest classifier tends to have a larger impact on threshold-dependent performance measures. Indeed, we observe that optimization improves the performance of random forest classifiers by up to 12 percent of Precision, 18 percent of Recall, and 15 percent of F-measure. This finding provides supporting evidence…that automated parameter optimization impacts the threshold-dependent measures of random forest classifiers. Moreover, this finding suggests that random forest classifiers tend to robust to parameter settings when considering threshold-independent measures.”).
Regarding claim 4, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein after (E), the computer-readable instructions further cause the computing device to: determine an importance value for each test parameter of the plurality of test parameters using a second predictive model trained using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations, and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations; and output the determined importance value for each test parameter of the plurality of test parameters(Chakkrit, pgs. 691-692, see also fig. 3, “[In step 2], [w]e then compute the difference in the misclassification rates of defect prediction models that are trained using the clean datasets and the datasets with the randomly-permuted variables. The larger the difference, the greater the importance of that particular variable. We repeat the Steps 1 and 2 for each variable in order to produce a variable important score across all the variables. Since the experiment is repeated 100 times, each variable will have several variable importance scores (i.e., one score for each of the repetitions.).”).
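The permutation-based variable importance score described in the Chakkrit passage above can be sketched as below. This is a simplified illustration under stated assumptions: the model is a fixed hypothetical classifier that uses only the first variable, the data are synthetic, and a single repetition is shown rather than the 100 repetitions the passage mentions.

```python
import random


def misclassification_rate(model, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if model(row) != y)
    return wrong / len(rows)


def permutation_importance(model, rows, labels, n_features, rng):
    """Importance of each variable = rise in error after permuting it."""
    baseline = misclassification_rate(model, rows, labels)
    scores = []
    for j in range(n_features):
        # Step 1: permute the values of variable j, leaving the others as is.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        # Step 2: difference in misclassification rate versus the clean data.
        scores.append(misclassification_rate(model, permuted, labels) - baseline)
    return scores


rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(500)]   # synthetic data
labels = [int(row[0] > 0.5) for row in rows]                # depends on x0 only


def model(row):
    """Hypothetical classifier that looks only at the first variable."""
    return int(row[0] > 0.5)


scores = permutation_importance(model, rows, labels, n_features=2, rng=rng)
```

Permuting the first variable raises the error substantially, while permuting the second leaves it unchanged, so the first variable receives the larger importance score.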
Regarding claim 5, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 4, wherein a model type of the second predictive model is a random forest model type(Chakkrit, pg. 693, “Fortunately, random forest that is popularly-used in defect prediction studies tends to have negligible to small impact on the AUC performance of defect prediction models. Fig. 4 shows that optimization improves the performance of random forest classifiers as little as 2 percent of Brier and 5 percent of AUC. However, the random forest classifier tends to have a larger impact on threshold-dependent performance measures. Indeed, we observe that optimization improves the performance of random forest classifiers by up to 12 percent of Precision, 18 percent of Recall, and 15 percent of F-measure. This finding provides supporting evidence…that automated parameter optimization impacts the threshold-dependent measures of random forest classifiers. Moreover, this finding suggests that random forest classifiers tend to robust to parameter settings when considering threshold-independent measures.”).
Regarding claim 6, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein each test configuration of the first plurality of test configurations is unique relative to other test configurations of the first plurality of test configurations (Chakkrit, pg. 689, left-column, see also fig. 2 and table 1, “To address the first six research questions, we use the grid search optimized parameter settings as suggested by the train function of the caret R package… [f]ig. 2 provides an overview of the grid-search parameter optimization process. The optimization process is made up of three steps. (Step 1) Generate candidate parameter settings. The train function will generate candidate parameter settings based on a given budget threshold (i.e., the tuneLength) for evaluation. The budget threshold indicates the number of different values to be evaluated for each parameter…[f]or example, the number of boosting iterations of the C5.0 classification technique is initialized to 1 and is increased by 10 until the number of candidate settings reaches the budget threshold (e.g., 1, 11, 21, 31, 41). Table 1 shows the candidate parameter settings for each of the examined parameters. The default settings are shown in bold typeface.”).
Regarding claim 7, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein each test configuration of the third plurality of test configurations is unique relative to other test configurations of the third plurality of test configurations and is unique relative to the first plurality of test configurations (Chakkrit, pg. 689, left-column, see also fig. 2 and table 1, “To address the first six research questions, we use the grid search optimized parameter settings as suggested by the train function of the caret R package… [f]ig. 2 provides an overview of the grid-search parameter optimization process. The optimization process is made up of three steps. (Step 1) Generate candidate parameter settings. The train function will generate candidate parameter settings based on a given budget threshold (i.e., the tuneLength) for evaluation. The budget threshold indicates the number of different values to be evaluated for each parameter…[f]or example, the number of boosting iterations of the C5.0 classification technique is initialized to 1 and is increased by 10 until the number of candidate settings reaches the budget threshold (e.g., 1, 11, 21, 31, 41). Table 1 shows the candidate parameter settings for each of the examined parameters. The default settings are shown in bold typeface.”).
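The candidate-generation step quoted from Chakkrit above (start at 1 and increase by 10 until the budget threshold is reached) can be reproduced with a short sketch. The function name and the second, boolean parameter are hypothetical and are included only to show that a grid built from distinct candidate values yields distinct configurations.

```python
import itertools


def candidate_settings(start, step, budget):
    """Generate `budget` candidate values: start, start+step, start+2*step, ..."""
    return [start + i * step for i in range(budget)]


# The example from the quoted passage: a budget threshold of 5.
boosting_iterations = candidate_settings(start=1, step=10, budget=5)

# A grid over two parameters; the second (a boolean flag) is hypothetical.
grid = list(itertools.product(boosting_iterations, [True, False]))
```

With a budget of 5, `boosting_iterations` is `[1, 11, 21, 31, 41]`, matching the values in the quoted passage, and every one of the 10 grid entries is a distinct pair.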
Regarding claim 11, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein before (A), the computer-readable instructions further cause the computing device to define the first plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters (Chakkrit, pg. 701, left-column, “Iteration indicates maximum number of the combination of different parameter settings to be evaluated for a classification technique…[u]nlike grid search technique that needs pre-defined candidate parameter settings, a random search technique randomly generates candidate parameter settings based on a given iteration threshold. For example, an iteration threshold of 5 will limit the number of candidate parameter settings for each classification technique to 5. Thus, regardless of the number of parameter settings, the random search technique always generates 5 combinations of parameter settings for a classification technique.”).
Regarding claim 12, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 11, wherein the value is selected based on a sampling method defined for each test parameter of the plurality of test parameters(Chakkrit, pg. 701, left-column, “Unlike grid search technique that needs pre-defined candidate parameter settings, a random search technique randomly generates candidate parameter settings based on a given iteration threshold.”).  
Regarding claim 13, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein before (A), the computer-readable instructions further cause the computing device to define the plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters, wherein the first plurality of test configurations are randomly selected from the defined plurality of test configurations(Chakkrit, pgs. 701-703, left-column, “Iteration indicates maximum number of the combination of different parameter settings to be evaluated for a classification technique…[u]nlike grid search technique that needs pre-defined candidate parameter settings, a random search technique randomly generates candidate parameter settings based on a given iteration threshold. For example, an iteration threshold of 5 will limit the number of candidate parameter settings for each classification technique to 5. Thus, regardless of the number of parameter settings, the random search technique always generates 5 combinations of parameter settings for a classification technique.”).   
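The random selection recited in claims 11 and 13 (a value for each test parameter drawn between its minimum and maximum, followed by a random draw of the first plurality from the resulting pool) can be illustrated as follows. The parameter names, bounds, pool size, and uniform sampling are hypothetical choices for the sketch and are not taken from the application or from Chakkrit.

```python
import random


def define_configurations(bounds, count, rng):
    """Build `count` configurations, each value drawn within [min, max]."""
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(count)
    ]


rng = random.Random(42)

# Hypothetical test parameters with (minimum, maximum) values.
bounds = {"timeout_s": (0.1, 30.0), "cache_mb": (16.0, 4096.0)}

# The defined plurality of test configurations.
pool = define_configurations(bounds, count=100, rng=rng)

# The first plurality, randomly selected from the pool without repetition.
first_indices = sorted(rng.sample(range(len(pool)), 10))
first = [pool[i] for i in first_indices]
```

Sampling indices without replacement guarantees that no pool entry is chosen twice for the first plurality.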
Regarding claim 14, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 13, wherein the second plurality of test configurations are the defined plurality of test configurations after excluding the selected first plurality of test configurations (Chakkrit, pgs. 688-689, see also fig. 1, “In order to ensure that our conclusions are robust, we use the out-of-sample bootstrap validation technique…[a] bootstrap sample of size N is randomly drawn with replacement from an original dataset, which is also of size N. A model is trained using the bootstrap sample and tested using the rows that do not appear in the bootstrap sample… [u]nlike the ordinary bootstrap, the out-of-sample bootstrap technique fits models using the bootstrap samples, but rather than testing the model on the original sample, the model is instead tested using the rows that do not appear in the bootstrap sample…[t]hus, the training and testing corpora do not share overlapping observations.”).
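The out-of-sample bootstrap split quoted from Chakkrit above can be sketched in a few lines. The function name and dataset size are hypothetical; the split logic follows the quoted description (draw N rows with replacement for training, test on the rows never drawn).

```python
import random


def out_of_sample_bootstrap(n_rows, rng):
    """Return (training indices, testing indices) for one bootstrap split."""
    # Bootstrap sample of size N drawn with replacement.
    train_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(train_idx)
    # Testing corpus: the rows that do not appear in the bootstrap sample.
    test_idx = [i for i in range(n_rows) if i not in drawn]
    return train_idx, test_idx


train_idx, test_idx = out_of_sample_bootstrap(1000, random.Random(7))
```

Because the testing rows are exactly those excluded from the training sample, the two sets never overlap. For large N, roughly 36.8 percent of rows (about 1/e) are left out of a bootstrap sample on average.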
Regarding claim 21, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein the predicted test result is a probability that execution of the software under test with a test configuration of the third plurality of test configurations has the severity code that indicates execution of the software under test failed (Goth, pg. 3, “Promising in this context means that this data guarantees the greatest possible code coverage and has a high probability of leading to a software crash. It is therefore important how certain a prediction is in order to decide whether it makes sense to use the test case in question. In order to learn a basic model, it is advisable to determine the parameters of a Gaussian process using the data $D_{obs}$ from process 13th to investigate, by maximizing the logarithmic plausibility function (log likelihood): $\arg\max_{\theta} \log(p(y \mid X, \theta))$.”).5
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chakkrit with the above teachings of Goth for the same rationale stated at Claim 1.
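For reference, the log-likelihood objective in the Goth passage quoted for claim 21 has a standard closed form when the Gaussian process is assumed to have zero mean. The symbols below are assumptions not found in Goth: $K_{\theta}$ denotes the $n \times n$ covariance matrix over the $n$ observed inputs in $X$ (including any noise term), and $\hat{\theta}$ denotes the maximizing parameter values.

```latex
\log p(y \mid X, \theta)
  = -\tfrac{1}{2}\, y^{\top} K_{\theta}^{-1} y
    - \tfrac{1}{2} \log \lvert K_{\theta} \rvert
    - \tfrac{n}{2} \log 2\pi,
\qquad
\hat{\theta} = \arg\max_{\theta} \log p(y \mid X, \theta)
```

The first term rewards fit to the observed results, and the second penalizes overly flexible covariance structures.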
Regarding claim 27, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein the predefined number of test configurations are selected based on having maximum values for the predicted test result(Chakkrit, pg. 688, left-column, see also Table 2, “An influential characteristic in the performance of a classification technique is the number of Events Per Variable (EPV)… i.e., the ratio of the number of occurrences of the least frequently occurring class of the dependent variable (i.e., the number of defective modules) to the number of independent variables that are used to train the model (i.e., the numbers of variables)… [l]arger EPV values indicate the lower risk of producing unstable results…defect prediction models that are trained using datasets with low EPV values are especially susceptible to unstable results…[t]o mitigate this risk, we choose to study datasets that have an EPV above 10…[t]o satisfy criterion 2, we exclude the 78 datasets that we found to have an EPV value that is below 10.”).  
Regarding claim 28, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, wherein the predefined number of test configurations are further selected from the second plurality of test configurations based on a second predicted test result (Chakkrit, pg. 688, right-column; as Table 2 details, in addition to predicting defective rates (i.e., failures), the Granularity column identifies the level of code being tested (i.e., methods, classes, or files)).
Referring to independent claims 29-30, they are rejected on the same basis as independent claim 1 since they are analogous claims.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018) (“Chakkrit”) in view of Goth, et al. DE 102019210562 A1 (“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and further in view of Adams et al. US 2014/0358831 A1 (“Adams”).
Regarding claim 3, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach: wherein a model type of the predictive model is a Bayesian optimization model type.
However, Adams teaches: wherein a model type of the predictive model is a Bayesian optimization model type (Adams, para. 0078, “Bayesian optimization techniques described herein involve generating a probabilistic model of an objective function for a particular task (e.g., an objective function relating hyperparameters of a machine learning system to its performance). Any suitable type of probabilistic model of the objective function may be used. In some embodiments, the probabilistic model may comprise a Gaussian process, which is a stochastic process that specifies a distribution over functions. A Gaussian process may be specified by a mean function m: X → R and a covariance function (sometimes termed "kernel" function). For example, when the objective function relates hyper-parameters of a machine learning system to its performance, the Gaussian process is defined on the space of hyperparameters such that the mean function maps sets of hyperparameter values (each set of hyper-parameter values corresponding to values of one or more hyper-parameters of the machine learning system) to real numbers and the covariance function represents correlation among sets of hyperparameter values.”).
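For illustration only, the mean function and covariance function described in the quoted Adams passage can be sketched as follows. This is a minimal sketch and is not taken from Adams: the zero mean, the squared-exponential kernel, the length scale, and the names mean_function, covariance_function, and gp_prior are all assumptions chosen for the example.

```python
import math

def mean_function(x):
    """Mean function m: X -> R. A constant zero mean is assumed here."""
    return 0.0

def covariance_function(x1, x2, length_scale=1.0):
    """Squared-exponential kernel (an assumed choice) between two sets of hyperparameter values."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-squared_distance / (2.0 * length_scale ** 2))

def gp_prior(points):
    """Return the prior mean vector and covariance matrix over the given points."""
    mean = [mean_function(p) for p in points]
    cov = [[covariance_function(p, q) for q in points] for p in points]
    return mean, cov

# Three hypothetical sets of hyperparameter values.
hyperparameter_sets = [(0.1, 3.0), (0.2, 3.0), (0.9, 7.0)]
mean, cov = gp_prior(hyperparameter_sets)
```

In this sketch, sets of hyperparameter values that lie close together receive a covariance near 1, and distant sets receive a covariance near 0, which is one way the "correlation among sets of hyperparameter values" in the quoted passage may be represented.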
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of Adams; the motivation to do so would be to have an optimization approach for learning machine learning hyperparameters with minimal computations (Adams, para. 0062, “Accordingly, optimization techniques that require a closed-form analytic representation of the objective function (e.g., techniques that require calculation of gradients) and/or a large number of objective function evaluations (e.g., interior point methods) are generally not viable approaches to identifying hyper-parameter values of machine learning systems. On the other hand, Bayesian optimization techniques require neither exact knowledge of the objective function nor a large number of objective function evaluations. Although Bayesian optimization techniques rely on evaluations of the objective function, they are designed to reduce the number of such evaluations.”).  

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018)(“Chakkrit”) and in view of Goth, et al. DE 102019210562 A1(“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013)(“Groce”) and further in view of Patel, Dhaval, et al. "Smart-ML: A System for Machine Learning Model Exploration using Pipeline Graph." 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020(“Patel”). 
Regarding claim 8, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach: wherein after (E), the computer-readable instructions further cause the computing device to: (AA) train the predictive model using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations; (BB) update the second plurality of test configurations by removing the third plurality of test configurations from the second plurality of test configurations; (CC) execute the predictive model trained in (AA) with the updated second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations; (DD) select the predefined number of test configurations from the updated second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations in (CC), wherein the selected predefined number of test configurations define a fourth plurality of test configurations, wherein each test configuration of the fourth plurality of test configurations includes the value for each test parameter of the plurality of test parameters, wherein each test configuration of the fourth plurality of test configurations is unique and is unique relative to the first plurality of test configurations and relative to the third plurality of test configurations; (EE) execute the software under test with the fourth plurality of test configurations to generate the test result for each test configuration of the fourth plurality of test configurations; and 
(FF) output the generated test result for each test configuration of the fourth plurality of test configurations. 
However, Patel teaches: wherein after (E), the computer-readable instructions further cause the computing device to: (AA) train the predictive model using each test configuration of the first plurality of test configurations in association with the test result generated for each test configuration of the first plurality of test configurations and using each test configuration of the third plurality of test configurations in association with the test result generated for each test configuration of the third plurality of test configurations; (BB) update the second plurality of test configurations by removing the third plurality of test configurations from the second plurality of test configurations (Patel, pgs. 1608-1609, right-column, see also Fig. 3, “In the first round, only the last layer of pipeline graph is executed. The execution is limited to the default parameter and does not involve any hyperparameter tuning. Note that, the last layer of pipeline graph is composed of machine learning models and it is compulsory component of any modeling activity. The execution can use spark or celery or cloud engine for speedup. The model performance obtained at the end of the first round act as an baseline for subsequent operation. We select around 50% of the top-performing models to become a candidate for the second round. The first round result is also available to the user… [i]n the third round, we yet again initiated a random search based hyperparameter tuning on the models that are selected in previous round. Compared to previous round, the number of parameters to be tried out for each models in this round is more and adjustable. At present, we run nearly 5+ different models with 30 different parameter values in this round. Note that, some model does not have many parameters to be tuned.”); (CC) execute the predictive model trained in (AA) with the updated second plurality of test configurations to predict the test result for each test configuration of the second plurality of test configurations; (DD) select the predefined number of test configurations from the updated second plurality of test configurations based on the predicted test result for each test configuration of the second plurality of test configurations in (CC) (Patel, pgs. 1608-1609, see also Fig. 3, “In the second round, we initiated a random search based hyperparameter tuning on the models that are selected in previous round. A randomized hyperparameter tuning is highly parallel activity. The number of parameters to be tried out for each models in the current round is adjustable but we keep it a small value such as 10. In early stage of execution, we like to control the exploration search space. In this round, we run nearly 10+ different models with 10 different randomly generated parameter values. 
Out of 10+ models, we select nearly 50% of the top-performing models…”), wherein the selected predefined number of test configurations define a fourth plurality of test configurations, wherein each test configuration of the fourth plurality of test configurations includes the value for each test parameter of the plurality of test parameters, wherein each test configuration of the fourth plurality of test configurations is unique and is unique relative to the first plurality of test configurations and relative to the third plurality of test configurations; (EE) execute the software under test with the fourth plurality of test configurations to generate the test result for each test configuration of the fourth plurality of test configurations; and (FF) output the generated test result for each test configuration of the fourth plurality of test configurations (Patel, pgs. 1608-1609, see also Fig. 3, “In round 4 and 5, we focus on discovering length 2 pipeline paths for each top-performing models. Given a pipeline graph G_k for a kth top-performing model, we decompose G_k into multiple pipeline graphs of depth 2. Note that, the last layer in each decomposed graph is same. Next, we process each decomposed graph in two stages (i.e., round 4 and round 5) to discover a length-2 pipeline paths that perform better than the pipeline path from which it got enlarged. Round 4 is similar to Round 1, where pipelines with default parameter is tried, whereas, round 5 is similar to round 2, where randomized hyper parameter tuning is conducted on the top-performing paths outputted by round 4. After round 4 and round 5, we know what are the nodes other than nodes from last stages also helps to improve the performance. In Round 6, we use these nodes to grow a longer length patterns. Up until round 6, we use highly parallelized randomized search operations. In round 7, we apply intelligent search mechanisms for promising pipeline paths that have been discovered. In particular, given a path, we apply evolS, hbandS, bayesS, and rbfOptS on each path to discover a hyper parameter tuning that can help to improve the performance. Instead of applying intelligent method on each and every pipeline paths, we apply it on top-performing pipeline-paths to improve the execution time.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of Patel; the motivation to do so would be to have an automated approach for optimizing machine learning pipelines without exploring the entire configuration space (Patel, pg. 1604, right-column, see also fig. 1, “The term [‘]predictive modeling[’] in the field of data science refers to the process of analysis to discover the best data transformations and modeling forms for ultimately drawing accurate and meaningful inferences from new realizations of (scoring) data. Predictive modeling is of fundamental interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization. In the machine learning community, researchers and or data scientists build “Pipeline” to define a series of steps to be performed on an input dataset for building model. As supported by the literature, it is not an easy task to know beforehand what options will work well for a given dataset. As a result, user needs to build and manage many pipelines manually…[t]o make the model exploration task easier, we introduce a concept of [‘]Pipeline Graph[’]. A pipeline graph defines the nature of and ordering of the operations to perform when exploring predictive models for tasks such as classification, regression, or clustering.”).
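For illustration only, the sequence of limitations (AA) through (FF) recited in claim 8 can be sketched as one round of retraining and reselection. This is a minimal sketch under stated assumptions and does not reproduce the method of any cited reference: run_software_under_test, the 1-nearest-neighbour predictor in train, the two-parameter configurations, and the selection of the lowest predicted results are all hypothetical choices made for the example.

```python
import random

def run_software_under_test(config):
    """Hypothetical stand-in for executing the software under test."""
    x, y = config
    return (x - 3) ** 2 + (y - 7) ** 2

def train(observations):
    """Assumed predictive model: return the result of the nearest observed configuration."""
    data = list(observations)
    def predict(config):
        nearest = min(data, key=lambda obs: sum((a - b) ** 2 for a, b in zip(obs[0], config)))
        return nearest[1]
    return predict

rng = random.Random(0)
all_configs = [(x, y) for x in range(10) for y in range(10)]
first = rng.sample(all_configs, 5)                        # first plurality
second = [c for c in all_configs if c not in first]       # second plurality
observations = [(c, run_software_under_test(c)) for c in first]

# Earlier round: select and execute a third plurality.
third = sorted(second, key=train(observations))[:3]
observations += [(c, run_software_under_test(c)) for c in third]

model = train(observations)                               # (AA) train on first and third
second = [c for c in second if c not in third]            # (BB) remove third from second
predicted = {c: model(c) for c in second}                 # (CC) predict for updated second
fourth = sorted(second, key=predicted.get)[:3]            # (DD) select predefined number
fourth_results = {c: run_software_under_test(c) for c in fourth}  # (EE) execute
for config, result in fourth_results.items():             # (FF) output
    print(config, result)
```

Because step (BB) removes the third plurality before selection, the fourth plurality in this sketch is necessarily disjoint from both the first and the third pluralities.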
Regarding claim 9, Chakkrit in view of Goth and in view of Groce and further in view of  Patel teaches the non-transitory computer-readable medium of claim 8, wherein before (A), the computer-readable instructions further cause the computing device to define the plurality of test configurations based on random selection of a value for each test parameter of the plurality of test parameters, wherein the value is selected between a minimum value and a maximum value defined for each test parameter of the plurality of test parameters, wherein the first plurality of test configurations are randomly selected from the defined plurality of test configurations(Chakkrit, pgs. 701-703, left-column, “Iteration indicates maximum number of the combination of different parameter settings to be evaluated for a classification technique…[u]nlike grid search technique that needs pre-defined candidate parameter settings, a random search technique randomly generates candidate parameter settings based on a given iteration threshold. For example, an iteration threshold of 5 will limit the number of candidate parameter settings for each classification technique to 5. Thus, regardless of the number of parameter settings, the random search technique always generates 5 combinations of parameter settings for a classification technique.”).  
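For illustration only, random generation of candidate parameter settings under an iteration threshold, as described in the quoted Chakkrit passage, can be sketched as follows. This is a minimal sketch: the function name random_configurations, the parameter names, and the minimum and maximum values are assumptions made for the example and are not taken from Chakkrit.

```python
import random

def random_configurations(parameter_ranges, iteration_threshold, seed=None):
    """Draw `iteration_threshold` configurations, each value between its minimum and maximum."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in parameter_ranges.items()}
        for _ in range(iteration_threshold)
    ]

# Hypothetical test parameters with (minimum, maximum) bounds.
ranges = {"learning_rate": (0.001, 0.1), "max_depth": (1.0, 10.0)}
candidates = random_configurations(ranges, iteration_threshold=5, seed=42)
```

With an iteration threshold of 5, the sketch always produces 5 candidate settings regardless of how many parameters are listed, which mirrors the example given in the quoted passage.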
Regarding claim 10, Chakkrit in view of Goth and in view of Groce and further in view of Patel teaches the non-transitory computer-readable medium of claim 9, wherein the second plurality of test configurations are the defined plurality of test configurations after excluding the selected first plurality of test configurations(Chakkrit, pgs. 688-689, see also fig. 1, “In order to ensure that our conclusions are robust, we use the out-of-sample bootstrap validation technique…[a] bootstrap sample of size N is randomly drawn with replacement from an original dataset, which is also of size N. A model is trained using the bootstrap sample and tested using the rows that do not appear in the bootstrap sample… [u]nlike the ordinary bootstrap, the out-of-sample bootstrap technique fits models using the bootstrap samples, but rather than testing the model on the original sample, the model is instead tested using the rows that do not appear in the bootstrap sample…[t]hus, the training and testing corpora do not share overlapping observations.”). 
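For illustration only, the out-of-sample bootstrap described in the quoted Chakkrit passage can be sketched as follows. This is a minimal sketch: the function name out_of_sample_bootstrap and the integer rows are assumptions made for the example.

```python
import random

def out_of_sample_bootstrap(rows, seed=None):
    """Draw N rows with replacement for training; test on the rows never drawn."""
    rng = random.Random(seed)
    n = len(rows)
    drawn = [rng.randrange(n) for _ in range(n)]
    drawn_set = set(drawn)
    train = [rows[i] for i in drawn]
    test = [rows[i] for i in range(n) if i not in drawn_set]
    return train, test

rows = list(range(20))  # hypothetical dataset of 20 distinct rows
train_rows, test_rows = out_of_sample_bootstrap(rows, seed=1)
```

The test rows are exactly the original rows left over after excluding those selected for training, so the two sets share no observations.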
  
Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018)(“Chakkrit”) and in view of Goth, et al. DE 102019210562 A1(“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013)(“Groce”) and further in view of Doniwa et al. US 2017/0169329 A1 (“Doniwa”). 
Regarding claim 15, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein in (A), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the first plurality of test configurations.  
However, Doniwa teaches: wherein in (A), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the first plurality of test configurations (Doniwa, para. 0016, see also fig. 1 and fig. 5, “FIG. 1 is a block diagram showing a specific configuration of a hyper-parameter search system according to the embodiment. This system is a server system of a cluster configuration, wherein a server (hereinafter, referred to as a manager) 11 called a manager, and a plurality (four in the embodiment) of servers (hereinafter, referred to as workers) 12-i (i is any one of 1 to 4), are connected to a network 13” & see also Doniwa, para. 0028-0029, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15)… [i]n contrast, if there is no other search, subsequent hyper-parameter candidates that reflect the results of learning collected in the steps up to step S16 are generated (step S17). Since past search results are prepared for candidate generation at this time, the Bayesian method is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S18), and the end of the tasks is waited for (step S19). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives therefrom a result of learning (step S20).”).  
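For illustration only, the manager and worker arrangement in the quoted Doniwa passage, in which each generated candidate is issued as a task and the manager waits for and collects the results, can be sketched as follows. This is a minimal sketch: a local thread pool stands in for the networked workers 12-i, and run_task, dispatch, and the candidate values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(config):
    """Hypothetical worker task standing in for one learning or test run."""
    return config, sum(v * v for v in config.values())

def dispatch(configs, n_workers=4):
    """Issue one task per configuration, wait for all tasks, and collect results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_task, configs))

# Eight hypothetical candidates shared among four workers.
configs = [{"a": i, "b": 2 * i} for i in range(8)]
results = dispatch(configs)
```

Each task in the sketch receives a different configuration, and results are returned in the order the candidates were issued.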
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of Doniwa; the motivation to do so would be to speed up the search for optimal machine learning hyperparameters for deep learning through the use of cluster-based computing (Doniwa, paras. 0026-0027, “FIG. 4 shows the structure of a deep neural network, and the types of hyper-parameters processed by the respective layers of the deep neural network. In the deep neural network, if the number of network layers is small, and there are three types of hyper-parameters, each of which can assume three values, the combinations of the hyper-parameters is 3^3 = 27. However, if the number of layers of the deep neural network is 7 as shown in FIG. 4, and each hyperparameter can assume three values, the combinations of the hyper-parameters is 3^7 = 2,187. Supposing that one hour is required for one-time learning of this deep neural network, 2,187 hours (about 91 days) are required for obtaining an optimal combination. Thus, it is very difficult to obtain the optimal combination. In light of the above, the server system of the embodiment is made to have a cluster structure comprising one server 11 called a manager, and a plurality of servers 12-i called workers, thereby realizing an efficient and fast search for an optimal combination of hyper-parameters.”).  
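For illustration only, the arithmetic in the quoted Doniwa passage can be checked as follows. This is a minimal sketch: the four-worker estimate assumes an exhaustive search divided evenly among four workers, which is an assumption made for the example and is not a figure stated in Doniwa.

```python
import math

def combinations(values_per_hyperparameter, n_hyperparameters):
    """Number of combinations when every hyper-parameter can assume the same number of values."""
    return values_per_hyperparameter ** n_hyperparameters

small = combinations(3, 3)             # three hyper-parameters with three values each
large = combinations(3, 7)             # seven hyper-parameters with three values each

hours_per_run = 1
sequential_hours = large * hours_per_run
sequential_days = sequential_hours / 24

# Assumed: the same exhaustive search split evenly among four workers.
parallel_hours = math.ceil(large / 4) * hours_per_run
parallel_days = parallel_hours / 24
```

The sequential figure works out to 2,187 hours, or 91.125 days, which agrees with the "about 91 days" in the quoted passage.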
Regarding claim 16, Chakkrit in view of Goth and in view of Groce and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 15, wherein each session of the plurality of sessions includes a plurality of worker computing devices (Doniwa, para. 0028, see also fig. 5, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15).”).  
Regarding claim 17, Chakkrit in view of Goth and in view of Groce and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 16, wherein in (A), the software under test is executed with a test dataset input to the software under test, wherein the test dataset is distributed across the plurality of worker computing devices of each session of the plurality of sessions(Doniwa, para. 0041, “It is known that deep learning utilizing a neural network requires a long learning period. In order to shorten the learning period, the amount of learning data used by each worker 12-i during learning may be halved.”).  
Regarding claim 18, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein in (E), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the third plurality of test configurations. 
However, Doniwa teaches: wherein in (E), the software under test is executed in parallel using a plurality of sessions, wherein each session of the plurality of sessions is assigned a different test configuration of the third plurality of test configurations (Doniwa, para. 0016, see also fig. 1 and fig. 5, “FIG. 1 is a block diagram showing a specific configuration of a hyper-parameter search system according to the embodiment. This system is a server system of a cluster configuration, wherein a server (hereinafter, referred to as a manager) 11 called a manager, and a plurality (four in the embodiment) of servers (hereinafter, referred to as workers) 12-i (i is any one of 1 to 4), are connected to a network 13” & see also Doniwa, para. 0028-0029, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15)… [i]n contrast, if there is no other search, subsequent hyper-parameter candidates that reflect the results of learning collected in the steps up to step S16 are generated (step S17). Since past search results are prepared for candidate generation at this time, the Bayesian method is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S18), and the end of the tasks is waited for (step S19). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives therefrom a result of learning (step S20).”).  
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of Doniwa; the motivation to do so would be to speed up the search for optimal machine learning hyperparameters for deep learning through the use of cluster-based computing (Doniwa, paras. 0026-0027, “FIG. 4 shows the structure of a deep neural network, and the types of hyper-parameters processed by the respective layers of the deep neural network. In the deep neural network, if the number of network layers is small, and there are three types of hyper-parameters, each of which can assume three values, the combinations of the hyper-parameters is 3^3 = 27. However, if the number of layers of the deep neural network is 7 as shown in FIG. 4, and each hyperparameter can assume three values, the combinations of the hyper-parameters is 3^7 = 2,187. Supposing that one hour is required for one-time learning of this deep neural network, 2,187 hours (about 91 days) are required for obtaining an optimal combination. Thus, it is very difficult to obtain the optimal combination. In light of the above, the server system of the embodiment is made to have a cluster structure comprising one server 11 called a manager, and a plurality of servers 12-i called workers, thereby realizing an efficient and fast search for an optimal combination of hyper-parameters.”).  
Regarding claim 19, Chakkrit in view of Goth and in view of Groce and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 18, wherein each session of the plurality of sessions includes a plurality of worker computing devices(Doniwa, para. 0028, see also fig. 5, “FIG. 5 is a flowchart showing processing performed by the above-mentioned manager 11. First, when start of a search shown in FIG. 5 is instructed, a search range is read from the hyper-parameter search range storage unit 111 (step S11), and a plurality of initial hyper-parameter candidates are generated within the search range (step S12). Since this candidate generation is an initial value search, the random system is adopted. Generated candidates are issued as tasks to arbitrary workers 12-i to instruct them to perform learning (step S13), and the end of the tasks is waited for (step S14). Upon receiving a response indicating the end of a task from each worker 12-i, the manager receives a result of learning from the same (step S15).”).  
Regarding claim 20, Chakkrit in view of Goth and in view of Groce and further in view of Doniwa teaches the non-transitory computer-readable medium of claim 19, wherein in (A), the software under test is executed with a test dataset input to the software under test, wherein the test dataset is distributed across the plurality of worker computing devices of each session of the plurality of sessions(Doniwa, para. 0041, “It is known that deep learning utilizing a neural network requires a long learning period. In order to shorten the learning period, the amount of learning data used by each worker 12-i during learning may be halved.”).  

Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018)(“Chakkrit”) and in view of Goth, et al. DE 102019210562 A1(“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013)(“Groce”) and further in view of B'Far et al. US 8,954,309 B2(“B'Far”).
Regarding claim 22, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predicted test result is a memory consumption by the software under test with a test configuration of the third plurality of test configurations. 
However B'Far teaches: wherein the predicted test result is a memory consumption by the software under test with a test configuration of the third plurality of test configurations(B'Far, col. 13, lines 6-19, see also fig. 9, “[T]he performance of the system is measured 906. As discussed, measuring the system's performance while it is being simulated may include measuring one or more performance characteristics of the system. Performance characteristics may include latency, throughput, memory usage, processor usage, and, generally, any characteristic of the system's performance. Because of the complexity of systems being tested, simulating the system and measuring the system's performance may be performed several times to improve the accuracy of the measurements. In an embodiment, the system is simulated between six and ten times for each configuration, although the system may be simulated a different number of times, which may depend on the cost and resources available for simulating the system.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of B'Far; the motivation to do so would be to combine simulations with traditional parameter optimization methods (B'Far, col. 5, lines 46-65, “[T]he first model is used, along with design of experiment techniques, to find an optimal set of simulations to employ in order to collect data that is used to create the second model. The second model, in an embodiment, predicts performance based on configuration settings, and can also be used, via optimization techniques, to recommend configuration settings to tune an application's performance so that it meets given goals. The system may be configured as a generalized off-line automated process for creating a statistical model of the performance characteristics of a system, which may be then coupled with an optimization algorithm to create a decision support system (DSS) that recommends configuration settings that will best achieve desired performance characteristics. The process and system may combine and adapt a number of algorithms for optimization, machine learning, and statistical analysis. In an embodiment, the system is configured to model and optimize enterprise applications, but can be adapted to tune any configurable system, such as configurable IT applications or, generally, automated processes.”). 
Regarding claim 23, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predicted test result is an execution time required by the software under test with a test configuration of the third plurality of test configurations.
However B'Far teaches:  wherein the predicted test result is an execution time required by the software under test with a test configuration of the third plurality of test configurations(B'Far, col. 13, lines 6-19, see also fig. 9, “[T]he performance of the system is measured 906. As discussed, measuring the system's performance while it is being simulated may include measuring one or more performance characteristics of the system. Performance characteristics may include latency, throughput, memory usage, processor usage, and, generally, any characteristic of the system's performance. Because of the complexity of systems being tested, simulating the system and measuring the system's performance may be performed several times to improve the accuracy of the measurements. In an embodiment, the system is simulated between six and ten times for each configuration, although the system may be simulated a different number of times, which may depend on the cost and resources available for simulating the system.”). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of B'Far. The motivation to do so would be to combine simulations with traditional parameter optimization methods (B'Far, col. 5, lines 46-65, “[T]he first model is used, along with design of experiment techniques, to find an optimal set of simulations to employ in order to collect data that is used to create the second model. The second model, in an embodiment, predicts performance based on configuration settings, and can also be used, via optimization techniques, to recommend configuration settings to tune an application's performance so that it meets given goals. The system may be configured as a generalized off-line automated process for creating a statistical model of the performance characteristics of a system, which may be then coupled with an optimization algorithm to create a decision support system (DSS) that recommends configuration settings that will best achieve desired performance characteristics. The process and system may combine and adapt a number of algorithms for optimization, machine learning, and statistical analysis. In an embodiment, the system is configured to model and optimize enterprise applications, but can be adapted to tune any configurable system, such as configurable IT applications or, generally, automated processes.”).
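For illustration only, the repeated measurement described in the cited passage of B'Far (simulating each configuration several times and measuring a performance characteristic) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of B'Far or of the claimed invention: the function names, the stand-in workload, and the "iterations" setting are hypothetical, and the default of six repetitions is taken from the "between six and ten times" language quoted above.

```python
import statistics
import time
from typing import Callable, List


def measure_execution_time(run: Callable[[dict], None], config: dict, repetitions: int = 6) -> float:
    """Return the mean wall-clock time (seconds) of run(config) over several repetitions."""
    samples: List[float] = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run(config)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def hypothetical_software_under_test(config: dict) -> None:
    # Hypothetical stand-in workload: the loop size is the only configuration setting.
    total = 0
    for i in range(config["iterations"]):
        total += i * i


# Two hypothetical test configurations, each measured six times.
configs = [{"iterations": 1_000}, {"iterations": 50_000}]
timings = [measure_execution_time(hypothetical_software_under_test, c) for c in configs]
```

Averaging over repetitions is one simple way to reduce measurement noise; a median or a trimmed mean would be an equally plausible choice.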

Claims 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Chakkrit, et al. "The impact of automated parameter optimization on defect prediction models." IEEE Transactions on Software Engineering 45.7 (2018) (“Chakkrit”) and in view of Goth, et al. DE 102019210562 A1 (“Goth”) and in view of Groce, et al. "You are the only possible oracle: Effective test selection for end users of interactive machine learning systems." IEEE Transactions on Software Engineering 40.3 (2013) (“Groce”) and further in view of (2019, April 7). Conditional entropy. Wikipedia. Retrieved February 17, 2022, from https://web.archive.org/web/20190428121511/https://en.wikipedia.org/wiki/Conditional_entropy (“Wikipedia”).
Regarding claim 24, Chakkrit in view of Goth and in view of Groce teaches the non-transitory computer-readable medium of claim 1, but does not teach wherein the predefined number of test configurations are selected from the second plurality of test configurations based on an entropy score value computed from the predicted test result.  
However, Wikipedia teaches: wherein the predefined number of test configurations are selected from the second plurality of test configurations based on an entropy score value computed from the predicted test result (Wikipedia, sec. Motivation, “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation: $H(Y \mid X = x) = -\sum_{y \in Y} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x)$.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chakkrit in view of Goth and in view of Groce and further in view of Wikipedia. The motivation to do so would be to have a test result that takes prior information into consideration (Wikipedia, top-page, “In information theory, the conditional entropy…quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random variable X is known.”).
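As an illustrative worked instance of the quoted definition, consider a test configuration x whose predicted test result Y has two possible outcomes. The outcome labels “pass” and “fail” and the probability values are assumptions chosen for illustration and do not appear in the cited reference. Using the usual convention that a zero-probability term contributes nothing to the sum, an evenly split prediction yields the maximum entropy of one bit, while a certain prediction yields zero:

```latex
\begin{align*}
\Pr(Y=\text{pass}\mid X=x)=\Pr(Y=\text{fail}\mid X=x)=\tfrac12
  &\;\Rightarrow\; H(Y\mid X=x) = -\left(\tfrac12\log_2\tfrac12+\tfrac12\log_2\tfrac12\right) = 1\ \text{bit},\\
\Pr(Y=\text{pass}\mid X=x)=1,\quad \Pr(Y=\text{fail}\mid X=x)=0
  &\;\Rightarrow\; H(Y\mid X=x) = -\left(1\cdot\log_2 1\right) = 0\ \text{bits}.
\end{align*}
```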
Regarding claim 25, Chakkrit in view of Goth and in view of Groce and further in view of Wikipedia teaches the non-transitory computer-readable medium of claim 24, wherein the predefined number of test configurations are selected based on having maximum values for the entropy score value computed from the predicted test result (Wikipedia, sec. Motivation, “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation: $H(Y \mid X = x) = -\sum_{y \in Y} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x)$.”).
Regarding claim 26, Chakkrit in view of Goth and in view of Groce and further in view of Wikipedia teaches the non-transitory computer-readable medium of claim 24, wherein the entropy score is computed using $H(x) = -\sum_{y \in Y} \Pr(y \mid x) \log \Pr(y \mid x)$, where H(x) is an entropy score value, P(y|x) is a posterior probability of a predicted test result y given a respective test configuration x of the second plurality of test configurations, and Y is a set of possible test results (Wikipedia, sec. Motivation, “The entropy of Y conditioned on X taking the value of x is defined analogously by conditional expectation: $H(Y \mid X = x) = -\sum_{y \in Y} \Pr(Y = y \mid X = x) \log_2 \Pr(Y = y \mid X = x)$.”).
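For illustration only, the entropy score recited in claim 26 and the maximum-entropy selection recited in claim 25 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited reference or of the application: the function names, the configuration labels, and the posterior values are hypothetical, and the natural logarithm is assumed because the claimed formula does not state a base (the quoted Wikipedia definition uses base 2, which changes the scores only by a constant factor and leaves the ranking unchanged).

```python
import math
from typing import Dict, List, Sequence


def entropy_score(posterior: Sequence[float]) -> float:
    """H(x) = -sum over y of Pr(y|x) * log Pr(y|x); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def select_max_entropy(posteriors: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k configuration names with the largest entropy scores."""
    ranked = sorted(posteriors, key=lambda name: entropy_score(posteriors[name]), reverse=True)
    return ranked[:k]


# Hypothetical posteriors Pr(y|x) over two possible test results for three configurations.
posteriors = {
    "config_a": [0.5, 0.5],  # most uncertain prediction
    "config_b": [0.9, 0.1],
    "config_c": [1.0, 0.0],  # certain prediction, entropy 0
}
selected = select_max_entropy(posteriors, k=2)
```

With these assumed values, the two configurations with the most uncertain predicted test results ("config_a" and "config_b") are selected.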
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



Adam Clark Standke
Assistant Examiner
Art Unit 2129


/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

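        1 See para. 0020 of the original untranslated document: “[W]ird das generierte Regressionsmodell verwendet, um die vielversprechendsten Testfälle aus den Clusterdaten $D_{obs}$ auszuwählen. Diese Auswahl an Testfällen könnte dann für - im Hinblick auf Zeitaufwand und Rechenleistung - aufwändigere Überprüfungen auf Laufzeitfehler genutzt werden, die eine Instrumentierung des Quellkodes erfordern. Es werden daher nur Testfälle ausgeführt, die die Software maximal durchdringen.” (English translation: “[T]he generated regression model is used to select the most promising test cases from the cluster data $D_{obs}$. This selection of test cases could then be used for runtime-error checks that are more expensive in terms of time and computing power and that require instrumentation of the source code. Consequently, only test cases that penetrate the software to the maximum extent are executed.”); see also paras. 0013-0019 with regard to fig. 1.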
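        2 See paras. 0007-0008 of the original untranslated document: “Herkömmliche Fuzzer wie AFL sind hierbei darauf angewiesen, viele Testfälle (mitunter Hunderte oder Tausende pro Sekunde) auszuführen, was für langsame Software - insbesondere solcherlei Softwaresysteme, die irgendeine Form der Datenübertragung benötigen - nicht durchführbar ist. Zurzeit obliegt es dem Entwickler des Fuzz-Wrappers, eine hinreichend schnelle Lösung für dieses Problem zu finden…[e]ine Hauptidee des vorgeschlagenen Ansatzes besteht vor diesem Hintergrund darin, die Eingabewarteschlange oder den Korpus vor der eigentlichen Testausführung einer Ballungsanalyse (clustering) zu unterziehen, um aus jedem Cluster eine optimale Eingabe auszuwählen und das Fuzz-Testing auf diese Weise zu beschleunigen bzw. eine Testmetrik (z. B. Abdeckung) zu maximieren.” (English translation: “Conventional fuzzers such as AFL rely on executing many test cases (sometimes hundreds or thousands per second), which is not feasible for slow software, in particular software systems that require some form of data transmission. At present it is up to the developer of the fuzz wrapper to find a sufficiently fast solution to this problem…[a] main idea of the proposed approach is, against this background, to subject the input queue or the corpus to a cluster analysis (clustering) before the actual test execution, in order to select an optimal input from each cluster and in this way to speed up the fuzz testing or to maximize a test metric (e.g., coverage).”)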
        3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
        4 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
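        5 See para. 0018 of the original untranslated document: “Vielversprechend bedeutet in diesem Zusammenhang, dass diese Daten eine weitestgehende Codeabdeckung gewährleisten und mit hoher Wahrscheinlichkeit zu einem Softwareabsturz führen. Daher ist es von Bedeutung, wie sicher eine Vorhersage ist, um zu entscheiden, ob es sinnvoll ist, den betreffenden Testfall zu verwenden. Um ein Basismodell zu erlernen, empfiehlt es sich, die Parameter eines Gaußprozesses anhand der Daten $D_{obs}$ aus Prozess 13 zu ermitteln indem die logarithmische Plausibilitätsfunktion (log likelihood) maximiert wird: $\arg\max_{\theta} \log(p(y \mid X, \theta))$” (English translation: “Promising means in this context that these data ensure the most extensive code coverage possible and lead with high probability to a software crash. It is therefore important how certain a prediction is in order to decide whether it makes sense to use the test case in question. To learn a base model, it is advisable to determine the parameters of a Gaussian process from the data $D_{obs}$ from process 13 by maximizing the log-likelihood function: $\arg\max_{\theta} \log(p(y \mid X, \theta))$”).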
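For illustration only, the log-likelihood maximized in the passage quoted in footnote 5 can be written out in its standard closed form for a Gaussian process. This is a sketch under stated assumptions that the cited paragraph does not itself spell out: a zero-mean Gaussian process, n observations, and a covariance matrix $K_\theta$ (kernel plus observation noise) evaluated on the inputs X under hyperparameters θ.

```latex
\begin{align*}
\log p(y \mid X, \theta)
  &= -\tfrac{1}{2}\, y^{\top} K_\theta^{-1}\, y
     \;-\; \tfrac{1}{2} \log \det K_\theta
     \;-\; \tfrac{n}{2} \log 2\pi, \\
\hat{\theta}
  &= \arg\max_{\theta}\; \log p(y \mid X, \theta).
\end{align*}
```

The first term rewards fit to the observed results, the second penalizes model complexity, and the third is a normalization constant that does not depend on θ.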