DETAILED ACTION
1.	This communication is in response to Application No. 16/696,920 filed on November 26, 2019 in which claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
3.	The information disclosure statements submitted on 11/26/2019 and 12/18/2019 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
4.	Independent Claims 1, 13, and 20 and their respective dependent claims are objected to because it is unclear what defines a “weak learner model” and a “strong learner model”, and what distinguishes one learner model from the other when both are used in an ensemble. Appropriate correction is required.

Claim Rejections - 35 USC § 112
5.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


6.	Claims 10 and 20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

7.	Claim 10 recites the limitation “the second strong model”, but Claim 1, from which Claim 10 depends, does not recite a second strong learner model. There is insufficient antecedent basis for this limitation in the claim.

8.	Claim 20 recites the limitation “the design solution”, but the claim contains no prior reference to a design solution and does not specify which steps of the method produce the design solution. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 101
9.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


10.	Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
	Claim 1 recites a method for design space optimization, comprising: generating, by one or more processors, a plurality of first data points by evaluating a function; generating, by the one or more processors, a weak learner model using the plurality of first data points; generating, by the one or more processors, a strong learner model using the plurality of first data points, the strong learner model different from the weak learner model; generating, by the one or more processors using the weak learner model, at least one second data point that satisfies an optimization condition; generating, by the one or more processors using the strong learner model, at least one third data point using an optimizer; evaluating, by the one or more processors using the function, input values corresponding to the at least one second data point and the at least one third data point to generate a candidate optimum output; and outputting, by the one or more processors, the candidate optimum output responsive to an output condition being satisfied.
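To make the recited sequence of steps concrete, the following is a minimal Python sketch of one possible reading of the method of Claim 1. It is illustrative only and is not the applicant's disclosed implementation. Every specific below is an assumption: the one-dimensional test function `objective`, a least-squares straight line as the weak learner, a k-nearest-neighbour average standing in for the strong learner (Claim 4 recites a neural network or random forest), a grid search as the optimizer, a lowest-10%-of-predictions cutoff as the optimization condition, and an iteration cap or tolerance around an expected optimum as the output condition (compare Claim 7).

```python
import random
import statistics


def objective(x):
    # Hypothetical function to be minimized; its true minimum is 1.0 at x = 2.0.
    return (x - 2.0) ** 2 + 1.0


def fit_weak(data):
    # Assumed weak learner: least-squares straight line y = a + b * x.
    mx = statistics.fmean(x for x, _ in data)
    my = statistics.fmean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    b = sum((x - mx) * (y - my) for x, y in data) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_strong(data, k=3):
    # Assumed stand-in for the strong learner: mean output of the k nearest points.
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict


def grid_optimizer(model, lo, hi, steps=200):
    # Assumed optimizer: grid search for the input minimizing the model's prediction.
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=model)


def design_space_optimize(lo=-5.0, hi=5.0, n_init=8, max_iter=20,
                          expected_optimum=1.0, tol=1e-3, seed=0):
    rng = random.Random(seed)
    # First data points: evaluate the function at randomly selected inputs.
    data = [(x, objective(x)) for x in (rng.uniform(lo, hi) for _ in range(n_init))]
    for _ in range(max_iter):  # output condition, part 1: iteration cap
        # Snapshot the data so each learner sees only points known so far.
        weak, strong = fit_weak(data), fit_strong(list(data))
        # Second data points: random candidates whose weak-learner prediction falls
        # in roughly the lowest 10% of all candidate predictions (assumed condition).
        cands = [rng.uniform(lo, hi) for _ in range(100)]
        preds = [weak(c) for c in cands]
        cutoff = statistics.quantiles(preds, n=10)[0]
        second = [c for c, p in zip(cands, preds) if p <= cutoff][:3]
        # Third data point: the strong learner's predicted optimum.
        third = grid_optimizer(strong, lo, hi)
        # Evaluate the function at the new inputs and add them to the data set.
        data += [(x, objective(x)) for x in second + [third]]
        # Output condition, part 2: best value within tol of the expected optimum.
        if abs(min(y for _, y in data) - expected_optimum) <= tol:
            break
    return min(data, key=lambda p: p[1])  # candidate optimum as (x, f(x))


if __name__ == "__main__":
    print(design_space_optimize())
```

Calling `design_space_optimize()` returns the best input found and its function value; substituting other learners or optimizers changes only the helper functions, not the loop structure.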
	2A Prong 1: The limitation generating, by one or more processors, a plurality of first data points by evaluating a function, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors”, a user can manually generate the first data points by evaluating the function at selected inputs. Further, the limitation generating, by the one or more processors, a weak learner model using the plurality of first data points, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors”, a weak learner model may be generated by the user using the plurality of first data points. Further, the limitation generating, by the one or more processors, a strong learner model using the plurality of first data points, the strong learner model different from the weak learner model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors”, a strong learner model different from the weak learner model may be generated by the user using the plurality of first data points. Further, the limitation generating, by the one or more processors using the weak learner model, at least one second data point that satisfies an optimization condition, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors”, the user may manually generate a second data point that satisfies an optimization condition. 
Further, the limitation generating, by the one or more processors using the strong learner model, at least one third data point using an optimizer, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors” and “using an optimizer”, the user may generate at least one third data point. Further, the limitation evaluating, by the one or more processors using the function, input values corresponding to the at least one second data point and the at least one third data point to generate a candidate optimum output, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “one or more processors using the function”, the user may evaluate input values corresponding to the at least one second and third data points to generate a candidate optimum output. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
	2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of one or more processors. The one or more processors are recited at a high level of generality (i.e., as generic processors able to generate a plurality of data points, generate weak and strong learner models, and evaluate input values to generate a candidate optimum output) such that they amount to no more than mere instructions to apply the exception using generic computer components. Further, the claim recites the additional element of an optimizer. The optimizer is recited at a high level of generality (i.e., as a generic optimizer able to generate a third data point) such that it amounts to no more than mere instructions to apply the exception using generic computer components. Further, the claim recites outputting, by the one or more processors, the candidate optimum output responsive to an output condition being satisfied. The outputting step is recited at a high level of generality and amounts to merely displaying data, which is a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
	2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more processors amounts to no more than mere instructions to apply the exception using generic computer components. Further, the additional element of an optimizer amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Further, the outputting step was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. Sewell (“Ensemble Learning”) discloses various methods of ensemble learning, including bagging, boosting, and stacking, in which multiple strong or weak learners are employed and their predictions are combined. Sewell, pages 9-10, describes taxonomies of different ensemble methods, each of which includes outputting optimum results from each model in the ensemble and then combining those results accordingly. Thus, Sewell shows that, in ensemble machine learning, each of the combined machine learning models must generate output so that the outputs can be combined into a more accurate prediction and/or used to train, or improve the training of, a higher-level learner. Accordingly, the conclusion that the claimed outputting step is well-understood, routine, conventional activity is supported by the teachings of Sewell. The claim is not patent eligible.
For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 2-12. The additional limitations of the dependent claims are addressed below. 

	Claim 2 recites the method of claim 1, wherein generating, by the one or more processors, the plurality of first data points includes applying randomly selected inputs to the function. At Step 2A Prong 1, dependent Claim 2 recites the mental process of applying randomly selected inputs to the function, which may be performed manually by a user. Accordingly, under Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 3 recites the method of claim 1, wherein generating, by the one or more processors, the weak learner model includes providing the plurality of first data points as input to at least one of a regression model or a support vector machine. Dependent Claim 3 merely specifies that the weak learner model may comprise a regression model or a support vector machine. Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 4 recites the method of claim 1, wherein generating, by the one or more processors, the strong learner model includes providing the plurality of first data points as input to at least one of a neural network or a random forest model. Dependent Claim 4 merely specifies that the strong learner model may comprise a neural network or a random forest model. Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 5 recites the method of claim 1, further comprising: generating, by the one or more processors, a plurality of candidate second data points using the weak learner model; and determining, by the one or more processors, that each second data point satisfies the optimization condition based on a candidate second data point meeting or exceeding a threshold percentile relative to the plurality of candidate second data points. At Step 2A Prong 1, dependent Claim 5 recites the mental processes of “generating a plurality of candidate second data points using the weak learner model” and “determining that each second data point satisfies the optimization condition based on a candidate second data point meeting or exceeding a threshold percentile relative to the plurality of candidate second data points”, both of which may be performed manually by a user. At Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.
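The percentile-based optimization condition of Claim 5 can be illustrated with a short sketch. It assumes that higher predicted values are better, that the threshold is an integer percentile from 1 to 99, and that percentiles are computed with the `inclusive` method of Python's `statistics.quantiles`; the function name `satisfies_percentile` and the default of 90 are hypothetical and are not drawn from the application.

```python
import statistics


def satisfies_percentile(candidate_scores, threshold_pct=90):
    """Return the indices of candidates whose predicted score meets or exceeds
    the threshold percentile of all candidate scores (higher is better)."""
    if not 1 <= threshold_pct <= 99:
        raise ValueError("threshold_pct must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(candidate_scores, n=100, method="inclusive")
    cutoff = cuts[threshold_pct - 1]
    return [i for i, s in enumerate(candidate_scores) if s >= cutoff]


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(satisfies_percentile(scores))      # [9]
    print(satisfies_percentile(scores, 50))  # [5, 6, 7, 8, 9]
```

For predicted scores 1 through 10, the 90th-percentile cutoff under this method is 9.1, so only the candidate scoring 10 (index 9) satisfies the condition.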

	Claim 6 recites the method of claim 5, further comprising selecting, by the one or more processors, input values to generate the plurality of candidate second data points using a pseudorandom number generator and based on a distance between each input value. At Step 2A Prong 1, dependent Claim 6 recites the mental process of “selecting input values to generate the plurality of candidate second data points using a pseudorandom number generator and based on a distance between each input value”, which may be performed manually by a user. At Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 7 recites the method of claim 1, wherein further comprising determining, by the one or more processors, the output condition to be satisfied responsive to at least one of a threshold number of iterations or the candidate optimum output being within a threshold of an expected optimum of the function. At Step 2A Prong 1, dependent Claim 7 recites the mental process of “determining the output condition to be satisfied responsive to at least one of a threshold number of iterations or the candidate optimum output being within a threshold of an expected optimum of the function”, which may be performed manually by a user. At Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 8 recites the method of claim 1, wherein an optimum of the function is a maximum value or a minimum value. Dependent Claim 8 merely specifies that the optimum of the function may be a maximum or minimum value, such that it amounts to no more than mere instructions to apply the exception using generic computer components. Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 9 recites the method of claim 1, further comprising: generating, by the one or more processors, the strong learner model to include a plurality of neural networks, each neural network of the plurality of neural networks provided with at least one of different weights or different biases; and generating, by the one or more processors, the third data point using each of the plurality of neural networks. Dependent Claim 9 merely specifies that the strong learner model includes a plurality of neural networks, each provided with at least one of different weights or different biases, and that the neural networks are used to generate the third data point, such that it amounts to no more than mere instructions to apply the exception using generic machine learning components. Further, the additional element of a neural network is recited at a high level of generality (i.e., as a generic neural network able to generate a third data point). Mere instructions to apply an exception using generic machine learning components cannot provide an inventive concept. Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 10 recites the method of claim 1, further comprising updating, by the one or more processors responsive to the output condition not being satisfied, the weak learner model and the second strong model using the plurality of second data points and the third data point. At Step 2A Prong 1, dependent Claim 10 recites the mental process of “updating the weak learner model and the second strong model using the plurality of second data points and the third data point”, which a user may perform manually using the plurality of second data points and the third data point. At Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 11 recites the method of claim 1, further comprising using the function to optimize a combustion process. Dependent Claim 11 merely specifies that the function is used to optimize a combustion process, such that it amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Claim 12 recites the method of claim 1, further comprising increasing, by the one or more processors, a first count of the at least one second data point to be generated using the weak learner model relative to a second count of the at least one third data point to be generated using the strong learner model responsive to a measure of effectiveness of the weak learner model satisfying a corresponding threshold. At Step 2A Prong 1, dependent Claim 12 recites the mental process of “increasing a first count of the at least one second data point to be generated using the weak learner model relative to a second count of the at least one third data point to be generated using the strong learner model responsive to a measure of effectiveness of the weak learner model satisfying a corresponding threshold”, which may be performed manually by a user. At Step 2A Prong 2 and Step 2B, the additional element “one or more processors” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1.

	Independent Claim 13 recites substantially the same limitations as Claim 1, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale. 
For the reasons above, Claim 13 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 14-19. The additional limitations of the dependent claims are addressed below. 

	Claim 14 recites substantially the same limitations as Claim 2, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

	Claim 15 recites substantially the same limitations as Claims 3 and 4, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

	Claim 16 recites substantially the same limitations as Claim 5, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

	Claim 17 recites substantially the same limitations as Claim 7, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

	Claim 18 recites substantially the same limitations as Claim 9, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

	Claim 19 recites substantially the same limitations as Claim 12, in the form of a system including generic computer components. The claim is also directed to performing mental processes without significantly more; therefore, it is rejected under the same rationale.

Claim 20 recites a method for optimizing a design space comprising: (i) populating a design space with random N design points evaluated by a function evaluator; (ii) generating a weak learner model using available data; (iii) generating a strong learner model using the available data; (iv) randomly sampling points that the weak learner predicts will be above a selected objective value based on an objective value function; (v) finding the optimum predicted by strong learner based on the random sample points in step 4 using a global optimization scheme; (vi) adding the parameter identified in step five to the N points to be evaluated by the design function; (vii) performing design function evaluations on N; and (viii) adding the design solution to a database and repeating steps 2-7 until the method converges or maximum number of iterations is reached.
2A Prong 1: The limitation (i) populating a design space with random N design points evaluated by a function evaluator, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “a function evaluator”, populating a design space with random N design points and evaluating the design points may be performed manually by a user. Further, the limitation (ii) generating a weak learner model using available data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, generating a weak learner model using available data may be performed manually by a user or within the user’s mind. Further, the limitation (iii) generating a strong learner model using the available data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, generating a strong learner model using the available data may be performed manually by a user or within the user’s mind. Further, the limitation (iv) randomly sampling points that the weak learner predicts will be above a selected objective value based on an objective value function, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind and by mathematical calculation. That is, randomly sampling points based on a user’s prediction of which points will be above a selected objective value may be performed manually by the user, and basing the prediction on an objective value function involves mathematical calculation. 
Further, the limitation (v) finding the optimum predicted by strong learner based on the random sample points in step 4 using a global optimization scheme, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind and by mathematical calculation. That is, finding the optimum based on a user’s prediction and the random sample points of step 4 may be performed manually by a user, and using a global optimization scheme involves mathematical calculation. Further, the limitation (vi) adding the parameter identified in step five to the N points to be evaluated by the design function, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation. That is, the parameter identified in step five may be considered for evaluation by adding it to the mathematical calculation to be evaluated by the design function. Further, the limitation (vii) performing design function evaluations on N, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, design function evaluations may be performed manually by a user. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Similarly, if a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites (viii) adding the design solution to a database and repeating steps 2-7 until the method converges or maximum number of iterations is reached. The adding step is recited at a high level of generality and amounts to merely storing and retrieving information, which is a form of insignificant extra-solution activity. The repeating of steps 2-7 also amounts to merely performing repetitive calculations, which is also a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The adding step was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Storing and retrieving information in memory” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the adding step is well-understood, routine, conventional activity is supported under Berkheimer. Further, the repeating of steps 2-7 was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Performing repetitive calculations” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the repeating of steps 2-7 is well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
For the reasons above, Claim 20 is rejected as being directed to an abstract idea without significantly more.

Claim Rejections - 35 USC § 103
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

12.	Claims 1-8, 10, 12-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nardi et al. (hereinafter Nardi) (“Practical Design Space Exploration”), in view of Tristan et al. (hereinafter Tristan) (US PG-PUB 20190095805).
Regarding Claim 1, Nardi teaches a method for design space optimization, comprising: 
generating, by one or more processors, a plurality of first data points by evaluating a function (Nardi, Pg. 3, B. Multi-Objective Optimization: Problem Statement, “We define f : X → ℝ^p as our vector of objective functions f = (f_1,...,f_p), taking x as input, and evaluating y = f(x)”, thus a plurality of first data points are generated by evaluating a function); 
generating, by the one or more processors using the weak learner model (See introduction of Tristan reference below for teaching of weak learner model), at least one second data point (Nardi, Pg. 3, B. Multi-Objective Optimization: Problem Statement, “Formally, let us consider a multi-objective optimization (minimization) over a design space X ⊆ ℝ^d. We define f : X → ℝ^p as our vector of objective functions f = (f_1,...,f_p), taking x as input, and evaluating y = f(x). Our goal is to identify the Pareto frontier of f; that is, the set Γ ⊆ X of points which are not dominated by any other point, i.e., the maximally desirable x which cannot be optimized further for any single objective without making a trade-off.”, thus, as also shown in Fig. 1, at least one second data point is generated, as the multi-objective function maps each point in the design space to the optimization space) that satisfies an optimization condition (Nardi, Pg. 3, “We can then introduce a set of inequality constraints c(x) = (c_1(x), ...,c_q(x)), b = (b_1,...,b_q) to the optimization, such that we only consider points where all constraints are satisfied (c_i(x) ≤ b_i). These constraints directly correspond to real world limitations of the design space under consideration.”, thus, generated data points are required to satisfy an optimization condition);
generating, by the one or more processors using the strong learner model (See introduction of Tristan reference below for teaching of strong learner model), at least one third data point (Nardi, Pg. 3, B. Multi-Objective Optimization: Problem Statement, “Formally, let us consider a multi-objective optimization (minimization) over a design space X ⊆ ℝ^d. We define f : X → ℝ^p as our vector of objective functions f = (f_1,...,f_p), taking x as input, and evaluating y = f(x). Our goal is to identify the Pareto frontier of f; that is, the set Γ ⊆ X of points which are not dominated by any other point, i.e., the maximally desirable x which cannot be optimized further for any single objective without making a trade-off.”, thus, as also shown in Fig. 1, at least one third data point is generated, as the multi-objective function maps each point in the design space to the optimization space) using an optimizer (Nardi, Pg. 6, IV. Evaluation, “We run the evaluation on the recently proposed Spatial compiler [19], which implies a full integration of HyperMapper 2.0 on the Spatial production-level compiler toolchain for designing application hardware accelerators on FPGAs. We compare HyperMapper 2.0 with the HyperMapper 1.0 multi-objective auto-tuner to show the effectiveness of the feasibility constraints methodology. Then we compare HyperMapper 2.0 against the real Pareto where exhaustive search is possible, i.e. a total of three benchmarks. This is to give an insight on how the optimizer works in a controlled environment, i.e. when the Pareto front in known and the benchmark is small.”, thus, HyperMapper 2.0, an optimizer, is used to generate one or more data points);
evaluating, by the one or more processors using the function, input values corresponding to the at least one second data point and the at least one third data point to generate a candidate optimum output (Nardi, Pgs. 9-10, “Comparisons are synthesized in Figure 7. The optimal Pareto front is very close to the approximated one provided by HyperMapper 2.0, showing our software’s ability to recover the optimal Pareto front. About 1500 total samples are required to recover the Pareto optimum, about the same number of samples for BlackScholes and 66 times fewer for OuterProduct and DotProduct compared to the prior Spatial design space exploration approach using pruning and random sampling.”, thus, as shown in Figure 7, the optimum is generated for each benchmark, based on the evaluation of input values corresponding to data points); and 
outputting, by the one or more processors, the candidate optimum output responsive to an output condition being satisfied (Nardi, Pg. 5, Figure 3 & Algorithm 1, both of which depict that the optimum is output responsive to a feasibility constraint being satisfied and a predefined maximum number of samples being processed within an active learning iteration).

Nardi does not explicitly disclose generating, by the one or more processors, a weak learner model using the plurality of first data points.
However, Tristan teaches generating, by the one or more processors (Tristan, Par. [0037], “Thus, each classifier may be executed using a different compute node or processor core. In some embodiments, each classifier may execute in a separate process or thread without relying on other classifiers in the ensemble.”, therefore one or more processors are disclosed), a weak learner model using the plurality of first data points (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, a weak learner model, such as a support vector machine is generated by using a plurality of first data points, as shown in Fig. 1, label 122);

Nardi does not explicitly disclose generating, by the one or more processors, a strong learner model using the plurality of first data points, the strong learner model different from the weak learner model.
However, Tristan teaches generating, by the one or more processors, a strong learner model using the plurality of first data points, the strong learner model different from the weak learner model (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, a strong learner model, such as a neural network, is generated by using a plurality of first data points, as shown in Fig. 1, label 124);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for design space optimization disclosed by Nardi, which includes generating a plurality of first data points by evaluating a function, generating at least one second data point that satisfies an optimization condition, generating at least one third data point using an optimizer, evaluating the at least one second data point and the at least one third data point to generate a candidate optimum output, and outputting the candidate optimum output responsive to an output condition being satisfied, to include the generation and use of weak and strong learner models, as disclosed by Tristan. One of ordinary skill in the art would have been motivated to make this modification to enable more efficient and accurate optimization of a design space through the use of ensemble machine learning, combining weak and strong learner models to accelerate convergence (Tristan, Par. [0044], “The resulting classifiers were then combined to form a decision ensemble. Different ensembles were constructed that varied both the number of contributing classifiers and the feature vector size of the classifiers. The experimental results show the expected tradeoff between feature vector size and accuracy. However, as the size of the ensemble grows, the accuracy of the ultimate classification by the ensemble converges to almost 99%. In all test configurations, the ensembled classification system produced a result that was more than 97% accurate when the ensemble included 10,000 hash featuring classifiers.”).

Regarding Claim 2, Nardi in view of Tristan teaches the method of claim 1, wherein generating, by the one or more processors, the plurality of first data points includes applying randomly selected inputs to the function (Nardi, Pg. 4, “We first warm-up our model with simple random sampling. In the design of experiments (DoE) literature [26], this is the most commonly used sampling technique to warm-up the search. When prior knowledge is used, samples are drawn from each variable’s prior distribution, or the uniform distribution by default if no prior knowledge is provided”, thus, as also shown in Algorithm 1 on Pg. 5, random sampling is used to select the inputs applied to the function).

Regarding Claim 3, Nardi in view of Tristan teaches the method of claim 1, wherein generating, by the one or more processors, the weak learner model includes providing the plurality of first data points as input to at least one of a regression model or a support vector machine (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, the plurality of first data points is provided as input to a weak learner model, such as a support vector machine, as shown in Fig. 1, label 122).
The reasons for obviousness noted in the rejection of Claim 1 above are applicable herein.

Regarding Claim 4, Nardi in view of Tristan teaches the method of claim 1, wherein generating, by the one or more processors, the strong learner model includes providing the plurality of first data points as input to at least one of a neural network or a random forest model (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, the plurality of first data points is provided as input to a strong learner model, such as a neural network, as shown in Fig. 1, label 124. Further, Nardi itself uses a random forest model; see Nardi, Pgs. 5-6, “The function Fit RF Classifier() on lines 6 and 15 trains a random forests classifier Mfea to predict if a parameter vector is feasible or infeasible.”).
The reasons for obviousness noted in the rejection of Claim 1 above are applicable herein.

Regarding Claim 5, Nardi in view of Tristan teaches the method of claim 1, further comprising: 
generating, by the one or more processors, a plurality of candidate second data points (Nardi, Pg. 3, “We aim to identify Γ with the fewest possible function evaluations, solving a sequential decision problem and constructing a strategy X : f → {X1,X2,X3,...} to iteratively generate the next Xn+1 ∈ X to evaluate. If the evaluation Xi is not very expensive then it is possible to construct a strategy that, for each sequential step, runs multiple evaluations, i.e., a batch of evaluations. In this case it is standard practice to warm-up the strategy with some previously sampled points, using sampling techniques from the design of experiments literature [26]. It is worth noting that, while infeasible points are never considered our best experiment, they are still useful to add to our set of performed experiments to improve the probabilistic model posteriors.”, thus, a plurality of candidate data points is generated regardless of whether each point is deemed feasible or infeasible under the optimization condition) using the weak learner model (See teaching of Tristan reference in Claim 1); and 
determining, by the one or more processors, that each second data point satisfies the optimization condition based on a candidate second data point meeting or exceeding a threshold percentile relative to the plurality of candidate second data points (Nardi, Pg. 3, “Applying these constraints gives the constrained Pareto Γ = {x ∈ X : ∀i ≤ q, ci(x) ≤ bi} where ∄x′ ∈ X such that ci(x′) ≤ bi and x′ ≺ x. Similarly to the mono-objective case in [13], we can define the feasibility indicator function ∆i(x) ∈ {0,1} which is 1 if ci(x) ≤ bi, and 0 otherwise. A design point where ∆i(x) = 1 is termed feasible. Otherwise, it is called infeasible.”, thus, each data point is evaluated for feasibility based on its objective values and its satisfaction of the feasibility/inequality constraints relative to the plurality of data points in the design space).

Regarding Claim 6, Nardi in view of Tristan teaches the method of claim 5, further comprising selecting, by the one or more processors, input values to generate the plurality of candidate second data points using a pseudorandom number generator (Tristan, Par. [0028], “For example, the ensemble learning process may use different methods to allocate the training data set among the different decision models. In some embodiments, a bootstrap aggregation (abbreviated “bagging”) technique may be used. In a bagging process, a number n of “bootstrap” data sets is created from the initial training data set. Each bootstrap data set may be used to train one decision model. In some embodiments, to obtain a bootstrap set, the training data set is sampled uniformly in a pseudorandom fashion. The sampling may be performed “with replacement,” that is, the sampling permits the same data record to be repeated during training. In some embodiments, the bagging method reduces the variance of linear regression algorithms and the accuracy of decision models such as classifiers. The pseudorandom sampling also speeds up the training process and ensures that each decision model is exposed to different portions of training data and injects a degree of independence to each of the models.”, therefore, the data points may be generated/sampled in a pseudorandom fashion) and based on a distance between each input value (Nardi, Pg. 4, “We first warm-up our model with simple random sampling. In the design of experiments (DoE) literature [26], this is the most commonly used sampling technique to warm-up the search. When prior knowledge is used, samples are drawn from each variable’s prior distribution, or the uniform distribution by default if no prior knowledge is provided”, thus, when using prior knowledge, samples are drawn based on prior distribution of input values).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for design space optimization in Claim 1, as disclosed by Nardi in view of Tristan, to include the use of a pseudorandom number generator to generate the plurality of candidate second data points, as disclosed by Tristan. One of ordinary skill in the art would have been motivated to make this modification to reduce variance/bias in the sampled data and ensure accuracy of the learner model (Tristan, Par. [0028], “In some embodiments, to obtain a bootstrap set, the training data set is sampled uniformly in a pseudorandom fashion. The sampling may be performed “with replacement,” that is, the sampling permits the same data record to be repeated during training. In some embodiments, the bagging method reduces the variance of linear regression algorithms and the accuracy of decision models such as classifiers. The pseudorandom sampling also speeds up the training process and ensures that each decision model is exposed to different portions of training data and injects a degree of independence to each of the models.”).

Regarding Claim 7, Nardi in view of Tristan teaches the method of claim 1, further comprising determining, by the one or more processors, the output condition to be satisfied responsive to at least one of a threshold number of iterations or the candidate optimum output being within a threshold of an expected optimum of the function (Nardi, Pg. 5, “Algorithm 1 shows the pseudo-code of the model-based search algorithm used in HyperMapper 2.0. Figure 3 shows a corresponding graphical representation of the algorithm. The while loop on line 9 in Algorithm 1 is the active learning loop, represented by the big loop in the preprocessing box of Figure 3. The user specifies a maximum number of active learning iterations given by the variable maxAL.”, therefore, the output condition is satisfied responsive to a threshold/maximum number of iterations).

Regarding Claim 8, Nardi in view of Tristan teaches the method of claim 1, wherein an optimum of the function is a maximum value or a minimum value (Nardi, Pgs. 2-3, “Mathematically, in the mono-objective formulation, we consider the problem of finding a global minimizer (or maximizer) of an unknown (black-box) objective function f under a set of constraint functions ci: x∗ = argmin_{x∈X} f(x) subject to ci(x) ≤ bi, i = 1,...,q, where X is some input design space of interest and ci are q unknown constraint functions. The problem addressed in this paper is the optimization of a deterministic function f : X → ℝ over a domain of interest that includes lower and upper bounds on the problem variables.”, thus, as also depicted in Figure 1, the optimum of the function is a maximum or minimum value).

Regarding Claim 10, Nardi in view of Tristan teaches the method of claim 1, further comprising updating (Nardi, Pg. 5, D. Active Learning, “Active learning is a paradigm in supervised machine learning which uses fewer training examples to achieve better prediction accuracy by iteratively training a predictor, and using the predictor in each iteration to choose the training examples which will increase its accuracy the most”, therefore, the predictor is iteratively updated to choose examples/points that will increase accuracy), by the one or more processors responsive to the output condition not being satisfied, the weak learner model and the second strong model (See teaching of Tristan reference in Claim 1) using the plurality of second data points and the third data point (Nardi, Pg. 5, D. Active Learning, “The application is evaluated on the sampled points, yielding the labels of the supervised setting given by the multiple objectives. Since our goal is to accurately estimate the points near the Pareto-optimal front, we use the current predictor to provide performance values over the parameter space and thus estimate the Pareto fronts. For the next iteration, only parameter points near the predicted Pareto front are sampled and evaluated, and subsequently used to train new predictors using the entire collection of training points from current and all previous iterations. This process is repeated over a number of iterations forming the active learning loop.”, thus, per active learning optimization, if the output condition is not satisfied, the predictor is iteratively trained/updated such that only parameter points (inclusive of second and third data points) near the predicted value are sampled and evaluated in each iteration).
The reasons for obviousness noted in the rejection of Claim 1 above are applicable herein.

Regarding Claim 12, Nardi in view of Tristan teaches the method of claim 1, further comprising increasing, by the one or more processors, a first count of the at least one second data point to be generated using the weak learner model relative to a second count of the at least one third data point to be generated using the strong learner model responsive to a measure of effectiveness of the weak learner model satisfying a corresponding threshold (Nardi, Pg. 5, D. Active Learning, “Active learning is a paradigm in supervised machine learning which uses fewer training examples to achieve better prediction accuracy by iteratively training a predictor, and using the predictor in each iteration to choose the training examples which will increase its accuracy the most.” & “Since our goal is to accurately estimate the points near the Pareto-optimal front, we use the current predictor to provide performance values over the parameter space and thus estimate the Pareto fronts. For the next iteration, only parameter points near the predicted Pareto front are sampled and evaluated and subsequently used to train new predictors using the entire collection of training points from current and all previous iterations”, therefore, responsive to a measure of effectiveness satisfying a corresponding threshold (i.e., parameter points lying near the predicted Pareto front/optimum), the count of data points to be generated is increased when those points are found to increase accuracy. These generated data points are then used to train new predictors in subsequent iterations; thus, as successive predictors are trained during the active learning loop, the number of such points increases when they are found to be optimal).

Regarding Claim 13, Nardi in view of Tristan teaches a system comprising:
	one or more processors (Tristan, Par. [0037], “Thus, each classifier may be executed using a different compute node or processor core. In some embodiments, each classifier may execute in a separate process or thread without relying on other classifiers in the ensemble.”, therefore one or more processors are disclosed. Further details on the processors can be found in Pars. [0082]-[0083]) configured to: […]
	The rest of the claim language in Claim 13 recites substantially the same limitations as Claim 1, in the form of a system, therefore it is rejected under the same rationale.
The reasons for obviousness noted in the rejection of Claim 1 above are applicable herein.

Claim 14 recites substantially the same limitations as Claim 2 in the form of a system and is therefore rejected under the same rationale.

Claim 15 recites substantially the same limitations as Claims 3 and 4 in the form of a system and is therefore rejected under the same rationale.

Claim 16 recites substantially the same limitations as Claim 5 in the form of a system and is therefore rejected under the same rationale.

Claim 17 recites substantially the same limitations as Claim 7 in the form of a system and is therefore rejected under the same rationale.

Claim 19 recites substantially the same limitations as Claim 12 in the form of a system and is therefore rejected under the same rationale.

Regarding Claim 20, Nardi teaches a method for optimizing a design space comprising:
(i) populating a design space with random N design points evaluated by a function evaluator (Nardi, Pg. 3, B. Multi-Objective Optimization: Problem Statement, “We define f : X → ℝ^p as our vector of objective functions f = (f1,...,fp), taking x as input, and evaluating y = f(x)”, thus, a plurality of first data points is generated by evaluating a function. Further, Algorithm 1 on Pg. 5 depicts that the design space X is populated with random N design points that are then evaluated by a function evaluator); 
(iv) randomly sampling points that the weak learner (See introduction of Tristan reference below for teaching of weak learner model) predicts will be above a selected objective value based on an objective value function (Nardi, Pgs. 3-4, “A decision tree represents a recursive binary partitioning of the input space, and uses a simple decision (a one-dimensional decision threshold) at each non-leaf node that aims at maximizing an “information gain” function. Prediction is performed by “dropping” down the test data point from the root, and letting it traverse a path decided by the node decisions, until it reaches a leaf node. Each leaf node has a corresponding function value (or probability distribution on function values), adjusted according to training data, which is predicted as the function value for the test input. During training, randomization is injected into the procedure to reduce variance and avoid overfitting.”, thus, the points are randomly selected based on a predicted function value; the sampling of points based on an objective function f is also depicted in Figure 1 on Pg. 3, and further explanation can be found in the paragraphs following Figure 1); 
(v) finding the optimum predicted by strong learner (See introduction of Tristan reference below for teaching of strong learner model) based on the random sample points in step 4 using a global optimization scheme (Nardi, Pg. 2, “Mathematically, in the mono-objective formulation, we consider the problem of finding a global minimizer (or maximizer) of an unknown (black-box) objective function f under a set of constraint functions ci: x∗ = argmin_{x∈X} f(x) subject to ci(x) ≤ bi, i = 1,...,q, where X is some input design space of interest and ci are q unknown constraint functions. The problem addressed in this paper is the optimization of a deterministic function f : X → ℝ over a domain of interest that includes lower and upper bounds on the problem variables.”, thus, a global optimization scheme is implemented to find the optimum based on the objective function f and the random sample points per step 4); 
(vi) adding the parameter identified in step five to the N points to be evaluated by the design function (Nardi, Pgs. 9-10, “Comparisons are synthesized in Figure 7. The optimal Pareto front is very close to the approximated one provided by HyperMapper 2.0, showing our software’s ability to recover the optimal Pareto front. About 1500 total samples are required to recover the Pareto optimum, about the same number of samples for BlackScholes and 66 times fewer for OuterProduct and DotProduct compared to the prior Spatial design space exploration approach using pruning and random sampling”, thus, as also depicted in Figure 7 the optimum is also added to the N points evaluated by the design function); 
(vii) performing design function evaluations on N (Nardi, Pgs. 5-6, “The function Fit RF Classifier() on lines 6 and 15 trains a random forests classifier Mfea to predict if a parameter vector is feasible or infeasible. The classifier becomes increasingly accurate during active learning. Using a classifier to predict the infeasible parameter vectors has proven to be very effective as later shown in Section IV-C. The random forests classifier is represented by the box “Classifier (Filter)” in Figure 3.”, therefore, design function evaluations are performed on N, as also shown in Algorithm 1); and 
(viii) adding the design solution to a database (See introduction of Tristan reference below for teaching of adding design solution to a database) and repeating steps 2-7 until the method converges or maximum number of iterations is reached (Nardi, Pg. 5, “Algorithm 1 shows the pseudo-code of the model-based search algorithm used in HyperMapper 2.0. Figure 3 shows a corresponding graphical representation of the algorithm. The while loop on line 9 in Algorithm 1 is the active learning loop, represented by the big loop in the preprocessing box of Figure 3. The user specifies a maximum number of active learning iterations given by the variable maxAL.”, thus, as shown in Algorithm 1, the steps are repeated until the maximum number of iterations is reached).

Nardi does not explicitly disclose (ii) generating a weak learner model using available data.
However, Tristan teaches (ii) generating a weak learner model using available data (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, a weak learner model, such as a support vector machine is generated by using a plurality of first data points, as shown in Fig. 1, label 122);

Nardi does not explicitly disclose (iii) generating a strong learner model using the available data.
However, Tristan teaches (iii) generating a strong learner model using the available data (Tristan, Par. [0036], “In some embodiments, the classifiers may be implemented using different machine learning models, such as, for example, decision trees, support vector machines, neural networks, Bayesian networks, and the like. Decision trees, in particular, work particularly well with feature hashing, because a reduction in the number of input features reduces the complexity of the decision tree (i.e., the number of levels).”, thus, a strong learner model, such as a neural network is generated by using a plurality of first data points, as shown in Fig. 1, label 124);

Nardi does not explicitly disclose (viii) adding the design solution to a database.
However, Tristan teaches (viii) adding the design solution to a database (Tristan, Par. [0060], “In some embodiments, the training resources may include a number of storage resources to store training data, truth labels, and the models themselves. Such storage resources may be implemented as, for example, databases, file systems, and the like.”, thus, storage resources, such as a database, may be used to store relevant models).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for design space optimization, as disclosed by Nardi, to include the generation and use of weak and strong learner models and a database to store the design solution, as disclosed by Tristan. One of ordinary skill in the art would have been motivated to make this modification to enable more efficient and accurate optimization of a design space through the use of ensemble machine learning, combining weak and strong learner models to accelerate convergence and including a database such that design solutions/models can be stored and improved incrementally (Tristan, Par. [0044], “The resulting classifiers were then combined to form a decision ensemble. Different ensembles were constructed that varied both the number of contributing classifiers and the feature vector size of the classifiers. The experimental results show the expected tradeoff between feature vector size and accuracy. However, as the size of the ensemble grows, the accuracy of the ultimate classification by the ensemble converges to almost 99%. In all test configurations, the ensembled classification system produced a result that was more than 97% accurate when the ensemble included 10,000 hash featuring classifiers.” & Par. [0006], “Second, the ensemble approach allows system designers to improve the accuracy of the system incrementally, while limiting the complexity of the individual decision models.”).

13.	Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Nardi et al. (hereinafter Nardi) (“Practical Design Space Exploration”), in view of Tristan et al. (hereinafter Tristan) (US PG-PUB 20190095805), further in view of Baker et al. (hereinafter Baker) (US PG-PUB 20200210812).
Regarding Claim 9, Nardi in view of Tristan teaches the method of claim 1. 
Nardi in view of Tristan does not explicitly disclose further comprising: 
generating, by the one or more processors, the strong learner model to include a plurality of neural networks, each neural network of the plurality of neural networks provided with at least one of different weights or different biases; and 
generating, by the one or more processors, the third data point using each of the plurality of neural networks.
However, Baker discloses further comprising:
generating, by the one or more processors, the strong learner model to include a plurality of neural networks (Baker, Par. [0011], “In one aspect, FIG. 1 depicts a combined machine-learning system comprising an ensemble of machine-learning systems 102A-C and a joint optimization network 104, in which the members of the ensemble are neural networks trained to optimize a joint objective from the joint optimization network 104. Each member 102A, 102B, 102C of the ensemble illustrated in FIG. 1 is a neural network that has been pre-trained or that may be trained to optimize its individual objective 103A, 103B, or 103C, respectively, for a specified set of input values 101A, 101B, or 101C, respectively. In some embodiments, each of the neural networks 102A-C is merely initialized, e.g. with random weights. Initialization of a neural network with random weights is well-known to those skilled in the art of training neural networks. Although, three ensemble members 102A-C are shown in FIG. 1, there may be any number of ensemble members. The joint optimization network 104 is also a neural network, with a joint objective 105”, thus, a plurality of neural networks in an ensemble is disclosed), each neural network of the plurality of neural networks provided with at least one of different weights or different biases (Baker, Par. [0042], “In example 203 of Step 202, the computer system allows different ensemble members (e.g., the ensemble members 102A-C) to be trained with different input data sets. Example 203 includes cases in which each data item is multiplied by a weight and the weights are different in different ensemble members. Different subsets may also be represented by multiplying by weights, using weights of zero and one. In effect different weights arise naturally as a side effect of bagging, since some data items may occur multiple times while others do not occur at all. 
Explicitly weighted data items occur in the initial building of an ensemble in some variants of boosting. Other examples of data selection and data weighting occur in joint optimization training, as discussed in more detail in association with Steps 208 and 209.”, thus, each neural network in the plurality is provided with different weights); and 
generating, by the one or more processors, the third data point using each of the plurality of neural networks (Baker, Par. [0047], “In Step 206, in the first pass through the loop, the computer system adds joint optimization network 104 to the ensemble member networks, to create the full system illustrated in FIG. 1. In Step 206, the computer system initializes the learned parameters of network 104. In some embodiments, joint optimization network 104 is initialized to mimic a simple ensemble-combining rule. For example, initially joint optimization network 104 may compute the arithmetic or geometric average of the output values computed by the ensemble members 102A-C.”, therefore, each of the ensemble neural networks produces an output and the output may be averaged/combined among networks to produce a single output value/point).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for design space optimization of Claim 1, as disclosed by Nardi in view of Tristan, to include a strong learner model that comprises a plurality of neural networks in which each neural network is provided with different weights, as disclosed by Baker. One of ordinary skill in the art would have been motivated to make this modification because a combination of multiple neural networks with different weights provides better performance and decreases error due to greater diversity among ensemble members (Baker, Par. [0004], “An “ensemble” of machine learning systems is a plurality of machine learning systems, such as neural networks, where the plurality of machine learning systems together solve a problem. Each ensemble member typically implements a separate model and the ensemble typically combines the outputs of the separate ensemble members in some manner of voting or averaging of the member output to produce a desired output for the ensemble. Frequently, an ensemble of machine learning systems performs better than any individual ensemble member because the various errors of the systems average out.”).

Claim 18 recites substantially the same limitations as Claim 9 in the form of a system; therefore, it is rejected under the same rationale.

14.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Nardi et al. (hereinafter Nardi) (“Practical Design Space Exploration”), in view of Tristan et al. (hereinafter Tristan) (US PG-PUB 20190095805), further in view of Cheng et al. (hereinafter Cheng) (“ThermalNet: A deep reinforcement learning-based combustion optimization system for coal-fired boiler”). 
Regarding Claim 11, Nardi in view of Tristan teaches the method of Claim 1.
Nardi in view of Tristan does not explicitly disclose further comprising using the function to optimize a combustion process.
However, Cheng teaches further comprising using the function to optimize a combustion process (Cheng, Pg. 1, Abstract, “This paper presents a combustion optimization system for coal-fired boilers that includes a trade-off between emissions control and boiler efficiency. Designing an optimizer for this nonlinear, multiple-input multiple-output problem is challenging. This paper describes the development of an integrated combustion optimization system called ThermalNet, which is based on a deep Q-network (DQN) and a long short-term memory (LSTM) module.”, thus, optimization of a combustion process is disclosed. Further details on the reward function used to optimize the combustion process are shown on Pg. 2 and in Figure 1).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for design space optimization of Claim 1, as disclosed by Nardi in view of Tristan, to include using the function to optimize a combustion process, as disclosed by Cheng. One of ordinary skill in the art would have been motivated to make this modification because machine learning models can optimize a combustion process more quickly and efficiently, without requiring expert knowledge, and at lower experimentation cost (Cheng, Pg. 1, 1. Introduction, “Large coal-fired power plants are major contributors to total pollutant emissions; consequently, they offer the possibility of reducing emissions through increased thermal efficiency. It is difficult to understand a complex mechanism such as NOx combustion and emissions with only a limited knowledge of combustion theory and chemical kinetics; however, physical experimentation may be expensive (Janakiraman et al., 2016). Furthermore, to control the pollutant discharge without impairing power generation efficiency, the power system needs to dynamically regulate numerous control variables. In this paper, a data-based deep neural network model is established to overcome these challenges. Time-varying relationships and combustion process mechanisms can be obtained using the proposed model without requiring expert knowledge. The generality and extensibility of deep neural networks could further enable a wide range of data processing applications that execute faster and with lower costs.”).

Conclusion
15.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Goel et al. (US PG-PUB 20110078100) disclosed methods and systems for multi-objective evolutionary algorithm-based engineering design optimization.
Chen et al. (US PG-PUB 20160048771) disclosed a method for ensemble machine learning, including a random forest model that generates a plurality of trees in parallel. 
Krasser et al. (US PG-PUB 20190026466) disclosed techniques for detecting malware using a computational model comprising an ensemble machine learning model.
Urbanke et al. (US PG-PUB 20210042297) disclosed ensemble learning for automating feature generation.
Baker et al. (US PG-PUB 20200410090) disclosed systems and methods for building and training an ensemble of machine learning systems to be robust against adversarial attacks.
Sevastyanov et al. (US PG-PUB 20070005313) disclosed multi-objective optimization methods in a design space.
Zhang et al. (“Deep Belief Networks Ensemble with Multi-objective Optimization for Failure Diagnosis”) disclosed a multi-objective ensemble of deep belief networks applied to failure diagnosis.
Liu et al. (“On Learning-Based Methods for Design-Space Exploration with High-Level Synthesis”) disclosed high-level synthesis tools for design space exploration.

16.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is 571-272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123