DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3, and 5-20 are presented for examination.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on March 29, 2021 has been entered.

Response to Amendment
	Applicant’s amendment has obviated most, but not all, of the objections to the specification, drawings, and claims given in the previous Office Actions.  To the extent that an objection or rejection appears in the previous Office Action(s) but not this Office Action, that objection or rejection is withdrawn.  To the extent that it appears both in the previous Office Action(s) and this Office Action, the objection or rejection is maintained.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 22, equation (10) uses the variable κ (Greek letter kappa); however, subsequent references to the equation use the (Roman) letter K.  Note that Applicant’s amendment has still not cured the issue because the issue was with the use of Roman vs. Greek letters, not with the use of italics vs. non-italics.  If Applicant is using Microsoft Word®, the correct letter may be .
Appropriate correction is required.

Claim Objections
Claims 1 and 9 are objected to because of the following informalities:  for consistency of language, the initial recitation of “an optimization” (or “a Bayesian optimization”) should be “an iteration of an optimization” (or “an iteration of a Bayesian optimization”).  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 3, and 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Brochu et al., “A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning” (2010) (“Brochu”) in view of Vincent et al., “Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion” (2010) (“Vincent”) and further in view of Snoek et al., “Practical Bayesian Optimization of Machine Learning Algorithms” (2012) (“Snoek”).
Regarding claim 1, Brochu discloses “[a] method for reducing dimensions of an input in a black-box and simulation-based optimization, the method comprising: 
generating, by evaluating a black-box function characterizing an equipment component, a first plurality of inputs and a plurality of outputs corresponding to the first plurality of inputs (Bayesian optimization is a strategy for finding the extrema of objective functions that are expensive to evaluate [i.e., black box] and is applicable in situations where one can obtain observations of the function at sampled values; in Bayesian optimization, if xi is the ith sample [input] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1; Bayesian optimization can be used to model the appearance of a material [equipment] – id. at sec. 3.2; see also sec. 2.9 (noting that Bayesian optimization is “a means of optimizing difficult black box optimizations” and can be used, inter alia, to learn a set of robot gait parameters)); … [and]
performing an optimization using [a] second plurality of inputs and the plurality of outputs (in Bayesian optimization, the extrema of objective functions that are expensive to evaluate are found; if xi is the ith sample [input] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1; as dimensionality is increased, more samples are required to cover the space, and more parameters and hyperparameters may need to be tuned; to deal with this problem, it may be necessary to do automatic feature selection [i.e., use a second plurality of inputs with fewer dimensions than the first] – id. at p. 43, second full paragraph); [and]
based on [a] sampling point, performing at least one more iteration of the optimization until a termination criterion is met (in Bayesian optimization, the extrema of objective functions that are expensive to evaluate are found; if xi is the ith sample [input] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1 [so the optimization is based on the sampling points xi]; the text associated with Figure 1 indicates that the optimization is carried out by approximating the objective function over four iterations of sampled values of the objective function; in one formulation of the Bayesian optimization problem, the minimum of an instantaneous regret function over a number T of iterations for which the optimization is to be run [so the termination criterion is having reached the maximum number of iterations T] – id. at p. 15, first full paragraph)….”
five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector – Vincent, p. 3383, right-hand side of Fig. 3); … [and]
decoding, by the machine-trained autoencoder, an output … into a sampling point having dimensions of the first plurality of inputs (decoding function g-theta’ is used to reconstruct the original five-dimensional vector x [input of the first plurality] from encoded representation y [output], thereby producing a five-dimensional reconstruction z [sampling point] – Vincent, Figure 1 [note that Brochu teaches an output of an “optimization” and decoding such an output would merely be a matter of substituting the optimized output of Brochu into the encoded representation y of Vincent, with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)]);
wherein the first plurality of inputs … and the second plurality of inputs that are encoded by the machine-trained autoencoder from the first plurality of inputs are multiple dimension vectors (five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector – Vincent, p. 3383, right-hand side of Fig. 3; note that Brochu teaches that the first inputs are generated from the black-box function).”
Brochu and Vincent both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brochu to reduce the dimensionality of the input data using an autoencoder and then decode the reduced-dimensionality encoding, as taught by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.
Gaussian process is a convenient and powerful prior distribution on functions, which can be taken to be of the form f: X → R [where R is the set of real numbers in one dimension and X is a bounded subset of RD] – Snoek, sec. 2 up to sec. 2.1, first paragraph; note that Brochu discloses that the outputs are generated from a black-box function and that the function f disclosed by Snoek could be any variety of function, including the black-box function of Brochu).”
Brochu, Vincent, and Snoek all relate to artificial intelligence and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Vincent to make the output a scalar, as disclosed by Snoek.  Doing so would reduce the computational expense of performing the evaluation of the objective function.  See Snoek, first two paragraphs of sec. 2.

Regarding claim 3, Brochu, as modified by Vincent and Snoek, discloses that “encoding the first plurality of inputs comprises applying layers of non-linear transformations to the first plurality of inputs to generate the second plurality of inputs (five-dimensional input vector x [first inputs] is encoded by encoding function ftheta [transformation] into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) [second layer of transformations] to a two-dimensional vector [second inputs] – Vincent, p. 3383, right-hand side of Fig. 3; typical form of the mapping ftheta is an affine mapping followed by a nonlinearity – id. at sec. 2.2, second paragraph).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Snoek to apply nonlinear transformations to the inputs to generate the reduced-dimensional inputs, as disclosed by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.
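For illustration only, the layered encoding cited from Vincent (a five-dimensional input mapped to three dimensions and then to two, each layer being an affine mapping followed by a nonlinearity) can be sketched as follows. The weights are random and untrained, and the names make_layer and encode, the sigmoid nonlinearity, and the weight scale are assumptions made for this sketch; they are not taken from Vincent or from the application.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def make_layer(rng, n_in, n_out):
    # Hypothetical, untrained parameters theta = (W, b) for one encoding layer.
    return rng.normal(scale=0.5, size=(n_out, n_in)), np.zeros(n_out)

def encode(x, layers):
    # Each layer applies an affine mapping followed by a nonlinearity.
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
layers = [make_layer(rng, 5, 3), make_layer(rng, 3, 2)]  # 5 -> 3 -> 2
x = rng.random(5)       # five-dimensional input vector
y = encode(x, layers)   # two-dimensional encoded representation
print(x.shape, y.shape)  # (5,) (2,)
```

The sketch shows only the change in dimensionality across the stacked layers; it does not model training or the denoising criterion.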

five-dimensional input vector x [first inputs] is encoded by encoding function ftheta [transformation] into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) [second layer of transformations] to a two-dimensional vector [second inputs] – Vincent, p. 3383, right-hand side of Fig. 3; typical form of the mapping ftheta is an affine mapping followed by a nonlinearity – id. at sec. 2.2, second paragraph).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Snoek to apply nonlinear transformations to the inputs to generate the reduced-dimensional inputs, as disclosed by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would allow the autoencoder to learn useful representations of its inputs by reducing such representations to their essential features without unnecessary noise.  See Vincent, secs. 2.1 (discussing the need for retention of significant amounts of information about the input), 2.3 (discussing the need to separate useful information from noise).

Regarding claim 5, Brochu, as modified by Vincent and Snoek, discloses that “the autoencoder is a stacked denoising autoencoder (text of Vincent Figure 3 describes the autoencoder as a “stacking denoising autoencoder”).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brochu to use a stacked denoising autoencoder to reduce the dimensionality of the input data and decode the reduced-dimensional data, as disclosed by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.

see Brochu, sec. 1.1 and rejection of claim 1 above and note that Bayesian optimization was previously recited therein).”  

Regarding claim 7, Brochu, as modified by Vincent and Snoek, discloses that “the output of the Bayesian optimization is a sampling point (in a typical run of Bayesian optimization on a 1D problem, at each iteration an acquisition function is maximized to determine where next to sample from the objective function; the objective is then sampled at the argmax of the acquisition function, the Gaussian process is updated, and the process is repeated [so the output at each iteration is a sampling point of the objective function corresponding to the argmax of the acquisition function] – Brochu, last paragraph before sec. 1.2; see also Fig. 1, particularly red dots).”  

Regarding claim 8, Brochu, as modified by Vincent and Snoek, discloses “evaluating the black-box function at the sampling point (Bayesian optimization uses an acquisition function to determine the next location to sample; the optimization technique has the property that it aims to minimize the number of objective function evaluations – Brochu, last paragraph of p. 3 [note that Vincent teaches decoding, and decoding the sampling point would merely be a matter of using the stacked denoising autoencoder of Vincent to perform such decoding, with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)]).”

Regarding claim 9, Brochu discloses “[a] system for reducing dimensions of an input in an optimization, the system comprising: 
a memory configured to store a plurality of input vectors and a plurality of outputs for an unknown function that characterizes requirements for equipment design (Bayesian optimization is a strategy for finding the extrema of objective functions that are expensive to evaluate [i.e., unknown] and is applicable in situations where one can obtain observations of the function at sampled values; in Bayesian optimization, if xi is the ith sample [input vector] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1; Bayesian optimization can be used to model the appearance of a material [requirement for equipment design] – id. at sec. 3.2; see also sec. 2.9 (noting that Bayesian optimization is “a means of optimizing difficult black box optimizations” and can be used, inter alia, to learn a set of robot gait parameters), sec. 4.1 (discussing the use of memory in a Bayesian optimization task for hierarchical control)); and 
a processor (in active user modeling, computers must ask the right questions and the number of questions must be kept to a minimum – Brochu, last paragraph before sec. 1.1 [note that the recitation of a computer implies the existence of a processor]) configured to: 
receive, from the memory, the plurality of input vectors and the plurality of outputs (Bayesian optimization is a strategy for finding the extrema of objective functions that are expensive to evaluate [i.e., unknown] and is applicable in situations where one can obtain observations of the function at sampled values; in Bayesian optimization, if xi is the ith sample [input vector] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1); … [and]
perform a Bayesian optimization based on [a] reduced dimensional space of the plurality of input vectors and the plurality of outputs (in Bayesian optimization, the extrema of objective functions that are expensive to evaluate are found; if xi is the ith sample [input vector] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1; as dimensionality is increased, more samples are required to cover the space, and more parameters and hyperparameters may need to be tuned; to deal with this problem, it may be necessary to do automatic feature selection [i.e., reduce the dimensional space of the input vectors] – id. at p. 43, second full paragraph); … [and]
based on … sampling points, perform at least one more iteration of the Bayesian optimization until a termination criterion is met (in Bayesian optimization, the extrema of objective functions that are expensive to evaluate are found; if xi is the ith sample [input] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1 [so the optimization is based on the sampling points xi]; the text associated with Figure 1 indicates that the optimization is carried out by approximating the objective function over four iterations of sampled values of the objective function; in one formulation of the Bayesian optimization problem, the minimum of an instantaneous regret function over a number T of iterations for which the optimization is to be run [so the termination criterion is having reached the maximum number of iterations T] – id. at p. 15, first full paragraph)….”
Vincent discloses “reduc[ing], with a machine-learnt stacked autoencoder, a dimensional space of the plurality of input vectors (five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector – Vincent, p. 3383, right-hand side of Fig. 3); … [and]
project[ing], with the stacked autoencoder, an output … into sampling points having the dimensional space of the plurality of input vectors (decoding function g-theta’ is used to reconstruct the original five-dimensional vector x [input vector] from encoded representation y [output], thereby producing a five-dimensional reconstruction [sampling point] z – Vincent, Figure 1 [note that Brochu teaches an output of a “Bayesian optimization” and projecting such an output onto the input space dimensionality would merely be a matter of substituting the optimized output of Brochu into the encoded representation y of Vincent, with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)]),
wherein the plurality of input vectors … and the reduced dimensional space from the machine-learnt stacked autoencoder are multiple dimension vectors (five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector – Vincent, p. 3383, right-hand side of Fig. 3; note that Brochu teaches that the first inputs are generated from the unknown function)….”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brochu to reduce the dimensionality of the input data using an autoencoder and then decode the reduced-dimensionality encoding, as taught by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.
Snoek further discloses that “the plurality of outputs … are single-dimensional vectors (Gaussian process is a convenient and powerful prior distribution on functions, which can be taken to be of the form f: X → R [where R is the set of real numbers in one dimension and X is a bounded subset of RD] – Snoek, sec. 2 up to sec. 2.1, first paragraph; note that Brochu discloses that the outputs are generated from an unknown function and that the function f disclosed by Snoek could be any variety of function, including the unknown function of Brochu).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Vincent to make the output a scalar, as disclosed by Snoek.  Doing so would reduce the computational expense of performing the evaluation of the objective function.  See Snoek, first two paragraphs of sec. 2.

in a typical run of Bayesian optimization on a 1D problem, at each iteration an acquisition function is maximized to determine where next to sample from the objective function; the objective is then sampled at the argmax of the acquisition function, the Gaussian process is updated, and the process is repeated [so the output at each iteration is a sampling point of the objective function corresponding to the argmax of the acquisition function] – Brochu, last paragraph before sec. 1.2; see also Fig. 1, particularly red dots).”

Regarding claim 11, Brochu, as modified by Vincent and Snoek, discloses that “the processor is further configured to:  
evaluate the unknown function at the sampling point [in] the dimensional space of the plurality of input vectors (black box optimization involves querying at a point [input vector] x and getting a (possibly noisy) response and typically requires that all dimensions have bounds on the search space, such that the search space is a hyperrectangle of dimension d – Brochu, p. 5, first full paragraph; Bayesian optimization involves sampling [evaluating] the objective [unknown] function at point xt [of dimension d] – id. at Algorithm 1; note that Vincent discloses that the sampling point is “projected” onto the dimensional space of the input via decoding, and the evaluation of the function can take place in the same dimensionality as the input as disclosed in Brochu).”
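For illustration only, the sequence described above (a reduced-dimensional point is decoded to the dimensionality of the input vectors, and the function is then evaluated at the decoded point) can be sketched as follows. The decoder weights are random and untrained, and the names decode and unknown_function, the sigmoid decoder form, and the quadratic stand-in function are assumptions made for this sketch; none is taken from the cited references or the application.

```python
import numpy as np

rng = np.random.default_rng(1)
W_dec = rng.normal(size=(5, 2))  # hypothetical, untrained decoder weights
b_dec = np.zeros(5)

def decode(z):
    # Map a two-dimensional encoded point back to five dimensions.
    return 1.0 / (1.0 + np.exp(-(W_dec @ z + b_dec)))

def unknown_function(x):
    # Hypothetical stand-in for the expensive function; returns a scalar.
    return float(np.sum((x - 0.5) ** 2))

z_next = np.array([0.2, -0.4])    # point proposed in the reduced space
x_next = decode(z_next)           # sampling point in the input space
y_next = unknown_function(x_next) # evaluation in the original dimensionality
print(x_next.shape, y_next)
```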

Regarding claim 12, Brochu, as modified by Vincent and Snoek, discloses that “the processor is further configured to: 
update the plurality of input vectors and the plurality of outputs for the unknown function with an input vector and an output for the evaluated sampling point (in Bayesian optimization, at each time step, an acquisition function is optimized given all the data at all previous time steps over a Gaussian process such that a vector xt [input vector] results; the objective function of that vector is sampled such that yt [output for evaluated sampling point] results, the data are augmented with (xt, yt) and the GP is updated, and the process repeats for a next time step – Brochu, p. 6, Algorithm 1).” 
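For illustration only, the loop described above (optimize an acquisition function over the Gaussian process, sample the objective at the resulting point, augment the data, update the model, and repeat) can be sketched as follows. The squared-exponential kernel, the upper-confidence-bound acquisition, the grid search, the quadratic objective, and the budget T = 10 are assumptions made for this sketch and are not asserted to be the specific choices of Brochu or of the application.

```python
import numpy as np

def rbf(a, b, length=0.2):
    # Squared-exponential kernel between two 1-D arrays of points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Zero-mean GP posterior mean and standard deviation at points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def objective(x):
    # Hypothetical stand-in for the expensive objective function.
    return -(x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 201)  # candidate sampling points
X = np.array([0.0, 1.0])           # initial samples
y = objective(X)                   # initial observations
T = 10                             # termination: fixed iteration budget
for t in range(T):
    mu, sd = gp_posterior(X, y, grid)
    x_t = grid[np.argmax(mu + 2.0 * sd)]  # argmax of the acquisition function
    y_t = objective(x_t)                  # sample the objective at x_t
    X = np.append(X, x_t)                 # augment the data with (x_t, y_t)
    y = np.append(y, y_t)
print(len(X), X[np.argmax(y)])
```

Each pass through the loop adds one (x_t, y_t) pair to the stored data, so the model used at the next iteration depends on all prior observations.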

Regarding claim 13, Brochu, as modified by Vincent and Snoek, discloses that “the Bayesian optimization comprises a Gaussian process to generate a probabilistic model of the unknown function at the … space (Bayesian optimization on a problem may include a Gaussian process approximation of the objective function over four iterations of sampled values of the objective function; the area on the left has high uncertainty [i.e., the function’s model is probabilistic] – Brochu, Fig. 1 and associated text, especially the zones of uncertainty around the true objective function; note that Vincent discloses reducing the dimensionality of the space and that the reduced dimensional space generated by Vincent could be substituted into the Gaussian process of Brochu with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)).”

Regarding claim 14, Brochu discloses “[a] method for reducing input dimensions for optimizing an unknown function, the method comprising: 
generating a plurality of input vectors and a plurality of outputs based on an unknown function characterizing an equipment component (Bayesian optimization is a strategy for finding the extrema of objective functions that are expensive to evaluate [i.e., unknown] and is applicable in situations where one can obtain observations of the function at sampled values; in Bayesian optimization, if xi is the ith sample [input vector] and f(xi) [outputs] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1; Bayesian optimization can be used to model the appearance of a material [equipment] – id. at sec. 3.2); … [and]
optimizing parameters of … feature vectors based on the plurality of outputs (in Bayesian optimization, at each time step, an acquisition function is optimized given all the data at all previous time steps over a Gaussian process such that a vector xt [feature vector] results; the objective function of that vector is sampled such that yt [output] results, the data are augmented with (xt, yt) and the GP is updated, and the process repeats for a next time step [so that the xt+1 of the next time step is based upon the outputs yj at all previous time steps j = 1, …, t] – Brochu, p. 6, Algorithm 1; note that Examiner construes the “parameters” to be the elements of the vector xt; compare specification paragraphs 17-18 (the next point represents the multi-dimensional input vector x* with optimized parameters of the size and shape of a turbine blade, where x* = argminx f(x))); [and] …
based on … parameters for [an] … input vector, performing at least one more iteration of the optimization until a termination criterion is met (in Bayesian optimization, the extrema of objective functions that are expensive to evaluate are found; if xi is the ith sample [input] and f(xi) [output] is the observation of an objective function at xi, and each observation comprises the sample and the observation of the objective function of the sample at a given time, a prior distribution is combined with a likelihood function of the observation given the objective function – Brochu, sec. 1.1 [so the optimization is based on the parameters of the input vector xi]; the text associated with Figure 1 indicates that the optimization is carried out by approximating the objective function over four iterations of sampled values of the objective function; in one formulation of the Bayesian optimization problem, the minimum of an instantaneous regret function over a number T of iterations for which the optimization is to be run [so the termination criterion is having reached the maximum number of iterations T] – id. at p. 15, first full paragraph)….”
Vincent discloses “extracting, with a machine-learnt stacked autoencoder, a plurality of feature vectors from the plurality of input vectors, wherein the feature vectors are represented by fewer dimensions than the input vectors (five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector [feature vector] – Vincent, p. 3383, right-hand side of Fig. 3); … [and]
decoding, by the stacked autoencoder, the … parameters of the extracted feature vectors to generate parameters for an … input vector (decoding function g-theta’ is used to reconstruct the original five-dimensional vector x from encoded representation y [feature vector, whose parameters are its entries], thereby producing [generating] a five-dimensional reconstruction [input vector, whose parameters are its entries] z – Vincent, Figure 1 [note that Brochu teaches that the parameters of the vector are “optimized” and decoding such parameters would merely be a matter of substituting the optimized parameters of Brochu into the encoded representation y of Vincent, with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)]),
wherein the plurality of input vectors … and the plurality of feature vectors that are extracted by the machine-learnt autoencoder from the plurality of input vectors are multiple dimension vectors (five-dimensional input vector x is encoded by encoding function ftheta into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector – Vincent, p. 3383, right-hand side of Fig. 3; note that Brochu teaches that the first inputs are generated from the unknown function)….”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brochu to reduce the dimensionality of the input data using an autoencoder and then decode the reduced-dimensionality encoding, as taught by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.
Snoek further discloses that “the plurality of outputs … are single-dimensional vectors (Gaussian process is a convenient and powerful prior distribution on functions, which can be taken to be of the form f: X → R [where R is the set of real numbers in one dimension and X is a bounded subset of RD] – Snoek, sec. 2 up to sec. 2.1, first paragraph; note that Brochu discloses that the outputs are generated from an unknown function and that the function f disclosed by Snoek could be any variety of function, including the unknown function of Brochu).”
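It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Vincent to make the output a scalar, as disclosed by Snoek.  Doing so would reduce the computational expense of performing the evaluation of the objective function.  See Snoek, first two paragraphs of sec. 2.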

Regarding claim 15, Brochu, as modified by Vincent and Snoek, discloses that “extracting the plurality of feature vectors comprises applying a plurality of non-linear transformations, each non-linear transformation comprising one of a plurality of layers of the stacked autoencoder (five-dimensional input vector x is encoded by encoding function ftheta [transformation] into a three dimensional vector, which is in turn encoded by encoding function ftheta(2) to a two-dimensional vector [feature vector] – Vincent, p. 3383, right-hand side of Fig. 3; typical form of the mapping ftheta is an affine mapping followed by a nonlinearity – id. at sec. 2.2, second paragraph).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Brochu and Snoek to apply multiple non-linear transformations to the feature vectors, as disclosed by Vincent, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would cause the decoded representation of the input data to be less noisy than the original input data, thereby extracting the most important features from the data while eliminating the unimportant features.  See Vincent, sec. 3.1 and paragraph immediately preceding it.

Regarding claim 16, Brochu, as modified by Vincent and Snoek, discloses that “generating the plurality of input vectors and the plurality of outputs comprises sampling the unknown function (Bayesian optimization is applicable in situations where one has certain observations of an objective function at sampled values; it is particularly useful when these evaluations are costly – Brochu, first paragraph of sec. 1.1).” 

see rejection of claim 14 supra and note that the optimization disclosed by Brochu is a Bayesian optimization; note also that Vincent teaches the extraction of the feature vectors and that the motivation to extract the feature vectors using an autoencoder as disclosed in Vincent is the same as that enumerated in claim 14).”

Regarding claim 18, Brochu, as modified by Vincent and Snoek, discloses that “performing the Bayesian optimization comprises a Gaussian process generating a probabilistic model for the unknown feature vectors based on the plurality of outputs (Gaussian process approximation of an objective function involves the calculation of an acquisition function that is high where the Gaussian process predicts a high objective and where the prediction uncertainty is high; area on the left has high uncertainty [i.e., low probability] – Brochu, Fig. 1 and accompanying text, esp. purple zones of uncertainty labeled “posterior uncertainty”; Gaussian process involves the calculation of a probability of certain objective functions of sample feature vectors x1, …, xt – id. at p. 25, first full paragraph).”  

Regarding claim 19, Brochu, as modified by Vincent and Snoek, discloses that “the … parameters for the optimized input vector comprise a new sampling point for the unknown function (in Bayesian optimization, at each time step, an acquisition function is optimized given all the data at all previous time steps over a Gaussian process such that a vector xt [input vector, whose entries collectively are a new sampling point] results; the objective function of that vector is sampled such that yt results, the data are augmented with (xt, yt) and the GP is updated, and the process repeats for a next time step – Brochu, p. 6, Algorithm 1; note that the Bayesian optimization process disclosed by Brochu could be applied both to the “feature vectors” and “generated parameters for input vectors” of Vincent, as this would merely involve substitution of one mathematical object for another with predictable results, see KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007)).”

Regarding claim 20, Brochu, as modified by Vincent and Snoek, discloses “evaluating the unknown function at the new sampling point (in Bayesian optimization, at each time step, an acquisition function is optimized given all the data at all previous time steps over a Gaussian process such that a vector xt [sampling point] results; the objective function of that vector is sampled [evaluated] such that yt results – Brochu, p. 6, Algorithm 1); and 
updating the plurality of input vectors and the plurality of outputs based on the new sampling point (in Bayesian optimization, at each time step, an acquisition function is optimized given all the data at all previous time steps over a Gaussian process such that a vector xt [input vector, whose entries collectively are a new sampling point] results; the objective function of that vector is sampled such that yt [output] results, the data are augmented [updated] with (xt, yt) and the GP is updated, and the process repeats for a next time step – Brochu, p. 6, Algorithm 1).”
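
For illustration only, the sequential procedure summarized above from Brochu, Algorithm 1 (optimize an acquisition function over the Gaussian process, sample the objective at the resulting point, augment the data, and repeat) can be sketched in one dimension as follows.  This is a minimal sketch under stated assumptions, not Brochu's code: the squared-exponential kernel, the upper-confidence-bound acquisition function with weight kappa, the grid search over candidates, the fixed number of iterations, and the function named objective are all hypothetical choices made for the example.

```python
import math


def kernel(a, b, length=0.3):
    """Squared-exponential covariance between two scalar inputs (assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def ucb(xs, ys, x, kappa=2.0):
    """Acquisition: high where the predicted mean or the uncertainty is high."""
    mean, std = gp_posterior(xs, ys, x)
    return mean + kappa * std


def objective(x):
    """Hypothetical stand-in for the costly unknown function."""
    return -(x - 0.7) ** 2


xs = [0.0, 1.0]                      # initial input points
ys = [objective(x) for x in xs]      # initial outputs
grid = [i / 100 for i in range(101)]

for t in range(10):
    candidates = [x for x in grid if x not in xs]
    x_t = max(candidates, key=lambda x: ucb(xs, ys, x))  # new sampling point
    y_t = objective(x_t)                                 # evaluate the function
    xs.append(x_t)                                       # augment the data;
    ys.append(y_t)                                       # the GP is refit next pass

best_y, best_x = max(zip(ys, xs))
print(best_x, best_y)
```

The last three statements of the loop body correspond to evaluating the function at the new sampling point and updating the inputs and outputs; the Gaussian process is updated implicitly because gp_posterior is recomputed from the augmented data on the next pass.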

Response to Arguments
Applicant's arguments filed March 29, 2021 (“Remarks”) have been fully considered but they are not persuasive.
	Applicant’s only substantive arguments are (a) that Vincent allegedly fails to disclose “performing an optimization using a second plurality of inputs and the plurality of outputs” because Vincent does not perform an optimization using single-dimensional outputs, and (b) that Vincent allegedly fails to disclose “decoding, by the machine-trained autoencoder, an output of the optimization into a sampling point having dimensions of the first plurality of inputs; and based on the sampling point, performing at least one more iteration of the optimization until a termination criterion is met” because Vincent allegedly merely discloses reconstructing the latent representation to the dimensions of the original input.  Remarks at 13.
Regarding (a), Applicant’s statement that Examiner alleges that Vincent teaches the limitation is false.  As the rejection above clearly illustrates, Examiner’s position is that Brochu teaches the limitation.  Furthermore, while Examiner agrees that Vincent does not perform an optimization with single-dimensional outputs, one cannot show nonobviousness by attacking references individually where the rejection is based on a combination of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  Altering the optimization function of Brochu so that it produces a single-dimensional output, as disclosed by Snoek, would be well within the capability of the ordinary artisan.
Regarding (b), the argument suffers the same deficiency as argument (a), namely that the references are being attacked individually where the rejection is based on the combination.  Examiner’s position is that Brochu, not Vincent, teaches the optimization, including “performing one more iteration of the optimization,” and that Vincent teaches the encoding and decoding using an autoencoder.  As discussed above, an ordinary artisan would know how to perform sequential Bayesian optimization (disclosed by Brochu) on a wide variety of inputs and outputs, including those generated by an autoencoder (disclosed by Vincent).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Probst, “Denoising Autoencoders for Fast Combinatorial Black Box Optimization,” in Proc. Companion Publication 2015 Ann. Conf. Genetic and Evolutionary Computation 1459-60 (2015) (disclosing the integration of a denoising autoencoder into an estimation of distribution algorithm for combinatorial optimization).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849.  The examiner can normally be reached on M-R 7a-5:30p ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/R.C.V./             Examiner, Art Unit 2125

/KAMRAN AFSHAR/             Supervisory Patent Examiner, Art Unit 2125