DETAILED ACTION
This action is in response to the claims filed 12/09/2021 for application 15/899,599. Claims 1, 3, 4, 6, 7, 9, and 10 have been amended. Claim 8 has been canceled. Claims 1-7, 9 and 10 are currently pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7, 9 and 10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 1 recites, in part, calculating a value of a first objective function and a second objective function and updating the first model parameter and the second model parameter so that a value of a third objective function is optimized including a sum of the value of the first objective function and a value of the second objective function and the learning method being used to learn the first model parameter and to which the hyperparameter is set.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements – “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements that are recited are only generally linked to the judicial exception. Additionally, the claim recites the additional element “one or more hardware processors”. This element is recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a first/second/third objective function, first and second model parameter, hyperparameter, and model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. Additionally, the one or more hardware processors amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is a distance scale in a predetermined projective space. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 3, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein the model is a neural network, and the distance scale is a distance scale in a projective space indicating an output of an interlayer of the neural network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 4, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is an average of a distance between each of a plurality of pieces of first learning data and second learning data that is a piece of learning data in which a distance from the piece of learning data to the first learning data is shorter than a distance from the piece of learning data to other piece of learning data. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 5, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is calculated for each piece of learning data. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 6, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the hyperparameter is for calculating the smoothness. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 7, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the model is a neural network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 



Regarding claim 9, 
Step 1 Analysis: Claim 9 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 9 recites, in part, calculating a value of a first objective function and a second objective function and updating the first model parameter and the second model parameter so that a value of a third objective function is optimized including a sum of the value of the first objective function and a value of the second objective function and the learning method being used to learn the first model parameter and to which the hyperparameter is set. The limitations of calculating a value of a first objective function and a second objective function and updating the first model parameter and the second model parameter so that a value of a third objective function is optimized including a sum of the value of the first objective function and a value of the second objective function and the learning method being used to learn the first model parameter and to which the hyperparameter is set, as drafted, are processes that, under the broadest reasonable interpretation, cover the recitation of mathematical relationships, which fall within the “Mathematical concepts” grouping of abstract ideas. Additionally, the limitations of controlling information processing using the model determined by the updated first model parameter and wherein the information processing includes speech recognition, image recognition, character recognition, prediction of abnormality of a device, and prediction of a value of a sensor, under the broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements – “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements that are recited are only generally linked to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a first/second/third objective function, first and second model parameter, hyperparameter, and model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. The claim is not patent eligible.  

Regarding claim 10, 
Step 1 Analysis: Claim 10 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 10 recites, in part, calculating a value of a first objective function and a second objective function and updating the first model parameter and the second model parameter so that a value of a third objective function is optimized including a sum of the value of the first objective function and a value of the second objective function and the learning method being used to learn the first model parameter and to which the hyperparameter is set. The limitations of calculating a value of a first objective function and a second objective function and updating the first model parameter and the second model parameter so that a value of a third objective function is optimized including a sum of the value of the first objective function and a value of the second objective function and the learning method being used to learn the first model parameter and to which the hyperparameter is set, as drafted, are processes that, under the broadest reasonable interpretation, cover the recitation of mathematical relationships, which fall within the “Mathematical concepts” grouping of abstract ideas. Additionally, the limitations of controlling information processing using the model determined by the updated first model parameter and wherein the information processing includes speech recognition, image recognition, character recognition, prediction of abnormality of a device, and prediction of a value of a sensor, under the broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements – “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements that are recited are only generally linked to the judicial exception. Additionally, the claim recites the additional elements “computer program product”, “non-transitory computer readable medium”, and “computer”. These elements are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a first/second/third objective function, first and second model parameter, hyperparameter, and model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. Additionally, the computer program product, non-transitory computer readable medium, and computer amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7, 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Adams et al. (US 20140358831 A1, cited by Applicant in the IDS filed on 02/20/2018, hereinafter "Adams") in view of Miyato et al. ("Distributional Smoothing with Virtual Adversarial Training", cited by Applicant in the IDS filed on 02/20/2018, hereinafter "Miyato") and further in view of Yoshizumi ("US 20180018567 A1", hereinafter "Yoshizumi") and further in view of Hill et al. ("Anomaly detection in streaming environmental sensor data: A data-driven modeling approach", hereinafter "Hill").

Regarding claim 1, Adams teaches An information processing device comprising: 
one or more hardware processors (Adams discloses use of processors in [¶0020]) configured to:
calculate a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting relates to be equivalent to the objective function being “used to estimate”. 
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate a second model parameter that is a hyperparameter of a learning method of learning (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the learning method being used to learn the first model parameter of the model by using the first objective function (“For example, as illustrated in FIG. 1, machine learning system 102 may be configured by first manually setting hyper-parameters 104, and subsequently learning, during training stage 110, the values of parameters 106a, based on training data 108 and hyper-parameters 104, to obtain learned parameter values 106b. The performance of the configured machine learning system 112 may then be evaluated during the evaluation stage 116, by using testing data 114 to calculate one or more values providing a measure of performance 118 of the configured machine learning system 112. 
Measure of performance 118 may be a measure of generalization performance and/or any other suitable measure of performance.” [¶0057; See further: “This approach involves treating the problem of setting hyper-parameters of a machine learning system as an optimization problem whose goal is to find a set of hyper-parameter values for a machine learning system that correspond to the best performance of the machine learning system and applying an optimization technique to solve this optimization problem. To this end, the relationship between the hyper-parameter values of a machine learning system and its performance may be considered an objective function for the optimization problem” [¶0061]]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameters that are believed to be “good”. 
Examiner is interpreting distance scale to be a distance metric in a space, Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]), the first model parameter being learned by the learning method to which the hyperparameter is set (“For example, when the objective function relates values of hyper-parameters of a machine learning system to its performance, the estimate of the objective function obtained based on the probabilistic model may provide an estimate of the performance of the machine learning system for each set of hyper-parameter values and the measure of uncertainty associated with the estimate may provide a measure of uncertainty (e.g., a variance, a confidence, etc.) associated with the estimate of how well the machine learning system performs for a particular set of hyper-parameter values.” [¶0084]); 
update the first model parameter and the second model parameter so that a value of a third objective function is optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; Adams further discloses optimization in ¶0061]);
and control information processing using the model determined by the updated first model parameter (“The method comprises using at least one computer hardware processor to perform: identifying a first point at which to evaluate the objective function at least in part by using an acquisition utility function and a probabilistic model of the objective function, wherein the probabilistic model depends on a non-linear one-to-one mapping of elements in the first domain to elements in a second domain; evaluating the objective function at the identified first point to obtain a corresponding first value of the objective function; and updating the probabilistic model of the objective function using the first value to obtain an updated probabilistic model of the objective function.” [¶0025; Examiner is interpreting controlling information processing to be equivalent to using the first value of the objective function to update the probabilistic model.]), 
wherein the information processing includes speech recognition (“machine learning systems for processing radar data, machine learning systems for speech processing (e.g., speech recognition, speaker identification, speaker diarization, natural language understanding etc.), and machine learning systems for machine translation.” [¶0076]), image recognition (“Other non-limiting examples of machine learning systems to which Bayesian optimization techniques described herein may be applied (to set the hyper-parameters of the machine system) include, but are not limited to, machine learning systems for medical image processing (e.g., machine learning systems for identifying anomalous objects in medical images” [¶0076]), character recognition (“Another non-limiting example of such a machine learning system is a machine learning system for processing natural language text (e.g., identifying one or more topics in the text, text mining, etc.)” [¶0076]).
However, Adams fails to explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model ([image media_image1.png: excerpt from Miyato, not reproduced] [pg. 2, § 2.1 Formalization of Local Distributional Smoothness, note: D is being used to train the objective function and Q is the output of a model.]).
Adams and Miyato are both in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’ first objective function by replacing it with the local distributional smoothness objective function as taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]
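For illustration only, a local smoothness term of the kind cited from Miyato can be sketched as the negative of the largest KL divergence between a model's output distribution at an input and at a nearby perturbed input. The toy softmax model, the random search over perturbations (used here in place of whatever approximation Miyato actually employs), and every function and parameter name below are assumptions made for this sketch; none is taken from Miyato, Adams, or the application.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    # Hypothetical toy model: linear logits followed by softmax.
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def local_smoothness(weights, x, epsilon, trials=200, seed=0):
    # Approximates -max over ||r|| = epsilon of KL(p(.|x) || p(.|x + r))
    # by sampling random perturbation directions (an assumed shortcut).
    rng = random.Random(seed)
    p = predict(weights, x)
    worst = 0.0
    for _ in range(trials):
        r = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in r)) or 1.0
        r = [epsilon * v / norm for v in r]
        q = predict(weights, [a + b for a, b in zip(x, r)])
        worst = max(worst, kl_divergence(p, q))
    return -worst
```

Under these assumptions a value near zero indicates that small input perturbations barely change the output distribution, while a strongly negative value indicates a locally non-smooth output.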
However Adams/Miyato fails to explicitly teach the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function. 
Yoshizumi teaches the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function (“the apparatus 100 may be operable to calculate a solution of a minimization problem for the third objective function h*(i, j) defined as the sum of the first objective function g*(i, j) and the second objective function f*(i, j) by performing the processing from S110 to S160.” [¶0043]).
Adams/Miyato/Yoshizumi are all in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s teachings by calculating a third objective function by summing two objective functions as taught by Yoshizumi. One would have been motivated to make this modification in order to acquire a solution for an optimization problem. [¶0004, Yoshizumi]
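For illustration only, the combination discussed above (a third objective formed as the sum of a first and a second objective, with both parameters updated so that the sum is optimized) can be sketched as follows. The quadratic objectives, the finite-difference gradient descent, and all names are hypothetical choices for this sketch and do not come from Adams, Miyato, Yoshizumi, or the application.

```python
def first_objective(theta, lam):
    # Hypothetical fit term for the model parameter theta, coupled to lam.
    return (theta - 3.0) ** 2 + 0.5 * (theta - lam) ** 2


def second_objective(lam):
    # Hypothetical term pulling the second parameter lam toward a target of 1.
    return (lam - 1.0) ** 2


def third_objective(theta, lam):
    # Third objective defined as the sum of the first and second objectives.
    return first_objective(theta, lam) + second_objective(lam)


def minimize_third(theta=0.0, lam=0.0, lr=0.05, steps=500, h=1e-6):
    # Jointly update both parameters by gradient descent on the summed
    # objective, using central finite differences for the gradients.
    for _ in range(steps):
        g_theta = (third_objective(theta + h, lam)
                   - third_objective(theta - h, lam)) / (2 * h)
        g_lam = (third_objective(theta, lam + h)
                 - third_objective(theta, lam - h)) / (2 * h)
        theta -= lr * g_theta
        lam -= lr * g_lam
    return theta, lam
```

For these particular toy objectives the summed objective is a convex quadratic, so the loop settles near theta = 2.5 and lam = 1.5.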
However, Adams/Miyato/Yoshizumi fails to explicitly teach prediction of abnormality of a device, and prediction of a value of a sensor.
Hill teaches prediction of abnormality of a device (“when the measurement at time t + 1 arrives from the sensor, compare the sensor measurement with this range, and if it falls outside the range, classify it as anomalous, otherwise classify it is non-anomalous” [pg. 1015, § 2. Methods, ¶2]), and prediction of a value of a sensor (“the expected value of the sensor measurement at time t + 1; (2) calculate the upper and lower bounds of the range within which the sensor measurement should lie (i.e., the prediction interval) with probability p;” [pg. 1015, § 2. Methods, ¶2]).
Adams/Miyato/Yoshizumi/Hill are all in the same field of endeavor of machine learning. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. Hill teaches anomaly detection in sensor data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s/Yoshizumi’s teachings by including predicting an abnormality of a device and a prediction value of a sensor as taught by Hill. Anomaly detection of sensors is well-known in the field of machine learning and thus one would have been motivated to make this modification to yield predictable results. 
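For illustration only, the steps quoted from Hill (estimate the expected next sensor value, compute a prediction interval, and classify a measurement that falls outside the interval as anomalous) can be sketched as follows. The window-mean predictor, the normal-approximation interval with multiplier z, and all names are assumptions for this sketch; Hill's actual data-driven models are not reproduced here.

```python
import statistics


def prediction_interval(history, z=1.96):
    # Expected next value and bounds from recent measurements.
    # Assumed predictor: mean of the window; assumed spread: sample stdev.
    expected = statistics.fmean(history)
    spread = statistics.stdev(history)
    return expected, expected - z * spread, expected + z * spread


def classify(history, measurement, z=1.96):
    # Compare the new measurement with the prediction interval.
    _, lower, upper = prediction_interval(history, z)
    if lower <= measurement <= upper:
        return "non-anomalous"
    return "anomalous"
```

With a hypothetical window of readings such as [20.1, 19.9, 20.0, 20.2, 19.8], a new reading of 20.1 falls inside the interval while a reading of 25.0 falls outside it.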

Regarding claim 2, Adams/Miyato/Yoshizumi/Hill teaches The device according to claim 1, where Adams further teaches wherein the distance scale is a distance scale in a predetermined projective space (“For example, when the objective function relates hyper-parameters of a machine learning system to its performance, the Gaussian process is defined on the space of hyper-parameters such that the mean function maps sets of hyper-parameter values (each set of hyper-parameter values corresponding to values of one or more hyper-parameters of the machine learning system) to real numbers and the covariance function represents correlation among sets of hyper-parameter values.” [¶0078; Examiner is interpreting predetermined projective space to be equivalent to a space of hyperparameters.]).

Regarding claim 3, Adams/Miyato/Yoshizumi/Hill teaches The device according to claim 2, where Adams further teaches wherein the model is a neural network (“In some embodiments, including any of the preceding embodiments, the probabilistic model of the objective function comprises a Gaussian process or a neural network.” [¶0011]), and the distance scale is a distance scale in a projective space indicating an output of an interlayer of the neural network ([image media_image2.png: excerpt from Adams, not reproduced] [¶0083; Examiner is interpreting the final layer of a multi-layer neural network to be equivalent to an output of an interlayer of the neural network]).

Regarding claim 4, Adams/Miyato/Yoshizumi/Hill teaches The device according to claim 1, where Adams further teaches wherein the distance scale is an average of a distance between each of a plurality of pieces of first learning data and second learning data that is a piece of learning data in which a distance from the piece of learning data to the first learning data is shorter than a distance from the piece of learning data to other piece of learning data (“In T-fold cross-validation, the data used to train a machine learning system is partitioned into T subsets, termed "folds," and the measure of performance of a machine learning system is calculated as the average performance of the machine learning system across the T folds. The performance of the machine learning system for a particular fold is obtained by training the machine learning system on data in all other folds and evaluating the performance of the system on data in the particular fold. Accordingly, to evaluate the performance of the machine learning system for a particular set of hyper-parameter values, the machine learning system must be trained T times, which is computationally expensive for complex machine learning systems and/or large datasets. However, it is likely that the measures of performance associated with each of the T folds are correlated with one another, such that evaluating the performance of the machine learning system for a particular fold using a set of hyper-parameter values may provide information indicating the performance of the machine learning system for another fold using the same set of hyper-parameter values. As a result, performance of the machine learning system may not need to be evaluated for each one of T folds for each set of hyper-parameter values.” [¶0182; Examiner is interpreting T subsets (folds) to be equivalent to pieces of learning data. 
Additionally learning data shorter than other pieces of learning data is being interpreted as equivalent to a particular fold with a similar correlation with another fold, therefore the distance between them is “shorter” when compared to other folds.]).
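For illustration only, the T-fold cross-validation described in the quoted passage of Adams (partition the data into T folds, train on all other folds, evaluate on the held-out fold, and average the per-fold performance) can be sketched as follows. The interleaved fold assignment, the mean-value "model", the negative mean squared error score, and all names are hypothetical choices for this sketch.

```python
def t_fold_scores(data, t, train, evaluate):
    # Partition the data into t folds (interleaved assignment, an assumption).
    folds = [data[i::t] for i in range(t)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on the data in all folds other than fold i.
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training)
        # Evaluate the trained model on the held-out fold.
        scores.append(evaluate(model, held_out))
    return scores


def mean_model(training):
    # Hypothetical "model": predicts the mean of the training data.
    return sum(training) / len(training)


def neg_mse(model, held_out):
    # Hypothetical performance measure: negative mean squared error.
    return -sum((x - model) ** 2 for x in held_out) / len(held_out)


def average_performance(data, t):
    # Overall measure: the average of the per-fold scores.
    scores = t_fold_scores(data, t, mean_model, neg_mse)
    return sum(scores) / len(scores)
```

Each call to train corresponds to one of the T training runs that the quoted passage describes as computationally expensive.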

Regarding claim 5, Adams/Miyato/Yoshizumi/Hill teaches The device according to claim 1, where Adams further teaches wherein the distance scale is calculated for each piece of learning data (“The objective function for a task relates hyper-parameter values of the machine learning system to performance of the machine learning system for the cross-validation fold associated with the task (e.g., the objective function for the task associated with cross-validation fold t relates values of hyper-parameters of the machine learning system to performance of the machine learning system calculated by training the machine learning system on data in all folds other than fold t and evaluating the performance of the resulting trained machine learning system on data in fold t.).” [¶0183; note: all the T folds is equivalent to each piece of learning data. As noted above and disclosed in ¶0094, examiner is interpreting distance scale to be the distance metric in a space. Adams further discloses the measure of performance is based off the correlation of T-folds (i.e. pieces of learning data). It is implicit that a distance metric would need to be calculated for each T-folds for each set of hyper parameter values in order to measure the performance of the system.]).

Regarding claim 6, Adams/Miyato/Yoshizumi/Hill teaches The learning device according to claim 1, where Miyato further teaches wherein the hyperparameter is for calculating the smoothness (“Large value of LDS therefore forces large relative margin around the decision boundary. One can achieve large value of LDS g with L2 regularization, dropout and random perturbation training with appropriate choice of hyperparameters by smoothing the model distribution globally” [pg. 7, ¶1]).
Adams/Miyato/Yoshizumi/Hill are all in the same field of endeavor of machine learning. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. Hill teaches anomaly detection in sensor data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the combined teachings of Adams/Miyato/Yoshizumi/Hill by replacing the first objective function taught by Adams with the local distributional smoothness objective function taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]

Regarding claim 7, Adams/Miyato/Yoshizumi/Hill teaches The device according to claim 1, where Adams further teaches wherein the model is a neural network (“In some embodiments, including any of the preceding embodiments, the probabilistic model of the objective function comprises a Gaussian process or a neural network.” [¶0011]).

Regarding claim 9, Adams teaches An information processing device comprising: 
calculating a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting relates to be equivalent to the objective function being “used to estimate”. 
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate a second model parameter that is a hyperparameter of a learning method of learning (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the learning method being used to learn the first model parameter of the model by using the first objective function (“For example, as illustrated in FIG. 1, machine learning system 102 may be configured by first manually setting hyper-parameters 104, and subsequently learning, during training stage 110, the values of parameters 106a, based on training data 108 and hyper-parameters 104, to obtain learned parameter values 106b. The performance of the configured machine learning system 112 may then be evaluated during the evaluation stage 116, by using testing data 114 to calculate one or more values providing a measure of performance 118 of the configured machine learning system 112. 
Measure of performance 118 may be a measure of generalization performance and/or any other suitable measure of performance.” [¶0057; See further: “This approach involves treating the problem of setting hyper-parameters of a machine learning system as an optimization problem whose goal is to find a set of hyper-parameter values for a machine learning system that correspond to the best performance of the machine learning system and applying an optimization technique to solve this optimization problem. To this end, the relationship between the hyper-parameter values of a machine learning system and its performance may be considered an objective function for the optimization problem” [¶0061]]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameters that are believed to be “good”. 
Examiner is interpreting distance scale to be a distance metric in a space, Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]), the first model parameter being learned by the learning method to which the hyperparameter is set (“For example, when the objective function relates values of hyper-parameters of a machine learning system to its performance, the estimate of the objective function obtained based on the probabilistic model may provide an estimate of the performance of the machine learning system for each set of hyper-parameter values and the measure of uncertainty associated with the estimate may provide a measure of uncertainty (e.g., a variance, a confidence, etc.) associated with the estimate of how well the machine learning system performs for a particular set of hyper-parameter values.” [¶0084]); 
updating the first model parameter and the second model parameter so that a value of a third objective function is optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; Adams further discloses optimization in ¶0061]).
and control information processing using the model determined by the updated first model parameter (“The method comprises using at least one computer hardware processor to perform: identifying a first point at which to evaluate the objective function at least in part by using an acquisition utility function and a probabilistic model of the objective function, wherein the probabilistic model depends on a non-linear one-to-one mapping of elements in the first domain to elements in a second domain; evaluating the objective function at the identified first point to obtain a corresponding first value of the objective function; and updating the probabilistic model of the objective function using the first value to obtain an updated probabilistic model of the objective function.” [¶0025; Examiner is interpreting controlling information processing to be equivalent to using the first value of the objective function to update the probabilistic model.]), 
wherein the information processing includes speech recognition (“machine learning systems for processing radar data, machine learning systems for speech processing (e.g., speech recognition, speaker identification, speaker diarization, natural language understanding etc.), and machine learning systems for machine translation.” [¶0076]), image recognition (“Other non-limiting examples of machine learning systems to which Bayesian optimization techniques described herein may be applied (to set the hyper-parameters of the machine system) include, but are not limited to, machine learning systems for medical image processing (e.g., machine learning systems for identifying anomalous objects in medical images” [¶0076]), character recognition (“Another non-limiting example of such a machine learning system is a machine learning system for processing natural language text (e.g., identifying one or more topics in the text, text mining, etc.)” [¶0076]), 
However, Adams fails to explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model (“[equation image media_image1.png not reproduced]” [pg. 2, § 2.1 Formalization of Local Distributional Smoothness, note: D is being used to train the objective function and Q is the output of a model.]).
Adams and Miyato are both in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’ first objective function by replacing it with the local distributional smoothness objective function taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]
However, Adams/Miyato fails to explicitly teach the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function.
Yoshizumi teaches the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function (“the apparatus 100 may be operable to calculate a solution of a minimization problem for the third objective function h*(i, j) defined as the sum of the first objective function g*(i, j) and the second objective function f*(i, j) by performing the processing from S110 to S160.” [¶0043]).
Adams/Miyato/Yoshizumi are all in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s teachings by calculating a third objective function by summing two objective functions as taught by Yoshizumi. One would have been motivated to make this modification in order to acquire a solution for an optimization problem. [¶0004, Yoshizumi]
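For illustration only, a third objective function defined as the sum of a first and a second objective function, with both parameters updated so that the sum is optimized, can be sketched as follows. This is a minimal sketch under stated assumptions: the quadratic functions `g` and `f`, the names `theta` and `lam`, and the use of plain finite-difference gradient descent are hypothetical placeholders, and the sketch does not reproduce steps S110 to S160 of Yoshizumi or the method of the application.

```python
def third_objective(first_objective, second_objective):
    """Return h with h(theta, lam) = g(theta, lam) + f(theta, lam)."""
    def h(theta, lam):
        return first_objective(theta, lam) + second_objective(theta, lam)
    return h

def g(theta, lam):
    # Hypothetical first objective: depends on the model parameter only.
    return (theta - 2.0) ** 2

def f(theta, lam):
    # Hypothetical second objective: depends on the hyperparameter only.
    return (lam - 0.5) ** 2

def minimize(h, theta, lam, step=0.1, iters=200, eps=1e-6):
    """Jointly update both parameters by finite-difference gradient descent."""
    for _ in range(iters):
        d_theta = (h(theta + eps, lam) - h(theta - eps, lam)) / (2 * eps)
        d_lam = (h(theta, lam + eps) - h(theta, lam - eps)) / (2 * eps)
        theta -= step * d_theta
        lam -= step * d_lam
    return theta, lam

h = third_objective(g, f)
theta_opt, lam_opt = minimize(h, theta=0.0, lam=0.0)
```

With these placeholder functions the summed objective has its minimum at `theta = 2.0` and `lam = 0.5`, and the loop moves both values toward that point together.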
However, Adams/Miyato/Yoshizumi fails to explicitly teach prediction of abnormality of a device, and prediction of a value of a sensor.
Hill teaches prediction of abnormality of a device (“when the measurement at time t + 1 arrives from the sensor, compare the sensor measurement with this range, and if it falls outside the range, classify it as anomalous, otherwise classify it is non-anomalous” [pg. 1015, § 2. Methods, ¶2]), and prediction of a value of a sensor (“the expected value of the sensor measurement at time t + 1; (2) calculate the upper and lower bounds of the range within which the sensor measurement should lie (i.e., the prediction interval) with probability p;” [pg. 1015, § 2. Methods, ¶2]).
Adams/Miyato/Yoshizumi/Hill are all in the same field of endeavor of machine learning. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. Hill teaches anomaly detection in sensor data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s/Yoshizumi’s teachings by including predicting an abnormality of a device and a prediction value of a sensor as taught by Hill. Anomaly detection of sensors is well-known in the field of machine learning and thus one would have been motivated to make this modification to yield predictable results. 
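For illustration only, the prediction-interval classification quoted from Hill can be sketched as follows. This is a minimal sketch and is not Hill's model: the window mean as the expected value at time t + 1, the sample standard deviation as the spread, the normal-distribution assumption, and the function names are all hypothetical choices made for this example.

```python
from statistics import NormalDist, mean, stdev

def prediction_interval(history, p=0.95):
    """Return (expected, lower, upper) for the next sensor measurement."""
    expected = mean(history)   # assumed predictor: mean of recent measurements
    spread = stdev(history)    # assumed spread: sample standard deviation
    z = NormalDist().inv_cdf(0.5 + p / 2.0)  # two-sided normal quantile
    return expected, expected - z * spread, expected + z * spread

def classify(history, measurement, p=0.95):
    """Classify a new measurement against the prediction interval."""
    _, lower, upper = prediction_interval(history, p)
    return "non-anomalous" if lower <= measurement <= upper else "anomalous"
```

A measurement inside the interval is classified as non-anomalous, and a measurement outside the interval is classified as anomalous, which mirrors the comparison described in the quoted passage.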

Regarding claim 10, Adams teaches A computer program product having a non-transitory computer readable medium including programmed instructions, wherein the instructions, when executed by a computer (See [¶0207-0209]), cause the computer to execute: 
calculating a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting relates to be equivalent to the objective function being “used to estimate”. 
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate a second model parameter that is a hyperparameter of a learning method of learning (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the learning method being used to learn the first model parameter of the model by using the first objective function (“For example, as illustrated in FIG. 1, machine learning system 102 may be configured by first manually setting hyper-parameters 104, and subsequently learning, during training stage 110, the values of parameters 106a, based on training data 108 and hyper-parameters 104, to obtain learned parameter values 106b. The performance of the configured machine learning system 112 may then be evaluated during the evaluation stage 116, by using testing data 114 to calculate one or more values providing a measure of performance 118 of the configured machine learning system 112. 
Measure of performance 118 may be a measure of generalization performance and/or any other suitable measure of performance.” [¶0057; See further: “This approach involves treating the problem of setting hyper-parameters of a machine learning system as an optimization problem whose goal is to find a set of hyper-parameter values for a machine learning system that correspond to the best performance of the machine learning system and applying an optimization technique to solve this optimization problem. To this end, the relationship between the hyper-parameter values of a machine learning system and its performance may be considered an objective function for the optimization problem” [¶0061]]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameters that are believed to be “good”. 
Examiner is interpreting distance scale to be a distance metric in a space, Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]), the first model parameter being learned by the learning method to which the hyperparameter is set (“For example, when the objective function relates values of hyper-parameters of a machine learning system to its performance, the estimate of the objective function obtained based on the probabilistic model may provide an estimate of the performance of the machine learning system for each set of hyper-parameter values and the measure of uncertainty associated with the estimate may provide a measure of uncertainty (e.g., a variance, a confidence, etc.) associated with the estimate of how well the machine learning system performs for a particular set of hyper-parameter values.” [¶0084]); 
updating the first model parameter and the second model parameter so that a value of a third objective function is optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; Adams further discloses optimization in ¶0061]).
and control information processing using the model determined by the updated first model parameter (“The method comprises using at least one computer hardware processor to perform: identifying a first point at which to evaluate the objective function at least in part by using an acquisition utility function and a probabilistic model of the objective function, wherein the probabilistic model depends on a non-linear one-to-one mapping of elements in the first domain to elements in a second domain; evaluating the objective function at the identified first point to obtain a corresponding first value of the objective function; and updating the probabilistic model of the objective function using the first value to obtain an updated probabilistic model of the objective function.” [¶0025; Examiner is interpreting controlling information processing to be equivalent to using the first value of the objective function to update the probabilistic model.]), 
wherein the information processing includes speech recognition (“machine learning systems for processing radar data, machine learning systems for speech processing (e.g., speech recognition, speaker identification, speaker diarization, natural language understanding etc.), and machine learning systems for machine translation.” [¶0076]), image recognition (“Other non-limiting examples of machine learning systems to which Bayesian optimization techniques described herein may be applied (to set the hyper-parameters of the machine system) include, but are not limited to, machine learning systems for medical image processing (e.g., machine learning systems for identifying anomalous objects in medical images” [¶0076]), character recognition (“Another non-limiting example of such a machine learning system is a machine learning system for processing natural language text (e.g., identifying one or more topics in the text, text mining, etc.)” [¶0076]), 
However, Adams fails to explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model (“[equation image media_image1.png not reproduced]” [pg. 2, § 2.1 Formalization of Local Distributional Smoothness, note: D is being used to train the objective function and Q is the output of a model.]).
Adams and Miyato are both in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’ first objective function by replacing it with the local distributional smoothness objective function taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]
However, Adams/Miyato fails to explicitly teach the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function.
Yoshizumi teaches the value of the third objective function including a sum of the value of the first objective function and the value of the second objective function (“the apparatus 100 may be operable to calculate a solution of a minimization problem for the third objective function h*(i, j) defined as the sum of the first objective function g*(i, j) and the second objective function f*(i, j) by performing the processing from S110 to S160.” [¶0043]).
Adams/Miyato/Yoshizumi are all in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s teachings by calculating a third objective function by summing two objective functions as taught by Yoshizumi. One would have been motivated to make this modification in order to acquire a solution for an optimization problem. [¶0004, Yoshizumi]
However, Adams/Miyato/Yoshizumi fails to explicitly teach prediction of abnormality of a device, and prediction of a value of a sensor.
Hill teaches prediction of abnormality of a device (“when the measurement at time t + 1 arrives from the sensor, compare the sensor measurement with this range, and if it falls outside the range, classify it as anomalous, otherwise classify it is non-anomalous” [pg. 1015, § 2. Methods, ¶2]), and prediction of a value of a sensor (“the expected value of the sensor measurement at time t + 1; (2) calculate the upper and lower bounds of the range within which the sensor measurement should lie (i.e., the prediction interval) with probability p;” [pg. 1015, § 2. Methods, ¶2]).
Adams/Miyato/Yoshizumi/Hill are all in the same field of endeavor of machine learning. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. Yoshizumi teaches calculating an objective function based on two objective functions. Hill teaches anomaly detection in sensor data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’/Miyato’s/Yoshizumi’s teachings by including predicting an abnormality of a device and a prediction value of a sensor as taught by Hill. Anomaly detection of sensors is well-known in the field of machine learning and thus one would have been motivated to make this modification to yield predictable results. 

Response to Arguments
Applicant's arguments filed 12/09/2021 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. § 101 rejection: 
Applicant’s arguments regarding the 101 rejection of claims 1-7, 9, and 10 have been considered but are not persuasive. Examiner asserts that the limitations of claim 1 do involve the recitation of mathematical concepts. In part, calculating a first and second objective function and summing the objective functions are all mathematical concepts that are recited in the claim. Additionally, the newly amended limitations regarding controlling information processing can be interpreted under broadest reasonable interpretation as steps that can be performed practically in the mind, and thus recite a mental process. Furthermore, the one or more hardware processors recited in the claim amount to no more than mere instructions to apply the exception using a generic computer component. The claim does not provide any details to show an improvement to the functionality of the processor. Please see the updated 101 rejection above.

Regarding the 35 U.S.C. § 103 rejection:
Applicant’s arguments regarding the 103 rejection of claims 1, 9, and 10 have been considered but are not persuasive. Applicant appears to assert that Adams fails to explicitly teach “the learning method being used to learn the first model parameter of the model” and “the first model parameter being learned by the learning method to which the hyperparameter is set”. However, these limitations are taught in ¶0057, ¶0061, and ¶0084 of Adams, as shown above in the prior art rejection. Please see the updated prior art rejection above.

Please see the updated 103 rejection for how the newly amended limitations are addressed by the newly cited references, Yoshizumi and Hill.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122



/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122