DETAILED ACTION
This action is in response to the application filed 02/20/2018 which claims foreign priority to JP2017-169448 filed on 09/04/2017. Claims 1-10 are pending and have been considered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/20/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities:
The paragraphs of the specification are not numbered.
The specification uses the term “neutral network” instead of “neural network.” “Neutral network” is not a term of art; for examination purposes, the examiner will read the term as “neural network.”
Appropriate correction is required.
Claim Objections
Claims 3, 4, and 7 are objected to because of the following informalities:
Regarding claims 3 and 7, "neutral" should read "neural". For examination purposes, the examiner interprets “neutral network” as “neural network.”
Regarding claim 4, lines 5-6 recite “first learning data is shorter than from other piece of learning data,” which appears to be grammatically incorrect. For examination purposes, the examiner interprets this limitation as “shorter than other pieces of learning data.”
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“a calculator configured to calculate” in claim 1;
“a learner configured to update” in claim 1; and
“a controller configured to control” in claim 8.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within any of the four statutory categories of patent-eligible subject matter because the claimed computer readable medium, under its broadest reasonable interpretation, encompasses transitory signals per se. The specification does not explicitly exclude transitory signals from the meaning of “computer readable medium.” The examiner suggests amending the claim to recite a “non-transitory computer readable medium.”

Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1 Analysis: Claim 1 is directed to a machine (a learning device), which falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 1 recites, in part, calculating a value of a first objective function and a value of a second objective function, and updating the first model parameter and the second model parameter. These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover the recitation of mathematical relationships, which falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only the additional elements “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements do no more than generally link the use of the judicial exception to a particular technological environment. Additionally, the claim recites a “learning device”, a “calculator”, and a “learner”. The calculator and learner limitations invoke 35 U.S.C. 112(f), and these elements are interpreted as a processor as disclosed on pg. 14 of the specification. These elements are recited at a high level of generality (i.e., as a generic processor performing the generic computer functions of calculating and updating values) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a first and second objective function, first and second model parameter, hyperparameter, and model used to perform the claimed functions amount to no more than generally linking the use of the judicial exception to a particular technological environment. Additionally, the learning device, calculator, and learner amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is a distance scale in a predetermined projective space. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 3, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein the model is a neutral network, and the distance scale is a distance scale in a projective space indicating an output of an interlayer of the neutral network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 4, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is an average of a distance between each of a plurality of pieces of first learning data and second learning data that is a piece of learning data in which a distance from the piece of learning data to the first learning data is shorter than from other piece of learning data. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 
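As an illustration of the claim 4 limitation, read as “shorter than other pieces of learning data,” the following minimal Python sketch computes the average, over the learning data, of the distance from each piece of learning data to its nearest other piece. It assumes Euclidean distance and learning data stored as rows of a NumPy array; the function name is hypothetical, and the sketch is not drawn from the specification or the cited art.

```python
import numpy as np


def mean_nearest_neighbor_distance(x: np.ndarray) -> float:
    """Average, over all rows of x, of the Euclidean distance from each row
    ("first learning data") to its nearest other row ("second learning data").
    Hypothetical helper illustrating the claim 4 limitation."""
    if x.shape[0] < 2:
        raise ValueError("need at least two pieces of learning data")
    # Pairwise Euclidean distances, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Exclude each piece's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # Distance from each piece to its nearest other piece, then the average.
    nearest = dist.min(axis=1)
    return float(nearest.mean())


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(mean_nearest_neighbor_distance(points))
```

For these three points the nearest-neighbor distances are 1, 1, and 2, so the average is 4/3.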

Regarding claim 5, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the distance scale is calculated for each piece of learning data. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 6, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the hyperparameter is for calculating the smoothness. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 7, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the model is a neutral network. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 8, the rejection of claim 1 is further incorporated, and further, the claim recites: an information processing device comprising the learning device according to claim 1 and a controller configured to control information processing using the model determined by the updated first model parameter. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim recites the additional elements “information processing device” and “controller”; however, these elements do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.

Regarding claim 9, 
Step 1 Analysis: Claim 9 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 9 recites, in part, calculating a value of a first objective function and a value of a second objective function, and updating the first model parameter and the second model parameter. These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover the recitation of mathematical relationships, which falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only the additional elements “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements do no more than generally link the use of the judicial exception to a particular technological environment. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a first and second objective function, first and second model parameter, hyperparameter, and model used to perform the steps of the claimed process amount to no more than generally linking the use of the judicial exception to a particular technological environment. The claim is not patent eligible.

Regarding claim 10, 
Step 1 Analysis: As noted above, claim 10 recites a computer readable medium that, under its broadest reasonable interpretation, encompasses transitory signals per se and therefore does not fall within any of the four statutory categories. For completeness, the claim is nonetheless analyzed under Step 2A and Step 2B below.
Step 2A Prong 1 Analysis: Claim 10 recites, in part, calculating a value of a first objective function and a value of a second objective function, and updating the first model parameter and the second model parameter. These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover the recitation of mathematical relationships, which falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only the additional elements “first and second objective function”, “first and second model parameter”, “hyperparameter”, and “model”. These elements do no more than generally link the use of the judicial exception to a particular technological environment. Additionally, the claim recites a “computer program product”, a “computer readable medium”, and a “computer”. These elements are recited at a high level of generality (i.e., as a generic computer performing the generic computer functions of calculating and updating values) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a first and second objective function, first and second model parameter, hyperparameter, and model used to perform the claimed steps amount to no more than generally linking the use of the judicial exception to a particular technological environment. Additionally, the computer program product, computer readable medium, and computer amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Adams et al. (US 20140358831 A1, cited by Applicant in the IDS filed on 02/20/2018, hereinafter "Adams") in view of Miyato et al. ("Distributional Smoothing with Virtual Adversarial Training", cited by Applicant in the IDS filed on 02/20/2018, hereinafter "Miyato").

Regarding claim 1, Adams teaches A learning device comprising: 
a calculator (Adams discloses use of processors in [¶0020]) configured to calculate a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037; Examiner is interpreting the claim under 35 U.S.C. 112(f) and thus interpreting a processor to be structure equivalent to the limitations that invoke 112(f).]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting “relates” to be equivalent to the objective function being “used to estimate.”
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate, with a second model parameter that is a hyperparameter of a learning method of learning, the model by using the first objective function (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). 
In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameter values believed to be “good.” Examiner is interpreting “distance scale” to be a distance metric in a space; Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]); and a learner configured to (Adams discloses use of processors in [¶0020]) update the first model parameter and the second model parameter so that the value of the first objective function and the value of the second objective function are optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; It is implicit that evaluating multiple objective functions includes evaluating both the first and the second objective functions. Adams further discloses optimization in ¶0061]).
However, Adams does not explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model (“[excerpt from Miyato, pg. 2, § 2.1, reproduced as an image in the original action]” [pg. 2, § 2.1 Formalization of Local Distributional Smoothness; note: D is used to train the objective function, and Q is the output of a model]).
Adams and Miyato are in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’ first objective function by replacing it with the local distributional smoothness objective function taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [Miyato, § 1 Introduction, ¶2]
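For reference, a local distributional smoothness (LDS) term of the kind relied on from Miyato can be sketched as follows, assuming a linear softmax classifier as the model. LDS at a data point x is taken here as the negative of the largest Kullback-Leibler divergence between the model's output distribution at x and at x + r over perturbations with ‖r‖₂ ≤ ε, where the radius ε acts as a hyperparameter of the smoothness term. Miyato approximates the maximizing perturbation with power iteration; this sketch substitutes a simpler random search over directions, and all names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(z - z.max())
    return e / e.sum()


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    # KL(p || q) for discrete distributions with strictly positive entries.
    return float(np.sum(p * (np.log(p) - np.log(q))))


def local_distributional_smoothness(W, b, x, eps, n_directions=256, seed=0):
    """Approximate LDS(x) = -max_{||r||_2 <= eps} KL(p(y|x) || p(y|x + r))
    for a linear softmax model p(y|x) = softmax(W @ x + b).

    The maximization is a random search over directions on the eps-sphere,
    a simplification of the power-iteration approach described by Miyato.
    """
    rng = np.random.default_rng(seed)
    p_clean = softmax(W @ x + b)
    worst = 0.0
    for _ in range(n_directions):
        d = rng.standard_normal(x.shape)
        r = eps * d / np.linalg.norm(d)  # perturbation with ||r||_2 == eps
        worst = max(worst, kl_divergence(p_clean, softmax(W @ (x + r) + b)))
    return -worst  # closer to 0 means the output is locally smoother


W = np.array([[2.0, -1.0], [0.5, 1.5], [-1.0, 0.0]])
b = np.zeros(3)
x = np.array([0.3, -0.2])
for eps in (0.5, 1.0):
    print(eps, local_distributional_smoothness(W, b, x, eps))
```

Sampling only the sphere is adequate for this particular model because, for a linear softmax, the divergence along any fixed direction does not decrease as the perturbation grows; consequently, a larger ε yields a smaller (more negative) LDS value.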

Regarding claim 2, the combination of Adams and Miyato teaches The device according to claim 1, where Adams further teaches wherein the distance scale is a distance scale in a predetermined projective space (“For example, when the objective function relates hyper-parameters of a machine learning system to its performance, the Gaussian process is defined on the space of hyper-parameters such that the mean function maps sets of hyper-parameter values (each set of hyper-parameter values corresponding to values of one or more hyper-parameters of the machine learning system) to real numbers and the covariance function represents correlation among sets of hyper-parameter values.” [¶0078; Examiner is interpreting predetermined projective space to be equivalent to a space of hyperparameters.]).
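Adams ¶0078 and ¶0136 describe a Gaussian process defined on the space of hyper-parameters whose covariance reflects a length scale. The following minimal sketch of one such covariance assumes a stationary squared-exponential kernel and hyper-parameter values already normalized to comparable ranges (Adams also contemplates non-stationary models). It shows how the length scale governs how quickly correlation decays with the distance between two hyper-parameter settings; the settings themselves are hypothetical.

```python
import numpy as np


def squared_exponential(h1, h2, length_scale, variance=1.0):
    """k(h1, h2) = variance * exp(-||h1 - h2||^2 / (2 * length_scale^2)) for two
    hyper-parameter settings given as equal-length vectors."""
    diff = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    return float(variance * np.exp(-np.dot(diff, diff) / (2.0 * length_scale ** 2)))


# Two hypothetical, normalized hyper-parameter settings 0.1 apart.
setting_a = [0.2, 0.5]
setting_b = [0.3, 0.5]
for ls in (0.05, 0.5):
    print(ls, squared_exponential(setting_a, setting_b, ls))
```

With a short length scale (0.05) the two settings are only weakly correlated (about 0.135, i.e., exp(-2)), while with a long length scale (0.5) they are strongly correlated (about 0.980, i.e., exp(-0.02)).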

Regarding claim 3, the combination of Adams and Miyato teaches The device according to claim 2, where Adams further teaches wherein the model is a neutral network (“In some embodiments, including any of the preceding embodiments, the probabilistic model of the objective function comprises a Gaussian process or a neural network.” [¶0011]), and the distance scale is a distance scale in a projective space indicating an output of an interlayer of the neutral network (“[excerpt from Adams ¶0083, reproduced as an image in the original action]” [¶0083; Examiner is interpreting the final layer of a multi-layer neural network to be equivalent to an output of an interlayer of the neural network]).

Regarding claim 4, the combination of Adams and Miyato teaches The device according to claim 1, where Adams further teaches wherein the distance scale is an average of a distance between each of a plurality of pieces of first learning data and second learning data that is a piece of learning data in which a distance from the piece of learning data to the first learning data is shorter than from other piece of learning data (“In T-fold cross-validation, the data used to train a machine learning system is partitioned into T subsets, termed "folds," and the measure of performance of a machine learning system is calculated as the average performance of the machine learning system across the T folds. The performance of the machine learning system for a particular fold is obtained by training the machine learning system on data in all other folds and evaluating the performance of the system on data in the particular fold. Accordingly, to evaluate the performance of the machine learning system for a particular set of hyper-parameter values, the machine learning system must be trained T times, which is computationally expensive for complex machine learning systems and/or large datasets. However, it is likely that the measures of performance associated with each of the T folds are correlated with one another, such that evaluating the performance of the machine learning system for a particular fold using a set of hyper-parameter values may provide information indicating the performance of the machine learning system for another fold using the same set of hyper-parameter values. As a result, performance of the machine learning system may not need to be evaluated for each one of T folds for each set of hyper-parameter values.” [¶0182; Examiner is interpreting T subsets (folds) to be equivalent to pieces of learning data. 
Additionally, Examiner is interpreting learning data at a “shorter” distance than other pieces of learning data to be equivalent to a particular fold having a similar correlation with another fold, such that the distance between those two folds is “shorter” than the distance to other folds.]).
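The T-fold cross-validation described in Adams ¶0182 can be sketched as follows. This assumes a hypothetical ridge-regression model whose regularization strength `lam` stands in for a hyper-parameter, mean squared error as the performance measure, and a fixed random partition into T = 5 folds; none of these choices is taken from Adams.

```python
import numpy as np


def ridge_fit(x, y, lam):
    # Closed-form ridge regression weights; lam is the hypothetical hyper-parameter.
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(n_features), x.T @ y)


def mean_squared_error(w, x, y):
    return float(np.mean((x @ w - y) ** 2))


def t_fold_score(x, y, lam, t=5, seed=0):
    """Return (average score over the T folds, list of per-fold scores)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), t)  # partition into T folds
    fold_scores = []
    for k in range(t):
        train = np.concatenate([folds[j] for j in range(t) if j != k])
        w = ridge_fit(x[train], y[train], lam)  # train on all other folds
        fold_scores.append(mean_squared_error(w, x[folds[k]], y[folds[k]]))  # evaluate on fold k
    return float(np.mean(fold_scores)), fold_scores


data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(40, 3))
y = x @ np.array([1.0, -2.0, 0.5])
for lam in (0.0, 10.0):
    average, per_fold = t_fold_score(x, y, lam)
    print(lam, average, len(per_fold))
```

Each entry of the per-fold list corresponds to one fold's held-out performance for a given hyper-parameter value, and the returned average is the cross-validated measure of performance for that value.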

Regarding claim 5, the combination of Adams and Miyato teaches The device according to claim 1, where Adams further teaches wherein the distance scale is calculated for each piece of learning data (“The objective function for a task relates hyper-parameter values of the machine learning system to performance of the machine learning system for the cross-validation fold associated with the task (e.g., the objective function for the task associated with cross-validation fold t relates values of hyper-parameters of the machine learning system to performance of the machine learning system calculated by training the machine learning system on data in all folds other than fold t and evaluating the performance of the resulting trained machine learning system on data in fold t.).” [¶0183; note: the T folds are equivalent to the pieces of learning data. As noted above and as disclosed in ¶0094, Examiner is interpreting “distance scale” to be the distance metric in a space. Adams further discloses that the measure of performance is based on the correlation among the T folds (i.e., pieces of learning data). It is implicit that a distance metric would need to be calculated for each of the T folds for each set of hyper-parameter values in order to measure the performance of the system.]).

Regarding claim 6, the combination of Adams and Miyato teaches The learning device according to claim 1, where Miyato further teaches wherein the hyperparameter is for calculating the smoothness (“Large value of LDS therefore forces large relative margin around the decision boundary. One can achieve large value of LDS g with L2 regularization, dropout and random perturbation training with appropriate choice of hyperparameters by smoothing the model distribution globally” [pg. 7, ¶1]).
Adams and Miyato are in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Adams’ first objective function by replacing it with the local distributional smoothness objective function taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [Miyato, § 1 Introduction, ¶2]

Regarding claim 7, the combination of Adams and Miyato teaches The learning device according to claim 1, where Adams further teaches wherein the model is a neutral network (“In some embodiments, including any of the preceding embodiments, the probabilistic model of the objective function comprises a Gaussian process or a neural network.” [¶0011]).

Regarding claim 8, the combination of Adams and Miyato teaches An information processing device comprising: the learning device according to claim 1; where Adams further teaches and a controller ([Adams discloses use of processors in ¶0020]) configured to control information processing using the model determined by the updated first model parameter (“The method comprises using at least one computer hardware processor to perform: identifying a first point at which to evaluate the objective function at least in part by using an acquisition utility function and a probabilistic model of the objective function, wherein the probabilistic model depends on a non-linear one-to-one mapping of elements in the first domain to elements in a second domain; evaluating the objective function at the identified first point to obtain a corresponding first value of the objective function; and updating the probabilistic model of the objective function using the first value to obtain an updated probabilistic model of the objective function.” [¶0025; Examiner is interpreting controlling information processing to be equivalent to using the first value of the objective function to update the probabilistic model.]).

Regarding claim 9, Adams teaches A learning method comprising: 
calculating a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting “relates” as equivalent to the objective function being “used to estimate”. 
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate, with a second model parameter that is a hyperparameter of a learning method of learning, the model by using the first objective function (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). 
In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameters that are believed to be “good”. Examiner is interpreting distance scale to be a distance metric in a space; Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]); and updating the first model parameter and the second model parameter so that the value of the first objective function and the value of the second objective function are optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; It is implicit that evaluating multiple objective functions would include both the first and second objective functions. Adams further discloses optimization in ¶0061]).
However, Adams fails to explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model ([equation image from Miyato, not reproduced in this text] [pg. 2, § 2.1 Formalization of Local Distributional Smoothness; note: D is being used to train the objective function and Q is the output of a model.]).
Adams and Miyato are both in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Adams’s first objective function by replacing it with the local distributional smoothness objective function as taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]

Regarding claim 10, Adams teaches A computer program product having a computer readable medium including programmed instructions, wherein the instructions, when executed by a computer (See [¶0207-0209]), cause the computer to execute:
calculating a value of a first objective function (“a first point at which to evaluate an objective function in the plurality of objective functions; selecting, based at least in part on the joint probabilistic model, a first objective function in the plurality of objective functions to evaluate at the identified first point; evaluating the first objective function at the identified first point; and updating the joint probabilistic model based on results of the evaluation to obtain an updated joint probabilistic model.” [¶0037]) and a value of a second objective function (“a second objective function in the plurality of objective functions to evaluate at the identified first point; and evaluating the second objective function at the identified first point.” [¶0040]), the first objective function being used to estimate a first model parameter for determining the model (“In some embodiments, including any of the preceding embodiments, the first objective function relates values of a plurality of hyper-parameters of a neural network for identifying objects in images to respective values providing a measure of performance of the neural network in identifying the objects in the images.” [¶0019; Examiner is interpreting “relates” as equivalent to the objective function being “used to estimate”. 
Additionally, see ¶0084 “Regardless of the type of probabilistic model used for modeling the objective function, the probabilistic model may be used to obtain an estimate of the objective function and a measure of uncertainty associated with the estimate.”]), the second objective function being used to estimate, with a second model parameter that is a hyperparameter of a learning method of learning, the model by using the first objective function (“As another non-limiting example, one of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of one machine learning system for a first set data associated, and another task of the multiple related tasks may comprise identifying hyper-parameters to optimize performance of another related machine learning system for a second set data (the first set of data may be different from or the same as the second set of data). In each of these examples, the objective function corresponding to a particular task may relate hyper-parameter values of the machine learning system to its performance.” [¶0187]), the second model parameter to be estimated being closer to a distance scale of learning data (“For example, when the objective function relates hyper-parameter values of a machine learning system to its performance, a Gaussian process having a short-length scale may be more appropriate for modeling the objective function at points near its maximum value and a Gaussian process having a longer-length scale may be more appropriate for modeling the objective function at points farther away from its maximum value (e.g., because a machine learning system may perform equally poorly for all "bad" values of hyper-parameters, but its performance may be sensitive to small tweaks in "good" hyper-parameter regimes). 
In contrast, a stationary Gaussian process model would represent the objective function using the same length scale for all points on which the objective function is defined.” [¶0136; Examiner is interpreting the second model parameter to be equivalent to hyperparameter values that are evaluated to be close to the hyperparameters that are believed to be “good”. Examiner is interpreting distance scale to be a distance metric in a space; Adams discloses “evaluating the objective function at hyper-parameter values far away, according to a suitable distance metric” in ¶0094]); and updating the first model parameter and the second model parameter so that the value of the first objective function and the value of the second objective function are optimized (“The values of the parameter(s) may be updated when one or more additional evaluations of any of the multiple objective functions are performed. In this way, the parameter(s) of the joint probabilistic model that model correlation among tasks in the plurality of tasks may be adaptively estimated.” [¶0179; It is implicit that evaluating multiple objective functions would include both the first and second objective functions. Adams further discloses optimization in ¶0061]).
However, Adams fails to explicitly teach the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model.
Miyato teaches the first objective function including smoothness that indicates smoothness of a local distribution of an output of a model ([equation image from Miyato, not reproduced in this text] [pg. 2, § 2.1 Formalization of Local Distributional Smoothness; note: D is being used to train the objective function and Q is the output of a model.]).
Adams and Miyato are both in the same field of endeavor of optimizing objective functions in neural networks. Adams discloses a method of optimizing an objective function to obtain an updated model. Miyato discloses a local distributional smoothness technique with virtual adversarial training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Adams’s first objective function by replacing it with the local distributional smoothness objective function as taught by Miyato. One would have been motivated to make this modification in order to prevent overfitting by making the optimal parameter less dependent on the likelihood term. [§1 Introduction, ¶2, Miyato]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Abbas et al. ("Understanding Regularization by Virtual Adversarial Training, Ladder Networks and Others") discloses virtual adversarial training with regularization.
 Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122                                                                                                                                                                                                        




/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122