Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities: 
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.  Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
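Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.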
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“A memory configured to” in claim 4
“A processor configured to” in claim 4
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 2, 4, and 5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
	Regarding claims 1, 4, and 5, “sampling a prediction performance within a predetermined range from the first prediction performance curve a plurality of times for each of different data sizes” is indefinite.  In the included drawings the data size is given on the x-axis; from the description, one of ordinary skill in the art would therefore interpret the limitation as sampling the same point a plurality of times.  For further examination this is being interpreted as “sampling a prediction performance within a predetermined 
	Regarding claim 2, “a larger data size” is indefinite.  A direct basis for comparison is expected.  Because the claim gives no reference against which the data size is larger, the comparative adjective “larger” is indefinite.  For further examination this is being interpreted as “a large data size”.
	Regarding claim 2, “a smaller width” is indefinite.  A direct basis for comparison is expected.  Because the claim gives no reference against which the width is smaller, the comparative adjective “smaller” is indefinite.  For further examination this is being interpreted as “a small width”.
	
Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-6 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claims 1, 4, and 5,
Claims 1, 4, and 5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claims 1, 4, and 5 are directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claims 1, 4, and 5 recite a computer implemented method of processing neural networks.  Each of the following limitations: calculating, by a processor, based on measured data in which a first data size is associated with a prediction performance of a model generated by using training data of the first data size, a first parameter value which defines a first prediction performance curve that indicates a relationship between a data size and a prediction performance; sampling, by the processor, a prediction performance within a predetermined range from the first prediction performance curve a plurality of times for each of different data sizes, to generate a plurality of sample point sequences, each of which is a sequence of combinations of a data size and a prediction performance; calculating, by the processor, a plurality of second parameter values which defines a plurality of second prediction performance curves that represents the plurality of sample point sequences and determining a plurality of weights associated with the plurality of second prediction performance curves by using the plurality of second parameter values and the measured data; and generating, by the processor, variance information which indicates variation of a prediction performance of a second data size estimated from the first prediction performance curve by using the plurality of second prediction performance curves and the plurality of weights, as drafted, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: “calculating, by a processor, based on measured data in which a first data 
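Step 2A Prong Two Analysis: This judicial exception recited in these claims is not integrated into a practical application.  Claims 1, 4, and 5 additionally recite “a processor” and “a model”.  However, these additional features are generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  For example, computing variance by a processor amounts to simply adding a general purpose computer or computer components after the fact to a well-known abstract idea.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 1 is directed to a judicial exception.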
Step 2B Analysis:  Claims 1, 4, and 5 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.  
The above analysis also applies to claims 4 and 5 which recite corresponding features.  Therefore, claims 1, 4, and 5 recite an abstract idea which is a judicial exception.

Regarding Claim 2:  Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to an apparatus, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 2 recites a computer implemented method of processing neural networks.  For example, the limitation “when a prediction performance for a larger data size is sampled, a smaller width is set to the predetermined range”, as drafted, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components.
Step 2A Prong Two Analysis:  This judicial exception recited in these claims is not integrated into a practical application.  Claim 2 additionally recites “a processor” and “a model”.  However, these additional features are generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  For example, computing variance by a processor amounts to simply adding a general purpose computer or computer components after the fact to a well-known abstract idea.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 2 is directed to a judicial exception.
Step 2B Analysis:  Claim 2 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 2 amount to no more than mere instructions to apply the judicial exception using a generic computer component.

Regarding Claim 3:  Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to an apparatus, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 3 recites a computer implemented method of processing neural networks.  For example, the limitations wherein the determining of a plurality of weights includes calculating a plurality of first occurrence probabilities corresponding to the plurality of second parameter values by using the plurality of second parameter values and the measured data, converting the plurality of first occurrence probabilities into a plurality of second occurrence probabilities corresponding to the plurality of sample point sequences by using the plurality of sample point sequences and the plurality of second parameter values, and determining the plurality of weights from the plurality of second occurrence probabilities, as drafted, under their broadest reasonable interpretation, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: wherein the determining of a plurality of weights includes calculating a plurality of first occurrence probabilities corresponding to the plurality of second parameter values by using the plurality of second parameter 
Step 2A Prong Two Analysis:  This judicial exception recited in these claims is not integrated into a practical application.  Claim 3 additionally recites “a processor” and “a model”.  However, these additional features are generic computer components recited at a high-level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  For example, computing variance by a processor amounts to simply adding a general purpose computer or computer components after the fact to a well-known abstract idea.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application.  Therefore, claim 3 is directed to a judicial exception.
Step 2B Analysis:  Claim 3 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 3 amount to no more than mere instructions to apply the judicial exception using a generic computer component.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Klein (Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, 2017) in view of Hara (US 2017/0228639 A1).


Regarding claim 1, Klein, who discloses a method of optimizing machine learning hyperparameters, teaches an estimation method comprising:
calculating, by a processor, based on measured data in which a first data size is associated with a prediction performance of a model generated by using training data of the first data size ([Abstract] “To accelerate hyperparameter optimization, we …”) a first parameter value which defines a first prediction performance curve that indicates a relationship between a data size and a prediction performance (see Figure 6 on p. 14, also included below, for error as a function of dataset size.  See eqn. 6 for description of kernel.  First parameter represented by C.  Second parameter represented by gamma.  See also section 3.1);
sampling, by the processor, a prediction performance within a predetermined range from the first prediction performance curve a plurality of times for each of different data sizes ([p. 11 Section A.1] “after sampling K hyperparameter settings from the marginal loglikelihood for the GP using MCMC (line 1), for every hyperparameter setting.” [p. 13 Section B] “Scaling of Loss and Computational Cost With Dataset Size…Figure 6 shows these trends for ten random configurations, evaluated on subsets of different sizes” See Figure 8 for range of noise (variance) detected at each dataset size) to generate a plurality of sample point sequences, each of which is a sequence of combinations of a data size and a prediction performance ([p. 13 Section B] See Figures 6 and 7.  Each point is a combination of dataset size and model performance, each curve is a sequence of these points, and there are a plurality of curves in each graph to compare performance.  “To show that our method, i.e. the kernel we use and our initial design, actually capture these trends, we sampled points from that data as our initial design and predicted loss and cost of unseen configurations”);
a plurality of second parameter values which defines a plurality of second prediction performance curves that represents the plurality of sample point sequences (Figure 6 shows a plurality of curves that represent prediction performance, each of which can be represented by …);
variance information which indicates variation of a prediction performance of a second data size ([Klein 2.3] “(multi-task Bayesian optimization) The blackbox function f : X × R → R now takes another input representing the data subset size;” Data subset interpreted as second data.  “we will use relative sizes s = N_sub/N ∈ [0, 1], with s = 1 representing the entire dataset. While the eventual goal is to minimize the loss f(x, s = 1) for the entire dataset, evaluating f for smaller s is usually cheaper...We propose a principled rule for the automatic selection of the next (x, s) pair to evaluate...Based on these observations, we expect that relatively small fractions of the dataset yield representative performances and therefore vary our relative size parameter s on a logarithmic scale.” [Section C] “We repeated each run with a given subset size K = 10 times using different subsets, and estimate the observation noise variance at each point” See eqn. 9).
	While Klein teaches the relationship between the model prediction error and the data set size, Klein does not explicitly teach a prediction performance value.  Klein also does not explicitly teach determining a plurality of weights associated with the plurality of second prediction performance curves using the plurality of second parameter values and the measured data, generating, by the processor, variance information which indicates variation of a prediction performance of a second data size, or a second data size estimated from the first prediction performance curve by using the plurality of second prediction performance curves and the plurality of weights.


[Figure: Klein, Figure 6 (p. 14), error as a function of dataset size]


Hara teaches determining a plurality of weights associated with the plurality of second prediction performance curves using the plurality of second parameter values and the measured data (Hara [¶0015] "The function is operable to estimate the evaluation value from differences between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations. According to the tenth aspect, the apparatus may generate an accurate predictive model based on the tentative weight data." [¶0068] "At S170, the training section may generate a new setting used for training of second neural networks." [¶0042] "the apparatus 100 may improve prediction accuracy of the predictive model, and thereby may efficiently determine an optimized setting of the neural network"), generating, by the processor (Hara [¶0104] “The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”), and a second data size estimated from the first prediction performance curve by using the plurality of second prediction performance curves and the plurality of weights (Hara [¶0050] "In the embodiment, the setting may include one or more hyper parameters relating to a local response normalization (or LRN) such as local size" [¶0041] "The selecting section 190 may select one setting based on performances of neural networks of which training is not terminated. For example, the selecting section 190 may select a setting that gives a neural network the best evaluation value among the first neural networks, and, the second neural networks of which training is not terminated by the terminating section 170" [¶0042] "As explained above, the apparatus 100 may improve prediction accuracy of the predictive model, and thereby may efficiently determine an optimized setting of the neural network by terminating at least part of the training of the neural networks by predicting the performance from the tentative weight data." Local size is an explicitly determined parameter of model setting using both weights and model settings of first and second models.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to generate, by a processor, a prediction performance parameter relative to the model weights and indicative of the model prediction error. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Hara that the method “may improve prediction accuracy of the predictive model, and thereby may efficiently determine an optimized setting of the neural network by terminating at least part of the training of the neural networks by predicting the performance from the tentative weight data” (Hara [¶0042]).

Claims 4 and 5 mirror the limitations of claim 1 and are therefore rejected under the same premise.
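
For illustration only, the sequence of steps recited in claim 1 may be sketched in Python as follows.  The sketch is not taken from the application or from the cited references.  The curve form y = c - a / sqrt(n), the uniform sampling range, the Gaussian-likelihood weighting, and every function and parameter name (fit_curve, predict, estimate_variance, width, noise) are assumptions chosen only to make the steps concrete.

```python
import math
import random


def fit_curve(points):
    """Least-squares fit of the assumed curve y = c - a / sqrt(n).

    points is a list of (data_size, prediction_performance) pairs.
    Returns the parameter value (c, a) that defines the curve.
    """
    xs = [1.0 / math.sqrt(n) for n, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope


def predict(params, n):
    """Prediction performance at data size n on the curve given by params."""
    c, a = params
    return c - a / math.sqrt(n)


def estimate_variance(measured, sizes, target_size,
                      width=0.02, k=50, noise=0.01, seed=0):
    """Hypothetical walk through the claimed steps; returns (mean, variance, weights)."""
    rng = random.Random(seed)

    # First parameter value: defines the first prediction performance curve.
    first = fit_curve(measured)

    # Sample within +/- width of the first curve at each data size, k times,
    # and fit one second curve to each resulting sample point sequence.
    second = []
    for _ in range(k):
        sequence = [(n, predict(first, n) + rng.uniform(-width, width))
                    for n in sizes]
        second.append(fit_curve(sequence))

    # Weights: assumed here to be the normalized Gaussian likelihood of the
    # measured data under each second curve.
    log_w = [sum(-((y - predict(p, n)) ** 2) / (2.0 * noise ** 2)
                 for n, y in measured)
             for p in second]
    top = max(log_w)
    raw = [math.exp(v - top) for v in log_w]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Variance information: weighted spread of the second curves' predictions
    # at the second data size.
    preds = [predict(p, target_size) for p in second]
    mean = sum(w * p for w, p in zip(weights, preds))
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, preds))
    return mean, variance, weights


if __name__ == "__main__":
    measured = [(100, 0.70), (200, 0.76), (400, 0.80), (800, 0.83)]
    sizes = [100, 200, 400, 800, 1600, 3200]
    mean, variance, _ = estimate_variance(measured, sizes, target_size=6400)
    print(mean, variance)
```

In this sketch the measured data and data sizes are invented inputs.  A different curve family or weighting rule would change the numbers but not the order of the steps.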

Regarding claim 2, the combination of Klein and Hara teaches that, when a prediction performance for a larger data size is sampled, a smaller width is set to the predetermined range ([Klein p. 13 Section B] "We note that, as training size increases, the loss of many configurations decreases, but the relative ordering does not change dramatically, such that training on few data points provides information about the full data set." See Figure 6).

Regarding claim 3, the combination of Klein and Hara teaches wherein the determining of a plurality of weights includes calculating a plurality of first occurrence probabilities corresponding to the plurality of second parameter values by using the plurality of second parameter values and the measured data (Hara [¶0015] "the function is operable to estimate the evaluation value from differences between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations. According to the tenth aspect, the apparatus may generate an accurate predictive model based on the tentative weight data." [¶0045] "The training data may include at least one set of input data" [¶0065] "the generating section may normalize the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iteration" [¶0066] "The generating section may adopt calculation of …"), converting the plurality of first occurrence probabilities into a plurality of second occurrence probabilities corresponding to the plurality of sample point sequences by using the plurality of sample point sequences and the plurality of second parameter values ([Hara ¶0015] “the function is operable to estimate the evaluation value from differences between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations… the apparatus may generate an accurate predictive model based on the tentative weight data.” Converting and generating interpreted as synonymous), and determining the plurality of weights from the plurality of second occurrence probabilities ([Hara ¶0065] “the generating section may normalize the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iteration”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Domhan (Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves, 2015).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached on (571)270-1006.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 4193

/Ramon A. Mercado/Primary Examiner, Art Unit 2132