Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 12, 14-15, 22-25, 32 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Brown (US 2018/0357543).

12, 23. (New) A system, comprising:
one or more processors, coupled to memory, configured to:
receive output of a neural network model trained with input including a hyperparameter (Brown: “The hyperlearner module 325 can optionally be contained in a different AI-engine module than the instructor module 324 such as the architect module 326 or the learner module 328, or the hyperlearner module 325 can be an AI-engine module itself. The hyperlearner module 325 can be configured to select one or more hyperparameters for each AI neural network configuration, a learning algorithm, and the like”, 0073;
“The AI engine can be configured to continuously train the trained AI-engine neural network in providing the enabling AI for proposing the neural networks and picking the appropriate learning algorithms thereby getting better at building AI models”, 0196), the hyperparameter configured to control at least one of speed, efficiency or accuracy associated with execution of the neural network model (“The learning speed and accuracy of an artificial neural network greatly depends on not only the structure of the artificial neural network and the kind of a learning optimization algorithm, but the hyperparameters”, 0099;
“Adam is a technique that increases optimization accuracy by adjusting the step size and the step direction by combining the momentum and the RMSProp. The Nadam is a technique that increases optimization accuracy by adjusting the step size and the step direction by combining the NAG and the RMSProp”, 0098);
generate a model corresponding to one or more values of the hyperparameter in response to the output of the neural network model (“The hyperlearner module 325 can optionally be contained in a different AI-engine module than the instructor module 324 such as the architect module 326 or the learner module 328, or the hyperlearner module 325 can be an AI-engine module itself. The hyperlearner module 325 can be configured to select one or more hyperparameters for each AI neural network configuration, a learning algorithm, and the like”, 0073); and
provide the one or more values of the hyperparameter to a user device to cause the user device to generate an object based on the hyperparameter for presentation via a graphical user interface of the user device (e.g., Figs. 5 and 6a “the AI engine further includes a graphing module. The graphing module is configured to display a training graph for each AI model of the one or more AI models using the training data from the training data buffer, the testing data from the testing data buffer, or the training data together with the testing data. The training accuracy is expressed in the training graph as a function of training episodes”, 0008, 0078; “each training graph of training graphs 600A and 600B provides training accuracy in the training graph as a function of training episodes for training one or more concepts of a mental model. In each training graph of the training graphs 600A and 600B”, 0128).

14, 24. (New) The method of claim 12, further comprising: generating, by the one or more processors, the values of the hyperparameter based on the neural network model and based on a plurality of phases (phases read on periods of time; a neural network has a high/fast learning rate when it is struggling to converge, and tuning the hyperparameters is used to optimize the convergence over time; phases read on start, middle and end of training or episodes or epochs) associated with the hyperparameter and the model (e.g., Figs. 5 and 6a “the AI engine further includes a graphing module. The graphing module is configured to display a training graph for each AI model of the one or more AI models using the training data from the training data buffer, the testing data from the testing data buffer, or the training data together with the testing data. The training accuracy is expressed in the training graph as a function of training episodes”, 0008, 0078; “each training graph of training graphs 600A and 600B provides training accuracy in the training graph as a function of training episodes for training one or more concepts of a mental model. In each training graph of the training graphs 600A and 600B”, 0128).

15, 25. (New) The method of claim 14, wherein the phases comprise a general training phase and at least one of a warm-up phase occurring before the general training phase and a warm-down phase occurring after the general training phase (phases read on periods of time; a neural network has a high/fast learning rate when it is struggling to converge, and tuning the hyperparameters is used to optimize the convergence over time; phases read on start, middle and end of training or episodes or epochs), comprising: providing, by the one or more processors, a graph for display via the graphical user interface indicating the general training phase and at least one of the warm-phase (e.g., Figs. 5 and 6a “the AI engine further includes a graphing module. The graphing module is configured to display a training graph for each AI model of the one or more AI models using the training data from the training data buffer, the testing data from the testing data buffer, or the training data together with the testing data. The training accuracy is expressed in the training graph as a function of training episodes”, 0008, 0078; “each training graph of training graphs 600A and 600B provides training accuracy in the training graph as a function of training episodes for training one or more concepts of a mental model. In each training graph of the training graphs 600A and 600B”, 0128).
22, 32. (New) The method of claim 21, wherein the hyperparameter is based on a second hyperparameter including an adjustable momentum (“Adam is a technique that increases optimization accuracy by adjusting the step size and the step direction by combining the momentum and the RMSProp. The Nadam is a technique that increases optimization accuracy by adjusting the step size and the step direction by combining the NAG and the RMSProp”, 0098).

 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 16-17, 26-27 are rejected under 35 U.S.C. 103 as being unpatentable over Brown in view of Triplet (US 2021/0174246).

16, 26. (New) The method of claim 15, wherein the hyperparameter increases during the warm-up phase, decreases at a first rate during the general training phase, and decreases at a second rate during the warm-down phase (e.g., Figs. 5 and 6a “the AI engine further includes a graphing module. The graphing module is configured to display a training graph for each AI model of the one or more AI models using the training data from the training data buffer, the testing data from the testing data buffer, or the training data together with the testing data. The training accuracy is expressed in the training graph as a function of training episodes”, 0008, 0078).
17, 27. (New) The method of claim 15, wherein the hyperparameter decreases during the warm-up phase, increases during the general training phase, and is constant during the warm-down phase.
Brown fails to particularly call for the details of increasing and decreasing hyperparameters.
Triplet teaches increasing and decreasing hyperparameters (“enhancing, improving, augmenting, or tuning hyperparameters of Machine Learning (ML) techniques for creating a ML model”, abstract;
“tune hyperparameters, which are then used for creating a ML model. One way to measure how well the system is at learning how to tune the hyperparameters, according to the embodiments of the present disclosure, is to measure how accurate the system is at arriving a different metrics. In addition to accuracy, other rewards may be provided for meeting other criteria. For example, the system can learn how fast it can perform the entire training process (i.e., training time) or can learn how fast the ML model 
“Furthermore, the hyperparameter selection may impact convergence, sample efficiency, and the overall accuracy of the model. For instance, given different hyperparameters, the same technique may quickly converge to an accurate model during training, may slowly converge to an inaccurate model (which would thereby require more training data before the model can be used effectively), may be unable to converge at all (e.g., if the learning rate is too high), etc.”, 0031).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and tuning hyperparameters can optimize the learning of neural networks.
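For illustration only, the three-phase behavior recited in claims 16 and 26 (increase during warm-up, decrease at a first rate during general training, decrease at a second rate during warm-down) can be sketched as below. This sketch is not taken from Brown, Triplet, or the application; the function name, the phase lengths, the linear shape of each phase, and all numeric values are hypothetical assumptions chosen only to make the recited behavior concrete.

```python
def three_phase_schedule(step, warmup_steps=10, general_steps=80,
                         warmdown_steps=10, peak=0.1, general_end=0.01,
                         final=0.0):
    """Return a hypothetical hyperparameter value (e.g., a learning rate)
    for a given training step. All defaults are illustrative assumptions."""
    if step < warmup_steps:
        # Warm-up phase: the value increases linearly up to `peak`.
        return peak * (step + 1) / warmup_steps
    step -= warmup_steps
    if step < general_steps:
        # General training phase: the value decreases at a first rate,
        # (peak - general_end) / general_steps per step.
        return peak - (peak - general_end) * (step + 1) / general_steps
    step -= general_steps
    # Warm-down phase: the value decreases at a second rate,
    # (general_end - final) / warmdown_steps per step.
    frac = min(step + 1, warmdown_steps) / warmdown_steps
    return general_end - (general_end - final) * frac


# One value per training step across all three assumed phases.
values = [three_phase_schedule(s) for s in range(100)]
```

Plotting `values` against the step index would yield a graph of the kind discussed for claims 15 and 25, with the phase boundaries at the assumed steps 10 and 90.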

Claim Rejections - 35 USC § 103
Claims 18-21, 28-31 are rejected under 35 U.S.C. 103 as being unpatentable over Brown in view of Lee (US 2021/0190360).
 
18, 28. (New) The method of claim 12, wherein the hyperparameter is based on a size of training data, and the neural network model is trained with input including the training data.
  
19, 29. (New) The method of claim 18, wherein the training data comprises a mini-batch having a size less than the size of the training data.
	Brown fails to particularly call for batch sizes.
	Lee teaches sizes of data (“For example, the hyperparameter may include an initial weight between nodes, an initial bias between nodes, a mini-batch size, the number of learning repetitions, a learning rate, and the like. The model parameter may include inter-node weights, inter-node deflections, and the like.”, 0088).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and tuning neural networks to converge takes on many forms.  Batch sizes can affect the learning rate.
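For illustration only, the relationship recited in claims 19 and 29 (a mini-batch having a size less than the size of the training data) can be sketched as below. This sketch is not taken from Brown, Lee, or the application; the function name `mini_batches`, the shuffling step, and the fixed seed are hypothetical assumptions.

```python
import random


def mini_batches(data, batch_size, seed=0):
    """Yield shuffled mini-batches, each smaller than the full training data.
    Hypothetical helper; the name and the seed are assumptions."""
    if not 0 < batch_size < len(data):
        raise ValueError("batch_size must be positive and less than len(data)")
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # visit the training examples in a random order
    for start in range(0, len(order), batch_size):
        # The last mini-batch may be shorter than batch_size.
        yield [data[i] for i in order[start:start + batch_size]]


training_data = list(range(10))
batches = list(mini_batches(training_data, batch_size=4))
```

Here the batch size is the hyperparameter, and each element of `batches` holds fewer items than `training_data`.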
  
20, 30. (New) The method of claim 12, wherein the hyperparameter includes a dropout rate associated with the model and corresponding to a predetermined percentage of neurons associated with the neural network model (dropout rates read on loss functions; Lee: “a loss function, a cost function, a learning algorithm, an optimization algorithm, or the like, and has its contents specified in such a way that a hyperparameter is previously set before learning, a model parameter is set through learning”, 0085; “The loss function can be used for an index (reference) for determining optimum model parameters in a training process of an artificial neural network. In an artificial neural network, training means a process of adjusting model parameters to reduce the loss function and the object of training can be considered as determining model parameters that minimize the loss function”, 0089).

21, 31. (New) The method of claim 12, wherein the hyperparameter includes a learning rate associated with the model, comprising: providing, by the one or more processors, a graph indicating the learning rate for display via the graphical user interface (Brown: Figs. 5 and 6A, optimize the learning rate;
Lee: “For example, the hyperparameter may include an initial weight between nodes, an initial bias between nodes, a mini-batch size, the number of learning repetitions, a learning rate, and the like. The model parameter may include inter-node weights, inter-node deflections, and the like”, 0088; “In this learning rate.”, 0095; “A gradient descent method may obtain a slope by differentiating the loss function to model parameters, and perform updating by changing the model parameters in the direction of the obtained slope at the learning rate.”, 0096).
It would have been obvious to plot learning rates as well as training time and accuracy to see how hyperparameters are adjusted or need to be adjusted.
 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Huang (US 2019/0236487) teaches batch sizes (“The number of items in a batch of training data (such as the number of images for image recognition) is dictated by the batch size (BS) hyperparameter”, 62, 64, 65).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080.  The examiner can normally be reached on ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DAVID R VINCENT/Primary Examiner, Art Unit 2123