Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 4/6/21 have been fully considered but they are not persuasive.
Applicant argues (Application No. 15/716,417, Amendment dated April 6, 2021, Reply to Office Action of November 2, 2020) that Talathi does not disclose "modifying the training parameter to a second value based at least in part on the first output [of an iteration of a training process using a first value]" and further "calculating a second output of a subsequent iteration" during the training process as claimed.
In response, the independent claims specify the following:
1. (Currently Amended) A computer-implemented method, comprising: initializing, by a computer system, a training parameter to a first value, wherein the training parameter controls at least part of a training process for generating a machine-learning model;
calculating, by the computer system, a first output of an iteration of the training process based at least in part on applying a first value of the training parameter and training data to a machine-learning algorithm; 
modifying, by the computer system, the training parameter to a second value based at least in part on the first output; and calculating, by the computer system, a second output of a subsequent iteration of the training process, wherein the second output is calculated based at least in part on the second value of the training parameter.

5. (Currently Amended) A system, comprising: one or more processors; and memory that stores computer-executable instructions that, as a result of being executed, cause the one or more processors to: initiate a training of a machine-learning model with one or more parameters for the training having at least a first value, the training to determine a set of parameters for the machine-learning model; 
calculate output of the training; and 
change the one or more parameters of the training to have at least a second value during the training based at least in part on the output.

13. (Original) A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: 
select a first value for one or more parameters for a training of a machine-learning model, the training to determine a set of parameters for the model; 
calculate an output of the training; and during the training, change the one or more parameters to have a second value determined based at least in part on the output.
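
For illustration only, the limitation common to the independent claims recited above (a training parameter set to a first value, an output calculated using it, and the parameter changed to a second value during the training based on that output) can be sketched as follows. This sketch is not taken from the application or from Talathi. The function names, the halving rule, and the toy data are assumptions chosen only to make the sequence of steps concrete.

```python
def mean_squared_error(w, data):
    """Output of one training iteration: mean squared error of y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train_with_adaptive_parameter(data, first_value=0.5, iterations=20):
    """Gradient descent on y = w * x with a learning rate changed mid-training.

    Hypothetical illustration: the halving rule is an assumption, not a
    method described in the application or in the cited art.
    """
    rate = first_value                      # training parameter at its first value
    w = 0.0                                 # model parameter being learned
    previous_output = mean_squared_error(w, data)
    history = []                            # (rate used, output) per iteration
    for _ in range(iterations):
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w = w - rate * gradient             # the iteration applies the current rate
        output = mean_squared_error(w, data)
        history.append((rate, output))
        if output > previous_output:        # modify the parameter based on the output
            rate = rate / 2                 # second (and any later) value
        previous_output = output
    return w, history


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy training data for y = 2x
w, history = train_with_adaptive_parameter(data)
```

In this sketch the subsequent iteration is calculated with whatever value the parameter holds after the preceding output has been evaluated, which is the ordering the claim language describes.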

The applicant discloses “For example, neural networks are sometimes used in connection with machine-learning. In many cases, training parameters also referred to as hyperparameters are parameters that control various aspects of the training process which are manually set by an operator of the machine-learning algorithm” (specification, 0001) and

Talathi discloses modifying, by the computer system, the training parameter to a second value based at least in part on the first output, as specified by claim 1 only (training parameters are modified based on a plurality of factors such as learning and receiving inputs from previous outputs or layers, see e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on. Deep neural networks may be trained to recognize a hierarchy of features and so they have increasingly been used in object recognition applications”, 0007; Fig. 6; “Number of training epochs”, 0056; “network architectures includes one or more local logistic regression layers and is trained to generate a corresponding validation error that is stored in the database”, 0013; “The neural network may be trained using a gradient descent type learning process or 

In re pg. 8, Applicant respectfully submits that claims 5 and 13 are allowable at least for reasons including some of those discussed above in connection with claim 1. For example, claim 5 recites, in part, "changing the one or more parameters of the training to have at least a second value during the training based at least in part on the output" and claim 13 recites, in part, "during the training, change the one or more parameters to have a second value determined based at least in part on the output."
In response, Talathi discloses changing the one or more parameters (not further defined, reads on any parameter that is modified in the learning and training process, Figs. 6 or 8) of the training to have at least a second value during the training based at least in part on the output (feedback loop, Fig. 6; creating models using a plurality of layers, inputs and outputs, and iterations/epochs, “the architecture and/or learning parameters of the DCN may be modified to facilitate training. For example, the DCN may be modified such that each layer of the DCN is forced to learn the mapping x → y locally, by introducing a logistic regression cost function at the output of each convolution block”, 0030; “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on”, 0007).
	
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-8, 10-15, 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Talathi (US 2016/0224903).

1. A computer-implemented method, comprising: 
initializing a training parameter to a first value, wherein the training parameter controls at least part of a training process for generating a machine-learning model (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6); 
calculating a first output of an iteration of the training process based at least in part on applying a first value of the training parameter and training data to a machine-learning algorithm (Fig. 6; “Number of training epochs”, 0056); 
modifying the training parameter to a second value based at least in part on the first output (“Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on. Deep neural networks may be trained to recognize a hierarchy of features and so they have increasingly been used in object recognition applications”, 0007; Fig. 6; “Number of training epochs”, 0056); and 
calculating a second output of a subsequent iteration of the training process, wherein the second output is calculated based at least in part on the second value of the training parameter (Fig. 6; “Number of training epochs”, 0056; “Before training, the output produced by the DCN is likely to be incorrect, and so an error may be calculated between the actual output and the target output. The weights of the DCN may then be adjusted so that the output scores of the DCN are more closely aligned with the target”, 0077).

2. The computer-implemented method of claim 1, further comprising determining the second output based at least in part on applying a Bayesian optimization algorithm to the first output (“the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).

3. The computer-implemented method of claim 1, wherein the machine-learning model comprises a neural network (neural networks, 0030-0031; “the higher layer neurons in a given region may receive inputs that are tuned through training”, 0075).

4. The computer-implemented method of claim 1, further comprising: allocating a set of computing resources based on detecting that the training parameter is modified to the second value; and wherein the calculating of the second output is performed using the set of computing resources (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).

5. A system, comprising one or more machine-readable mediums having stored thereon a set of instructions, which if performed by one or more processors, cause the system to at least: initiate a training of a machine-learning model with one or more parameters for the training having at least a first value, the training to determine a set of parameters for the model (Fig. 6; creating models using a plurality of iterations/epochs, “the architecture and/or learning parameters of the DCN may be modified to facilitate training. For example, the DCN may be modified such that each layer of the DCN is forced to learn the mapping locally, by introducing a logistic regression cost function at the output of each convolution block”, 0030; “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031); calculate output of the training; and change the one or more parameters of the training to have at least a second value during the training based at least in part on the output (feedback loop, Fig. 6; creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031).

6. The system of claim 5, wherein the instructions to change the one or more parameters of the training to have at least the second value, which if performed by the one or more processors, cause the system to compute the second value based at least in part on a result of a sequential model-based optimization algorithm, the result determined based at least in part on the output of the training (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061-0063, 0060).

7. The system of claim 6, wherein the sequential model-based optimization algorithm is a Bayesian optimization algorithm (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061-0063).

8. The system of claim 5, wherein: the first value of the one or more parameters corresponds to an amount of computing resources to utilize to calculate outputs of the training (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061); the second value of the one or more parameters indicating a different amount of computing resources to utilize to calculate outputs of the training; and the instructions, which if performed by the one or more processors, further cause the system to allocate computing resources for the training of the machine-learning model, wherein, the computing resources are allocated in response to detecting a change in the one or more parameter from the first value to the second value (feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).

10. The system of claim 5, wherein the parameter is an optimization hyperparameter that controls at least part of the training of the machine-learning model (computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).

11. The system of claim 10, wherein the optimization hyperparameter is a learning rate hyperparameter (“Cooling learning rate is scaled by a factor alpha <1. As the learning rate is cooled, the fluctuations in the weight update values are smaller and the likelihood of the solution converging to a given local minima is higher”, 0058; feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).

12. The system of claim 5, wherein: the instructions, which if performed by the one or more processors, further cause the system to store a plurality of outputs of the training generated at least in part by using the one or more parameters (e.g., 612, Fig. 6 and respective disclosure); and the instructions to change the one or more parameters of the training to have the second value, which, if performed by the one or more processors, further causes the system to change the one or more parameters based at least in part on the plurality of outputs (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6; “multi-layered architectures may be trained one layer at a time and may be fine-tuned using back propagation”, 0007; “During training, a DCN may be presented with an image, such as a cropped image of a speed limit sign 326, and a "forward pass" may then be computed to produce an output 322. The output 322 may be a vector of values corresponding to features such as "sign," "60," and "100." The network designer may want the DCN to output a high score for some of the neurons in the output feature vector,”, 0077).

13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: select a first value for one or more parameters for a training of a machine-learning model, the training to determine a set of parameters for the model; calculate an output of the training; and during the training, change the one or more parameters to have a second value determined based at least in part on the output (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6).

14. The non-transitory computer-readable storage medium of claim 13, wherein the one or more parameters comprises an optimization hyperparameter (“architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network, such as a deep convolutional network (DCN). Hyper-parameters include architectural parameters, which describe the neural network, as well as learning parameters used for training the neural network via a training process such as back propagation.”, 0031; “FIG. 6 is a block diagram illustrating a method 600 of selecting hyper-parameters for training a deep convolutional network in accordance with aspects of the present disclosure. In block 602, the process generates a database of neuron models (e.g., DCN). For each neuron model included in the database, a hyper-parameter and error (e.g., validation error) may be specified”, 0097).

15. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to: select a plurality of values for a parameter of the one or more parameter (“FIG. 6 is a block diagram illustrating a method 600 of selecting hyper-parameters for training a deep convolutional network in accordance with aspects of the present disclosure. In block 602, the process generates a database of neuron models (e.g., DCN). For each neuron model included in the database, a hyper-parameter and error (e.g., validation error) may be specified”, 0097); for the plurality of values, calculate and store a respective output of the training; and determine the second value based at least in part on the respective outputs (feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).

18. The non-transitory computer-readable storage medium of claim 13, wherein the parameter comprises information “usable to determine” (as opposed to actually determining) an amount of computing resources to utilize to calculate the output (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).

19. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the system to: calculate a second output of the training, wherein second output is calculated using at least the second value of the one or more parameters; and during the training, change the second value of the one or more parameters to a third value based at least in part on the second output (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6).

20. The non-transitory computer-readable storage medium of claim 13, wherein the machine-learning model comprises a linear regression model or a Bayesian network (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).
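
For illustration only, the "cooling" passage of Talathi quoted against claim 11 (0058), in which the learning rate is scaled by a factor alpha < 1, can be sketched as a simple schedule. The quoted passage does not state when the scaling is applied, so applying it once per step is an assumption of this sketch, and the function name is hypothetical.

```python
def cooled_learning_rates(initial_rate, alpha, steps):
    """Return the learning rate used at each step when it is scaled by alpha < 1.

    Assumption: the scaling is applied once after every step; the cited
    passage only says the rate is scaled by a factor alpha < 1.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    rates = []
    rate = initial_rate
    for _ in range(steps):
        rates.append(rate)   # value of the learning-rate hyperparameter at this step
        rate = rate * alpha  # cool the rate for the next step
    return rates


schedule = cooled_learning_rates(initial_rate=1.0, alpha=0.5, steps=4)
```

With these example inputs the rate takes the values 1.0, 0.5, 0.25, and 0.125, so each later weight update is scaled by a smaller step size.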

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Talathi (US 2016/0224903) in view of Golovin (US 2020/0167691).

16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions that cause the computer system to select the plurality of values further include instructions that cause the computer system to pseudo-randomly select the plurality of values (“the next potential hyper-parameter may be chosen by selecting one or more of an architecture hyper-parameter, a learning hyper-parameter and a soft probability hyper-parameter from a random distribution and evaluating the next potential hyper-parameter based on a ratio of a distribution of the good set of architectures and a distribution of the bad set of architectures”, 0125-0127; reads on optimizing and feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting ψ, as stated in Eq. 1”, 0061).
However, Talathi fails to particularly call for pseudo-random selection.
	Golovin teaches it is well known to use various types of random searches (“The simplest algorithms include random search and grid search, which select points uniformly at random or from a regular grid, respectively”, 0269; “Several classes of algorithms are included under the umbrella of black-box optimization techniques. The simplest of these are non-adaptive procedures such as Random Search, which selects x_t uniformly at random from X at each time step t independent of the previous points selected, x_τ : 1 ≤ τ < t”, 0101).

17. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the system to apply a grid search algorithm to the output to generate the second value.
However, Talathi fails to particularly call for grid searches.
	Golovin teaches grid searching (“Several classes of algorithms are included under the umbrella of black-box optimization techniques. The simplest of these are non-adaptive procedures such as Random Search, which selects x_t uniformly at random from X at each time step t independent of the previous points selected, x_τ : 1 ≤ τ < t, and Grid Search, which grid (e.g., the Cartesian product of finite sets of feasible values for each parameter)”, 0101; “The simplest algorithms include random search and grid search, which select points uniformly at random or from a regular grid, respectively”, 0269).
	It would have been obvious to combine the references at the time of filing because they are in the same field of endeavor and using various methods to select training parameters allows for faster or more efficient optimizing.
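
For illustration only, the two non-adaptive procedures named in the quoted Golovin passages (0101, 0269) can be sketched as follows: grid search enumerates the Cartesian product of finite sets of feasible values, and random search draws points uniformly and independently of earlier draws. The function names, the seeded generator, and the example parameter names are assumptions of this sketch and do not appear in either reference.

```python
import itertools
import random


def grid_search(space):
    """Return every point of the Cartesian product of the feasible value sets."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[name] for name in names))]


def random_search(space, n_points, seed=0):
    """Return n_points drawn uniformly, each independent of the points before it."""
    rng = random.Random(seed)   # seeded generator, so the draws are pseudo-random
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_points)]


# Hypothetical search space: two hyperparameters with finite feasible sets.
space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
grid_points = grid_search(space)
random_points = random_search(space, n_points=4)
```

Both procedures only propose candidate values. Neither uses the output of earlier trials, which is what the quoted passage means by "non-adaptive".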

Claim Rejections - 35 USC § 103
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Talathi in view of Liu (US 2018/0159727).

9. The system of claim 8, wherein the computing resources comprise virtual machine instances.
However, Talathi fails to particularly call for virtual machines.
Liu teaches virtual machines (“Each cloud 105 may have its own command line tools and semantics (CLI 115A, 115B, and 115N for cloud platforms 105A, 105B, and 105N, respectively) for performing operations such as, for example, creating and deleting VMs, creating and deleting virtual networks, capturing images from VMs, and listing available instance types”, 0048; virtual machine in the at least one cloud platform, and installing the representative workload in the virtual machine.”, 0087).
It would have been obvious to combine the references at time of filing because they are in the same field of endeavor and using virtual machines can allow for load balancing or workloads to be handled differently.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080.  The examiner can normally be reached on ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If 






/DAVID R VINCENT/Primary Examiner, Art Unit 2123