Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/12/21 has been entered.
 
Response to Arguments
Applicant's arguments filed 10/8/21 have been fully considered but they are not persuasive.
Applicant argues the newly added limitations.  All arguments have been addressed in the body of the rejection.
	

Claim Rejections - 35 USC § 103
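The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
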
Claims 1-8, 10-15, 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Talathi (US 2016/0224903) in view of Harvey (US 2017/0176190).
Talathi discloses:
1. A computer-implemented method, comprising: 
initializing a training parameter to a first value (see Harvey), wherein the training parameter controls at least part of a training process for generating a machine-learning model (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6); 
calculating a first output of an iteration of the training process based at least in part on applying a first value of the training parameter and training data to a machine-learning algorithm (Fig. 6; “Number of training epochs”, 0056); 
modifying the training parameter to a second value based at least in part on the first output (“Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on. Deep neural networks may be trained to recognize a hierarchy of features and so they have increasingly been used in object recognition applications”, 0007; Fig. 6; “Number of training epochs”, 0056); and 
calculating a second output of a subsequent iteration of the training process, wherein the second output is calculated based at least in part on the second value of the training parameter (Fig. 6; “Number of training epochs”, 0056; “Before training, the output produced by the DCN is likely to be incorrect, and so an error may be calculated between the actual output and the target output. The weights of the DCN may then be adjusted so that the output scores of the DCN are more closely aligned with the target”, 0077).
Talathi fails to use the term “initialize” when referring to starting the training process (training epochs, 0056, 0077) and does not expressly describe modifying, by the computer system, the training hyperparameter to a second value based at least in part on the first output.
Harvey teaches modifying, by the computer system, the training hyperparameter to a second value based at least in part on the first output (“The sizing of the layers and the setting of various factors in the neural net which are in addition to the factors and values (parameters) that are adjusted in training are collectively referred to as hyperparameters to distinguish them from the "parameters" which are adjusted in training the neural network. The hyperparameters are initialized 121 to appropriate values. In some systems that are taught hyperparameters are adjusted during the course of training but are distinct from trainable parameters because the adjustments are on the basis of the progress of the training rather than being direct functions of the data”, 0085; “If the convergence is judged 125 not to be adequate the training is stopped, the hyperparameters are adjusted 126, the neural network is reinitialized and the training process is repeated until satisfactory convergence is obtained.”, 0088).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and initializing a training parameter gives the designer the option of using pre-trained or optimized parameters or of starting completely from scratch with initialized parameters, since system variables change over time.
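For illustration only, the claimed sequence (initialize a training parameter, calculate an output, modify the parameter based on that output, calculate a subsequent output) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Talathi, Harvey, or the application: the objective f(w) = w**2, the function name `train`, and the values `initial_lr` and `alpha` are all hypothetical. The learning rate is scaled by a factor alpha < 1 whenever an iteration's output fails to improve, loosely in the spirit of the cooling strategy quoted from Talathi (0058).

```python
def train(initial_lr=1.5, alpha=0.5, steps=20):
    """Minimize the toy objective f(w) = w**2 by gradient descent.

    Hypothetical illustration: the learning rate is a training parameter
    that is initialized to a first value and modified to a second value
    based on the output (loss) of an iteration.
    """
    w = 4.0                    # trainable parameter of the toy model
    lr = initial_lr            # training parameter, first value
    best_loss = w * w          # loss before any update
    history = []
    for step in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        candidate = w - lr * grad      # proposed update using the current lr
        loss = candidate * candidate   # output of this iteration
        if loss >= best_loss:
            # The output did not improve: modify the training parameter.
            lr *= alpha
        else:
            # The output improved: accept the update and continue.
            w, best_loss = candidate, loss
        history.append((step, lr, best_loss))
    return w, lr, history


if __name__ == "__main__":
    final_w, final_lr, _ = train()
    print(final_w, final_lr)
```

In this sketch the first iteration overshoots, so the learning rate is halved once and every later iteration is calculated with the second value.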
2. The computer-implemented method of claim 1, further comprising determining the second output based at least in part on applying a Bayesian optimization algorithm to the first output (“the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).
3. The computer-implemented method of claim 1, wherein the machine-learning model comprises a neural network (neural networks, 0030-0031; “the higher layer neurons in a given region may receive inputs that are tuned through training”, 0075).
4. The computer-implemented method of claim 1, further comprising: allocating a set of computing resources based on detecting that the training parameter is modified to the second value; and wherein the calculating of the second output is performed using the set of computing resources (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).
5. A system, comprising one or more machine-readable mediums having stored thereon a set of instructions, which if performed by one or more processors, cause the system to at least: 
initiate a training of a machine-learning model with one or more parameters for the training having at least a first value, the training to determine a set of parameters for the model (Fig. 6; creating models using a plurality of iterations/epochs, “the architecture and/or learning parameters of the DCN may be modified to facilitate training. For example, the DCN may be modified such that each layer of the DCN is forced to learn the mapping locally, by introducing a logistic regression cost function at the output of each convolution block”, 0030; “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031); 
calculate output of the training (using neural networks, 0031, 0056, 0077); and change the one or more parameters of the training to have at least a second value during the training based at least in part on the output (feedback loop, Fig. 6; creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031).
Talathi fails to use the term “initiate” when referring to starting the training process (training epochs, 0056, 0077).
Harvey teaches modifying, by the computer system, the training hyperparameter to a second value based at least in part on the first output (“The sizing of the layers and the setting of various factors in the neural net which are in addition to the factors and values (parameters) that are adjusted in training are collectively referred to as hyperparameters to distinguish them from the "parameters" which are adjusted in training the neural network. The hyperparameters are initialized 121 to appropriate values. In some systems that are taught hyperparameters are adjusted during the course of training but are distinct from trainable parameters because the adjustments are on the basis of the progress of the training rather than being direct functions of the data”, 0085; “If the convergence is judged 125 not to be adequate the training is stopped, the hyperparameters are adjusted 126, the neural network is reinitialized and the training process is repeated until satisfactory convergence is obtained.”, 0088).

	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and initializing a training parameter gives the designer the option of using pre-trained or optimized parameters or of starting completely from scratch with initialized parameters, since system variables change over time.
6. The system of claim 5, wherein the instructions to change the one or more parameters of the training to have at least the second value, which if performed by the one or more processors, cause the system to compute the second value based at least in part on a result of a sequential model-based optimization algorithm, the result determined based at least in part on the output of the training (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061-0063, 0060).
7. The system of claim 6, wherein the sequential model-based optimization algorithm is a Bayesian optimization algorithm (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061-).
8. The system of claim 5, wherein: the first value of the one or more parameters corresponds to an amount of computing resources to utilize to calculate outputs of the training (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061); the second value of the one or more parameters indicating a different amount of computing resources to utilize to calculate outputs of the training; and the instructions, which if performed by the one or more processors, further cause the system to allocate computing resources for the training of the machine-learning model, wherein, the computing resources are allocated in response to detecting a change in the one or more parameter from the first value to the second value (feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).
10. The system of claim 5, wherein the parameter is an optimization hyperparameter that controls at least part of the training of the machine-learning model (computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).
11. The system of claim 10, wherein the optimization hyperparameter is a learning rate hyperparameter (“Cooling strategy: for the sake of clarity one exemplary cooling strategy may provide that as the training progresses, the learning rate is scaled by a factor alpha <1. As the learning rate is cooled, the fluctuations in the weight update values are smaller and the likelihood of the solution converging to a given local minima is higher”, 0058; feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).
12. The system of claim 5, wherein: 
the instructions, which if performed by the one or more processors, further cause the system to store a plurality of outputs of the training generated at least in part by using the one or more parameters (e.g., 612, Fig. 6 and respective disclosure); and the instructions to change the one or more parameters of the training to have the second value, which, if performed by the one or more processors, further causes the system to change the one or more parameters based at least in part on the plurality of outputs (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6; “multi-layered architectures may be trained one layer at a time and may be fine-tuned using back propagation”, 0007; “During training, a DCN may be presented with an image, such as output 322. The output 322 may be a vector of values corresponding to features such as "sign," "60," and "100." The network designer may want the DCN to output a high score for some of the neurons in the output feature vector”, 0077).
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: 
select a first value for one or more parameters for a training of a machine-learning model, the training to determine a set of parameters for the model;
calculate an output of the training; and 
during the training, change the one or more parameters to have a second value determined based at least in part on the output (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6).
Talathi fails to use the phrase “select a first value for one or more parameters” when referring to starting the training process (training epochs, 0056, 0077).
Harvey teaches modifying, by the computer system, the training hyperparameter to a second value based at least in part on the first output (“The sizing of the layers and the setting of various factors in the neural net which are in addition to the factors and values (parameters) that are adjusted in training are collectively referred to as hyperparameters to distinguish them from the "parameters" which are adjusted in training the neural network. The hyperparameters are initialized 121 to appropriate values. In some systems that are taught hyperparameters are adjusted during the course of training but are distinct from trainable parameters because the adjustments are on the basis of the progress of the training rather than being direct functions of the data”, 0085; “If the convergence is judged 125 not to be adequate the training is stopped, the hyperparameters are adjusted 126, the neural network is reinitialized and the training process is repeated until satisfactory convergence is obtained.”, 0088).
 
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and selecting a first value or initializing a training parameter gives the designer the option of using various initial values, pre-trained parameters, optimized parameters or starting from scratch since system variables change over time and different starting points may help the network converge sooner.
14. The non-transitory computer-readable storage medium of claim 13, wherein the one or more parameters comprises an optimization hyperparameter (“architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network, such as a deep convolutional network (DCN). Hyper-parameters include architectural parameters, which describe the neural network, as well as learning parameters used for training the neural network via a training process such as back propagation.”, 0031; “FIG. 6 is a block diagram illustrating a method 600 of selecting hyper-parameters for training a deep convolutional network in accordance with aspects of the present disclosure. In block 602, the process generates a database of neuron models (e.g., DCN). For each neuron model included in the database, a hyper-parameter and error (e.g., validation error) may be specified”, 0097).
15. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to: select a plurality of values for a parameter of the one or more parameters (“FIG. 6 is a block diagram illustrating a method 600 of selecting hyper-parameters for training a deep convolutional network in accordance with aspects of the present disclosure. In block 602, the process generates a database of neuron models (e.g., DCN). For each neuron model included in the database, a hyper-parameter and error (e.g., validation error) may be specified”, 0097); for the plurality of values, calculate and store a respective output of the training; and determine the second value based at least in part on the respective outputs (feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117).
18. The non-transitory computer-readable storage medium of claim 13, wherein the parameter comprises information “usable to determine” (as opposed to actually determining) an amount of computing resources to utilize to calculate the output (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).
19. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the system to: calculate a second output of the training, wherein second output is calculated using at least the second value of the one or more parameters; and during the training, change the second value of the one or more parameters to a third value based at least in part on the second output (creating models using a plurality of iterations/epochs, “Hyper-parameters are selected for training a deep convolutional network by selecting a number of network architectures as part of a database. Each of the network architectures includes one or more local logistic regression layer and is trained to generate a corresponding validation error that is stored in the database”, abstract; “architecture hyper-parameters and/or learning hyper-parameters may be selected to facilitate training a neural network”, 0031; Fig. 6).
20. The non-transitory computer-readable storage medium of claim 13, wherein the machine-learning model comprises a linear regression model or a Bayesian network (“hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).

Claim Rejections - 35 USC § 103
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Talathi (US 2016/0224903) in view of Harvey (US 2017/0176190), as set forth above, and further in view of Farahmand (US 2018/0348013).

16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions that cause the computer system to select the plurality of values further include instructions that cause the computer system to pseudo-randomly select the plurality of values (“the next potential hyper-parameter may be random distribution and evaluating the next potential hyper-parameter based on a ratio of a distribution of the good set of architectures and a distribution of the bad set of architectures”, 0125-0127; reads on optimizing and feedback loop, Fig. 6; “hyper-parameter selection may be further optimized or enhanced based on validation error, cost (e.g., computation complexity, memory footprint), a combination thereof and other neural network and system design considerations”, 0117; computing resources, “the Bayesian approach strategy as outlined above is that optimizing the expected value of the utility function is much cheaper and computationally faster than solving the original problem of selecting .psi., as stated in Eq. 1”, 0061).
Farahmand teaches pseudo-randomly selecting the plurality of values (“In order to optimize GPM hyperparameters with a standard gradient descent algorithm such as fminunc in Matlab optimization toolbox (from Mathworks), the present embodiment employs a callback function with the following pseudo-code”, 0165, 0210, 0223, 0228).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and randomly selecting a training parameter gives the .

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Talathi (US 2016/0224903) and Harvey (US 2017/0176190), as set forth above, in view of Bergstra (Random Search for Hyper-Parameter Optimization).

17. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the system to apply a grid search algorithm to the output to generate the second value.

	Bergstra teaches grid searching (“Grid search and manual search are the most widely used strategies for hyper-parameter optimization”, abstract).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and using various methods to select training parameters allows for faster or more efficient optimizing/converging. Bergstra further explains that “random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space” (abstract).
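For illustration only, the two search strategies named above can be sketched side by side. This is a minimal sketch under stated assumptions and is not code from Bergstra or from the application: `validation_error` is a hypothetical stand-in for the output of a training run, and the hyperparameter names `lr` and `reg`, their ranges, and the seed are all assumed. Grid search evaluates every combination of fixed candidate values, while random search draws pseudo-random values from the same domain.

```python
import itertools
import random


def validation_error(lr, reg):
    """Hypothetical stand-in for the error produced by one training run."""
    return (lr - 0.3) ** 2 + 0.01 * (reg - 0.1) ** 2


def grid_search(lr_values, reg_values):
    """Evaluate every (lr, reg) combination and return the best pair."""
    candidates = itertools.product(lr_values, reg_values)
    return min(candidates, key=lambda pair: validation_error(*pair))


def random_search(n_trials, seed=0):
    """Draw n_trials pseudo-random (lr, reg) pairs from [0, 1] x [0, 1]
    and return the best pair. A fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    trials = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
              for _ in range(n_trials)]
    return min(trials, key=lambda pair: validation_error(*pair))


if __name__ == "__main__":
    grid_best = grid_search([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
    random_best = random_search(9)  # same budget of nine evaluations
    print("grid:", grid_best, validation_error(*grid_best))
    print("random:", random_best, validation_error(*random_best))
```

Both functions spend the same budget of nine evaluations here; which one finds the lower error depends on the objective and on the draw, so the sketch makes no claim about the outcome.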

Claim Rejections - 35 USC § 103
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Talathi and Harvey in view of Liu (US 2018/0159727).

9. The system of claim 8, wherein the computing resources comprise virtual machine instances.
However, Talathi fails to particularly call for virtual machines.
Liu teaches virtual machine instances (“... virtual networks, capturing images from VMs, and listing available instance types”, 0048; “In at least one embodiment, installing the representative workload to the at least one cloud platform may include creating a virtual machine in the at least one cloud platform, and installing the representative workload in the virtual machine.”, 0087).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and using virtual machines can allow for load balancing or workloads to be handled differently.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Dai (US 10,528,866) teaches initializing training parameters (“the system initializes, for the training of the document classification neural network, the values of the parameters of the one or more LSTM layers to be the pre-trained values of the parameters and then trains the document classification neural network using a conventional supervised .

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080.  The examiner can normally be reached on ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 5712703428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, 






/DAVID R VINCENT/Primary Examiner, Art Unit 2123