DETAILED ACTION
This action is written in response to the application filed 8/20/19. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. 35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines. (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019.)
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 1 recites a method which is a process.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes—each of the limitations identified below, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components:
determining, by one or more processors, a network architecture of a learning model, the learning model being configured for performing a computing task based on machine learning;
This is akin to a human observation / judgment.
obtaining, by the one or more processors, a metric value record associated with a group of hyper-parameters during hyper-parameter determination for the learning model;
This is akin to a human judgment.
obtaining, by the one or more processors, an estimation of a metric value based on the network architecture, the metric value record and an association relationship representing an association between network architectures and metric values for the network architecture; and
This is akin to a human judgment.
selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting a predefined criterion.
This is akin to a human judgment.
Therefore, the claim recites a mental process.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No—the judicial exception is not integrated into a practical application. Although the claim recites that the recited functionality is performed “by one or more processors”, the processor is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No—the only limitation on the performance of the described method is that it must be performed “by one or more processors”. The claim thus recites computing components only at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. The statement that the method is performed by a computer does not satisfy the test of “inventive concept.” See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 11 and 20, which recite a system and a computer program product, respectively, as well as to dependent claims 2-10 and 12-19. The additional limitations of the dependent claims are addressed briefly below:

Dependent claims 2 and 12 additionally recite:
extracting, by the one or more processors, a connection relationship among a plurality of nodes comprised in the learning model; and
This is akin to a human judgment.
determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of nodes.
This is akin to a human judgment.

Dependent claims 3 and 13 additionally recite:
determining, by the one or more processors, a plurality of layers formed by the plurality of nodes; and
This is akin to a human observation.
determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of layers.
This is akin to a human judgment.

Dependent claims 4 and 14 additionally recite:
obtaining, by the one or more processors, a further metric value record associated with a further group of hyper-parameters during the hyper-parameter determination for the learning model;
This is insignificant extra-solution activity (gathering data / results to be analyzed).
obtaining, by the one or more processors, a further estimation of a metric value based on the network architecture, the further metric value record and the association relationship; and
This is insignificant extra-solution activity (gathering data / results to be analyzed).
wherein the selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting the predefined criterion comprises:
This is akin to a human judgment.
selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value being closer to a convergence during the hyper-parameter determination than the further estimation.
This is akin to a human judgment.

Dependent claims 5 and 15 additionally recite:
wherein the estimation of the metric value comprises an extreme value among a plurality of metric values associated with a plurality of group of hyper-parameters during the hyper-parameter determination.
This is akin to a human judgment.

Dependent claims 6 and 16 additionally recite:
with respect to a sample learning model in a plurality of sample learning models, determining, by the one or more processors, a sample network architecture of the sample learning model, the plurality of sample learning models being configured for performing a plurality of sample tasks based on the machine learning, respectively;
This is akin to a human judgment.
obtaining, by the one or more processors, a plurality of metric value records during a plurality of experiments for the hyper-parameter determination; and
This is akin to a human judgment.
training, by the one or more processors, the association relationship based on the sample network architecture and the plurality of metric value records, such that the trained association relationship represents an association between the sample network architecture and the plurality of metric value records.
This is akin to a human judgment.

Dependent claims 7 and 17 additionally recite:
obtaining, by the one or more processors, one of the plurality of the metric value records based on metric values associated with a progress of the hyper-parameter determination.
This is insignificant extra-solution activity (gathering data / results to be analyzed).

Dependent claims 8 and 18 additionally recite:
determining, by the one or more processors, a convergence during the hyper-parameter determination; and
This is akin to a human judgment.
obtaining, by the one or more processors, a metric value record based on the determined convergence.
This is insignificant extra-solution activity (gathering data / results to be analyzed).

Dependent claims 9 and 19 additionally recite:
obtaining, by the one or more processors, a group of sample data for training the learning model; and
This is insignificant extra-solution activity (gathering data / results to be analyzed).
training, by the one or more processors, the learning model based on the group of sample data and the selected group of hyper-parameters.
This is akin to a human judgment. The Examiner notes that no particular learning model is specified, and there are some machine learning models which can be practically implemented mentally (or perhaps with the aid of pencil and paper).

Dependent claim 10 recites:
obtaining, by the one or more processors, an object that is to be processed by the computing task; and
This is insignificant extra-solution activity (gathering data / results to be analyzed).
processing, by the one or more processors, the object based on the trained learning model.
This is akin to a human judgment.

Taken alone, the additional elements of the dependent claims above do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.
The Examiner also notes that “computer readable storage medium”—as used in claim 20—is being interpreted in view of the written description at [0092] as not encompassing a transitory signal.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The following reference is relied upon in the rejections below:
Yao (primary reference). (Yao Q, Wang M, Chen Y, Dai W, Li YF, Tu WW, Yang Q, Yu Y. Taking human out of learning applications: A survey on automated machine learning. arXiv preprint arXiv:1810.13306. 2019 Jan 17. 26 pages.)
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yao.
Regarding claims 1, 11 and 20, Yao discloses a computer-implemented method (and a related system and computer readable storage medium), comprising:
determining, by one or more processors, a network architecture of a learning model, the learning model being configured for performing a computing task based on machine learning;
P. 3, sec. 1.3: “We use the term configuration to denote all factors but the model parameters x (which are usually obtained from model training) that influence the performance of a learning tool. Examples of configurations are, the hypothesis class of a model, the features utilized by the model, hyper-parameters that control the training procedure, and the architecture of a neural network.” (Emphasis added.)
obtaining, by the one or more processors, a metric value record associated with a group of hyper-parameters during hyper-parameter determination for the learning model;
P. 6, sec. 2.2.2: “Evaluator: The duty of the evaluator is to measure the performance of the learning tools with configurations provided by the optimizer. After that, it generates feedbacks to the optimizer. Usually, to measure the performance of learning tools with given configuration, the evaluator needs to train a model based on the input data, which can be time-consuming. However, the evaluator can also directly estimate the performance based on external knowledge, which mimics humans’ experience. Such estimation is very fast but may be inaccurate. Thus, for the evaluator, it needs to be efficient but also accurate in measuring the performance of configurations.” (Emphasis added.)
obtaining, by the one or more processors, an estimation of a metric value based on the network architecture, the metric value record and an association relationship representing an association between network architectures and metric values for the network architecture; and
P. 6, sec. 2.2.2: “Evaluator: The duty of the evaluator is to measure the performance of the learning tools with configurations provided by the optimizer. After that, it generates feedbacks to the optimizer. Usually, to measure the performance of learning tools with given configuration, the evaluator needs to train a model based on the input data, which can be time-consuming. However, the evaluator can also directly estimate the performance based on external knowledge, which mimics humans’ experience. Such estimation is very fast but may be inaccurate. Thus, for the evaluator, it needs to be efficient but also accurate in measuring the performance of configurations.” (Emphasis added.)
See also p. 9, sec. 4.1: Simple Search Approaches and p. 12, sec. 5, discussing the trade-off between fast evaluation (estimation) and accurate evaluation.
Also P. 13: “Surrogate evaluator: For configurations that can be readily quantized, one straightforward method to cut down the evaluation cost is to build a model that predicts the performance of given configurations, with experience of past evaluations [10], [43], [59], [129], [132], [133]. These models, serving as surrogate evaluators, spare the computationally expensive model training, and significantly accelerate AutoML. Surrogate evaluators can predict not only the performance of learning tools, but also the training time and model parameters. However, their application scope is limited to hyper-parameter optimization since other kinds of configurations are often hard to quantize, which hinders surrogate model training. In Section 6.1, we will introduce meta-learning techniques that are promising to address this problem. Finally, it should be noted that, while surrogate models are also used in sampled-based optimization techniques (Section 4.2.2), they do not act as surrogate evaluators, but are used to generate potentially promising configurations.” (Emphasis added.)
The Examiner interprets “association relationship” as encompassing the “surrogate evaluator” described above.
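For illustration only, the surrogate-evaluator idea quoted above (a model that predicts the performance of a configuration from past evaluations) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Yao or from the application: the nearest-neighbour predictor, the function name predict_metric, the two-element configuration encoding, and the history values are all hypothetical.

```python
import math


def predict_metric(past_evaluations, config, k=2):
    """Estimate a metric for `config` from the k most similar past evaluations.

    past_evaluations: list of (config_vector, observed_metric) pairs.
    This nearest-neighbour average is one assumed form of surrogate;
    other predictive models could serve the same role.
    """
    if not past_evaluations:
        raise ValueError("need at least one past evaluation")
    # Rank past configurations by Euclidean distance to the queried one.
    ranked = sorted(past_evaluations, key=lambda rec: math.dist(rec[0], config))
    nearest = ranked[:k]
    # Average the observed metrics of the nearest configurations.
    return sum(metric for _, metric in nearest) / len(nearest)


# Hypothetical history: (number of layers, log10 learning rate) -> validation loss.
history = [
    ((2, -1.0), 0.90),
    ((4, -2.0), 0.40),
    ((4, -3.0), 0.30),
    ((8, -3.0), 0.50),
]

# Estimate the loss of an untried configuration without training a model.
estimate = predict_metric(history, (4, -2.5), k=2)
```

The estimate is obtained without training the queried configuration, which is the cost saving the quoted passage attributes to surrogate evaluators; the accuracy of such an estimate depends on how well the past evaluations cover the search space.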
selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting a predefined criterion.
P. 5, sec. 2.2.2: “Optimizer: Then, for the optimizer, its duty is to update or generate configurations for learning tools. The search space process, and new configurations are expected to have better performance than previous ones. However, feedbacks offered by the evaluator are not necessarily required or exploited by the optimizer. This depends on which type of the optimizer we are utilizing. Finally, the optimizer should be chosen based on the learning process and corresponding search space, as the latter determines the applicability of different optimization methods. We also wish the structure of the search space can be simple and compact so that more generic and efficient optimization methods can be employed.” (Emphasis added.) See also p. 9, sec. 3.4.1 Network Architecture Search, including figs. 10 and 11, illustrating architecture parameters which may be optimized.
Although Yao discloses every limitation of claim 1, as noted above, it does so in the context of a survey paper discussing distinct but related techniques pertaining to automated machine learning. At the time of filing, it would have been obvious to a person of ordinary skill to use the above-noted techniques together to yield a coherent system for tuning a neural network to achieve optimal performance with respect to processing speed and accuracy. These techniques (i.e. model selection and evaluation) are inherently designed to be used together in an iterative process as illustrated in figs. 6 and 7 (from p. 6, reproduced below).

[Yao, p. 6, figs. 6 and 7 (image not reproduced here).]
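For illustration only, the iterative interplay between an optimizer that proposes configurations and an evaluator that scores them, as described in the cited passages, can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Yao: the toy loss in evaluate, the random-search proposal in propose, and the hyper-parameter names learning_rate and num_layers are all hypothetical.

```python
import random


def evaluate(config):
    """Toy evaluator: an assumed stand-in for training and validating a model.

    Returns a loss that is lowest at learning_rate=0.1 and num_layers=4.
    """
    return (config["learning_rate"] - 0.1) ** 2 + 0.01 * (config["num_layers"] - 4) ** 2


def propose(rng):
    """Toy optimizer step: draw a configuration at random (simple search)."""
    return {
        "learning_rate": rng.uniform(0.001, 1.0),
        "num_layers": rng.randint(1, 8),
    }


def search(num_trials, seed=0):
    """Run the propose/evaluate loop and return the best configuration seen."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = propose(rng)   # optimizer generates a configuration
        loss = evaluate(config)  # evaluator returns feedback
        if loss < best_loss:     # keep track of the best configuration so far
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = search(num_trials=50)
```

In this sketch the optimizer ignores the evaluator's feedback when proposing the next configuration, which corresponds to the simple search approaches the reference describes; a sample-based optimizer would instead condition each proposal on earlier results.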

Regarding claims 11 and 20, the Examiner notes that generic computer hardware, including a processor, a computer-readable memory, and computer-readable storage media, is inherent throughout the Yao disclosure.

Regarding claims 2 and 12, Yao discloses their further limitation wherein the determining, by the one or more processors, the network architecture of the learning model comprises:
extracting, by the one or more processors, a connection relationship among a plurality of nodes comprised in the learning model; and
P. 9, fig. 11 listing “[s]ome common design choices for one convolutional layer in a CNN” including number of filters, filter height, filter width, stride height, stride width, and skip connections. See also fig. 10, reproduced below.
[Yao, p. 9, fig. 10 (image not reproduced here).]

determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of nodes.
P. 12, sec. 4.4: “Multi-step decision-making problems are also commonly encountered in AutoML. For example, in NAS problem, the architecture for each layer needs to be decided, and greedy search is applied in [24] for multi-attribute learning problems; greedy search is also employed in [10], [116] to search block structures within a cell, which is later used to construct a full CNN.”

Regarding claims 3 and 13, Yao discloses their further limitation wherein the determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of nodes comprises:
determining, by the one or more processors, a plurality of layers formed by the plurality of nodes; and
P. 9, fig. 10, reproduced above, illustrating a CNN with a plurality of layers, each of which comprises a plurality of nodes.
determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of layers.
Id.

Regarding claims 4 and 14, Yao discloses their further limitations comprising:
obtaining, by the one or more processors, a further metric value record associated with a further group of hyper-parameters during the hyper-parameter determination for the learning model;
The Examiner notes that the optimization approach described at sec. 4.2 (pp. 9-10) is an iterative approach which “generates new configurations based on previously evaluated samples.” P. 10: each of “grid search”, “random search”, “heuristic search”, and “evolutionary algorithms” described on this page defines a new model—defined by its hyperparameters—to evaluate at each step.
obtaining, by the one or more processors, a further estimation of a metric value based on the network architecture, the further metric value record and the association relationship; and
See excerpts from this section reproduced above in rejection of claim 1 regarding evaluation.
wherein the selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting the predefined criterion comprises:
selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value being closer to a convergence during the hyper-parameter determination than the further estimation.
PP. 9-10, sec. 4.2: “Optimization from samples [84] is a kind of smarter search approach compared with simple ones in Section 4.1. It iteratively generates new configurations based on previously evaluated samples. Thus, it is also generally more efficient than simple search methods. Besides, it does not make specific assumptions about the objective.” See also fig. 13, reproduced below, illustrating an iterative optimization process.
[Yao, fig. 13 (image not reproduced here).]
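For illustration only, an iterative, population-based search of the kind the cited passage describes (new configurations generated from previously evaluated ones) can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of Yao's fig. 13: the one-dimensional configuration, the toy fitness function, the keep-the-better-half selection rule, and the Gaussian perturbation are all hypothetical choices.

```python
import random


def fitness(x):
    """Toy fitness: higher is better, with a peak at x = 3.0 (assumed)."""
    return -(x - 3.0) ** 2


def heuristic_search(generations=30, population_size=10, seed=0):
    """Evolve a population of candidate values and return the fittest one."""
    rng = random.Random(seed)
    # Initialization: the first population of candidate configurations.
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    for _ in range(generations):
        # Evaluate the population and keep the better half as parents.
        parents = sorted(population, key=fitness, reverse=True)[: population_size // 2]
        # Generate new candidates by perturbing the retained parents.
        children = [p + rng.gauss(0.0, 0.5) for p in parents]
        population = parents + children
    return max(population, key=fitness)


best = heuristic_search()
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one iteration to the next in this sketch; other update rules would not necessarily have that property.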

Regarding claims 5 and 15, Yao discloses their further limitation wherein the estimation of the metric value comprises an extreme value among a plurality of metric values associated with a plurality of group of hyper-parameters during the hyper-parameter determination.
P. 8, sec. 3.3.1: “For each learning tool, many algorithms can be used. Some popularly approaches to minimize smooth objective functions, like logistic regression, are summarized in Table 5. While gradient descent (GD) does not involve extra parameters, it suffers from slow convergence and expensive per iteration complexity. Two popular variants of GD are limited memory-BFGS (L-BFGS) and stochastic gradient descent (SGD). The former is more expensive but converges faster [77], while in the latter each iteration is very cheap but many iterations are need before convergence [75].” (Emphasis added.) The Examiner interprets “extreme value” in view of the written description at [0059] as encompassing a local or global minimum value, e.g. as sought by the gradient descent algorithm mentioned in the passage above. (“A metric value occurred at the convergence may represent an extreme value for the metric values. Usually, the lower the extreme value is, the more appropriate the group of hyperparameters is.”, emphasis added.)

Regarding claims 6 and 16, Yao discloses the further limitation comprising:
with respect to a sample learning model in a plurality of sample learning models, determining, by the one or more processors, a sample network architecture of the sample learning model, the plurality of sample learning models being configured for performing a plurality of sample tasks based on the machine learning, respectively;
P. 9, sec. 4.1 discussing grid search and random search approaches. 
Also p. 13: “Sub-sampling: As the training time depends heavily on the amount of training data, an intuitive method to accelerate evaluation is to train parameters with a subset of the training data. This can be done by either using a subset of samples, a subset of features or multi-fidelity evaluations [127]. In general, the less training data is used, the faster and more noisy will be the evaluation.”
obtaining, by the one or more processors, a plurality of metric value records during a plurality of experiments for the hyper-parameter determination; and
P. 9, sec. 4.1: “Simple search approaches gather the feedbacks from the evaluator merely to keep track of the good configurations. Because simple search does not exploit the knowledge gained from the past evaluations, it is usually inefficient. However, due to its simplicity, it is still popularly used in AutoML.” P. 10, sec. 4.2.1: “The framework of heuristic search is shown in Figure 13. The initialization step generates the first population (a bunch of configurations in AutoML). At each iteration, a new population is generated based on the last one, and the fitness (performances) of the individuals are evaluated. The core idea of heuristic search is how to update the population.”
training, by the one or more processors, the association relationship based on the sample network architecture and the plurality of metric value records, such that the trained association relationship represents an association between the sample network architecture and the plurality of metric value records.
P. 13: “Surrogate evaluator: For configurations that can be readily quantized, one straightforward method to cut down the evaluation cost is to build a model that predicts the performance of given configurations, with experience of past evaluations [10], [43], [59], [129], [132], [133]. These models, serving as surrogate evaluators, spare the computationally expensive model training, and significantly accelerate AutoML. Surrogate evaluators can predict not only the performance of learning tools, but also the training time and model parameters. However, their application scope is limited to hyper-parameter optimization since other kinds of configurations are often hard to quantize, which hinders surrogate model training. In Section 6.1, we will introduce meta-learning techniques that are promising to address this problem. Finally, it should be noted that, while surrogate models are also used in sampled-based optimization techniques (Section 4.2.2), they do not act as surrogate evaluators, but are used to generate potentially promising configurations.” (Emphasis added.)

Regarding claims 7 and 17, Yao discloses their further limitation wherein the obtaining, by the one or more processors, the plurality of metric value records comprises:
obtaining, by the one or more processors, one of the plurality of the metric value records based on metric values associated with a progress of the hyper-parameter determination.
P. 8, sec. 3.3.1: “For each learning tool, many algorithms can be used. Some popularly approaches to minimize smooth objective functions, like logistic regression, are summarized in Table 5. While gradient descent (GD) does not involve extra parameters, it suffers from slow convergence and expensive per iteration complexity. Two popular variants of GD are limited memory-BFGS (L-BFGS) and stochastic gradient descent (SGD). The former is more expensive but converges faster [77], while in the latter each iteration is very cheap but many iterations are need before convergence [75].” (Emphasis added.) The Examiner interprets “a progress of the hyper-parameter determination” as encompassing the gradient descent algorithm mentioned in the passage above, wherein the error function progresses towards a local minimum, indicating the set of optimal hyperparameters.

Regarding claims 8 and 18, Yao discloses their further limitation wherein the obtaining, by the one or more processors, one of the plurality of the metric value records comprises:
determining, by the one or more processors, a convergence during the hyper-parameter determination; and
P. 8, sec. 3.3.1, discussing convergence under gradient descent (and variations of gradient descent).
obtaining, by the one or more processors, a metric value record based on the determined convergence.
Id. The Examiner notes that upon convergence, the gradient descent algorithm indicates the optimal set of hyperparameters, i.e. the set of hyperparameters leading to the minimum error function.
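For illustration only, the notion of an iterative minimizer reaching convergence, with the objective value recorded at each step, can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any algorithm from Yao or from the application: the quadratic objective, the step size, and the gradient-magnitude stopping tolerance are all hypothetical choices.

```python
def gradient_descent(f, grad, x0, step=0.1, tol=1e-8, max_iters=10_000):
    """Minimize f from x0 and record the objective value at every iterate.

    Convergence is declared (by assumption) when the gradient magnitude
    falls below `tol`; other stopping criteria are possible.
    """
    x = x0
    losses = []
    for _ in range(max_iters):
        losses.append(f(x))   # record of metric values over the run
        g = grad(x)
        if abs(g) < tol:      # convergence test
            break
        x = x - step * g      # gradient step toward lower objective
    return x, losses


# Hypothetical objective with its minimum at x = 2.
def objective(x):
    return (x - 2.0) ** 2


def objective_grad(x):
    return 2.0 * (x - 2.0)


x_min, losses = gradient_descent(objective, objective_grad, x0=0.0)
```

The last entry of the recorded values is the objective at the point where the stopping test was met, which in this toy setting is the smallest value observed during the run.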

Regarding claims 9 and 19, Yao discloses their further limitation comprising:
obtaining, by the one or more processors, a group of sample data for training the learning model; and
P. 9, sec. 4.1 discussing random search, i.e. searching from among “randomly sampled configurations”. Also sec. 4.2, discussing optimization from samples. Also p. 13, first col., discussing sub-sampling.
training, by the one or more processors, the learning model based on the group of sample data and the selected group of hyper-parameters.
See generally pp. 9-10, sec. 4.2, discussing the iterative process of choosing hyperparameters, then evaluating the model that these hyperparameters define, then choosing a new model.

Regarding claim 10, Yao discloses its further limitation comprising:
obtaining, by the one or more processors, an object that is to be processed by the computing task; and
P. 4, first col.: “P is the performance on testing images.” (Emphasis added.) P. 13: “performance is measured on the validation set afterwards.” (Emphasis added.)
processing, by the one or more processors, the object based on the trained learning model.
Id.

Additional Relevant Prior Art
The following references were identified by the Examiner as being relevant to the disclosed invention, but are not relied upon in any particular prior art rejection:
Domhan discloses, inter alia, techniques for optimizing neural network hyperparameters pertaining to architecture. See especially p. 3464, table 1, which lists hyperparameters including learning rate, number of layers, input dropout and number of units (nodes). (Domhan T, Springenberg JT, Hutter F. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-fourth international joint conference on artificial intelligence 2015 Jun 27.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Vincent Gonzales/Primary Examiner, Art Unit 2124