Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Note
Providing supporting paragraph(s) with a clear explanation for each limitation of amended/new claim(s) in Remarks is strongly requested for clear and definite claim interpretations by Examiner.
Regarding 35 U.S.C. § 101 claim eligibility, claim 10 recites a combination of additional elements, and the claim as a whole integrates the recited mental processes (e.g., the “selecting …” steps and the “determining …” step) into a practical application. Specifically, the additional elements recite a specific improvement over prior art systems by training a machine learning model on the set of training data using the selected machine learning platform, the selected algorithm, and the determined one or more hyperparameters. That is, the claim trains the machine learning model based on training data in a specific manner. Thus, it seems to be an improvement to the functioning of a computer. In other words, the computer is not just being used as a tool, but the operation of the computer is improved. Thus, the claim is eligible because it is not directed to the recited judicial exception. In addition, claim 18 is eligible for a similar reason. Thus, claims 10-20 are eligible under 35 U.S.C. § 101.

Priority
Acknowledgment is made of applicant's claim for the present application filed on 12/31/2019.

Claim Objections
Claim(s) 1 is/are objected to because of the following informalities: “selecting a first machine learning platform from a set of machine learning platforms based respective first performance metrics” may need to read “selecting a first machine learning platform from a set of machine learning platforms based on respective first performance metrics”. Appropriate correction is required.
Claim(s) 18 is/are objected to because of the following informalities: the indentation of the last limitation “training …” is not correct. Appropriate correction is required.
Claim(s) 19 is/are objected to because of the following informalities: the indentation of the last limitation “comparing…” is not correct. Appropriate correction is required.
Claim(s) 20 is/are objected to because of the following informalities: the indentation of the last limitation “outputting…” is not correct. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“…; and 
…, that in response to execution by …, causes … to perform operations comprising: 
iteratively selecting … to train on a set of training data, each iteration of the iteratively selecting comprising: 
selecting … from … based respective first performance metrics corresponding to … from … that were used for training in previous iterations; 
determining … based on the selecting …; 
selecting … from … based on respective second performance metrics of … from … that were used for training in the previous iterations; 
determining a set of hyperparameters based on the selecting …; and 
selecting a first combination of hyperparameters from the set of hyperparameters based on respective third performance metrics of one or more combinations of hyperparameters from the set of hyperparameters that were used for training in the previous iterations.”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “one or more hardware processors”, “at least one memory storing computer-executable instructions”, “the one or more hardware processors”, “the system”, “a plurality of machine learning models”, “a first machine learning platform”, “a set of machine learning platforms”, “one or more machine learning platforms”, “the set of machine learning platforms”, “a set of machine learning algorithms”, “the first machine learning platform”, “a first machine learning algorithm”, “the set of machine learning algorithms”, “one or more machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning algorithm”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “one or more hardware processors”, “at least one memory storing computer-executable instructions”, “the one or more hardware processors”, “the system”, “a plurality of machine learning models”, “a first machine learning platform”, “a set of machine learning platforms”, “one or more machine learning platforms”, “the set of machine learning platforms”, “a set of machine learning algorithms”, “the first machine learning platform”, “a first machine learning algorithm”, “the set of machine learning algorithms”, “one or more machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning algorithm” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of iteratively selecting a model as follows: selecting a model platform from multiple platforms based on performance measures of the multiple platforms; determining multiple methods based on the selected platform; selecting a method from the multiple methods based on performance measures of the multiple methods; determining multiple parameters based on the selected method; and selecting a set of parameters from the multiple parameters based on the performance measures of the multiple parameters.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “one or more hardware processors”, “at least one memory”, “computer-executable instructions”, “the one or more hardware processors”, “the system”, “a plurality of machine learning models”, “a first machine learning platform”, “a set of machine learning platforms”, “one or more machine learning platforms”, “the set of machine learning platforms”, “a set of machine learning algorithms”, “the first machine learning platform”, “a first machine learning algorithm”, “the set of machine learning algorithms”, “one or more machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning algorithm” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In addition, the claim recites a further additional element – the act of storing data (“storing computer-executable instructions”). The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of storing data is recited at a high level of generality (i.e., as a generic act of storing data) such that it amounts to no more than a mere act to apply the exception using a generic act of storing. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
In addition, the claim is appending a well-understood, routine, conventional activity previously known to the industry, specified at a high level of generality, to the judicial exception - see MPEP 2106.05(d)(II) – “Storing and retrieving information in memory” is Well-Understood, Routine, and Conventional Activity (MPEP 2106.05(d)). As discussed above with respect to integration of the abstract idea into a practical application, the additional element of the act of storing data amounts to no more than a mere act to apply the exception using a generic act of storing. A mere act to apply an exception using a generic act of storing cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 2
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“… are supported by ….”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “the set of machine learning algorithms”, “the first machine learning platform”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “the set of machine learning algorithms”, “the first machine learning platform” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of a method being supported by a platform.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “the set of machine learning algorithms”, “the first machine learning platform” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 3
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“… supports … that is different than … supported by ….”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “a second machine learning platform of the set of machine learning platforms”, “a second set of machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning platform”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “a second machine learning platform of the set of machine learning platforms”, “a second set of machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning platform” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of another platform supporting other methods.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “a second machine learning platform of the set of machine learning platforms”, “a second set of machine learning algorithms”, “the set of machine learning algorithms”, “the first machine learning platform” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 4
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“the set of hyperparameters are supported by ….”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “the first machine learning algorithm”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “the first machine learning algorithm” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of parameters being supported by a method.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “the first machine learning algorithm” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 5
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“… supports a second set of hyperparameters that is different than the set of hyperparameters supported by ….”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “a second machine learning algorithm of the set of machine learning algorithms”, “the first machine learning algorithm”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “a second machine learning algorithm of the set of machine learning algorithms”, “the first machine learning algorithm” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of another method supporting other parameters.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “a second machine learning algorithm of the set of machine learning algorithms”, “the first machine learning algorithm” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.


Regarding claim 6
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“… comprise …”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “the set of machine learning algorithms”, “at least one of a neural network, a recurrent neural network, a gradient boosted tree, a logistic regression, or a random forest”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “the set of machine learning algorithms”, “at least one of a neural network, a recurrent neural network, a gradient boosted tree, a logistic regression, or a random forest” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of using one of the methods.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “the set of machine learning algorithms”, “at least one of a neural network, a recurrent neural network, a gradient boosted tree, a logistic regression, or a random forest” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 7
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“the set of hyperparameters comprises at least one of a learning rate, an activation function, a number of iterations, number of trees, a maximum depth, a dropout rate, a number of hidden layers, or a number of hidden nodes”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of using one of the parameters.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.

Regarding claim 8
The claim is rejected under 35 U.S.C. 101 because it only modifies the abstract idea by selecting a data processing package and a feature selection package, which also does not add significantly more or provide a specific application of the judicial exception.
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“selecting a data processing package from a set of data processing packages and a feature selection package from a set of feature selection packages”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of selecting processing and feature selection packages.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.

Regarding claim 9
Step 1: The claim recites a system; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: 
The limitations of 
“selecting … having a model performance metric that satisfies a metric threshold”, as drafted, are a machine that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, other than reciting “a final machine learning model”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “a final machine learning model” language, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper) of selecting a method which satisfies a performance threshold.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
The claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements – using “a final machine learning model” to process data. The device and machine learning in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

(Note: Hereinafter, if a limitation has brackets (i.e. [·]) around claim languages, the bracketed claim languages indicate that they have not been taught yet by the current prior art reference but they will be taught by another prior art reference afterwards.)

Claim(s) 1-2, 4-8, 10-11, and 16-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bergstra et al. (Hyperopt: a Python library for model selection and hyperparameter optimization) in view of Akiba et al. (Optuna: A Next-generation Hyperparameter Optimization Framework).

Regarding claim 1
Bergstra teaches 
A system, comprising: 
one or more hardware processors; and 
at least one memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: 
(Bergstra [fig(s) 1-2] [sec(s) Example usage] “Here is the simplest example of using this software”; e.g., the example code read(s) on “one or more hardware processors” and “memory” since code runs on a computer.)

iteratively selecting a plurality of machine learning models to train on a set of training data, each iteration of the iteratively selecting comprising: 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.”
    [Image: media_image1.png]
 [sec(s) Getting started with hyperopt] “The way to use Hyperopt is to describe: • the objective function to minimize, • the space over which to search” [sec(s) Introduction] “Many widely-used machine learning algorithms take a significant amount of time to train from data. At the same time, these same algorithms must be configured prior to training. Most implementations of machine learning algorithms have a set of configuration variables that the user can set which have various effects on how the training is done.”; e.g., “Choosing a model within this configuration space means choosing paths in an ancestral sampling process” along with “six possible preprocessing modules and six possible classifiers” and “full search space” read(s) on “iteratively selecting”.)
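
For illustration only, the "ancestral sampling" over a (preprocessing, classifier) configuration space quoted above might be sketched as follows. This is a minimal sketch and is not code from Bergstra; the node names are borrowed from the quoted figure description, while every hyperparameter name and range is an assumption made for the example.

```python
import random

# Hypothetical two-stage configuration space. Each node activates only its own
# hyperparameters; names and ranges are illustrative assumptions.
PREPROCESSING = {
    "PCA": {
        "n_components": lambda rng: rng.randint(2, 64),
        "whiten": lambda rng: rng.choice([True, False]),
    },
    "MinMaxScaler": {"feature_max": lambda rng: rng.uniform(0.5, 2.0)},
    "None": {},
}
CLASSIFIERS = {
    "KNN": {"n_neighbors": lambda rng: rng.randint(1, 50)},
    "RandomForest": {
        "n_estimators": lambda rng: rng.randint(10, 200),
        "max_depth": lambda rng: rng.randint(2, 20),
    },
}


def sample_configuration(rng):
    """Ancestral sampling: choose a node per stage, then draw only the
    hyperparameters that the chosen node activates."""
    config = {}
    for stage, options in (("preprocessing", PREPROCESSING),
                           ("classifier", CLASSIFIERS)):
        name = rng.choice(sorted(options))
        params = {key: draw(rng) for key, draw in options[name].items()}
        config[stage] = {"name": name, "params": params}
    return config


rng = random.Random(0)
samples = [sample_configuration(rng) for _ in range(5)]
```

In this sketch, the hyperparameters present in a sampled configuration depend on which path was chosen, which mirrors the quoted statement that the number of active hyperparameters is determined by the selected boxes.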

selecting a first machine learning platform from a set of machine learning platforms based [respective first performance metrics] corresponding to one or more machine learning platforms from the set of machine learning platforms that were used for training in previous iterations; 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Introduction] “This paper describes the usage and architecture of Hyperopt, for both sequential and parallel optimization of expensive functions.” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”; e.g., “Parallel evaluation with a cluster” and “sequential search” read(s) on “machine learning platforms”. Furthermore, e.g., iterations prior to selecting a platform read(s) on “previous iterations”. In addition, Bergstra does not appear to explicitly teach but suggests “performance metrics” based on e.g., “Hyperopt has been designed to make use of a cluster of computers for faster search”. 
Examiner notes that par(s) 33 of the Instant Specification describe(s) “The set of machine learning platforms may include various libraries, tools, toolkits, operating systems, and/or other software components by which computer programs for implementing machine learning algorithms may be executed.”.)

determining a set of machine learning algorithms based on the selecting the first machine learning platform; 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. … The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”;)

selecting a first machine learning algorithm from the set of machine learning algorithms based on respective second performance metrics of one or more machine learning algorithms from the set of machine learning algorithms that were used for training in the previous iterations;
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process.” [sec(s) Getting started with hyperopt] “This section introduces basic usage of the hyperopt.fmin function, which is Hyperopt’s basic optimization driver. We will look at how to write an objective function that fmin can optimize, and how to describe a configuration space that fmin can search.  … To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).” [sec(s) Example usage] “Following Scikit-learn’s convention, Hyperopt-Sklearn provides an Estimator class with a fit method and a predict method. The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls.” 

    [Image: media_image2.png]
; e.g., “estimating which machine learning model performs best” along with fig 1 read(s) on “first machine learning algorithm”. In addition, e.g., “hyperopt.fmin function, which is Hyperopt’s basic optimization driver” along with “Estimator class” which sets different algorithms read(s) on “second performance metrics”. Note that Komer et al. (Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn) teaches an example of HyperoptEstimator class setting different algorithms in [sec(s) Example Usage] “
    [Image: media_image3.png]
”.)

determining a set of hyperparameters based on the selecting the first machine learning algorithm; and 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters …The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. … The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “determining a set of hyperparameters”.
Examiner notes that par(s) 34 of the Instant Specification describe(s) “Hyperparameters may include, but are not limited to, a learning rate, an activation function, a number of iterations, a number of hidden nodes, a number of hidden layers, a number of trees, a dropout rate, a maximum tree depth, and/or the like”.)

selecting a first combination of hyperparameters from the set of hyperparameters based on respective third performance metrics of one or more combinations of hyperparameters from the set of hyperparameters that were used for training in the previous iterations.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Getting started with hyperopt] “To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process” and “optimize the objective function” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “determining a set of hyperparameters”. In addition, e.g., “optimize the objective function” and “use Hyperopt to optimize the hyperparamters” read(s) on “third performance metrics”.)
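
For illustration only, the three steps quoted above from Bergstra (objective function, configuration space, search call) might be sketched as follows. This is a minimal sketch using only the Python standard library; it does not use Hyperopt. The function `minimize`, the bounds format of `SPACE`, and the toy quadratic loss are assumptions, and uniform random search stands in for whatever optimizer `fmin` would apply.

```python
import random


def objective(point):
    """Step (1): map a configuration point to a real-valued loss.
    The quadratic below is a toy stand-in for a validation loss."""
    return (point["x"] - 3.0) ** 2


# Step (2): configuration space as name -> (low, high); format is hypothetical.
SPACE = {"x": (-10.0, 10.0)}


def minimize(fn, space, max_evals, seed=0):
    """Step (3): search the space and return the best trial plus the history.
    Uniform random search is used here purely for illustration."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_evals):
        point = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        trials.append({"point": point, "loss": fn(point)})
    best = min(trials, key=lambda t: t["loss"])
    return best, trials


best, trials = minimize(objective, SPACE, max_evals=200)
```

The returned `trials` list plays the role of a record of previous evaluations, so that a later choice can be compared against the losses of earlier ones.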

(Note: Hereinafter, if a limitation has one or more bold underlines, the underlined claim language is taught by the current prior art reference, while the non-underlined claim language has already been taught by one or more previous prior art references.)

	However, Bergstra does not appear to explicitly teach:
selecting a first machine learning platform from a set of machine learning platforms based [respective first performance metrics] corresponding to one or more machine learning platforms from the set of machine learning platforms that were used for training in previous iterations; 

	Akiba teaches
selecting a first machine learning platform from a set of machine learning platforms based respective first performance metrics corresponding to one or more machine learning platforms from the set of machine learning platforms that were used for training in previous iterations; 
(Akiba [fig(s) 1-2] [fig(s) 11] “Figure (b) illustrates the effect of the number of workers on the performance.” [fig(s) 12] [sec(s) 5] “We also evaluated the scalability of Optuna’s distributed optimization. Based on the same experimental setup used in Section 5.2, we recorded the transition of the best scores obtained by TPE with 1, 2, 4, and 8 workers in a distributed environment. Figure 11b shows the relationship between optimization score and execution time. We can see that the convergence speed increases with the number of workers.”;)

Bergstra teaches a system that enables selection of models and hyperparameters on sequential or parallel platforms based on an optimization. In addition, Akiba teaches a system that evaluates the scalability of distributed optimization based on the relationship between optimization score and execution time, and demonstrates that the convergence speed increases with the number of workers. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning model selection system of Bergstra with the performance metrics of different platforms of Akiba. 
One of ordinary skill in the art would have been motivated to combine in order to effectively demonstrate that the convergence speed increases with the number of workers.
(Akiba [fig(s) 1-2] [fig(s) 11] “Figure (b) illustrates the effect of the number of workers on the performance.” [fig(s) 12] [sec(s) 5] “We also evaluated the scalability of Optuna’s distributed optimization. Based on the same experimental setup used in Section 5.2, we recorded the transition of the best scores obtained by TPE with 1, 2, 4, and 8 workers in a distributed environment. Figure 11b shows the relationship between optimization score and execution time. We can see that the convergence speed increases with the number of workers.”;)
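
For illustration only, a selection "based on respective performance metrics" of candidates "used for training in previous iterations" might be sketched as follows. This is a minimal sketch and is neither Bergstra's nor Akiba's method; the platform names, the `simulated_score` stand-in, and the explore/exploit rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def select_by_history(candidates, history, rng, explore=0.1):
    """Pick the candidate with the best mean past score.
    Untried candidates go first; `explore` is the chance of a random pick."""
    untried = [c for c in candidates if not history[c]]
    if untried:
        return untried[0]
    if rng.random() < explore:
        return rng.choice(candidates)
    return max(candidates, key=lambda c: sum(history[c]) / len(history[c]))


def simulated_score(platform, rng):
    """Stand-in for a real training run; the numbers are made up."""
    base = {"sequential": 0.70, "cluster": 0.85}[platform]
    return base + rng.uniform(-0.02, 0.02)


rng = random.Random(0)
platforms = ["sequential", "cluster"]  # hypothetical platform labels
history = defaultdict(list)            # platform -> scores from past iterations
choices = []
for _ in range(30):
    platform = select_by_history(platforms, history, rng)
    history[platform].append(simulated_score(platform, rng))
    choices.append(platform)
```

The same helper could be applied at each level of a hierarchy (platform, then algorithm, then hyperparameter combination), each level keeping its own history.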

Regarding claim 2
The combination of Bergstra, Akiba teaches claim 1.

Bergstra further teaches 
the set of machine learning algorithms are supported by the first machine learning platform.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Introduction] “This paper describes the usage and architecture of Hyperopt, for both sequential and parallel optimization of expensive functions.” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”; e.g., “Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function” along with “six possible classifiers” read(s) on “the set of machine learning algorithms are supported by the first machine learning platform”.)

Regarding claim 4
The combination of Bergstra, Akiba teaches claim 1.

Bergstra further teaches 
wherein the set of hyperparameters are supported by the first machine learning algorithm.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters …The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. … The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “set of hyperparameters are supported by the first machine learning algorithm”.)

Regarding claim 5
The combination of Bergstra, Akiba teaches claim 4.

Bergstra further teaches 
a second machine learning algorithm of the set of machine learning algorithms supports a second set of hyperparameters that is different than the set of hyperparameters supported by the first machine learning algorithm.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters …The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. … The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process”, “use Hyperopt to optimize the hyperparamters”, “classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2)” along with fig 1 read(s) on “second set of hyperparameters that is different than the set of hyperparameters”.)

Regarding claim 6
The combination of Bergstra, Akiba teaches claim 1.

Bergstra further teaches 
the set of machine learning algorithms comprise at least one of a neural network, a recurrent neural network, a gradient boosted tree, a logistic regression, or a random forest.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”;)

Regarding claim 7
The combination of Bergstra, Akiba teaches claim 1.

Akiba further teaches 
the set of hyperparameters comprises at least one of a learning rate, an activation function, a number of iterations, number of trees, a maximum depth, a dropout rate, a number of hidden layers, or a number of hidden nodes.
(Akiba [fig(s) 1] “An example code of Optuna’s define-by-run style API. This code builds a space of hyperparameters for a classifier of the MNIST dataset and optimizes the number of layers and the number of hidden units at each layer.” [sec(s) 2] “Upon the invocation of ‘suggest API’, a hyperparameter is statistically sampled based on the history of previously evaluated trials. At Line 5, ‘suggest_int’ method suggests a value for ‘n_layers’, the integer hyperparameter that determines the number of layers in the Multilayer Perceptron. … The method ‘create_model’ generates ‘n_layers’ in Line 5 and uses a for loop to construct a neural network of depth equal to ‘n_layers’. The method also generates ‘n_units_i’ at each i-th loop, a hyperparameter that determines the number of the units in the i-th layer”;)

	The combination of Bergstra, Akiba is combinable with Akiba for the same rationale as set forth above with respect to claim 1.
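
For illustration only, the define-by-run construction quoted above from Akiba (an `n_layers` hyperparameter followed by one `n_units_i` hyperparameter per layer) might be sketched as follows. This is a minimal sketch that does not use Optuna; `SketchTrial` is a hypothetical stand-in for a trial object, the numeric ranges are assumptions, and the uniform draw replaces the history-based sampling described in the quoted passage.

```python
import random


class SketchTrial:
    """Hypothetical stand-in for a trial object; it is NOT Optuna's Trial."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        # Uniform draw for illustration; the quoted text describes sampling
        # informed by previously evaluated trials instead.
        value = self._rng.randint(low, high)
        self.params[name] = value
        return value


def create_model(trial):
    """Define-by-run: the search space unfolds as this code executes, so the
    number of n_units_i hyperparameters depends on the sampled n_layers."""
    n_layers = trial.suggest_int("n_layers", 1, 4)
    return [trial.suggest_int(f"n_units_{i}", 4, 128) for i in range(n_layers)]


trial = SketchTrial(random.Random(0))
layer_sizes = create_model(trial)
```

After the call, `trial.params` holds the number of layers and the number of hidden units per layer, which correspond to the "number of hidden layers" and "number of hidden nodes" recited in the limitation.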

Regarding claim 8
The combination of Bergstra, Akiba teaches claim 1.

Bergstra further teaches 
each iteration of the iteratively selecting further comprises: (see the rejections of claim 1)
selecting a data processing package from a set of data processing packages and a feature selection package from a set of feature selection packages.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. The preprocessing algorithms were (by class name, followed by n. hyperparameters + n. unused hyperparameters): PCA(2), StandardScaler(2), MinMaxScaler(1), Normalizer(1), None, and TFIDF(0+9). The first four preprocessing algorithms were for dense features. PCA performed whitening or non-whitening PCO. The StandardScaler, MinMaxScaler, and Normalizer did various feature-wise affine transforms to map numeric input features onto values near 0 and with roughly unit variance. The TF-IDF preprocessing module performed feature extraction from text data.”; e.g., “preprocessing modules” along with “dense features”, “various feature-wise affine transforms”, “feature extraction from text data” read(s) on “set of data processing packages” and “set of feature selection packages”.
Examiner notes that par(s) 31-32 of the Instant Specification describe(s) “The data processing packages may include one or more libraries, code, and/or other software components that implement different data processing techniques. … The feature selection packages may include one or more libraries, code, and/or other software components that implement different feature selection techniques.”)

Regarding claim 10
Bergstra teaches
A method, comprising: 
receiving, by a service provider system comprising one or more hardware processors, a set of training data; and 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Example usage] “# Load data ({train,test}_{data,label})”

    [Image: media_image4.png]

“# Download data and split training and test sets”

    [Image: media_image5.png]
; e.g., the example code read(s) on “one or more hardware processors” since code runs on a computer.)

selecting a first machine learning platform from a set of machine learning platforms based on a [first optimization function that metrics past machine learning platforms] used for training on the set of training data; 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Introduction] “This paper describes the usage and architecture of Hyperopt, for both sequential and parallel optimization of expensive functions.” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”; e.g., “Parallel evaluation with a cluster” and “sequential search” read(s) on “machine learning platforms” and “past machine learning platforms”. In addition, Bergstra does not appear to explicitly teach but suggests “first optimization function that metrics past machine learning platforms” based on e.g., “Hyperopt has been designed to make use of a cluster of computers for faster search”.)

selecting a first algorithm from a set of algorithms supported by the first machine learning platform based on a second optimization function that metrics past algorithms used for training on the set of training data; 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process.” [sec(s) Getting started with hyperopt] “This section introduces basic usage of the hyperopt.fmin function, which is Hyperopt’s basic optimization driver. We will look at how to write an objective function that fmin can optimize, and how to describe a configuration space that fmin can search.  … To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.” [sec(s) Example usage] “Following Scikit-learn’s convention, Hyperopt-Sklearn provides an Estimator class with a fit method and a predict method. The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls.”

[Image: media_image2.png]
; e.g., “estimating which machine learning model performs best” along with fig 1 read(s) on “first algorithm”. In addition, e.g., “Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function” along with “six possible classifiers” read(s) on “set of algorithms supported by the first machine learning platform”. Furthermore, e.g., “hyperopt.fmin function, which is Hyperopt’s basic optimization driver” along with “fit method” and “Estimator class” which sets different algorithms read(s) on “second optimization function”. Note that Komer et al. (Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn) teaches an example of HyperoptEstimator class setting different algorithms in [sec(s) Example Usage] “
[Image: media_image3.png]
”.)

determining one or more hyperparameters from a set of hyperparameters supported by the first algorithm based on a third optimization function that metrics past combinations of hyperparameters from the set of hyperparameters used for training on the set of training data; and 
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Getting started with hyperopt] “This section introduces basic usage of the hyperopt.fmin function, which is Hyperopt’s basic optimization driver. We will look at how to write an objective function that fmin can optimize, and how to describe a configuration space that fmin can search. … To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).” [sec(s) Example usage] “Following Scikit-learn’s convention, Hyperopt-Sklearn provides an Estimator class with a fit method and a predict method. The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls.”

[Image: media_image2.png]
; e.g., “choosing paths in an ancestral sampling process” and “optimize the objective function” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “determining one or more hyperparameters”. In addition, e.g., “objective function” along with “Estimator class” which sets different hyperparameters read(s) on “third optimization function”. Note that Komer et al. (Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn) teaches an example of HyperoptEstimator class setting different hyperparameters in [sec(s) Example Usage] “
[Image: media_image6.png]
”.)
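
For illustration only, the three steps quoted from Bergstra (an objective function, a configuration space, and a search call) can be sketched as follows. This is not Hyperopt's API: `SPACE`, `sample_config`, `objective`, and `random_search` are hypothetical names, plain random search stands in for `fmin`, and the loss values are synthetic rather than the result of training a real model.

```python
import random

# Hypothetical two-level space: pick a classifier first, then only the
# hyperparameters that belong to it (ancestral sampling).
SPACE = {
    "knn": {"n_neighbors": lambda rng: rng.randint(1, 30)},
    "random_forest": {
        "n_estimators": lambda rng: rng.randint(10, 200),
        "max_depth": lambda rng: rng.randint(2, 12),
    },
}


def sample_config(rng):
    """Step 2: draw one valid configuration point from the space."""
    classifier = rng.choice(sorted(SPACE))
    params = {name: draw(rng) for name, draw in SPACE[classifier].items()}
    return {"classifier": classifier, "params": params}


def objective(config):
    """Step 1: map a configuration point to a real-valued loss.

    Synthetic stand-in for training and validating a real model.
    """
    params = config["params"]
    if config["classifier"] == "knn":
        return abs(params["n_neighbors"] - 7) / 30.0
    return abs(params["max_depth"] - 6) / 12.0 + 0.05


def random_search(objective_fn, max_evals, seed=0):
    """Step 3: search the space and keep the lowest-loss configuration."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_evals):
        config = sample_config(rng)
        trials.append((objective_fn(config), config))
    best_loss, best_config = min(trials, key=lambda t: t[0])
    return best_loss, best_config, trials


best_loss, best_config, trials = random_search(objective, max_evals=50)
```

In this sketch the selected classifier and its hyperparameters come out of one search, which mirrors the quoted description of choosing a path through the configuration space.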

training a machine learning model on the set of training data using the first machine learning platform, the first algorithm, and the one or more hyperparameters.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Example usage] “Following Scikit-learn’s convention, Hyperopt-Sklearn provides an Estimator class with a fit method and a predict method. The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls. … # Load data ({train,test}_{data,label})”

[Image: media_image4.png]

“# Download data and split training and test sets”

[Image: media_image5.png]
; e.g., “At the end of search, the best configuration is retrained on the whole data set” read(s) on “training a machine learning model on the set of training data”.)
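
The evaluation protocol quoted above (train on a large fraction, score on a validation set, then retrain the best configuration on the whole data set) can be sketched with a toy model. This is a hedged illustration, not Hyperopt-Sklearn's implementation: the 1-D k-nearest-neighbor classifier, the function names, the 80/20 split, and the synthetic data are all assumptions made for the example.

```python
import random


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def fit_with_holdout(data, candidate_ks, train_fraction=0.8, seed=0):
    """Score each candidate on a validation split, then refit the best one."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, valid = shuffled[:cut], shuffled[cut:]

    # Each evaluation trains on the larger split and reports validation accuracy.
    scores = {k: accuracy(train, valid, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)

    # The winning configuration is refit on the whole data set; for k-NN,
    # "fitting" just means keeping all points.
    def final_model(x):
        return knn_predict(data, x, best_k)

    return final_model, best_k, scores


# Synthetic 1-D data: label is 1 when x > 0.5, with some labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

model, best_k, scores = fit_with_holdout(data, candidate_ks=[1, 5, 15])
```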

However, Bergstra does not appear to explicitly teach:
selecting a first machine learning platform from a set of machine learning platforms based on a [first optimization function that metrics past machine learning platforms] used for training on the set of training data

Akiba teaches
selecting a first machine learning platform from a set of machine learning platforms based on a first optimization function that metrics past machine learning platforms used for training on the set of training data; 
(Akiba [fig(s) 1-2] [fig(s) 11-12] “Figure (b) illustrates the effect of the number of workers on the performance.” [sec(s) 5] “We also evaluated the scalability of Optuna’s distributed optimization. Based on the same experimental setup used in Section 5.2, we recorded the transition of the best scores obtained by TPE with 1, 2, 4, and 8 workers in a distributed environment. Figure 11b shows the relationship between optimization score and execution time. We can see that the convergence speed increases with the number of workers.”;)

Bergstra is combinable with Akiba for the same rationale as set forth above with respect to claim 1.

Regarding claim 11
The combination of Bergstra, Akiba teaches claim 10.

Bergstra further teaches 
subsequent to the training the machine learning model, calculating a model performance metric corresponding to the machine learning model; and 
(Bergstra [fig(s) 1] [sec(s) Example usage] “Following Scikit-learn’s convention, Hyperopt-Sklearn provides an Estimator class with a fit method and a predict method. The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls.”

[Image: media_image7.png]
; e.g., “Report the accuracy of the classifier” read(s) on “calculating a model performance metric”.)

comparing the model performance metric to a performance threshold.
(Bergstra [fig(s) 1] [fig(s) 2] “For each data set, searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type. (Best viewed in color.)” [sec(s) Experiments] “Figure 2 shows that there was no penalty for searching broadly. We performed optimization runs of up to 300 function evaluations searching the entire space, and compared the quality of solution with specialized searches of specific classifier types (including best known classifiers).”

[Image: media_image7.png]
; e.g., “Report the accuracy of the classifier” read(s) on “calculating a model performance metric”. In addition, e.g., “searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type” and “best known classifiers” read(s) on “performance threshold”.)

Regarding claim 16
The combination of Bergstra, Akiba teaches claim 10.

Bergstra further teaches 
the first algorithm is included in a set of algorithms supported by the first machine learning platform, the set of algorithms comprising at least one of a neural network, a recurrent neural network, a gradient boosted tree, a logistic regression, or a random forest.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”;)

Regarding claim 17
The combination of Bergstra, Akiba teaches claim 10.

Bergstra further teaches 
the one or more hyperparameters are included in a set of hyperparameters supported by the first algorithm, 
the set of hyperparameters comprising at least one of [a learning rate, an activation function, a number of iterations, number of trees, a maximum depth, a dropout rate, a number of hidden layers, or a number of hidden nodes].
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Getting started with hyperopt] “This section introduces basic usage of the hyperopt.fmin function, which is Hyperopt’s basic optimization driver. We will look at how to write an objective function that fmin can optimize, and how to describe a configuration space that fmin can search. … To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process” and “optimize the objective function” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “one or more hyperparameters”.)

Akiba further teaches 
the one or more hyperparameters are included in a set of hyperparameters supported by the first algorithm, 
the set of hyperparameters comprising at least one of a learning rate, an activation function, a number of iterations, number of trees, a maximum depth, a dropout rate, a number of hidden layers, or a number of hidden nodes.
(Akiba [fig(s) 1] “An example code of Optuna’s define-by-run style API. This code builds a space of hyperparameters for a classifier of the MNIST dataset and optimizes the number of layers and the number of hidden units at each layer.” [sec(s) 2] “Upon the invocation of ‘suggest API’, a hyperparameter is statistically sampled based on the history of previously evaluated trials. At Line 5, ‘suggest_int’ method suggests a value for ‘n_layers’, the integer hyperparameter that determines the number of layers in the Multilayer Perceptron. … The method ‘create_model’ generates ‘n_layers’ in Line 5 and uses a for loop to construct a neural network of depth equal to ‘n_layers’. The method also generates ‘n_units_i’ at each i-th loop, a hyperparameter that determines the number of the units in the i-th layer”;)
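
The define-by-run style quoted from Akiba, in which the number of layers is suggested first and then determines how many per-layer hyperparameters exist, can be sketched as below. This is not Optuna's API or implementation: `SketchTrial` and `optimize` are hypothetical, the sampler is uniform rather than history-based, and the loss is synthetic. Only the names `suggest_int`, `create_model`, `n_layers`, and `n_units_i` are taken from the quoted passage.

```python
import random


class SketchTrial:
    """Hypothetical stand-in for a trial object; not Optuna's class."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        # Uniform sampling for simplicity; the quoted passage describes
        # sampling informed by the history of earlier trials.
        value = self._rng.randint(low, high)
        self.params[name] = value
        return value


def create_model(trial):
    """Build a layer-size list whose length is itself a hyperparameter."""
    n_layers = trial.suggest_int("n_layers", 1, 4)
    return [trial.suggest_int(f"n_units_{i}", 4, 128) for i in range(n_layers)]


def objective(trial):
    layers = create_model(trial)
    # Synthetic loss: prefer about 64 units per layer and fewer layers.
    return sum(abs(units - 64) for units in layers) / 64.0 + 0.1 * len(layers)


def optimize(objective_fn, n_trials, seed=0):
    """Run n_trials evaluations and return the lowest loss with its parameters."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = SketchTrial(rng)
        loss = objective_fn(trial)
        if best is None or loss < best[0]:
            best = (loss, trial.params)
    return best


best_loss, best_params = optimize(objective, n_trials=30)
```

The search space here is never declared up front; it is discovered as `create_model` runs, which is the property the quoted passage attributes to the define-by-run style.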

Bergstra is combinable with Akiba for the same rationale as set forth above with respect to claim 1.

Regarding claim 18
The claim is a computer-readable storage medium claim corresponding to the method claim 10, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 
Note that Bergstra teaches a computer-readable medium and one or more hardware processors (Bergstra [fig(s) 1-2] [sec(s) Example usage] “Here is the simplest example of using this software”; e.g., the example code read(s) on “one or more hardware processors” and “computer readable medium” since the code runs on a computer.)

Regarding claim 19
The claim is a computer-readable storage medium claim corresponding to the method claim 11, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim.

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bergstra et al. (Hyperopt: a Python library for model selection and hyperparameter optimization) in view of Akiba et al. (Optuna: A Next-generation Hyperparameter Optimization Framework) further in view of Nguyen et al. (Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey)

Regarding claim 3
The combination of Bergstra, Akiba teaches claim 2.

Bergstra further teaches 
a second machine learning platform of the set of machine learning platforms supports a second set of machine learning algorithms that is [different] than the set of machine learning algorithms supported by the first machine learning platform.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Introduction] “This paper describes the usage and architecture of Hyperopt, for both sequential and parallel optimization of expensive functions.” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”; e.g., “Parallel evaluation with a cluster” and “sequential search” read on “first machine learning platform” and “second machine learning platform”.)

However, the combination of Bergstra, Akiba does not appear to explicitly teach:
a second machine learning platform of the set of machine learning platforms supports a second set of machine learning algorithms that is [different] than the set of machine learning algorithms supported by the first machine learning platform.

Nguyen teaches
a second machine learning platform of the set of machine learning platforms supports a second set of machine learning algorithms that is different than the set of machine learning algorithms supported by the first machine learning platform.
(Nguyen [fig(s) 2] [table(s) 3] “ML frameworks and libraries without special hardware supports” [sec(s) 4] “

[Image: media_image8.png]

[Image: media_image9.png]

LibSVM is a specialized library for Support Vector Machines (SVM). Its development started in 2000 at National Taiwan University (Chang and Lin 2011; LibSVM 2018). The library is written in C/C++ but has also Java source code. Its learning tasks are (1) support vector classification (SVC) for binary and multi-class, (2) support vector regression (SVR), and (3) distribution estimation. … LibLinear is a library designed for solving large-scale linear classification problems. It was developed starting in 2007 at National Taiwan University (Fan et al. 2008; LibLinear 2018). The library is written in C/C++. The supported ML tasks are logistic regression and linear SVM. The supported problem formulation are: L2-regularized logistic regression, L2-loss and L1-loss linear SVMs. … XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable (Chen and Guestrin 2016; DMLC 2018; Mitchell 2017). The XGBoost is an open-source library that implements the gradient boosting decision tree algorithm.”;)

Bergstra teaches a system that enables selection of models and hyperparameters on sequential or parallel platforms based on an optimization. In addition, Akiba teaches a system that evaluates the scalability of distributed optimization based on the relationship between optimization score and execution time, and shows that convergence speed increases with the number of workers. Furthermore, Nguyen teaches that different machine learning platforms support different algorithms and provide their own specialized functions, e.g., LibLinear (a library designed for solving large-scale linear classification problems) and XGBoost (an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning model selection system of Bergstra, Akiba with the different machine learning platforms supporting different algorithms of Nguyen. 
One of ordinary skill in the art would have been motivated to combine in order to provide machine learning platforms with their own specialized functions, e.g., LibLinear (a library designed for solving large-scale linear classification problems) and XGBoost (an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable).
(Nguyen [sec(s) 4] “LibSVM is a specialized library for Support Vector Machines (SVM). Its development started in 2000 at National Taiwan University (Chang and Lin 2011; LibSVM 2018). The library is written in C/C++ but has also Java source code. Its learning tasks are (1) support vector classification (SVC) for binary and multi-class, (2) support vector regression (SVR), and (3) distribution estimation. … LibLinear is a library designed for solving large-scale linear classification problems. It was developed starting in 2007 at National Taiwan University (Fan et al. 2008; LibLinear 2018). The library is written in C/C++. The supported ML tasks are logistic regression and linear SVM. The supported problem formulation are: L2-regularized logistic regression, L2-loss and L1-loss linear SVMs. … XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable (Chen and Guestrin 2016; DMLC 2018; Mitchell 2017). The XGBoost is an open-source library that implements the gradient boosting decision tree algorithm.”;)

Claim(s) 9, 12-13, 15, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bergstra et al. (Hyperopt: a Python library for model selection and hyperparameter optimization) in view of Akiba et al. (Optuna: A Next-generation Hyperparameter Optimization Framework) further in view of Tu et al. (AutoNE: Hyperparameter Optimization for Massive Network Embedding)

Regarding claim 9
The combination of Bergstra, Akiba teaches claim 1.

Bergstra further teaches 
a final iteration of the iteratively selecting comprises: 
selecting a final machine learning model having a model performance metric that satisfies a metric [threshold].
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.”
[Image: media_image1.png]
 [sec(s) Getting started with hyperopt] “The way to use Hyperopt is to describe: • the objective function to minimize, • the space over which to search … To summarize, these are the steps to using Hyperopt: (1) implement an objective function that maps configuration points to a real-valued loss value, (2) define a configuration space of valid configuration points, and then (3) call fmin to search the space to optimize the objective function” [sec(s) Introduction] “Many widely-used machine learning algorithms take a significant amount of time to train from data. At the same time, these same algorithms must be configured prior to training. Most implementations of machine learning algorithms have a set of configuration variables that the user can set which have various effects on how the training is done.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters. … Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers. The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. 
… The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “estimating which machine learning model performs best”, “optimize the objective function” and “use Hyperopt to optimize the hyperparamters” along with fig 1 read(s) on “selecting a final machine learning model having a model performance metric that satisfies a metric”.)

However, the combination of Bergstra, Akiba does not appear to explicitly teach:
selecting a final machine learning model having a model performance metric that satisfies a metric [threshold].

Tu teaches
selecting a final machine learning model having a model performance metric that satisfies a metric threshold.
(Tu [fig(s) 2-5] “The number of trials required by each method to reach a certain performance threshold. The NE algorithm being tuned is DeepWalk. The vertical dash line marks the conjectured performance when the number of trials is unlimited.” [sec(s) 4] “we can see from Figure 3 that our framework takes much fewer trials to find a good hyperparameter configuration, which demonstrates that our framework is more capable of handling large-scale networks on a limited time budget.”;)

Bergstra teaches a system that enables selection of models and hyperparameters on sequential or parallel platforms based on an optimization. In addition, Akiba teaches a system that evaluates the scalability of distributed optimization based on the relationship between optimization score and execution time, and shows that convergence speed increases with the number of workers. Furthermore, Tu teaches that different machine learning algorithms reach a certain performance threshold at different numbers of trials, and demonstrates which framework takes far fewer trials to find a good hyperparameter configuration.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning model selection system of Bergstra, Akiba with the performance threshold of Tu.
One of ordinary skill in the art would have been motivated to combine in order to effectively demonstrate which framework takes far fewer trials to find a good hyperparameter configuration.
(Tu [fig(s) 2-5] “The number of trials required by each method to reach a certain performance threshold. The NE algorithm being tuned is DeepWalk. The vertical dash line marks the conjectured performance when the number of trials is unlimited.” [sec(s) 4] “we can see from Figure 3 that our framework takes much fewer trials to find a good hyperparameter configuration, which demonstrates that our framework is more capable of handling large-scale networks on a limited time budget.”;)

Regarding claim 12
The combination of Bergstra, Akiba teaches claim 11.

Bergstra further teaches 
determining that the model performance metric [satisfies] the performance threshold; and 
(Bergstra [fig(s) 1] [fig(s) 2] “For each data set, searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type. (Best viewed in color.)” [sec(s) Experiments] “Figure 2 shows that there was no penalty for searching broadly. We performed optimization runs of up to 300 function evaluations searching the entire space, and compared the quality of solution with specialized searches of specific classifier types (including best known classifiers).”

[Image: media_image7.png]
; e.g., “Report the accuracy of the classifier” read(s) on “model performance metric”. In addition, Bergstra does not appear to explicitly teach but suggests “satisfies” based on e.g., “searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type” and “best known classifiers”.)

outputting the machine learning model as a final machine learning model.
(Bergstra [fig(s) 1] [fig(s) 2] “For each data set, searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type. (Best viewed in color.)” [sec(s) Example usage] “The fit method of this class performs hyperparameter optimization, and after it has completed, the predict method applies the best model to test data. Each evaluation during optimization performs training on a large fraction of the training set, estimates test set accuracy on a validation set and returns that validation set score to the optimizer. At the end of search, the best configuration is retrained on the whole data set to produce the classifier that handles subsequent predict calls.” [sec(s) Experiments] “Figure 2 shows that there was no penalty for searching broadly. We performed optimization runs of up to 300 function evaluations searching the entire space, and compared the quality of solution with specialized searches of specific classifier types (including best known classifiers). … Return instances of the classifier and preprocessing steps model = estim.best_model()”

[Image: media_image7.png]
; e.g., “Return instances of the classifier and preprocessing steps model = estim.best_model()” read(s) on “outputting the machine learning model as a final machine learning model”.)
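
As a reading aid, the control flow recited in claims 11 and 12 (calculate a metric, compare it to a threshold, output the model when the threshold is satisfied) might look like the sketch below. Nothing here is taken from Bergstra, Akiba, or Tu: every name, the `>=` comparison, the hard-coded scores, and the choice to move on to the next candidate when the comparison fails are assumptions made for the example.

```python
def select_final_model(candidates, train_and_score, threshold):
    """Return the first candidate whose metric meets the threshold.

    candidates: iterable of (platform, algorithm, hyperparameters) tuples.
    train_and_score: callable returning (model, metric) for a candidate.
    """
    history = []
    for candidate in candidates:
        model, metric = train_and_score(candidate)
        history.append((candidate, metric))
        if metric >= threshold:  # metric satisfies the threshold
            return model, metric, history
    return None, None, history  # no candidate satisfied the threshold


# Synthetic scores standing in for real training runs.
FAKE_SCORES = {
    ("platform_a", "logistic_regression"): 0.81,
    ("platform_b", "random_forest"): 0.93,
}


def fake_train_and_score(candidate):
    """Pretend to train: look up a fixed score and return a dict as the model."""
    platform, algorithm, hyperparameters = candidate
    metric = FAKE_SCORES[(platform, algorithm)]
    model = {"platform": platform, "algorithm": algorithm, **hyperparameters}
    return model, metric


candidates = [
    ("platform_a", "logistic_regression", {"learning_rate": 0.1}),
    ("platform_b", "random_forest", {"number_of_trees": 100}),
]
final_model, final_metric, history = select_final_model(
    candidates, fake_train_and_score, threshold=0.9
)
```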

However, the combination of Bergstra, Akiba does not appear to explicitly teach:
determining that the model performance metric [satisfies] the performance threshold; 

Tu teaches
determining that the model performance metric satisfies the performance threshold; 
(Tu [fig(s) 2-5] “The number of trials required by each method to reach a certain performance threshold. The NE algorithm being tuned is DeepWalk. The vertical dash line marks the conjectured performance when the number of trials is unlimited.” [sec(s) 4] “we can see from Figure 3 that our framework takes much fewer trials to find a good hyperparameter configuration, which demonstrates that our framework is more capable of handling large-scale networks on a limited time budget.”;)

The combination of Bergstra, Akiba is combinable with Tu for the same rationale as set forth above with respect to claim 9.

Regarding claim 13
The combination of Bergstra, Akiba teaches claim 11.

Bergstra further teaches 
determining that the model performance metric [fails to satisfy] the performance threshold; and 
(Bergstra [fig(s) 1] [fig(s) 2] “For each data set, searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type. (Best viewed in color.)” [sec(s) Experiments] “Figure 2 shows that there was no penalty for searching broadly. We performed optimization runs of up to 300 function evaluations searching the entire space, and compared the quality of solution with specialized searches of specific classifier types (including best known classifiers).”

[Image: media_image7.png]
; e.g., “Report the accuracy of the classifier” read(s) on “model performance metric”. In addition, e.g., “searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type” and “best known classifiers” read(s) on “performance threshold”. Furthermore, Bergstra does not appear to explicitly teach but suggests “fails to satisfy” based on e.g., “searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type” and “best known classifiers”.)

selecting a second machine learning platform, a second algorithm supported by the second machine learning platform, and one or more hyperparameters supported by the second algorithm to train a second machine learning model on the set of training data.
(Bergstra [fig(s) 1] [fig(s) 2] “For each data set, searching the full configuration space (‘any classifier’) delivered performance approximately on par with a search that was restricted to the best classifier type. (Best viewed in color.)” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.” [sec(s) Experiments] “Figure 2 shows that there was no penalty for searching broadly. We performed optimization runs of up to 300 function evaluations searching the entire space, and compared the quality of solution with specialized searches of specific classifier types (including best known classifiers).”

[Image: media_image7.png]
;)

However, the combination of Bergstra, Akiba does not appear to explicitly teach:
determining that the model performance metric [fails to satisfy] the performance threshold; and 

Tu teaches
determining that the model performance metric fails to satisfy the performance threshold; and 
(Tu [fig(s) 2-5] “The number of trials required by each method to reach a certain performance threshold. The NE algorithm being tuned is DeepWalk. The vertical dash line marks the conjectured performance when the number of trials is unlimited.” [sec(s) 4] “we can see from Figure 3 that our framework takes much fewer trials to find a good hyperparameter configuration, which demonstrates that our framework is more capable of handling large-scale networks on a limited time budget.”;)

The combination of Bergstra, Akiba is combinable with Tu for the same rationale as set forth above with respect to claim 9.
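
For orientation only, the two threshold limitations addressed above (the metric satisfies the threshold, or fails to satisfy it and a second platform, algorithm, and set of hyperparameters is selected) can be pictured as a simple branch. The sketch below is a hypothetical illustration and is not taken from Bergstra, Akiba, or Tu; the function name, the candidate tuples, and the `evaluate` callback are assumptions.

```python
def select_with_threshold(candidates, evaluate, threshold):
    """Try candidates in order and stop at the first whose metric meets the threshold.

    candidates: list of (platform, algorithm, hyperparameters) tuples (hypothetical).
    evaluate:   callable that trains a model for one candidate and returns its metric.
    Returns (chosen_candidate, metric, history); if no candidate meets the
    threshold, the best candidate seen is returned instead.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    history = []
    for candidate in candidates:
        metric = evaluate(candidate)
        history.append((candidate, metric))
        if metric >= threshold:
            # Metric satisfies the threshold: output this model as final.
            return candidate, metric, history
        # Metric fails the threshold: fall through to the next candidate.

    best_candidate, best_metric = max(history, key=lambda item: item[1])
    return best_candidate, best_metric, history
```

Falling back to the best candidate seen is a design choice made for this sketch; a system could equally report failure when no candidate meets the threshold.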

Regarding claim 15
The combination of Bergstra, Akiba, Tu teaches claim 13.

Bergstra further teaches 
wherein the second algorithm is different than the first algorithm, and wherein the second algorithm supports a different set of hyperparameters than the first algorithm.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Scikit-learn model selection as a search problem] “Model selection is the process of estimating which machine learning model performs best from among a possibly infinite set of possibilities. … In this paper we discuss solving it with the Hyperopt optimization library. The basic approach is to set up a search space with random variable hyperparameters, use Scikit-learn to implement the objective function that performs model training and model validation, and use Hyperopt to optimize the hyperparamters …The configuration space we provide includes six preprocessing algorithms and seven classification algorithms. The full search space is illustrated in figure 1. … The classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2).”; e.g., “choosing paths in an ancestral sampling process”, “use Hyperopt to optimize the hyperparamters”, “classification algorithms were (by class name (used + unused hyperparameters)): SVC(23), KNN(4+5), RandomForest(8), ExtraTrees(8), SGD(8 +4), and MultinomialNB(2)” along with fig 1 read(s) on “different set of hyperparameters”.)
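
As a minimal sketch of the idea relied on above (choosing an algorithm first activates only that algorithm's own hyperparameters), the following uses only the Python standard library. It is not the Hyperopt-Sklearn search space: the algorithm keys, hyperparameter names, and value ranges are illustrative assumptions.

```python
import random

# Hypothetical conditional search space: each algorithm owns a different
# set of hyperparameters, each drawn by its own sampling function.
SEARCH_SPACE = {
    "svc": {
        "C": lambda rng: 10 ** rng.uniform(-3, 3),
        "kernel": lambda rng: rng.choice(["linear", "rbf"]),
    },
    "knn": {
        "n_neighbors": lambda rng: rng.randint(1, 50),
    },
    "random_forest": {
        "n_estimators": lambda rng: rng.randint(10, 500),
        "max_depth": lambda rng: rng.randint(1, 20),
    },
}


def sample_configuration(rng):
    """Pick an algorithm, then sample only the hyperparameters it supports."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    hyperparameters = {
        name: draw(rng) for name, draw in SEARCH_SPACE[algorithm].items()
    }
    return algorithm, hyperparameters
```

Calling `sample_configuration(random.Random(0))` returns one algorithm name together with a dictionary whose keys are exactly that algorithm's hyperparameters.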

Regarding claim 20
The claim is a computer-readable storage medium claim corresponding to the method claim 12, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim.

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bergstra et al. (Hyperopt: a Python library for model selection and hyperparameter optimization) in view of Akiba et al. (Optuna: A Next-generation Hyperparameter Optimization Framework) further in view of Tu et al. (AutoNE: Hyperparameter Optimization for Massive Network Embedding) further in view of Nguyen et al. (Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey).

Regarding claim 14
The combination of Bergstra, Akiba, Tu teaches claim 13.

Bergstra further teaches 
the second machine learning platform supports a [different] set of algorithms than the first machine learning platform.
(Bergstra [fig(s) 1] “Hyeropt-Sklearns full search space (‘any classifier’) consists of a (preprocessing, classsifier) pair. There are six possible preprocessing modules and six possible classifiers. Choosing a model within this configuration space means choosing paths in an ancestral sampling process. The highlighted green edges and nodes represent a (PCA, K-Nearest Neighbor) model. The number of active hyperparameters in a model is the sum of parenthetical numbers in the selected boxes. For the PCA + KNN combination, seven hyperparameters are activated.” [sec(s) Introduction] “This paper describes the usage and architecture of Hyperopt, for both sequential and parallel optimization of expensive functions.” [sec(s) Getting started with hyperopt] “Later, the section ‘Trial results: more than just the loss’ will explain how to use the trials database to analyze the results of a search and the section Parallel Evaluation with a Cluster will explain how to use parallel computation to search faster.” [sec(s) Parallel evaluation with a cluster] “Hyperopt has been designed to make use of a cluster of computers for faster search. … Parallel search can be done with the same objective functions as the ones used for sequential search, but users wishing to take advantage of asynchronous evaluation in the parallel case can do so by using a lower-level calling convention for their objective function.”; e.g., “Parallel evaluation with a cluster” and “sequential search” read on “first machine learning platform” and “second machine learning platform”.)

However, the combination of Bergstra, Akiba, Tu does not appear to explicitly teach:
the second machine learning platform supports a [different] set of algorithms than the first machine learning platform.

Nguyen teaches
the second machine learning platform supports a different set of algorithms than the first machine learning platform.
(Nguyen [fig(s) 2] [table(s) 3] “ML frameworks and libraries without special hardware supports” [sec(s) 4] “

[Image: media_image8.png]

[Image: media_image9.png]

LibSVM is a specialized library for Support Vector Machines (SVM). Its development started in 2000 at National Taiwan University (Chang and Lin 2011; LibSVM 2018). The library is written in C/C++ but has also Java source code. Its learning tasks are (1) support vector classification (SVC) for binary and multi-class, (2) support vector regression (SVR), and (3) distribution estimation. … LibLinear is a library designed for solving large-scale linear classification problems. It was developed starting in 2007 at National Taiwan University (Fan et al. 2008; LibLinear 2018). The library is written in C/C++. The supported ML tasks are logistic regression and linear SVM. The supported problem formulation are: L2-regularized logistic regression, L2-loss and L1-loss linear SVMs. … XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable (Chen and Guestrin 2016; DMLC 2018; Mitchell 2017). The XGBoost is an open-source library that implements the gradient boosting decision tree algorithm.”;)
	
The combination of Bergstra, Akiba, Tu is combinable with Nguyen for the same rationale as set forth above with respect to claim 3.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhang et al. (Deep Learning in Mobile and Wireless Networking: A Survey) teaches diverse machine learning libraries for mobile applications.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129
9/20/2022
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129