DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to communications filed on April 18, 2019.
Claims 1-20 are presented for examination and are pending. 

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/29/19, 08/06/19, 04/13/20, 06/24/20, 09/24/20, 12/16/20, 02/04/21, 01/20/22, 04/29/22, and 07/29/22 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings filed on April 18, 2019 are accepted. 

Specification
The abstract of the disclosure is objected to because the abstract exceeds 150 words and contains legal phraseology such as “said”.  Correction is required.  See MPEP § 608.01(b).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-10 and 13-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding Claim 3, 
Claim 3 recites “generating a second plurality of second meta-feature sets, second first meta-feature set of said second plurality of second meta-feature sets describing…”. This recitation lacks clarity because it is unclear how the “second first meta-feature set” can refer to the second plurality of second meta-feature sets. Additionally, claim 1 previously recites a “first plurality of first meta-feature sets”. It is unclear whether the “second first meta-feature set” refers to a first meta-feature set of claim 1 or to a separate meta-feature set within the second plurality of second meta-feature sets. For purposes of examination, this limitation will be interpreted as a second meta-feature set.

Regarding Claim 4, 
Claim 4 recites “using as a target, scores that are generated by said hypertuning algorithm on said first data set” and is dependent on claim 1. This recitation lacks clarity because claim 1 states that a respective target set of hyperparameter settings is generated using a hypertuning algorithm. It is unclear whether the scores recited in claim 4 refer to the target set of hyperparameter settings or are an additional set generated by the hypertuning algorithm. For purposes of examination, the scores are interpreted to be the respective target set of hyperparameter settings.

Regarding Claim 13, 
Claim 13 recites “generating a second plurality of second meta-feature sets, second first meta-feature set of said second plurality of second meta-feature sets describing…”. This recitation lacks clarity because it is unclear how the “second first meta-feature set” can refer to the second plurality of second meta-feature sets. Additionally, claim 11 previously recites a “first plurality of first meta-feature sets”. It is unclear whether the “second first meta-feature set” refers to a first meta-feature set of claim 11 or to a separate meta-feature set within the second plurality of second meta-feature sets. For purposes of examination, this limitation will be interpreted as a second meta-feature set.

Regarding Claim 14, 
Claim 14 recites “using as a target, scores that are generated by said hypertuning algorithm on said first data set” and is dependent on claim 11. This recitation lacks clarity because claim 11 states that a respective target set of hyperparameter settings is generated using a hypertuning algorithm. It is unclear whether the scores recited in claim 14 refer to the target set of hyperparameter settings or are an additional set generated by the hypertuning algorithm. For purposes of examination, the scores are interpreted to be the respective target set of hyperparameter settings.

Dependent claims 4-10 and 14-20 are also rejected because they depend, directly or indirectly, on rejected claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba et al. (“Scalable Gaussian process-based transfer surrogates for hyperparameter optimization”), hereinafter Wistuba (2017), in view of Wistuba et al. (“Learning Hyperparameter Optimization Initializations”), hereinafter Wistuba (2015). 

Regarding Claim 1, 
Wistuba (2017) teaches: 
A method comprising: for each mini-machine learning model (MML model) of a plurality of MML models, (Page 43, Abstract: “Sequential model-based optimization (SMBO), based on so-called “surrogate models”, has been employed to allow for faster and more direct hyperparameter optimization. A surrogate model is a machine learning regression model which is trained on the meta-level instances in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors.” and Page 51, Section 4: “What all these surrogate models have in common is that they are relatively easy and fast to evaluate, at least in comparison to evaluating y, while still being able to learn complex functions.” teaches using surrogate models (mini-machine learning models) for sequential model-based optimization)

training a respective hyperparameter predictor set that predicts a respective set of predicted hyperparameter settings, (Page 44, Section 1: “The majority of efforts to solve this problem are based on the sequential model-based optimization (SMBO) framework, which has its roots in the area of black-box optimization. SMBO is an iterative approach which trains a surrogate model Ψ on the observed meta-level instances of y. Then, it can be used in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. We use this method to find promising hyperparameter configurations, evaluate y for these configurations, and finally retrain Ψ.” teaches using sequential model-based optimization (hyperparameter predictor) to predict optimal hyperparameter configurations)
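
For illustration only, the iterative SMBO procedure described in the cited passage may be sketched as follows. The sketch is not taken from Wistuba (2017): the names `objective`, `surrogate_predict`, and `smbo`, the quadratic objective, and the distance-weighted surrogate are hypothetical stand-ins (the reference uses Gaussian processes), and the acquisition step is reduced to choosing the untried candidate with the lowest predicted loss.

```python
import random


def objective(x):
    """Hypothetical expensive function y: validation loss for hyperparameter x."""
    return (x - 0.3) ** 2


def surrogate_predict(history, x):
    """Toy surrogate: distance-weighted average of the losses observed so far."""
    num = den = 0.0
    for xi, yi in history:
        w = 1.0 / (abs(x - xi) + 1e-9)
        num += w * yi
        den += w
    return num / den


def smbo(candidates, trials, seed=0):
    """Run up to `trials` SMBO iterations and return the best (x, loss) found."""
    rng = random.Random(seed)
    x0 = rng.choice(candidates)          # one random initial evaluation
    history = [(x0, objective(x0))]
    for _ in range(trials):
        tried = {x for x, _ in history}
        untried = [x for x in candidates if x not in tried]
        if not untried:
            break
        # Ask the surrogate which untried configuration looks most promising.
        x_next = min(untried, key=lambda x: surrogate_predict(history, x))
        # Evaluate y there; the surrogate is "retrained" by extending the history.
        history.append((x_next, objective(x_next)))
    return min(history, key=lambda pair: pair[1])


candidates = [i / 10 for i in range(11)]
best_x, best_loss = smbo(candidates, trials=10)
```

With ten trials over eleven candidates, every candidate is eventually evaluated, so the sketch reduces to exhaustive search; a smaller trial budget is where the surrogate's ranking matters.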

wherein training said respective hyperparameter predictor set for said each MML model comprises: generating first training data used to train said respective hyperparameter predictor set, (Page 44, Section 1: “SMBO is an iterative approach which trains a surrogate model Ψ on the observed meta-level instances of y. Then, it can be used in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. We use this method to find promising hyperparameter configurations, evaluate y for these configurations, and finally retrain Ψ. The overall process is repeated T many times, and in the end, we take the best hyperparameter configuration found so far.” teaches that SMBO involves training the surrogate models; Page 50, Section 3: “Now we can define the task of hyperparameter optimization as finding the configuration
x*, that yields a model learned on the training partition which minimizes the loss on the validation partition Dvalid of the data set: [equation image media_image1.png not reproduced]” teaches using a training data partition for SMBO)

wherein generating first training data comprises: generating a first plurality of data set samples from a first data set; (Page 49-50, Section 3: “Given a concrete setting of x, A searches through the model space M to find a model that minimizes the empirical loss L on the training partition of data D, Dtrain, considering a regularization R to avoid overfitting…” teaches using a training partition Dtrain of data D, where the training partition consists of data sampled from data set D)

generating a first plurality of first meta-feature sets, each first meta-feature set of said first plurality of first meta-feature sets describing a respective first data set sample of said first plurality; (Page 53-54, Section 4.1: “The majority of work has focused on coming up with surrogate models that can effectively integrate the meta-knowledge, for example by being trained on the observation history prior to starting SMBO on a new data set. In order to do so, we have to add so called meta-features, that describe the characteristics of a data set, as otherwise the surrogate model would be unable to differentiate between instances if the same hyperparameter configuration has been used.” teaches that meta-features describe the characteristics of a data set and that meta-features are generated and used within SMBO)
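
For illustration only, the claimed relationship between data set samples and meta-feature sets (one meta-feature set describing each sample) may be sketched as follows. Neither the claims nor the cited passage specify which meta-features are computed; the descriptors chosen here (row count, column count, mean and standard deviation of the first column), the synthetic data, and the names `draw_samples` and `meta_features` are assumptions made for the sketch.

```python
import random
import statistics


def draw_samples(rows, n_samples, sample_size, seed=0):
    """Draw several random samples (without replacement within a sample) from one data set."""
    rng = random.Random(seed)
    return [rng.sample(rows, sample_size) for _ in range(n_samples)]


def meta_features(sample):
    """Compute a small, hypothetical set of descriptors for one data set sample."""
    first_col = [row[0] for row in sample]
    return {
        "n_rows": len(sample),
        "n_cols": len(sample[0]),
        "mean_col0": statistics.mean(first_col),
        "stdev_col0": statistics.pstdev(first_col),
    }


# Synthetic first data set: 100 rows, 2 columns (illustrative values only).
data = [(float(i), float(i % 3)) for i in range(100)]
samples = draw_samples(data, n_samples=4, sample_size=25)
meta_feature_sets = [meta_features(s) for s in samples]  # one set per sample
```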


Wistuba (2017) does not appear to explicitly teach: 
wherein said each MML model represents a respective reference machine learning algorithm (RML), 
generating a respective target set of hyperparameter settings for said each MML model using a hypertuning algorithm.

However, Wistuba (2015) teaches: 
wherein said each MML model represents a respective reference machine learning algorithm (RML), (Page 2, Section 3: “Sequential model-based optimization (SMBO) [7] was proposed as a black-box optimization framework. The idea is to replace the expensive-to-evaluate function f to minimize with a cheap-to-evaluate surrogate function Ψ that approximates f. An acquisition function (e.g. expected improvement [7]) is used to tackle the exploitation-exploration dilemma… In our scenario, evaluating f is equivalent to learning a machine learning algorithm on some training data for a given hyperparameter configuration and estimate the model’s performance on a holdout data set.” teaches that surrogate models are approximations of (references to) a machine learning model that is more expensive to evaluate)

generating a respective target set of hyperparameter settings for said each MML model using a hypertuning algorithm. (Page 1, Section 1: “We propose to transfer knowledge of hyperparameter configurations from past experiments to new data sets using an initialization strategy that can be used for, but is not limited to, the SMBO framework.” and Page 3, Section 4: “In this paper, the task of initializing hyperparameter optimization strategies is generalized and a novel approach to choose initial hyperparameter configurations is proposed. Instead of choosing from hyperparameter configurations that have been best on previous data sets [8], [14], we directly learn hyperparameter configurations and thus are not limited to hyperparameter configurations that have been evaluated on previous data sets. The idea is to initialize the initial sequence of hyperparameter configurations with promising configurations and further improve them by minimizing the proposed hyperparameter loss. Since this loss is differentiable, the initial sequence of hyperparameter configurations can be learned with gradient-based optimization techniques such as stochastic gradient descent.” teaches learning initializations of hyperparameter configurations (target set of hyperparameter settings) by using gradient based optimization techniques (hypertuning algorithm))

Wistuba (2017) and Wistuba (2015) are analogous art because they are directed to hyperparameter optimization. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the gradient-based hyperparameter initialization of Wistuba (2015) with the sequential model-based optimization of Wistuba (2017) with a motivation to obtain better initializations and to accelerate hyperparameter optimization (Wistuba (2015), Page 1).

Regarding Claim 2, 
The combination of Wistuba (2017) and Wistuba (2015) teaches: 
The method of claim 1,

Wistuba (2017) further teaches: 
wherein training said respective hyperparameter predictor set comprises: using the first plurality of first meta-feature sets… to train said respective hyperparameter predictor set. (Page 44, Section 1: “The majority of efforts to solve this problem are based on the sequential model-based optimization (SMBO) framework, which has its roots in the area of black-box optimization. SMBO is an iterative approach which trains a surrogate model Ψ on the observed meta-level instances of y. Then, it can be used in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. We use this method to find promising hyperparameter configurations, evaluate y for these configurations, and finally retrain Ψ.” teaches that training using SMBO involves using meta-level instances of a function; Page 53-54, Section 4.1: “The majority of work has focused on coming up with surrogate models that can effectively integrate the meta-knowledge, for example by being trained on the observation history prior to starting SMBO on a new data set. In order to do so, we have to add so called meta-features, that describe the characteristics of a data set, as otherwise the surrogate model would be unable to differentiate between instances if the same hyperparameter configuration has been used.” teaches that the meta-level instances are meta-features)

Wistuba (2015) further teaches:
…and the respective target set of hyperparameter settings to train [a] hyperparameter predictor set (Page 1, Section 1: “We propose to transfer knowledge of hyperparameter configurations from past experiments to new data sets using an initialization strategy that can be used for, but is not limited to, the SMBO framework. We mathematically formalize the problem of hyperparameter optimization and derive a hyperparameter optimization loss. Since this meta-loss is yet not differentiable, we propose to use a differentiable plug-in estimator. Given this estimator, initial hyperparameter configurations can be learned by gradient-based optimization techniques that minimize the meta-loss.” teaches initializing hyperparameter configurations (target hyperparameter settings) by using gradient-based optimization techniques and these techniques can be applied to the SMBO framework (hyperparameter predictor))

The combination applied to claim 1 already incorporates the hyperparameter initialization of Wistuba (2015) and therefore already accounts for the target hyperparameter settings required by claim 2.

Regarding Claim 11, 
This claim recites A non-transitory computer-readable storage medium…, which performs a plurality of operations as recited by the method of claim 1, and has limitations that are similar to those of claim 1, thus is rejected with the same rationale applied against claim 1.
Wistuba (2017) further teaches:
A non-transitory computer-readable storage medium storing sequences of instructions that, when executed by one or more processors, cause… (Page 62, Section 8.1: “The other meta-data set is extended so that it contains both hyperparameter performance on different data sets and performance of a set of different algorithms. In order to accomplish this, we used WEKA (Holmes et al. 1994) to run 19 different classifiers on 59 data sets for a total of 21,871 hyperparameter configurations to evaluate per data set. An overview of the employed classifiers can be seen in Table 2. In total, this sums up to roughly 1.3 million experiments. The overall computation of this meta-data set took about 900 CPU hours” teaches a computer based implementation)

Regarding Claim 12, 
This claim recites The non-transitory computer-readable storage medium of claim 11…, which performs a plurality of operations as recited by the method of claim 2, and has limitations that are similar to those of claim 2, thus is rejected with the same rationale applied against claim 2.


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wistuba (2017) in view of Wistuba (2015), further in view of Cohen et al. (“Online Row Sampling”).

Regarding Claim 3, 
The combination of Wistuba (2017) and Wistuba (2015) teaches: 
The method of claim 1,

Wistuba (2017) further teaches:
further comprising: training each MML model of the plurality of MML models thereby generating a respective trained version of said each MML model of a plurality of trained MML models and respective scores by at least: (Page 44, Section 1: “The majority of efforts to solve this problem are based on the sequential model-based optimization (SMBO) framework, which has its roots in the area of black-box optimization. SMBO is an iterative approach which trains a surrogate model Ψ on the observed meta-level instances of y. Then, it can be used in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. We use this method to find promising hyperparameter configurations, evaluate y for these configurations, and finally retrain Ψ.” teaches training the surrogate model (MML model); Page 59, Section 6: “As mentioned before, training involves dividing the meta-instances in M subsets, one subset for each data set on which we observed evaluations. Thus, every Gaussian process becomes the expert of the respective data set. The prediction uses these experts plus one additional expert that is estimated on the observed performances on the new data set. Based on Eqs. 31 and 32, the mean and uncertainty is estimated. In the following subsections, we will discuss how to derive possible options for choosing w and v which we introduced in Eqs. 31 and 32. Each version is a possible surrogate model Ψ that can be used in SMBO (see Algorithm 1).” teaches dividing the meta-features among multiple subsets and creating multiple surrogate models using each meta-feature subset; Page 53, Section 4: “An overview of how SMBO works can be seen in Fig. 2, where a Gaussian process (black solid line with uncertainty indicated by dashed gray lines) is initially learned on three data points to approximate the ground truth (yellow solid line). The blue line at the bottom indicates the score of the acquisition function, the cross indicates the maximum of the acquisition function, which is the argument that will be evaluated next by SMBO.” teaches that the SMBO evaluates the scores of acquisition functions)

generating a second plurality of second meta-feature sets, second first meta-feature set of said second plurality of second meta-feature sets describing a respective second data set sample of… second plurality of data set samples; (Page 57-58: “Therefore, in order to still learn Gaussian processes, we propose to subdivide the metadata into M many individual parts and learn a single Gaussian process independently on each of the parts, including a single, additional Gaussian process for all the new observations that we will see during the SMBO trials. Formally, we divide our meta-data… in a way where all X(i) are pairwise disjoint. However, instead of taking an arbitrary subdivision of our meta-data, we simply divide it by the data sets we have already observed. This means, for each data set Di, we create a subset X(i), y(i) which contains all meta-instances of data set Di. As a result, we have M Gaussian processes learned, one for each data set…” teaches dividing the metadata into multiple subsets; Page 53-54, Section 4.1: “The majority of work has focused on coming up with surrogate models that can effectively integrate the meta-knowledge, for example by being trained on the observation history prior to starting SMBO on a new data set. In order to do so, we have to add so called meta-features, that describe the characteristics of a data set, as otherwise the surrogate model would be unable to differentiate between instances if the same hyperparameter configuration has been used.” teaches that SMBO uses meta-features to train the surrogate models, therefore meta-features of the meta-data set are divided into multiple subsets (plurality of meta-feature sets))
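
For illustration only, the subdivision described in the cited passage (pairwise disjoint subsets of meta-instances, one subset per previously observed data set) may be sketched as follows. The triple layout, the data set identifiers, the hyperparameter name `C`, and the performance values are hypothetical and are not taken from Wistuba (2017); fitting a Gaussian process on each subset is outside the scope of the sketch.

```python
from collections import defaultdict


def partition_by_data_set(meta_instances):
    """Group (data_set_id, configuration, performance) triples into one subset per data set."""
    parts = defaultdict(list)
    for data_set_id, config, performance in meta_instances:
        parts[data_set_id].append((config, performance))
    return dict(parts)


# Hypothetical meta-instances observed on three data sets.
meta_instances = [
    ("D1", {"C": 1.0}, 0.81),
    ("D2", {"C": 1.0}, 0.64),
    ("D1", {"C": 10.0}, 0.85),
    ("D3", {"C": 0.1}, 0.70),
]
subsets = partition_by_data_set(meta_instances)
```

Because each meta-instance carries exactly one data set identifier, the resulting subsets are disjoint and together cover all meta-instances.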


generating respective predicted hyperparameter settings by applying the respective hyperparameter predictor set of said each trained MML model to said second plurality of second meta-feature sets. (Page 44, Section 1: “SMBO is an iterative approach which trains a surrogate model Ψ on the observed meta-level instances of y. Then, it can be used in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. We use this method to find promising hyperparameter configurations, evaluate y for these configurations, and finally retrain Ψ. The overall process is repeated T many times, and in the end, we take the best hyperparameter configuration found so far. In comparison to exhaustive search methods, SMBO tries to adaptively steer the optimization into promising regions in the hyperparameter space.” teaches that SMBO generates optimized hyperparameter configurations by using surrogate models (MML model) and meta-level instances (meta-features))

The combination of Wistuba (2017) and Wistuba (2015) does not appear to explicitly teach: 
generating a second plurality of data set samples from said first data set; 

However, Cohen teaches: 
generating a second plurality of data set samples from [a] first data set; (Page 3, Section 2: “Let A be an n × d matrix with rows a1, . . . , an. A natural approach to row sampling from A is picking an a priori probability with which each row is kept, and then deciding whether to keep each row independently. A common choice is for the sampling probabilities to be proportional to the leverage scores of the rows.” teaches performing row sampling to obtain multiple samples from a dataset)
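
For illustration only, independent row sampling with probabilities proportional to leverage scores, as described in the quoted passage, may be sketched as follows. This is an offline sketch and not the online algorithm of Cohen: it assumes A has full column rank, computes leverage scores from an orthonormal basis obtained by Gram-Schmidt, and uses a hypothetical `oversample` factor and the customary 1/sqrt(p) rescaling of kept rows. The names `leverage_scores` and `leverage_sample` and the example matrix are assumptions.

```python
import math
import random


def leverage_scores(A):
    """Leverage score of each row of A (a list of rows), assuming full column rank."""
    n, d = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(d)]
    q_cols = []  # orthonormal basis for the column space (modified Gram-Schmidt)
    for col in cols:
        v = list(col)
        for q in q_cols:
            dot = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - dot * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / norm for vi in v])
    # The leverage score of row i is the squared norm of row i of Q.
    return [sum(q[i] ** 2 for q in q_cols) for i in range(n)]


def leverage_sample(A, oversample, seed=0):
    """Keep each row independently with probability min(1, oversample * score)."""
    rng = random.Random(seed)
    kept = []
    for row, tau in zip(A, leverage_scores(A)):
        p = min(1.0, oversample * tau)
        if rng.random() < p:
            kept.append([x / math.sqrt(p) for x in row])  # rescale kept rows
    return kept


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
scores = leverage_scores(A)
sample = leverage_sample(A, oversample=0.5)
```

The leverage scores of a full-column-rank n × d matrix sum to d, which provides a quick sanity check on the computation.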

Wistuba (2017), Wistuba (2015), and Cohen are analogous art because they are directed to creating machine learning models using configurations of metadata. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the row sampling of Cohen with the multiple data sets of Wistuba (2017)/Wistuba (2015) with a motivation to improve interpretability, save space when a dataset is sparse, and preserve row structure for a dataset (Cohen, Page 1).

Regarding Claim 13, 
This claim recites The non-transitory computer-readable storage medium of claim 11…, which performs a plurality of operations as recited by the method of claim 3, and has limitations that are similar to those of claim 3, thus is rejected with the same rationale applied against claim 3.


Conclusion
Claims 4-10 and 14-20 have been searched, but no prior art teaching these claims has been uncovered.

The prior art made of record but not relied upon is considered pertinent to the applicant’s disclosure:
Feurer et al. (“Scalable Meta-Learning for Bayesian Optimization”) teaches performing Bayesian optimization methods and forming an ensemble of Gaussian processes to optimize hyperparameters.
Bardenet et al. (“Collaborative hyperparameter tuning”) teaches using surrogate-based ranking and optimization techniques to perform hyperparameter optimization.
Yogatama et al. (“Efficient Transfer Learning Method for Automatic Hyperparameter Tuning”) teaches using sequential model-based optimization to optimize hyperparameters by transferring information between multiple datasets. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144. The examiner can normally be reached Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.J.A./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125