Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over Feurer et al. (hereafter Feurer) “Efficient and Robust Automated Machine Learning”, in view of Wistuba et al. (hereafter Wistuba) “Two-Stage Transfer Surrogate Model for Automatic Hyperparameter Optimization”.
Regarding claim 1
A method comprising:
Feurer teaches training a first artificial intelligence model using training data from a first application; ([pg. 2, lines 13-16] “we automatically construct ensembles of the models considered by Bayesian optimization”. The examiner notes “we automatically construct ensembles of the models considered by Bayesian optimization” teaches “training a first artificial intelligence model using training data from a first application” because an ensemble of models includes a first model, where each model is a variant trained on the same data set.)
optimizing a first hyperparameter operator in the first artificial intelligence model; ([pg. 2, lines 36-40] “This CASH problem was first tackled by Thornton et al. [2] in the AUTO-WEKA system using the machine learning framework WEKA [8] and tree-based Bayesian optimization methods [9,10]. In a nutshell, Bayesian optimization [3] fits a probabilistic model to capture the relationship between hyperparameter settings and their measured performance; it then uses this model to select the most promising hyperparameter setting (trading off exploration of new parts of the space vs. exploitation in known good regions), evaluates that hyperparameter setting, updates the model with the result, and iterates.” The examiner notes “Bayesian optimization fits a probabilistic model to capture the relationship between hyperparameter settings and their measured performance” teaches “optimizing a first hyperparameter operator in the first artificial intelligence model”.)
creating a second artificial intelligence model for a second application; ([pg. 3, lines 30-35] “for each machine learning dataset in a dataset repository (in our case 140 datasets from the OpenML [18] repository), we evaluated a set of meta-features (described below) and used Bayesian optimization to determine and store an instantiation of the given ML framework with strong empirical performance for that dataset.” The examiner notes “used Bayesian optimization to determine and store an instantiation of the given ML framework with strong empirical performance for that dataset” teaches “creating a second artificial intelligence model” and “for each machine learning dataset in a dataset repository” teaches “for a second application”).
Feurer does not teach using the first hyperparameter operator optimized during training of the first artificial intelligence model in training the second artificial intelligence model.
Wistuba teaches using the first hyperparameter operator optimized during training of the first artificial intelligence model in training the second artificial intelligence model. ([pg. 2, lines 11-14] “Human experts utilize their experience with a machine learning model and try hyperparameter configurations that have been good on other data sets. This transfer of knowledge is one important research direction in the domain of automatic hyperparameter optimization.” The examiner notes “Human experts utilize their experience with a machine learning model and try hyperparameter configurations that have been good on other data sets” teaches “using the first hyperparameter operator optimized during training of the first artificial intelligence model in training the second artificial intelligence model”).
Feurer and Wistuba are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feurer to incorporate the teaching of Wistuba to use previous hyperparameter configurations as per Wistuba [pg. 2, lines 11-14] to transfer the knowledge from good performance of a model on one data set to another data set.
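For illustration only, the kind of knowledge transfer described in the cited passage of Wistuba can be sketched as below. This is a minimal sketch under stated assumptions, not the method of either reference: plain random search stands in for Bayesian optimization, and the names `make_loss` and `search`, the single `learning_rate` hyperparameter, and the quadratic toy losses are all hypothetical.

```python
import random


def make_loss(optimum):
    """Hypothetical validation loss: a quadratic bowl with its minimum at `optimum`."""
    def loss(learning_rate):
        return (learning_rate - optimum) ** 2
    return loss


def search(loss, n_trials, seed, initial=None):
    """Random search over [0, 1), used here as a stand-in for Bayesian optimization.

    If `initial` is given, it is evaluated as the first candidate, so the
    returned loss is never worse than that of the transferred configuration.
    """
    rng = random.Random(seed)
    candidates = [] if initial is None else [initial]
    while len(candidates) < n_trials:
        candidates.append(rng.random())
    best = min(candidates, key=loss)
    return best, loss(best)


# Two hypothetical data sets whose best settings are close but not identical.
loss_a = make_loss(0.30)
loss_b = make_loss(0.35)

# Optimize the hyperparameter while training the first model.
best_a, _ = search(loss_a, n_trials=20, seed=0)

# Reuse that configuration as the starting point for the second model.
warm_b, warm_loss = search(loss_b, n_trials=5, seed=1, initial=best_a)
print(best_a, warm_b, warm_loss)
```

In this sketch the second search cannot end with a higher loss than the transferred configuration gives on the second data set, which is the benefit of trying a configuration that was good elsewhere.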
Regarding claim 2
The combination of Feurer and Wistuba teaches claim 1.
Feurer teaches wherein the first application and the second application are the same application ([pg. 4, lines 4-12] “While Bayesian hyperparameter optimization is data-efficient in finding the best-performing hyperparameter setting, we note that it is a very wasteful procedure when the goal is simply to make good predictions: all the models it trains during the course of the search are lost, usually including some that perform almost as well as the best. Rather than discarding these models, we propose to store them and to use an efficient post-processing method (which can be run in a second process on-the-fly) to construct an ensemble out of them.” The examiner notes “rather than discarding these models, we propose to store them and to use an efficient post-processing method (...) to construct an ensemble out of them” teaches “wherein the first application and the second application are the same application” because constructing an ensemble of models to make predictions on a dataset means the application is the same).
Regarding claim 3
The combination of Feurer and Wistuba teaches claim 1.
Feurer teaches wherein the first application and the second application are different applications. ([pg. 6, lines 4-9] “we gathered 140 binary and multiclass classification datasets from the OpenML repository [18], (…) These datasets cover a diverse range of applications, such as text classification, digit and letter recognition, gene sequence and RNA classification, advertisement, particle classification for telescope data, and cancer detection in tissue samples.” The examiner notes “These datasets cover a diverse range of applications” teaches “the first application and the second application are different applications” because the different models are developed for different classification tasks).
Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Feurer, in view of Wistuba, in further view of Nickson et al. (hereafter Nickson) “Automated machine learning on big data using stochastic algorithm tuning”.
Regarding claim 4
The combination of Feurer and Wistuba teaches claim 1.
Feurer teaches wherein the optimizing of the first hyperparameter operator is performed using machine learning algorithms ([pg. 2, lines 36-40] “This CASH problem was first tackled by Thornton et al. [2] in the AUTO-WEKA system using the machine learning framework WEKA [8] and tree-based Bayesian optimization methods [9,10]. In a nutshell, Bayesian optimization [3] fits a probabilistic model to capture the relationship between hyperparameter settings and their measured performance; it then uses this model to select the most promising hyperparameter setting (trading off exploration of new parts of the space vs. exploitation in known good regions), evaluates that hyperparameter setting, updates the model with the result, and iterates.” The examiner notes “machine learning framework WEKA [8] and tree-based Bayesian optimization methods [9,10]. In a nutshell, Bayesian optimization [3] fits a probabilistic model to capture the relationship between hyperparameter settings and their measured performance” teaches “optimizing of the first hyperparameter operator is performed using machine learning algorithms”.)
Feurer and Wistuba do not teach wherein the optimizing of the first hyperparameter operator is performed using stochastic instantiation.  
Nickson teaches wherein the optimizing of the first hyperparameter operator is performed using stochastic instantiation.  ([Abstract] “We introduce a means of automating machine learning (ML) for big data tasks, by performing scalable stochastic Bayesian optimization of ML algorithm parameters and hyper-parameters.”)
Feurer, Wistuba, and Nickson are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Feurer and Wistuba to incorporate the teaching of Nickson to use scalable stochastic Bayesian optimization as per Nickson [Abstract] to optimize parameters and hyper-parameters for Machine Learning algorithms.
Regarding claim 5
The combination of Feurer, Wistuba, and Nickson teaches claim 4.
Feurer teaches wherein the creating the second artificial intelligence model for the second application is performed automatically ([pg. 4, lines 4-12] “Rather than discarding these models, we propose to store them and to use an efficient post-processing method (which can be run in a second process on-the-fly) to construct an ensemble out of them. This automatic ensemble construction avoids to commit itself to a single hyperparameter setting.” The examiner notes “rather than discarding these models, we propose to store them and to use an efficient post-processing method (…) to construct an ensemble out of them. This automatic ensemble construction” teaches “the creating the second artificial intelligence model for the second application is performed automatically”).
Feurer and Wistuba do not teach wherein operators to use in the second artificial intelligence model and parameters set for the operators are selected by a second hyperparameter operator, the second hyperparameter operator using data for the second application as a condition array to select operators and set parameters.
Nickson teaches wherein operators to use in the second artificial intelligence model and parameters set for the operators are selected by a second hyperparameter operator, the second hyperparameter operator using data for the second application as a condition array to select operators and set parameters.  ([pg. 3, lines 15-25] “Perhaps the most widely used method is the fully independent conditional (FITC) method [15].  This method uses an inducing matrix [X] (whose rows are features vectors describing the locations of inducing points) with associated latent values [u] to restrict the bandwidth of the kernel, forcing information exchange between the training and test data to pass through these points rather than the full bandwidth of a full GP.  In the implementation used in this paper, we select a set of inducing points on a linear grid, optimize the GP hyper-parameters”.  The examiner notes “This method uses an inducing matrix [X] (whose rows are features vectors describing the locations of inducing points) with associated latent values [u] to restrict the bandwidth of the kernel, forcing information exchange between the training and test data to pass through these points (…) optimize the GP hyper-parameters” teaches “the second hyperparameter operator using data for the second application as a condition array to select operators and set parameters” because the matrix[X] and the information exchange between the training and test data teach “using data for the second application as a condition array to select operators and set parameters”, and the matrix [X] teaches “the second hyperparameter”).
Feurer, Wistuba, and Nickson are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Feurer and Wistuba to incorporate the teaching of Nickson to use the fully independent conditional method as per Nickson [pg. 3, lines 15-25] and develop a method for optimizing the GP hyper-parameters to increase classification accuracy.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN S WALKER whose telephone number is (303)297-4479.  The examiner can normally be reached on Monday - Friday 0730-1700 (MT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANN LO, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to 

/BENJAMIN WALKER/Examiner, Art Unit 2126
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126