DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to §112(b) have been fully considered and are persuasive.  The rejection of claims 1, 8 and 14 has been withdrawn. 
Applicant's arguments filed with respect to §102(a)(1) have been fully considered but they are not persuasive. Applicant’s first argument is as follows:
“Regarding the previously recited elements of independent claims 1, 8, and 14 of "automatically modifying one or more of the existing models selected according to one or more data recipes of the data streams to generate one or more adaptive models", the Office asserts (see Office Action, page 4) that Feurer generally describes this functionality in such sections as §3.2. Without reproducing said sections, Feurer appears to disclose an automatic ensemble construction of a plurality of off-the-shelf models to include in the ensemble according to the data at hand. The Office appears to assert that this automated ensemble creation is equivalent to 'automatically modifying one or more existing models' selected according to the data recipes of the input stream. 

Applicants, however, have amended this language of independent claims 1, 8, and 14 to more clearly describe that the 'automated modification' of the models is not a modification of retrieving existing models and merely selecting certain models to include in an ensemble, but rather actually tuning each model individually to fit the 'data recipe' of the data class of the input data. While Feurer does discuss, in section §3.1, that meta-features of data sets are computed to obtain a cosine distance to known datasets in a repository, Feurer does not appear to describe taking singular models, modifying the tuning parameters thereof to meet the data recipe in the input set, and generating a new model (the 'adaptive model') based on a training of the existing model which most closely resembles those parameters necessitated by the input data set into the adaptive model trained specifically for the input data set.”

Feurer teaches in §3.2 that during Bayesian hyperparameter optimization (i.e., the data recipe), a plurality of models having different hyperparameter values are 
Applicant’s second argument is as follows:
“Moreover, with respect to the recited elements of independent claims 1, 8, and 14 of "wherein a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over [time]", the Office further asserts (see Office Action, page 4) that Feurer discloses this functionality in substantially the same sections, as Feurer describes that 'all models during the course of search/training are usually lost' and therefore Feurer retains the trained models for use in the included automated ensemble construction. 
Applicants, however, respectfully traverse that this maintaining of old models for use in a new ensemble is representative of a lineage of each model, even as previously recited. A lineage, according to Google®, is "a lineal descent from an ancestor" or "a sequence... which is considered as having evolved from its predecessor". Thus, the 'lineage' of models stored over time is more representative of a model version, or a model instance at a particular point in time which is then further tuned/trained according to input data. Feurer does not disclose maintaining a 'lineage' or a sequence or pedigree of models over time as they are modified into a new adaptive model.”

Feurer teaches in §3.2 that conventional Bayesian hyperparameter optimization has the disadvantage of being wasteful since all the models it trains other than the highest performing model are discarded, even those that perform almost as well as the best.  Feurer’s approach modifies
Applicant’s third argument is as follows:
“Notwithstanding, Applicants have further amended this functionality to more clearly indicate this intended interpretation…Again, Feurer does not appear to disclose such techniques”

As will be discussed in the §103 rejections below, Szeto teaches a plurality of models having state rollback with automatic version control and tracking [0008] which is being relied upon in combination with Feurer to teach the newly presented limitations in the independent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-9, 12-15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Feurer et al (NPL: “Efficient and Robust Automated Machine Learning”) in view of Szeto et al (US 2017/0124487).
For claim 1, Feurer teaches a method, by one or more processors, for self-managed adaptable models for prediction systems (Abstract), comprising: 
extracting one or more features (meta-features, §3.1) from data streams of new incoming data from a plurality of data sources (140 datasets from OpenML repository, ¶3 of §3.1), wherein the extracted features are pre-processed (Table 1b and 2nd to last ¶ of §4); 
dynamically binding the extracted features of the data streams to one or more model classes of existing models generated prior to receiving the data streams (Table 1a) of new incoming data (via meta-learning, §3.1), wherein the existing models of the one or more model classes are stored in a model store (dataset repository, ¶3 of §3.1); 
annotating the one or more model classes with the one or more extracted features (“rank all datasets by their L1 distance to D in meta-feature space”, ¶4 of §3.1); 
automatically modifying each one or more of the existing models, independently, (as explained in the Response to Arguments section above) selected according to one or more data recipes of the data streams to generate one or more adaptive models (automated ensemble construction, §3.2), wherein the one or more data recipes include the extracted features of the data streams (Bayesian hyperparameter optimization, §3.2), the one or more model classes for application on the data streams by the one or more adaptive models (constructing an ensemble out of the trained models from the plurality of model classes, §3.2), and key performance indicators (individually strong models and uncorrelated errors between models, ¶2 of §3.2); and 
applying the one or more adaptive models to the data streams from the plurality of data sources such that the one or more adaptive models predict a plurality of target variables (e.g., text classification, digit and letter recognition, ¶1 of §6) of the data streams (Figures 1 and 3), wherein a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over time (“rather than discarding these models, we propose to store them and to use an efficient post-processing method…to construct an ensemble out of them”, ¶1 of §3.2).
Feurer fails to distinctly disclose:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase when performing the modifying by taking snapshots of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying.
However, Szeto teaches in Figures 22 and 23 implementing machine learning model training and deployment with a rollback mechanism within a computing environment ([0039]-[0040]) wherein “there will be multiple models and multiple predictive engine variants created, each having differing predictive behaviors. It is therefore in accordance with the described embodiments that the many different versions created over time are tracked and maintained in a way that the developers may differentiate between the many variants and if necessary, even roll back or down-rev a given deployed model, such as in the case of having trained the model or updated a trained model with bad data or with new data which yields unacceptable performance results” ([0211]).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use prior versions of Feurer’s models, as taught by Szeto, for use in the automatic ensemble construction in order to increase the likelihood of finding individually strong models that have uncorrelated errors for use in the automatic ensemble construction.  Furthermore, the particular known technique (model rollback) was recognized as part of the ordinary capabilities of one skilled in the art.
The combination of Feurer and Szeto as defined above teaches:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase (as understood by examination of Figure 22 and by 2320 of Figure 23) when performing the modifying by taking snapshots (models of Figure 22, versions of Figure 23) of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying (2325-2330, Figure 23).
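For illustration only, the snapshot limitation mapped above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Feurer or Szeto: the names Snapshot, ModelLineage, record, rollback and train_with_lineage are hypothetical, and the update callable stands in for whatever training step modifies the model parameters. It shows one way a lineage could record a parameter snapshot at fixed intervals during training and later return the parameters of an earlier version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


@dataclass(frozen=True)
class Snapshot:
    """One recorded model instance (hypothetical structure)."""
    version: int            # 1-based position in the lineage
    step: int               # training step at which the snapshot was taken
    params: Params          # copy of the model parameters at that step
    parent: Optional[int]   # version this snapshot was derived from


@dataclass
class ModelLineage:
    """Ordered record of the snapshots taken while one model is modified."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def record(self, step: int, params: Params) -> Snapshot:
        parent = self.snapshots[-1].version if self.snapshots else None
        snap = Snapshot(len(self.snapshots) + 1, step, dict(params), parent)
        self.snapshots.append(snap)
        return snap

    def rollback(self, version: int) -> Params:
        """Return a copy of the parameters recorded for the given version."""
        for snap in self.snapshots:
            if snap.version == version:
                return dict(snap.params)
        raise KeyError(version)


def train_with_lineage(
    params: Params,
    steps: int,
    interval: int,
    update: Callable[[Params], Params],
) -> Tuple[Params, ModelLineage]:
    """Apply update for the given number of steps, snapshotting every interval steps."""
    lineage = ModelLineage()
    lineage.record(0, params)  # the existing model before any modification
    for step in range(1, steps + 1):
        params = update(params)
        if step % interval == 0:
            lineage.record(step, params)
    return params, lineage
```

In this sketch each snapshot stores the version it was derived from, so the sequence of modifications can be read back in order, and rollback returns a copy so the stored record is not altered by later training.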
For claim 2, Feurer as modified by Szeto teaches all the limitations of claim 1 as cited above and Feurer further teaches:
an ontology is used to describe the one or more extracted features (“to characterize datasets, we implemented a total of 38 meta-features… such as statistics about the number of data points, features, and classes, as well as data skewness, and the entropy of the targets”, ¶4 of §3.1).
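For illustration only, the characterization and ranking of datasets cited above can be sketched as follows. This is a minimal sketch under stated assumptions: it computes only four of the kinds of meta-features named in the quotation (number of data points, features, and classes, and the entropy of the targets), and the function names meta_features, l1_distance and rank_by_l1 are hypothetical rather than drawn from Feurer. No scaling of the meta-features is applied, which a practical system would likely need.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def meta_features(rows: Sequence[Sequence[float]], targets: Sequence[int]) -> List[float]:
    """Return a small meta-feature vector describing one dataset."""
    n_points = len(rows)
    n_features = len(rows[0])
    counts = Counter(targets)
    n_classes = len(counts)
    # Shannon entropy of the target distribution, in bits.
    entropy = -sum((c / n_points) * math.log2(c / n_points) for c in counts.values())
    return [float(n_points), float(n_features), float(n_classes), entropy]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_l1(new_vector: Sequence[float], repository: Dict[str, Sequence[float]]) -> List[str]:
    """Rank repository dataset names by L1 distance to new_vector, nearest first."""
    return sorted(repository, key=lambda name: l1_distance(new_vector, repository[name]))
```

Under these assumptions, the datasets nearest the new dataset in meta-feature space appear first in the ranking, which is the ordering the quoted passage of §3.1 describes.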
For claim 3, Feurer further teaches:
matching the one or more model classes in the one or more data recipes with the data streams (¶3 of §3.1).
For claim 6, Feurer as modified by Szeto teaches all the limitations of claim 1 as cited above and Feurer further teaches:
deploying the one or more data recipes to the selected number of data streams from a plurality of data sources (adjusting the weights of the ensemble, ¶3 of §3.2); 
retrieving historical data from a data source of the plurality of data sources required to apply the one or more data recipes to the selected number of data streams (hold-out set, ¶3 of §3.2); and 
updating parameters and associated ones of the KPIs of the one or more adaptive models (maximizing ensemble validation performance, ¶3 of §3.2).
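For illustration only, the weight adjustment on a hold-out set cited above can be sketched as a greedy forward selection with replacement. This is a minimal sketch under stated assumptions and is not asserted to be Feurer's exact procedure: labels are binary, each stored model is represented only by its predicted probabilities on the hold-out set, the validation metric is accuracy at a 0.5 threshold, and the names holdout_accuracy and greedy_ensemble are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def holdout_accuracy(avg_probs: Sequence[float], labels: Sequence[int]) -> float:
    """Accuracy of thresholded averaged probabilities against hold-out labels."""
    preds = [1 if p >= 0.5 else 0 for p in avg_probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def greedy_ensemble(
    model_probs: Dict[str, Sequence[float]],
    labels: Sequence[int],
    size: int,
) -> Dict[str, float]:
    """Pick size members (repeats allowed) and return their ensemble weights."""
    chosen: List[str] = []
    running_sum = [0.0] * len(labels)
    for _ in range(size):
        best_name: Optional[str] = None
        best_score = -1.0
        for name, probs in model_probs.items():
            # Hold-out score if this model were added to the current members.
            candidate = [(s + p) / (len(chosen) + 1) for s, p in zip(running_sum, probs)]
            score = holdout_accuracy(candidate, labels)
            if score > best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        assert best_name is not None
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, model_probs[best_name])]
    counts = Counter(chosen)
    # A model's weight is the fraction of ensemble slots it occupies.
    return {name: counts[name] / size for name in counts}
```

In this sketch a model selected more than once receives a proportionally larger weight, so the weights are updated as a by-product of maximizing validation performance on the hold-out set.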
For claim 7, Feurer as modified by Szeto teaches all the limitations of claim 1 as cited above and Feurer further teaches:
the model store includes model parameters, model class, training data, a pointer to the training data, or testing and validation data of the one or more existing models stored in the model store (¶3 of §3.1 teaches cross-validation).
For claim 8, Feurer teaches a system (AutoML system, Figure 1) for self-managed adaptable models for prediction systems (Abstract), comprising: 
one or more computers with executable instructions that when executed cause the system to (¶4 of §1 teaches implementing via CPU and memory):
extract one or more features (meta-features, §3.1) from data streams of new incoming data from a plurality of data sources (140 datasets from OpenML repository, ¶3 of §3.1), wherein the extracted features are pre-processed (Table 1b and 2nd to last ¶ of §4); 
dynamically bind the extracted features of the data streams to one or more model classes of existing models generated prior to receiving the data streams (Table 1a) of new incoming data (via meta-learning, §3.1), wherein the existing models of the one or more model classes are stored in a model store (dataset repository, ¶3 of §3.1);
annotate the one or more model classes with the one or more extracted features (“rank all datasets by their L1 distance to D in meta-feature space”, ¶4 of §3.1);
automatically modify each one or more of the existing models, independently (as explained in the Response to Arguments section above), selected according to one or more data recipes of the data streams to generate one or more adaptive models (automated ensemble construction, §3.2), wherein the one or more data recipes include the extracted features of the data streams (Bayesian hyperparameter optimization, §3.2), the one or more model classes for application on the data streams by the one or more adaptive models (constructing an ensemble out of the trained models from the plurality of model classes, §3.2), and key performance indicators (individually strong models and uncorrelated errors between models, ¶2 of §3.2); and 
apply the one or more adaptive models to the data streams from the plurality of data sources such that the one or more adaptive models predict a plurality of target variables (e.g., text classification, digit and letter recognition, ¶1 of §6) of the data streams (Figures 1 and 3), wherein a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over time (“rather than discarding these models, we propose to store them and to use an efficient post-processing method…to construct an ensemble out of them”, ¶1 of §3.2).
Feurer fails to distinctly disclose:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase when performing the modifying by taking snapshots of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying.
However, Szeto teaches in Figures 22 and 23 implementing machine learning model training and deployment with a rollback mechanism within a computing environment ([0039]-[0040]) wherein “there will be multiple models and multiple predictive engine variants created, each having differing predictive behaviors. It is therefore in accordance with the described embodiments that the many different versions created over time are tracked and maintained in a way that the developers may differentiate between the many variants and if necessary, even roll back or down-rev a given deployed model, such as in the case of having trained the model or updated a trained model with bad data or with new data which yields unacceptable performance results” ([0211]).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use prior versions of Feurer’s models, as taught by Szeto, for use in the automatic ensemble construction in order to increase the likelihood of finding individually strong models that have uncorrelated errors for use in the automatic ensemble construction.  Furthermore, the particular known technique (model rollback) was recognized as part of the ordinary capabilities of one skilled in the art.
The combination of Feurer and Szeto as defined above teaches:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase (as understood by examination of Figure 22 and by 2320 of Figure 23) when performing the modifying by taking snapshots (models of Figure 22, versions of Figure 23) of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying (2325-2330, Figure 23).
For claim 9, Feurer as modified by Szeto teaches all the limitations of claim 8 as cited above and Feurer further teaches:
an ontology is used to describe the one or more extracted features (“to characterize datasets, we implemented a total of 38 meta-features… such as statistics about the number of data points, features, and classes, as well as data skewness, and the entropy of the targets”, ¶4 of §3.1); and
wherein the executable instructions further match the one or more model classes in the one or more data recipes with the data streams (¶3 of §3.1).
For claim 12, Feurer as modified by Szeto teaches all the limitations of claim 8 as cited above and Feurer further teaches:
deploy the one or more data recipes to the selected number of data streams from a plurality of data sources (adjusting the weights of the ensemble, ¶3 of §3.2); 
retrieve historical data from a data source of the plurality of data sources required to apply the one or more data recipes to the selected number of data streams (hold-out set, ¶3 of §3.2); and 
update parameters and associated ones of the KPIs of the one or more adaptive models (maximizing ensemble validation performance, ¶3 of §3.2).
For claim 13, Feurer as modified by Szeto teaches all the limitations of claim 8 as cited above and Feurer further teaches:
the model store includes model parameters, model class, training data, a pointer to the training data, or testing and validation data of the one or more existing models stored in the model store (¶3 of §3.1 teaches cross-validation).
For claim 14, Feurer teaches a computer program product for, by a processor (¶4 of §1 teaches implementing via CPU), self-managed adaptable models for prediction systems (Abstract), the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein (memory, ¶4 of §1), the computer-readable program code portions comprising: 
an executable portion that extracts one or more features (meta-features, §3.1) from data streams of new incoming data from a plurality of data sources (140 datasets from OpenML repository, ¶3 of §3.1), wherein the extracted features are pre-processed (Table 1b and 2nd to last ¶ of §4); 
an executable portion that dynamically binds the extracted features of the data streams to one or more model classes of existing models generated prior to receiving the data streams (Table 1a) of new incoming data (via meta-learning, §3.1), wherein the existing models of the one or more model classes are stored in a model store (dataset repository, ¶3 of §3.1); 
an executable portion that annotates the one or more model classes with the one or more extracted features (“rank all datasets by their L1 distance to D in meta-feature space”, ¶4 of §3.1);
an executable portion that automatically modifies each one or more of the existing models, independently (see Response to Arguments section above) selected according to one or more data recipes of the data streams to generate one or more adaptive models (automated ensemble construction, §3.2), wherein the one or more data recipes include the extracted features of the data streams (Bayesian hyperparameter optimization, §3.2), the one or more model classes for application on the data streams by the one or more adaptive models (constructing an ensemble out of the trained models from the plurality of model classes, §3.2), and key performance indicators (individually strong models and uncorrelated errors between models, ¶2 of §3.2); and 
an executable portion that applies the one or more adaptive models to the data streams from the plurality of data sources such that the one or more adaptive models predict a plurality of target variables (e.g., text classification, digit and letter recognition, ¶1 of §6) of the data streams (Figures 1 and 3), wherein a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over time (“rather than discarding these models, we propose to store them and to use an efficient post-processing method…to construct an ensemble out of them”, ¶1 of §3.2).
Feurer fails to distinctly disclose:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase when performing the modifying by taking snapshots of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying.
However, Szeto teaches in Figures 22 and 23 implementing machine learning model training and deployment with a rollback mechanism within a computing environment ([0039]-[0040]) wherein “there will be multiple models and multiple predictive engine variants created, each having differing predictive behaviors. It is therefore in accordance with the described embodiments that the many different versions created over time are tracked and maintained in a way that the developers may differentiate between the many variants and if necessary, even roll back or down-rev a given deployed model, such as in the case of having trained the model or updated a trained model with bad data or with new data which yields unacceptable performance results” ([0211]).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use prior versions of Feurer’s models, as taught by Szeto, for use in the automatic ensemble construction in order to increase the likelihood of finding individually strong models that have uncorrelated errors for use in the automatic ensemble construction.  Furthermore, the particular known technique (model rollback) was recognized as part of the ordinary capabilities of one skilled in the art.
The combination of Feurer and Szeto as defined above teaches:
a model lineage of each of the one or more adaptive models is maintained to indicate the modifications performed to the one or more existing models over a training phase (as understood by examination of Figure 22 and by 2320 of Figure 23) when performing the modifying by taking snapshots (models of Figure 22, versions of Figure 23) of a model instance of the one or more existing models, having a specific set of model parameters set from applying the one or more model classes to the new incoming data, at incremental intervals during the modifying (2325-2330, Figure 23).
For claim 15, Feurer as modified by Szeto teaches all the limitations of claim 14 as cited above and Feurer further teaches:
an ontology is used to describe the one or more extracted features (“to characterize datasets, we implemented a total of 38 meta-features… such as statistics about the number of data points, features, and classes, as well as data skewness, and the entropy of the targets”, ¶4 of §3.1); and
further including an executable portion that matches the one or more model classes in the one or more data recipes with the data streams (¶3 of §3.1).
For claim 18, Feurer as modified by Szeto teaches all the limitations of claim 14 as cited above and Feurer further teaches:
deploys the one or more data recipes to the selected number of data streams from a plurality of data sources (adjusting the weights of the ensemble, ¶3 of §3.2); 
retrieves historical data from a data source of the plurality of data sources required to apply the one or more data recipes to the selected number of data streams (hold-out set, ¶3 of §3.2); and 
updates parameters and associated ones of the KPIs of the one or more adaptive models (maximizing ensemble validation performance, ¶3 of §3.2).
For claim 19, Feurer as modified by Szeto teaches all the limitations of claim 14 as cited above and Feurer further teaches:
the model store includes model parameters, model class, training data, a pointer to the training data, or testing and validation data of the one or more existing models stored in the model store (¶3 of §3.1 teaches cross-validation).
Claims 5, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Feurer and Szeto in view of Braden-Harder et al (US 5,933,822).
For claims 5, 11 and 17, Feurer as modified by Szeto teaches the limitations of claims 1, 8 and 14, respectively, and teaches that each of the machine learning datasets is stored in a dataset repository (¶3 of §3.1), but fails to teach indexing as claimed.
However, Braden-Harder teaches indexing documents into a dataset to form a document repository in order to “cost-effectively disseminate large collections of documents, together with the ability to accurately search through the collection, to a wide user community” (column 20, lines 47-61).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to implement Feurer’s data repository such that each of the plurality of the machine learning datasets are indexed in order to enable users to accurately search through a collection of data sources.
The combination of Feurer, Szeto and Braden-Harder as defined above teaches:
indexing each one of the plurality of data sources (as explained above); and 
identifying the one or more data recipes that are associated with each one of the plurality of data sources (ensemble selection, ¶3 of §3.2).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL CALRISSIAN PUENTES whose telephone number is (571)270-5070.  The examiner can normally be reached on M-F 9-6:30 (flex).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DANIEL C PUENTES/Primary Examiner, Art Unit 2849