DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  
Applicant's submission filed on 10/06/2021 has been entered.
Response to Arguments
Applicant’s arguments have been fully considered but are moot in light of a new rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-10, 12-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang, Shuo, Leandro L. Minku, and Xin Yao, "Resampling-based ensemble methods for online class imbalance learning" [herein Wang], in view of Sturlaugson et al., US 2016/0358099 [herein Stur], further in view of Gerard, US 2016/0283861, and Luo et al., US 2018/0197087.
	Regarding claims 1, 14, and 20, Wang teaches “sort the input dataset into a first version of data and a second version of data, wherein the first version of data is associated with a first period of time and the second version of data is associated with a second period of time” (Wang pg. 1357 left col. last ¶ “Suppose a sequence of examples (x_t, y_t) arriving one at a time. x_t is a p-dimensional vector belonging to an input space X observed at time t,” and “When a new example x_t arrives”, i.e., first and second data associated with different times), “wherein the second period of time is a shorter period of time than the first period of time” (previous citation, “1 if the true class label of x_t is c_k, otherwise 0. θ (0 < θ < 1) is a pre-defined time decay factor, which forces older data to affect the class percentage less along with time through the exponential smoothing”; older data, i.e., second data, is associated with a shorter period of time than the previous);
“output a prediction based on the combined ensemble model” (pg. 1358 ¶1 “3) like other ensemble methods, they combine the predictions from multiple classifiers, which are expected to be more accurate than a single classifier.”).
Wang, however, does not explicitly teach the remaining limitations. Stur teaches “a system, comprising: a processor configured to” ([0013] “The processing unit 12 may include one or more computer processors and may include a distributed group of computer processors”):
“receiving an input dataset, wherein the input dataset is comprised of a plurality of entries” ([0019] “Data input module 20 is configured to receive a selection, e.g., a selection from a user, of machine learning models 32 and a dataset, such as a time-dependent dataset. Thus, machine learning systems 10 are configured to receive the dataset. The dataset, also called the input dataset, may be in a common format to interface with the machine learning models 32 and/or the experiment module 30”), “and wherein the plurality of entries are associated with a plurality of features and corresponding feature values” ([0018] “Time-dependent data relate to the progression of an observable (also called a quantity, an attribute, a property, or a feature) in a sequence and/or through time (e.g., measured in successive periods of time)”)
“generate a first set of one or more machine learning models based on the first version of data” ([0017] “Machine learning systems 10 are configured for machine learning model selection, i.e., to facilitate the choice of appropriate machine learning model(s) 32 for a particular data analysis problem, e.g., to compare candidate machine learning models”);
“generate a second set of one or more machine learning models based on the second version of data” ([0036] “each machine learning model 32 may be tested (optionally exclusively) with an independent division of the dataset (which may or may not be a unique division for each machine learning model). The experiment module 30 may be configured to train the machine learning model(s) 32 with the respective training dataset(s) (to produce a trained model) and to evaluate the machine learning model(s) 32 with the respective evaluation dataset(s).”)
“combine the first set of one or more machine learning models and the second set of one or more machine learning models to generate a combined ensemble model” ([0057] “Training and evaluating 106 may include repeatedly dividing 120 the dataset to perform multiple rounds of training 122 and evaluation 124 (i.e., rounds of validation) and combining 126 the (evaluation) results of the multiple rounds of training 122 and evaluation 124 to produce the performance result for each machine learning model” and [0023] “Machine learning model 32 may be a macro-procedure 36 that combines the outcomes of an ensemble of micro-procedures 38”);
“wherein the prediction indicates abnormal behavior associated with the input dataset” ([0044] “For two-class classification schemes, accuracy is the total number of true positives and true negatives divided by the total population. For regression problems, accuracy may be an error measure such as mean square error.” where negative is an abnormality); and
“a memory coupled to the processor and configured to provide the processor with instructions” ([0013] “The storage unit 14 (also called a computer-readable storage unit) is one or more devices configured to store computer-readable information”).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with those of Stur since “there exists a need for comparing machine learning models for applicability to various specific problems,” a need that Stur addresses. Comparing candidate machine learning models as taught by Stur when practicing Wang would therefore allow better and more accurate models to be deployed.
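For illustration only, the time decay factor quoted from Wang above can be sketched as an exponential-smoothing update of per-class percentages. The update rule shown (old value weighted by θ, the 0/1 indicator weighted by 1 − θ) is the standard exponential-smoothing form and is an assumption here, as are the function name, the class labels, and the value θ = 0.5; none of these are taken from the record.

```python
def update_class_percentages(percentages, true_label, theta=0.9):
    """Return time-decayed class percentages after observing one example.

    percentages: dict mapping class label -> current decayed percentage.
    true_label:  class label of the newly arrived example.
    theta:       time decay factor, 0 < theta < 1 (assumed value).
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    updated = {}
    for label, old in percentages.items():
        # Indicator is 1 if the new example belongs to this class, else 0.
        indicator = 1.0 if label == true_label else 0.0
        # Older data is discounted by theta at every step.
        updated[label] = theta * old + (1.0 - theta) * indicator
    return updated


# Hypothetical stream of labels arriving one at a time.
percentages = {"normal": 0.5, "abnormal": 0.5}
for label in ["normal", "normal", "normal", "abnormal"]:
    percentages = update_class_percentages(percentages, label, theta=0.5)
```

With a smaller θ, the most recent examples dominate the estimate, which is the sense in which older data "affect the class percentage less along with time."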
Wang and Stur, however, do not explicitly teach baseline or volatility values. Gerard teaches “determine a baseline value for a feature of the plurality of features, wherein the baseline value for the feature corresponds with a first statistical value of the feature for the first period of time” ([0041] “Document ingestion subsystem 300 ingests baseline source documents into knowledge base 106”);
“determine a volatility value for the feature, wherein the volatility value for the feature corresponds with a second statistical value of the feature for the second period of time” ([0044] “Machine-learning model subsystem 320 occasionally computes a subsequent distribution, which includes using subsequent feature vectors 380 in addition to baseline feature vectors 350 and labeled feature vectors 355 to determine if the subsequent distribution differs from the baseline distribution (see FIG. 7 and corresponding text for further details).”); 
“compare the baseline value with the volatility value” ([0045] “When machine-learning model subsystem 320 determines that the distribution difference between the baseline distribution and the updated distribution reaches a distribution difference threshold”); and
“in response to a determination that the baseline value varies from the volatility value more than a threshold, retrain at least one machine learning model” ([0045] “machine-learning model subsystem 320 generates an indicator to retrain the machine-learning model due to the shift in the feature vector distribution.”);
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Wang and Stur with that of Gerard since “Many source documents, however, may include time-dated information due to changing world conditions. For example, publishers typically publish textbooks every few years and publish journals on a monthly or quarterly basis. As such, question answer systems may require retraining at particular points in time.” Gerard [0003]. In a broad sense, Gerard solves the problem of having outdated information in a machine learning system by periodically checking the data in order to make sure it is still accurate and not out of date. This allows for better and more accurate machine learning. 
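For illustration only, the claimed sequence of determining a baseline value, determining a volatility value, comparing the two, and retraining when they differ by more than a threshold can be sketched as follows. This is not Gerard's implementation: Gerard compares feature vector distributions, whereas this sketch assumes the arithmetic mean as the statistical value, an absolute difference as the comparison, and hypothetical names and data throughout.

```python
from statistics import mean


def needs_retraining(feature_values, short_window, threshold):
    """Compare a long-period baseline with a short-period volatility value.

    feature_values: chronological values of one feature (oldest first).
    short_window:   number of most recent entries forming the second period.
    threshold:      maximum tolerated absolute difference (assumed criterion).
    """
    if not 0 < short_window < len(feature_values):
        raise ValueError("short_window must be smaller than the full history")
    baseline = mean(feature_values)                    # first (longer) period
    volatility = mean(feature_values[-short_window:])  # second (shorter) period
    return abs(baseline - volatility) > threshold, baseline, volatility


# Hypothetical feature history whose two most recent values jump sharply.
history = [10, 11, 9, 10, 10, 11, 9, 10, 30, 32]
retrain, baseline, volatility = needs_retraining(
    history, short_window=2, threshold=5.0
)
```

In this example the short-period value departs from the long-period value by more than the threshold, so the function signals that retraining is warranted; a flat history would not trigger it.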
Wang, Stur, and Gerard, however, do not explicitly teach retraining based on the volatile feature. Luo teaches “comprising to: retrain the at least one machine learning model based on the volatile feature to tune a weight associated with the volatile feature” ([0030] “Impact estimate 106 provides an estimated impact to the performance of the initial classification model based on the extent to which modified data 117 differs from the baseline aggregated data. An example primary impact to the performance of the initial model can be a reduction in the accuracy of security classifications/security labels generated for documents created and stored within the example computer network” and [0046] “Current model 110 may eventually require retraining to generate a new model 114 that uses more recent inference associations. Retraining can depend on the scope of the modifications/changes to the documents and how those changes effect the legitimacy of the initial inference associations were used to train model 110”).
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Wang, Stur, and Gerard with that of Luo since by updating models based on a difference between baseline data and new data, one can keep models up to date and more secure.
Note that independent claims 14 and 20 recite substantially the same subject matter as independent claim 1, differing only in embodiment. As such, these claims are subject to the same rejection. The difference in embodiment, namely a method and non-transitory computer-readable media, is taught by Stur (abstract and [0013] “The persistent storage 18 is one or more computer-readable storage devices that are non-transitory”).
Regarding claims 5 and 19, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the processor is further configured to generate one or more new features based at least in part on the plurality of features” ([0030] “Feature selection and feature extraction are other common tasks of data preprocessor 24 and a class of algorithms that may be present in the preprocessing algorithm library 26. Feature selection generally selects a subset of the input data values. Feature extraction, which also may be referred to as dimensionality reduction, generally transforms one or more input data values into a new data value”).
Regarding claim 6, the Wang, Stur, Gerard, and Luo references have been addressed above. Wang further teaches “wherein the processor is further configured to: generate at least one machine …” (pg. 1357 §2.2 ¶1 “it builds multiple base classifiers and each classifier is trained K times by using the current training example, where K follows the Poisson(λ = 1) distribution”).
	Regarding claim 7, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the combined ensemble model is configured to output one or more predictions” ([0025] “Though the individual, trained, micro-procedures 38 may be reliable, robust, and/or stable in predicting output data (the outcome), the combination of the micro-procedure outcomes may be more reliable, robust, and/or stable than any individual outcome”).
	Regarding claim 8, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the processor is further configured to: determine a distribution of feature values associated with at least one feature” ([0036] “Generally, the training dataset and the evaluation dataset are identically and independently distributed, i.e., the training dataset and the evaluation dataset have no overlap of data and show substantially the same statistical distribution.”);
“select the feature based at least in part on the distribution of feature values” ([0027] “the data preprocessor 24 may be configured to discretize, to apply independent component analysis to, to apply principal component analysis to, to eliminate missing data from (e.g., to remove records and/or to estimate data), to select features from, and/or to extract features from the dataset”); and
“train a machine learning model based at least in part on the selected at least one feature” ([0061] “Methods 100 may include building 114 a deployable machine learning model corresponding to one or more of the machine learning models. Building 114 a deployable machine learning model includes training the corresponding machine learning model with the entire input feature dataset (as optionally preprocessed)”).
	Regarding claim 9, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the processor is further configured to apply production data to the first set of one or more machine learning models and the second set of one or more machine learning models” ([0033] “For each of the machine learning models 32, experiment module 30 is configured to perform supervised learning using the same dataset (the input feature dataset, received from the data input module 20 and/or the data preprocessor 24, and/or data derived from the input feature dataset).”).
	Regarding claims 10 and 15, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the processor is further configured to: receive an indication of one or more false positives” ([0042] “The performance result for each machine learning model 32 may include different types of indicators, values, and/or results (e.g., one performance result may include a confidence interval and one performance result may include a false positive rate).”); and
“retrain at least one machine learning model based at least in part on the indication” ([0061] “Building 114 a deployable machine learning model includes training the corresponding machine learning model with the entire input feature dataset (as optionally preprocessed). Thus, the deployable machine learning model is trained with all available data rather than just a subset (the training dataset). Building 114 may be performed after comparing the machine learning models with the performance comparison statistics and selecting one or more of the machine learning models to deploy”).
Regarding claims 12 and 17, the Wang, Stur, Gerard, and Luo references have been addressed above. Wang further teaches “wherein the processor is further configured to: sample a subset of one …” (pg. 1357 right col. last ¶ “Resampling in OOB and UOB is performed through the parameter of Poisson distribution”);
“determine whether a trigger event is satisfied” (pg. 1358 ¶1 “If the new training example belongs to the minority class, OOB increases value K, which decides how many times to use this example for training. Similarly, if it belongs to the majority class, UOB decreases K”); and
“retrain at least one machine learning model based on the determination” (pg. 1357 §2.2 ¶1 “it builds multiple base classifiers and each classifier is trained K times by using the current training example, where K follows the Poisson(λ = 1) distribution”).
	Regarding claim 13, the Wang, Stur, Gerard, and Luo references have been addressed above. Wang further teaches “wherein the first set of one or more machine learning models and/or the second set of one or more machine learning models are periodically trained” (pg. 1357 §2.2 ¶1 “it builds multiple base classifiers and each classifier is trained K times by using the current training example, where K follows the Poisson(λ = 1) distribution”).
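For illustration only, the Poisson-based resampling quoted from Wang, in which each base classifier is trained K times on the current example with K drawn from a Poisson distribution, can be sketched as follows. The base learner is a hypothetical toy that only counts labels and ignores input features, and the Poisson draw uses Knuth's multiplication method from the standard library rather than any routine named in the record. Per the cited passage, OOB and UOB adjust K for minority and majority examples; here that would correspond to passing a different `lam`, which is an assumption about how one might parameterize it.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw K ~ Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class CountingClassifier:
    """Toy base learner (hypothetical): predicts the label it has seen most."""

    def __init__(self):
        self.counts = {}

    def learn(self, label):
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self):
        return max(self.counts, key=self.counts.get) if self.counts else None


def online_bagging_step(ensemble, label, rng, lam=1.0):
    """Train each base classifier K times on the current example."""
    for clf in ensemble:
        for _ in range(sample_poisson(lam, rng)):
            clf.learn(label)


# Hypothetical usage: five base classifiers updated one example at a time.
rng = random.Random(0)
ensemble = [CountingClassifier() for _ in range(5)]
for label in ["normal"] * 20:
    online_bagging_step(ensemble, label, rng)
```

Because K may be zero for a given classifier and example, each base classifier effectively sees a different resampled version of the stream, which is what gives the ensemble its diversity.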
Regarding claim 18, the Wang, Stur, Gerard, and Luo references have been addressed above. Stur further teaches “wherein the input data is comprised of a plurality of entries” ([0019] “Data input module 20 is configured to receive a selection, e.g., a selection from a user, of machine learning models 32 and a dataset, such as a time-dependent dataset. Thus, machine learning systems 10 are configured to receive the dataset. The dataset, also called the input dataset, may be in a common format to interface with the machine learning models 32 and/or the experiment module 30”) and “wherein the plurality of entries are associated with a plurality of features and corresponding feature values” ([0018] “Time-dependent data relate to the progression of an observable (also called a quantity, an attribute, a property, or a feature) in a sequence and/or through time (e.g., measured in successive periods of time)”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Examiner
Art Unit 2124



/Kevin W Figueroa/Examiner, Art Unit 2124