DETAILED ACTION
This action is in response to claims filed 13 June 2022 for application 17654194 filed 09 March 2022. Currently claims 2-15 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-6, 8-13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen et al. (Heterogeneous Ensemble for Feature Drifts in Data Streams) in view of Grmanova et al. (Incremental Ensemble Learning for Electricity Load Forecasting).

Regarding claims 2 and 9, Nguyen discloses: A machine learning computer system comprising: 
at least one processor core (“The experiments were conducted on a Windows PC with a Pentium D 3GHz Intel processor and 2GB memory. To enable more meaningful comparisons, we try to use the same parameter values for all the algorithms.” P9 §4.1 ¶2 Nguyen); and 
a memory in communication with the at least one processor core, wherein the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to iteratively train an ensemble with a set of training data items (“The experiments were conducted on a Windows PC with a Pentium D 3GHz Intel processor and 2GB memory. To enable more meaningful comparisons, we try to use the same parameter values for all the algorithms.” P9 §4.1 ¶2, “In this paper, we address the above problems by presenting a novel framework to integrate feature selection techniques and ensemble learning for data streams. To alleviate ensemble updating, we propose a new concept of “feature drifts" and use it to optimize the updating process. With a gradual drift, each classifier member is updated in a real-time manner. When a feature drift occurs, which represents a significant change in the underlying distribution of the dataset, we train a new classifier to replace an outdated classifier in the ensemble.” P2 ¶2), 
wherein: 
the ensemble comprises multiple machine learning ("ML") ensemble members (“To alleviate ensemble updating, we propose a new concept of “feature drifts" and use it to optimize the updating process. With a gradual drift, each classifier member is updated in a real-time manner. When a feature drift occurs, which represents a significant change in the underlying distribution of the dataset, we train a new classifier to replace an outdated classifier in the ensemble.” P2 ¶2, Fig 1); and 
the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to train the ensemble in a first iteration of the training by selectively directing certain training data items in the set of training data items to, for each of the certain training data items, a selected set of the multiple ML ensemble members (“To alleviate ensemble updating, we propose a new concept of “feature drifts" and use it to optimize the updating process. With a gradual drift, each classifier member is updated in a real-time manner. When a feature drift occurs, which represents a significant change in the underlying distribution of the dataset, we train a new classifier to replace an outdated classifier in the ensemble.” P2 ¶2, Fig 1, note: each ML model (classifier) is trained on a different subset of training data, a selected set of the multiple ML models is interpreted as one or more classifiers), wherein, for each of the certain training data items in the first iteration: 
the selected set of the multiple ML ensemble members comprises one or more, but less than all, of the multiple ML ensemble members (“To alleviate ensemble updating, we propose a new concept of “feature drifts" and use it to optimize the updating process. With a gradual drift, each classifier member is updated in a real-time manner. When a feature drift occurs, which represents a significant change in the underlying distribution of the dataset, we train a new classifier to replace an outdated classifier in the ensemble.” P2 ¶2, Fig 1, note: a new classifier corresponds to the one or more of the multiple ML ensemble members); 
there is an unselected set of the multiple ML ensemble members that comprises one or more ML ensemble members in the set of the multiple ML ensemble that are not in the selected set (Fig 1, note: when feature drift occurs, the previous classifiers are the unselected set of the multiple ML ensemble members.); 
the selected set of the multiple ML ensemble members uses the certain training item in the training of the selected set of the multiple ML ensemble members (Fig 1, note: the new feature set which has feature drift is used to train the selected model); and 
the unselected set of the multiple ML ensemble members do not use the certain training item in the training of the unselected set of the multiple ML ensemble members (Fig 1, note: the new feature selection training data which has feature drift is used to train the selected model. The previous models do not use the new feature set.).
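For illustration only, the selective direction of training data items recited in claims 2 and 9 can be sketched as follows. The names Member, direct_items and select_by_index are hypothetical and appear in neither the claims nor the cited references, and the training step is a stand-in that only records which items each member received.

```python
class Member:
    """Hypothetical ensemble member; train_on is a stand-in for a real update."""

    def __init__(self, name):
        self.name = name
        self.seen = []  # training items this member has been trained on

    def train_on(self, item):
        self.seen.append(item)


def direct_items(members, items, select):
    """Direct each item to a selected set of members (one or more, but not all)."""
    for item in items:
        chosen = select(item, members)
        if not 0 < len(chosen) < len(members):
            raise ValueError("selected set must be non-empty and less than all members")
        for member in chosen:
            member.train_on(item)  # unselected members never see this item


def select_by_index(item, members):
    """Assumed selection rule for the sketch: pick one member by item index."""
    return [members[item % len(members)]]


members = [Member("a"), Member("b"), Member("c")]
direct_items(members, [0, 1, 2, 3], select_by_index)
```

In this sketch, every item is used by exactly one member and is withheld from the other two at the time it is directed.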
	
	However, Nguyen does not explicitly disclose: 
…at a time that the certain training data item is directed to the selected set of the multiple ML ensemble members, such that the unselected set…;
at the time that the certain training data item is directed to the selected set of the multiple ML ensemble members.

Grmanova teaches: …at a time that the certain training data item is directed to the selected set of the multiple ML ensemble members, such that the unselected set…;
at the time that the certain training data item is directed to the selected set of the multiple ML ensemble members (“The proposed ensemble model incorporates several types of models for capturing different seasonal dependencies. The models differ in algorithm, size of data chunk and training period (see Figure 1). Different algorithms are assumed in order to increase the diversity of the models. The size of each data chunk is chosen in order to capture particular seasonal variation, e.g. data from the last 4 days for daily seasonal dependence. However, the model that is trained on a data chunk of 4 days’ data, can be trained again as soon as the data from the next day (using a 1-day training period) are available. The new data chunk overlaps with the previous one in 3 days” p103 §4.1 ¶1, Fig 1, see also all of §4).
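For illustration only, the chunk arithmetic in the quoted Grmanova passage (a 4-day data chunk retrained with a 1-day training period) can be sketched as follows. The function names and the use of integer day indices are assumptions made for the sketch and are not taken from the reference.

```python
def chunk_windows(chunk_days, period_days, n_chunks):
    """Return (first_day, last_day) day indices for successive data chunks."""
    return [
        (i * period_days, i * period_days + chunk_days - 1)
        for i in range(n_chunks)
    ]


def overlap_days(a, b):
    """Number of days shared by two inclusive (first_day, last_day) windows."""
    first = max(a[0], b[0])
    last = min(a[1], b[1])
    return max(0, last - first + 1)


# 4-day chunk, retrained every 1 day, as in the quoted passage
chunks = chunk_windows(chunk_days=4, period_days=1, n_chunks=3)
shared = overlap_days(chunks[0], chunks[1])
```

Under these assumptions, consecutive chunks share 3 days, consistent with the statement that the new data chunk overlaps with the previous one in 3 days.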

	Nguyen and Grmanova are both in the same field of endeavor of ensemble learning methods and are analogous. Nguyen teaches a system that trains a new machine learning model as it is needed with a subset of training data. Grmanova teaches training pieces of the ensemble with new data while leaving other ensemble members unaffected. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training method taught by Nguyen to only retrain certain ensemble members as taught by Grmanova. One would have been motivated to train only certain members to avoid wasting resources retraining models that will receive no benefit from retraining, for example, due to long-term seasonal variations (Grmanova §4.1 ¶1).

Regarding claims 3 and 10, Nguyen discloses: The machine learning computer system of claim 2, wherein the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to selectively direct the certain training data items in the set of training data items to, for each of the certain training data items, the selected set of the multiple ML ensemble members based on a control signal from a system that is not a member of the ensemble (“We choose to the sliding window version of FCBF so that it has low time and space complexities. Incoming data is stored in a buffer (window) with a predefined size. Next, the matrix of symmetrical uncertainty values is computed to select the most relevant feature subset. The process is performed in a sliding window fashion, and the selected feature subsets are monitored to detect feature drifts. When two consecutive subsets are different, we postulate that a feature drift has occurred.” P6 §3.1 ¶3, Fig 1, note: the Feature drift detector is interpreted as the control system).
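For illustration only, the drift test in the quoted Nguyen passage (a feature drift is postulated when two consecutive selected feature subsets differ) can be sketched as follows. The sketch assumes the per-window feature subsets are already given; it does not implement FCBF or the symmetrical uncertainty computation, and the function name and sample feature names are hypothetical.

```python
def detect_feature_drifts(selected_subsets):
    """Return the window indices at which the selected feature subset changes."""
    drifts = []
    previous = None
    for index, subset in enumerate(selected_subsets):
        current = frozenset(subset)  # order of features does not matter
        if previous is not None and current != previous:
            drifts.append(index)  # consecutive subsets differ: postulate a drift
        previous = current
    return drifts


# Hypothetical feature subsets selected in four consecutive windows
subsets_per_window = [{"f1", "f2"}, {"f2", "f1"}, {"f1", "f3"}, {"f1", "f3"}]
drift_at = detect_feature_drifts(subsets_per_window)
```

Here a drift is flagged only at the third window (index 2), where the selected subset changes from {f1, f2} to {f1, f3}.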

Regarding claims 4 and 11, Nguyen discloses: The machine learning computer system of claim 2, wherein the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to classify, with a classifier, the training data items for the ensemble, wherein outputs of the classifier are used to selectively direct the certain training data items in the set of training data items to, for each of the certain training data items, the selected set of the multiple ML ensemble member (“We choose to the sliding window version of FCBF so that it has low time and space complexities. Incoming data is stored in a buffer (window) with a predefined size. Next, the matrix of symmetrical uncertainty values is computed to select the most relevant feature subset. The process is performed in a sliding window fashion, and the selected feature subsets are monitored to detect feature drifts. When two consecutive subsets are different, we postulate that a feature drift has occurred.” P6 §3.1 ¶3, Fig 1, note: the Feature drift detector is interpreted as a classifier).

Regarding claims 5 and 12, Nguyen discloses: The machine learning computer system of claim 4, wherein the classifier is not trained with the ensemble (“We choose to the sliding window version of FCBF so that it has low time and space complexities. Incoming data is stored in a buffer (window) with a predefined size. Next, the matrix of symmetrical uncertainty values is computed to select the most relevant feature subset. The process is performed in a sliding window fashion, and the selected feature subsets are monitored to detect feature drifts. When two consecutive subsets are different, we postulate that a feature drift has occurred.” P6 §3.1 ¶3, Fig 1, note: the Feature drift detector is not trained).

Regarding claims 6 and 13, Nguyen discloses: The machine learning computer system of claim 5, wherein derivatives of error cost functions for the multiple ML ensemble members are not back-propagated to the classifier (“We choose to the sliding window version of FCBF so that it has low time and space complexities. Incoming data is stored in a buffer (window) with a predefined size. Next, the matrix of symmetrical uncertainty values is computed to select the most relevant feature subset. The process is performed in a sliding window fashion, and the selected feature subsets are monitored to detect feature drifts. When two consecutive subsets are different, we postulate that a feature drift has occurred.” P6 §3.1 ¶3, Fig 1, note: the Feature drift detector is not a neural network so nothing can be back-propagated to it).

Regarding claims 8 and 15, Nguyen discloses: The machine learning computer system of claim 2, wherein the ensemble comprises a heterogeneous mixture of machine learning models (“Heterogeneous Ensemble When constructing an ensemble learner, the diversity among member classifiers is expected as the key contributor to the accuracy of the ensemble. Furthermore, a heterogeneous ensemble that consists of different classifier types usually attains high diversity [11, 23]. Motivated by this observation, we construct a small heterogeneous ensemble rather than a big homogeneous ensemble with a large number of classifiers of the same type, which will compromise speed.” P6 §3.2 ¶1).

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Grmanova and further in view of Wang et al. (Mining Concept-Drifting Data Streams Using Ensemble Classifiers).

Regarding claims 7 and 14, Nguyen discloses a classifier that determines which ML models in a set of ML models are to be trained; however, Nguyen does not explicitly disclose: The machine learning computer system of claim 5, wherein the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to train the classifier to optimize a combination of a cost of errors by the ensemble and a cost of computation for the ensemble.

Wang teaches: wherein the memory stores executable instructions that, when executed by the at least one processor core, cause the at least one processor core to train the classifier to optimize a combination of a cost of errors by the ensemble and a cost of computation for the ensemble (“Cost-sensitive applications usually provide higher error tolerance. For instance, in credit card fraud detection, the decision threshold of whether to launch an investigation or not is: p(fraud|y) · t(y) > cost where t(y) is the amount of transaction y. In other words, as long as p(fraud|y) > cost/t(y), transaction y will be classified as fraud no matter what the exact value of p(fraud|y) is. For example, assuming t(y) = $900, cost = $90, both p(fraud|y) = 0.2 and p(fraud|y) = 0.4 result in the same prediction. This property helps reduce the “expected” number of classifiers needed in prediction.”, p230 §5.2 ¶1, “The final weighted probability, derived after all K classifiers are consulted, is FK(x). Let εk(x) = Fk(x) − FK(x) be the error at stage k. The question is, if we ignore εk(x) and use Fk(x) to decide whether to launch a fraud investigation or not, how much confidence do we have that using FK(x) would have reached the same decision?” p230 §5.2 ¶3, 
[image: media_image1.png] p231 §5.2 ¶5).
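For illustration only, the decision threshold in the quoted Wang passage, p(fraud|y) · t(y) > cost, can be sketched as follows using the values given in the quotation (t(y) = $900, cost = $90). The function names are hypothetical and are not taken from the reference.

```python
def launch_investigation(p_fraud, amount, cost):
    """Quoted decision rule: investigate when p(fraud|y) * t(y) > cost."""
    return p_fraud * amount > cost


def probability_threshold(amount, cost):
    """Equivalent form: investigate when p(fraud|y) > cost / t(y)."""
    return cost / amount


amount, cost = 900.0, 90.0  # t(y) and cost from the quoted example
threshold = probability_threshold(amount, cost)
decisions = [launch_investigation(p, amount, cost) for p in (0.2, 0.4)]
```

With these values the threshold on p(fraud|y) is 0.1, so both 0.2 and 0.4 lead to the same prediction, as the quotation states.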

Nguyen, Grmanova and Wang are all in the same field of endeavor of ensemble machine learning and are analogous. Nguyen teaches a system that trains a new machine learning model as it is needed with a subset of training data. Grmanova teaches retraining certain ensemble members. Wang teaches a system that determines which classifiers to use based on a cost of training/using the classifier and an error. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the classifier for determining feature drift disclosed by Nguyen and Grmanova with the cost-based classifier as taught by Wang. One would have been motivated to use cost-based measures to reduce time spent consulting classifiers (Wang p230 §5 ¶1).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yu et al. (Credit risk assessment with a multistage neural network ensemble learning approach) discloses training ensemble members with different subsets of data.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246. The examiner can normally be reached M-F: 7-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571)-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ERIC NILSSON/           Primary Examiner, Art Unit 2122