DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-3, 5-10, 12-16, and 18-20 are presented for examination.

Response to Amendment
	Applicant’s amendment has obviated some, but not all, of the objections to the specification, drawings, and claims given in the previous Office Actions.  To the extent that an objection or rejection appears in the previous Office Action(s) but not this Office Action, that objection or rejection is withdrawn.  To the extent that it appears both in a previous Office Action(s) and this Office Action, the objection or rejection is maintained.
Applicant’s amendment has also obviated the rejections under 35 USC § 101.  Therefore, those rejections are withdrawn.

Drawings
The drawings are objected to because (a) in Fig. 3C, reference character 68, “data now exceeds” should be “data now exceed”; (b) in Fig. 4, reference character 100, “a same training data” should be “same training data”; and (c) in Fig. 5, reference character 204, “was submitted” should be “were submitted”.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 4, “data that is used” and “data that is fed … is also fed” should be “data that are used” and “data that are fed … are also fed”, respectively.   Examiner notes that “data” is the plural of “datum” and that the specification contains multiple instances of the term “data” being used as singular, which will not be further enumerated here.  Examiner requests that all such instances be corrected.  For Applicant’s convenience, Examiner has attached a marked-up copy of the specification indicating where else in the specification this error has occurred.
In paragraphs 5-7, “a same training data” should be “same training data”.
In paragraph 14, “a predetermined criteria” should be “a predetermined criterion”.
In paragraph 39, “different distribution than” should be “different distribution from”.
Appropriate correction is required.
The abstract of the disclosure is objected to because “a same training data” should be “the same training data”.  Correction is required.  See MPEP § 608.01(b).

Claim Objections
Claim 6 is objected to because of the following informalities:  “data deviates” should be “data deviate”; “a predetermined criteria” should be “a predetermined criterion”.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 5-7, 9, 13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Maughan et al. (US 20170330109) (“Maughan”) in view of Jose et al., “Binary Hashing Using Siamese Neural Networks,” in 2017 IEEE Int’l Conf. Image Processing 2916-20 (2017) (“Jose”) and further in view of Guo et al. (US 20190130037) (“Guo”).
	Regarding claim 1, Maughan discloses “a method (may also be embodied as a non-transitory computer-readable storage medium (see Maughan, paragraph 8) or as a device comprising a memory and a processor (see Maughan, paragraphs 27 and 31)) comprising: …
	determining a deviation of … operational input data from … training data by comparing the operational input data to the training data (drift detection module may perform a statistical analysis of one or more inputs and/or outputs to determine drift; for example, the drift detection module may compare a statistical distribution of outcomes from the prediction module [operational input data] to a statistical distribution of training data – Maughan, paragraph 63); 
generating … a drift signal that characterizes the deviation of the operational input data from the training data (drift detection module may perform a statistical analysis of one or more inputs and/or outputs to determine drift; for example, the drift detection module may compare a statistical distribution of outcomes from the prediction module [operational input data] to a statistical distribution of training data – Maughan, paragraph 63; drift detection module may use a binary classification (e.g., training data labeled with a “0” and workload data [operational input data] labeled with a “1”) and if the drift detection module can tell the difference between the classes, a drift has occurred – id. at paragraph 65); and 
based on the drift signal exceeding a predetermined threshold, retraining [a] predictive learning model based on the operational input data (drift phenomenon refers to a detectable change, or to a change that violates a threshold, in one or more inputs and/or outputs for a model – Maughan, paragraph 58; retrain module is configured to retrain the model used by the prediction module in response to the drift detection module detecting the drift phenomenon – id. at paragraph 75; after the prediction module generates predictive results by applying a model to workload data and a drift phenomenon is detected, retrain module prompts a user to select whether to use new training data or modified training data – id. at paragraph 163 [so the retraining is done in response to the detection of drift in workload data – i.e., based on them]).”
Jose discloses “training a sidecar learning model based on training data used to train a predictive learning model (Siamese neural network consists of two feedforward branches with weights shared between the branches – Jose, sec. 2, second paragraph; training of the networks is done on image pairs organized as similar and dissimilar – id. at sec. 3; see also Fig. 1 (showing that one image of the image pair is sent to the top branch and the other is sent to the bottom branch, so the training of the top branch of the network [predictive learning model] by the first images of the pair is “based on” the training data consisting of the second images of the pairs that are fed into the bottom branch [part of sidecar learning model] insofar as the first and second images of the pair are related by a predetermined similarity or dissimilarity criterion)); 
directly receiving, by the sidecar learning model, operational input data directly submitted to the predictive learning model (Jose Fig. 1 and accompanying caption show that the input is fed to the neural networks as image pairs and that each image of the pair is fed directly to a branch of the two branches [note that the entire set of image pairs is considered to be collectively “operational input data,” so that elements of the same dataset that are fed into the top branch [predictive learning model] are also fed into the bottom branch [part of sidecar learning model]]); … [and]
generating, by the sidecar learning model, a drift signal that characterizes the deviation of [one set of] data from [another set of] data (loss function module [considered for purposes of examination to be part of the “sidecar learning model”] computes the distance [drift signal] between the feature vectors extracted from the two branches of the network; the network tries to minimize a loss function – Jose, sec. 2, second paragraph)….”
See Jose, sec. 2, first paragraph.
Neither Maughan nor Jose appears to disclose explicitly the further limitations of the claim.  However, Guo discloses that “the sidecar learning model compris[es] an unsupervised learning model and the predictive learning model compris[es] a supervised learning model (in a system for training two machine learning algorithms, the first and second machine learning algorithms may be selected from among many different supervised or unsupervised machine learning algorithms [so the first model may be supervised and the second unsupervised] – Guo, paragraph 51; see also Fig. 4 (showing that the first and second machine learning models are trained on the same features, so the second ML algorithm is a “sidecar” model))….”
Guo and the instant application both relate to the training of multiple machine learning models and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Maughan and Jose to train one model in an unsupervised fashion and another in a supervised fashion, as disclosed by Guo, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the user greater flexibility to tailor the models to his own needs.  See Guo, paragraph 51.

Claim 9 is a device claim corresponding to method claim 1 and is rejected for the same reasons as given in the rejection of that claim.  Similarly, claim 15 is a non-transitory computer-readable medium claim corresponding to method claim 1 and is rejected for the same reasons as given in the rejection of that claim.

Regarding claim 5, Maughan, as modified by Jose and Guo, discloses “automatically generating the sidecar learning model (in a training algorithm for Siamese neural networks, similar and dissimilar pairs are automatically generated; the dissimilar pairs are sampled based on each minibatch and an argmin is computed on a subset of the data – Jose, sec. 3 and Algorithm 1 [note that the generation of training samples and the training of the networks, including the bottom branch/sidecar learning model, are considered to be part of the generation of the model]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Maughan/Guo to generate the sidecar model automatically, as disclosed by Jose, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the need for manual generation and training of the model.  See Jose, sec. 3.

Regarding claim 6, Maughan, as modified by Jose and Guo, discloses that “generating the drift signal that characterizes the deviation of the operational input data from the training data further comprises generating an alert that indicates the operational input data deviates from the training data by a predetermined criteri[on] (predict-time fix module uses an indication module to modify at least one predictive result from the prediction module so that the modified predictive result(s) include an indicator [alert] of the drift phenomenon detected by the drift detection module; an indicator may be a simple binary flag, a description of the drift phenomenon, a link to a description of the drift phenomenon, or the like – Maughan, paragraph 91; drift detection module may determine a baseline variation in data by performing a binary classification and may set a threshold for subsequent binary classifications based on the baseline (e.g., in response to detecting a 3% baseline variation, the drift detection module may set a threshold for detecting drift higher than 3%) [predetermined criterion] – id. at paragraph 65).”

predictive analytics module provides a predictive analytics framework allowing clients to receive predictive results such as a classification or a confidence metric – Maughan, paragraph 38; drift detection module may monitor and/or analyze confidence metrics from the prediction module to detect drift (e.g., if a distribution of confidence metrics becomes bimodal and/or exhibits a different change) – id. at paragraph 64 [so the confidence level is based on the drift signal]).”

Claim 13 is a device claim corresponding to method claim 7 and is rejected for the same reasons as given in the rejection of that claim.  Similarly, claim 19 is a non-transitory computer-readable medium claim corresponding to method claim 7 and is rejected for the same reasons as given in the rejection of that claim.

Claims 2, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Maughan in view of Jose and Guo and further in view of Ouyang (US 20180367573) (“Ouyang”).
	Regarding claim 2, Maughan, as modified by Jose and Guo, teaches “receiving the training data (predictive analytics module may apply a model to workload data to produce predictive results; learned functions of the model may be based on training data [implying that the training data have been received] – Maughan, paragraph 48); [and]
	modeling, by the sidecar learning model, a … distribution of the training data (drift detection module may compare a statistical distribution of outcomes from the prediction module to a statistical distribution of initialization data such as training data [implying that the distribution of the training data has previously been modeled by the ensemble, including the sidecar model] – Maughan, paragraph 63); … 
wherein determining the deviation of the operational input data from the training data comprises comparing the … distribution of the training data to the … distribution of the operational data (drift detection module may compare a statistical distribution of outcomes from the prediction module to a statistical distribution of initialization data such as training data – Maughan, paragraph 63).” 
Neither Maughan, Guo, nor Jose appears to disclose explicitly the further limitations of the claim.  However, Ouyang discloses “modeling … a joint distribution of the training data (experiment on a very fast decision tree shows that the underlying joint distribution P(X, Y) is not constant; there exists concept drift within the observed features and the label in sampled data; the reason for this can be attributed to the use of a real-time trustworthiness estimation as the label Y for training data [suggesting that the joint distribution of the training data is modeled] – Ouyang, paragraph 46);
wherein determining the deviation … comprises comparing the joint distribution of the training data to the joint distribution of [other] data (experiment on a very fast decision tree shows that the underlying joint distribution P(X, Y) is not constant; there exists concept drift within the observed features and the label in sampled data; the reason for this can be attributed to the use of a real-time trustworthiness estimation as the label Y for training data; the real-time estimation might deviate from the real ground-truth [suggesting that the system compares the joint distribution of the training data at one time to the joint distribution of the data at another time] – Ouyang, paragraph 46).”
Ouyang and the instant application both relate to concept drift and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Maughan, Guo, and Jose to take the joint distribution of the training data into account when determining whether concept drift has occurred, as disclosed by Ouyang, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would take both the features of the underlying data sample and the labels into account in determining whether concept drift has occurred, thereby potentially improving the accuracy of the concept drift determination when compared to a system in which only the feature drift or only the label drift is considered.  See Ouyang, paragraph 46.

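Claim 10 is a device claim corresponding to method claim 2 and is rejected for the same reasons as given in the rejection of that claim.  Similarly, claim 16 is a non-transitory computer-readable medium claim corresponding to method claim 2 and is rejected for the same reasons as given in the rejection of that claim.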

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Maughan in view of Jose, Guo, and Ouyang and further in view of Brand et al. (US 20160071027) (“Brand”).
Regarding claim 3, neither Maughan, Jose, Guo, nor Ouyang appears to disclose explicitly the further limitations of the claim.  However, Brand discloses that “the sidecar learning model comprises one of a Gaussian mixture model, a self organizing map, an auto-encoding neural network, and a Mahalanobis-Taguchi system (in a system for detecting trends in event streams, a system for processing the event stream includes a central modeler that can perform a Gaussian mixture model process on the event stream – Brand, paragraph 136 and Fig. 8; see also Fig. 9 [showing that this system including the Gaussian mixture model processes the event stream in parallel with another system, thereby qualifying it as a “sidecar learning model”]).” 
Brand and the instant application both relate to concept drift and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Maughan, Jose, Guo, and Ouyang to make the sidecar learning model comprise a Gaussian mixture model, as disclosed by Brand, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to model multimodal distributions of data, thereby making the system more robust.  See Brand, paragraph 136.

Claims 8, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Maughan in view of Jose and Guo and further in view of Das et al. (US 20170206469) (“Das”).
Regarding claim 8, neither Maughan, Guo, nor Jose appears to disclose explicitly the further limitations of the claim.  However, Das discloses “presenting, in a user interface, a real-time graph that depicts the deviation of the operational input data from the training data (once trained, a predictive model may be used to predict a label based on test measurements, but once trained, the predictive model may become out-of-date and inaccurate [i.e., the training data distribution may deviate from the operational data distribution] – Das, paragraphs 40-42; a threshold representing the predictive model may remain constant although the data drift; therefore, over time, the predictive model may become out of date and inaccurate – id. at paragraph 29; output may be provided within a GUI – id. at paragraph 35; see also Figs. 3A-D [showing graphical representations of the deviation of the data from the original/training data distribution in Fig. 3A to a later/operational input data distribution in Fig. 3D, taken over consecutive periods of time [i.e., in real-time]]).”
Das and the instant application both relate to concept drift and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Maughan, Guo, and Jose to display graphs showing the deviation of the operational input data from the original data, as disclosed by Das, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would provide the user with up-to-date information about the accuracy of the classifier, thereby indicating to the user when retraining needs to occur and preventing inaccurate classifications.  See Das, paragraph 29.

Claim 14 is a device claim corresponding to method claim 8 and is rejected for the same reasons as given in the rejection of that claim.  Similarly, claim 20 is a non-transitory computer-readable medium claim corresponding to method claim 8 and is rejected for the same reasons as given in the rejection of that claim.

Claims 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Maughan in view of Jose and Guo and further in view of Yoo et al. (US 20180068218) (“Yoo”).
Regarding claim 12, neither Maughan, Guo, nor Jose appears to disclose explicitly the further limitations of the claim.  However, Yoo discloses that “the processor device is further to: 
receive a request to train the predictive learning model (training apparatus generates a second neural network 920 [sidecar learning model] having the same layer structure as that of the first neural network 810 [predictive learning model] to train the neural network structure 800 – Yoo, paragraph 110; see also Fig. 9); and 
in response to the request, automatically generate the sidecar learning model (training apparatus generates a second neural network 920 [sidecar learning model] having the same layer structure as that of the first neural network 810 [predictive learning model] to train the neural network structure 800 – Yoo, paragraph 110 [so the second network is generated for the purpose of training the first network – i.e., in response to a request to train it]; see also Fig. 9).”
Yoo and the instant application both relate to the training of multiple neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Maughan, Guo, and Jose to generate the sidecar learning model automatically in response to a request to train the predictive learning model, as disclosed by Yoo, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would eliminate the need for manual generation of multiple networks, thereby saving human time and energy.  See Yoo, paragraph 110.

Claim 18 is a non-transitory computer-readable medium claim corresponding to device claim 12 and is rejected for the same reasons as given in the rejection of that claim.

Response to Arguments
Applicant's arguments filed December 21, 2021 (“Remarks”) have been fully considered but they are, to the extent not rendered moot by the introduction of new grounds of rejection, not persuasive.
Applicant first argues that the term “data” should not be required to be used only in the plural form because (a) MPEP § 608.01 allegedly does not give Examiners the authority to require grammatical corrections; (b) the MPEP’s prohibition on requiring American spelling allegedly indicates that it contemplates variations in writing style; and (c) the requirement allegedly stems from a personal preference of Examiner rather than an actual grammatical issue.  Remarks at 8.  Regarding (a), the above-cited section See also Wash. St. U., Data/Datum, https://brians.wsu.edu/2016/05/24/data-datum/ (“[W]riters addressing an international audience of nonspecialists would probably be safer treating “data” as plural.”); APA Style Blog, Data Is, or Data Are?, https://blog.apastyle.org/apastyle/2012/07/data-is-or-data-are.html (“Scientific results are built upon testing things multiple times across multiple people, and we draw conclusions from the aggregate, not the individual, data points. Therefore, when referring to the collective results, be sure to use the plural form[.]”); IEEE Editorial Style Manual for Authors 20 (2021), http://journals.ieeeauthorcenter.ieee.org/wp-content/uploads/sites/7/IEEE-Editorial-Style-Manual_081920.pdf (“The data were collected … (always plural)”).  As such, avoiding the singular use of “data” is not a matter of personal taste, but a function of the way the English language is objectively constructed.  The objection is maintained and will continue to be maintained until the requested correction is made.
Applicant’s arguments with respect to the applicability of the previously cited art to the claims as amended, Remarks at 9-11, are moot in light of the addition of Guo to the rejection.

Conclusion 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849.  The examiner can normally be reached on M-R 7a-5:30p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.C.V./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125