DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the application filed on 4/28/19.
	Claims 1-20 are pending.           
                                                                                                                                                                                              
Allowable Subject Matter
Claims 5, 6, 16, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 10, and 12 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sample et al. (US 20160147860, Herein “Sample”).
Regarding claim 1, Sample teaches A method comprising: 
obtaining a baseline dataset, wherein the baseline dataset comprising a first set of instances, each instance comprising feature values in a feature space, wherein each instance of the first set of instances is associated with a label (for a given dataset, analyze tiles of the respective dataset (e.g., first dataset) [0076] each tile corresponding with values, such as for which hash value is computed, representing the respective content [0077], each tile labeled using a hierarchical coordinate [0082]); 
determining a set of clusters in the feature space, based on the feature values of the first set of instances (for a first dataset (and a second dataset) such that comparison is made between datasets based on cluster analysis, each cluster based on hash value for each tile in a cluster, further based on some number of bytes corresponding to content such as unique content [0077]); 
determining a baseline distribution of instances over the set of clusters, wherein said determining the baseline distribution is based on the baseline dataset (distribution of hash values, such as for comparing distribution across clusters ([0078] and [0079])); 
for each cluster, computing a performance metric for a predictor for the each cluster, wherein the predictor is configured to estimate an estimated label for an instance, wherein the performance metric is indicative of a successful estimation of the predictor to a portion of the first set of instances that are comprised by the each cluster (hash value estimating an overall change for a given cluster of the respective dataset [0080], the hash being a label representing the respective unique content, such as a 256 byte label for a given cluster ([0077] and [0078]), each label or composite hash representing a state of a given cluster, such as for comparing changes or drifts between composite hashes of respective clusters of respective datasets [0079]); 
obtaining a second dataset, wherein the second dataset comprising a second set of instances, each of which comprising feature values in the feature space; determining a second distribution of instances over the set of clusters, wherein said determining the second distribution is based on the second dataset (hash value of second dataset for comparison of clusters of datasets [0080]); and 
based on the second distribution and on the baseline distribution, and based on at least one performance metric of at least one the cluster of the set of clusters, identifying a data drift in the second dataset with respect to the baseline dataset (determining changes made based on hashes and, even further, individual hashes for matching/drift analysis ([0079] and [0080])). 
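
For illustration only, the steps recited in claim 1 can be sketched as follows. The sketch is not drawn from Sample or from the application: the nearest-centroid cluster assignment, the use of accuracy as the performance metric, the weighting in drift_score, and the 0.05 threshold are all assumptions chosen to make the example concrete.

```python
import math
from collections import Counter

def nearest_cluster(x, centroids):
    # Assumption: clusters are represented by centroids in the feature space.
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def cluster_distribution(instances, centroids):
    # Fraction of instances that fall into each cluster.
    counts = Counter(nearest_cluster(x, centroids) for x in instances)
    return [counts[i] / len(instances) for i in range(len(centroids))]

def per_cluster_accuracy(instances, labels, predictor, centroids):
    # Performance metric (accuracy, by assumption) of the predictor per cluster.
    hits = [0] * len(centroids)
    totals = [0] * len(centroids)
    for x, y in zip(instances, labels):
        c = nearest_cluster(x, centroids)
        totals[c] += 1
        hits[c] += int(predictor(x) == y)
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]

def drift_score(baseline_dist, second_dist, accuracy):
    # Hypothetical score: a shift in a cluster's share of instances counts
    # more when the predictor performs worse on that cluster.
    return sum(abs(q - p) * (1.0 - a)
               for p, q, a in zip(baseline_dist, second_dist, accuracy))

# Toy data, invented for this example.
centroids = [(0.0, 0.0), (10.0, 10.0)]
baseline = [(0.0, 1.0), (1.0, 0.0), (0.0, 0.0), (10.0, 9.0)]
baseline_labels = [0, 0, 1, 1]
second = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]

def predictor(x):
    # Stand-in predictor: thresholds the first feature value.
    return 0 if x[0] < 5 else 1

baseline_dist = cluster_distribution(baseline, centroids)
second_dist = cluster_distribution(second, centroids)
accuracy = per_cluster_accuracy(baseline, baseline_labels, predictor, centroids)
score = drift_score(baseline_dist, second_dist, accuracy)
drift_identified = score > 0.05  # threshold is an arbitrary assumption
```

The point of the sketch is that the drift determination depends on both distributions and on a per-cluster measure of how well a predictor estimates labels, which is the combination of inputs recited in the final step of the claim.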

Regarding claim 10, Sample teaches A computerized apparatus having a processor and coupled memory, the processor ([0007], [0062], and [0067]) being adapted to perform: 
The claim recites similar limitations as claim 1 – see above.

Regarding claim 12, the claim recites similar limitations as claims 1 and 10 – see above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2, 11, and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Kuehbandner et al. (US 20160221591, Herein “Kuehbandner”).
Regarding claim 2, Sample teaches the limitations of claim 1, as above.
Furthermore, Sample teaches The method of Claim 1, wherein said obtaining the second dataset is performed using a hardware device (processor ([0007], [0062], [0067], and [0073])).

However, Sample fails to specifically teach wherein said method further comprises: in response to identifying the data drift, replacing the hardware device. 
Yet, in a related art, Kuehbandner discloses based on the drift, replace the current settings of the hardware device by performing calibration of the sensor [0034].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the hardware device calibration/replacement of Kuehbandner with the device monitoring using cluster analysis of Sample to have in response to identifying the data drift, replacing the hardware device. The combination would allow for, according to the motivation of Kuehbandner, verifying that a particular device is calibrated and in the event that drift occurs, perform replacement by calibration of the sensor [0034] thus ensuring that accurate recording and management of datasets such as from sensor signals is maintained ([0001] to [0010]). 

Regarding claim 11, the claim recites similar limitations as claim 2 – see above.

Regarding claim 13, the claim recites similar limitations as claim 2 – see above.


Claim(s) 3 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Sathi et al. (US 20180337878, Herein “Sathi”) in view of Adir et al. (US 20170154280, Herein “Adir”).
Regarding claim 3, Sample teaches the limitations of claim 1, as above.
However, Sample fails to specifically teach The method of Claim 1, wherein the predictor is trained using a training dataset, wherein the training dataset comprises training instances and labels thereof; wherein said method further comprises: in response to identifying the data drift, determining a new training dataset, wherein the new training dataset comprises at least a portion of the second dataset, wherein each instance in the new training dataset has a corresponding label; and training the predictor using the new training dataset.
Yet, in a related art, Sathi discloses for each cluster, computing a performance metric for a predictor for the each cluster, wherein the predictor is configured to estimate an estimated label for an instance, wherein the performance metric is indicative of a successful estimation of the predictor to a portion of the first set of instances that are comprised by the each cluster (e.g., classification based on rule [0070]) wherein the performance metric is an accuracy metric including a confidence level for the predicted label ([0066] to [0069]). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the training for prediction of labeling of Sathi with the drift determination of Sample to have wherein the predictor is trained using a training dataset, wherein the training dataset comprises training instances and labels thereof; wherein said method further comprises: in response to identifying the data drift, determining a new training dataset, wherein the new training dataset comprises at least a portion of the second dataset, wherein each instance in the new training dataset has a corresponding label; and training the predictor using the new training dataset. The combination would allow for, according to the motivation of Sathi, for each cluster, assigning a predicted label based on an associated confidence level for the predicted label, thus improving the accuracy of the labeling particularly with respect to various clusters each of which is assigned a respective label [0004]. 

However, Sample in view of Sathi fails to specifically teach in response to identifying the data drift, determining a new training dataset, wherein the new training dataset comprises at least a portion of the second dataset, wherein each instance in the new training dataset has a corresponding label; and training the predictor using the new training dataset.
Yet, in a related art, Adir discloses updating existing trained data for clustering [0017] based on class labeling [0014].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the incremental model updates of Adir with the cluster modeling of Sample in view of Sathi to have wherein the predictor is trained using a training dataset, wherein the training dataset comprises training instances and labels thereof; wherein said method further comprises: in response to identifying the data drift, determining a new training dataset, wherein the new training dataset comprises at least a portion of the second dataset, wherein each instance in the new training dataset has a corresponding label; and training the predictor using the new training dataset. The combination would allow for, according to the motivation of Adir, updating cluster analysis models when additional data becomes available, taking into account new data that has arrived, particularly with respect to incremental clustering models by taking into account old and new data ([0003] to [0006]).  

Regarding claim 14, the claim recites similar limitations as claim 3 – see above.


Claim(s) 4 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Zapata-Petrov et al. (US 10,878,403, Herein “Zapata-Petrov”).
Regarding claim 4, Sample teaches the limitations of claim 1, as above.
Furthermore, Sample teaches The method of Claim 1, wherein said determining the set of clusters is performed using a first clustering function (clustering of tiled images [0078]).

However, Sample fails to specifically teach wherein the method further comprises: determining a second set of clusters in the feature space, based on the feature values of the first set of instances, wherein said determining the second set of clusters is performed using a second clustering function; determining a second baseline distribution of instances over the second set of clusters, wherein said determining the second baseline distribution is based on the baseline dataset; for each cluster in the second set of clusters, computing the performance metric for the predictor for the each cluster; and determining a second actual distribution of instances over the second set of clusters, wherein said determining the second actual distribution is based on the second dataset.
Yet, in a related art, Zapata-Petrov discloses additional clusters based on the same dataset such as accessing a third dataset and affecting the clusters of the other dataset, further determining a threshold (i.e., distribution) with respect to both of the first and second datasets (col. 3, lines 45-57). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the clustering strategy of Zapata-Petrov with the drift analysis of Sample to have wherein the method further comprises: determining a second set of clusters in the feature space, based on the feature values of the first set of instances, wherein said determining the second set of clusters is performed using a second clustering function; determining a second baseline distribution of instances over the second set of clusters, wherein said determining the second baseline distribution is based on the baseline dataset; for each cluster in the second set of clusters, computing the performance metric for the predictor for the each cluster; and determining a second actual distribution of instances over the second set of clusters, wherein said determining the second actual distribution is based on the second dataset. The combination would allow for, according to the motivation of Zapata-Petrov, efficiently and effectively making comparisons by providing, e.g., benchmarks for comparison with respective groups, overcoming problems of speed and efficiency in analyzing clusters of data, such as for differences/drift of information among various datasets (col. 1, lines 28-50). 

Regarding claim 15, the claim recites similar limitations as claim 4 – see above.


Claim(s) 7 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Sathi.
Regarding claim 7, Sample teaches the limitations of claim 1, as above.
However, Sample fails to specifically teach The method of Claim 1, wherein the performance metric is selected from a group consisting of: a F1 score metric, an accuracy metric, a R-squared metric, and a Root Mean Square Error (RSME) metric.
Yet, in a related art, Sathi discloses for each cluster, computing a performance metric for a predictor for the each cluster, wherein the predictor is configured to estimate an estimated label for an instance, wherein the performance metric is indicative of a successful estimation of the predictor to a portion of the first set of instances that are comprised by the each cluster (e.g., classification based on rule [0070]) wherein the performance metric is an accuracy metric including a confidence level for the predicted label ([0066] to [0069]). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the associated confidence level for the predicted label of Sathi with the drift determination of Sample to have wherein the performance metric is selected from a group consisting of: a F1 score metric, an accuracy metric, a R-squared metric, and a Root Mean Square Error (RSME) metric. The combination would allow for, according to the motivation of Sathi, for each cluster, assigning a predicted label based on an associated confidence level for the predicted label, thus improving the accuracy of the labeling particularly with respect to various clusters each of which is assigned a respective label [0004]. 
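
For reference only, the four performance metrics recited in claim 7 are conventionally computed as sketched below. The function names and the toy inputs are assumptions made for this example and do not appear in Sathi, Sample, or the application.

```python
import math

def accuracy(y_true, y_pred):
    # Fraction of instances whose predicted label equals the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def rmse(y_true, y_pred):
    # Root mean square error for numeric predictions.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    # Undefined when all true values are identical (SS_tot is zero).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy inputs, invented for this example.
class_true, class_pred = [1, 0, 1, 1], [1, 0, 0, 1]
reg_true, reg_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]

acc = accuracy(class_true, class_pred)
f1 = f1_score(class_true, class_pred)
err = rmse(reg_true, reg_pred)
r2 = r_squared(reg_true, reg_pred)
```

Accuracy and F1 apply to predicted class labels, while R-squared and root mean square error apply to numeric predictions, so any one of the four can serve as the per-cluster performance metric depending on the kind of predictor.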

Regarding claim 18, the claim recites similar limitations as claim 7 – see above.


Claim(s) 8 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Hu et al. (US 20190147469, Herein “Hu”). 
Regarding claim 8, Sample teaches the limitations of claim 1, as above.
However, Sample fails to specifically teach The method of Claim 1, wherein the baseline dataset is used for testing prediction accuracy of the predictor.
Yet, in a related art, Hu discloses improving the labeling of a determined cluster label ([0042] and [0043]). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the testing prediction of Hu with the drift determination of Sample to have wherein the baseline dataset is used for testing prediction accuracy of the predictor. The combination would allow for, according to the motivation of Hu, better determining models modeling clusters of data for considering the impact or difference (i.e., trends) in different clusters of data [0002]. 

Regarding claim 19, the claim recites similar limitations as claim 8 – see above.


Claim(s) 9 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sample in view of Weiss et al. (US 20180300865, Herein “Weiss”).
Regarding claim 9, Sample teaches the limitations of claim 1, as above.
Furthermore, Sample teaches The method of Claim 1, wherein the second dataset is a production dataset (datasets for producing, e.g., images such as from tile data [0067]).

However, Sample fails to specifically teach wherein the predictor is trained using the baseline dataset; and wherein the method further comprises predicting, using the predictor, a label for an instance that is comprised by the production dataset.
Yet, in a related art, Weiss discloses labeling the cluster groups, such as labeling a second vector group [0012], and makes clear that the production dataset is data of respective groups among assembly units [0012].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the labeling of an instance that is comprised by the production dataset of Weiss with the drift determination of Sample to have wherein the method further comprises predicting, using the predictor, a label for an instance that is comprised by the production dataset. The combination would allow for, according to the motivation of Weiss, examining clusters [0012] among data sets such as based on various assemblies of a set of assembly units further based on labeled clusters for identifying a drift or difference by flagging a second cluster [0011], further allowing for identifying features associated with the drift [0013]. 

Regarding claim 20, the claim recites similar limitations as claim 9 – see above.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman can be reached on 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.
/JASON T EDWARDS/              Examiner, Art Unit 2144