DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
   
 
Reasons For Allowance

The cited references do not disclose determining a plurality of clusters, wherein each cluster comprises a plurality of instances that are comprised by a same bucket, wherein said determining a plurality of clusters is based on valuations of the set of features for the instances, whereby grouping similar instances into a cluster; based on the plurality of clusters, determining an alternative set of features comprising a set of generalized features, wherein each generalized feature in the set of generalized features corresponds to a cluster of the plurality of clusters, wherein a generalized feature that corresponds to a cluster is indicative of the instance being a member of the corresponding cluster; determining a generalized second instance, wherein the generalized second instance comprises a valuation of the alternative set of features for the second instance; and based on the generalized second instance, determining a label for the second instance.
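
For illustration only, and not as a construction of the claims, the recited flow can be sketched roughly as follows. Everything named here is an assumption introduced for the sketch: the greedy "leader" clustering, the fixed RADIUS, the 0/1 encoding of cluster membership, the majority-vote labeling, and the sample data. The record does not specify any of these choices, and the bucketing step is assumed to have been performed already.

```python
from collections import Counter
from math import dist


def leader_clusters(points, radius):
    """Greedy 'leader' clustering (an assumed technique): a point joins the
    first cluster whose leader lies within `radius`, else starts a new one."""
    leaders, members = [], []
    for idx, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if dist(p, leader) <= radius:
                members[c].append(idx)
                break
        else:
            leaders.append(p)
            members.append([idx])
    return leaders, members


def generalize(instance, leaders, radius):
    """Valuation of the alternative feature set: one 0/1 generalized feature
    per cluster, set to 1 for the first cluster the instance falls into."""
    flags = [0] * len(leaders)
    for c, leader in enumerate(leaders):
        if dist(instance, leader) <= radius:
            flags[c] = 1
            break
    return flags


def label_from_generalized(flags, members, labels):
    """Majority label of the flagged cluster (hypothetical labeling rule);
    returns None when the instance matches no cluster."""
    if 1 not in flags:
        return None
    c = flags.index(1)
    return Counter(labels[i] for i in members[c]).most_common(1)[0][0]


# Hypothetical instances that already share one bucket, with known labels.
bucket_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
bucket_labels = ["a", "a", "b", "b"]
RADIUS = 1.0  # assumed similarity threshold

leaders, members = leader_clusters(bucket_points, RADIUS)
second = (4.9, 5.1)  # the "second instance" to be labeled
generalized_second = generalize(second, leaders, RADIUS)
predicted = label_from_generalized(generalized_second, members, bucket_labels)
print(generalized_second, predicted)
```

In this sketch the label is determined from the generalized second instance alone, i.e., from its cluster-membership flags rather than from its original feature values.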




After a thorough search, and in light of the prior art of record, claims 1-20 are allowed.

The drawings filed 8/6/2019 have been accepted.


Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Relevance is provided in at least the Abstract of each cited document. 

Non-Patent Literature
McCallum, Andrew, et al., “Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching”, KDD 2000, Boston, MA, © ACM 2000, pp. 169-178.
Efficient technique for clustering large datasets using assumptions about a distance metric (page 169 Abstract, section 1. Introduction 3rd paragraph); Two-stage clustering involving creation of a subset of elements and an approximated distance measurement (page 170, sect. 2. Efficient Clustering of Canopies); Use of greedy agglomerative clustering (GAC) to group items based on similarity (pages 171-172, sect. 2.2 Canopies with Greedy Agglomerative Clustering); Related to grouping of bibliographic citations (page 169 Abstract, page 173 sect. 3. Clustering Textual Bibliographical References); No discussion of feature valuation, determination of alternative features or determining a second label.

Rokach, Lior, et al., Data Mining and Knowledge Discovery Handbook, “Chapter 15: Clustering Methods”, © Springer 2005, pp. 321-352.
Survey of clustering techniques including determination of similarity/dissimilarity for the clustering of large datasets (page 321, Abstract); Distance determination options based on attribute type (pages 322-325, sect. 2. Distance Measures); Well-known clustering mechanisms (pages 330-340, sect. 5. Clustering Methods); Hybrid approaches to clustering for simplification (sect. 6. Clustering Large Datasets); No explicit discussion of feature valuation, determination of alternative features or determining a second label.



US Patent Application Publications
Dalyac 	 				2018/0300576
Machine learning used to classify/label a target dataset (Abstract); Clustering of data points from a dataset presumed to have similar/identical label values (para 0104); Labelling to aid in similarity searches (para 0132); No explicit discussion of the use of generalized features, determination of alternative features or determining a second label.


Harman 	 				2020/0250580
Machine learning for creation of labelers to generate training data for updating labelers of datasets (Abstract, Fig. 1); Field of endeavor - adding labels to data for training purposes (para 0003); Labelled datasets (para 0033); Labeling noise and use of target/candidate labelers (para 0052); No explicit discussion of feature valuation, determination of alternative features or determining a second label.

White 	 				2015/0262077
Development of predictive models (para 0025); Bucketizing of search activity (paras 0044, 0047); Use of temporal features (para 0063); Classes of features used in prediction (para 0068, Table 2); Use of k-means clustering (para 0083); No explicit discussion of the use of generalized features, or determination of alternative features.


US Patents
Durham 					10,311,361
Utilized to organize digital media, especially digital music (Abstract, col. 2, lines 7-28); Machine learning / training to take a small set of labeled examples and weights, and label a large set 










Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Robert Stevens whose telephone number is (571) 272-4102.  The examiner can normally be reached on M-F 6:00 – 2:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached on (571) 272-0631.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/ROBERT STEVENS/
Primary Examiner, Art Unit 2164



January 14, 2022