DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119 (a)-(d).
Information Disclosure Statement
The information disclosure statement (IDS) submitted on July 16, 2020 was filed after the mailing date of the Preliminary Amendment filed on July 16, 2020.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
Applicant’s Preliminary Amendment, filed July 16, 2020, has been fully considered and entered.  Accordingly, Claims 1-47 are pending in this application.  Claims 1-13, 16, 17, 19, 22, 30, 31, 33, 40, 42, and 44-47 have been cancelled.  Claims 14, 15, 18, 20, 21, 23-29, 32, 34-39, 41, and 43 have been amended.  Claims 14, 24, 28, and 37 are independent claims.
Claim Objections
Claims 15, 18, 20, 21, 23, 25-27, 29, 32, 34-36, 38, 41, and 43 are objected to because of the following informalities: “A method” should read “The method.”  Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 18, 20, 21, 23, 25, 34, 39, 41, and 43 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding Claims 18, 20, 21, 23, 25, 34, 39, 41, and 43, the phrases "preferably" and “optionally” render the claims indefinite because it is unclear whether the limitations following the phrase are part of the claimed invention.  See MPEP § 2173.05(d).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 14, 15, 18, 21, 23, 24, 27, 41, and 43 are rejected under 35 U.S.C. 103 as being unpatentable over Horowitz (PG Pub. No. 2017/0322996 A1), and further in view of Campos (PG Pub. No. 2003/0212520 A1).
Regarding Claim 14, Horowitz discloses a computer-implemented method of clustering data in a data set comprising a plurality of data records each having respective attribute values for a plurality of attributes, the method comprising:
receiving clustering parameters comprising a partitioning attribute, specifying a selection of a given attribute of the plurality of attributes of the data records (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data); and
identifying a plurality of partitions of the data set based on values of the partitioning attribute (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data).
Horowitz does not disclose:
a cluster count specifying a number of clusters to be generated;
generating a plurality of initial cluster centres, each cluster centre defined for one of the partitions;
running a clustering algorithm using the generated initial cluster centres to define starting clusters for the clustering algorithm, the clustering algorithm identifying a plurality of clusters based on the initial cluster centres; and
outputting data defining the identified clusters.
The combination of Horowitz and Campos discloses:
a cluster count specifying a number of clusters to be generated (see Campos, paragraph [0009], where maximum number of clusters allowed is contemplated);
generating a plurality of initial cluster centres, each cluster centre defined for one of the partitions (see Campos, paragraph [0008], where the means for building an enhanced k-means clustering model comprises means for initializing centroids of clusters of the clustering model; see also paragraph [0009], where means for choosing nodes to be split for a balanced tree comprises means for choosing splits on all nodes in a level if a resulting number of leaves does not exceed a maximum number of leaves allowed and means for ranking nodes by dispersion and choosing as many splits as are possible in order of dispersion without exceeding the maximum number of clusters allowed, if splitting on all nodes in a level is not possible);
running a clustering algorithm using the generated initial cluster centres to define starting clusters for the clustering algorithm, the clustering algorithm identifying a plurality of clusters based on the initial cluster centres (see Campos, paragraphs [0064]-[0066], where k-means has two steps: 1. assign data points to clusters); and
outputting data defining the identified clusters (see Campos, paragraph [0008], where means for applying the enhanced k-means clustering model using the second data table to generate apply output data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
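For illustration only, the steps recited in Claim 14 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is neither the applicant's implementation nor the method of Horowitz or Campos: the record layout, the attribute names `region`, `x`, and `y`, the helper function names, the use of the partition mean as the initial cluster centre, and a cluster count equal to the number of partitions are all hypothetical choices made for the example.

```python
from collections import defaultdict


def partition_records(records, attr):
    """Group records (dicts) by the value of the partitioning attribute."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec[attr]].append(rec)
    return dict(parts)


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda k: sq_dist(p, centres[k]))
            for p in points]


def kmeans(points, centres, max_iter=100):
    """Lloyd-style iteration that starts from the supplied initial centres."""
    centres = list(centres)
    labels = assign(points, centres)
    for _ in range(max_iter):
        new_centres = []
        for k, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centre if a cluster happens to be empty.
            new_centres.append(mean_point(members) if members else old)
        if new_centres == centres:
            break
        centres = new_centres
        labels = assign(points, centres)
    return centres, labels


# Hypothetical data: "region" is the partitioning attribute, x and y are features.
records = [
    {"region": "north", "x": 1.0, "y": 1.0},
    {"region": "north", "x": 1.0, "y": 2.0},
    {"region": "south", "x": 8.0, "y": 8.0},
    {"region": "south", "x": 9.0, "y": 8.0},
]
features = ("x", "y")
partitions = partition_records(records, "region")
# Assumption: one initial centre per partition, taken as the partition mean.
initial = [mean_point([tuple(r[f] for f in features) for r in recs])
           for recs in partitions.values()]
points = [tuple(r[f] for f in features) for r in records]
centres, labels = kmeans(points, initial)
```

In this sketch the partitions only seed the algorithm; the subsequent iteration is ordinary k-means and is free to move records between clusters.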
Regarding Claim 15, Horowitz in view of Campos discloses a method according to Claim 14, wherein the partitioning attribute includes one of:
categorical data, the method comprising identifying a respective partition for each distinct category value in the partitioning attribute; and
non-categorical data, the method comprising identifying a respective partition for each of a plurality of distinct categories derived from values in the partitioning attribute, wherein a category is derived for each of a set of distinct value ranges of a numerical partitioning attribute (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data).
Regarding Claim 18, Horowitz in view of Campos discloses a method according to Claim 14, comprising allocating initial cluster centres to partitions, the allocating comprising at least one of:
allocating initial cluster centres to partitions in dependence on, optionally proportionally to, a number of data records in respective partitions;
where the number of partitions is less than the cluster count, allocating multiple initial cluster centres to one or more partitions, preferably one or more partitions with the most data records;
where the number of partitions is greater than the cluster count, allocating a single initial cluster centre to each of a selected set of partitions, preferably those with the most data records; and
allocating a plurality of the initial cluster centres to a given partition by subpartitioning the given partition based on a second partitioning attribute, and allocating at least one initial cluster centre to one or more of the subpartitions (see Horowitz, paragraph [0006], where a partition component configured to detect a partition size for at least one of the plurality of database partitions that exceeds a size threshold, split the at least one of the database partitions into at least a first and a second partition, control a distribution of data within the first and second partition based on a value for a database key associated with the data in the at least one of the plurality of database partitions; see also paragraph [0050], where a shard of data corresponds to a chunk of data; a chunk is also a reference to a partition of a database table).
Regarding Claim 21, Horowitz in view of Campos discloses a method according to Claim 14, further comprising:
Horowitz does not disclose sampling the data set by selecting a subset of records from respective partitions and optionally subpartitions, wherein initial cluster centres for respective partitions are generated based on the selected records of the partitions.  Campos discloses sampling the data set by selecting a subset of records from respective partitions (see Campos, paragraph [0147], where if the entire data set does not fit in the buffer, a random sample is used) and optionally subpartitions, wherein initial cluster centres for respective partitions are generated based on the selected records of the partitions.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
Regarding Claim 23, Horowitz in view of Campos discloses a method according to Claim 14, wherein the clustering algorithm identifies the plurality of clusters by a process comprising:
Horowitz does not disclose assigning data records to the starting clusters defined by the initial cluster centres, and re-computing initial cluster centres based on data records assigned to the corresponding clusters, the assigning and re-computing preferably repeated until a termination criterion is met.  Campos discloses assigning data records to the starting clusters defined by the initial cluster centres, and re-computing initial cluster centres based on data records assigned to the corresponding clusters, the assigning and re-computing preferably repeated until a termination criterion is met (see Campos, paragraphs [0064]-[0067], where k-means has two steps: assign data points to clusters; that is, assign each of the rows in the buffer to the nearest cluster; and update the centroids).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
Regarding Claim 24, Horowitz discloses a computer-implemented method of clustering data in a data set comprising a plurality of data records each having respective attribute values for a plurality of attributes, the method comprising:
receiving a partitioning attribute, specifying a selection of a given attribute of the plurality of attributes of the data records (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data); and
identifying a plurality of partitions of the data set based on values of the partitioning attribute (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data).
Horowitz does not disclose:
sampling the data set by selecting a subset of records from respective partitions, wherein the number of records selected from a partition is dependent on the size of the partition, resulting in a sample set of records from the data set;
running a clustering algorithm on the sample set of records, the clustering algorithm identifying a plurality of clusters based on the sample set; and
outputting data defining the identified clusters.
The combination of Horowitz and Campos discloses:
sampling the data set by selecting a subset of records from respective partitions, wherein the number of records selected from a partition is dependent on the size of the partition, resulting in a sample set of records from the data set (see Campos, paragraph [0147], where if the entire data set does not fit in the buffer, a random sample is used);
running a clustering algorithm on the sample set of records, the clustering algorithm identifying a plurality of clusters based on the sample set (see Campos, paragraphs [0064]-[0066], where k-means has two steps: 1. assign data points to clusters); and
outputting data defining the identified clusters (see Campos, paragraph [0008], where means for applying the enhanced k-means clustering model using the second data table to generate apply output data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
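For illustration only, the sampling limitation of Claim 24, in which the number of records drawn from a partition depends on the size of that partition, can be sketched as follows. The function name `proportional_sample`, the fixed sampling ratio, the floor of one record per non-empty partition, and the toy partitions are assumptions made for the example and are not taken from the application or from the cited references.

```python
import random


def proportional_sample(partitions, ratio, seed=0):
    """Draw from each partition a number of records proportional to its size."""
    rng = random.Random(seed)
    sample = []
    for recs in partitions.values():
        # Assumption: at least one record per non-empty partition.
        n = min(len(recs), max(1, round(len(recs) * ratio)))
        sample.extend(rng.sample(recs, n))
    return sample


# Hypothetical partitions of 100 and 20 records (records are plain integers here).
partitions = {"a": list(range(100)), "b": list(range(100, 120))}
sample = proportional_sample(partitions, ratio=0.1)
from_a = [r for r in sample if r < 100]
from_b = [r for r in sample if r >= 100]
```

The resulting sample set would then be passed to the clustering algorithm in place of the full data set.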
Regarding Claim 27, Horowitz in view of Campos discloses a method according to Claim 24, wherein:
Horowitz does not disclose the sampling is performed using random gap sampling.  Campos discloses the sampling is performed using random gap sampling (see Campos, paragraph [0147], where if the entire data set does not fit in the buffer, a random sample is used).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
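The term "random gap sampling" is not defined in this action. Under one common reading, assumed here for illustration only, the sampler does not test every record individually; instead it draws the number of records to skip before the next selected record from a geometric distribution, so that each record is still selected with the desired probability. The function name `gap_sample` and the parameter `fraction` are hypothetical, and the sketch does not purport to describe Campos or the application.

```python
import math
import random


def gap_sample(records, fraction, seed=0):
    """Select each record with probability `fraction` by jumping over random gaps."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)
    log_q = math.log(1.0 - fraction)
    selected = []
    i = -1
    while True:
        u = 1.0 - rng.random()            # uniform on (0, 1], avoids log(0)
        gap = int(math.log(u) / log_q)    # geometric count of skipped records
        i += gap + 1
        if i >= len(records):
            break
        selected.append(records[i])
    return selected


sample = gap_sample(list(range(1000)), fraction=0.1)
```

For small sampling fractions this visits far fewer positions than a per-record random test, which is the usual motivation for the technique.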
Regarding Claim 41, Horowitz in view of Campos discloses a method of Claim 14, comprising:
Horowitz does not disclose receiving one or more further data records and classifying the one or more further data records based on the cluster definition data output in the outputting step, wherein the cluster definition data comprises the cluster centre for each cluster, optionally a centroid or medoid for each cluster.  Campos discloses receiving one or more further data records and classifying the one or more further data records based on the cluster definition data output in the outputting step, wherein the cluster definition data comprises the cluster centre for each cluster, optionally a centroid or medoid for each cluster (see Campos, paragraph [0008], where means for applying the enhanced k-means clustering model using the second data table to generate apply output data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
Regarding Claim 43, Horowitz in view of Campos discloses a method of Claim 14, wherein the data records are received from one or more remote client systems (see Horowitz, paragraph [0048], where databases such as network-based, file-based, entity-based, relational, and object oriented, can be configured to operate within a sharded environment), preferably at a central processing system performing the clustering, the method optionally further comprising controlling one or more client systems or devices connected thereto based on the identified clusters and/or based on classification of further data records using the identified clusters; wherein the outputting step comprises transmitting the cluster definition data to the client systems, and optionally using the cluster definition data at the client systems to classify subsequent data records and/or control one or more devices connected to the client systems, optionally wherein the client systems receive the data records from the one or more connected devices or generate the data records based on data received from the one or more connected devices.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Horowitz and Campos as applied to Claims 14, 15, 18, 21, 23, 24, 27, 41, and 43 above, and further in view of Fayyad (US Patent No. 6,012,058 A).
Regarding Claim 20, Horowitz in view of Campos discloses a method according to Claim 14, wherein:
Horowitz does not disclose generating the initial cluster centre for one or more of the partitions comprises selecting the initial cluster centre randomly within a feature space defined by values of the data records in the partition, optionally by selecting a random record of the partition as basis for the initial cluster centre, or selecting the initial cluster centre from the records in the partition based on a density function.  Fayyad discloses generating the initial cluster centre for one or more of the partitions comprises selecting the initial cluster centre randomly within a feature space defined by values of the data records in the partition (see Fayyad, column 6, lines 31-34, where one traditional K-means evaluation starts with a random choice of cluster centroids or means that are randomly placed within the extent of the data on the x axis; call these M1, M2, and M3, in Fig. 2), optionally by selecting a random record of the partition as basis for the initial cluster centre, or selecting the initial cluster centre from the records in the partition based on a density function.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Fayyad for the benefit of scalable k-means clustering for large databases (see Fayyad, Abstract).
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Adachi (JP2003067389 A), and further in view of Campos.
Regarding Claim 28, Adachi discloses a computer-implemented method of clustering data in a data set comprising a plurality of data records each having respective attribute values for a plurality of attributes, the method comprising:
receiving a data type selection specifying one of a plurality of data types (see Adachi, paragraph [0020], where FIG. 8 shows an event procedure when the process execution button 55 is pressed; first, at 81, the clustering method and clustering data type setting are loaded).
Adachi does not disclose:
deriving reduced feature vectors from data records of the data set, wherein a reduced feature vector comprises a set of attributes selected from the data records having the selected data type;
running a clustering algorithm to identify a plurality of clusters in the data records, wherein the clustering algorithm clusters the derived reduced feature vectors to identify a plurality of data clusters; and
outputting data defining the identified clusters.
The combination of Adachi and Campos discloses:
deriving reduced feature vectors from data records of the data set, wherein a reduced feature vector comprises a set of attributes selected from the data records having the selected data type (see Campos, paragraphs [0039]-[0040], where it can significantly speed up building clustering models. It is very expensive to run distance-based clustering algorithms in large datasets with many attributes. The tree provides a summary of the density that can be used to train clustering algorithms instead of using the original data. Fewer points translate into faster training; it introduces a gradual form of dimensionality reduction);
running a clustering algorithm to identify a plurality of clusters in the data records, wherein the clustering algorithm clusters the derived reduced feature vectors to identify a plurality of data clusters (see Campos, paragraphs [0064]-[0066], where k-means has two steps: 1. assign data points to clusters); and
outputting data defining the identified clusters (see Campos, paragraph [0008], where means for applying the enhanced k-means clustering model using the second data table to generate apply output data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Adachi with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
Claims 25 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Horowitz and Campos as applied to Claims 14, 15, 18, 21, 23, 24, 27, 41, and 43 above, and further in view of Muffat (PG Pub. No. 2020/0250241 A1).
Regarding Claim 25, Horowitz in view of Campos discloses a method according to Claim 24, wherein:
Horowitz does not disclose the number of records selected from respective partitions is further dependent on a total required sample size and/or wherein the number of records selected from the partition is proportional to the size of the partition, optionally in accordance with a required sampling ratio.  Muffat discloses the number of records selected from respective partitions is further dependent on a total required sample size and/or wherein the number of records selected from the partition is proportional to the size of the partition (see Muffat, paragraph [0021], where the method also includes representative sampling of the subset of documents from the pool of data by using weighted clustering techniques and subclustering approaches), optionally in accordance with a required sampling ratio.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Muffat for the benefit of balanced sampled dataset generation (see Muffat, Abstract).
Regarding Claim 37, Horowitz in view of Campos discloses a method according to Claim 14, the method comprising, for each of a plurality of segments of the data set, each segment comprising a subset of records of the data set:
retrieving a plurality of data records of the segment from storage (see Horowitz, paragraph [0065], where data can be partitioned in chunks of data … the chunks of data are typically constructed of contiguous ranges of data).
Horowitz does not disclose:
performing an initial clustering process on the retrieved data records to identify a set of clusters, each cluster defined by a representative data record;
performing a further clustering process on the representative data records defining the clusters found for each segment to identify a second set of clusters; and
wherein the outputting step comprises outputting data defining the second set of clusters.
The combination of Horowitz and Campos discloses:
performing an initial clustering process on the retrieved data records to identify a set of clusters (see Campos, paragraphs [0064]-[0066], where k-means has two steps: 1. assign data points to clusters); and
wherein the outputting step comprises outputting data defining the second set of clusters (see Campos, paragraph [0008], where means for applying the enhanced k-means clustering model using the second data table to generate apply output data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
The combination of Horowitz and Campos does not disclose:
each cluster defined by a representative data record; and
performing a further clustering process on the representative data records defining the clusters found for each segment to identify a second set of clusters.
The combination of Horowitz, Campos, and Muffat discloses:
each cluster defined by a representative data record (see Muffat, paragraph [0021], where the method also includes representative sampling of the subset of documents from the pool of data by using weighted clustering techniques and subclustering approaches);
performing a further clustering process on the representative data records defining the clusters found for each segment to identify a second set of clusters (see Muffat, paragraph [0021], where the method also includes representative sampling of the subset of documents from the pool of data by using weighted clustering techniques and subclustering approaches).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz and Campos with Muffat for the benefit of balanced sampled dataset generation (see Muffat, Abstract).
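For illustration only, the two-stage process recited in Claim 37 (an initial clustering within each segment, followed by a further clustering of the representative records found for each segment) can be sketched as follows. The function names, the choice of the record nearest each centroid as the representative record, the seeding of k-means with the first k points, and the toy segments are assumptions made for the example; the sketch is not drawn from Horowitz, Campos, Muffat, or the application.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, max_iter=50):
    """Plain k-means seeded with the first k points (arbitrary for this sketch)."""
    centres = [points[i] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        new = [mean_point(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres


def two_stage(segments, k_local, k_global):
    """Cluster each segment, then cluster the per-segment representatives."""
    representatives = []
    for seg in segments:
        for centre in kmeans(seg, k_local):
            # Assumption: the representative is the actual record nearest the centroid.
            representatives.append(min(seg, key=lambda p: sq_dist(p, centre)))
    return kmeans(representatives, k_global)


# Hypothetical segments, each holding one group near the origin and one near (10, 10).
segments = [
    [(0, 0), (0, 1), (10, 10), (10, 11)],
    [(1, 0), (1, 1), (11, 10), (11, 11)],
]
second_level = two_stage(segments, k_local=2, k_global=2)
```

Only one segment needs to be held in memory during the first stage, and the second stage operates on the much smaller set of representative records.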
Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Horowitz and Campos as applied to Claims 14, 15, 18, 21, 23, 24, 27, 41, and 43 above, and further in view of Mukherjee (PG Pub. No. 2016/0026667 A1).
Regarding Claim 26, Horowitz in view of Campos discloses a method according to Claim 24, comprising:
Horowitz does not disclose subpartitioning a given partition in dependence on at least one further partitioning attribute, and selecting sampled records for the given partition from respective subpartitions in dependence on sizes of the subpartitions.  Mukherjee discloses subpartitioning a given partition in dependence on at least one further partitioning attribute (see Mukherjee, paragraph [0204], where composite partitioning, involves creating partitions of partitions. For example, a table may be partitioned using a ranged based partitioning scheme to create a set of first-level partitions. A hash function may then be applied to each of the first-level partitions to create, for each first level partition, set of second level partitions. Further, the partitioning key used to create the partitions at one level may be different than the partitioning key used to create the partitions at other levels), and selecting sampled records for the given partition from respective subpartitions in dependence on sizes of the subpartitions.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Horowitz with Mukherjee for the benefit of distributing partitioned data across a plurality of computing devices (see Mukherjee, Abstract).
Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Adachi and Campos as applied to Claim 28 above, and further in view of Altman (PG Pub. No. 2014/0165198 A1).
Regarding Claim 29, Adachi in view of Campos discloses a method according to claim 28, comprising:
Adachi does not disclose at least one of: repeating the clustering for each of the plurality of data types; performing the clustering in parallel for each of the plurality of data types; performing each clustering pass using a different similarity or distance metric selected in dependence on the data type.  Altman discloses at least one of: repeating the clustering for each of the plurality of data types; performing the clustering in parallel for each of the plurality of data types; performing each clustering pass using a different similarity or distance metric selected in dependence on the data type (see Altman, paragraph [0012], where in some embodiments, measuring the distance metrics includes measuring first and second different distance metrics for respective different first and second clusters. Measuring the different distance metrics may include assigning to the first and second clusters respective different weights that emphasize different dimensions of the multi-dimensional space).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Adachi and Campos with Altman for the benefit of multidimensional clustering (see Altman, Abstract).
Claims 32, 34, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Fayyad, and further in view of Muffat.
Regarding Claim 32, Fayyad discloses a computer-implemented method of clustering data in a data set comprising a plurality of data records, the method comprising:
running a clustering process to identify a plurality of clusters in the data records at a first level of clustering (see Fayyad, column 6, lines 28-30, where the k-means algorithm takes as input the number of clusters K, a set of K initial estimates of the cluster means, and the data set to be clustered); and
running a clustering process at one or more further levels of clustering, wherein the clustering process at a given further level identifies, for each of a plurality of higher-level clusters identified at a preceding level of clustering, a plurality of subclusters by clustering data records of the respective higher-level cluster (see Fayyad, column 13, lines 25-26, where the current data regarding subclustering into the data set CS is depicted in a panel 340 of the screen).
Fayyad does not disclose:
wherein clustering at each of the first and further levels of clustering is performed based on a clustering strategy selected from a plurality of available clustering strategies which is applied to records in the data set or in a cluster of records identified in a previous clustering level; and
wherein the clustering strategy used at each level of clustering is configurable and specified by way of one or more clustering parameters.
Muffat discloses:
wherein clustering at each of the first and further levels of clustering is performed based on a clustering strategy selected from a plurality of available clustering strategies which is applied to records in the data set or in a cluster of records identified in a previous clustering level (see Muffat, paragraph [0021], where the method also includes representative sampling of the subset of documents from the pool of data by using weighted clustering techniques and subclustering approaches); and
wherein the clustering strategy used at each level of clustering is configurable and specified by way of one or more clustering parameters (see Muffat, paragraph [0021], where the method also includes representative sampling of the subset of documents from the pool of data by using weighted clustering techniques and subclustering approaches).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Fayyad with Muffat for the benefit of balanced sampled dataset generation (see Muffat, Abstract).
Regarding Claim 34, Fayyad in view of Muffat discloses a method according to Claim 32, wherein the available clustering strategies comprise one, several, or each of:
clustering data records based on initial clusters selected for a plurality of data partitions in accordance with one or more selected partitioning attributes;
clustering data records based on initial clusters identified by random centroid selection within an unpartitioned set of records to be clustered, optionally using k-means clustering (see Fayyad, column 6, lines 31-34, where one traditional K-means evaluation starts with a random choice of cluster centroids or means that are randomly placed within the extent of the data on the x axis; call these M1, M2, and M3, in Fig. 2); and
clustering data records based on reduced feature vectors type selected in dependence on data types of attributes of the data records.
Regarding Claim 35, Fayyad in view of Muffat discloses a method according to Claim 32, comprising at a given clustering level, performing subclustering in parallel for a plurality of clusters identified in a preceding level of clustering (see Fayyad, column 14, lines 21-27, where scalable K-means process could be performed until a certain percentage of the models have reached a stopping criteria. The multiple model implementation shares data structures between models and performs calculations on certain data unique to a given model. This analysis is susceptible to parallel processing on a computer 20 having multiple processing units 21).
Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Fayyad and Muffat as applied to Claims 32 and 34 above, and further in view of Horowitz and Campos.
Regarding Claim 36, Fayyad in view of Muffat discloses a method according to Claim 32, wherein:
Fayyad does not disclose clustering at one or more of the further clustering levels is performed on a reduced set of records obtained by sampling a cluster identified in a preceding level of clustering.  Campos discloses clustering at one or more of the further clustering levels is performed on a reduced set of records obtained by sampling a cluster identified in a preceding level of clustering (see Campos, paragraph [0147], where if the entire data set does not fit in the buffer, a random sample is used).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Fayyad with Campos for the benefit of enhanced K-means clustering (see Campos, Abstract).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARHAD AGHARAHIMI whose telephone number is (571)272-9864. The examiner can normally be reached M-F 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached on 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FARHAD AGHARAHIMI/Examiner, Art Unit 2161
/ETIENNE P LEROUX/Primary Examiner of Art Unit 2161