DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 20 April 2022 has been entered.

Response to Amendment
This communication is in response to the amendment filed on 20 April 2022.
Claims 1, 5, 7, 8, 12, 14, 15, 19, and 20 are amended.
Claims 1-20 have been examined. 

Response to Arguments
In response to Applicant’s remarks filed on 20 April 2022:
a.	Rejections of claims 5, 12, and 19 under 35 U.S.C. 112(a) are withdrawn in view of Applicant’s amendments.
b.	Rejection of claim 19 under 35 U.S.C. 112(b) is withdrawn in view of Applicant’s amendments.
c.	Applicant's arguments with respect to the 35 U.S.C. 103 rejections of the pending claims are moot in view of the new ground(s) of rejection presented herein, as detailed below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fayyad et al. (U.S. Patent No. 6,374,251 B1, hereinafter referred to as Fayyad) in view of Hassanzadeh et al. (U.S. Patent Application Publication No. 20140337331 A1, hereinafter referred to as Hassanzadeh) and Shi, Yong (Dynamic Data Mining on Multi-Dimensional Data. State University of New York at Buffalo, 2006, hereinafter referred to as Shi).
As to claim 1, Fayyad teaches a computer-implemented method comprising:
receiving, by operation of a computer system, a dataset of a plurality of data records, each of the plurality of data records comprising a plurality of features (Note: the claimed “features” are interpreted in light of the instant specification, which states the following at para. 0039: “rows of the table can represent the data records and the columns can represent the features”; see Fayyad col. 5 L 19-30: database 10 contains many records, with each record having many attributes/fields; Note: attributes/fields as taught by Fayyad correspond to the claimed "features") and one or more target variables (Note: the claimed “target variable” is interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Fayyad col. 2 L 7-18: clustering for data mining comprises a variable indicating cluster membership of a given data item; this variable is also referred to as a label);
dividing the dataset into a plurality of subsets (see Fayyad col. 7 L23-35 and Fig. 9: the data is divided into clusters K1, K2, and K3);
for each of the plurality of subsets, identifying a plurality of clusters and respective centroids of the plurality of clusters (see Fayyad col. 7 L23-35 and Fig. 9: for each cluster, a cluster centroid is computed; the cluster centroid is also referred to as a mean) based on key features (see Fayyad col. 7 L18-23: clustering is based on attributes of the data records);
identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulation of the respective centroids of the plurality of clusters for the each of the plurality of subsets (see Fayyad col. 11 L50 to col. 12 L16 and Fig. 7: the system performs iterations over loop 220 to iteratively update the clusters and corresponding centroids until stopping criteria are met; Note: Fayyad’s iteratively updating centroids corresponds to the claimed manipulation of the respective centroids, and Fayyad’s centroids and clusters as they exist when the stopping criteria are met correspond to the claimed final centroids and final clusters, respectively); and
for each data record in the plurality of subsets, assigning the data record to one of the plurality of final clusters based on distances between the data record and the plurality of final centroids (see Fayyad col. 7 L35-45: each data item is assigned membership in a cluster having the nearest mean, based on a distance function).
Fayyad does not appear to explicitly disclose computing a similarity metric between a first subset of features selected in a first phase and a second subset of features selected in a second phase, based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features, wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables; identifying key features among the plurality of features based on the similarity metric.
However, Hassanzadeh teaches:
computing a similarity metric (see Hassanzadeh para. 0016: similarity function) between a first subset of features selected in a first phase (see Hassanzadeh para. 0015-0016: token sets of attributes from different datasets are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features) and a second subset of features selected in a second phase (see Hassanzadeh para. 0015-0016: token sets of attributes are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features), based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features (see Hassanzadeh para. 0019: relevance function applied to the attributes; Note: attributes as taught by Hassanzadeh correspond to the claimed features), wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables (Note: The claimed “one or more target variables” are interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Hassanzadeh para. 0009: each data record is labeled for analysis);
identifying key features among the plurality of features based on the similarity metric (see Hassanzadeh para. 0016: the system identifies pairs of attributes that satisfy a predetermined similarity threshold).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad to include the teachings of Hassanzadeh because doing so enables identifying matching or related entities in the dataset (see Hassanzadeh para. 0018) and provides users of a data mining system with a unified view of the data (see Hassanzadeh para. 0002).
Fayyad as modified by Hassanzadeh does not appear to explicitly disclose manipulating cluster centroids relative to boundary points and insider points to provide gravity to a center of each of the plurality of clusters.
However, Shi teaches identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulating cluster centroids of the plurality of clusters for each of a plurality of subsets  (see Shi pp. 54-56: clustering algorithm proceeds iteratively, determining clusters and centroids, until a certain threshold is reached; Note: Shi’s iteratively updating centroids corresponds to the claimed manipulation of the respective centroids, and Shi’s centroids and clusters as they exist when the threshold is met correspond to the claimed final centroids and final clusters, respectively) relative to boundary points (see Shi pp. 139-140, under the heading “Finding boundary data points,” and pp. 192-193: clusters are determined relative to boundary data points) and insider points (see Shi p. 54: at each iteration, each data point moves toward the inside of the cluster; and see Shi pp. 192-193: clusters are determined relative to center data points) to provide gravity to a center of each of the plurality of clusters (see Shi p. 92: the clustering algorithm is inspired by Newton’s Universal Law of Gravitation to optimize the inner structure of data; and see Shi p. 192: in each iteration of the clustering algorithm, data points move gradually according to the gravitation of neighboring data points).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad as modified by Hassanzadeh to include the teachings of Shi because it provides a clustering technique that prevents dramatic movement of data points, avoiding improper movement of data points (see Shi p. 192).
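For illustration only, the clustering limitations of claim 1 as mapped above (dividing the dataset into subsets, identifying clusters and centroids per subset, deriving final centroids from the per-subset centroids, and assigning each record by distance) can be sketched as follows. This is a generic sketch under stated assumptions, not the method of Fayyad, Hassanzadeh, or Shi and not the claimed implementation: the function names, the round-robin split, and the choice to "manipulate" the per-subset centroids by clustering them again are all hypothetical, and the gravity-based handling of boundary and insider points is not modeled.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced picks (an assumption)."""
    step = max(1, len(points) // k)
    centroids = [points[min(i * step, len(points) - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster received no points.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:  # stopping criterion: centroids no longer move
            break
        centroids = updated
    return centroids

def cluster_in_subsets(records, n_subsets, k):
    """Cluster each subset, merge the per-subset centroids, assign every record."""
    # Divide the dataset into subsets (round-robin split, hypothetical choice).
    subsets = [records[i::n_subsets] for i in range(n_subsets)]
    # Identify clusters and centroids independently for each subset.
    local_centroids = [c for s in subsets for c in kmeans(s, k)]
    # Derive final centroids by clustering the per-subset centroids.
    final_centroids = kmeans(local_centroids, k)
    # Assign each record to the final cluster with the nearest centroid.
    labels = [nearest(r, final_centroids) for r in records]
    return final_centroids, labels

records = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final_centroids, labels = cluster_in_subsets(records, n_subsets=2, k=2)
```

Because each subset is clustered independently, the per-subset step could be handed to a separate processor, which is the arrangement addressed in the rejection of claim 2.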

As to claim 2, Fayyad as modified by Hassanzadeh and Shi teaches further comprising:
assigning a respective processor to each of the plurality of subsets (see Fayyad col. 17 L6-9: the analysis is performed by parallel processing on a computer having multiple processing units);
wherein, for each of the plurality of subsets, identifying a plurality of clusters and respective centroids of the clusters comprises identifying, by the respective processor of the each of the plurality of subsets, the plurality of clusters and respective centroids of the clusters (see Fayyad col. 8 L14-23 and Fig. 6D: the output of the analysis is stored in a data structure designated MODEL, describing a plurality of final clusters 1-K; and see Fayyad col. 17 L6-9: the analysis is performed by parallel processing on a computer having multiple processing units); and
wherein, for each data record in the plurality of subsets, assigning the data record to one of the plurality of final clusters comprises, for each data record in the plurality of subsets, assigning, by the respective processor of the each of the plurality of subsets, the data record to one of the plurality of final clusters (see Fayyad col. 7 L35-45: each data item is assigned membership in a cluster having the nearest mean, based on a distance function; and see Fayyad col. 17 L6-9: the analysis is performed by parallel processing on a computer having multiple processing units).

As to claim 3, Fayyad as modified by Hassanzadeh and Shi teaches further comprising:
selecting a respective number of data records from each of the plurality of final clusters to represent the dataset of the plurality of data records (see Fayyad col. 7 L1-9: data mining engine 12 responds to requests by sending back model summaries); and
performing a data mining algorithm based on the respective number of data records from each of the plurality of final clusters (see Fayyad col. 7 L1-9: data mining engine 12 responds to requests by sending back model summaries).

As to claim 4, Fayyad as modified by Hassanzadeh and Shi teaches wherein the respective number of data records from each of the plurality of final clusters exceeds a respective threshold or is proportional to a respective size of each of the plurality of final clusters (see Fayyad col. 9 L56-67: the system performs a threshold analysis to update the dataset).
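For illustration only, selecting a number of data records from each final cluster that is proportional to the cluster's size, subject to a minimum, can be sketched as below. The function name, the integer `percent` parameter, the `minimum` floor, and the choice to take the first records of each cluster are hypothetical and are not drawn from the cited references or the instant specification.

```python
from collections import defaultdict

def sample_by_cluster(records, labels, percent, minimum=1):
    """Pick records per cluster in proportion to cluster size (integer percent)."""
    clusters = defaultdict(list)
    for record, label in zip(records, labels):
        clusters[label].append(record)
    selected = {}
    for label, members in clusters.items():
        # Proportional share, but never fewer than `minimum` records
        # and never more than the cluster actually holds.
        count = max(minimum, len(members) * percent // 100)
        selected[label] = members[:min(count, len(members))]
    return selected

records = list(range(30))
labels = [0] * 20 + [1] * 10          # cluster 0 has 20 records, cluster 1 has 10
sample = sample_by_cluster(records, labels, percent=20)
```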

As to claim 5, Fayyad as modified by Hassanzadeh and Shi teaches:
in the first phase, selecting the first subset of features from the plurality of features of a first subset of data records (see Hassanzadeh para. 0015-0016: sets of attributes from different datasets are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features) using a selection algorithm (see Hassanzadeh para. 0014-0015: generating and grouping tokens); and
in the second phase, selecting the second subset of features from the plurality of features of a second subset of data records (see Hassanzadeh para. 0015-0016: sets of attributes from different datasets are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features) using the selection algorithm (see Hassanzadeh para. 0014-0015: generating and grouping tokens).

As to claim 6, Fayyad as modified by Hassanzadeh and Shi teaches further comprising:
calculating a first set of relevance measures of each of the first subset of features (see Hassanzadeh para. 0019: relevance function applied to the attributes; Note: attributes as taught by Hassanzadeh correspond to the claimed features) with respect to a target variable (Note: The claimed “one or more target variables” are interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Hassanzadeh para. 0009: each data record is labeled for analysis); and
calculating a second set of relevance measures of each of the second subset of features (see Hassanzadeh para. 0019: relevance function applied to the attributes; Note: attributes as taught by Hassanzadeh correspond to the claimed features) with respect to a target variable (Note: The claimed “one or more target variables” are interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Hassanzadeh para. 0009: each data record is labeled for analysis).

As to claim 7, Fayyad as modified by Hassanzadeh and Shi teaches further comprising:
determining that the first subset of features and the second subset of features do not converge based on the similarity metric (Note: the claimed convergence of features is interpreted in light of the instant specification, which states the following at para. 0054: “the subsets of features can be considered converged if the calculated similarity metric Ln-1,n equals or exceeds a predefined threshold T”; see Hassanzadeh para. 0016: sets of attributes from different datasets are compared until the system identifies pairs of attributes that satisfy a predetermined similarity threshold; Note: attributes as taught by Hassanzadeh correspond to the claimed features);
in response to the determination, selecting a second subset of data records with a second size from the dataset of the plurality of data records (see Hassanzadeh para. 0016: sets of attributes from different datasets are compared until the system identifies pairs of attributes that satisfy a predetermined similarity threshold; Note: attributes as taught by Hassanzadeh correspond to the claimed features), the second size of the second subset of data records being larger than a first size of a first subset of data records (see Hassanzadeh para. 0022 and Table 3: size of selected subsets is increased until a set size threshold is met);
selecting a third subset of features from the plurality of features of the second subset of data records (see Hassanzadeh para. 0016: sets of attributes from different datasets are compared until the system identifies pairs of attributes that satisfy a predetermined similarity threshold; Note: attributes as taught by Hassanzadeh correspond to the claimed features);
selecting a fourth subset of features from the plurality of features of the second subset of data records (see Hassanzadeh para. 0016: sets of attributes from different datasets are compared until the system identifies pairs of attributes that satisfy a predetermined similarity threshold; Note: attributes as taught by Hassanzadeh correspond to the claimed features);
computing a second similarity metric between the third subset of features and the fourth subset of features (see Hassanzadeh para. 0016: a set similarity function is used to compare the sets of attributes);
determining that the third subset of features and the fourth subset of features converge based on the second similarity metric (Note: the claimed convergence of features is interpreted in light of the instant specification, which states the following at para. 0054: “the subsets of features can be considered converged if the calculated similarity metric Ln-1,n equals or exceeds a predefined threshold T”; see Hassanzadeh para. 0016: the system identifies pairs of attributes that satisfy a predetermined similarity threshold); and
in response to the determining, identifying key features of the dataset based on the third subset of features and the fourth subset of features of the second subset of data records (see Hassanzadeh para. 0016: the system identifies pairs of attributes that satisfy a predetermined similarity threshold).
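For illustration only, the convergence test quoted above from para. 0054 (subsets of features are considered converged when the similarity metric equals or exceeds a predefined threshold T) can be sketched as follows. The weighted Jaccard formula over relevance measures is an assumption chosen for the sketch; neither the quoted passage nor the cited references is stated here to use it, and the function names and feature names are hypothetical.

```python
def weighted_jaccard(relevance_a, relevance_b):
    """Similarity of two feature subsets given per-feature relevance measures.

    Each argument maps a feature name to its relevance with respect to the
    target variable; a feature absent from a subset counts as relevance 0.
    """
    features = set(relevance_a) | set(relevance_b)
    shared = sum(min(relevance_a.get(f, 0.0), relevance_b.get(f, 0.0)) for f in features)
    total = sum(max(relevance_a.get(f, 0.0), relevance_b.get(f, 0.0)) for f in features)
    return shared / total if total else 1.0

def has_converged(relevance_a, relevance_b, threshold):
    """Converged when the similarity equals or exceeds the threshold T."""
    return weighted_jaccard(relevance_a, relevance_b) >= threshold

# Two phases on a smaller sample select different features: not converged.
phase_1 = {"age": 0.5, "income": 0.25}
phase_2 = {"age": 0.5, "zip": 0.25}
# Two phases on a larger sample select the same features: converged.
phase_3 = {"age": 0.5, "income": 0.25}
phase_4 = {"age": 0.5, "income": 0.25}
```

Under this sketch, a failed test on the first pair would trigger selection of a larger subset of data records, and the key features would be taken from the pair that passes.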

As to claim 8, Fayyad teaches a non-transitory, computer-readable medium storing computer-readable instructions executable by a computer and configured to perform operations comprising (see Fayyad col. 5 L39-51 and Fig. 1: the method of the invention is carried out by a computer 20 comprising a memory 22):
receiving, by operation of a computer system, a dataset of a plurality of data records, each of the plurality of data records comprising a plurality of features (Note: the claimed “features” are interpreted in light of the instant specification, which states the following at para. 0039: “rows of the table can represent the data records and the columns can represent the features”; see Fayyad col. 5 L 19-30: database 10 contains many records, with each record having many fields; Note: fields as taught by Fayyad correspond to the claimed "features") and one or more target variables (Note: the claimed “target variable” is interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Fayyad col. 2 L 7-18: clustering for data mining comprises a variable indicating cluster membership of a given data item; this variable is also referred to as a label);
dividing the dataset into a plurality of subsets (see Fayyad col. 7 L23-35 and Fig. 9: the data is divided into clusters K1, K2, and K3);
for each of the plurality of subsets, identifying a plurality of clusters and respective centroids of the plurality of clusters (see Fayyad col. 7 L23-35 and Fig. 9: for each cluster, a cluster centroid is computed; the cluster centroid is also referred to as a mean) based on key features (see Fayyad col. 7 L18-23: clustering is based on attributes of the data records);
identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulation of the respective centroids of the plurality of clusters for the each of the plurality of subsets (see Fayyad col. 11 L50 to col. 12 L16 and Fig. 7: the system performs iterations over loop 220 to iteratively update the clusters and corresponding centroids until stopping criteria are met; Note: Fayyad’s iteratively updating centroids corresponds to the claimed manipulation of the respective centroids, and Fayyad’s centroids and clusters as they exist when the stopping criteria are met correspond to the claimed final centroids and final clusters, respectively); and
for each data record in the plurality of subsets, assigning the data record to one of the plurality of final clusters based on distances between the data record and the plurality of final centroids (see Fayyad col. 7 L35-45: each data item is assigned membership in a cluster having the nearest mean, based on a distance function).
Fayyad does not appear to explicitly disclose computing a similarity metric between a first subset of features selected in a first phase and a second subset of features selected in a second phase, based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features, wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables; identifying key features among the plurality of features based on the similarity metric.
However, Hassanzadeh teaches:
computing a similarity metric (see Hassanzadeh para. 0016: similarity function) between a first subset of features selected in a first phase (see Hassanzadeh para. 0015-0016: token sets of attributes from different datasets are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features) and a second subset of features selected in a second phase (see Hassanzadeh para. 0015-0016: token sets of attributes are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features), based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features (see Hassanzadeh para. 0019: relevance function applied to the attributes; Note: attributes as taught by Hassanzadeh correspond to the claimed features), wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables (Note: The claimed “one or more target variables” are interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Hassanzadeh para. 0009: each data record is labeled for analysis);
identifying key features among the plurality of features based on the similarity metric (see Hassanzadeh para. 0016: the system identifies pairs of attributes that satisfy a predetermined similarity threshold).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad to include the teachings of Hassanzadeh because doing so enables identifying matching or related entities in the dataset (see Hassanzadeh para. 0018) and provides users of a data mining system with a unified view of the data (see Hassanzadeh para. 0002).
Fayyad as modified by Hassanzadeh does not appear to explicitly disclose manipulating cluster centroids relative to boundary points and insider points to provide gravity to a center of each of the plurality of clusters.
However, Shi teaches identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulating cluster centroids of the plurality of clusters for each of a plurality of subsets  (see Shi pp. 54-56: clustering algorithm proceeds iteratively, determining clusters and centroids, until a certain threshold is reached; Note: Shi’s iteratively updating centroids corresponds to the claimed manipulation of the respective centroids, and Shi’s centroids and clusters as they exist when the threshold is met correspond to the claimed final centroids and final clusters, respectively) relative to boundary points (see Shi pp. 139-140, under the heading “Finding boundary data points,” and pp. 192-193: clusters are determined relative to boundary data points) and insider points (see Shi p. 54: at each iteration, each data point moves toward the inside of the cluster; and see Shi pp. 192-193: clusters are determined relative to center data points) to provide gravity to a center of each of the plurality of clusters (see Shi p. 92: the clustering algorithm is inspired by Newton’s Universal Law of Gravitation to optimize the inner structure of data; and see Shi p. 192: in each iteration of the clustering algorithm, data points move gradually according to the gravitation of neighboring data points).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad as modified by Hassanzadeh to include the teachings of Shi because it provides a clustering technique that prevents dramatic movement of data points, avoiding improper movement of data points (see Shi p. 192).

As to claim 9, see the rejection of claim 2 above.

As to claim 10, see the rejection of claim 3 above.

As to claim 11, see the rejection of claim 4 above.

As to claim 12, see the rejection of claim 5 above.

As to claim 13, see the rejection of claim 6 above.

As to claim 14, see the rejection of claim 7 above.

As to claim 15, Fayyad teaches a system, comprising:
a memory (see Fayyad col. 5 L39-51 and Fig. 1: the method of the invention is carried out by a computer 20 comprising a memory 22);
at least one hardware processor interoperably coupled with the memory and configured to perform operations comprising (see Fayyad col. 5 L39-51 and Fig. 1: the method of the invention is carried out by a computer 20 comprising a processing unit 21 coupled to a memory 22):
receiving, by operation of a computer system, a dataset of a plurality of data records, each of the plurality of data records comprising a plurality of features (Note: the claimed “features” are interpreted in light of the instant specification, which states the following at para. 0039: “rows of the table can represent the data records and the columns can represent the features”; see Fayyad col. 5 L 19-30: database 10 contains many records, with each record having many fields; Note: fields as taught by Fayyad correspond to the claimed "features") and one or more target variables (Note: the claimed “target variable” is interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Fayyad col. 2 L 7-18: clustering for data mining comprises a variable indicating cluster membership of a given data item; this variable is also referred to as a label);
dividing the dataset into a plurality of subsets (see Fayyad col. 7 L23-35 and Fig. 9: the data is divided into clusters K1, K2, and K3);
for each of the plurality of subsets, identifying a plurality of clusters and respective centroids of the plurality of clusters (see Fayyad col. 7 L23-35 and Fig. 9: for each cluster, a cluster centroid is computed; the cluster centroid is also referred to as a mean) based on key features (see Fayyad col. 7 L18-23: clustering is based on attributes of the data records);
identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulation of the respective centroids of the plurality of clusters for the each of the plurality of subsets (see Fayyad col. 11 L50 to col. 12 L16 and Fig. 7: the system performs iterations over loop 220 to iteratively update the clusters and corresponding centroids until stopping criteria are met; Note: Fayyad’s iteratively updating centroids corresponds to the claimed manipulation of the respective centroids, and Fayyad’s centroids and clusters as they exist when the stopping criteria are met correspond to the claimed final centroids and final clusters, respectively); and
for each data record in the plurality of subsets, assigning the data record to one of the plurality of final clusters based on distances between the data record and the plurality of final centroids (see Fayyad col. 7 L35-45: each data item is assigned membership in a cluster having the nearest mean, based on a distance function).
Fayyad does not appear to explicitly disclose computing a similarity metric between a first subset of features selected in a first phase and a second subset of features selected in a second phase, based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features, wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables; identifying key features among the plurality of features based on the similarity metric.
However, Hassanzadeh teaches:
computing a similarity metric (see Hassanzadeh para. 0016: similarity function) between a first subset of features selected in a first phase (see Hassanzadeh para. 0015-0016: token sets of attributes from different datasets are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features) and a second subset of features selected in a second phase (see Hassanzadeh para. 0015-0016: token sets of attributes are compared; Note: attributes as taught by Hassanzadeh correspond to the claimed features), based on a first set of relevance measures of the first subset of features and a second set of relevance measures of the second subset of features (see Hassanzadeh para. 0019: relevance function applied to the attributes; Note: attributes as taught by Hassanzadeh correspond to the claimed features), wherein the first set of relevance measures of the first subset of features and the second set of relevance measures of the second subset of features are calculated with respect to one or more target variables (Note: The claimed “one or more target variables” are interpreted in light of the instant specification, which states the following at para. 0039: “The target variable can be a label, a category, a class, or other variables such as continuous values that are to be modelled and predicted by a data mining algorithm based on the features for supervised and/or unsupervised analytical tasks.”; see Hassanzadeh para. 0009: each data record is labeled for analysis);
identifying key features among the plurality of features based on the similarity metric (see Hassanzadeh para. 0016: the system identifies pairs of attributes that satisfy a predetermined similarity threshold).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad to include the teachings of Hassanzadeh because doing so enables identifying matching or related entities in the dataset (see Hassanzadeh para. 0018) and provides users of a data mining system with a unified view of the data (see Hassanzadeh para. 0002).
Fayyad as modified by Hassanzadeh does not appear to explicitly disclose manipulating cluster centroids relative to boundary points and insider points to provide gravity to a center of each of the plurality of clusters.
However, Shi teaches identifying a plurality of final clusters corresponding to a plurality of final centroids, the plurality of final centroids being generated through manipulating cluster centroids of the plurality of clusters for each of a plurality of subsets (see Shi pp. 54-56: clustering algorithm proceeds iteratively, determining clusters and centroids, until a certain threshold is reached; Note: Shi’s iterative updating of centroids corresponds to the claimed manipulation of the respective centroids, and Shi’s centroids and clusters as they exist when the threshold is met correspond to the claimed final centroids and final clusters, respectively) relative to boundary points (see Shi pp. 139-140, under the heading “Finding boundary data points,” and pp. 192-193: clusters are determined relative to boundary data points) and insider points (see Shi p. 54: at each iteration, each data point moves toward the inside of the cluster; and see Shi pp. 192-193: clusters are determined relative to center data points) to provide gravity to a center of each of the plurality of clusters (see Shi p. 92: the clustering algorithm is inspired by Newton’s Universal Law of Gravitation to optimize the inner structure of data; and see Shi p. 192: in each iteration of the clustering algorithm, data points move gradually according to the gravitation of neighboring data points).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Fayyad as modified by Hassanzadeh to include the teachings of Shi because it provides a clustering technique that prevents dramatic movement of data points, thereby avoiding improper movement of those data points (see Shi p. 192).

As to claim 16, see the rejection of claim 2 above.

As to claim 17, see the rejection of claim 3 above.

As to claim 18, see the rejection of claim 4 above.

As to claim 19, see the rejection of claims 5 and 6 above.

As to claim 20, see the rejection of claim 7 above.

Additional Art Considered
The prior art made of record and not relied upon is considered pertinent to the Applicants’ disclosure.
The following patents and papers are cited to further show the state of the art at the time of Applicants’ invention with respect to clustering algorithms that provide gravity to cluster centers.
a.	Kundu, Sukhamay. "Gravitational clustering: a new approach based on the spatial distribution of the points." Pattern recognition 32.7 (1999): 1149-1160.
Teaches a clustering algorithm in which clusters have gravitational force (see section 3 “The new method”, pp. 1152-1153).
b.	Gomez, Jonatan, Dipankar Dasgupta, and Olfa Nasraoui. "A new gravitational clustering algorithm." Proceedings of the 2003 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2003. pp. 83-94.
Teaches a clustering algorithm in which clusters have gravitational force (see section 2 “Proposed Approach”, pp. 86-87).
c.	Zhi, Wang Gui. "Clustering-boundary-detection algorithm based on center-of-gravity of neighborhood." TELKOMNIKA Indonesian Journal of Electrical Engineering 11.12 (2013): 7302-7308.
Teaches a clustering algorithm in which neighborhood clusters have a center of gravity (see section 2 “S-Bound Algorithm”, pp. 7303-7305).

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UMAR MIAN whose telephone number is (571) 270-3970.  The examiner can normally be reached on Monday to Friday, 10 am to 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on (571) 272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/UM/
Examiner, Art Unit 2163


/TONY MAHMOUDI/
Supervisory Patent Examiner, Art Unit 2163