DETAILED ACTION
This Office action is in response to the above-identified application filed on July 17, 2019. The application contains claims 1-30.
Claims 1-30 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on April 16, 2020. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-9, 11-13, 15-19, 21-23, and 25-29 are rejected under 35 U.S.C. 103 as being unpatentable over JACKSON (US 20190259041 A1), in view of Vij et al. (US 20190244253 A1), and in further view of Roy et al. (US 10354201 B1).

With regard to claim 1,
JACKSON teaches
a method for identifying subpopulations (Abstract), comprising: 
receiving, with at least one processor ([0033]: a processor), interaction data associated with a plurality of interactions from a population of individuals, the interaction data for each individual comprising a plurality of features (Fig. 2; [0055]: collect information or data that identifies interactions by users or other entities, and the interactions are associated with attributes or characteristics, wherein attributes or characteristics correspond to “features”); 
identifying, with at least one processor, a first subpopulation of the population based on at least one feature of respective interaction data of each respective individual in the first subpopulation, wherein a second subpopulation of the population comprises all individuals of the population other than the first subpopulation ([0050]-[0051]: prepare an interior dataset and an exterior dataset, wherein users associated with the interior dataset correspond to "a first subpopulation", users associated with the exterior dataset correspond to "a second subpopulation", whether or not a user is interior or exterior corresponds to "at least one feature", and the exterior dataset only comprises users other than the interior users associated with the interior dataset); 
JACKSON does not explicitly teach
clustering, with at least one processor, the first subpopulation into a first plurality of clusters based on the plurality of features; 
determining, with at least one processor, a first subset of the plurality of features based on the first plurality of clusters; 
clustering, with at least one processor, the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features; 
determining, with at least one processor, a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determining, with at least one processor, a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Vij teaches
clustering, with at least one processor (Fig. 3, Processor 320), the first subpopulation into a first plurality of clusters based on the plurality of features (Fig. 1A; [0017]; Fig. 1B, 120; [0023]: use a clustering analysis to cluster a population of individuals based on profile information associated with each individual, wherein the population of individuals corresponds to “the first subpopulation”, properties included in each individual’s profile that are determined to be a set of features correspond to “the plurality of features”, and “a first plurality of clusters” is the natural result produced by the clustering analysis); 
determining, with at least one processor, a first subset of the plurality of features based on the first plurality of clusters (Fig. 1B, 125; [0025]-[0028]: select one or more subsets of features from the set of features determined in the clustering analysis. [0108]; [0118]: repeat the process to determine more subsets of features until a final set of features is determined); 
clustering, with at least one processor, the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features ([0025]; Fig. 2; [0110]: use machine learning models trained by the subsets of features, e.g., a K-means clustering analysis, to cluster individuals into particular clusters associated with particular profile information values); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON to incorporate the teachings of Vij to use the clustering analysis to determine features, select one or more subsets of features to train a set of machine learning models, and use trained machine learning models to identify prospective users. Doing so would conserve processing resources by selecting optimal subsets of features that are more effective in identifying prospective targets relative to an inferior platform that does not select optimal subsets of features as taught by Vij ([0025]).
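For illustration only, the two-pass clustering addressed in the combination above can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from JACKSON or Vij: the helper names (`sq_dist`, `kmeans`, `feature_scores`), the farthest-point seeding, the 0.5 cut-off, and the sample rows are all hypothetical, and a simple between-cluster variance share stands in for whatever feature-selection analysis a given reference actually performs.

```python
from statistics import mean, pvariance

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next seed: the row farthest from every center chosen so far.
        centers.append(list(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def feature_scores(rows, labels):
    """Share of each feature's variance that lies between clusters (0.0 to 1.0)."""
    scores = []
    for col in zip(*rows):
        total = pvariance(col)
        grand = mean(col)
        groups = {}
        for value, lab in zip(col, labels):
            groups.setdefault(lab, []).append(value)
        between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / len(col)
        scores.append(between / total if total else 0.0)
    return scores

# Hypothetical first subpopulation: features 0 and 1 separate two groups,
# feature 2 alternates independently of group membership.
first_subpopulation = [
    [1.0, 10.0, 5.0], [1.2, 11.0, 7.0], [0.8, 9.0, 5.0], [1.1, 10.5, 7.0],
    [8.0, 30.0, 5.0], [8.3, 31.0, 7.0], [7.9, 29.0, 5.0], [8.1, 30.5, 7.0],
]

# Pass 1: cluster on the full plurality of features.
first_clusters = kmeans(first_subpopulation, k=2)

# Determine a first subset of features based on the first plurality of clusters.
scores = feature_scores(first_subpopulation, first_clusters)
first_subset = [j for j, s in enumerate(scores) if s >= 0.5]

# Pass 2: re-cluster the same individuals using only the first subset.
reduced = [[row[j] for j in first_subset] for row in first_subpopulation]
second_clusters = kmeans(reduced, k=2)
```

In this hypothetical data the third feature carries no cluster information, so only the first two features survive into the second clustering pass.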
JACKSON and Vij do not explicitly teach
determining, with at least one processor, a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determining, with at least one processor, a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Roy teaches
determining, with at least one processor (Fig. 12, Processor 9010a-9010n), a range for each feature of a second subset of the plurality of features based on the second plurality of clusters (Fig. 6; Col. 13, lines 26-55: determine an attribute-type-dependent distance metric and an attribute-type-dependent normalization factor for each attribute, wherein attributes Attr1, Attr2, Attr3, and Attr4 correspond to “a second subset” of the features in the cluster model that has been trained to generate “the second plurality of clusters”, and the attribute-type-dependent normalization factor corresponds to “a range” for each feature); and 
determining, with at least one processor, a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features (Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41; Col. 1, lines 18-37: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on the distances of each observation record to each cluster representative, which are computed based on the attribute-type-dependent normalization factor of each attribute in each particular observation record, wherein customers associated with all the observation records correspond to “the second subpopulation”, observation records associated with customer purchases or customers' web-page browsing behavior correspond to “respective interaction data”, and customers associated with the observation records assigned to a particular cluster correspond to “a subset of the second subpopulation”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij to incorporate the teachings of Roy to determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters and determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features. Doing so would cluster a set of observation records, for example, observation records associated with customer purchases or customers' web-page browsing behavior, into multiple homogeneous groups or clusters based on similarities among the observations, to identify targets for customized sales promotions, advertising, recommendations of products likely to be of interest, and so on as taught by Roy (Col. 1, lines 18-26).
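For illustration only, the final two limitations of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions: it reads "range" as a per-cluster minimum-to-maximum interval, which is one possible reading of the claim language and is not the normalization-factor computation described in Roy. The function names, individual identifiers, and values are hypothetical.

```python
def feature_ranges(rows, labels, features):
    """Per-cluster (min, max) interval for each selected feature index."""
    ranges = {}
    for row, label in zip(rows, labels):
        per_cluster = ranges.setdefault(label, {})
        for f in features:
            lo, hi = per_cluster.get(f, (row[f], row[f]))
            per_cluster[f] = (min(lo, row[f]), max(hi, row[f]))
    return ranges

def matches_any_cluster(row, ranges):
    """True if every selected feature of `row` falls inside one cluster's intervals."""
    return any(
        all(lo <= row[f] <= hi for f, (lo, hi) in per_cluster.items())
        for per_cluster in ranges.values()
    )

# Hypothetical first subpopulation, already clustered a second time.
first_rows = [[1.0, 10.0], [1.2, 11.0], [8.0, 30.0], [8.3, 31.0]]
second_clusters = [0, 0, 1, 1]
ranges = feature_ranges(first_rows, second_clusters, features=[0, 1])

# Hypothetical second subpopulation (everyone outside the first subpopulation).
second_subpopulation = {
    "u1": [1.1, 10.5],   # inside cluster 0 intervals
    "u2": [4.0, 20.0],   # inside neither cluster
    "u3": [8.2, 30.2],   # inside cluster 1 intervals
    "u4": [1.1, 30.5],   # mixes the two clusters, so it matches neither
}
selected = [uid for uid, row in second_subpopulation.items()
            if matches_any_cluster(row, ranges)]
```

Under these assumptions, an individual is selected only when all of the selected features fall within the intervals of a single cluster, which is why "u4" is excluded.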

With regard to claim 2,
	As discussed regarding claim 1, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 1, wherein clustering the first subpopulation into the first plurality of clusters comprises clustering, with at least one processor, the first subpopulation into the first plurality of clusters based on the plurality of features using at least one of unsupervised clustering or k-means clustering ([0023]: a clustering analysis; [0125]: an unsupervised training technique; [0108]: principal component analysis (PCA) is a type of “unsupervised clustering”), and 
wherein clustering the first subpopulation into the second plurality of clusters comprises clustering, with at least one processor, the first subpopulation into the second plurality of clusters based on the first subset of the plurality of features using at least one of unsupervised clustering or k-means clustering ([0110]: a K-means clustering analysis).

With regard to claim 3,
	As discussed regarding claim 1, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 1, wherein determining the first subset of the plurality of features comprises determining, with at least one processor, the first subset of the plurality of features based on the first plurality of clusters using at least one of a tree classifier or a random forest tree classifier ([0116]-[0117]: random forest (VSURF) analysis that includes decision trees).

With regard to claim 5,
	As discussed regarding claim 1, JACKSON, Vij, and Roy teach all the limitations therein.
Roy further teaches
the method of claim 1, wherein determining the subset of the second subpopulation comprises determining, with at least one processor, the subset of the second subpopulation based on each respective feature of the interaction data for each respective individual of the subset of the second subpopulation being within the range for each respective feature of the second subset of the plurality of features (Fig. 6; Col. 13, lines 26-55; Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on each respective attribute of Attr1, Attr2, Attr3, and Attr4 for each observation record being within the designated attribute-type-dependent normalization factor, i.e., “range”).

With regard to claim 6,
	As discussed regarding claim 1, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 1, wherein receiving interaction data comprises: 
receiving, with at least one processor, payment transaction data associated with a plurality of payment transactions from the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: activity information associated with a population of individuals, for example, financial information, corresponds to “payment transaction data”); 
receiving, with at least one processor, demographic data associated with demographics of each individual of the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: personal information associated with a population of individuals, for example, a date of birth value, corresponds to “demographic data”); and 
combining, with at least one processor, the payment transaction data and the demographic data for each individual to form at least part of the interaction data for each individual (Fig. 1A; [0017]-[0018]; [0020]: associate the personal information obtained from a first data source and the activity information obtained from one or more additional data sources to generate profile information of each individual in the group of individuals).
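For illustration only, the receiving and combining steps recited in claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout, the field names (`individual_id`, `amount`, `birth_year`, `txn_count`, `txn_total`), and the choice of aggregates are hypothetical and do not come from the application or from Vij.

```python
from collections import defaultdict

def build_interaction_data(transactions, demographics):
    """Join per-individual transaction aggregates with demographic fields."""
    totals = defaultdict(lambda: {"txn_count": 0, "txn_total": 0.0})
    for txn in transactions:
        agg = totals[txn["individual_id"]]
        agg["txn_count"] += 1
        agg["txn_total"] += txn["amount"]
    combined = {}
    for individual_id, demo in demographics.items():
        # Individuals with no transactions keep zeroed aggregates.
        aggregates = totals.get(individual_id, {"txn_count": 0, "txn_total": 0.0})
        combined[individual_id] = {**demo, **aggregates}
    return combined

# Hypothetical payment transaction data and demographic data.
transactions = [
    {"individual_id": "a", "amount": 20.0},
    {"individual_id": "a", "amount": 5.5},
    {"individual_id": "b", "amount": 100.0},
]
demographics = {
    "a": {"birth_year": 1980},
    "b": {"birth_year": 1995},
    "c": {"birth_year": 1970},
}
interaction_data = build_interaction_data(transactions, demographics)
```

Each resulting record holds both kinds of data for one individual, so every field can serve as a feature in the later clustering steps.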

With regard to claim 7,
	As discussed regarding claim 1, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 1, further comprising, after clustering the first subpopulation into the second plurality of clusters, determining a number of features of the second subset of the plurality of features is within a desired range ([0118]: select a threshold number of generalizable features, wherein a threshold number of features corresponds to “a number of features … is within a desired range”).

With regard to claim 8,
	As discussed regarding claim 7, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 7, wherein the desired range is less than a predetermined threshold ([0118]: select a threshold number of generalizable features. Since the number of features starts very large and decreases with each iteration, the threshold number is put in place to ensure the subset of features is within a manageable range, and thus corresponds to “the desired range is less than a predetermined threshold”).

With regard to claim 9,
	As discussed regarding claim 7, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the method of claim 7, wherein determining the number of features of the second subset of the plurality of features is within the desired range comprises: 
determining, with at least one processor, a variance explained by each feature of the second subset of the plurality of features exceeds a threshold ([0117]: select, as features to be used to train the machine learning models, variables that satisfy a threshold level of importance, wherein a threshold level of importance corresponds to “a variance explained by each feature”).
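For illustration only, the checks recited in claims 7 through 9 can be sketched as follows. This is a minimal sketch under stated assumptions: the explained-variance shares are taken as already computed, and the function name, feature names, and both thresholds are hypothetical rather than taken from the application or from Vij.

```python
def check_feature_count(explained, min_explained, max_features):
    """Keep features whose explained-variance share exceeds `min_explained`,
    then report whether the kept count is below `max_features`."""
    kept = [name for name, share in explained.items() if share > min_explained]
    return kept, len(kept) < max_features

# Hypothetical explained-variance share per candidate feature.
explained = {"txn_total": 0.62, "txn_count": 0.25, "age": 0.08, "zip_digit": 0.01}
kept, within_range = check_feature_count(explained, min_explained=0.05, max_features=4)
```

Here three features clear the per-feature threshold, and three is below the hypothetical limit of four, so the count is within the desired range.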

With regard to claim 11,
JACKSON teaches
a system for identifying subpopulations (Abstract), comprising: 
at least one processor ([0033]: a processor); and 
at least one non-transitory computer readable medium comprising instructions to direct the at least one processor to: 
receive interaction data associated with a plurality of interactions from a population of individuals, the interaction data for each individual comprising a plurality of features (Fig. 2; [0055]: collect information or data that identifies interactions by users or other entities, and the interactions are associated with attributes or characteristics, wherein attributes or characteristics correspond to “features”); 
identify a first subpopulation of the population based on at least one feature of respective interaction data of each respective individual in the first subpopulation, wherein a second subpopulation of the population comprises all individuals of the population other than the first subpopulation ([0050]-[0051]: prepare an interior dataset and an exterior dataset, wherein users associated with the interior dataset correspond to "a first subpopulation", users associated with the exterior dataset correspond to "a second subpopulation", whether or not a user is interior or exterior corresponds to "at least one feature", and the exterior dataset only comprises users other than the interior users associated with the interior dataset); 
JACKSON does not explicitly teach
cluster the first subpopulation into a first plurality of clusters based on the plurality of features; 
determine a first subset of the plurality of features based on the first plurality of clusters; 
cluster the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features; 
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Vij teaches
cluster the first subpopulation into a first plurality of clusters based on the plurality of features (Fig. 1A; [0017]; Fig. 1B, 120; [0023]: use a clustering analysis to cluster a population of individuals based on profile information associated with each individual, wherein the population of individuals corresponds to “the first subpopulation”, properties included in each individual’s profile that are determined to be a set of features correspond to “the plurality of features”, and “a first plurality of clusters” is the natural result produced by the clustering analysis); 
determine a first subset of the plurality of features based on the first plurality of clusters (Fig. 1B, 125; [0025]-[0028]: select one or more subsets of features from the set of features determined in the clustering analysis. [0108]; [0118]: repeat the process to determine more subsets of features until a final set of features is determined); 
cluster the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features ([0025]; Fig. 2; [0110]: use machine learning models trained by the subsets of features, e.g., a K-means clustering analysis, to cluster individuals into particular clusters associated with particular profile information values); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON to incorporate the teachings of Vij to use the clustering analysis to determine features, select one or more subsets of features to train a set of machine learning models, and use trained machine learning models to identify prospective users. Doing so would conserve processing resources by selecting optimal subsets of features that are more effective in identifying prospective targets relative to an inferior platform that does not select optimal subsets of features as taught by Vij ([0025]).
JACKSON and Vij do not explicitly teach
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Roy teaches
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters (Fig. 6; Col. 13, lines 26-55: determine an attribute-type-dependent distance metric and an attribute-type-dependent normalization factor for each attribute, wherein attributes Attr1, Attr2, Attr3, and Attr4 correspond to “a second subset” of the features in the cluster model that has been trained to generate “the second plurality of clusters”, and the attribute-type-dependent normalization factor corresponds to “a range” for each feature); and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features (Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41; Col. 1, lines 18-37: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on the distances of each observation record to each cluster representative, which are computed based on the attribute-type-dependent normalization factor of each attribute in each particular observation record, wherein customers associated with all the observation records correspond to “the second subpopulation”, observation records associated with customer purchases or customers' web-page browsing behavior correspond to “respective interaction data”, and customers associated with the observation records assigned to a particular cluster correspond to “a subset of the second subpopulation”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij to incorporate the teachings of Roy to determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters and determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features. Doing so would cluster a set of observation records, for example, observation records associated with customer purchases or customers' web-page browsing behavior, into multiple homogeneous groups or clusters based on similarities among the observations, to identify targets for customized sales promotions, advertising, recommendations of products likely to be of interest, and so on as taught by Roy (Col. 1, lines 18-26).

With regard to claim 12,
	As discussed regarding claim 11, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 11, wherein clustering the first subpopulation into the first plurality of clusters comprises clustering the first subpopulation into the first plurality of clusters based on the plurality of features using at least one of unsupervised clustering or k-means clustering ([0023]: a clustering analysis; [0125]: an unsupervised training technique; [0108]: principal component analysis (PCA) is a type of “unsupervised clustering”), and 
wherein clustering the first subpopulation into the second plurality of clusters comprises clustering the first subpopulation into the second plurality of clusters based on the first subset of the plurality of features using at least one of unsupervised clustering or k-means clustering ([0110]: a K-means clustering analysis).

With regard to claim 13,
	As discussed regarding claim 11, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 11, wherein determining the first subset of the plurality of features comprises determining the first subset of the plurality of features based on the first plurality of clusters using at least one of a tree classifier or a random forest tree classifier ([0116]-[0117]: random forest (VSURF) analysis that includes decision trees).

With regard to claim 15,
	As discussed regarding claim 11, JACKSON, Vij, and Roy teach all the limitations therein.
Roy further teaches
the system of claim 11, wherein determining the subset of the second subpopulation comprises determining the subset of the second subpopulation based on each respective feature of the interaction data for each respective individual of the subset of the second subpopulation being within the range for each respective feature of the second subset of the plurality of features (Fig. 6; Col. 13, lines 26-55; Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on each respective attribute of Attr1, Attr2, Attr3, and Attr4 for each observation record being within the designated attribute-type-dependent normalization factor, i.e., “range”).

With regard to claim 16,
	As discussed regarding claim 11, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 11, wherein receiving interaction data comprises receiving payment transaction data associated with a plurality of payment transactions from the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: activity information associated with a population of individuals, for example, financial information, corresponds to “payment transaction data”), receiving demographic data associated with demographics of each individual of the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: personal information associated with a population of individuals, for example, a date of birth value, corresponds to “demographic data”), and combining the payment transaction data and the demographic data for each individual to form at least part of the interaction data for each individual (Fig. 1A; [0017]-[0018]; [0020]: associate the personal information obtained from a first data source and the activity information obtained from one or more additional data sources to generate profile information of each individual in the group of individuals).

With regard to claim 17,
	As discussed regarding claim 11, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 11, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters, determine a number of features of the second subset of the plurality of features is within a desired range ([0118]: select a threshold number of generalizable features, wherein a threshold number of features corresponds to “a number of features … is within a desired range”).

With regard to claim 18,
	As discussed regarding claim 17, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 17, wherein the desired range is less than a predetermined threshold ([0118]: select a threshold number of generalizable features. Since the number of features starts very large and decreases with each iteration, the threshold number is put in place to ensure the subset of features is within a manageable range, and thus corresponds to “the desired range is less than a predetermined threshold”).

With regard to claim 19,
	As discussed regarding claim 17, JACKSON, Vij, and Roy teach all the limitations therein.
Vij further teaches
the system of claim 17, wherein determining the number of features of the second subset of the plurality of features is within the desired range comprises determining a variance explained by each feature of the second subset of the plurality of features exceeds a threshold ([0117]: select, as features to be used to train the machine learning models, variables that satisfy a threshold level of importance, wherein a threshold level of importance corresponds to “a variance explained by each feature”).

With regard to claim 21,
JACKSON teaches
a computer program product for identifying subpopulations (Abstract), the computer program product comprising at least one non-transitory computer-readable medium including one or more instructions that, when executed by at least one processor ([0033]: a processor), cause the at least one processor to: 
receive interaction data associated with a plurality of interactions from a population of individuals, the interaction data for each individual comprising a plurality of features (Fig. 2; [0055]: collect information or data that identifies interactions by users or other entities, and the interactions are associated with attributes or characteristics, wherein attributes or characteristics correspond to “features”); 
identify a first subpopulation of the population based on at least one feature of respective interaction data of each respective individual in the first subpopulation, wherein a second subpopulation of the population comprises all individuals of the population other than the first subpopulation ([0050]-[0051]: prepare an interior dataset and an exterior dataset, wherein users associated with the interior dataset correspond to "a first subpopulation", users associated with the exterior dataset correspond to "a second subpopulation", whether or not a user is interior or exterior corresponds to "at least one feature", and the exterior dataset only comprises users other than the interior users associated with the interior dataset); 
JACKSON does not explicitly teach
cluster the first subpopulation into a first plurality of clusters based on the plurality of features; 
determine a first subset of the plurality of features based on the first plurality of clusters; 
cluster the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features; 
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Vij teaches
cluster the first subpopulation into a first plurality of clusters based on the plurality of features (Fig. 1A; [0017]; Fig. 1B, 120; [0023]: use a clustering analysis to cluster a population of individuals based on profile information associated with each individual, wherein the population of individuals corresponds to “the first subpopulation”, properties included in each individual’s profile that are determined to be a set of features correspond to “the plurality of features”, and “a first plurality of clusters” is the natural result produced by the clustering analysis); 
determine a first subset of the plurality of features based on the first plurality of clusters (Fig. 1B, 125; [0025]-[0028]: select one or more subsets of features from the set of features determined in the clustering analysis. [0108]; [0118]: repeat the process to determine more subsets of features until a final set of features is determined); 
cluster the first subpopulation into a second plurality of clusters based on the first subset of the plurality of features ([0025]; Fig. 2; [0110]: use machine learning models trained by the subsets of features, e.g., a K-means clustering analysis, to cluster individuals into particular clusters associated with particular profile information values); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON to incorporate the teachings of Vij to use the clustering analysis to determine features, select one or more subsets of features to train a set of machine learning models, and use trained machine learning models to identify prospective users. Doing so would conserve processing resources by selecting optimal subsets of features that are more effective in identifying prospective targets relative to an inferior platform that does not select optimal subsets of features as taught by Vij ([0025]).
JACKSON and Vij do not explicitly teach
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters; and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features.
Roy teaches
determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters (Fig. 6; Col. 13, lines 26-55: determine an attribute-type-dependent distance metric and an attribute-type-dependent normalization factor for each attribute, wherein attributes Attr1, Attr2, Attr3, and Attr4 correspond to "a second subset” of the features in the cluster model that has been trained to generate “the second plurality of clusters”, and attribute-type-dependent normalization factor corresponds to “a range” for each feature); and 
determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features (Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41; Col. 1, lines 18-37: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on the distance of each observation record to each cluster representative, which is computed based on the attribute-type-dependent normalization factor of each attribute in each particular observation record, wherein customers associated with all the observation records correspond to “the second subpopulation”, observation records associated with customer purchases or customers' web-page browsing behavior correspond to “respective interaction data”, and customers associated with the observation records assigned to a particular cluster correspond to “a subset of the second subpopulation”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij to incorporate the teachings of Roy to determine a range for each feature of a second subset of the plurality of features based on the second plurality of clusters and determine a subset of the second subpopulation based on respective interaction data for each respective individual of the subset of the second subpopulation and the range for each respective feature of the second subset of the plurality of features. Doing so would cluster a set of observation records, for example, observation records associated with customer purchases or customers' web-page browsing behavior, into multiple homogeneous groups or clusters based on similarities among the observations, to identify targets for customized sales promotions, advertising, recommendations of products likely to be of interest, and so on as taught by Roy (Col. 1, lines 18-26).

With regard to claim 22,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 21, wherein clustering the first subpopulation into the first plurality of clusters comprises clustering the first subpopulation into the first plurality of clusters based on the plurality of features using at least one of unsupervised clustering or k-means clustering ([0023]: a clustering analysis; [0125]: an unsupervised training technique; [0108]: principal component analysis (PCA) is a type of “unsupervised clustering”), and 
wherein clustering the first subpopulation into the second plurality of clusters comprises clustering the first subpopulation into the second plurality of clusters based on the first subset of the plurality of features using at least one of unsupervised clustering or k-means clustering ([0110]: a K-means clustering analysis).
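For illustration only, the k-means clustering referenced in the mapping above can be sketched as follows. This is a minimal sketch of the standard assignment/update iteration and is not drawn from Vij or from the application; the function name `kmeans`, the naive choice of the first k points as starting centroids, and the sample points are all assumptions made for the example.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (labels, centroids)."""
    # Assumption: the first k points seed the centroids (a naive initialization).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-feature records for six individuals.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
```

Here each tuple stands in for one individual's feature values, and the returned labels play the role of cluster membership.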

With regard to claim 23,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 21, wherein determining the first subset of the plurality of features comprises determining the first subset of the plurality of features based on the first plurality of clusters using at least one of a tree classifier or a random forest tree classifier ([0116]-[0117]: random forest (VSURF) analysis that includes decision trees).

With regard to claim 25,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
Roy further teaches
the computer program product of claim 21, wherein determining the subset of the second subpopulation comprises determining the subset of the second subpopulation based on each respective feature of the interaction data for each respective individual of the subset of the second subpopulation being within the range for each respective feature of the second subset of the plurality of features (Fig. 6; Col. 13, lines 26-55; Fig. 7; Col. 13, lines 56-67; Col. 14, lines 1-41: assign observation records OR1 and OR2 to Cluster1 and OR3 and OR4 to Cluster2 based on each respective attribute of Attr1, Attr2, Attr3, and Attr4 for each observation record being within the designated attribute-type-dependent normalization factor, i.e., “range”).
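For illustration only, the "within the range" determination recited in this limitation can be sketched as a per-feature bounds check. The sketch is not taken from Roy or from the application; the names `within_ranges` and `select_subset`, the feature names, and the sample values are hypothetical, and inclusive bounds are an assumption.

```python
def within_ranges(record, ranges):
    """True if every ranged feature of the record lies inside its (low, high) range."""
    # Assumption: bounds are inclusive.
    return all(low <= record[name] <= high for name, (low, high) in ranges.items())

def select_subset(population, ranges):
    """Return the identifiers of individuals whose features all fall within range."""
    return [pid for pid, record in population.items() if within_ranges(record, ranges)]

# Hypothetical ranges for two features and a hypothetical second subpopulation.
ranges = {"spend": (10.0, 30.0), "visits": (1.0, 5.0)}
population = {
    "A": {"spend": 12.0, "visits": 2.0},
    "B": {"spend": 45.0, "visits": 3.0},  # spend is out of range
    "C": {"spend": 30.0, "visits": 5.0},  # on the boundary, included
}
selected = select_subset(population, ranges)
```

Individual "B" is excluded because a single feature falls outside its range, which reflects the "each respective feature" wording of the limitation.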

With regard to claim 26,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 21, wherein receiving interaction data comprises receiving payment transaction data associated with a plurality of payment transactions from the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: activity information associated with a population of individuals, for example, financial information, corresponds to “payment transaction data”), receiving demographic data associated with demographics of each individual of the population of individuals (Fig. 1A; [0017]-[0018]; [0020]: personal information associated with a population of individuals, for example, a date of birth value, corresponds to “demographic data”), and combining the payment transaction data and the demographic data for each individual to form at least part of the interaction data for each individual (Fig. 1A; [0017]-[0018]; [0020]: associate the personal information obtained from a first data source and the activity information obtained from one or more additional data sources to generate profile information of each individual in the group of individuals).

With regard to claim 27,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 21, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters, determine a number of features of the second subset of the plurality of features is within a desired range ([0118]: select a threshold number of generalizable features, wherein a threshold number of features corresponds to “a number of features … is within a desired range”).

With regard to claim 28,
	As discussed regarding claim 27, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 27, wherein the desired range is less than a predetermined threshold ([0118]: select a threshold number of generalizable features. Since the number of features starts very large and decreases with each iteration, the threshold number is put in place to ensure the subset of features is within a manageable range and thus corresponds to “the desired range is less than a predetermined threshold”).

With regard to claim 29,
	As discussed regarding claim 27, JACKSON and Vij and Roy teach all the limitations therein.
Vij further teaches
the computer program product of claim 27, wherein determining the number of features of the second subset of the plurality of features is within the desired range comprises determining a variance explained by each feature of the second subset of the plurality of features exceeds a threshold ([0117]: select, as features to be used to train the machine learning models, variables that satisfy a threshold level of importance, wherein a threshold level of importance corresponds to “a variance explained by each feature”).

Claims 4, 14, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over JACKSON (US 20190259041 A1), in view of Vij et al. (US 20190244253 A1), and in further view of Roy et al. (US 10354201 B1) and Yeturu (US 10916333 B1).

With regard to claim 4,
	As discussed regarding claim 1, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not teach
the method of claim 1, wherein determining the range comprises: 
determining, with at least one processor, a mean and a standard deviation for each feature of the second subset of the plurality of features; and 
determining, with at least one processor, a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature.
Yeturu teaches
the method of claim 1, wherein determining the range comprises: 
determining, with at least one processor, a mean and a standard deviation for each feature of the second subset of the plurality of features (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: determine a mean and a standard deviation of the mean for each feature); and 
determining, with at least one processor, a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: identify data points for inclusion in a training data set, e.g., within one standard deviation of the mean, within two standard deviations of the mean, etc.; “above and below the mean” is inherent in a range defined by standard deviations about the mean).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Yeturu to determine a mean and a standard deviation for each feature of the second subset of the plurality of features and determine a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature. Doing so would make it convenient to train the machine learning model to ensure the classifier's results meet a quality threshold, because if one standard deviation of the mean includes an inadequate number of observations, a subsequent iteration of training may be attempted with two standard deviations of the means of the Gaussians, and so on until the quality of classification results meets a threshold as taught by Yeturu (Col. 11, lines 16-36).
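For illustration only, the range recited in this limitation (a predefined multiple of the standard deviation above and below the mean) can be sketched as follows. The sketch is not taken from Yeturu or from the application; the function name `feature_ranges`, the multiplier parameter `k`, the use of the population standard deviation, and the sample records are assumptions.

```python
from statistics import mean, pstdev

def feature_ranges(records, features, k=2.0):
    """Return {feature: (mean - k*std, mean + k*std)} over the given records."""
    ranges = {}
    for name in features:
        values = [r[name] for r in records]
        mu = mean(values)
        # Assumption: population standard deviation; a sample estimate
        # (statistics.stdev) would be an equally plausible reading.
        sigma = pstdev(values)
        ranges[name] = (mu - k * sigma, mu + k * sigma)
    return ranges

# Hypothetical interaction records with two features.
records = [
    {"spend": 10.0, "visits": 1.0},
    {"spend": 20.0, "visits": 3.0},
    {"spend": 30.0, "visits": 5.0},
]
ranges = feature_ranges(records, ["spend", "visits"], k=1.0)
```

With `k=1.0` each range spans one standard deviation on either side of the mean; a larger `k` widens the range symmetrically.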

With regard to claim 14,
	As discussed regarding claim 11, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not teach
the system of claim 11, wherein determining the range comprises determining a mean and a standard deviation for each feature of the second subset of the plurality of features; and determining a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature.
Yeturu teaches
the system of claim 11, wherein determining the range comprises determining a mean and a standard deviation for each feature of the second subset of the plurality of features (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: determine a mean and a standard deviation of the mean for each feature); and determining a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: identify data points for inclusion in a training data set, e.g., within one standard deviation of the mean, within two standard deviations of the mean, etc.; “above and below the mean” is inherent in a range defined by standard deviations about the mean).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Yeturu to determine a mean and a standard deviation for each feature of the second subset of the plurality of features and determine a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature. Doing so would make it convenient to train the machine learning model to ensure the classifier's results meet a quality threshold, because if one standard deviation of the mean includes an inadequate number of observations, a subsequent iteration of training may be attempted with two standard deviations of the means of the Gaussians, and so on until the quality of classification results meets a threshold as taught by Yeturu (Col. 11, lines 16-36).

With regard to claim 24,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not teach
the computer program product of claim 21, wherein determining the range comprises determining a mean and a standard deviation for each feature of the second subset of the plurality of features; and determining a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature.
Yeturu teaches
the computer program product of claim 21, wherein determining the range comprises determining a mean and a standard deviation for each feature of the second subset of the plurality of features (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: determine a mean and a standard deviation of the mean for each feature); and determining a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature (Col. 11, lines 16-36; Col. 12, lines 17-45; Col. 14, lines 43-50: identify data points for inclusion in a training data set, e.g., within one standard deviation of the mean, within two standard deviations of the mean, etc.; “above and below the mean” is inherent in a range defined by standard deviations about the mean).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Yeturu to determine a mean and a standard deviation for each feature of the second subset of the plurality of features and determine a range for each respective feature of the second subset of the plurality of features based on a predefined multiple of the standard deviation of the respective feature above and below the mean of the respective feature. Doing so would make it convenient to train the machine learning model to ensure the classifier's results meet a quality threshold, because if one standard deviation of the mean includes an inadequate number of observations, a subsequent iteration of training may be attempted with two standard deviations of the means of the Gaussians, and so on until the quality of classification results meets a threshold as taught by Yeturu (Col. 11, lines 16-36).

Claims 10, 20, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over JACKSON (US 20190259041 A1), in view of Vij et al. (US 20190244253 A1), and in further view of Roy et al. (US 10354201 B1) and Deolalikar et al. (US 20160085811 A1).

With regard to claim 10,
	As discussed regarding claim 1, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not explicitly teach
the method of claim 1, further comprising, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determining, with at least one processor, a number of features of the second subset of the plurality of features is outside a desired range; 
determining, with at least one processor, a further subset of the plurality of features based on the second plurality of clusters; 
clustering, with at least one processor, the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features; 
repeating, with at least one processor, determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range; and 
replacing, with at least one processor, the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters.
Deolalikar teaches
the method of claim 1, further comprising, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determining, with at least one processor, a number of features of the second subset of the plurality of features is outside a desired range (Fig. 5; [0041]: determine that a convergence threshold is not met, wherein the convergence threshold not being met corresponds to the number of features added to the feature set for clustering in a given iteration being above a threshold, i.e., “a number of features … is outside a desired range”); 
determining, with at least one processor, a further subset of the plurality of features based on the second plurality of clusters (Fig. 5; [0041]: select a plurality of features based on the plurality of clusters and add the plurality of features to a feature set for clustering); 
clustering, with at least one processor, the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features (Fig. 5; [0041]: cluster the plurality of samples into a plurality of clusters based on the updated feature set); 
repeating, with at least one processor, determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range (Fig. 5; [0041]: iterate feature selection and clustering until a convergence threshold is met); and 
replacing, with at least one processor, the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters (Fig. 5; [0041]: when a convergence threshold is met and the above iteration ends, the resultant feature set replaces all previous feature sets and is used to cluster the plurality of sample data into a new plurality of clusters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Deolalikar to determine a number of features of the second subset of the plurality of features is outside a desired range, determine a further subset of the plurality of features based on the second plurality of clusters, cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features, repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range, and replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters. Doing so would reduce primary memory usage due to the smaller number of features and enable a clustering operation to be performed more efficiently over the entire data set. In addition, because the feature set is generated using clusters generated by the same clustering algorithm, the feature set may be tailored specifically for that clustering algorithm, which may result in improved clustering as taught by Deolalikar ([0010]).
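For illustration only, the repeat-until loop recited in this limitation (select a further feature subset, re-cluster, and stop once the number of features is within a desired range) can be sketched as follows. The sketch reproduces neither Deolalikar's procedure nor the application's; `cluster` and `select_features` are toy stand-ins, and `refine`, `max_features`, and the sample records are hypothetical names and values chosen for the example.

```python
from statistics import mean

def cluster(records, features):
    """Toy stand-in for clustering: split records at the mean of their feature sums."""
    scores = [sum(r[f] for f in features) for r in records]
    cut = mean(scores)
    return [0 if s <= cut else 1 for s in scores]

def select_features(records, features, labels):
    """Toy stand-in for feature selection: drop the feature that separates clusters least."""
    def separation(f):
        groups = {}
        for r, lab in zip(records, labels):
            groups.setdefault(lab, []).append(r[f])
        means = [mean(v) for v in groups.values()]
        return max(means) - min(means)
    weakest = min(features, key=separation)
    return [f for f in features if f != weakest]

def refine(records, features, max_features):
    """Repeat selection and clustering until the feature count is within range."""
    labels = cluster(records, features)
    while len(features) > max_features:  # feature count is outside the desired range
        features = select_features(records, features, labels)  # further subset
        labels = cluster(records, features)                     # further clusters
    # The returned values replace the earlier subset and clusters.
    return features, labels

# Hypothetical records in which only feature "a" distinguishes two groups.
records = [
    {"a": 1.0, "b": 5.0, "c": 2.0},
    {"a": 2.0, "b": 5.1, "c": 2.0},
    {"a": 9.0, "b": 5.0, "c": 2.1},
    {"a": 10.0, "b": 5.2, "c": 2.0},
]
final_features, final_labels = refine(records, ["a", "b", "c"], max_features=1)
```

The loop terminates because each pass removes exactly one feature; a real selection step would need its own termination guarantee.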

With regard to claim 20,
	As discussed regarding claim 11, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not explicitly teach
the system of claim 11, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determine a number of features of the second subset of the plurality of features is outside a desired range; 
determine a further subset of the plurality of features based on the second plurality of clusters; 
cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features; 
repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range; and 
replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters.
Deolalikar teaches
the system of claim 11, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determine a number of features of the second subset of the plurality of features is outside a desired range (Fig. 5; [0041]: determine that a convergence threshold is not met, wherein the convergence threshold not being met corresponds to the number of features added to the feature set for clustering in a given iteration being above a threshold, i.e., “a number of features … is outside a desired range”); 
determine a further subset of the plurality of features based on the second plurality of clusters (Fig. 5; [0041]: select a plurality of features based on the plurality of clusters and add the plurality of features to a feature set for clustering); 
cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features (Fig. 5; [0041]: cluster the plurality of samples into a plurality of clusters based on the updated feature set); 
repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range (Fig. 5; [0041]: iterate feature selection and clustering until a convergence threshold is met); and 
replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters (Fig. 5; [0041]: when a convergence threshold is met and the above iteration ends, the resultant feature set replaces all previous feature sets and is used to cluster the plurality of sample data into a new plurality of clusters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Deolalikar to determine a number of features of the second subset of the plurality of features is outside a desired range, determine a further subset of the plurality of features based on the second plurality of clusters, cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features, repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range, and replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters. Doing so would reduce primary memory usage due to the smaller number of features and enable a clustering operation to be performed more efficiently over the entire data set. In addition, because the feature set is generated using clusters generated by the same clustering algorithm, the feature set may be tailored specifically for that clustering algorithm, which may result in improved clustering as taught by Deolalikar ([0010]).

With regard to claim 30,
	As discussed regarding claim 21, JACKSON and Vij and Roy teach all the limitations therein.
JACKSON and Vij and Roy do not explicitly teach
the computer program product of claim 21, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determine a number of features of the second subset of the plurality of features is outside a desired range; 
determine a further subset of the plurality of features based on the second plurality of clusters; 
cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features; 
repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range; and 
replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters.
Deolalikar teaches
the computer program product of claim 21, wherein the instructions further direct the at least one processor to, after clustering the first subpopulation into the second plurality of clusters and before determining the range: 
determine a number of features of the second subset of the plurality of features is outside a desired range (Fig. 5; [0041]: determine that a convergence threshold is not met, wherein the convergence threshold not being met corresponds to the number of features added to the feature set for clustering in a given iteration being above a threshold, i.e., “a number of features … is outside a desired range”); 
determine a further subset of the plurality of features based on the second plurality of clusters (Fig. 5; [0041]: select a plurality of features based on the plurality of clusters and add the plurality of features to a feature set for clustering); 
cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features (Fig. 5; [0041]: cluster the plurality of samples into a plurality of clusters based on the updated feature set); 
repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range (Fig. 5; [0041]: iterate feature selection and clustering until a convergence threshold is met); and 
replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters (Fig. 5; [0041]: when a convergence threshold is met and the above iteration ends, the resultant feature set replaces all previous feature sets and is used to cluster the plurality of sample data into a new plurality of clusters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified JACKSON and Vij and Roy to incorporate the teachings of Deolalikar to determine a number of features of the second subset of the plurality of features is outside a desired range, determine a further subset of the plurality of features based on the second plurality of clusters, cluster the first subpopulation into a further plurality of clusters based on the further subset of the plurality of features, repeat determining the further subset of the plurality of features and clustering the first subpopulation into the further plurality of clusters until a number of features of the further subset of the plurality of features is within a desired range, and replace the second subset of the plurality of features with the further subset of the plurality of features and the second plurality of clusters with the further plurality of clusters. Doing so would reduce primary memory usage due to the smaller number of features and enable a clustering operation to be performed more efficiently over the entire data set. In addition, because the feature set is generated using clusters generated by the same clustering algorithm, the feature set may be tailored specifically for that clustering algorithm, which may result in improved clustering as taught by Deolalikar ([0010]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOQIN HU whose telephone number is (571)272-1792.  The examiner can normally be reached on Monday-Friday 7:00am-3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAOQIN HU/Examiner, Art Unit 2168