DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Previous Rejections
The previous rejections under 35 U.S.C. 101 are withdrawn in view of the amendments.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1, 4, 9, 12 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 20170140417 A1) in view of Ramachandra (“Deep Learning for Causal Inference”), in view of Bae et al. (US 20130091007 A1), and further in view of Chalasani et al. (US 20180040032 A1).
Regarding Claim 1
Li teaches
A method, comprising:
-obtaining an initial set of a plurality of features for each of a plurality of users from one or more memory devices ([0003] “In one or more implementations, campaign data that pertains to first and second campaign groups is characterized using a plurality of features (i.e., covariates) that describe subjects included in the first and second campaign groups.”; “subjects” reads on “users”), the plurality of users comprising a first group of users that have participated in an activity and a second group of users that have not participated in the activity ([0035] “The campaign data 116, for instance, may use an algorithm to assign subjects into a first campaign group 202 and a second campaign group 204 that correspond to first and second campaigns, e.g., a “10% off” offer and a “buy one, get one free” offer respectively.”; the second group of users has not participated in the first campaign), each feature of the plurality of features comprising a different item of information for each of the plurality of users; ([0023] “For instance, a multidimensional vector may be used to express features in the marketing campaign for the subjects such as age, education, marriage status, high school degree, earnings, geographic location, and so forth.”; discloses features comprising a different item of information)
-determining a reduced set of the plurality of features, ([0024] “Accordingly, the techniques described herein first project the features of the characterized campaign data into a reduced dimension space, e.g., using linear or non-linear techniques, and thus reduces the effective number of the features needed to characterize the subjects in the two groups.”)
-… whereby the second machine learning model utilizes a lesser amount of computing resources with the reduced set than compared to using the initial set of the plurality of features for determining the propensity score; ([0025] “This achieves improved accuracy in determination of the campaign effectiveness result and also increased computational efficiency and accuracy through use of the reduced dimension space as further described in greater detail in the following sections.”)
Li does not distinctly disclose
-providing to a first machine learning model executing on one or more computing devices the initial set of the plurality of features for each of the plurality of users, the plurality of features being utilized to train the first machine learning model to determine an effectiveness of each of the plurality of features at predicting the likelihood that a particular user is to participate in the activity; 
-… the reduced set having a prediction error metric within a predetermined threshold of a prediction error metric associated with the initial set of the plurality of features; 
-providing the reduced set to a second machine learning model executing on the one or more computing devices;  
-for each of the plurality of users, determining, by the second machine learning model, a propensity score indicative of a likelihood that the particular user is to participate in the activity based on the reduced set, 
However, Ramachandra teaches
-providing to a first machine learning model executing on one or more computing devices the initial set of the plurality of features for each of the plurality of users, the plurality of features being utilized to train the first machine learning model to determine an effectiveness of each of the plurality of features at predicting the likelihood that a particular user is to participate in the activity; ([Abstract] “1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space.” [3] “The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.”; “autoencoder” reads on “a first machine learning model”; “dimensionality reduction” reduces the number of features, which is considered to retain the features that are effective at predicting the likelihood that a particular user is to participate in the activity.)
-… the reduced set having a prediction error metric within a predetermined threshold of a prediction error metric associated with the initial set of the plurality of features; ([3.1] “The learnt mapping, if it maps the input to a lower dimensional encoding, becomes a form of non-linear dimensionality reduction technique. … Repeat the above steps for several epochs until the error reaches below a certain threshold or converges.”; “error reaches below a certain threshold” reads on “prediction error metric within a predetermined threshold of a prediction error metric”)
-providing the reduced set to a second machine learning model executing on the one or more computing devices;  ([4.2] “We build a DNN ‘PropensityNet’ to estimate the propensity score, with the inputs being the covariates X as well as the outcome Y across all units.”; DNN reads on “a second machine learning model”)
-for each of the plurality of users, determining, by the second machine learning model, a propensity score indicative of a likelihood that the particular user is to participate in the activity based on the reduced set, ([4.2] “We build a DNN ‘PropensityNet’ to estimate the propensity score, with the inputs being the covariates X as well as the outcome Y across all units.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li with the machine learning models of Ramachandra to improve matching performance. ([Abstract] “This is a generalization of the logistic regression technique traditionally used to estimate propensity scores and we show empirically that DNNs perform better than logistic regression at propensity score matching.”)
	The combination of Li and Ramachandra does not appear to distinctly disclose
-for each predetermined range of the propensity score, matching users from the first group to users from the second group having a propensity score falling within the predetermined range; 
	However, Bae teaches
-for each predetermined range of the propensity score, matching users from the first group to users from the second group having a propensity score falling within the predetermined range; ([0052] “That is, for each treated subgroup, a matching non-treated subgroup is defined having a matching range of propensity scores.”; “treated subgroup” reads on “first group” and “non-treated subgroup” reads on “second group”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li and Ramachandra with propensity score matching of Bae to increase effectiveness of propensity score matching. ([0018] “For example, embodiments of the present invention may be used to analyze the impact of services or features applied to certain on-line advertisers, such as particular sales activities directed to certain on-line advertisers, features offered to on-line advertisers to increase the effectiveness of their advertising, and marketing events offered to certain on-line advertisers.”)
The combination of Li, Ramachandra and Bae does not appear to distinctly disclose
-estimating a measurable effect attributable to the activity based on a difference between an average participation level of users in the first group and an average participation level of the users in the second group in each predetermined range of the propensity score.
	However, Chalasani teaches
-estimating a measurable effect attributable to the activity based on a difference between an average participation level of users in the first group and an average participation level of the users in the second group in each predetermined range of the propensity score. ([0123] “The difference in response rates of the exposed population (E[Y(1)|W=1]) and that of the Counterfactual Unexposed population (E[Y(0)|W=1]) can be considered the Average Treatment Effect of the Treated (ATT), the causal effect.”; “response rate” reads on “participation level”; “exposed population” reads on “users in the first group” and “unexposed population” reads on “users in the second group”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra and Bae with the average treatment effect calculation of Chalasani to improve stability. ([0042] “As such, some embodiments of the present disclosure describe a system to account for these complications and provide significant, positive, and stable lift.”)
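For illustration only, the matching and estimating limitations of claim 1, as mapped above to Bae and Chalasani, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the function name `effect_by_range`, the record layout (propensity score, participation flag, participation level) and the use of equal-width score ranges are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def effect_by_range(users, n_ranges=5):
    """Hypothetical helper: per-range difference in average participation.

    users: iterable of (propensity_score, participated, participation_level),
           with propensity_score assumed to lie in [0, 1].
    Returns {range_index: mean level of first group - mean level of second group}
    for every range that contains users from both groups.
    """
    buckets = defaultdict(lambda: {True: [], False: []})
    for score, participated, level in users:
        # Assumed equal-width ranges; a score of exactly 1.0 falls in the last range.
        idx = min(int(score * n_ranges), n_ranges - 1)
        buckets[idx][bool(participated)].append(level)

    effects = {}
    for idx, groups in sorted(buckets.items()):
        # A range is usable only if users from both groups were matched into it.
        if groups[True] and groups[False]:
            effects[idx] = mean(groups[True]) - mean(groups[False])
    return effects
```

In this sketch, a range that holds users from only one group is skipped, because no within-range comparison is possible there.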

Regarding Claim 4
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 as cited above and Li further teaches
-wherein the plurality of features comprises at least one of: a usage history associated with one or more devices associated with the plurality of users; demographic information associated with the plurality of users; purchase activity associated with the plurality of users, the purchase activity comprising information associated with new device purchases associated with the plurality of users; or advertising campaigns associated with the plurality of users. ([0023] “For instance, a multidimensional vector may be used to express features in the marketing campaign for the subjects such as age, education, marriage status, high school degree, earnings, geographic location, and so forth.”; reads on “demographic information”)

Regarding Claim 9
Claim 9 is a device claim comprising one or more processors and one or more storage media corresponding to the method of claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

Regarding Claim 12
Claim 12 is a device claim comprising one or more processors and one or more storage media corresponding to the method of claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 4. Note that Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

Regarding Claim 17
	Li teaches
A computer-readable storage medium [“one or more computer-readable media” ¶65] having program instructions recorded thereon [“computer readable instructions” ¶71] that, when executed by a processing circuit, perform a method, the method comprising:
-receiving an initial set of a plurality of features for each of a plurality of users from one or more memory devices ([0003] “In one or more implementations, campaign data that pertains to first and second campaign groups is characterized using a plurality of features (i.e., covariates) that describe subjects included in the first and second campaign groups.”; “subjects” reads on “users”), the plurality of users comprising a first group of users that have participated in an activity and a second group of users that have not participated in the activity ([0035] “The campaign data 116, for instance, may use an algorithm to assign subjects into a first campaign group 202 and a second campaign group 204 that correspond to first and second campaigns, e.g., a “10% off” offer and a “buy one, get one free” offer respectively.”; the second group of users has not participated in the first campaign), each feature of the plurality of features comprising a different item of information for each of the plurality of users; ([0023] “For instance, a multidimensional vector may be used to express features in the marketing campaign for the subjects such as age, education, marriage status, high school degree, earnings, geographic location, and so forth.”; discloses features comprising a different item of information)
- a reduced set of the plurality of features, ([0024] “Accordingly, the techniques described herein first project the features of the characterized campaign data into a reduced dimension space, e.g., using linear or non-linear techniques, and thus reduces the effective number of the features needed to characterize the subjects in the two groups.”)
-… whereby the second machine learning model utilizes a lesser amount of computing resources with the reduced set than compared to using the initial set of the plurality of features for determining the propensity score; ([0025] “This achieves improved accuracy in determination of the campaign effectiveness result and also increased computational efficiency and accuracy through use of the reduced dimension space as further described in greater detail in the following sections.”)
Li does not distinctly disclose
-by a first machine learning model…, the plurality of features being utilized to train the first machine learning model to determine an effectiveness of each of the plurality of features at predicting the likelihood that a particular user is to participate in the activity; 
-receiving, by a second machine learning model, … the reduced set having a prediction error metric within a predetermined threshold of a prediction error metric associated with the initial set of the plurality of features;  
-for each of the plurality of users, determining, by the second machine learning model, a propensity score indicative of a likelihood that the particular user is to participate in the activity based on the reduced set, 
However, Ramachandra teaches
-by a first machine learning model…, the plurality of features being utilized to train the first machine learning model to determine an effectiveness of each of the plurality of features at predicting the likelihood that a particular user is to participate in the activity; ([Abstract] “1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space.” [3] “The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.”; “autoencoder” reads on “a first machine learning model”; “dimensionality reduction” reduces the number of features, which is considered to retain the features that are effective at predicting the likelihood that a particular user is to participate in the activity.)
-receiving, by a second machine learning model, … the reduced set having a prediction error metric within a predetermined threshold of a prediction error metric associated with the initial set of the plurality of features; ([3.1] “The learnt mapping, if it maps the input to a lower dimensional encoding, becomes a form of non-linear dimensionality reduction technique. … Repeat the above steps for several epochs until the error reaches below a certain threshold or converges.”; “error reaches below a certain threshold” reads on “prediction error metric within a predetermined threshold of a prediction error metric”; [4.2] “We build a DNN ‘PropensityNet’ to estimate the propensity score, with the inputs being the covariates X as well as the outcome Y across all units.”; DNN reads on “a second machine learning model”)
-for each of the plurality of users, determining, by the second machine learning model, a propensity score indicative of a likelihood that the particular user is to participate in the activity based on the reduced set, ([4.2] “We build a DNN ‘PropensityNet’ to estimate the propensity score, with the inputs being the covariates X as well as the outcome Y across all units.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li with the machine learning models of Ramachandra to improve matching performance. ([Abstract] “This is a generalization of the logistic regression technique traditionally used to estimate propensity scores and we show empirically that DNNs perform better than logistic regression at propensity score matching.”)
	The combination of Li and Ramachandra does not appear to distinctly disclose
-for each predetermined range of the propensity score, matching users from the first group to users from the second group having a propensity score falling within the predetermined range; 
	However, Bae teaches
-for each predetermined range of the propensity score, matching users from the first group to users from the second group having a propensity score falling within the predetermined range; ([0052] “That is, for each treated subgroup, a matching non-treated subgroup is defined having a matching range of propensity scores.”; “treated subgroup” reads on “first group” and “non-treated subgroup” reads on “second group”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li and Ramachandra with propensity score matching of Bae to increase effectiveness of propensity score matching. ([0018] “For example, embodiments of the present invention may be used to analyze the impact of services or features applied to certain on-line advertisers, such as particular sales activities directed to certain on-line advertisers, features offered to on-line advertisers to increase the effectiveness of their advertising, and marketing events offered to certain on-line advertisers.”)
The combination of Li, Ramachandra and Bae does not appear to distinctly disclose
-estimating a measurable effect attributable to the activity based on a difference between an average participation level of users in the first group and an average participation level of the users in the second group in each predetermined range of the propensity score.
	However, Chalasani teaches
-estimating a measurable effect attributable to the activity based on a difference between an average participation level of users in the first group and an average participation level of the users in the second group in each predetermined range of the propensity score. ([0123] “The difference in response rates of the exposed population (E[Y(1)|W=1]) and that of the Counterfactual Unexposed population (E[Y(0)|W=1]) can be considered the Average Treatment Effect of the Treated (ATT), the causal effect.”; “response rate” reads on “participation level”; “exposed population” reads on “users in the first group” and “unexposed population” reads on “users in the second group”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra and Bae with the average treatment effect calculation of Chalasani to improve stability. ([0042] “As such, some embodiments of the present disclosure describe a system to account for these complications and provide significant, positive, and stable lift.”)

Regarding Claim 18
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 17 as cited above and Li further teaches
-wherein the plurality of features comprises at least one of: a usage history associated with one or more devices associated with the plurality of users; demographic information associated with the plurality of users; purchase activity associated with the plurality of users, the purchase activity comprising information associated with new device purchases associated with the plurality of users; or advertising campaigns associated with the plurality of users. ([0023] “For instance, a multidimensional vector may be used to express features in the marketing campaign for the subjects such as age, education, marriage status, high school degree, earnings, geographic location, and so forth.”; reads on “demographic information”)

2.	Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ramachandra, in view of Bae and in view of Chalasani, and further in view of Maldonado et al. (“A wrapper method for feature selection using Support Vector Machines”).
Regarding Claim 2
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 as cited above but does not appear to distinctly disclose
-wherein determining the reduced set of the plurality of features comprises:
(a) receiving, as an output from the first machine learning model, a first ranking for each of the initial set of the plurality of features that indicates a level of effectiveness that a particular feature of the initial set has at predicting the likelihood that the particular user is to participate in the activity; 
(b) removing an N number of lowest ranking features from the initial set, where N is a positive integer; 
(c) providing the remaining features to the first machine learning model to generate a second ranking for each of the remaining features; 
(d) determining whether the remaining features having a prediction error metric within the predetermined threshold of the prediction error metric associated with the initial set; 
(e) in response to determining that the prediction error metric of the remaining features is not within the predetermined threshold, adding the removed N number of lowest ranking features to the remaining features and providing the remaining features to the second machine learning model, the remaining features being the reduced set; and 
(f) in response to determining that the prediction error metric of the remaining features is within the predetermined threshold: removing an N number of lowest ranking features from the remaining features; and repeating steps (c)-(f) until a determination is made that the remaining features do not have a prediction error metric within the predetermined threshold of the prediction error metric associated with the initial set.
	However, Maldonado teaches
wherein determining the reduced set of the plurality of features comprises:
(a) receiving, as an output from the first machine learning model, a first ranking for each of the initial set of the plurality of features that indicates a level of effectiveness that a particular feature of the initial set has at predicting the likelihood that the particular user is to participate in the activity; ([4.2] “Initialization: We set σ = (1,….,1) , which means we start with all features and in each iteration we remove the feature with the smallest contribution to the respective model.”; “the smallest” implies that the data is ranked)
(b) removing an N number of lowest ranking features from the initial set, where N is a positive integer; ([4.2] “Initialization: We set σ = (1,….,1) , which means we start with all features and in each iteration we remove the feature with the smallest contribution to the respective model.”; This is the case where N = 1)
(c) providing the remaining features to the first machine learning model to generate a second ranking for each of the remaining features; 
(d) determining whether the remaining features having a prediction error metric within the predetermined threshold of the prediction error metric associated with the initial set; 
(e) in response to determining that the prediction error metric of the remaining features is not within the predetermined threshold, adding the removed N number of lowest ranking features to the remaining features and providing the remaining features to the second machine learning model, the remaining features being the reduced set; and 
(f) in response to determining that the prediction error metric of the remaining features is within the predetermined threshold: removing an N number of lowest ranking features from the remaining features; and repeating steps (c)-(f) until a determination is made that the remaining features do not have a prediction error metric within the predetermined threshold of the prediction error metric associated with the initial set.
([Algorithm 1] discloses steps (c)-(f): in each iteration the feature with the smallest contribution is removed (“lowest ranking features” with N = 1), the remaining feature set is input to the SVM (“second machine learning model”), and the procedure repeats until the stopping condition is met (“the remaining features do not have a prediction error metric within the predetermined threshold”).)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra, Bae and Chalasani with the feature selection of Maldonado to obtain relevant feature sets, thereby improving prediction accuracy. ([Abstract] “Additionally, we use two conjoint choice experiments whose results show that the proposed techniques have better fit and predictive accuracy than traditional methods and that they additionally provide an improved understanding of customer preferences.”)
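For illustration only, the iterative removal recited in steps (a)-(f) of claim 2 can be sketched as a backward-elimination loop. This is a minimal sketch under stated assumptions and is not Algorithm 1 of Maldonado or the applicant's implementation: the function name `reduce_features` and the callables `rank` and `error` are hypothetical stand-ins for the ranking output and the prediction error metric, and "within the predetermined threshold" is assumed to mean an absolute difference from the error of the initial set.

```python
def reduce_features(features, rank, error, threshold, n=1):
    """Hypothetical backward elimination of the n lowest-ranked features.

    rank(features)  -> {feature: effectiveness score}   (assumed interface)
    error(features) -> prediction error metric (float)  (assumed interface)
    """
    baseline = error(features)          # error metric of the initial set
    remaining = list(features)
    while len(remaining) > n:
        scores = rank(remaining)        # steps (a)/(c): rank the current features
        ordered = sorted(remaining, key=lambda f: scores[f])
        candidate = ordered[n:]         # steps (b)/(f): drop the n lowest ranked
        # Step (d): is the candidate's error within the threshold of the baseline?
        if abs(error(candidate) - baseline) > threshold:
            # Step (e): the last removal is not committed, which is equivalent
            # to adding the removed features back; `remaining` is the reduced set.
            break
        remaining = candidate
    return remaining
```

The sketch never commits a removal that pushes the error outside the threshold, so the returned list corresponds to the set obtained after the removed features are added back in step (e).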

Regarding Claim 10
Claim 10 is a device claim comprising one or more processors and one or more storage media corresponding to the method of claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2. Note that Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

3.	Claims 3, 6, 11, 14 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ramachandra, in view of Bae and in view of Chalasani, and further in view of Wang et al. (US 20160055320 A1).
Regarding Claim 3
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 as cited above but does not appear to distinctly disclose
-wherein estimating the measurable effect comprises:
for each predetermined range: 
weighting the difference between an average participation level of users of the first group and an average participation level of users of the second group in each predetermined range of the propensity score based on a number of users in the first group in the predetermined range; and
estimating the measurable effect based on the weighted difference and a number of the plurality of users.
	However, Wang teaches
-wherein estimating the measurable effect comprises:
for each predetermined range: 
weighting the difference between an average participation level of users of the first group and an average participation level of users of the second group in each predetermined range of the propensity score based on a number of users in the first group in the predetermined range; and
estimating the measurable effect based on the weighted difference and a number of the plurality of users. ([0055] “Moving to 716, success rate of the treatment group is computed based on the treatment user dataset, which is weighted by the corresponding weighting factors and/or adjusted by the corresponding adjusting factors. At 718, success rate of the control group is computed based on the control user dataset, which is weighted by the corresponding weighting factors and/or adjusted by the corresponding adjusting factors. Eventually, at 720, effectiveness metrics of the user treatment are measured, which may be the difference between the success rates computed at 716 and 718 or the ratio/amplifier of the two success rates.”)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra, Bae and Chalasani with weighted effect estimation of Wang to obtain reliability in estimation of measurable effect. ([0004] “the market increasingly demands a reliable measurement and a sound comparison of the impact of the different user treatments on user actions”)
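For illustration only, the weighting recited in claim 3 can be sketched as a sum of per-range differences, each weighted by the number of first-group users in that range and then divided by a user count. This is a minimal sketch under stated assumptions and is not the computation of Wang or of the application: the function name `weighted_effect`, the tuple layout and the choice of `total_users` as the divisor are hypothetical.

```python
def weighted_effect(ranges, total_users):
    """Hypothetical weighted estimate of the measurable effect.

    ranges: iterable of (n_first, mean_first, mean_second), one tuple per
            predetermined propensity-score range, where n_first is the number
            of first-group users in the range.
    total_users: the user count used as the divisor (an assumption; the claim
                 recites only "a number of the plurality of users").
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    # Weight each per-range difference by the first-group count in that range.
    weighted_sum = sum(n * (m1 - m2) for n, m1, m2 in ranges)
    return weighted_sum / total_users
```

For example, with two ranges holding 10 and 30 first-group users and differences of 2.0 and 1.0, dividing by 40 gives (10 × 2.0 + 30 × 1.0) / 40 = 1.25.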

Regarding Claim 6
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 as cited above but does not appear to distinctly disclose
-wherein the second machine learning model is a gradient-boosted decision tree-based algorithm.
	However, Wang teaches
-wherein the second machine learning model is a gradient-boosted decision tree-based algorithm. ([0049] “Various approaches may be applied by the propensity score model fitting unit 616 to fit the propensity score model to estimate the probability p̂_i for each user with respect to each user feature X. In this example, gradient boosting tree (GBT) is used to model the propensity score model P̂(X).”)
	The motivation to combine is the same as stated for claim 3.

Regarding Claim 11
Claim 11 is a device claim comprising one or more processors and one or more storage media corresponding to the method of claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3. Note that Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

Regarding Claim 14
Claim 14 is a device claim comprising one or more processors and one or more storage media corresponding to the method of claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 6. Note that Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

Regarding Claim 19
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 17 as cited above but does not appear to distinctly disclose
-wherein the second machine learning model is a gradient-boosted decision tree-based algorithm.
	However, Wang teaches
-wherein the second machine learning model is a gradient-boosted decision tree-based algorithm. ([0049] “Various approaches may be applied by the propensity score model fitting unit 616 to fit the propensity score model to estimate the probability p̂_i for each user with respect to each user feature X. In this example, gradient boosting tree (GBT) is used to model the propensity score model P̂(X).”)
	The motivation to combine is the same as stated for claim 3.

Regarding Claim 20
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 17 as cited above but does not appear to distinctly disclose
-wherein estimating the measurable effect comprises:
for each predetermined range: 
weighting the difference between an average participation level of users of the first group and an average participation level of users of the second group in each predetermined range of the propensity score based on a number of users in the first group in the predetermined range; and
estimating the measurable effect based on the weighted difference and a number of the plurality of users.
	However, Wang teaches
-wherein estimating the measurable effect comprises:
for each predetermined range: 
weighting the difference between an average participation level of users of the first group and an average participation level of users of the second group in each predetermined range of the propensity score based on a number of users in the first group in the predetermined range; and
estimating the measurable effect based on the weighted difference and a number of the plurality of users. ([0055] “Moving to 716, success rate of the treatment group is computed based on the treatment user dataset, which is weighted by the corresponding weighting factors and/or adjusted by the corresponding adjusting factors. At 718, success rate of the control group is computed based on the control user dataset, which is weighted by the corresponding weighting factors and/or adjusted by the corresponding adjusting factors. Eventually, at 720, effectiveness metrics of the user treatment are measured, which may be the difference between the success rates computed at 716 and 718 or the ratio/amplifier of the two success rates.”)
	Same motivation as claim 3.
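
For illustration only, the estimation recited in claim 20 can be read as a stratified estimator: within each predetermined propensity-score range, the difference in average participation level between the two groups is weighted by the number of first-group users in that range, and the weighted sum is divided by the total number of users. The following is a minimal sketch under that reading. The function name `stratified_effect`, the record layout, and the choice to skip ranges lacking users from either group are assumptions made for this example; they are not taken from the claims or from any cited reference.

```python
from bisect import bisect_right
from statistics import mean


def stratified_effect(records, edges):
    """Estimate an effect by propensity-score stratification.

    records: iterable of (propensity_score, in_first_group, participation_level)
    edges:   ascending range boundaries over [0, 1], e.g. [0.0, 0.5, 1.0]
    """
    records = list(records)
    n_bins = len(edges) - 1
    first = [[] for _ in range(n_bins)]   # participation levels, first group
    second = [[] for _ in range(n_bins)]  # participation levels, second group

    for score, in_first_group, level in records:
        # Find the range containing the score; clamp boundary values into a valid range.
        k = max(0, min(bisect_right(edges, score) - 1, n_bins - 1))
        (first if in_first_group else second)[k].append(level)

    weighted_sum = 0.0
    for k in range(n_bins):
        if not first[k] or not second[k]:
            continue  # assumption: a range without both groups contributes nothing
        difference = mean(first[k]) - mean(second[k])
        weighted_sum += len(first[k]) * difference  # weight by first-group count

    # Normalize by the number of the plurality of users (all records).
    return weighted_sum / len(records)


if __name__ == "__main__":
    sample = [
        (0.2, True, 4.0), (0.3, False, 2.0), (0.1, False, 4.0),
        (0.7, True, 10.0), (0.8, True, 6.0), (0.9, False, 5.0),
    ]
    print(stratified_effect(sample, [0.0, 0.5, 1.0]))
```

Weighting by the first-group count emphasizes ranges where participants are concentrated; other normalizations are possible and would change the estimate.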

4. 	Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ramachandra in view of Bae and in view of Chalasani and further in view of King et al. (US 20190377784 A1).
Regarding Claim 5
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 as cited above but does not appear to distinctly disclose
-wherein the first machine learning model is a regression-based algorithm.
	However, King teaches
-wherein the first machine learning model is a regression-based algorithm. ([0058] “The reduced feature space may be established using a global log-bilinear regression model”)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra, Bae and Chalasani with the regression model of King to achieve high accuracy in prediction. ([Abstract] “Embodiments of the invention utilize a feature-extraction approach and/or a matching approach in combination with a nonparametric approach to estimate the proportion of documents in each of multiple labeled categories with high accuracy.”)

Regarding Claim 13
	Claim 13 is a device claim comprising one or more processors and one or more storage
media corresponding to the methods of claim 5, and is directed to largely the same subject
matter. Thus, it is rejected for the same reasons as given in the rejection of claim 5. Note that
Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

5. 	Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ramachandra in view of Bae and in view of Chalasani and further in view of Yoneda et al. (US 20100235151 A1).
Regarding Claim 7
The combination of Li, Ramachandra, Bae and Chalasani teaches all of the limitations of claim 1 and Li further teaches
-wherein, for each predetermined range of the propensity score, matching users from the first group to users from the second group having a propensity score falling within the predetermined range comprises:
for each predetermined range of the propensity score, determining whether the matched users of the first group and users of the second group are balanced; ([0021] “As a result, this may introduce a confounding bias in the selection of subjects to form the groups (e.g., users receiving treatment, exposed to a marketing campaign) and sometimes unbalanced groupings in the feature space, whereby a number of subjects in the groups varies greatly.”)
The combination of Li, Ramachandra, Bae and Chalasani does not appear to distinctly disclose
-in response to determining that the matched users of the first group and users of the second group are not balanced, increasing the features in the reduced set and providing the increased set of features to the second machine learning model to generate a ranking of the increased set of features and a new propensity score for each of the plurality of users based on the ranking; and in response to determining that the matched users of the first group and users of the second group are balanced, maintaining the features in the reduced set.
	However, Yoneda teaches
-in response to determining that the matched users of the first group and users of the second group are not balanced, increasing the features in the reduced set and providing the increased set of features to the second machine learning model to generate a ranking of the increased set of features and a new propensity score for each of the plurality of users based on the ranking; and in response to determining that the matched users of the first group and users of the second group are balanced, maintaining the features in the reduced set. ([0068] “When the feature parameter set to be evaluated is updated by executing "the addition of a feature parameter" or/and "the adjustment of the resolution of a feature parameter", as described above, the flow returns to step S4, where the evaluation of the balance of NC in the feature parameter set thus updated is carried out. The processing (search processing of a suitable feature parameter set) in these steps S4 and S5 is repeated until it is determined that the balance of NC is good.”; “feature parameter set” reads on “features”; “repeated until … balance of NC is good” reads on “in response to determining that the matched users of the first group and users of the second group are balanced, maintaining the features in the reduced set”)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra, Bae and Chalasani with the feature addition of Yoneda to achieve effective modeling. ([0011] “The present invention has been made in view of the above-mentioned actual circumstances, and has for its object to provide a technique that makes it possible to efficiently prepare candidates for feature parameters with different properties, which become particularly effective in modeling an object with complexity and individuality.”)
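
For illustration only, the conditional behavior recited in claim 7 (increase the feature set and re-score when the matched groups are not balanced; keep the set when they are) can be pictured as a simple loop. The following is a minimal sketch. The names `refine_features`, `score_fn`, and `is_balanced_fn` are hypothetical stand-ins for the second machine learning model and the balance determination; neither the claims nor Yoneda specify this interface.

```python
def refine_features(ranked_features, start_size, score_fn, is_balanced_fn, step=1):
    """Grow a reduced feature set until the matched groups are balanced.

    ranked_features: candidate features, most informative first (hypothetical input)
    start_size:      number of features in the initial reduced set
    score_fn:        callable(features) -> propensity scores (stands in for the model)
    is_balanced_fn:  callable(scores) -> bool (stands in for the balance check)
    step:            how many features to add per iteration (an assumption)
    """
    size = min(start_size, len(ranked_features))
    while True:
        features = ranked_features[:size]
        scores = score_fn(features)  # new propensity scores for the current set
        if is_balanced_fn(scores) or size >= len(ranked_features):
            # Balanced (or no features left to add): maintain the current set.
            return features, scores
        size = min(size + step, len(ranked_features))  # not balanced: add features


if __name__ == "__main__":
    # Toy stand-ins: the "score" is just the feature count, "balanced" at three or more.
    chosen, _ = refine_features(["a", "b", "c", "d"], 1, len, lambda s: s >= 3)
    print(chosen)
```

The loop stops when no candidate features remain so that it always terminates; a production system would likely report that case separately.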
Regarding Claim 15
	Claim 15 is a device claim comprising one or more processors and one or more storage
media corresponding to the methods of claim 7, and is directed to largely the same subject
matter. Thus, it is rejected for the same reasons as given in the rejection of claim 7. Note that
Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

6.	Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ramachandra in view of Bae and in view of Chalasani and in view of Yoneda and further in view of Wang.

Regarding Claim 8
The combination of Li, Ramachandra, Bae, Chalasani and Yoneda teaches all of the limitations of claim 7 but does not appear to distinctly disclose
-wherein, for each predetermined range of the propensity score, determining whether the matched users of the first group and users of the second group are balanced is based on at least one of:
an analysis of a standardized bias associated with the matched users of the first group and users of the second group; 
an analysis between a participation level of the users in the first group in a time period before participating in the activity and a participation level of the users in the second group in the same time period; or 
an analysis of a distribution of features of the matched users of the first group and users of the second group.
	However, Wang teaches
-wherein, for each predetermined range of the propensity score, determining whether the matched users of the first group and users of the second group are balanced is based on at least one of:
an analysis of a standardized bias associated with the matched users of the first group and users of the second group; 
an analysis between a participation level of the users in the first group in a time period before participating in the activity and a participation level of the users in the second group in the same time period; or 
an analysis of a distribution of features of the matched users of the first group and users of the second group. ([0043] “To address the non-robustness in the model verification of the traditional methods, the model validation engine 508 implements a novel robust rank test for user features covariate balancing verification, which is suitable for addressing, for example, the skewness of advertising data with a robust weighted rank test.”; “user features covariate balancing verification” reads on “analysis of a distribution of features”)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the effectiveness determination system of Li, Ramachandra, Bae, Chalasani and Yoneda with the group balancing of Wang to obtain reliability in the estimation of the measurable effect. ([0004] “the market increasingly demands a reliable measurement and a sound comparison of the impact of the different user treatments on user actions”)
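
For illustration only, the first alternative recited in claim 8 (an analysis of a standardized bias) is commonly computed as the difference in feature means between the two groups divided by a pooled standard deviation, expressed in percent. The following is a minimal sketch of that common formulation. It is not the robust weighted rank test described in Wang, and the function names and the 10% threshold are assumptions chosen for this example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_bias(first_values, second_values):
    """Standardized difference in means for one feature, in percent.

    Each input needs at least two observations so a sample variance exists.
    """
    pooled_sd = sqrt((variance(first_values) + variance(second_values)) / 2)
    gap = mean(first_values) - mean(second_values)
    if pooled_sd == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if gap == 0 else float("inf")
    return 100.0 * gap / pooled_sd


def groups_balanced(first_by_feature, second_by_feature, threshold=10.0):
    """Treat the groups as balanced if every feature's |bias| is within the threshold.

    threshold: percent cutoff; 10.0 is an assumed rule of thumb, not a claimed value.
    """
    return all(
        abs(standardized_bias(first_by_feature[name], second_by_feature[name])) <= threshold
        for name in first_by_feature
    )


if __name__ == "__main__":
    first = {"age": [30, 40, 50]}
    second = {"age": [31, 40, 49]}
    print(groups_balanced(first, second))
```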
Regarding Claim 16
	Claim 16 is a device claim comprising one or more processors and one or more storage
media corresponding to the methods of claim 8, and is directed to largely the same subject
matter. Thus, it is rejected for the same reasons as given in the rejection of claim 8. Note that
Li teaches a processor and a memory ([0066] “processors”, [0067] “memory”).

Response to Arguments
Regarding the 35 U.S.C. 103 rejections, Applicant's arguments have been fully considered but have been found unpersuasive.  Applicant argues that the references do not disclose the following three limitations in claim 1:
…
the reduced set having a prediction error metric within a predetermined threshold of a prediction error metric associated with the initial set of the plurality of features;
providing the reduced set to a second machine learning model executing on the one or more computing devices;
…
estimating a measurable effect attributable to the activity based on a difference between an average participation level of users in the first group and an average participation level of the users in the second group in each predetermined range of the propensity score.
Examiner disagrees for at least the following reasons.
The first two limitations are closely related and will be addressed together.  Applicant argues that Ramachandra does not disclose the reduced set recited in these limitations.  However, the phrase that begins the first limitation, omitted from the excerpt above, is “determining a reduced set of the plurality of features.”  That phrase is not disclosed by Ramachandra; it is disclosed by Li.  In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  It is Li which discloses the reduced set and Ramachandra which discloses the remaining aspects of the limitations.  Specifically, Li discloses a reduced set of data in at least paragraph 24.  Ramachandra discloses using the data in several epochs until the error is below a threshold and then providing the data to a DNN in at least sections 3.1 and 4.2, respectively, as recited in those two limitations.  It is the combination of these two references which discloses the entirety of these two limitations.
For the final limitation, Applicant argues that Chalasani does not disclose the average response rates.  The cited reference need not explicitly state or use the claim language from the application.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  Furthermore, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  See MPEP 2123.  Here, Chalasani discloses, “The difference in response rates of the exposed population (E[Y(1)|W=1]) and that of the Counterfactual Unexposed population (E[Y(0)|W=1]) can be considered the Average Treatment Effect of the Treated (ATT), the causal effect” in paragraph 123.  The reference further explains the response rates, stating that the “term, E[Y(1)|W=1] can refer to the observable average response rate of exposed consumers” in paragraph 130.  Therefore, Chalasani does disclose the average response rate.
For at least these reasons, the rejections are maintained.

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT H BEJCEK II whose telephone number is (571)270-3610. The examiner can normally be reached Monday - Friday: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.B./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123