DETAILED ACTION
This action is written in response to the remarks and amendments dated 9/22/21. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Because this Office action includes new rejections which were not necessitated by the Applicant’s amendments dated 9/22/21, this action is made non-final.

Allowable Subject Matter
Claims 4, 11 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable over the prior art if rewritten in independent form including all of the limitations of the base claim and any intervening claims. These claims each recite “wherein the performance of the linear model is measured by computing accuracy as a measure of a proportional hit rate.” The Examiner notes that “proportional hit rate” does not appear to be a widely used term of art within the field of computer science. The Examiner interprets this term in view of the definition provided by the Applicant in their specification at [0075] (formula 1, reproduced below).

[Image media_image1.png: formula 1 from the specification at [0075]; not reproduced in this text.]


Claim Rejections - 35 USC § 101
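Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.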

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 1 recites a method, which is a process.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes. Each of the limitations identified below, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components:
“identifying... a plurality of derived attributes using an external data source”;
“selecting... a plurality of key performance indicators”;
“constructing... a linear model using the plurality of key performance indicators”; and
“predicting... occurrences of the extremely rare events using the linear model”.
Therefore, the claim recites a mental process.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The judicial exception is not integrated into a practical application. Although the claim recites that the recited functionality is performed “by a computing device”, the computing device is recited at a high level of generality. No particular technological problem is addressed, because “extremely rare events” occur in every field of human endeavor.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The only limitation on the performance of the described method is that it must be performed by a computing device, i.e. by a generic computer. The statement that the method is performed by computer does not satisfy the test of “inventive concept.” See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 8 and 15, which recite a corresponding computer program product and system, respectively, as well as to dependent claims 2-7, 9-14, and 16-20. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites “analyzing... the plurality of derived attributes”; this is an additional mental process.
Dependent claims 3, 10, and 17 recite “measuring... performance of the linear model”; this is an additional mental process.
Dependent claims 4, 11, and 18 recite “wherein the performance... is measured by computing accuracy as a measure of a proportional hit rate”; this is an additional mental process and/or a mathematical equation.
Dependent claims 5, 12, and 19 recite “wherein the selecting the plurality of key performance indicators comprises using C5.0 Winnow Attributes”; this is an additional detail about the mental process performed in the corresponding independent claim.
Dependent claims 6 and 13 recite “performing neural network bagging using multilayer perceptron”; this is an additional mental process.
Dependent claims 7, 14, and 20 recite “using Chi-square Automatic Interaction Detector (CHAID) to automatically select the plurality of key performance indicators”; this is an additional mental process.

Taken alone, the additional elements of the dependent claims above do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The following are the references relied upon in the rejections below:
Mozer (primary reference): Mozer, Michael C., et al. "Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry." IEEE Transactions on neural networks 11.3 (2000): 690-696.
Anonymous 2017, "How to handle imbalanced classification problems in machine learning?", Analytics Vidhya, https://www.analyticsvidhya.com/blog/2107/03/imbalanced-classification-problem/. (Cited by Applicant in IDS dated 11/28/18 as non-patent literature item 2.)
Chandrashekar
Duman, E., Y. Ekinci, and A. Tanriverdi, "Comparing alternative classifiers for database marketing: The case of imbalanced datasets", Expert Systems with Applications 39.1, January 2012, pp. 48-53.
Kuhn, Max. "Classification Using C5.0." UseR! 2013. Pfizer Global R&D: Groton, CT, USA (2013). Available at https://staff.fmi.uvt.ro/~daniela.zaharie/dm2019/EN/lab/lab3/biblio/user_C5.0.pdf, accessed 6/18/21. 20 pages.
Milne L., Feature selection using neural networks with contribution measures. In AI-CONFERENCE- 1995 Nov 27 (pp. 571-571). World Scientific Publishing.
Song, Guojie, et al. "A mixed process neural network and its application to churn prediction in mobile communications." Sixth IEEE International Conference on Data Mining-Workshops (ICDMW'06). IEEE, 2006.

Claims 1-3, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne and Song.
Regarding claims 1 and 15, Mozer discloses a method (and a related system) comprising:
identifying, by a computing device, a plurality of derived attributes using an external data source;
The Examiner interprets “derived attributes” in view of the Applicant’s written description at [0059]: “The derived attributes include any metrics that are stored in the plurality of external data sources ... or that may be derived from data (e.g., customer data, demographics, usage, location, behavior, etc.) stored in the plurality of external data sources”. (Emphasis added.)
P. 692, first col.: “The information sources listed above were distributed over three distinct databases maintained by the carrier. The databases contained thousands of fields, from which we identified 134 variables associated with each subscriber that we conjectured might be linked to churn. The variables included • subscriber location; • credit classification; • customer classification (e.g., corporate versus retail); • number of active services of various types; • beginning and termination dates of various services; • avenue through which services were activated; • monthly charges and usage; • number, dates, and nature of customer service calls; • number of calls made; • number of abnormally terminated calls.” (Emphasis added.)
Also p. 692: “To evaluate the benefit of carefully constructing the representation, we performed studies using both naive and a sophisticated representations. The naive representation mapped the 134 variables to a vector of 148 elements in a straightforward manner.”
selecting, by the computing device, a plurality of key performance indicators ... based on an extremely rare event being modeled;
P. 692: “The sophisticated representation incorporated the domain knowledge of our experts to produce a 73-element vector encoding attributes of the subscriber. This representation collapsed across some of the variables that, in the judgement of the experts, could be lumped together (e.g., different types of calls to the customer service department) and expanded on others (e.g., translating the scalar length of time with carrier to a multidimensional basis-function representation, where the receptive-field centers of the basis functions were suggested by the domain experts) and performed transformations of other variables (e.g., ratios of two variables or time-series regression parameters).” (Emphasis added.)
The Examiner interprets “extremely rare event” in view of the Applicant’s written description at [0002]: “Modeling techniques are used to predict various extremely rare events, including subscriber churn (turnover).” (Emphasis added.)
constructing, by the computing device, a linear model using the plurality of key performance indicators; and
p. 692, sec. VI: "logit regression". The Examiner notes that logit regression, aka logistic regression, is a linear model.
predicting, by the computing device, occurrences of the extremely rare event using the linear model.
P. 693: “For each predictor, we obtain an estimate of the probability of churn for each subscriber in the data set by merging the test sets from the ten data splits. Because decision making ultimately requires a “churn” or “no churn” prediction, the continuous probability measure must be thresholded to obtain a discrete predicted outcome.”
P. 692: “Numerical variables, such as the length of time a subscriber had been with the carrier, were translated to an element of the representational vector that was linearly related to the variable value.” (Emphasis added.)
See also p. 692, sec. VI: "logit regression". The Examiner notes that logit regression, aka logistic regression, is a linear model.
Also, see p. 692, sec. VII, discussing boosting. (The Examiner notes that boosting creates a strong classifier by using linear combinations of weak classifiers, whether the underlying classifiers are linear or not.)
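For illustration only, the following minimal Python sketch shows why logit regression is treated as a linear model and how a continuous churn probability may be thresholded into a discrete outcome. The coefficients, feature vectors, and the 0.5 threshold are hypothetical and are not taken from Mozer or from the Applicant's specification.

```python
import math

def log_odds(weights, bias, x):
    """Linear part of a logit (logistic) regression: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def churn_probability(weights, bias, x):
    """Logistic function applied to the linear score."""
    return 1.0 / (1.0 + math.exp(-log_odds(weights, bias, x)))

def predict_churn(weights, bias, x, threshold=0.5):
    """Threshold the continuous probability into a churn / no-churn outcome."""
    return churn_probability(weights, bias, x) >= threshold

# Hypothetical coefficients and subscriber feature vectors (illustrative only).
WEIGHTS = [0.8, -0.5]
BIAS = -1.0
subscriber_a = [3.0, 1.0]  # linear score = 2.4 - 0.5 - 1.0 = 0.9
subscriber_b = [0.5, 2.0]  # linear score = 0.4 - 1.0 - 1.0 = -1.6

print(predict_churn(WEIGHTS, BIAS, subscriber_a))
print(predict_churn(WEIGHTS, BIAS, subscriber_b))
```

The model is linear in the sense that the log-odds of the predicted probability equal the weighted sum of the inputs plus a bias; only the final mapping to a probability is nonlinear.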
Milne discloses the following additional limitation which Mozer does not seem to disclose explicitly:
selecting, by the computing device, a plurality of key performance indicators from the plurality of derived attributes using a neural network...
PP. 1-2: “To push [the abilities of neural networks] to the limit of their capabilities we need to recognize that ... using only the significant features in training and classification will give use the best possible results. .... So we use the neural network to help us decide which are the most useful features in giving a classification. .... By giving a measure of the contribution each input feature makes to the final output of the network we can select the features to use.” (Emphasis added.)
The Examiner notes that Milne uses neural networks to perform feature selection, see abstract and passim.
At the time of filing, it would have been obvious to a person of ordinary skill to perform feature selection using a neural network (as taught by Milne) when training a neural network for a classification task (e.g. churn prediction, as in Mozer). As noted by Milne, performing feature selection using a neural network “will reduce the noise and extraneous information that the network has to deal with as well as reducing training an classification times.” (P. 2.) Both disclosures pertain to neural networks.
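For illustration only, the general idea of scoring each input feature by its contribution to a network's output, and then keeping the highest-scoring features, may be sketched as follows. The weight-product score used here is a simplified heuristic chosen for this sketch; it is an assumption and is not necessarily the contribution measure defined in Milne. The weights and feature names are hypothetical.

```python
def input_contributions(w_in_hidden, w_hidden_out):
    """Score each input by the summed magnitude of its weight paths to the output.

    Simplified heuristic for illustration; not necessarily Milne's measure.
    """
    raw = [
        sum(abs(w_ih * w_ho) for w_ih, w_ho in zip(row, w_hidden_out))
        for row in w_in_hidden
    ]
    total = sum(raw)
    return [r / total for r in raw]

def select_features(names, contributions, keep):
    """Return the `keep` feature names with the largest contribution scores."""
    ranked = sorted(zip(names, contributions), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:keep]]

# Hypothetical trained weights: 3 inputs, 2 hidden units, 1 output.
W_IN_HIDDEN = [
    [0.9, -0.7],   # monthly_usage
    [0.05, 0.02],  # billing_zip_digit (weak paths to the output)
    [-0.6, 0.8],   # service_calls
]
W_HIDDEN_OUT = [1.2, -1.0]
NAMES = ["monthly_usage", "billing_zip_digit", "service_calls"]

scores = input_contributions(W_IN_HIDDEN, W_HIDDEN_OUT)
selected = select_features(NAMES, scores, keep=2)
print(selected)
```

Under these hypothetical weights, the feature with only weak paths to the output receives the lowest score and is dropped.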
Independent claim 15 recites a system whose functionality is substantially identical to that of claim 1 (which recites a method). Therefore, claim 15 is rejected for the same reason as claim 1. Its additional limitations—namely a hardware processor, a computer readable memory, and a computer readable storage medium—are each inherent in each of Mozer, Milne and Song.

Regarding claim 2, Mozer discloses the further limitation comprising analyzing, by the computing device, the plurality of derived attributes on the neural network.
P. 692, sec. VI: neural network model. See also p. 691, sec. III discussing data set (i.e. derived attributes as discussed in the rejection of claim 1). In other words, the neural network in Mozer analyzes the attributes (as discussed in claim 1) in order to predict customer churn.
Milne also discloses this limitation, insofar as the disclosed system analyzes input features using a neural network in order to perform a classification task. See p. 3, sec. 3.

Regarding claims 3 and 17, Mozer discloses the further limitation comprising measuring, by the computing device, performance of the linear model on different datasets.
PP. 693-94, sec. VIII: results and discussion, including performance metrics at p. 694, first col. The Examiner notes that logit regression, aka logistic regression, is a linear model.
P. 692, first col.: “The information sources listed above were distributed over three distinct databases maintained by the carrier.” (Emphasis added.)
The Examiner notes that Song also teaches this limitation, see p. 4, sec. 4.3 “Accuracy evaluation”.

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, and Kuhn.
Regarding claims 5 and 19, Kuhn discloses the following further limitation, which neither Mozer, Milne nor Song seems to disclose explicitly: wherein the selecting the plurality of key performance indicators comprises using C5.0 Winnow Attributes to automatically select the plurality of key performance indicators while reducing overfitting.
P. 14: Winnowing. “Winnowing is a feature selection step conducted before modeling.” See generally pp. 14-16 discussing winnowing using C5.0.

At the time of filing, it would have been obvious to a person of ordinary skill to apply the technique disclosed by Kuhn for winnowing to the combined system of Mozer and Milne because it can lead to similar results with a faster runtime. See e.g. results at Kuhn p. 17.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Kuhn, and Anonymous 2017.
Regarding claim 6, Anonymous 2017 discloses the following further limitation, which neither Mozer, Milne, nor Kuhn seems to disclose explicitly: wherein the selecting the plurality of key performance indicators further comprises performing neural network bagging using multilayer perceptron.
P. 12, sec. 2.2.1: Bagging based ensemble techniques. Also p. 13, last paragraph noting the applicability to neural network classification models, continuing on p. 14.
At the time of filing, it would have been obvious to a person of ordinary skill to apply the techniques disclosed in Anonymous 2017 to the combined system of Mozer/Milne/Kuhn because it can help classifiers reduce overfitting, thus resulting in improved classification performance on real-world problems. See Anonymous 2017, p. 12.
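For illustration only, bagging (bootstrap aggregating) may be sketched as follows: each base learner is trained on a resample drawn with replacement, and the ensemble predicts by majority vote. To keep the sketch short and self-contained, a trivial threshold learner stands in for the base model; this is an assumption of the sketch, and in the setting discussed above the base learner would be a multilayer perceptron. The data, the seed, and the ensemble size of 25 are hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Stand-in base learner: threshold at the midpoint between class means.

    Assumption for this sketch; a multilayer perceptron would take its place.
    """
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        # Degenerate resample containing a single class: always predict it.
        label = 1 if pos else 0
        return lambda x: label
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: 1 if x >= cut else 0

def bagged_predict(models, x):
    """Aggregate the base-learner outputs by majority vote."""
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 > len(models) else 0

rng = random.Random(0)
# Hypothetical 1-D data: (feature value, label); label 1 is the minority class.
DATA = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0),
        (2.0, 1), (2.1, 1), (2.2, 1)]
models = [train_stump(bootstrap_sample(DATA, rng)) for _ in range(25)]

print(bagged_predict(models, 2.1), bagged_predict(models, 0.2))
```

Because each learner sees a different resample, the vote averages out the variance of individual learners, which is the mechanism by which bagging is generally understood to reduce overfitting.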

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne and Duman.
Regarding claims 7 and 20, Duman discloses the following further limitation which neither Mozer nor Milne seems to disclose explicitly wherein the selecting the plurality of key performance indicators comprises using Chi-square Automatic Interaction Detector (CHAID) to automatically select the plurality of key performance indicators while reducing overfitting.
P. 50, sec. 3.3 CHAID. “Using the significance of a statistical test as a criterion, it evaluates all of the values of a potential predictor field. It merges values that are judged to be statistically similar with respect to the target variable and maintains all other values that are dissimilar. It then selects the best predictor to form the first branch in the decision tree, such that each child node is made of a group of similar values of the selected field. This process continues recursively until the tree is fully grown.”
At the time of filing, it would have been obvious to a person of ordinary skill to combine the CHAID technique disclosed by Duman with the combined system of Mozer/Milne because it may result in superior classification results, as well as fast training. (See generally Duman sec. 3.3.) All three disclosures pertain to classification/prediction using imbalanced data sets.
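For illustration only, the predictor-selection step described in the quoted passage may be sketched as follows: a chi-square statistic is computed for each candidate predictor against the target, and the predictor with the strongest association is chosen for the first branch. This sketch is a simplification and an assumption of this illustration: it compares raw chi-square statistics rather than adjusted p-values and omits the category-merging step, so it is not a complete CHAID implementation. The column names and data are hypothetical.

```python
from collections import Counter

def chi_square(predictor, target):
    """Pearson chi-square statistic for a predictor/target contingency table."""
    n = len(target)
    observed = Counter(zip(predictor, target))
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def best_predictor(columns, target):
    """Pick the predictor with the largest chi-square statistic.

    Comparing raw statistics is only meaningful here because both
    hypothetical predictors have the same number of categories.
    """
    return max(columns, key=lambda name: chi_square(columns[name], target))

# Hypothetical categorical predictors and a binary churn target.
COLUMNS = {
    "plan_type": ["prepaid"] * 4 + ["contract"] * 4,
    "region": ["north", "south"] * 4,
}
CHURNED = [1, 1, 1, 0, 0, 0, 0, 0]

print(best_predictor(COLUMNS, CHURNED))
```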

Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne and Song.
Independent claim 8 recites a computer program product whose functionality is substantially identical to that of claim 1 (which recites a method). Therefore, claim 8 is rejected for the same reason as claim 1. Its additional limitation, namely a computer readable storage medium capable of implementing the described functionality, is inherent in each of Mozer and Milne. Claim 8 also recites one additional step which is not recited in claim 1; this limitation is taught by Song.
target a marketing campaign using the predicted occurrences of the extremely rare event.
Song, p. 1, introduction: “It costs about five times as much to sign on a new subscriber as to retain an existing one [2]. However, given that most customers will only signal their intention to churn when they call to cancel their account, it is difficult using standard techniques to target for anti-churn marketing. Thus, an ideal solution to combating churn is to build a churn prediction model by using data mining techniques to predict customer churning behaviors, which are becoming more than necessary for the success of the enterprise, especially in the highly competitive mobile communications industry.” (Emphasis added.)
See also Mozer, pp. 693-94: “Based on a subscriber’s predicted churn probability, we must decide whether to offer the subscriber some incentive to remain with the carrier, which will presumably reduce the likelihood of churn. The incentive will be offered to any subscriber whose churn probability is above a certain threshold. The threshold will be selected to maximize the expected cost savings to the carrier; we will refer to this as the optimal decision-making policy.” (Emphasis added.)

At the time of filing, it would have been obvious to a person of ordinary skill to employ a targeted marketing campaign in response to predicted churn (as taught by Song) in the combined system of Mozer/Milne because subscriber churn results in lost revenue. Both Mozer and Song pertain to churn prediction.

Regarding claim 10, Mozer discloses the further limitation comprising measur[ing] performance of the linear model on different datasets.
PP. 693-94, sec. VIII: results and discussion, including performance metrics at p. 694, first col.
P. 692, first col.: “The information sources listed above were distributed over three distinct databases maintained by the carrier.” (Emphasis added.)
The Examiner notes that Song also teaches this limitation, see p. 4, sec. 4.3 “Accuracy evaluation”.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Song, and Chandrashekar.
Regarding claim 9, Mozer discloses the further limitation comprising analyz[ing] the plurality of derived attributes on the neural network...
P. 692, sec. VI: neural network model. See also p. 691, sec. III discussing data set (i.e. derived attributes as discussed in the rejection of claim 1).
The Examiner notes that Song also discloses this limitation, see Song p. 2, sec. 2 “mixed PNN model” and figure depicting the topological structure of MPNN.
Chandrashekar discloses the following further limitation which neither Mozer nor Milne seems to disclose explicitly wherein the selecting comprises minimizing a number of the key performance indicators unless a number of events exceeds a predefined number.
P. 17, sec. 2: “Filter methods use variable ranking techniques as the principle criteria for variable selection by ordering. Ranking methods are used due to their simplicity and good success is reported for practical applications. A suitable ranking criterion is used to score the variables and a threshold is used to remove variables below the threshold.” (Emphasis added.)
At the time of filing, it would have been obvious to a person of ordinary skill to employ a threshold for determining when to continue eliminating features during feature selection (as taught by Chandrashekar) in the combined system of Mozer and Milne because this would allow system engineers to balance two competing priorities, namely model accuracy (which may benefit from considering more features) and computational resources used in model training (which may benefit from considering fewer features). Each of Mozer, Milne, and Chandrashekar pertains to machine learning.
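For illustration only, a filter method of the kind described in the quoted passage may be sketched as follows: each variable is scored with a ranking criterion, and variables scoring below a threshold are removed. The choice of absolute Pearson correlation as the ranking criterion, the threshold of 0.3, and the data are assumptions of this sketch and are not taken from Chandrashekar.

```python
import math

def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as the ranking criterion."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no ranking signal
    return abs(cov / (sx * sy))

def filter_features(columns, target, threshold):
    """Score every variable, then drop those scoring below the threshold."""
    scores = {name: abs_correlation(vals, target) for name, vals in columns.items()}
    ranked = sorted(scores.items(), key=lambda p: p[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    return kept, scores

# Hypothetical variables and a binary target (1 = event occurred).
COLUMNS = {
    "service_calls": [0, 1, 1, 4, 5],
    "account_digit": [5, 3, 7, 5, 5],
}
TARGET = [0, 0, 0, 1, 1]

kept, scores = filter_features(COLUMNS, TARGET, threshold=0.3)
print(kept)
```

Raising the threshold keeps fewer variables and lowers training cost; lowering it keeps more variables, which reflects the trade-off noted above.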

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Song, and Kuhn.
Regarding claim 12, Kuhn discloses the following further limitation: wherein the selecting the plurality of key performance indicators comprises using C5.0 Winnow Attributes to automatically select the plurality of key performance indicators while reducing overfitting.
P. 14: Winnowing. “Winnowing is a feature selection step conducted before modeling.” See generally pp. 14-16 discussing winnowing using C5.0.
At the time of filing, it would have been obvious to a person of ordinary skill to apply the technique disclosed by Kuhn for winnowing to the combined system of Mozer/Milne/Song because it can lead to similar results with a faster runtime. See e.g. results at Kuhn p. 17.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Song, Kuhn, and Anonymous 2017.
Regarding claim 13, Anonymous 2017 discloses the following further limitation, which neither Mozer, Milne, Song, nor Kuhn seems to disclose explicitly: wherein the selecting the plurality of key performance indicators further comprises performing neural network bagging using multilayer perceptron.
P. 12, sec. 2.2.1: Bagging based ensemble techniques. Also p. 13, last paragraph noting the applicability to neural network classification models, continuing on p. 14.
At the time of filing, it would have been obvious to a person of ordinary skill to apply the techniques disclosed in Anonymous 2017 to the combined system of Mozer/Milne/Song/Kuhn because it can help classifiers reduce overfitting, thus resulting in improved classification performance on real-world problems. See Anonymous 2017, p. 12.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Song, and Duman.
Regarding claim 14, Duman discloses the following further limitation which neither Mozer nor Song seems to disclose explicitly wherein the selecting the plurality of key performance indicators comprises using Chi-square Automatic Interaction Detector (CHAID) to automatically select the plurality of key performance indicators while reducing overfitting.
P. 50, sec. 3.3 CHAID. “Using the significance of a statistical test as a criterion, it evaluates all of the values of a potential predictor field. It merges values that are judged to be statistically similar with respect to the target variable and maintains all other values that are dissimilar. It then selects the best predictor to form the first branch in the decision tree, such that each child node is made of a group of similar values of the selected field. This process continues recursively until the tree is fully grown.”
At the time of filing, it would have been obvious to a person of ordinary skill to combine the CHAID technique disclosed by Duman with the combined system of Mozer/Milne/Song because it may result in superior classification results, as well as fast training. (See generally Duman sec. 3.3.) All three disclosures pertain to classification/prediction using imbalanced data sets.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne and Chandrashekar.
Regarding claim 16, Mozer discloses the further limitation comprising analyz[ing] the plurality of derived attributes on the neural network...
P. 692, sec. VI: neural network model. See also p. 691, sec. III discussing data set (i.e. derived attributes as discussed in the rejection of claim 1).
The Examiner notes that Song also discloses this limitation, see Song p. 2, sec. 2 “mixed PNN model” and figure depicting the topological structure of MPNN.
Chandrashekar discloses the following further limitation which neither Mozer nor Milne seems to disclose explicitly wherein the selecting comprises minimizing a number of the key performance indicators unless a number of events exceeds a predefined number.
P. 17, sec. 2: “Filter methods use variable ranking techniques as the principle criteria for variable selection by ordering. Ranking methods are used due to their simplicity and good success is reported for practical applications. A suitable ranking criterion is used to score the variables and a threshold is used to remove variables below the threshold.” (Emphasis added.)
At the time of filing, it would have been obvious to a person of ordinary skill to employ a threshold for determining when to continue eliminating features during feature selection (as taught by Chandrashekar) in the combined system of Mozer and Milne because this would allow system engineers to balance two competing priorities, namely model accuracy (which may benefit from considering more features) and computational resources used in model training (which may benefit from considering fewer features). All three disclosures pertain to machine learning.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124