DETAILED ACTION
This action is written in response to the remarks and amendments dated 4/11/22. This action is made final. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
In view of the Applicant’s arguments, the Examiner withdraws the outstanding rejections under §101.
The Applicants argue that the previous art of record does not anticipate or render obvious the claims as currently amended. The Examiner provides updated prior art rejections below necessitated by the current amendments. Additional arguments are also addressed below.

	
Claim Rejections - 35 USC § 112(b) - Indefiniteness
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-7 are rejected under 35 U.S.C. 112(b), as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention. Claim 1 recites “a plurality of key performance indicators from the plurality of derived attributes using a neural network and based on an extremely rare event being modeled assuming a predetermined minimum number of events in training data of the neural network”. However, it is unclear which of several meanings the Applicant intends:
the model assumes that a predetermined minimum number of events (i.e. observations) are present in the training data
the model assumes that a predetermined minimum number of events (i.e. observations) have been identified as extremely rare among the observations in the training data
selection of key performance indicators occurs only if a predetermined minimum number of events (i.e. observations) are present in the training data
selection of key performance indicators occurs only if a predetermined minimum number of events (i.e. observations) have been identified as extremely rare among the observations in the training data
Because it is not clear which of the above interpretations is applicable, the term is ambiguous, and consequently a person of ordinary skill would not be able to understand the scope of the claim with reasonable certainty. Therefore the claim is indefinite. This rejection applies equally to dependent claims 2-7.

Allowable Subject Matter
Claims 8-14 are allowed. Claims 4 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable over the prior art if rewritten in independent form including all of the limitations of the base claim and any intervening claims. These claims each recite “wherein the performance of the linear model is measured by computing accuracy as a measure of a proportional hit rate.” The Examiner notes that “proportional hit rate” does not appear to be a widely used term of art within the field of computer science. The Examiner interprets this term in view of the definition provided by the Applicant in their specification at [0075] (formula 1, reproduced below).

    [Image media_image1.png (greyscale, 130 x 774): formula 1 of the specification at [0075]]



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.
The following are the references relied upon in the rejections below:
Mozer (primary reference): Mozer, Michael C., et al. "Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry." IEEE Transactions on neural networks 11.3 (2000): 690-696.
Anonymous 2017, "How to handle imbalanced classification problems in machine learning?", Analytics Vidhya, https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/. (Cited by Applicant in IDS dated 11/28/18 as non-patent literature item 2.)
Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. Journal of microbiological methods. 2000 Dec 1;43(1):3-31.
Chandrashekar G, Sahin F. A survey on feature selection methods. Computers & Electrical Engineering. 2014 Jan 1;40(1):16-28.
Duman, E., Y. Ekinci, and A. Tanriverdi, "Comparing alternative classifiers for database marketing: The case of imbalanced datasets", Expert Systems with Applications 39.1, January 2012, pp. 48-53.
Kuhn, Max. "Classification Using C5.0." UseR! 2013. Pfizer Global R&D: Groton, CT, USA (2013). Available at https://staff.fmi.uvt.ro/~daniela.zaharie/dm2019/EN/lab/lab3/biblio/user_C5.0.pdf, accessed 6/18/21. 20 pages.
Milne L., Feature selection using neural networks with contribution measures. In AI-CONFERENCE- 1995 Nov 27 (pp. 571-571). World Scientific Publishing.
Yu R, Qiu H, Wen Z, Lin C, Liu Y. A survey on social media anomaly detection. ACM SIGKDD Explorations Newsletter. 2016 Aug 1;18(1):1-14.

Claims 1-3, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Basheer, and Yu.
Regarding claims 1 and 15, Mozer discloses a method (and a related system) comprising:
identifying, by a computing device, a plurality of derived attributes using an external data source;
The Examiner interprets “derived attributes” in view of the Applicant’s written description at [0059]: “The derived attributes include any metrics that are stored in the plurality of external data sources ... or that may be derived from data (e.g., customer data, demographics, usage, location, behavior, etc.) stored in the plurality of external data sources”. (Emphasis added.)
P. 692, first col.: “The information sources listed above were distributed over three distinct databases maintained by the carrier. The databases contained thousands of fields, from which we identified 134 variables associated with each subscriber that we conjectured might be linked to churn. The variables included • subscriber location; • credit classification; • customer classification (e.g., corporate versus retail); • number of active services of various types; • beginning and termination dates of various services; • avenue through which services were activated; • monthly charges and usage; • number, dates, and nature of customer service calls; • number of calls made; • number of abnormally terminated calls.” (Emphasis added.)
Also p. 692: “To evaluate the benefit of carefully constructing the representation, we performed studies using both naive and a sophisticated representations. The naive representation mapped the 134 variables to a vector of 148 elements in a straightforward manner. Numerical variables, such as the length of time a subscriber had been with the carrier, were translated to an element of the representational vector that was linearly related to the variable value. We imposed lower and upper limits on the variables to suppress irrelevant variation and to not mask relevant variation by too large a dynamic range; vector elements were restricted to lie between and standard deviations of the variable. One-of-n discrete variables, such as credit classification, were translated into an n-dimensional subvector with one nonzero element.” (Emphasis added.)
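For illustration only, the “naive representation” quoted above (a linearly scaled, range-limited element for each numerical variable, and a subvector with one nonzero element for each discrete variable) may be sketched as follows. This sketch is not taken from Mozer. The function names, the sample values, and the clipping bound `k` are assumptions, since the limits in the quoted passage are not legible in this record.

```python
from statistics import mean, pstdev

def encode_numeric(values, k=3.0):
    """Scale values linearly to standard-deviation units and clip to [-k, +k].

    The bound k is a hypothetical parameter chosen for illustration.
    """
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant column
    return [max(-k, min(k, (v - mu) / sigma)) for v in values]

def encode_one_of_n(value, categories):
    """Map a discrete value to an n-dimensional vector with one nonzero element."""
    return [1.0 if value == c else 0.0 for c in categories]

# Hypothetical data: months with the carrier, including one extreme value.
tenure_months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 500]
clipped = encode_numeric(tenure_months, k=2.0)

# Hypothetical credit classes A, B, C.
credit = encode_one_of_n("B", ["A", "B", "C"])
```

In this sketch the extreme tenure value is limited to the upper bound, so it cannot dominate the dynamic range of the vector element.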
selecting, by the computing device, a plurality of key performance indicators ... based on an extremely rare event being modeled ... ;
P. 692: “The sophisticated representation incorporated the domain knowledge of our experts to produce a 73-element vector encoding attributes of the subscriber. This representation collapsed across some of the variables that, in the judgement of the experts, could be lumped together (e.g., different types of calls to the customer service department) and expanded on others (e.g., translating the scalar length of time with carrier to a multidimensional basis-function representation, where the receptive-field centers of the basis functions were suggested by the domain experts) and performed transformations of other variables (e.g., ratios of two variables or time-series regression parameters).” (Emphasis added.)
The Examiner interprets “extremely rare event” in view of the Applicant’s written description at [0002]: “Modeling techniques are used to predict various extremely rare events, including subscriber churn (turnover).” (Emphasis added.)
constructing, by the computing device, a linear model using the plurality of key performance indicators ... ; and
p. 692, sec. VI: "logit regression". The Examiner notes that logit regression, aka logistic regression, is a linear model.
predicting, by the computing device, occurrences of the extremely rare event using the linear model.
P. 693: “For each predictor, we obtain an estimate of the probability of churn for each subscriber in the data set by merging the test sets from the ten data splits. Because decision making ultimately requires a “churn” or “no churn” prediction, the continuous probability measure must be thresholded to obtain a discrete predicted outcome.”
P. 692: “Numerical variables, such as the length of time a subscriber had been with the carrier, were translated to an element of the representational vector that was linearly related to the variable value.” (Emphasis added.)
Also, p. 692, sec. VI: "logit regression". The Examiner notes that logit regression, aka logistic regression, is a linear model.
Also, see p. 692, sec. VII, discussing boosting. (The Examiner notes that boosting creates a strong classifier by using linear combinations of weak classifiers, whether the underlying classifiers are linear or not.)
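For illustration only, the sense in which logit regression is a linear model, and the thresholding step described at p. 693, may be sketched as follows. The weights, bias, inputs, and threshold values are hypothetical and do not come from any cited reference.

```python
import math

def churn_probability(x, weights, bias):
    """Logistic model: the log-odds of churn are a linear function of the inputs."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_churn(x, weights, bias, threshold=0.5):
    """Threshold the continuous probability to obtain a discrete outcome."""
    return churn_probability(x, weights, bias) >= threshold

# Hypothetical fitted parameters and one hypothetical subscriber vector.
weights = [0.8, -1.2]
bias = -0.5
subscriber = [2.0, 0.5]

p = churn_probability(subscriber, weights, bias)
log_odds = math.log(p / (1.0 - p))  # recovers bias + weights . x
flag_default = predict_churn(subscriber, weights, bias)
flag_strict = predict_churn(subscriber, weights, bias, threshold=0.7)
```

The recovered log-odds equal the weighted sum of the inputs plus the bias, which is the linear relationship the notes above rely on. Raising the threshold changes the discrete prediction without changing the model.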
Milne discloses the following additional limitation which Mozer does not seem to disclose explicitly:
selecting, by the computing device, a plurality of key performance indicators from the plurality of derived attributes using a neural network...
PP. 1-2: “To push [the abilities of neural networks] to the limit of their capabilities we need to recognize that ... using only the significant features in training and classification will give use the best possible results. .... So we use the neural network to help us decide which are the most useful features in giving a classification. .... By giving a measure of the contribution each input feature makes to the final output of the network we can select the features to use.” (Emphasis added.)
The Examiner notes that Milne uses neural networks to perform feature selection, see abstract and passim.
At the time of filing, it would have been obvious to a person of ordinary skill to perform feature selection using a neural network (as taught by Milne) when training a neural network for a classification task (e.g. churn prediction, as in Mozer). As noted by Milne, performing feature selection using a neural network “will reduce the noise and extraneous information that the network has to deal with as well as reducing training an classification times.” (P. 2.) Both disclosures pertain to neural networks.
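For illustration only, selecting features by “a measure of the contribution each input feature makes to the final output of the network” may be sketched as follows for a network with one hidden layer and one output. The scoring rule here is a simplified weight-magnitude measure assumed for this sketch. It is not asserted to be the measure defined in Milne, and all names and weight values are hypothetical.

```python
def input_contributions(w_hidden, w_out):
    """Score each input by its share of every hidden unit's incoming weight,
    scaled by that hidden unit's weight to the output.

    w_hidden[j][i] is the weight from input i to hidden unit j.
    w_out[j] is the weight from hidden unit j to the single output.
    """
    n_inputs = len(w_hidden[0])
    scores = [0.0] * n_inputs
    for j, row in enumerate(w_hidden):
        total = sum(abs(w) for w in row) or 1.0  # guard against an all-zero row
        for i, w in enumerate(row):
            scores[i] += (abs(w) / total) * abs(w_out[j])
    return scores

def select_features(names, scores, top_k):
    """Keep the top_k highest-scoring input features."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical trained weights: three inputs, two hidden units.
names = ["usage", "noise", "service_calls"]
w_hidden = [[2.0, 0.1, 1.0],
            [1.5, 0.2, 0.3]]
w_out = [1.0, 0.5]

scores = input_contributions(w_hidden, w_out)
selected = select_features(names, scores, top_k=2)
```

Here the low-scoring input is dropped, and only the remaining inputs would be passed to the downstream model.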

Basheer discloses the following further limitation which neither Mozer nor Milne discloses:
... assuming a predetermined number of events in training data of the neural network;
P. 18, sec. 11.1: “Models developed from data generally depend on database size. ANNs, like other empirical models, may be obtained from databases of any size, however generalization of these models to data from outside the model development domain will be adversely affected. Since ANNs are required to generalize for unseen cases, they must be used as interpolators. Data to be used for training should be sufficiently large to cover the possible known variation in the problem domain.”
At the time of filing, it would have been obvious to a person of ordinary skill to condition the testing or implementation of a neural network system (such as the Mozer/Milne combination) upon having a sufficient quantity of training data (as taught by Basheer). As explained in the latter reference, the outputs of a neural network are unreliable when insufficiently trained. Although a particular number of training instances is not specified, it is clear that more examples are better, albeit with diminishing returns as the number of instances grows. This information would allow a person of ordinary skill to set an appropriate training threshold for the task at hand.

Yu discloses the following further limitation which neither Mozer nor Milne discloses:
constructing, by the computing device, a linear model using the plurality of key performance indicators including social media data; and
P. 3: “Specifically, for each time stamp t = 1, 2, ... , the inputs are given as a snapshot of the network in the form of a binary string.” (Emphasis added.)P. 6: “To detect anomalies, the authors define the typical pattern as a linear combination of the past activity vectors”. (Emphasis added.)
At the time of filing, it would have been obvious to a person of ordinary skill to apply the combined anomaly detection system of Mozer/Milne/Basheer to problems using social media data (as taught by Yu) because, as noted by Yu, “Social media anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraud information dissemination.” (Abstract.)

Independent claim 15 recites a system whose functionality is substantially identical to that of claim 1 (which recites a method). Therefore, claim 15 is rejected for the same reason as claim 1. Its additional limitations—namely a hardware processor, a computer readable memory, and a computer readable storage medium—are each inherent in each of Mozer and Milne.

Regarding claim 2, Mozer discloses the further limitation comprising analyzing, by the computing device, the plurality of derived attributes on the neural network.
P. 692, sec. VI: neural network model. See also p. 691, sec. III discussing data set (i.e. derived attributes as discussed in the rejection of claim 1). In other words, the neural network in Mozer analyzes the attributes (as discussed in claim 1) in order to predict customer churn.
Milne also discloses this limitation, insofar as the disclosed system analyzes input features using a neural network in order to perform a classification task. See p. 3, sec. 3.

Regarding claims 3 and 17, Mozer discloses the further limitation comprising measuring, by the computing device, performance of the linear model on different datasets, including out of time (OOT) validation datasets.
PP. 693-94, sec. VIII: results and discussion, including performance metrics at p. 694, first col. The Examiner notes that logit regression, aka logistic regression, is a linear model.
P. 692, first col.: “The information sources listed above were distributed over three distinct databases maintained by the carrier.” (Emphasis added.)
P. 695, first col.: “In real-world usage, however, one would train the predictor on all subscribers at a given point in time, say, to predict January/February churn, and then test the following months, predicting March/April churn.” (Emphasis added.) The Examiner notes that this passage describes OOT validation.
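For illustration only, the out of time (OOT) arrangement described in the passage at p. 695 (train on one period, test on a later period) may be sketched as follows. The record layout, field names, dates, and cutoff are hypothetical.

```python
from datetime import date

def out_of_time_split(records, cutoff):
    """Split records so that every validation record is observed on or after
    the cutoff, and every training record is observed before it."""
    train = [r for r in records if r["observed"] < cutoff]
    oot = [r for r in records if r["observed"] >= cutoff]
    return train, oot

# Hypothetical subscriber observations.
records = [
    {"observed": date(2000, 1, 10), "churned": False},
    {"observed": date(2000, 2, 20), "churned": True},
    {"observed": date(2000, 3, 5), "churned": False},
    {"observed": date(2000, 4, 18), "churned": True},
]

# Train on January/February, validate on March/April.
train, oot = out_of_time_split(records, cutoff=date(2000, 3, 1))
```

Unlike a random split, no validation record precedes any training record, so measured performance reflects prediction on a later time period.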

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Basheer, Yu and Kuhn.
Regarding claims 5 and 19, Kuhn discloses the following further limitation, which neither Mozer, Milne, Basheer nor Yu seems to disclose explicitly: wherein the selecting the plurality of key performance indicators comprises using C5.0 Winnow Attributes to automatically select the plurality of key performance indicators while reducing overfitting.
P. 14: Winnowing. “Winnowing is a feature selection step conducted before modeling.” See generally pp. 14-16 discussing winnowing using C5.0.
At the time of filing, it would have been obvious to a person of ordinary skill to apply the technique disclosed by Kuhn for winnowing to the combined system of Mozer/Milne/Basheer/Yu because it can lead to similar results with a faster runtime. See e.g. results at Kuhn p. 17.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Basheer, Yu, Kuhn and Anonymous 2017.
Regarding claim 6, Anonymous 2017 discloses the following further limitation, which neither Mozer, Milne, Basheer, Yu nor Kuhn seems to disclose explicitly: wherein the selecting the plurality of key performance indicators further comprises performing neural network bagging using multilayer perceptron.
P. 12, sec. 2.2.1: Bagging based ensemble techniques. Also p. 13, last paragraph noting the applicability to neural network classification models, continuing on p. 14.
At the time of filing, it would have been obvious to a person of ordinary skill to apply the techniques disclosed in Anonymous 2017 to the combined system of Mozer/Milne/Basheer/Yu/Kuhn because it can help classifiers reduce overfitting, thus resulting in improved classification performance on real-world problems. See Anonymous 2017, p. 12.

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Basheer, Yu and Duman.
Regarding claims 7 and 20, Duman discloses the following further limitation, which neither Mozer nor Milne seems to disclose explicitly: wherein the selecting the plurality of key performance indicators comprises using Chi-square Automatic Interaction Detector (CHAID) to automatically select the plurality of key performance indicators while reducing overfitting.
P. 50, sec. 3.3 CHAID. “Using the significance of a statistical test as a criterion, it evaluates all of the values of a potential predictor field. It merges values that are judged to be statistically similar with respect to the target variable and maintains all other values that are dissimilar. It then selects the best predictor to form the first branch in the decision tree, such that each child node is made of a group of similar values of the selected field. This process continues recursively until the tree is fully grown.”
At the time of filing, it would have been obvious to a person of ordinary skill to combine the CHAID technique disclosed by Duman with the combined system of Mozer/Milne/Basheer/Yu because it may result in superior classification results, as well as fast training. (See generally Duman sec. 3.3.) All three disclosures pertain to classification/prediction using imbalanced data sets.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Mozer, Milne, Basheer, Yu and Chandrashekar.
Regarding claim 16, Mozer discloses the further limitation comprising analyz[ing] the plurality of derived attributes on the neural network...
P. 692, sec. VI: neural network model. See also p. 691, sec. III discussing data set (i.e. derived attributes as discussed in the rejection of claim 1).
Chandrashekar discloses the following further limitation, which neither Mozer nor Milne seems to disclose explicitly: wherein the selecting comprises minimizing a number of the key performance indicators unless a number of events exceeds a predefined number.
P. 17, sec. 2: “Filter methods use variable ranking techniques as the principle criteria for variable selection by ordering. Ranking methods are used due to their simplicity and good success is reported for practical applications. A suitable ranking criterion is used to score the variables and a threshold is used to remove variables below the threshold.” (Emphasis added.)
At the time of filing, it would have been obvious to a person of ordinary skill to employ a threshold for determining when to continue eliminating features during feature selection (as taught by Chandrashekar) in the combined system of Mozer/Milne/Basheer/Yu because this would allow system engineers to balance two competing priorities, namely model accuracy (which may benefit from considering more features) and computational resources used in model training (which may benefit from considering fewer features). All three disclosures pertain to machine learning.
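For illustration only, a filter method of the kind quoted from Chandrashekar (score each variable with a ranking criterion, then remove variables below a threshold) may be sketched as follows. The choice of absolute Pearson correlation as the ranking criterion, the threshold value, and the data are assumptions made for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def filter_select(features, target, threshold):
    """Rank variables by |correlation with target|; keep those at or above threshold."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    return kept, scores

# Hypothetical data: 1 marks an occurrence of the rare event.
target = [0, 0, 1, 1, 0, 1]
features = {
    "calls": [1, 2, 8, 9, 2, 7],   # tracks the target closely
    "noise": [5, 1, 4, 2, 3, 3],   # unrelated to the target
}

kept, scores = filter_select(features, target, threshold=0.3)
```

Lowering the threshold retains more variables at greater training cost, and raising it retains fewer, which is the trade-off discussed above.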

Additional Relevant Prior Art
The following references were identified by the Examiner as being relevant to the disclosed invention, but are not relied upon in any particular prior art rejection:
Song discloses, inter alia, a neural network model for subscriber churn prediction, which also features an anti-churn targeted marketing program. (Song, Guojie, et al. "A mixed process neural network and its application to churn prediction in mobile communications." Sixth IEEE International Conference on Data Mining-Workshops (ICDMW'06). IEEE, 2006.)


Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124