DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response To Amendment
The amendments filed on 2022-05-31 have been entered.  Applicant’s amendments to the claims have overcome the following objections:
Line 6 of Claim 1 no longer begins with unnecessary extra spacing.  The objection is withdrawn.
Claims 2, 4-5, 10, and 12 now have the formula numbers enclosed within parentheses.  The objection is withdrawn.
Claim 11 now begins on a new line.  The objection is withdrawn.
Claim 20 now ends with a period.  The objection is withdrawn.
The objection to Claim 10 remains and is detailed below in “Claim Objections”, along with a new objection to Claim 1.  The rejections under 35 USC 112 remain, and are also detailed below.  The status of the claims is as follows:
Claims 1-20 remain pending in the application.
Claims 1-2, 4-8, 10-12, 14-16, and 18-19 are amended.

Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 112 have been fully considered but are not persuasive.  Applicant argues that “Examiner has not asserted, let alone established, why the claimed features are believed to be indefinite from the perspective of one of ordinary skill in the art.”  Examiner respectfully disagrees, as the reason for the rejection was clearly laid out in the previous office action, the reason being that terms used within the formulae were not defined within the claims, nor within any claims on which the claims depend.  Details are provided in the 112 rejections below.
Applicant’s arguments with respect to rejections under 35 USC 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  Applicant argues on Remarks Page 14 that the cited combination does not teach the contents of the historical data used for training, as per the amended matter.  However, below Examiner will show this to be obvious in light of combination with Ashiya et al. (US 2019/0295087 A1).  Applicant also argues on Remarks Page 14 that the cited combination does not teach the newly amended matter of using a SVM to optimize a logistic statistical model.  However, below Examiner will show this to be obvious in light of combination with Kannan et al. (“A hybrid binary classifier: Using modified logistic regression for non-support vector elimination”).

Claim Objections
Claim 1 is objected to because of the following informalities:  “an indication prior fraud” should be changed to read “an indication of prior fraud”.  Appropriate correction is required.
Claim 10 is objected to because of the following informalities: Lines 6 (“for any n.”) and 8 (“space.”) contain periods.  MPEP 608.01(m) states that “Each claim begins with a capital letter and ends with a period. Periods may not be used elsewhere in the claims except for abbreviations.”  Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 4-5, 10, 12, and 20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
These claims recite formulae containing terms which are not defined within the claim itself or within any claim from which the claim depends. 
As per Claim 2, the terms N, b, and tn are not defined within the claim or within any claim from which the claim depends.  
As per Claim 4, the terms L and N are not defined within the claim or within any claim from which the claim depends.  L is defined as a “loss function” in Claim 2, but Claim 4 does not depend from Claim 2 and does not inherit its definitions.
As per Claim 5, the terms L, N, and hw(x) are not defined.  Although hw(x) is defined in Claim 4, Claim 5 does not depend from Claim 4 and does not inherit its definitions.
As per Claim 10, the term tn is not defined within the claim or within any claim from which the claim depends.  
As per Claim 12, the terms N, b, and tn are not defined within the claim or within any claim from which the claim depends.  
As per Claim 20, the terms xi, xj, b, d, and σ are not defined within the claim or within any claim from which the claim depends.  
The particular terms discussed above are not an exhaustive list, and the Applicant is invited to review all claims containing formulae and/or mathematical terms to ensure that they are adequately defined.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 7-11, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Singh et al. (“A Machine Learning Approach for Detection of Fraud based on SVM,” International Journal of Scientific Engineering and Technology, Volume No. 1, Issue No. 3, pp. 194-198, 01 July 2012, hereinafter “Singh”) in view of Ashiya et al. (US 2019/0295087 A1; hereinafter “Ashiya”), further in view of Wijnhoven et al. (“Fast Training of Object detection using Stochastic Gradient Descent,” 2010 International Conference on Pattern Recognition, pp. 424-427, hereinafter “Wijnhoven”), further in view of Kannan et al. (“A hybrid binary classifier: Using modified logistic regression for non-support vector elimination,” 2015 IEEE Recent Advances in Intelligent Computational Systems (RAICS), pp. 167-172; hereinafter “Kannan”), and further in view of Maldonado et al. (“Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers,” Intelligent Data Analysis 18 (2014) 95-112, hereinafter “Maldonado”).

Regarding claim 1, Singh discloses a computer-implemented method for detecting fraud-related events, the method comprising: training a [logistic statistical] model, during a training phase, using historical event data associated with fraud-related events, (Singh, § 5 “Implementation”, “Hence for testing of implementation of our algorithm we generated the data of true & false Transaction using different mean & variance & then mixed them with different probability. We used the MATLAB for the implementation of the algorithm because of its rich sets of mathematical functions and also supporting the inbuilt functions for SVM.”)
wherein the model learns patterns to determine whether data associated with an event provides an indication that the event is fraudulent or non-fraudulent, (Singh, pg. 197, Steps 5-7: select one of three kernels, train the SVM, and save the classifier.)  [The SVM is trained to recognize fraudulent transaction data.]
events inputted to the [logistic statistical] model being classified as fraudulent or non-fraudulent, during an operational phase, based on event-related parameters being processed by the [logistic statistical] model according to the training; (Singh, § 6 “Results” and tables of § 6, “The results are simulated for five different Fraud probabilities from 0.3 to 0.5 & changing the training data size from 30 to 100, then according to outputs of program the following tables are drawn which shows TPR = True Positive Rate TNR = True Negative Rate FPR = False Positive Rate FNR = False Negative Rate.”)
	Singh does not explicitly disclose the contents of their training set, and thus does not disclose wherein the historical event data associated with fraud-related events comprises one or more of the following: an indication prior fraud, a region from which a transaction originated, a destination for the transaction, a frequency of transactions, a mean transaction amount, and a variance of the transaction amount.
	Ashiya teaches wherein the historical event data associated with fraud-related events comprises one or more of the following: an indication prior fraud, a region from which a transaction originated, a destination for the transaction, a frequency of transactions, a mean transaction amount, and a variance of the transaction amount (Ashiya, Para [0023], discloses:  “In particular embodiments, a database of historical transaction data is accessed to provide training data for a machine learning model. The training data contains transaction records stored as feature vectors that have been tagged as fraudulent or non-fraudulent.”  Ashiya, Para [0024], indicates a region from which the transaction originated:  “For example, the historical training data may show that transactions originating in certain regions of South America against European merchants having a US delivery address within a certain geographic range are correlated with fraudulent activity.”)
	Ashiya and Singh are analogous art because they are both in the field of using machine learning to identify fraudulent transactions.
	It would have been obvious before the effective filing date of the claimed invention to combine the historical transaction information of Ashiya with the SVM classifier of Singh.  One of ordinary skill in the art would be motivated to do so in order to 
	The combination of Singh and Ashiya does not explicitly disclose continue training the [logistic statistical] model by iteratively adjusting parameters w and b, respectively associated with weights and biases for event-related input data; adjusting values associated with the parameters w and b to adjust preferences given to one or more event-related parameters and to influence the [logistic statistical] model toward generating an outcome that is more accurate;
Wijnhoven teaches continue training the [logistic statistical] model by iteratively adjusting parameters w and b, respectively associated with weights and biases for event-related input data; adjusting values associated with the parameters w and b to adjust preferences given to one or more event-related parameters and to influence the [logistic statistical] model toward generating an outcome that is more accurate; (Wijnhoven, § 5 “Conclusions” “We have incorporated the Stochastic Gradient Descent (SGD) algorithm for learning a linear SVM classifier in an object detection framework.”;
Wijnhoven, pg. 425, Col. 1, ¶ 5 “SGD considers one sample at each iteration and updates the weight vector w iteratively using a time-dependent weighting factor,” [Using Stochastic Gradient Descent for the weight vector w corresponding to the claimed w parameter];
Wijnhoven, § 4 “Experiments and Results”, ¶ 1 “We employ the SGD implementation svmsgd2 by Bottou […] In addition, we modify the gain factor for the updating of the bias term, that is also iteratively updated in the implementation.” [The SGD implementation of Wijnhoven also updates the bias term, corresponding to the claimed b parameter.];
Wijnhoven, Abstract “Incorporating SGD speeds up the optimization process significantly requiring only a single iteration over the training set to obtain results comparable to state-of-the-art SVM techniques.” [The SGD optimization corresponds to the claimed “generate an outcome that is more accurate.”])
	Wijnhoven and the combination of Singh and Ashiya are analogous art, as they are in the field of using Support Vector Machines for classification.
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the Stochastic Gradient Descent of Wijnhoven into the Support Vector Machines of Singh and Ashiya, the benefit being increased optimization with state-of-the-art performance, as recited in the Abstract of Wijnhoven: “Incorporating SGD speeds up the optimization process significantly requiring only a single iteration over the training set to obtain results comparable to state-of-the-art SVM techniques.”
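For illustration only, the iterative adjustment of w and b described in the Wijnhoven passages can be sketched as a plain stochastic gradient descent loop over a hinge loss. This is a minimal sketch, not Wijnhoven's svmsgd2 implementation: the function names, the step-size schedule, and the default values of lam, eta0, and epochs are assumptions, and the modified gain factor for the bias term mentioned by Wijnhoven is not modeled.

```python
import random


def sgd_linear_svm(samples, labels, lam=0.01, eta0=0.1, epochs=20, seed=0):
    """Fit a linear SVM (hinge loss plus L2 penalty) by SGD.

    samples: list of feature lists; labels: list of +1 / -1.
    Returns the weight vector w and bias b. All defaults are assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:  # one sample per iteration
            x, y = samples[i], labels[i]
            eta = eta0 / (1.0 + lam * eta0 * t)  # decaying step size (assumed schedule)
            t += 1
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink w for the regularization term on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                # Sample violates the margin: move w and b toward classifying it correctly.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Each pass adjusts both the weight vector and the bias term one sample at a time, which is the behavior the cited passages attribute to SGD training of a linear SVM.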
	The combination of Singh, Ashiya, and Wijnhoven does not disclose a logistic statistical model, nor does it teach optimizing the logistic statistical model using a support vector machine that assigns additional training data as input to further optimize training of the logistic statistical model.
	Kannan teaches a logistic statistical model, and teaches optimizing the logistic statistical model using a support vector machine that assigns additional training data as input to further optimize training of the logistic statistical model (Kannan, Page 167 Top Right Paragraph, concludes:  “Our idea is to use a modified form of logistic regression to filter out as much data as possible using linear boundaries and resort to SVM only for the subset of data that cannot be linearly separated.”  Here, Kannan discloses first using a logistic statistical model (“logistic regression”) and then using SVM to optimize the model (“resort to SVM”), which uses additional training data as input (“subset of data that cannot be linearly separated”)).
	Kannan and the combination of Singh, Ashiya, and Wijnhoven are analogous art because they are both in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the Logistic Regression of Kannan with the SVM of Singh, Ashiya, and Wijnhoven.  One of ordinary skill in the art would be motivated to do so in order to improve accuracy on imbalanced data sets (Kannan, Page 167 Section II:  “Our initial study was motivated by an observation of performance of LR (Logistic Regression) on skewed data-sets. Since the cost function used in the LR does not account for skew in the dataset (possibly for honoring naturally occurring skews), it always favors the majority label in the data… This led us to develop a modified version of LR that tries to be fair to both labels.”)
	The combination of Singh, Ashiya, Wijnhoven, and Kannan does not disclose the optimizing of the logistic statistical model consistent with an objective for making the logistic statistical model more balanced with respect to training by taking into account fraudulent transactions and non-fraudulent transactions, the objective being accomplished by at least attempting to cause a reduction or minimization in penalties calculated based on determining whether the logistic statistical model wrongfully categorized the events inputted to the logistic statistical model.
Maldonado teaches the optimizing of the logistic statistical model consistent with an objective for making the logistic statistical model more balanced [with respect to training by taking into account fraudulent transactions and non-fraudulent transactions], the objective being accomplished by at least attempting to cause a reduction or minimization in penalties calculated based on determining whether the logistic statistical model wrongfully categorized the events inputted to the logistic statistical model. (Recall above that Ashiya discloses in [0023]:  “The training data contains transaction records stored as feature vectors that have been tagged as fraudulent or non-fraudulent.” Maldonado, § 3.2 “Cost-sensitive learning”, “Cost sensitive techniques provide a viable alternative to sampling methods for imbalanced learning domains [11]. The objective of cost-sensitive learning is to develop a classification function that minimizes the overall cost on the training data set. This approach is based on the concept of the cost matrix, which is a numerical representation of the penalty when classifying instances from one class to another.
For example, we define C- as the cost of misclassifying [corresponds to claimed “wrongfully categorized”] a majority class instance as a minority class instance and let C+ represent the cost of the contrary case. Typically, there is no cost for correct classification and the cost of misclassification in the target class is higher than the contrary case, i.e., C+ > C-.” [Cost-sensitive learning assigns cost penalties for misclassification, and attempts to minimize these costs across the training set. This is useful for “imbalanced learning domains,” where there are many more examples of one class than the other in the training data set.])  Thus, as Maldonado takes into account two classes C+ and C-, in combination with Ashiya this teaches the limitation of taking into account fraudulent transactions and non-fraudulent transactions.
Maldonado and the combination of Singh, Ashiya, Wijnhoven, and Kannan are analogous art, as they are in the field of using Support Vector Machines for classification.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the cost matrix of Maldonado into the Support Vector Machine of Singh, Ashiya, Wijnhoven, and Kannan, the benefit being that it aids optimization when using imbalanced learning domains, as cited by Maldonado in § 3.2 “Cost sensitive techniques provide a viable alternative to sampling methods for imbalanced learning domains [11]. The objective of cost-sensitive learning is to develop a classification function that minimizes the overall cost on the training data set. This approach is based on the concept of the cost matrix, which is a numerical representation of the penalty when classifying instances from one class to another.”
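For illustration only, the cost-matrix idea quoted from Maldonado § 3.2 can be sketched as a function that totals misclassification penalties over a labeled set. This is a minimal sketch: the function name, the label convention (+1 for the minority or target class, -1 for the majority class), and the default values of cost_pos (standing in for C+) and cost_neg (standing in for C-) are assumptions chosen only so that C+ > C-.

```python
def total_misclassification_cost(y_true, y_pred, cost_pos=5.0, cost_neg=1.0):
    """Sum cost-matrix penalties over a labeled set.

    Labels: +1 = minority/target class (e.g. fraudulent), -1 = majority class.
    cost_pos (C+): penalty for labeling a minority instance as majority.
    cost_neg (C-): penalty for labeling a majority instance as minority.
    Correct classifications carry no cost.
    """
    total = 0.0
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            continue
        total += cost_pos if truth == 1 else cost_neg
    return total
```

Under these assumed costs, a classifier that labels every instance as the majority class is penalized more heavily per missed minority instance than one that raises an occasional false alarm, which is the balancing effect the quoted passage describes.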
	Claims 11 and 16 recite similar limitations as claim 1 and are rejected under the same rationale as applied to claim 1 above.

Regarding claim 3, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1. Further, Wijnhoven teaches wherein a stochastic gradient descend (SGD) method is utilized to adjust the values associated with the parameters w and b. (Wijnhoven, § 5 “Conclusions” “We have incorporated the Stochastic Gradient Descent (SGD) algorithm for learning a linear SVM classifier in an object detection framework.”;
Wijnhoven, pg. 425, Col. 1, ¶ 5 “SGD considers one sample at each iteration and updates the weight vector w iteratively using a time-dependent weighting factor,” [Using Stochastic Gradient Descent for the weight vector w corresponding to the claimed w parameter];
Wijnhoven, § 4 “Experiments and Results”, ¶ 1 “We employ the SGD implementation svmsgd2 by Bottou […] In addition, we modify the gain factor for the updating of the bias term, that is also iteratively updated in the implementation.” [The SGD implementation of Wijnhoven also updates the bias term, corresponding to the claimed b parameter.])
Claims 13 and 17 recite similar limitations as claim 3 and are rejected under the same rationale as applied to claim 3 above.

Regarding claim 7, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.  Further, Singh discloses wherein support vector machines (SVMs) are employed (Singh, § 3 “Support Vector Machines”, ¶ 1 “SVMs function by nonlinearly projecting the training data in the input space to a feature space of higher dimension by use of a kernel function. This results in a linearly separable dataset that can be separated by a linear classifier.”)
to implement supervised learning models (Singh, § 3.1 “SVM Classification”, ¶ 1 “Let xi be a feature vector or a set of input variables and let yi be a corresponding class label” [labeled training data for supervised learning])
with associated learning algorithms that analyze data used for classification and regression analysis (Ibid., ¶ 2 “The optimal separating hyperplane, when classes have equal loss-functions, maximizes the margin between the hyperplane and the closest samples of classes. The margin is given by [Equations (1), (2), and (3)]”)
to optimize the computing model.  (Ibid., “The optimal separating hyperplane can now be solved by maximizing (3) subject to (1).”)

Claim 18 recites similar limitations as claim 7 and is rejected under the same rationale as applied to claim 7 above.

Regarding claim 9, the combination of references as applied to claim 8 below (see the discussion of similar claim 19) teaches [t]he computer-implemented method of claim 8.  Further, Singh discloses wherein the SVMs perform a non-linear classification using a kernel method. (Singh, § 3 “SVM (Support Vector Machine)”, ¶ 1 “SVMs function by nonlinearly projecting the training data in the input space to a feature space of higher dimension by use of a kernel function. This results in a linearly separable dataset that can be separated by a linear classifier.”)

Regarding claim 10, the combination of references as applied to claim 9 above teaches [t]he computer-implemented method of claim 9.  Further, Singh discloses wherein the SVMs are treated as max margin problems (Singh, § 3.1 “SVM Classification”, ¶ 2 “The optimal separating hyperplane, when classes have equal loss-functions, maximizes the margin [corresponds to claimed “max margin problems”] between the hyperplane and the closest samples of classes.”)
according to Formula 1.5 or Formula 1.6, to further simplify the computing model using a Lagrange multiplier towards a solvable quadratic programming problem, 
[the claimed mathematical expression is not reproduced here]

(Singh, pg. 195, equation 4. [Equation 4 is the Lagrangian with respect to w, b, and α, but removing the α terms to make the Lagrangian a function of w and b reduces equation (4) to Formula 1.5 as claimed.] Ibid., “The objective is now to minimize the Lagrangian [equation (4)]” [corresponds to claimed “arg min” in Formula 1.5 as claimed])
wherein parameters w and b minimize the term ||w||^2, (Singh, pg. 195, Equation (4) without the α term minimizes ½ ||w||^2, which also minimizes ||w||^2 as claimed)
on condition that the inequality persist for any n.  (Singh, pg. 195, Equation (1) expresses the claimed inequality for all i from 1 to n.)
Φ(xn) denotes a function that project xn into some lower dimensional space.
Formula 1.6 [the claimed mathematical expression is not reproduced here]: 
(Singh, pg. 195, Equation (5) is the same as claimed Formula 1.6, and the paragraph following Equation (5) states “The optimal hyperplane can be found by maximizing (5)” [corresponds to the “max” function in Formula 1.6 as claimed.])

[the claimed mathematical expression is not reproduced here]
0
(Singh, pg. 195, Col. 2, ¶ 3 “Σi αiyi = 0”)
{a1, a2 ... an} are Lagrange multipliers, (Singh, pg. 195, equation (4) and Col. 2 ¶ 4 “In (4), αi are nonnegative Lagrange multipliers”)
which replace w and b. (Singh, pg. 195, Col. 2, ¶¶ 3-4 “Substituting w into (4) gives the dual form [equation (5)] which is not anymore an explicit function of w or b.”)
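For reference only, the textbook hard-margin formulation that the mapping above relies on is sketched below, written with feature vectors x_i, class labels y_i, and multipliers α_i as in the quoted Singh passages. It is offered as an assumption about the general form of Singh's equations (1), (4), and (5); it is not a reproduction of the claimed Formula 1.5 or Formula 1.6, and the kernel mapping Φ is omitted.

```latex
% Primal problem: maximize the margin by minimizing the norm of w
\begin{align*}
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad &\text{subject to}\quad y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1,
  \qquad i = 1,\dots,n. \\[4pt]
% Lagrangian with nonnegative multipliers
L(w, b, \alpha) &= \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\bigl(\langle w, x_i\rangle + b\bigr) - 1 \,\bigr],
  \qquad \alpha_i \ge 0. \\[4pt]
% Stationarity conditions in w and b
\frac{\partial L}{\partial w} = 0 \ &\Rightarrow\ w = \sum_{i=1}^{n} \alpha_i y_i x_i,
  \qquad
\frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0. \\[4pt]
% Dual problem: w and b no longer appear explicitly
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
  &- \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n}
     \alpha_i \alpha_j\, y_i y_j\, \langle x_i, x_j\rangle
  \quad \text{subject to}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{align*}
```

Substituting the stationarity conditions into the Lagrangian removes w and b, leaving a quadratic program in the multipliers alone, which is consistent with the quoted statement that the dual form is no longer an explicit function of w or b.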

Regarding claim 19, the combination of references as applied to claim 18 above teaches [t]he computer program product of claim 18.  Further, Singh discloses wherein a set of training examples marked as belonging to fraudulent or non-fraudulent categories and the SVMs are used to train the computing model as a non-probabilistic binary linear classifier and to perform a non-linear classification using a kernel method. (Singh, Abstract, “although many person has proposed their work for credit card fraud detection by characterizing the user spending profile, but in this paper we are proposing the SVM (support vector machine) based method with multiple kernel involvement” [SVM used for fraud detection];
Singh, § 3.1 “SVM Classification”, ¶ 1 “, Let xi […] be a feature vector or a set of input variables and let yi […] be a corresponding class label” [labeled training examples];
Singh, § 3.0, last paragraph “By their nature SVMs are intrinsically binary classifiers however there exist strategies by which they can be adapted to multiclass tasks. But in our case we not need multiclass classification.” [SVMs used in Singh as binary classifiers];
Singh, § 3.1 “SVM Classification”, ¶ 2 “In linearly separable cases a separating hyperplane satisfies [equation (1)] Where the hyperplane is denoted by a vector of weights w and a bias term b.” [The SVM may be used as a binary linear classifier.];
Singh, § 3.1, ¶ 5 “However, in most real world situations classes are not linearly separable and it is not possible to find a linear hyperplane that would satisfy (1) for all i = 1. . . n. In these cases a classification problem can be made linearly separable by using a nonlinear mapping [corresponds to claimed “kernel method”] into the feature space where classes are linearly separable.”;
Singh, pg. 196, Col. 1 describing three kernel methods (linear, polynomial, and Gaussian))
Claim 8 recites similar limitations as claim 19 and is rejected under the same rationale as applied to claim 19 above.

Regarding claim 20, the combination of references as applied to claim 18 above teaches [t]he computer program product of claim 18. Further, Singh discloses wherein the SVMs are treated as max margin problems and one or more of the following linear, polynomial or Gauss kernel methods are adopted to simplify the max margin problem calculations: 
Linear: K(xi, xj) = xi^T xj

[the remaining claimed kernel expressions are not reproduced here]

(Singh, § 3.1 “SVM Classification”, ¶ 2 “The optimal separating hyperplane, when classes have equal loss-functions, maximizes the margin [corresponds to claimed “max margin problems”] between the hyperplane and the closest samples of classes.”;
Singh, pg. 196, Col. 1 showing equations for Linear, Polynomial, and Gaussian kernels. [The c term in the Linear kernel disclosed by Singh is not present in the claim, but is described as an “optional constant” (Singh, pg. 196, ¶ 4) and thus may be set to zero or omitted entirely];
Singh, § 3 “Support Vector Machine (SVM)”, ¶ 1 “SVMs function by nonlinearly projecting the training data in the input space to a feature space of higher dimension by use of a kernel function. This results in a linearly separable dataset that can be separated by a linear classifier.” [corresponds to claimed “kernel methods are adopted to simplify the max margin problem calculations”])
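For illustration only, the three kernel families named above can be sketched in their common textbook forms. This is a minimal sketch under stated assumptions: the function names and the parameters c (optional constant), d (polynomial degree), and sigma (Gaussian width) are illustrative, and the exact expressions in the claim and in Singh may differ from these forms.

```python
import math


def _dot(xi, xj):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(xi, xj))


def linear_kernel(xi, xj, c=0.0):
    """Linear kernel; c is an optional constant, zero by default."""
    return _dot(xi, xj) + c


def polynomial_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel of degree d with offset c (assumed form)."""
    return (_dot(xi, xj) + c) ** d


def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel with width sigma (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

With c left at zero, the linear kernel reduces to the plain inner product recited in the claim.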

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Singh, Ashiya, Wijnhoven, Kannan, and Maldonado and further in view of Li et al., “A Loss Function Analysis for Classification Methods in Text Categorization,” Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003, pp. 472-479, hereinafter “Li”.
Regarding claim 2, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.  Further, Singh discloses generate an output yn: and yn represents a hypothetical prediction of x. (Singh, pg. 195 equation (1), [The equation term (<w * xi> +b) corresponding to the yn term in claimed Formula 1.1 represents the predicted result of the SVM which is added to a bias term b and then multiplied by the class label yi to find values defining the separating hyperplane.])
such that when a first condition is met, xn is categorized as fraudulent. (Singh, § 3.1 “SVM Classification” “In linearly separable cases a separating hyperplane satisfies (1) Where the hyperplane is denoted by a vector of weights w and a bias term b.” [When equation (1) is satisfied, the weights and biases define a hyperplane such that data points on one side of the plane are deemed fraudulent while points on the other side of the hyperplane are deemed non-fraudulent.])
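For illustration only, the hyperplane decision described above can be sketched as a sign test on the score formed from the weights w and the bias term b. This is a minimal sketch: the function name is hypothetical, and the choice of which side of the hyperplane is labeled fraudulent is an assumption made for the example.

```python
def classify_event(x, w, b):
    """Label an event by the side of the hyperplane w.x + b = 0 it falls on.

    Assumption: a non-negative score is treated as the fraudulent side.
    """
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "fraudulent" if score >= 0 else "non-fraudulent"
```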

The above combination does not explicitly disclose 

[the claimed mathematical expression is not reproduced here]
wherein λ denotes a coefficient of regularization term for w, and xn denotes a feature or attribute associated with an event inputted to the computing model,

Li teaches
 
[the claimed mathematical expression is not reproduced here]
wherein λ denotes a coefficient of regularization term for w, (Li, pg. 473, last paragraph and formula (1), “The optimization in SVM is to find ß that minimizes the sum of the two terms in formula 1. […] The value of λ controls the trade-off between the two terms, that is, it is the weight (algorithmically determined in the training phase of SVM) of the second term relative to the first term.” [The second term of Li’s formula (1) is λ||ß||^2, with ß representing parameters corresponding to the claimed term w. The λ term in Li’s formula 1 acts as the claimed “regularization term.”])
and xn denotes a feature or attribute associated with an event inputted to the computing model, (Li, § 2, first bullet point, “The training data consists of N pairs of (x1,y1), (x2,y2),…,(xN,yN).  Vector xi […] represents the values of the p input variables in the ith training example.” [corresponds to claimed “feature or attribute associated with an event inputted to the computing model”])

	Li is analogous art as it is in the field of using Support Vector Machines for classification.  
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the Support Vector Machine of Singh with the loss function of Li, the benefit being that describing the loss in a Support Vector Machine allows for the optimization of the Support Vector Machine by minimizing the loss, as recited by Li on page 473, last paragraph “The first term in the right hand side of formula 1 is the cumulative training-set loss and the second term is the complexity penalty and both are functions of vector ß. The optimization in SVM is to find ß that minimizes the sum of the two terms in formula 1.”
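For illustration only, the two-term objective described in the quoted Li passage (a cumulative training-set loss plus a complexity penalty weighted by λ) can be sketched as follows. This is a minimal sketch: the hinge loss is assumed as the training-set loss, the bias term is omitted, and the function and parameter names are hypothetical rather than taken from Li or from the claim.

```python
def regularized_hinge_objective(samples, labels, beta, lam):
    """Return training-set hinge loss plus lam * ||beta||^2.

    samples: list of feature lists; labels: list of +1 / -1;
    beta: parameter vector (plays the role of the claimed w);
    lam: regularization coefficient (plays the role of the claimed lambda).
    """
    training_loss = 0.0
    for x, y in zip(samples, labels):
        score = sum(bj * xj for bj, xj in zip(beta, x))
        training_loss += max(0.0, 1.0 - y * score)  # hinge loss (assumed)
    complexity_penalty = lam * sum(bj * bj for bj in beta)
    return training_loss + complexity_penalty
```

A larger lam places more weight on keeping the parameter vector small relative to fitting the training set, which is the trade-off the quoted passage attributes to λ.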

	Claim 12 recites similar limitations as claim 2 and is rejected under the same rationale as applied to claim 2 above.

Claims 4-6 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Singh, Ashiya, Wijnhoven, Kannan, and Maldonado and further in view of He et al., “A novel ensemble method for credit scoring: Adaption of different imbalance ratios,” Expert Systems With Applications 98 (11 January 2018) 105-117, hereinafter “He”.


Regarding claim 4, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.
The above combination does not explicitly teach wherein a loss function according to Formula 1.3 is adopted to optimize the computing model based on determining a cross entropy loss function for calculating a loss value for the computing model,
 
[the claimed mathematical expression is not reproduced here]

hw(x) denoting a hypothetical prediction of x, and tn denoting a label of sample xn.  
He teaches wherein a loss function according to Formula 1.3 is adopted to optimize the computing model based on determining a cross entropy loss function for calculating a loss value for the computing model,
 
[the claimed mathematical expression is not reproduced here]

hw(x) denoting a hypothetical prediction of x, and tn denoting a label of sample xn.  (He, pg. 112, Col. 1 “Logistic Loss”, “Logistic Loss also known as log loss or cross-entropy loss. It is used to measure the robustness of the model, which can be depicted in Eq. (13), where, N represents the number of samples, yi [is an element of] {0,1}, yi and pi denote the true value [corresponds to claimed “label”] and the probability prediction, respectively (Bishop, 2006).”

    [Eq. (13) of He (image media_image6.png)]

[In equation (13) of He, pi corresponds to the claimed (hw(x)), and yi corresponds to the claimed tn.])
	He is analogous art, as it is in the field of using Support Vector Machines for classification.
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to measure the logistic loss of Singh’s Support Vector Machine using the logistic loss equation of He, the benefit being that it allows measurement of the robustness of the model, as cited by He at pg. 112, Col. 1 “Logistic Loss also known as log loss or cross-entropy loss. It is used to measure the robustness of the model, which can be depicted in Eq. (13).”
Claim 14 recites similar limitations as claim 4 and is rejected under the same rationale as applied to claim 4 above.
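
For illustration only, the logistic (cross-entropy) loss described in the quoted passage of He can be sketched as follows. The sketch assumes the standard textbook form of the log loss over N samples with true values yi in {0,1} and probability predictions pi; it is not a transcription of He's Eq. (13) or of the claimed Formula 1.3. The function name log_loss and the clipping constant eps are hypothetical.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean cross-entropy loss for binary labels in {0, 1}.

    y_true plays the role of the label (yi in He, tn in the claim) and
    p_pred the role of the probability prediction (pi in He, hw(x) in the claim).
    """
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to avoid log(0); eps is an implementation assumption.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A confident correct prediction yields a small loss and a confident wrong prediction a large one, which is the sense in which the loss measures the quality of the model's predictions.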

Regarding claim 5, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1. Further, Maldonado teaches wherein a cost matrix 
-and-
where α and ß values are penalties applied when the computing model classifies an event in the wrong class, (Maldonado, § 3.2 “Cost-sensitive learning”, “Cost sensitive techniques provide a viable alternative to sampling methods for imbalanced learning domains [11]. The objective of cost-sensitive learning is to develop a classification function that minimizes the overall cost on the training data set [corresponds to claimed “optimize the computing model”]. This approach is based on the concept of the cost matrix, which is a numerical representation of the penalty when classifying instances from one class to another.
For example, we define C- as the cost of misclassifying [corresponds to claimed “wrongfully categorized”] a majority class instance as a minority class instance and let C+ represent the cost of the contrary case. Typically, there is no cost for correct classification and the cost of misclassification in the target class is higher than the contrary case, i.e., C+ > C-. […] Other approaches consider cost-sensitive adjustments of different classification methods, which can be applied to the decision threshold or modifying their formulations.”)

	The above combination does not teach according to Formula 1.4 is adopted to further optimize the computing model,
He teaches according to Formula 1.4 is adopted to further optimize the computing model,  
    [Formula 1.4 (image media_image7.png)]
  
(He, pg. 112, Col. 1 “Logistic Loss”, “Logistic Loss also known as log loss or cross-entropy loss. It is used to measure the robustness of the model, which can be depicted in Eq. (13), where, N represents the number of samples, yi [is an element of] {0,1}, yi and pi denote the true value [corresponds to claimed “label”] and the probability prediction, respectively (Bishop, 2006).”

    [Eq. (13) of He (image media_image6.png)]

[In equation (13) of He, pi corresponds to the claimed (hw(x)), and yi corresponds to the claimed tn.];
	[The cited Formula 1.4 is the same as Formula 1.3 cited in claim 4 above, with the addition of terms α and ß to respectively adjust the loss when the model incorrectly classifies a minority class instance (fraudulent activity) as a majority class (non-fraudulent), and vice-versa.  The C+ and C- terms of Maldonado provide the same function of adjustable costs (corresponds to claimed “cost matrix”), with C+ corresponding to the claimed α and C- corresponding to the claimed ß.  Combining these C+ and C- factors from Maldonado with the logistic loss equation of He cited above gives the claimed Formula 1.4.]
He is analogous art, as it is in the field of using Support Vector Machines for classification.
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to measure the logistic loss of Singh’s Support Vector Machine using the logistic loss equation of He, the benefit being that it allows measurement of the robustness of the model, as cited by He at pg. 112, Col. 1 “Logistic Loss also known as log loss or cross-entropy loss. It is used to measure the robustness of the model, which can be depicted in Eq. (13).”
Claim 15 recites similar limitations as claim 5 and is rejected under the same rationale as applied to claim 5 above.
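
For illustration only, the combination discussed above (the logistic loss of He with per-class misclassification costs in the manner of Maldonado's C+ and C-) can be sketched as follows. The sketch assumes that α scales the loss term for positive (minority, fraudulent) samples and ß scales the term for negative (majority) samples; the exact placement of α and ß in the claimed Formula 1.4 is not reproduced in this action and may differ. The function name cost_weighted_log_loss and the constant eps are hypothetical.

```python
import math


def cost_weighted_log_loss(y_true, p_pred, alpha, beta, eps=1e-12):
    """Cross-entropy loss with separate costs for the two classes.

    Assumption: label 1 marks the minority (fraudulent) class, alpha is the
    cost applied to its loss term (like C+), and beta is the cost applied
    to the majority-class term (like C-).
    """
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += alpha * y * math.log(p) + beta * (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

With alpha equal to beta equal to 1, the function reduces to the unweighted logistic loss. Choosing alpha greater than beta makes a missed fraudulent sample cost more than an equally confident false alarm.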

Regarding claim 6, the combination of references as applied to claim 5 above teaches [t]he computer-implemented method of claim 5.  Further, Maldonado teaches wherein further optimization comprises restricting the computing model to meet condition α > ß, in the training phase, to configure the computing model to give additional weight to data that indicates a fraudulent activity.  (Maldonado, § 3.2 “Cost-sensitive learning”, “Cost sensitive techniques provide a viable alternative to sampling methods for imbalanced learning domains [11]. The objective of cost-sensitive learning is to develop a classification function that minimizes the overall cost on the training data set [corresponds to claimed “optimize the computing model”]. This approach is based on the concept of the cost matrix, which is a numerical representation of the penalty when classifying instances from one class to another.
For example, we define C- as the cost of misclassifying a majority class instance as a minority class instance and let C+ represent the cost of the contrary case. Typically, there is no cost for correct classification and the cost of misclassification in the target class is higher than the contrary case, i.e., C+ > C-.” (emphasis added) 
[The cost of misclassifying a minority class instance [corresponds to claimed “fraudulent”] (C+, corresponding to claimed α) is higher than the cost of misclassifying a majority class instance (C-, corresponding to claimed ß), which serves “to give additional weight to data that indicates a fraudulent activity” as claimed.])


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chen et al. (“A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.”) discloses using both Logistic Regression (LR) and Support Vector Machines (SVM) together in order to forecast fraudulent financial statements.
Zhang et al. (“Classifier selection and clustering with fuzzy assignment in ensemble model for credit scoring”) discloses an ensemble model including LR and SVM for credit scoring.
Hua et al. (“Predicting corporate financial distress based on integration of support vector machine and logistic regression”) discloses using a combination of SVM and LR to predict corporate financial distress.
Wei et al. (“An Optimized SVM Model for Detection of Fraudulent Online Credit Card Transactions”) discloses using SVM to identify fraudulent credit card transactions.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/L.A.S./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126