Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This is a Non-Final Office Action in response to application 15/851,616 entitled "ROBUST FEATURES GENERATION ARCHITECTURE FOR FRAUD MODELING" with claims 1 to 20 pending.
Status of Claims
Claims 1, 4, 8, 9, 11, 14, and 18-20 have been amended and are hereby entered.
Claims 1 to 20 are pending and have been examined.

Response to Amendment
The amendment filed May 11, 2021 has been entered. Claims 1-20 remain pending in the application. Applicant’s amendments to the Specification, Drawings, and/or Claims have been noted in response to the Non-Final Office Action mailed February 12, 2021.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on July 2, 2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The terms “knowledge domain”, “machine learning”, and “targeted machine learning model” in Claims 1, 8, 11, 18, 19, and 20 introduce ambiguity, and no clear description is provided in the specification:
With “machine learning” and “machine learning model”, the amendment filed introduces new matter found neither in the specification nor in the original claims. While the specification mentions “computer model”, “risk model”, and “targeted” models, “machine learning” is not mentioned. The term “knowledge domain” is also not mentioned or described.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. The terms “knowledge domain” and “targeted machine learning model” in Claims 1, 8, 11, 18, 19, and 20 are relative terms which render the claims indefinite. The terms “knowledge domain” and “targeted machine learning model” are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
With “knowledge domain”, it is unclear which attributes are the subject of focus and/or what elements are disregarded. Moreover, although the specification mentions “risk domain” and “generic fraud domain”, no “knowledge domain” is described.
With “targeted machine learning model”, it is unclear which attributes are the subject of focus and/or what elements are disregarded.
Therefore the claims are rejected.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-8 and 11-18 are rejected under 35 U.S.C. 103 as being unpatentable over MacLennan ("MACHINE LEARNING SEMANTIC MODEL", U.S. Publication Number: 2014/0046879 A1), in view of Pauly ("METHOD AND COMPUTER FOR DETERMINATION OF A TRAINING FUNCTION FOR GENERATING ANNOTATED TRAINING IMAGES", U.S. Publication Number: 2018/0174049 A1), in view of Ben-Or ("METHOD FOR ADAPTIVE TUNING VIA AUTOMATED SIMULATION AND OPTIMIZATION", U.S. Publication Number: 2018/0330268 A1), in view of Zoldi (“EXPLAINING MACHINE LEARNING MODELS BY TRACKED BEHAVIORAL LATENT FEATURES”, U.S. Publication Number: 2019/0156196 A1).
Regarding Claim 1,
MacLennan teaches,
 obtaining, by one or more hardware processors, a plurality of candidate features for building a base machine learning model configured to detect fraudulent transactions;	(MacLennan [Abstract]  data source is selected for a predictive model associated with a predictive algorithm in which the predictive model includes one or more queries and parameters. A set of transformations are then determined based on the queries and parameters for at least a subset of data from the data source to be processed by the predictive algorithm. / MacLennan [0003] The subject technology provides for a computer-implemented method / MacLennan [0006] The system includes one or more processors, and a memory including instructions stored therein, which when executed by the one or more processors, cause the processors to perform operations/ MacLennan [0002] The present disclosure generally relates to predictive analytics that may utilize statistical techniques such as modeling, machine learning, data mining and other techniques for analyzing data to make predictions about future events. / MacLennan [0010] FIG. 1 illustrates an example computing environment including a machine learning semantic model (MLSM) server according to some configurations of the subject technology.)
evaluating, by the one or more hardware processors, the plurality of candidate features using a plurality of different feature selection algorithms, wherein the evaluating comprises determining, for each candidate feature in the plurality of candidate features, (MacLennan [Abstract]  data source is selected for a predictive model associated with a predictive algorithm in which the predictive model includes one or more queries and parameters. A set of transformations are then determined based on the queries and parameters for at least a subset of data from the data source to be processed by the predictive algorithm.)
determine…a plurality of fraud detection scores (MacLennan [0044] The process begins at 205 by specifying a business problem to determine a probability of an event occurring in which the business problem includes a constraint.... By way of example, the business problem can ....determine an incident of a fraudulent transaction given a number of transactions over a period of time, etc. / MacLennan [0051] FIG. 3 conceptually illustrates an example process for scoring a predictive model for solving a business problem. / MacLennan [0055] Further, the process at 320 may provide a set of scores instead of a single score.)
the plurality of different feature selection algorithms; (MacLennan [0084] In some configurations, the data divided into respective bins is utilized by a given predictive algorithm(s) as the binned data represents a finalized version of the data after performing the set of transformations. / MacLennan [0087] A graph is provided in the GUI 1700 that displays respective curves for the results of different models based on the following techniques or algorithms: 1) decision tree 1730; 2) neural network 1735; 3) no model 1740; 4) logistic regression analysis 1750; and 5) naïve Bayes classifier.)
selecting, by the one or more hardware processors, a subset of features from the plurality of candidate features (MacLennan [Abstract]  data source is selected for a predictive model associated with a predictive algorithm in which the predictive model includes one or more queries and parameters. A set of transformations are then determined based on the queries and parameters for at least a subset of data from the data source to be processed by the predictive algorithm. / MacLennan [0003] The subject technology provides for a computer-implemented method / MacLennan [0006] The system includes one or more processors, and a memory including instructions stored therein, which when executed by the one or more processors, cause the processors to perform operations)
 that are determined to be dominative relative to fraud detection over remaining candidate features in the plurality of candidate features (MacLennan [0047] An example of a business problem transformation may include grouping more relevant data according to the objectives of the business problem. For example, customers that fall within zip codes or geographical areas close to a business of interest may be grouped together in a corresponding bucket, or other remaining customers in other zip codes or geographical areas may be grouped into another bucket for the predictive algorithm. / MacLennan [0044]  By way of example, the business problem can ... determine an incident of a fraudulent transaction given a number of transactions over a period of time, etc. / MacLennan [0048] For instance, a predictive algorithm that predicts instances of fraudulent transactions, which may be statistical insignificant among a set of data with an arbitrary size, may perform a rebalancing technique that amplifies the statistical significance of fraudulent data and reduces the statistical significant of instances of non-fraudulent data.)
 according to the plurality of different feature selection algorithms; (MacLennan [0032] Output of predictive algorithms is restated into relevant business terms. / MacLennan [0055] the process in FIG. 3 may be applied for several predictive algorithms and still be within the scope of the subject technology. /MacLennan [0087]  results of different models based on the following techniques or algorithms: 1) decision tree 1730; 2) neural network 1735; 3) no model 1740; 4) logistic regression analysis 1750; and 5) naïve Bayes classifier.)
   building, by the one or more hardware processors, the base machine learning model for detecting fraudulent transactions using input values (MacLennan [0050] The predictive algorithm may utilize queries, parameters for the queries and one or more machine learning techniques for solving the business problem. 
MacLennan [0044]  determine an incident of a fraudulent transaction given a number of transactions over a period of time
MacLennan [0048] For instance, a predictive algorithm that predicts instances of fraudulent transactions)
 based on the subset of features, and not other features in the plurality of candidate features (MacLennan [0003]  determining a set of transformations based on the queries and parameters for at least a subset of data from the data source to be processed by the predictive algorithm; identifying a set of patterns based on the set of transformations for at least the subset of data from the data source;)
receiving, by the one or more hardware processors, a transaction request; 
(MacLennan [0102] in response to requests received from the web browser.
MacLennan [0006]  The system includes one or more processors)
and using, by the one or more hardware processors, the targeted machine learning model to determine whether the transaction request is a fraudulent request
(MacLennan [0055] After performing the set of transformations, the process at 320 provides a score indicating a probability of an event specified by the business problem based on the predictive algorithm on the set of data. By way of example, the process performs the predictive algorithm on the set of data to provide  ... a likelihood that a transaction is fraudulent 
 MacLennan [0006]  The system includes one or more processors)
MacLennan does not teach wherein each fraud detection score in the plurality of fraud detection scores represents a fraud detection efficacy of the candidate feature based on a different feature selection algorithm; wherein each feature in the subset of features has higher fraud detection scores than any candidate features in the remaining candidate features according to each of the plurality of feature selection algorithms; that is knowledge domain-independent; in response to detecting an event, generating, by the one or more hardware processors, a targeted machine learning model by retraining the base machine learning model using second historical transaction data that is specific to a first knowledge domain;
Pauly teaches,
 in response to detecting an event, generating, by the one or more hardware processors, a targeted machine learning model by retraining the base machine learning model using second historical transaction data that is specific to a first knowledge domain;
(Pauly [0008]  An object of the present invention is to provide a method for generating a training function for the rapid and inexpensive generation of annotated image data for training self-learning algorithms.
Pauly [0010]   the computer makes an adjustment of a parameter of the image-information-processing first function and/or the image-processing second function based on at least the first result and the second result.
Pauly [0011] The invention is based on the insight that the use of an image-information-processing second function as the basis of the training function enables training images to be generated not only from random data, but also from data with an information content. 
Pauly [0012] The use of the first result and the second result for the adjustment of the parameters of the first and second function enables a particularly good evaluation of the progress of the training and hence the determination of the training function in very few iteration steps.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e., training self-learning algorithms) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).
Pauly does not teach wherein each fraud detection score in the plurality of fraud detection scores represents a fraud detection efficacy of the candidate feature based on a different feature selection algorithm; wherein each feature in the subset of features has higher fraud detection scores than any candidate features in the remaining candidate features according to each of the plurality of feature selection algorithms; that is knowledge domain-independent;
Zoldi teaches,
that is knowledge domain-independent; (Zoldi [0015] this method leverages previous data points within a given time series to provide a locally relevant context for comparison of the individual's behavior patterns, rather than relying on statistics from the global population or randomly generated perturbations. The method is independent of the neural network architecture, and can handle both a large number and a large variety of input variables. Finally, this method is fast and efficient, with no requirement to perform time consuming rescoring calculations or store a large amount of profiled data.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the tracked behavioral latent features teachings of Zoldi to utilize “changes in behavior of a time series to identify the latent factors that drive explanation.” (Zoldi [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e., behavioral latent features) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “assess and increase robustness associated with model governance processes” (Zoldi [Abstract])).
Zoldi does not teach wherein each fraud detection score in the plurality of fraud detection scores represents a fraud detection efficacy of the candidate feature based on a different feature selection algorithm; wherein each feature in the subset of features has higher fraud detection scores than any candidate features in the remaining candidate features according to each of the plurality of feature selection algorithms;  
Ben-Or teaches,
   wherein each fraud detection score in the plurality of fraud detection scores represents a fraud detection efficacy of the candidate feature 
(Ben-Or [0052] with fraudulent transactions having a target value of 1 and non-fraud transactions having a value of 0. This represents an online fraud detection system that takes a stream of transactional inputs and returns a fraud score for each observation, with high scores corresponding to a high probability of fraud present.)
based on a different feature selection algorithm;
(Ben-Or [0008] Another current method for tuning a behavioral risk model for detecting suspicious financial activity includes the concept of bootstrap aggregating ("bagging"). Bagging may include generating multiple classifiers by obtaining the predicted values from the adjusted models to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. 
Ben-Or [0016] to generate a set of at least two additional models for each of the optimized model and the at least one new random model
Ben-Or [0021]  a plurality of predictive models)
 wherein each feature in the subset of features has higher fraud detection scores than any candidate features in the remaining candidate features according to each of the plurality of feature selection algorithms;  
(Ben-Or [0002]  Fraud detection is based on analyzing past transactional and cases disposition data which may be used to build and tune behavioral and risk models. Whether a fraud detection model is accurate enough to provide correct classification of the case as fraudulent or legitimate is a critical factor when statistical techniques are used to detect fraud. / Ben-Or [0058]  a subset of relevant data may be selected from the data received  / Ben-Or [0060]  A KI and corresponding score may be selected for the reduced set if, for example, the KI and corresponding score has an influence ranking above a predetermined influence ranking threshold. / Ben-Or  [0064]  Ensemble modeling may include running or executing two or more related but different analytical models and then synthesizing the results into a single score or spread in order to improve the accuracy of predictive analytics and data mining)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the adaptive tuning via automated simulation and optimization teachings of Ben-Or for “optimization of model parameters of at least one predictive model for detecting suspicious financial activity.” (Ben-Or [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e., adaptive tuning via automated simulation and optimization) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “automatic tuning of behavioral and risk models for rare event prediction” (Ben-Or [0001])).
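For illustration only, the selection limitation recited in Claim 1 (each selected feature scoring higher than every remaining feature under each feature selection algorithm) can be sketched as follows. This is a minimal sketch of one literal reading of the claim language, not the applicant's implementation and not a method disclosed by any cited reference. The function name `dominant_subsets`, the algorithm labels `algo_a` and `algo_b`, the feature names, and the score values are all hypothetical.

```python
def dominant_subsets(scores):
    """Return every proper subset S of features such that, under EACH
    scoring algorithm, every feature in S scores strictly higher than
    every feature outside S.

    scores: dict mapping algorithm name -> dict of feature -> score
            (hypothetical structure chosen for this illustration).
    """
    features = list(next(iter(scores.values())))
    valid = []
    for k in range(1, len(features)):
        candidate = None
        ok = True
        for per_feature in scores.values():
            ranked = sorted(features, key=per_feature.get, reverse=True)
            top, rest = ranked[:k], ranked[k:]
            # The weakest selected feature must strictly beat the
            # strongest remaining feature under this algorithm.
            if per_feature[top[-1]] <= per_feature[rest[0]]:
                ok = False
                break
            # The same subset must be on top under every algorithm.
            if candidate is None:
                candidate = set(top)
            elif candidate != set(top):
                ok = False
                break
        if ok:
            valid.append(candidate)
    return valid


# Hypothetical scores from two hypothetical feature selection algorithms.
example_scores = {
    "algo_a": {"f1": 0.9, "f2": 0.8, "f3": 0.3, "f4": 0.1},
    "algo_b": {"f1": 0.7, "f2": 0.95, "f3": 0.2, "f4": 0.4},
}
result = dominant_subsets(example_scores)
print(result)
```

In this example only the subset containing `f1` and `f2` satisfies the condition under both algorithms. The two algorithms disagree on which single feature ranks first, and they disagree on the third-ranked feature, so no subset of size one or three qualifies.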
Regarding Claim 2, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan teaches,
wherein selecting the subset of features comprises: sorting the plurality of candidate features into a plurality of layers of candidate features based on the plurality of fraudulent detection scores determined for each of the plurality of candidate features, (MacLennan [0055] After performing the set of transformations, the process at 320 provides a score indicating a probability of an event specified by the business problem based on the predictive algorithm on the set of data. By way of example, the process performs the predictive algorithm on the set of data to provide  ... a likelihood that a transaction is fraudulent / MacLennan [0055] Further, the process at 320 may provide a set of scores instead of a single score. / MacLennan [0003]  in which the predictive model includes one or more queries and parameters;)
and selecting one or more candidate features from the first layer as the subset of features. (MacLennan [0017] selecting data in an column that includes encoded data according to some configurations of the subject technology.  / MacLennan [0003] determining a set of transformations based on the queries and parameters for at least a subset of data from the data source to be processed by the predictive algorithm)
MacLennan does not teach wherein each candidate feature in a first layer has higher fraudulent detection scores than candidate features in the remaining layers according to the plurality of different feature selection algorithms;
Pauly teaches,
wherein each candidate feature in a first layer… candidate features in the remaining layers according to the plurality of different feature selection algorithms (Pauly [0038] an artificial neural network constructed from layers, which depicts an input value on an output similar to the input value. In this case, the autoencoder comprises at least one input layer, an output layer with the same number of nodes as the input layer and a central layer between the input layer and the output layer with fewer nodes than the input layer. The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data. / Pauly [0002] there has been great progress in the use of so-called “deep learning” algorithms, wherein the algorithms have been trained using very large volumes of data / Pauly [0055] Therefore, the cost functions are obtained as K_D = BCE(D_x, 1) + BCE(D_G_y, 0) ∝ log(D_x) + log(1 − D_G_y) and K_G = BCE(D_G_y, 1) ∝ log(D_G_y), wherein in each case only the first component of the first calculated image information D_x and the second item of calculated image information D_G_y is used. Therefore, the two cost functions K_D, K_G have alternating components so that, when both cost functions are minimized, the first function G and the second function D are trained to overcome one another. Other cost functions K_D, K_G having this property are also possible.)
Pauly does not teach candidate feature … has higher fraudulent detection scores
Ben-Or teaches,
candidate feature …has higher fraudulent detection scores (Ben-Or [0060]  A KI and corresponding score may be selected for the reduced set if, for example, the KI and corresponding score has an influence ranking above a predetermined influence ranking threshold.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the adaptive tuning via automated simulation and optimization teachings of Ben-Or for “optimization of model parameters of at least one predictive model for detecting suspicious financial activity.” (Ben-Or [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e., adaptive tuning via automated simulation and optimization) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “automatic tuning of behavioral and risk models for rare event prediction” (Ben-Or [0001])).
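For illustration only, sorting candidate features into layers based on scores from several algorithms, as recited in Claim 2, can be sketched with non-dominated (Pareto) sorting. Whether the application relies on this particular technique is an assumption made here for the sake of example, and the technique is not drawn from any cited reference. The names `dominates` and `sort_into_layers`, the feature names, and the score values are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as high as b under every
    algorithm and strictly higher under at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def sort_into_layers(scores):
    """Peel off successive layers of features that no remaining
    feature dominates.

    scores: dict mapping feature -> tuple of scores, one per
            feature selection algorithm (hypothetical structure).
    """
    remaining = dict(scores)
    layers = []
    while remaining:
        front = [
            f for f, s in remaining.items()
            if not any(dominates(t, s) for g, t in remaining.items() if g != f)
        ]
        layers.append(front)
        for f in front:
            del remaining[f]
    return layers


# Hypothetical per-feature scores under two hypothetical algorithms.
example = {
    "f1": (0.9, 0.7),
    "f2": (0.8, 0.95),
    "f3": (0.3, 0.2),
    "f4": (0.1, 0.4),
}
layers = sort_into_layers(example)
print(layers)
```

Here `f1` and `f2` form the first layer because neither outscores the other under both algorithms, while `f3` and `f4` fall to the second layer. Under this sketch, selecting the subset of features would amount to taking one or more features from `layers[0]`.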
Regarding Claim 3, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 2 as described earlier.
Pauly teaches,
wherein each candidate feature in the first layer;  (Pauly [0038] an artificial neural network constructed from layers, which depicts an input value on an output similar to the input value. In this case, the autoencoder comprises at least one input layer, an output layer with the same number of nodes as the input layer and a central layer between the input layer and the output layer with fewer nodes than the input layer. The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data.)
other candidate features in the first layer according to the plurality of different feature selection algorithms. (Pauly [0038] an artificial neural network constructed from layers, which depicts an input value on an output similar to the input value. In this case, the autoencoder comprises at least one input layer, an output layer with the same number of nodes as the input layer and a central layer between the input layer and the output layer with fewer nodes than the input layer. The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data. / Pauly  [0052]  an item of image information is defined by a binary variable that specifies whether an image contains a predefined feature...It is also possible to use other predefined features / Pauly [0002]   there has been great progress in the use of so-called “deep learning” algorithms, wherein the algorithms have been trained using very large volumes of data.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e., training self-learning algorithms) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).
Pauly does not teach the limitation “does not have higher fraudulent detection scores”.
Ben-Or teaches,
does not have higher fraudulent detection scores (Ben-Or [Claim 3]   if the level of improvement between the optimized model and the best performing model candidate is equal to or below a predetermined threshold, selecting the optimized model for detecting suspicious financial activity.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the adaptive tuning via automated simulation and optimization teachings of Ben-Or for “optimization of model parameters of at least one predictive model for detecting suspicious financial activity.” (Ben-Or [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e., adaptive tuning via automated simulation and optimization) to a known concept (i.e., fraud detection computer modeling) ready for improvement to yield a predictable result (i.e., “automatic tuning of behavioral and risk models for rare event prediction” (Ben-Or [0001])).

Regarding Claim 4, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan teaches,
wherein the input values correspond to the mathematical representations
(MacLennan [0046] An example of a data space transformation may be a mathematical operation(s) such as a logarithm performed on numerical data (e.g., price of a product, etc.) that reshapes the data for the predictive algorithm.)
 MacLennan does not teach building a neural network based on the selected subset of features, wherein the neural network has a plurality of input nodes in an input layer corresponding to the selected subset of features, a plurality of hidden nodes in a hidden layer having a number of nodes less than the plurality of input nodes, and a plurality of output nodes in an output layer corresponding to the plurality of input nodes; configuring each hidden node in the hidden layer to generate a mathematical representation representing the selected subset of features based on input values from the plurality of input nodes; and training the neural network to produce a plurality of output values that matches a plurality of input values provided to the neural network, wherein the training comprises iteratively adjusting at least one mathematical representation corresponding to at least one hidden node in the hidden layer, in the hidden layer of the neural network, and wherein a first mathematical representation corresponding to a first hidden node in the plurality of hidden nodes is different from a second mathematical representation corresponding to a second hidden node in the plurality of hidden nodes.
Pauly teaches,
building a neural network based on the selected subset of features, wherein the neural network has a plurality of input nodes in an input layer corresponding to the selected subset of features, a plurality of hidden nodes in a hidden layer having a number of nodes less than the plurality of input nodes, and a plurality of output nodes in an output layer corresponding to the plurality of input nodes;
(Pauly [0038] an artificial neural network constructed from layers, which depicts an input value on an output similar to the input value. In this case, the autoencoder comprises at least one input layer, an output layer with the same number of nodes as the input layer and a central layer between the input layer and the output layer with fewer nodes than the input layer. The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data.
Examiner notes that "a central layer between the input layer and the output layer" is equivalent to a hidden layer.)
configuring each hidden node in the hidden layer to generate a mathematical representation representing the selected subset of features based on input values from the plurality of input nodes; 
(Pauly [0030] Furthermore, the invention can relate to a data generator that includes the interface and the processor of the function-determining computer, wherein the computing unit is furthermore configured to generate annotated training data by applying the training function / Pauly [0069] Furthermore, the distance of the central node values D3_x from the central node values G3_i makes a positive contribution to the cost function; the distance of the central node values D3_x from the central node values G3_y makes a positive contribution. For the calculation of the distance, the node values D3_x, G3_i, G3_y are understood to be vectors in an n-dimensional vector space, wherein n corresponds to the number of nodes in the central layers D3 and G3 and the j-th value of the vector corresponds to the value of the j-th node in the central layer D3, G3. The distance of the node values is in each case understood to be the sum of the squared differences of the vector components. However, other distance terms are also conceivable, in particular distance terms from an L1-norm (Manhattan metric), from the maximum norm or the minimum norm. The distances are determined by a comparison function C3.; Examiner notes that since each node in the central layer (hidden layer) accepts input node values, the mathematical representation (distance calculation and L1-norm (Manhattan metric)) representing the selected subset of features based on input values from the plurality of input nodes. Examiner notes the prior art calls the hidden layer/nodes as central layer/nodes)
and training the neural network to produce a plurality of output values that matches a plurality of input values provided to the neural network, 
(Pauly [0030] Furthermore, the invention can relate to a data generator that includes the interface and the processor of the function-determining computer, wherein the computing unit is furthermore configured to generate annotated training data by applying the training function / Pauly [0069] Furthermore, the distance of the central node values D3_x from the central node values G3_i makes a positive contribution to the cost function; the distance of the central node values D3_x from the central node values G3_y makes a positive contribution. For the calculation of the distance, the node values D3_x, G3_i, G3_y are understood to be vectors in an n-dimensional vector space, wherein n corresponds to the number of nodes in the central layers D3 and G3 and the j-th value of the vector corresponds to the value of the j-th node in the central layer D3, G3. The distance of the node values is in each case understood to be the sum of the squared differences of the vector components. However, other distance terms are also conceivable, in particular distance terms from an L1-norm (Manhattan metric), from the maximum norm or the minimum norm. The distances are determined by a comparison function C3.; Examiner notes that since each node in the central layer (hidden layer) accepts input node values, the mathematical representation (distance calculation and L1-norm (Manhattan metric)) representing the selected subset of features based on input values from the plurality of input nodes. Examiner notes the prior art calls the hidden layer/nodes as central layer/nodes)
wherein the training comprises iteratively adjusting at least one mathematical representation corresponding to at least one hidden node in the hidden layer, 
(Pauly [0012] The use of the first result and the second result for the adjustment of the parameters of the first and second function enables a particularly good evaluation of the progress of the training and hence the determination of the training function in very few iteration steps. / Pauly [0089] In the exemplary embodiment, the input/output unit FDU.4 can be used to input parameters of the method, such as the number of iterations, or to output information on the method, such as an average cost function, to the user. / Pauly [0092] In A.4, a loop with “trainings_epochs” iterations is defined via a counting variable “n” and a loop over all pairs of training images “tr_image” and training image information “tr_info” in the training data “tr_data”. Herein, “trainings_epochs” is a predefined whole number describing the number of training cycles)
in the hidden layer of the neural network, 
(Pauly [0038] an artificial neural network constructed from layers, which depicts an input value on an output similar to the input value. In this case, the autoencoder comprises at least one input layer, an output layer with the same number of nodes as the input layer and a central layer between the input layer and the output layer with fewer nodes than the input layer. The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data.
Examiner notes that "a central layer between the input layer and the output layer" is equivalent to a hidden layer.)
and wherein a first mathematical representation corresponding to a first hidden node in the plurality of hidden nodes is different from a second mathematical representation corresponding to a second hidden node in the plurality of hidden nodes.
(Pauly [0069] the distance of the central node values D3_x from the central node values G3_y makes a positive contribution. For the calculation of the distance, the node values D3_x, G3_i, G3_y are understood to be vectors in an n-dimensional vector space, wherein n corresponds to the number of nodes in the central layers D3 and G3 and the j-th value of the vector corresponds to the value of the j-th node in the central layer D3, G3. The distance of the node values is in each case understood to be the sum of the squared differences of the vector components. However, other distance terms are also conceivable, in particular distance terms from an L1-norm (Manhattan metric), from the maximum norm or the minimum norm. The distances are determined by a comparison function C3.; Examiner notes the prior art calls the hidden layer/nodes as central layer/nodes)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e. training self-learning algorithms) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).
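For illustration only, the distance terms named in the passage quoted from Pauly [0069] (sum of squared differences, L1-norm, and maximum norm) can be sketched as follows. The function names and sample vectors are hypothetical and appear in neither Pauly nor the application; the vectors merely stand in for central-layer node values such as D3_x and G3_i.

```python
# Illustrative sketch only: function names and sample values are hypothetical.
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # Both vectors must have one component per central-layer node.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the squared differences of the vector components."""
    _check(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance from the L1-norm (Manhattan metric)."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def max_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance from the maximum norm."""
    _check(a, b)
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical node values for a central layer with n = 3 nodes.
d3_x = [0.5, 1.0, -2.0]
g3_i = [1.5, 1.0, 1.0]

print(squared_distance(d3_x, g3_i))  # 1 + 0 + 9 = 10.0
print(l1_distance(d3_x, g3_i))       # 1 + 0 + 3 = 4.0
print(max_distance(d3_x, g3_i))      # largest component difference = 3.0
```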
Regarding Claim 5, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 4 as described earlier.
MacLennan does not teach wherein the first mathematical representation comprises a first weight for a first feature corresponding to a first input node in the plurality of input nodes, and wherein the second mathematical representation comprises a second weight different from the first weight for the first feature.
Pauly teaches,
wherein the first mathematical representation comprises a first weight for a first feature corresponding to a first input node in the plurality of input nodes, and wherein the second mathematical representation comprises a second weight different from the first weight for the first feature. (Pauly [0070] The cost function is then minimized by backpropagation by the adjustment of the edge weights of the edges between the layers D1, D2, D3, D4, D5 the second function D and the edges between the layers G1, G2, G3, G4, G5 of the first function G, and possibly the edge weights of the input edges of the first function G and the second function D.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e. training self-learning algorithms) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).
Regarding Claim 6,  
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan does not teach wherein of the mathematical representation corresponding to each hidden node in the hidden layer comprises a weight associated with each of the input values from the plurality of input nodes
Pauly teaches,
wherein of the mathematical representation corresponding to each hidden node in the hidden layer comprises a weight associated with each of the input values from the plurality of input nodes (Pauly [0038] The autoencoder can include further layers, furthermore, the autoencoder can be constructed symmetrically about the central layer. The lower number of nodes in the central layer compared to the input and output layer results in the compression of the input data and decompression of the compressed input data to form the output data. Therefore, adjustment of at least the edge weights using training data enables an autoencoder to learn a compression method and a decompression method.; Examiner notes the prior art calls the hidden layer/nodes as central layer/nodes)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e. training self-learning algorithms) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).
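For illustration only, the autoencoder structure described in the quoted passage of Pauly [0038] (an output layer with as many nodes as the input layer, a central layer with fewer nodes, and edge weights adjusted using training data) can be sketched as below. This is a minimal linear sketch under stated assumptions: the layer sizes, learning rate, epoch count, training data, and all identifiers are hypothetical and are taken from neither Pauly nor the application.

```python
# Illustrative sketch only: a tiny linear autoencoder in plain Python.
# Layer sizes, learning rate, epochs, and data are hypothetical assumptions.
import random

random.seed(0)
N_INPUT = 4           # input layer
N_HIDDEN = 2          # central (hidden) layer: fewer nodes than the input layer
N_OUTPUT = N_INPUT    # output layer: same number of nodes as the input layer

# Edge weights: w_enc[h][i] joins input node i to hidden node h,
# and w_dec[o][h] joins hidden node h to output node o.
w_enc = [[random.uniform(-0.5, 0.5) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
w_dec = [[random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUT)]


def forward(x):
    # Each hidden node is a weighted sum of all input values (compression);
    # each output node is a weighted sum of all hidden values (decompression).
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_dec]
    return hidden, output


def reconstruction_error(data):
    # Mean over samples of the summed squared difference between output and input.
    total = 0.0
    for x in data:
        _, out = forward(x)
        total += sum((o - xi) ** 2 for o, xi in zip(out, x))
    return total / len(data)


def train(data, epochs=500, lr=0.02):
    # Iteratively adjust the edge weights by gradient descent on the
    # squared reconstruction error, one sample at a time.
    for _ in range(epochs):
        for x in data:
            hidden, out = forward(x)
            err = [o - xi for o, xi in zip(out, x)]
            # Error propagated back to the hidden layer (uses pre-update w_dec).
            grad_hidden = [
                sum(err[o] * w_dec[o][h] for o in range(N_OUTPUT))
                for h in range(N_HIDDEN)
            ]
            for o in range(N_OUTPUT):
                for h in range(N_HIDDEN):
                    w_dec[o][h] -= lr * err[o] * hidden[h]
            for h in range(N_HIDDEN):
                for i in range(N_INPUT):
                    w_enc[h][i] -= lr * grad_hidden[h] * x[i]


# Hypothetical 4-feature inputs that depend on only two underlying values.
data = [[a, b, a + b, a - b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]

before = reconstruction_error(data)
train(data)
after = reconstruction_error(data)
print(before, after)
```

In this sketch each row of `w_enc` belongs to one hidden node, so two hidden nodes may hold different weights for the same input feature.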
Regarding Claim 7,  
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan does not teach wherein each output node in the plurality of output nodes is configured to produce an output value based on values received from each of the plurality of hidden nodes.
Pauly teaches,
wherein each output node in the plurality of output nodes is configured to produce an output value based on values received from each of the plurality of hidden nodes. (Pauly [0038] The nodes of the input layer are assigned to the input data in the same way as the assignment of the nodes of the output layer to the output data. ; Examiner notes this explains that the output layer (N3) may produce an output, though not depicted in Figure 6, below:
Applicant’s Figure 6 [image: media_image1.png]
Pauly’s Figure 6 [image: media_image2.png]
)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e. training self-learning algorithms) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).

Regarding Claim 8, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan does not teach in response to detecting a second event, generating a second targeted machine learning model by retraining the base machine learning model using third historical transaction data that is specific to a second knowledge domain, and using the second targeted machine learning model to classify incoming transaction requests
Pauly teaches,
 in response to detecting a second event, generating a second targeted machine learning model by retraining the base machine learning model using third historical transaction data that is specific to a second knowledge domain, and using the second targeted machine learning model to classify incoming transaction requests.
(Pauly [0008]  An object of the present invention is to provide a method for generating a training function for the rapid and inexpensive generation of annotated image data for training self-learning algorithms.
Pauly [0010]   the computer makes an adjustment of a parameter of the image-information-processing first function and/or the image-processing second function based on at least the first result and the second result.
Pauly [0011] The invention is based on the insight that the use of an image-information-processing second function as the basis of the training function enables training images to be generated not only from random data, but also from data with an information content. 
Pauly [0012] The use of the first result and the second result for the adjustment of the parameters of the first and second function enables a particularly good evaluation of the progress of the training and hence the determination of the training function in very few iteration steps.
Pauly  [Claim 6] second function is a classification function)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the machine self-learning training teachings of Pauly to “provide a method for generating a training function for the rapid and inexpensive generation of annotated…data for training self-learning algorithms.” (Pauly [0008]). The modification would have been obvious because it merely applies a known technique (i.e. training self-learning algorithms) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “for the same task, machine-based … classifiers produce a lower error rate than an average human classifier and at the same time, the … classification is performed more quickly.” Pauly [0003]).

Claim 11 is rejected on the same basis as Claim 1.
Claim 12 is rejected on the same basis as Claim 2.
Claim 13 is rejected on the same basis as Claim 3.
Claim 14 is rejected on the same basis as Claim 4.
Claim 15 is rejected on the same basis as Claim 5.
Claim 16 is rejected on the same basis as Claim 6.
Claim 17 is rejected on the same basis as Claim 7.
Claim 18 is rejected on the same basis as Claim 8.

Claims 9, 10, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over MacLennan, Pauly, Zoldi, and Ben-Or in view of Duke (“RULE OPTIMIZATION FOR CLASSIFICATION AND DETECTION”, U.S. Publication Number: 2014/0282856 A1).
Regarding Claim 9, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan, Pauly, Zoldi, and Ben-Or do not teach wherein the plurality of candidate features comprises at least one of: an Internet Protocol (IP) address, a number of successful transactions within a predetermined period of time, a number of failed transactions within the predetermined period of time, a time, a browser type, a device type, an amount associated with the transaction, or a transaction type of the transaction.
Duke teaches wherein the plurality of candidate features comprises at least one of: an Internet Protocol (IP) address, a number of successful transactions within a predetermined period of time, a number of failed transactions within the predetermined period of time, a time, a browser type, a device type, an amount associated with the transaction request, or a transaction type of the transaction request. (Duke [0024] Banks, financial institutions, e-commerce businesses, and other entities use analytical algorithms to monitor data generated by client account activity. / Duke [0065] the fraud detection system 100 could represent events of this sample using the respective event classification labels, the dollar amounts (D) proposed to be transacted, and the event times (T) registered by the payment system. Thus, the fraud detection system 100 could represent each such transactional event using an ordered triple having as elements the respect event classification label, transaction dollar amount observation, and time of day observation.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the classification rules teachings of Duke directed to “determining classification rules to use within a fraud detection system” (Duke [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e. fraud detection classification rules) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “By detecting unauthorized account activity, a financial service provider may be able to avoid costs associated with fraud.” Duke [0034]).
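For illustration only, the ordered-triple event representation described in the quoted passage of Duke [0065] (event classification label, transaction dollar amount, and time of day) can be sketched as below. The class name, field names, label strings, time unit, and sample values are hypothetical and are taken from neither Duke nor the application.

```python
# Illustrative sketch only: names, labels, units, and values are hypothetical.
from typing import NamedTuple


class TransactionEvent(NamedTuple):
    label: str          # event classification label
    amount: float       # dollar amount proposed to be transacted (D)
    time_of_day: float  # event time (T), assumed here to be hours since midnight


events = [
    TransactionEvent("authorized", 25.00, 9.5),
    TransactionEvent("unauthorized", 980.00, 3.25),
]

# Each event unpacks as an ordered triple.
label, amount, time_of_day = events[1]
print(label, amount, time_of_day)
```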
Regarding Claim 10, 
MacLennan, Pauly, Zoldi, and Ben-Or teach the fraud detection computer model of Claim 1 as described earlier.
MacLennan, Pauly, Zoldi, and Ben-Or do not teach wherein the plurality of different feature selection algorithms comprises at least one univariate feature selection algorithm and at least one multivariate feature selection algorithm.
Duke teaches wherein the plurality of different feature selection algorithms comprises at least one univariate feature selection algorithm and at least one multivariate feature selection algorithm. (Duke [0024] Banks, financial institutions, e-commerce businesses, and other entities use analytical algorithms to monitor data generated by client account activity. / Duke [0051] The candidate classification rule that best aligns with the cluster is identified, selected, and then, during the next iteration, is again modified in the same way as previously described./  Duke [0006] the distributional data representing a distribution of historical transactional events over a multivariate observational sample space defined with respect to multiple transactional variables / Duke [0072] the observational sample space may be one, two, three or four-dimensional, and each dimension may be associated with a different one of the variables involved in representing the events.; Examiner interprets a one-dimensional observational sample space as univariate and a two to four-dimensional observational sample space as multivariate )
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the fraud detection computer model of MacLennan to incorporate the classification rules teachings of Duke directed to “determining classification rules to use within a fraud detection system” (Duke [Abstract]). The modification would have been obvious because it merely applies a known technique (i.e. fraud detection classification rules) to a known concept (i.e. fraud detection computer modeling) ready for improvement to yield a predictable result (i.e. “By detecting unauthorized account activity, a financial service provider may be able to avoid costs associated with fraud.” Duke [0034]).
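For illustration only, the distinction the Examiner draws for Claim 10 between a one-dimensional (univariate) and a two-or-more-dimensional (multivariate) observational sample space can be sketched as below. The sketch merely counts hypothetical historical events over one variable and then over two variables; it is not a feature selection algorithm, and the data, bin widths, and function names are assumptions that appear in neither Duke nor the application.

```python
# Illustrative sketch only: data, bin widths, and names are hypothetical.
from collections import Counter

# Hypothetical historical events as (dollar amount, hour of day).
events = [(12.0, 9), (15.0, 10), (950.0, 3), (20.0, 9), (990.0, 2)]


def amount_bin(amount: float, width: float = 100.0) -> int:
    # Index of the dollar-amount bin containing the amount.
    return int(amount // width)


def hour_bin(hour: int, width: int = 6) -> int:
    # Index of the time-of-day bin containing the hour.
    return hour // width


# Univariate: distribution over a one-dimensional sample space (amount only).
univariate = Counter(amount_bin(a) for a, _ in events)

# Multivariate: distribution over a two-dimensional sample space (amount, hour).
multivariate = Counter((amount_bin(a), hour_bin(h)) for a, h in events)

print(univariate)
print(multivariate)
```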
Claim 19 is rejected on the same basis as Claim 9.
Claim 20 is rejected on the same basis as Claim 10.

Response to Remarks
Applicant's arguments filed on May 11, 2021 have been fully considered, and Examiner’s remarks on Applicant’s amendments follow.
Response Remarks on Claim Rejections - 35 USC § 101
Applicant’s arguments with respect to the rejection of claims under 35 USC § 101 have been fully considered and are persuasive. The claims focus more on an alleged improvement to neural networks and algorithmic training procedures than on determining fraudulent financial transactions.
Therefore, the rejection under 35 USC § 101 has been withdrawn.  

Response Remarks on Claim Rejections - 35 USC § 103
Applicant's amendments did not require the application of any new or additional prior art.
Examiner maintains that the combination of MacLennan, Pauly, Zoldi, Ben-Or, and Duke renders Applicant’s claimed invention obvious.
Therefore, the rejection under 35 USC § 103 remains.


Prior Art Cited But Not Applied
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Yu (“TRAINING AND SELECTION OF MULTIPLE FRAUD DETECTION MODELS”, U.S. Publication Number: 2017/0148027 A1) proposes multiple modeling technologies used within a fraud detection system to each handle different blocks of segmented transaction data in order to detect fraud. Model data is created from recent segmented transaction data that is currently handled by an existing model.
Stubblefield (“SYSTEM AND METHODS FOR PROCESSING A COMMUNICATION NUMBER FOR FRAUD PREVENTION”, U.S. Publication Number: 2015/0106265 A1) teaches a client system to use a fraud score to assess a risk associated with engaging in a transaction with a user.
 Teller (“METHOD AND SYSTEM FOR DEVELOPING PREDICTIONS FROM DISPARATE DATA SOURCES USING INTELLIGENT PROCESSING”, U.S. Publication Number: 2010/0179930 A1) teaches a platform for prediction based on extraction of features and observations collected from a large number of disparate data sources that uses machine learning to reinforce quality of collection, prediction and action based on those predictions.
 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHINEDU EKECHUKWU, whose telephone number is (571) 272-4493. The examiner can normally be reached Mon-Fri, 9 AM to 3:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christine M. Behncke, can be reached at (571) 272-8103. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.E./Examiner, Art Unit 3697
/CHRISTINE M BEHNCKE/Supervisory Patent Examiner, Art Unit 3697