DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 1 is objected to because of the following informalities: the claim contains two steps labeled "b." Since it appears that both steps are intended to occur, correction of the step labels is requested.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter because the claim(s) as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea. The claim(s) is/are directed to the abstract idea of detecting anomalies in modeled data. The claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more than the judicial exception itself. Claim(s) (1-20) is/are directed to an abstract idea without significantly more. 

Step 1 
Regarding Step 1 of the Subject Matter Eligibility Test for Products and Processes (from the January 2019 §101 Examination Guidelines), claim(s) (1-11) is/are directed to a method, claim(s) (17-20) is/are directed to a computer readable medium, and claim(s) (12-16) is/are directed to a system and 

Step 2A Prong 1

The claimed invention is directed to an abstract idea without significantly more. The claim(s) recite(s) (mathematical relationships/formulas, mental process or certain methods of organizing human activity). Specifically the independent claims recite:

(a) mental process: as drafted, the claim recites the limitations of accessing fraud scores, generating a baseline distribution, generating a current distribution, determining a divergence value, determining an activeness value, clustering the data, and designating the data abnormal, which is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a computer/processor, nothing in the claim precludes the determining step from practically being performed in the human mind. For example, but for the “by a processor” language, the claim encompasses a user generating distribution models and making an assessment. The mere nominal recitation of a generic computing device does not take the claim limitation out of the mental processes grouping. This limitation is a mental process. With regard to the instant application, the Examiner has reviewed the disclosure and determined that the underlying claimed invention is described as a concept that is performed in the human mind and/or with the aid of a pen and paper, and thus it is viewed that the applicant is merely claiming that concept performed 1) on a generic computer, 2) in a computer environment, or 3) merely using a computer as a tool to perform the concept, and therefore is considered to recite a mental process. Claims can recite a mental process even if they are claimed as being performed on a computer. The courts have found that claims requiring a generic computer or nominally reciting a generic computer may still recite a mental process even though the claim limitations are not performed entirely in the human mind.

(b) mathematical formula: The claim recites a mathematical concept (which can include mathematical relationships, mathematical formulas or equations, and mathematical calculations), in this case using distributions to make a determination on data. The distribution provides a parameterized mathematical function that can be used to calculate the probability for any individual observation from the sample space. Thus, the claim recites a mathematical concept and mathematical formulas or equations.

“Mathematical Relationships” A mathematical relationship is a relationship between variables or numbers. A mathematical relationship may be expressed in words or using mathematical symbols. For example, pressure (p) can be described as the ratio between the magnitude of the normal force (F) and area of the surface on contact (A), or it can be set forth in the form of an equation such as p = F/A. Examples of mathematical relationships recited in a claim include: a relationship between reaction rate and temperature, which relationship can be expressed in the form of a formula called the Arrhenius equation, Diamond v. Diehr; a conversion between binary-coded decimal and pure binary numerals, Gottschalk v. Benson; and a mathematical relationship between enhanced directional radio activity and antenna conductor arrangement (i.e., the length of the conductors with respect to the operating wave length and the angle between the conductors), Mackay Radio & Tel. Co. v. Radio Corp. of Am.

“Mathematical Formulas or Equations” A claim that recites a numerical formula or equation will be considered as falling within the “mathematical concepts” grouping. In addition, there are instances where a formula or equation is written in text format that should also be considered as falling within this grouping. For example, the phrase “determining a ratio of A to B” is merely a textual replacement for the particular equation (ratio = A/B). Additionally, the phrase “calculating the force of the object by multiplying its mass by its acceleration” is a textual replacement for the particular equation (F = ma). Examples of mathematical equations or formulas recited in a claim include: an Arrhenius equation, Diamond v. Diehr; a formula for computing an alarm limit, Parker v. Flook; and a mathematical formula for hedging (claim 4), Bilski v. Kappos.

“Mathematical Calculations” A claim that recites a mathematical calculation will be considered as falling within the “mathematical concepts” grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number, e.g., performing an arithmetic operation such as exponentiation. There is no particular word or set of words that indicates a claim recites a mathematical calculation. That is, a claim does not have to recite the word “calculating” in order to be considered a mathematical calculation. For example, a step of “determining” a variable or number using mathematical methods or “performing” a mathematical operation may also be considered mathematical calculations when the broadest reasonable interpretation of the claim in light of the specification encompasses a mathematical calculation.


Step 2A Prong 2

Specifically, the determined judicial exception is not integrated into a practical application because the claim is directed to an abstract idea with additional generic computer elements; the generically recited computer elements do not add a meaningful limitation to the abstract idea because 

	The Examiner has further determined that the claims as a whole do not integrate the judicial exception into a practical application by providing an improvement in the functioning of a computer or an improvement to other technology or technical field. The disclosure does not provide sufficient detail such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The disclosure does not set out the alleged improvement in a way that would be apparent to one of ordinary skill in the art, but instead states it in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), and the claimed invention therefore does not improve the technology.

For further clarification, the Examiner points out that claim(s) 1-20 recite(s) accessing fraud scores, generating a baseline distribution, generating a current distribution, determining a divergence value, determining an activeness value, clustering the data, and designating the data abnormal, which are viewed as an abstract idea in the form of applying mathematical formulas to a mental process. This judicial exception is not integrated into a practical application because the use of a computer for accessing, generating, clustering, and designating merely implements the abstract idea steps of generating distributions, determining divergence values, and designating the data abnormal in the manner of “apply it”.

Thus the claims recite an abstract idea directed to applying a mathematical formula to a mental process (i.e., detecting anomalies in modeled data). Using a computer to access, generate, cluster, and designate in this mathematically-based mental process merely implements the abstract idea in the manner of “apply it” and does not provide 'something more' to make the claimed invention patent eligible.

The specification makes it clear that the claimed invention is directed to determining anomalies in data:

[0002]    The present disclosure generally relates to systems and methods for use in monitoring machine learning systems and, in particular, for performing anomaly detection for data generated by machine learning models, where the models are based on input data (e.g., fraud scores, etc.) provided through and/or stored in computer networks (e.g., in data structures associated with the computer networks, etc.).

The dependent claims recite elements that narrow the metes and bounds of the abstract idea but do not provide ‘something more’.  
The dependent claims do not remedy these deficiencies.

Claims 3, 7-11, 19, 20 recite limitations which further limit calculating and the claimed analysis of data.

Claims 4-6, 14-16, and 18 recite limitations that further define and calculate the KL value.
	
Using a computer to perform the data processing as claimed merely implements the abstract idea in the manner of “apply it” and does not provide significantly more. Thus, the problem to which the claimed invention is directed is answering a question based on gathered and analyzed information about data produced by a system. This is not a technical or technological problem but rather lies in the realm of business or fraud detection management, and the claimed invention is therefore an abstract idea.


Step 2B

The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because as discussed with respect to Step 2A Prong Two, the additional 
The same analysis applies here in 2B, i.e., mere instructions to apply an exception using a generic computer component cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B. This is the case because, in order for the claims to be viewed as significantly more, the claims must incorporate the integral use of a machine to achieve performance of a method, in contrast to where the machine is merely an object on which the method operates, which does not provide significantly more. In order for a machine to add significantly more, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly. A further consideration is whether the machine's involvement is extra-solution activity or a field-of-use limitation, i.e., the extent to which (or how) the machine or apparatus imposes meaningful limits on the claim. Use of a machine that contributes only nominally or insignificantly to the execution of the claimed method (e.g., in a data gathering step or in a field-of-use limitation) would not provide significantly more. Additionally, another consideration when determining whether a claim recites significantly more is whether the claim effects a transformation or reduction of a particular article to a different state or thing. "[T]ransformation and reduction of an article ‘to a different state or thing’ is the clue to patentability of a process claim that does not include particular machines." Altogether, the above analysis shows there is no improvement in computer functionality or improvement to any other technology or technical field. The claim is ineligible.

With respect to Berkheimer, as noted above, the same analysis applies to Step 2B where the claims are viewed as applying the abstract idea (“apply it”), and as such no further analysis is required. However, with respect to the claim elements that are viewed as extra-solution or post-solution activity, the Examiner notes that those elements are viewed as well-understood, routine, and conventional on the bases set out in (a) and (c) below.

(a) A citation to an express statement in the specification or to a statement made by an applicant during prosecution that demonstrates the well-understood, routine, conventional nature of the additional element(s). A specification demonstrates the well-understood, routine, conventional nature of additional elements when it describes the additional elements as well-understood or routine or conventional (or an equivalent term), as a commercially available product, or in a manner that indicates that the additional elements are sufficiently well-known that the specification does not need to describe the particulars of such additional elements to satisfy 35 U.S.C. § 112(a).

[0030] FIG. 2 illustrates an exemplary computing device 200 that can be used in the system 100. The computing device 200 may include, for example, one or more servers, workstations, personal computers, laptops, tablets, smartphones, PDAs, etc. In addition, the computing device 200 may include a single computing device, or it may include multiple computing devices located in close proximity or distributed over a geographic region, so long as the computing devices are specifically configured to function as described herein. However, the system 100 should not be considered to be limited to the computing device 200, as described below, as different computing devices and/or arrangements of computing devices may be used. In addition, different components and/or arrangements of components may be used in other computing devices.

(c) A citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s). An appropriate publication could include a book, manual, review article, or other source that describes the state of the art and discusses what is well-known and in common use in the relevant industry. 


Prior art references Arrabothu et al. (US 20190385170 A1), Zoldi et al. (US 20160342963 A1), Gerber et al. (US 20160314471 A1), and Kennel et al. (US 20140180974 A1) disclose a processor, computing device, interface, memory, and medium in at least Arrabothu (Fig. 1, ¶ 14-16, 46-50, 53, 60, 90, 94, 98, 107), Zoldi (Fig. 2, ¶ 20, 27-30, 82, 122-124), Gerber (Fig. 1-5, 8, 31, ¶ 16, 24, 35, 39, 57, 87-118), and Kennel (Fig. 3, ¶ 34, 72, 75, 82, 83, 122, 124, 171-174).

The dependent claims recite elements that narrow the metes and bounds of the abstract idea but do not provide ‘something more’. Specifically, the dependent claims do not remedy the deficiencies of the independent claims. Therefore, based on the above analysis conducted under the January 2019 Guidance from the United States Patent and Trademark Office, the claims recite a court-recognized abstract idea (a judicial exception), do not integrate the judicial exception into a practical application, and do not provide an inventive concept; the claims are therefore ineligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –



Claim(s) 12, 14, 16-18, 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zoldi et al. (US 20160342963 A1).

Regarding claim 12, Zoldi teaches a system for use in detecting anomalies in output of a fraud score model, the system comprising (¶ 2, 6, 18);

a memory including a data structure, the data structure including transaction data for a plurality of transactions involving a plurality of payment accounts associated with a plurality of segments of payment accounts, the transaction data including a plurality of fraud scores generated by at least one fraud model (¶ 123, 5, 46-47, 86, 101);

for each of the plurality of segments of payment accounts: access, from the data structure, fraud scores for the segment of payment accounts for a target interval and for a series of prior similar intervals (¶ 83, For a new dataset under investigation, each transaction sample can traverse through the built tree and reach a leaf node to get classified. The likelihood of fraud can be recorded and the mean and variance of the fraud likelihood can be calculated for the entire population or subpopulations of legitimate transactions and fraudulent transactions. FIG. 3 shows an exemplary mean and variance for some data sets over all the pathways. The mean (left subplot) and variance (right subplot) of all the legitimate and fraudulent samples over the entire set of pathways can be calculated respectively. It can be seen that for fraud samples, the mean value can decrease from the in-time test dataset (V13) through the out-of-time dataset (V14) to the out-of-region dataset (U21). For non-fraud samples, the mean value can change less significantly from the in-time dataset, indicating a change in the fraud behaviors. The two-sample Wilcoxon test between the in-time datasets and out-of-time dataset (V14) or 

generate a baseline distribution based on the accessed fraud scores for the segment of payment accounts for the series of prior similar intervals, the baseline distribution including a value for each of multiple fraud score segments across a range (¶ 82, The system can include a pathway processor, an anomaly evaluator, and a pathway feature analyzer, all of which may be implemented as one or more processors, computers, or any other devices capable of processing large amounts of unstructured or 

generate a current distribution based on the accessed fraud scores for the segment of payment accounts for the target interval, the current distribution including a value for each of the multiple fraud score segments (abstract, ¶ 8, In one aspect, a decision tree is built based on a training dataset from a reference dataset. Pathway transversal information along each pathway is recorded for the reference dataset. A first mean and a first variance of a class probability are calculated of all samples over each pathway. A pathway distribution for a new transaction dataset under investigation and a second mean and a second variance of all samples of the new transaction dataset are obtained. The second mean and the second variance are representative of a fraud probability. A first pathway density distribution is retrieved for the reference dataset. A second pathway density distribution is generated for the new transaction dataset. Deviation metrics between the first pathway density distribution and the second pathway density distribution are determined on a global level. The deviation metrics between the first pathway density distribution and the second pathway density distribution are determined on a local level. The deviation metrics between one or more feature statistics of a feature along each pathway for the reference dataset and the new dataset are determined on a local level. One or more likely feature contributors to one or more pattern changes are determined by analyzing the deviation metrics along each pathway. One or more of an alert and a report are generated based on the deviation metrics according to one or more predetermined criteria. ¶ 83, For a new dataset under investigation, each transaction sample can traverse through the built tree and reach a leaf node to get classified. The likelihood of fraud can be recorded and the mean and variance of the fraud likelihood can 

determine a divergence value between the baseline distribution and the current distribution for the segment of payment accounts (¶ 90, In some implementations, as a first step the fraud pattern changes can be detected by determining a transversal density pattern of the pathways involved and comparing that transversal density pattern to a reference transversal density pattern. In other words, to detect and/or identify pattern changes, the density distribution can be compared to a corresponding density distribution for the same comparison basis (for example, using the same number of fraud samples to compare). Based on the transversal density distribution and the reference density distribution, distribution metrics can be used, to include deviation, correlation and a divergence metric as a measure of dissimilarity between the two density distributions as follows. ¶ 109-111, Note that pathways p1 and 

determine an activeness of the segment of payment accounts based on a total number of transactions involving the payment accounts for each of the prior similar intervals, whereby the divergence value and the activeness form a divergence pair (¶ 19, In yet another aspect, methods, apparatus, and computer program products for detecting fraud pattern changes in the transaction data can include building a decision tree based on a training dataset from a reference dataset, recording pathway traversal information along each pathway of the decision tree for the reference dataset, and calculating a mean and variance of a class probability of all samples over all the pathways. The methods, apparatus, and computer program products can further include obtaining a pathway distribution for a new transaction dataset under investigation, and the mean and variance of the fraud probability of all the samples, and generating a pathway density distribution for the new transaction dataset. The methods, apparatus, and computer program products can further include retrieving pathway density distribution for the reference dataset, and determining a deviation metric between the pathway density distributions of the reference and new datasets on a global level. ¶ 84, The diagram of the pathway index and transversal density illustrates whether the distribution may be random or dominated by some pathways, or whether pathways are not effective in describing the fraud patterns of new fraud tactics. Accordingly, depicting the frequency of the pathways may not be restricted to transaction fraud detection scenarios but can be applied to other fields for pathway analysis. ¶ 90-98,  In some implementations, as a first step the fraud pattern changes can be detected by determining a transversal 

and cluster the multiple divergence pairs determined for the plurality of segments of payment accounts (¶ 85, Note that the three datasets (V13, V14 and U21) can all have clear class tags, i.e., non-fraud and fraud. Accordingly, each target class can be handled individually in pursuit of further investigation of each target class. In reality, this scenario may be not ideal because class tags may not be available. For example, some samples may not have tags that clearly show that they are non-fraud or fraud in the bi-modal case. With the built tree from the reference dataset (that must have full tags to build a decision tree), the untagged samples in a new dataset may derive their tags from the sample distributions on leaf nodes. For example, a sample can be tagged as non-fraud if the distance to the non-fraud cluster center at a leaf node is shorter than the distance to the fraud cluster center in the bi-modal case. In a multi-modal case, the shortest distance can dictate the approximate tag of the sample. Using this approximation approach, the untagged samples can be tagged based on a relative magnitude of the 

and designate one or more of the multiple divergence pairs as abnormal based on the clustered divergence pairs (¶ 86-87, By analyzing the pathway transversal density distributions, a new transaction data set exhibiting anomalous behaviors may be identified by some metrics, and an alert based on some predetermined criterion may be generated. For example, a correlation between the reference pathway 
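
For illustration only, the last three limitations of claim 12 (forming divergence pairs, clustering them, and designating isolated pairs as abnormal) can be sketched as below. This is a minimal sketch under stated assumptions and is neither the applicant's nor Zoldi's implementation: the function names, the segment names, the numeric values, the use of the mean transaction count as the "activeness", the min-max scaling, and the `eps`/`min_neighbors` parameters are all hypothetical. A simple neighbor-count density check stands in for whatever clustering technique the claims contemplate.

```python
import math


def activeness(prior_interval_totals):
    """Mean transaction count across the prior intervals (assumed definition)."""
    return sum(prior_interval_totals) / len(prior_interval_totals)


def min_max_scale(values):
    """Rescale values to [0, 1] so both coordinates carry comparable weight."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def find_abnormal_segments(segments, eps=0.5, min_neighbors=1):
    """Return the names of segments whose divergence pair is isolated.

    `segments` maps a segment name to (divergence_value, prior_interval_totals).
    A pair is designated abnormal when fewer than `min_neighbors` other pairs
    lie within Euclidean distance `eps` of it after scaling.
    """
    names = list(segments)
    divergences = min_max_scale([segments[n][0] for n in names])
    activity = min_max_scale([activeness(segments[n][1]) for n in names])
    pairs = list(zip(divergences, activity))

    abnormal = []
    for i, (d_i, a_i) in enumerate(pairs):
        neighbors = sum(
            1
            for j, (d_j, a_j) in enumerate(pairs)
            if j != i and math.hypot(d_i - d_j, a_i - a_j) <= eps
        )
        if neighbors < min_neighbors:
            abnormal.append(names[i])
    return abnormal


# Hypothetical data: (divergence value, transaction totals for prior intervals).
example_segments = {
    "seg_a": (0.02, [1000, 1100, 900]),
    "seg_b": (0.03, [1200, 1000, 1100]),
    "seg_c": (0.025, [950, 1050, 1000]),
    "seg_d": (0.04, [1100, 1150, 1050]),
    "seg_e": (0.90, [1000, 1000, 1000]),
}

abnormal_segments = find_abnormal_segments(example_segments)
print(abnormal_segments)
```

In this sketch the divergence value is taken as a given input; how it is computed is a separate question addressed by the dependent claims.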

Regarding claim 14, Zoldi teaches in connection with generating the baseline distribution, segregate the fraud scores for the prior similar intervals into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 118); 
wherein the at least one processor is configured to, in connection with generating the current distribution, segregate the fraud scores for the target interval into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 76, 118);
and wherein the at least one processor is configured to determine the divergence value by determining a Kullback-Leibler (KL) divergence value based on the baseline distribution and the current distribution (¶ 95-99, 11-12, 18, 22, 28).

Regarding claim 15, Zoldi teaches wherein the at least one processor is configured to determine the KL divergence value based, at least in part, on: wherein p(x) is the current distribution and q(x) is the baseline distribution (¶ 95-99, 11-12, 18, 22, 28).

Regarding claim 16, Zoldi teaches wherein the at least one processor is configured to determine the KL divergence value based, at least in part, on:   wherein P(i) is the current distribution and Q(i) is the baseline distribution (¶ 95-99, 11-12, 18, 22, 28).
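
For illustration only, the segregation of fraud scores into score segments (claim 14) and a Kullback-Leibler divergence between the resulting distributions (claims 14-16) can be sketched as below. The formulas recited in claims 15 and 16 are not reproduced in this action, so the sketch assumes the standard discrete definition, D_KL(P || Q) = Σ_i P(i) · log(P(i) / Q(i)), with P the current distribution and Q the baseline distribution as the claims label them. The function names, the 0-1000 score range, the bin count, the smoothing constant, and the sample scores are hypothetical.

```python
import math


def score_distribution(scores, num_bins, low=0.0, high=1000.0, smoothing=1e-6):
    """Segregate scores into equal-width segments and return bin proportions.

    A small smoothing constant (an assumption of this sketch) keeps every bin
    non-zero so the logarithm in the divergence is always defined.
    """
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for score in scores:
        index = int((score - low) / width)
        index = max(0, min(index, num_bins - 1))  # clamp out-of-range scores
        counts[index] += 1
    total = len(scores) + smoothing * num_bins
    return [(count + smoothing) / total for count in counts]


def kl_divergence(p, q):
    """Standard discrete KL divergence D_KL(P || Q) using natural logarithms."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


# Hypothetical fraud scores for the prior intervals and for the target interval.
baseline = score_distribution([100, 150, 200, 800], num_bins=4)
current = score_distribution([700, 850, 900, 950], num_bins=4)

divergence_value = kl_divergence(current, baseline)
print(divergence_value)
```

The divergence is zero when the two distributions are identical and grows as the current distribution shifts away from the baseline; it is not symmetric, so swapping the arguments generally gives a different value.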

Regarding claim 17, Zoldi teaches a non-transitory computer-readable storage medium including computer- executable instructions for use in detecting anomalies in output of a fraud score model, which, when executed by a processor, cause the processor to (¶ 123, 18, 81, 82, abstract);

for each of the plurality of segments of payment accounts: access, from the data structure, fraud scores for the segment of payment accounts for a target interval and for a series of prior similar intervals (¶ 83, For a new dataset under investigation, each transaction sample can traverse through the built tree and reach a leaf node to get classified. The likelihood of fraud can be recorded and the mean and variance of the fraud likelihood can be calculated for the entire population or subpopulations of 

generate a baseline distribution based on the accessed fraud scores for the segment of payment accounts for the series of prior similar intervals, the baseline distribution including a value for each of multiple fraud score segments across a range (¶ 82, The system can include a pathway processor, an anomaly evaluator, and a pathway feature analyzer, all of which may be implemented as one or more processors, computers, or any other devices capable of processing large amounts of unstructured or structured data. In some implementations, the pathway processor can obtain inputs from the reference dataset and from a new dataset of interest. The inputs from each dataset may include a pathway density distribution and feature statistics (sample density at each feature node and variable statistics at the leaf node) along each pathway. The reference distribution can be the base distribution obtained from the development model, including the pathway transversal density, feature statistics, and class likelihood of each pathway. Based on the two input datasets, the pathway processor can calculate the measurement metrics of the distribution difference and send the results to the anomaly evaluator to use the metrics according to one or more predetermined thresholds to detect pattern changes. The pathway feature analyzer can further the investigation by delving into the features along pathways and utilize pathway feature statistics. The pathway feature analyzer may be composed of two processing modules. One of the processing modules can look at the feature statistics on all the nodes along a pathway. The other processing module can peer at the sample density at each node for the dataset. The pathway processor can search for changes in the distribution of features along each pathway and analyze the changes along the pathway to generate a list of possible features attributable to the distribution difference. 
Based at least in part on this list, the pathway processor can generate and send an alert signal to users in order for them to make decisions concerning remedial strategies. ¶ 85-88, FIG. 4 shows an example of the 
	
generate a current distribution based on the accessed fraud scores for the segment of payment accounts for the target interval, the current distribution including a value for each of the multiple fraud score segments (abstract, ¶ 8, In one aspect, a decision tree is built based on a training dataset from a reference dataset. Pathway transversal information along each pathway is recorded for the reference dataset. A first mean and a first variance of a class probability are calculated of all samples over each pathway. A pathway distribution for a new transaction dataset under investigation and a second mean and a second variance of all samples of the new transaction dataset are obtained. The second mean and the second variance are representative of a fraud probability. A first pathway density distribution is retrieved for the reference dataset. A second pathway density distribution is generated for the new transaction dataset. Deviation metrics between the first pathway density distribution and the second pathway density distribution are determined on a global level. The deviation metrics 

determine a divergence value between the baseline distribution and the current distribution for the segment of payment accounts (¶ 90, In some implementations, as a first step the fraud pattern changes can be detected by determining a transversal density pattern of the pathways involved and comparing that transversal density pattern to a reference transversal density pattern. In other words, to detect and/or identify pattern changes, the density distribution can be compared to a corresponding density distribution for the same comparison basis (for example, using the same number of fraud samples to compare). Based on the transversal density distribution and the reference density distribution, distribution metrics can be used, to include deviation, correlation and a divergence metric as a measure of dissimilarity between the two density distributions as follows. ¶ 109-111, Note that pathways p1 and p2 can share the same route until feature X39. Below X39, pathways p1 and p2 can diverge. The comparison between the two pathways below the node X92 is noteworthy. For example, the in-time (red) and out-of-time (green) sample density can merge to the same density on pathway p1 at node X39, while the out-of-time (green) density can fall below the in-time (red) density. This behavior can signify that at this depth the distributions are significantly different on feature X39. ¶ 114, 8, 17, 20-22); 

determine an activeness of the segment of payment accounts based on a total number of transactions involving the payment accounts for each of the prior similar intervals, whereby the divergence value and the activeness form a divergence pair (¶ 19, In yet another aspect, methods, apparatus, and computer program products for detecting fraud pattern changes in the transaction data can include building a decision tree based on a training dataset from a reference dataset, recording pathway traversal information along each pathway of the decision tree for the reference dataset, and calculating a mean and variance of a class probability of all samples over all the pathways. The methods, apparatus, and computer program products can further include obtaining a pathway distribution for a new transaction dataset under investigation, and the mean and variance of the fraud probability of all the 

and cluster the multiple divergence pairs determined for the plurality of segments of payment accounts (¶ 85, Note that the three datasets (V13, V14 and U21) can all have clear class tags, i.e., non-

and designate one or more of the multiple divergence pairs as abnormal based on the clustered divergence pairs (¶ 86-87, By analyzing the pathway transversal density distributions, a new transaction data set exhibiting anomalous behaviors may be identified by some metrics, and an alert based on some predetermined criterion may be generated. For example, a correlation between the reference pathway distribution and new pathway distribution may be less than 0.8, which may indicate pattern changes in the new dataset. The feature analysis along each pathway as a subsequent step may enable detection of which features may be responsible for the changes. Those features in such localized analysis can be analyzed and used for deriving reason codes. Thus, the processors can use the information of the contributing features to generate recommended steps in order to react to these shifts in fraud tactics. For example, new variables can be defined in adaptive analytics models, rule features can be created, and/or new model builds can be started to react to shifting fraud environments. ¶ 101-104, FIG. 5 depicts a flowchart of a method for detecting fraud pattern changes in payment transactions on a global level in accordance with some implementations. The flowchart details the procedure in the pathway anomaly evaluator, referring to FIG. 2. In some implementations, a development model can be chosen to be a reference dataset, and the pathway density distribution may be obtained by using the test data which is disjointed with the training data. An example reference dataset can include payment transactions in a North American country. The model was developed and used as a reference model (V13). Exemplary new datasets can include 1) payment transactions for the same country but in a different year, which can be called an out-of-time dataset (V14); and 2) payment transactions for 

Regarding claim 18, Zoldi teaches in connection with generating the baseline distribution, segregate the fraud scores for the prior similar intervals into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 118); 
in connection with generating the current distribution, segregate the fraud scores for the target interval into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 76, 118);
and determine the divergence value by determining a Kullback-Leibler (KL) divergence value based on the baseline distribution and the current distribution (¶ 95-99, 11-12, 18, 22, 28).

Regarding claim 20, Zoldi teaches wherein the instructions, when executed by the processor, cause the processor to: in connection with generating the baseline distribution: for each fraud score segment for each prior similar interval, divide a count of the fraud scores segregated into the fraud score segment by a total number of fraud scores segregated into the multiple fraud score segments for the prior similar interval, thereby calculating a score ratio for each fraud score segment for each prior similar interval (¶ 77-83, 86-89, 94, 101-105, 109-114, 58-70);
average the score ratios for the corresponding fraud score segments across the prior similar intervals, thereby generating an average score ratio for each of the multiple fraud score segments (¶ 97-100, If the two pathway density distributions (reference and new test density distributions) are very close, the K-L distance can be close to 0. A larger K-L distance can be representative of a dissimilarity between the two distributions. Therefore K-L distance can also be a good indicator of the similarity between the two 
span=max(metrics for all classes)−min(metrics for all classes) The span can be compared against some threshold, and different variation patterns across classes may persist. In addition, the comparison of the pathway density distributions for all samples without tags may be made by using a similar metric in order to characterize total population behavioral shifts from the reference dataset to the new dataset.
¶ 18, 19, 23, 29, 83, 94, 105, 114, 48-70);  
and define the value included in the baseline distribution for each of the multiple fraud score segments as the average score ratio for the corresponding fraud score segment (¶ 97-101, 82-89, 18, 19, 23, 29, 112-114, 94, 105, 48-70, teaches various calculations);  
and in connection with generating the current distribution: for each fraud score segment for the target interval, divide a count of the fraud scores segregated into the fraud score segment by the total number of fraud scores segregated into the multiple fraud score segments for the target interval, thereby calculating a score ratio for each fraud score segment for the target interval (¶ 97-101, 79, 86, 95-86);  
and define the value included in the current distribution for each of the multiple fraud score segments as the score ratio for the corresponding fraud score segment (¶ 82-89, 97-101, 18, 19, 23, 29, 112-114, 48-70, teaches various calculations).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6, and 8-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Arrabothu et al. (US 20190385170 A1).

Regarding claim 1, Zoldi teaches detecting anomalies in output of a fraud score model generated by a machine learning system (¶ 2, 6, 18);

(a) accessing fraud scores for a segment of payment accounts for a target interval and for a series of prior similar intervals, the segment of payment accounts subject to at least one fraud model whereby the fraud scores are generated consistent with the at least one fraud model (¶ 83, For a new dataset under investigation, each transaction sample can traverse through the built tree and reach a leaf node to get classified. The likelihood of fraud can be recorded and the mean and variance of the fraud likelihood can be calculated for the entire population or subpopulations of legitimate transactions and fraudulent transactions. FIG. 3 shows an exemplary mean and variance for some data sets over all the pathways. The mean (left subplot) and variance (right subplot) of all the legitimate and fraudulent samples over the entire set of pathways can be calculated respectively. It can be seen that for fraud samples, the mean value can decrease from the in-time test dataset (V13) through the out-of-time dataset (V14) to the out-of-region dataset (U21). For non-fraud samples, the mean value can change less 

(b) generating, by a computing device, a baseline distribution based on the fraud scores for the segment of payment accounts for the series of prior similar intervals, the baseline distribution including a value for each of multiple fraud score segments across a range (¶ 82, The system can include a pathway processor, an anomaly evaluator, and a pathway feature analyzer, all of which may be implemented as one or more processors, computers, or any other devices capable of processing large amounts of unstructured or structured data. In some implementations, the pathway processor can obtain inputs from the reference dataset and from a new dataset of interest. The inputs from each dataset may include a pathway density distribution and feature statistics (sample density at each feature node and variable statistics at the leaf node) along each pathway. The reference distribution can be the base distribution obtained from the development model, including the pathway transversal density, feature statistics, and class likelihood of each pathway. Based on the two input datasets, the pathway processor can calculate the measurement metrics of the distribution difference and send the results to the anomaly evaluator to use the metrics according to one or more predetermined thresholds to detect pattern changes. The pathway feature analyzer can further the investigation by delving into the features along pathways and utilize pathway feature statistics. The pathway feature analyzer may be composed of two processing modules. One of the processing modules can look at the feature statistics on all the nodes along a pathway. The other processing module can peer at the sample density at each node for 

(b) generating, by the computing device, a current distribution based on the fraud scores for the segment of payment accounts for the target interval, the current distribution including a value for each of the multiple fraud score segments (abstract, ¶ 8, In one aspect, a decision tree is built based on a training dataset from a reference dataset. Pathway transversal information along each pathway is recorded for the reference dataset. A first mean and a first variance of a class probability are calculated of all samples over each pathway. A pathway distribution for a new transaction dataset under investigation and a second mean and a second variance of all samples of the new transaction dataset 

(c) determining, by the computing device, a divergence value between the baseline distribution and the current distribution for the segment of payment accounts (¶ 90, In some implementations, as a first step the fraud pattern changes can be detected by determining a transversal density pattern of the pathways involved and comparing that transversal density pattern to a reference transversal density pattern. In other words, to detect and/or identify pattern changes, the density distribution can be compared to a corresponding density distribution for the same comparison basis (for example, using the same number of fraud samples to compare). Based on the transversal density distribution and the reference density distribution, distribution metrics can be used, to include deviation, correlation and a divergence metric as a measure of dissimilarity between the two density distributions as follows. ¶ 109-111, Note that pathways p1 and p2 can share the same route until feature X39. Below X39, pathways p1 and p2 can diverge. The comparison between the two pathways below the node X92 is noteworthy. For example, the in-time (red) and out-of-time (green) sample density can merge to the same density on pathway p1 at node X39, while the out-of-time (green) density can fall below the in-time (red) density. This behavior can signify that at this depth the distributions are significantly different on feature X39. ¶ 114, 8, 17, 20-22); 

(d) determining, by the computing device, an activeness of the segment of payment accounts based on a total number of transactions involving the payment accounts for each of the prior similar intervals, whereby the divergence value and the activeness form a divergence pair (¶ 19, In yet another aspect, methods, apparatus, and computer program products for detecting fraud pattern changes in the transaction data can include building a decision tree based on a training dataset from a 

multiple divergence pairs are determined for multiple segments of payment accounts (¶ 90, In some implementations, as a first step the fraud pattern changes can be detected by determining a transversal density pattern of the pathways involved and comparing that transversal density pattern to a reference transversal density pattern. In other words, to detect and/or identify pattern changes, the density distribution can be compared to a corresponding density distribution for the same comparison basis (for example, using the same number of fraud samples to compare). Based on the transversal density distribution and the reference density distribution, distribution metrics can be used, to include deviation, correlation and a divergence metric as a measure of dissimilarity between the two density distributions as follows. ¶ 88, The left plot shows the comparison for non-fraud samples, and the right plot shows the comparison for fraud samples. The reference pathways can be ordered by density, and the pathways can be dominated by non-fraud samples. FIG. 4 shows that the transversal pathway densities can change from the reference dataset in different pathways in two aspects: deviations from the model development dataset, and similarity/dissimilarity between the datasets. Those changes in distributions can be the signature of the pattern changes, and metrics may be designed to detect and quantify the characteristic changes. ¶ 97-99, If the two pathway density distributions (reference and new test density distributions) are very close, the K-L distance can be close to 0. A larger K-L distance can be representative of a dissimilarity between the two distributions. Therefore K-L distance can also be a good indicator of the similarity between the two density distributions.);

clustering, by the computing device, the multiple divergence pairs for the multiple segments of payment accounts (¶ 85, Note that the three datasets (V13, V14 and U21) can all have clear class tags, 

and designating, by the computing device, one or more of the multiple divergence pairs as abnormal based on the clustered divergence pairs, thereby permitting generation of an interface visualizing anomalous behavior of the at least one fraud score model (¶ 86-87, By analyzing the pathway transversal density distributions, a new transaction data set exhibiting anomalous behaviors may be identified by some metrics, and an alert based on some predetermined criterion may be generated. For example, a correlation between the reference pathway distribution and new pathway distribution may be less than 0.8, which may indicate pattern changes in the new dataset. The feature analysis along each pathway as a subsequent step may enable detection of which features may be responsible for the changes. Those features in such localized analysis can be analyzed and used for deriving reason codes. Thus, the processors can use the information of the contributing features to generate recommended steps in order to react to these shifts in fraud tactics. For example, new variables can be defined in adaptive analytics models, rule features can be created, and/or new model builds can be started to react to shifting fraud environments. ¶ 101-104, FIG. 5 depicts a flowchart of a method for detecting fraud pattern changes in payment transactions on a global level in accordance with some implementations. The flowchart details the procedure in the pathway anomaly evaluator, referring to FIG. 2. In some implementations, a development model can be chosen to be a reference dataset, and the pathway density distribution may be obtained by using the test data which is disjointed with the training data. An example reference dataset can include payment transactions in a North American country. The model was developed and used as a reference model (V13). Exemplary new datasets can include 1) payment 

Zoldi does not specifically teach repeating the steps of the invention.

However, Arrabothu teaches repeating the steps for one or more other segments of payment accounts (¶ 32-35, In various embodiments, any combination of steps 502-516 may occur automatically, continuously, and/or repeatedly, such that the improvable fraud detection model associated with neural network 160 are continuously updated. The resulting updated improvable fraud detection models will be more effective at detecting fraud in response to receiving an authorization request for a transaction from a merchant. ¶ 44, 37-42, 19, 20).

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform repeating the steps for one or more other segments, as taught/suggested by Arrabothu. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to applying machine learning to fraud detection. One of ordinary skill in the art would have recognized that applying the known technique of Arrabothu would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Arrabothu to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied 

Regarding claim 4, Zoldi teaches wherein generating the baseline distribution includes segregating the fraud scores for the prior similar intervals into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 118); 
wherein generating the current distribution includes segregating the fraud scores for the target interval into the multiple fraud score segments across the range (¶ 99-100, 47, 85, 76, 118);
and wherein determining the divergence value includes determining a Kullback-Leibler (KL) divergence value based on the baseline distribution and the current distribution (¶ 95-99, 11-12, 18, 22, 28).
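
For illustration only, the divergence determination recited in claim 4 can be sketched as a discrete Kullback-Leibler computation over two distributions that share the same fraud score segments. This is a minimal sketch, not the applicant's or Zoldi's implementation: the function name `kl_divergence`, the `eps` guard for empty segments, the use of the natural logarithm, and the sample values are all assumptions introduced here.

```python
import math

def kl_divergence(current, baseline, eps=1e-12):
    """Discrete KL divergence D(P || Q), with P = current and Q = baseline.

    eps is an illustrative guard against empty baseline segments; it is an
    assumption and is not taken from the claims or the cited references.
    """
    if len(current) != len(baseline):
        raise ValueError("distributions must cover the same segments")
    total = 0.0
    for p, q in zip(current, baseline):
        if p > 0.0:  # segments with p == 0 contribute nothing to the sum
            total += p * math.log(p / max(q, eps))
    return total

# Hypothetical three-segment distributions (each sums to 1).
baseline = [0.5, 0.3, 0.2]
same = kl_divergence(baseline, baseline)            # identical distributions
shifted = kl_divergence([0.2, 0.3, 0.5], baseline)  # mass moved to the top segment
```

Identical distributions give a divergence of zero, and the value grows as the current distribution moves away from the baseline, which is the property the rejection relies on when it cites the K-L distance.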

Regarding claim 6, Zoldi teaches wherein the KL divergence value is determined based, at least in part, on: wherein P(i) is the current distribution and Q(i) is the baseline distribution (¶ 95-99, 11-12, 18, 22, 28).

Regarding claim 8, Zoldi teaches wherein generating the baseline distribution includes: for each fraud score segment for each prior similar interval, dividing a count of the fraud scores segregated into the fraud score segment by a total number of fraud scores segregated into the multiple fraud score segments for the prior similar interval, thereby calculating a score ratio for each fraud score segment for each prior similar interval (¶ 77-83, 86-89, 94, 101-105, 109-114, 58-70);
averaging the score ratios for the corresponding fraud score segments across the prior similar intervals, thereby generating an average score ratio for each of the multiple fraud score segments (¶ 
span=max(metrics for all classes)−min(metrics for all classes) The span can be compared against some threshold, and different variation patterns across classes may persist. In addition, the comparison of the pathway density distributions for all samples without tags may be made by using a similar metric in order to characterize total population behavioral shifts from the reference dataset to the new dataset.
¶ 18, 19, 23, 29, 83, 94, 105, 114, 48-70);  
and defining the value included in the baseline distribution for each of the multiple fraud score segments as the average score ratio for the corresponding fraud score segment (¶ 97-101, 82-89, 18, 19, 23, 29, 112-114, 94, 105, 48-70, teaches various calculations);  
wherein generating the current distribution includes: for each fraud score segment for the target interval, dividing a count of the fraud scores segregated into the fraud score segment by the total number of fraud scores segregated into the multiple fraud score segments for the target interval, thereby calculating a score ratio for each fraud score segment for the target interval (¶ 97-101, 79, 86, 95-86);  
and defining the value included in the current distribution for each of the multiple fraud score segments as the score ratio for the corresponding fraud score segment (¶ 82-89, 97-101, 18, 19, 23, 29, 112-114, 48-70, teaches various calculations).
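
For illustration only, the score-ratio steps recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions: equal-width segments over the 0 to 999 range mentioned in claim 10, hypothetical helper names `score_ratios` and `baseline_distribution`, and made-up scores. None of it is taken from the specification or from Zoldi.

```python
def score_ratios(scores, num_segments=10, low=0, high=999):
    """Share of scores falling in each equal-width segment of [low, high].

    Equal-width segments are an assumption made for this sketch.
    """
    if not scores:
        raise ValueError("need at least one score")
    width = (high - low + 1) / num_segments
    counts = [0] * num_segments
    for s in scores:
        idx = min(int((s - low) / width), num_segments - 1)
        counts[idx] += 1
    # Count per segment divided by the total number of scores in the interval.
    return [c / len(scores) for c in counts]

def baseline_distribution(prior_intervals, **kw):
    """Average the per-segment score ratios across the prior intervals."""
    per_interval = [score_ratios(scores, **kw) for scores in prior_intervals]
    n = len(per_interval)
    return [sum(col) / n for col in zip(*per_interval)]

# Hypothetical scores: two prior intervals and one target interval,
# with two segments to keep the arithmetic easy to follow.
prior = [[10, 20, 950], [15, 500, 990, 995]]
baseline = baseline_distribution(prior, num_segments=2)
current = score_ratios([5, 700, 800, 900], num_segments=2)
```

Because each interval is normalized by its own total before averaging, intervals with different transaction volumes contribute equally to the baseline.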

Regarding claim 9, Zoldi teaches wherein determining the activeness of the segment of payment accounts includes determining the activeness based on a log of an average number of transactions under the segment of payment accounts for each of the prior similar intervals (¶ 95-97, 82-89). 
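
For illustration only, the activeness limitation of claim 9 can be sketched as the logarithm of the average transaction count over the prior intervals, paired with a divergence value. The base-10 logarithm, the function name `activeness`, and the sample numbers are assumptions; the claim language as quoted does not specify a base.

```python
import math

def activeness(transaction_counts):
    """Log of the average number of transactions across the prior intervals.

    Base 10 is an assumption made for this sketch.
    """
    if not transaction_counts:
        raise ValueError("need at least one prior interval")
    average = sum(transaction_counts) / len(transaction_counts)
    if average <= 0:
        raise ValueError("average transaction count must be positive")
    return math.log10(average)

# Hypothetical segment: three prior intervals and an already computed divergence.
divergence_value = 0.27
divergence_pair = (divergence_value, activeness([900, 1000, 1100]))
```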

Regarding claim 10, Zoldi teaches wherein the range includes a numeric range extending from 0 to 999, and wherein each numeric value in the range is indicative of a likelihood of fraud (¶ 78-79, 83, 103-104).

Regarding claim 11, Zoldi teaches wherein the multiple fraud score segments include at least ten divisions (¶ 88, 105).

Claim 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Arrabothu et al. (US 20190385170 A1) in further view of Gerber et al. (US 20160314471 A1).

Regarding claim 2, Zoldi teaches multiple divergence pairs designated as abnormal (¶ 86-87, 101-104, FIG. 5). Zoldi does not specifically teach generating an interface. 

However, Gerber teaches generating an interface based on the one or more data points designated as abnormal (¶ 51-52, 135). 

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform generating an interface as taught/suggested by Gerber. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to fraud detection. One of ordinary skill in the art would have recognized that applying the known technique of Gerber would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Gerber to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such interface features into similar systems. Further, generating an interface would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow for information dispersal methods as needed.

Claim 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Arrabothu et al. (US 20190385170 A1) in further view of Kennel et al. (US 20140180974 A1).

Regarding claim 3, Zoldi teaches multiple divergence pairs designated as abnormal (¶ 86-87, 101-104, FIG. 5). Zoldi does not specifically teach a bank identification number. 

However, Kennel teaches wherein the segment of payment accounts includes payment accounts each having a same bank identification number (BIN)  (¶ 23-25, 47, 53, 66, 69, 71, 113, 120, 144). 

Claim 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Arrabothu et al. (US 20190385170 A1) in further view of Guo et al. “Estimate the Call Duration Distribution Parameters in GSM System Based on K-L Divergence Method”, published 2007 (cited as reference 1-W; referred to hereinafter as ‘Guo’).

Regarding claim 5, Zoldi teaches wherein the KL divergence value is determined based, at least in part, on: wherein p(x) is the current distribution and q(x) is the baseline distribution (¶ 95-99, 11-12, 18, 22, 28). Zoldi does not teach the exact equation as claimed. 

However, Guo teaches the specific KL divergence equation as claimed [equation image media_image1.png, not reproduced here].

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform the specific KL divergence value equation as taught/suggested by Guo. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to determining divergence using Kullback-Leibler divergence. One of ordinary skill in the art would have recognized that applying the known technique of Guo would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Guo to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such KL divergence value equation features into similar systems. Further, including the equation would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow for the specific use of the equation as claimed.

Claim 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Arrabothu et al. (US 20190385170 A1) in further view of Barz et al. “Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection”, published May 2019 (cited as reference 1-V; referred to hereinafter as ‘Barz’).

Regarding claim 7, Zoldi teaches multiple divergence pairs designated as abnormal (¶ 86-87, 101-104, FIG. 5). Zoldi does not specifically teach a density-based spatial clustering of applications with noise (DBSCAN) algorithm.

However, Barz teaches wherein clustering the multiple divergence pairs includes applying a density-based spatial clustering of applications with noise (DBSCAN) algorithm to the multiple divergence pairs (pg. 1090, Moreover, a measure D(p_I, p_Ω) for the degree of “deviation” of p_I from p_Ω has to be defined. Like some other works on collective anomaly detection [8], [10], we use, among others, the Kullback-Leibler (KL) divergence for this purpose. However, Section 2.5 will show that this is a suboptimal choice when used without a slight modification and discuss alternative divergence measures. Given these ingredients, the underlying optimization problem for finding the most anomalous interval can be described as Î = argmax_{I ∈ I_{A,B}} D(p_I, p_Ω(I)) (4). Various possible choices for the divergence measure D will be discussed in Section 2.5. In order to actually locate this “maximally divergent interval” Î, the MDI algorithm scans over all intervals I ∈ I_{A,B}, estimates the distributions p_I and p_Ω and computes the divergence between them, which becomes the anomaly score of the interval I. The parameters A and B, which define the minimum and the maximum size of the intervals in question, have to be specified by the user in advance. This is not a severe restriction, since extreme values may be chosen for these parameters in exchange for increased computation time. But depending on the application and the focus of the analysis, there is often prior knowledge about reasonable limits for the size of possible intervals., pg. 1093, Though D_KL(p_I, p_Ω) does not overestimate the anomalousness of low-variance intervals as extremely as D_KL(p_Ω, p_I) does, the following theoretical analysis will show that it is not unbiased either. In contrast to the previous section, this bias is not related to the data itself, but to the length of the intervals: smaller intervals systematically get higher scores than longer ones. This harms the quality of interval detections, because anomalies will be split up into multiple contiguous small detections (see Fig. 5a for an example). Recall that I^n_{m,m} denotes the set of all intervals of length m in . 

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform a density-based spatial clustering of applications with noise (DBSCAN) algorithm as taught/suggested by Barz. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to using Kullback-Leibler divergence in multiple fields including fraud detection. One of ordinary skill in the art would have recognized that applying the known technique of Barz would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Barz to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such DBSCAN features into similar systems. Further, applying the DBSCAN algorithm would have been recognized by those of ordinary skill in the art as resulting in an improved system that would not require one to specify the number of clusters in the data a priori, as opposed to k-means. DBSCAN can find arbitrarily-shaped clusters or a cluster completely surrounded by a different cluster. 
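
For illustration only, the clustering limitation discussed above can be sketched with a minimal density-based clustering routine applied to (divergence, activeness) pairs. This is a simplified, hypothetical sketch rather than the applicant's implementation or anything disclosed by Barz: the function name `dbscan`, the parameters `eps` and `min_pts`, the unscaled Euclidean distance, the sample pairs, and the choice to treat noise points as "abnormal" are all assumptions. It requires Python 3.8 or later for `math.dist`.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, NOISE for outliers."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical divergence pairs: (divergence value, activeness).
pairs = [(0.02, 3.0), (0.03, 3.1), (0.025, 2.9), (0.04, 3.05), (0.9, 1.2)]
labels = dbscan(pairs, eps=0.3, min_pts=3)
abnormal = [p for p, label in zip(pairs, labels) if label == NOISE]
```

No cluster count is supplied in advance; the number of clusters follows from `eps` and `min_pts`, which is the contrast with k-means drawn in the rationale above.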

Claim 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Gerber et al. (US 20160314471 A1).

Regarding claim 13, Zoldi teaches multiple divergence pairs designated as abnormal (¶ 86-87, 101-104, FIG. 5). Zoldi does not specifically teach generating an interface. 

However, Gerber teaches wherein the at least one processor is configured to generate an interface based on the one or more of the multiple divergence pairs designated as abnormal (¶ 51-52, 135). 

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform generating an interface as taught/suggested by Gerber. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to fraud detection. One of ordinary skill in the art would have recognized that applying the known technique of Gerber would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Gerber to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such interface features into similar systems. Further, generating an interface would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow for information dispersal methods as needed.

Claim 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Guo et al. (2007).

Regarding claim 15, Zoldi teaches wherein the KL divergence value is determined based, at least in part, on: wherein p(x) is the current distribution and q(x) is the baseline distribution (¶ 95-99, 11-12, 18, 22, 28). Zoldi does not teach the exact equation as claimed. 

However, Guo teaches the specific KL divergence equation as claimed [equation image media_image1.png, not reproduced here].

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform the specific KL divergence value equation as taught/suggested by Guo. This known technique is applicable to the system of Zoldi as they both share characteristics and capabilities, namely, they are directed to determining divergence using Kullback-Leibler divergence. One of ordinary skill in the art would have recognized that applying the known technique of Guo would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the technique of Guo to the teachings of Zoldi would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such KL divergence value equation features into similar systems. Further, including the equation would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow for the specific use of the equation as claimed.

Claim 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zoldi et al. (US 20160342963 A1) in view of Barz et al. (2019).

Regarding claim 19, Zoldi teaches multiple divergence pairs designated as abnormal (¶ 86-87, 101-104, FIG. 5). Zoldi does not specifically teach a density-based spatial clustering of applications with noise (DBSCAN) algorithm.

However, Barz teaches wherein the instructions, when executed by the processor, cause the processor to, in connection with clustering the multiple divergence pairs, apply a density-based spatial clustering of applications with noise (DBSCAN) algorithm to the multiple divergence pairs (pg. 1090, Moreover, a measure D(p_I, p_Ω) for the degree of “deviation” of p_I from p_Ω has to be defined. Like some other works on collective anomaly detection [8], [10], we use, among others, the Kullback-Leibler (KL) divergence for this purpose. However, Section 2.5 will show that this is a suboptimal choice when used without a slight modification and discuss alternative divergence measures. Given these ingredients, the underlying optimization problem for finding the most anomalous interval can be described as Î = argmax_{I ∈ I_{A,B}} D(p_I, p_Ω(I)). (4) Various possible choices for the divergence measure D will be discussed in Section 2.5. In order to actually locate this “maximally divergent interval” Î, the MDI algorithm scans over all intervals I ∈ I_{A,B}, estimates the distributions p_I and p_Ω and computes the divergence between them, which becomes the anomaly score of the interval I. The parameters A and B, which define the minimum and the maximum size of the intervals in question, have to be specified by the user in advance. This is not a severe restriction, since extreme values may be chosen for these parameters in exchange for increased computation time. But depending on the application and the focus of the analysis, there is often prior knowledge about reasonable limits for the size of possible intervals.; pg. 1093, Though D_KL(p_I, p_Ω) does not overestimate the anomalousness of low-variance intervals as extremely as D_KL(p_Ω, p_I) does, the following theoretical analysis will show that it is not unbiased either.
In contrast to the previous section, this bias is not related to the data itself, but to the length of the intervals: smaller intervals systematically get higher scores than longer ones. This harms the quality of interval detections, because anomalies will be split up into multiple contiguous small detections (see Fig. 5a for an example). Recall that I^n_{m,m} denotes the set of all intervals of length m in a time-series with n time-steps. Furthermore, let 0_d, d ∈ N, denote a d-dimensional vector with all coefficients being 0 and I_d the identity matrix of .)
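
For illustration only, the interval scan described in the Barz passage quoted above can be sketched as follows. This is a minimal sketch, not the implementation of either reference: it assumes univariate data, assumes both distributions are estimated as Gaussians, and uses hypothetical names (gaussian_kl, most_divergent_interval, min_len, max_len, eps) and hypothetical sample data.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two univariate Gaussians (assumed distribution model)."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)


def most_divergent_interval(series, min_len, max_len, eps=1e-6):
    """Scan every interval [start, end) whose length lies in [min_len, max_len].

    Each interval is scored by the KL divergence between the data inside it
    and the remaining data; the highest-scoring interval is returned as
    (score, start, end). eps is a hypothetical variance regularizer.
    """
    n = len(series)
    best = None
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break
            inside = series[start:end]
            outside = series[:start] + series[end:]
            if len(outside) < 2:
                continue  # too little remaining data to estimate a distribution
            score = gaussian_kl(fmean(inside), pvariance(inside) + eps,
                                fmean(outside), pvariance(outside) + eps)
            if best is None or score > best[0]:
                best = (score, start, end)
    return best


# Hypothetical data: alternating -1/+1 baseline with a shifted stretch at 8..11.
series = [(-1.0 if i % 2 == 0 else 1.0) for i in range(20)]
series[8:12] = [9.0, 11.0, 9.0, 11.0]
best = most_divergent_interval(series, min_len=3, max_len=6)
print(best)
```

The parameters min_len and max_len play the role of the user-specified interval size limits A and B mentioned in the quoted passage.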

It would have been obvious to one of ordinary skill in the art at the time of Applicant’s invention to modify Zoldi to include/perform a density-based spatial clustering of applications with noise (DBSCAN) algorithm as taught/suggested by Barz. This known technique is applicable to the system of Zoldi, as both references share characteristics and capabilities; namely, they are directed to using Kullback-Leibler divergence in multiple fields including fraud detection. One of ordinary skill in the art would have recognized that applying the known technique of Barz to the teachings of Zoldi would have yielded predictable results and resulted in an improved system, because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such clustering features into similar systems. Further, applying a DBSCAN algorithm would have been recognized by those of ordinary skill in the art as resulting in an improved system that, unlike k-means, does not require the number of clusters in the data to be specified a priori. DBSCAN can also find arbitrarily shaped clusters, including a cluster completely surrounded by a different cluster.
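
For illustration only, the property relied on above (no cluster count specified in advance, with isolated points labeled as noise) can be seen in a minimal DBSCAN sketch. This is not the implementation of either reference. It assumes each divergence pair can be treated as a 2-D point; the function names, the eps and min_pts values, and the sample pairs are all hypothetical.

```python
import math

NOISE = -1  # label assigned to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise.

    The number of clusters is discovered from the data; only the
    neighborhood radius eps and the density threshold min_pts are given.
    """
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical pairs: two dense groups and one isolated outlier.
pairs = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19),
         (0.90, 0.80), (0.92, 0.83), (0.91, 0.79),
         (0.50, 3.00)]
labels = dbscan(pairs, eps=0.1, min_pts=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, no cluster count is passed in; the two groups are found from density alone and the isolated pair is reported as noise.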

Other pertinent prior art includes Zhang et al., “Machine Learning Testing: Survey, Landscapes and Horizons”, published June 2019 (cited as reference 1-U; referred to hereinafter as ‘Zhang’), which discloses testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). Anthony Samy et al. (US 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMIE H AUSTIN whose telephone number is (571)272-7363. The examiner can normally be reached Monday, Wednesday, Thursday 7am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Epstein, can be reached on (571)270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMIE H. AUSTIN
Examiner
Art Unit 3683