DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 have been rejected.
Examiner’s Comments
Claims are read in light of the Spec., and the Examiner maps them below for clarity. Claims 19 and 20 are not mapped because their mapping would duplicate that of other claims. Claims 9 and 18 are simple enough not to require mapping.
1 (Claim 10).	An artificial intelligence system configured to detect anomalies in transaction data sets (0002 “money laundering, fraud, or non-compliant transactions”, 0004 “commercial entities”, 0015), the artificial intelligence system comprising:
a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform modeling operations which comprise:
receiving a first data set (Fig. 2 Item “Data set B”) for training a first machine learning model to detect anomalies in the transaction data sets using a machine learning technique (0016 “apply one or more supervised ML algorithms to the first data set to train one or more micromodels”);
[training at least one micro-model using at least one second data set separate from the first data set] (Fig. 2 Item 5 “Micromodel Creation”; 0016 “micromodel allows for risk score detection using features from other data sets”; compare Fig. 2 Item “set B” with Fig. 2 Item “set A”; see also Fig. 2 Item “Data set A”; 0015 “second data set or auxiliary data set”)
accessing at least one micro-model previously trained using at least one second data set (Fig. 2 Item “set A”) (Fig. 2 Item 5 “Micromodel Creation”; 0016 “micromodel allows for risk score detection using features from other data sets”; compare Fig. 2 Item “set B” with Fig. 2 Item “set A”; see also Claims 5, 14 (originally filed));
determining risk scores from the first data set using the at least one micro-model (0016 “generate one or more risk scores for the second data set”);
enriching the first data set with the risk scores; and (0016 “risk scores are used to enriched the second data set”)
determining the first machine learning model for the enriched first data set using the machine learning technique (0017 “may determine an unsupervised ML algorithm that may be used to determine an ML model”).
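For illustration only, the sequence of operations mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the single-feature threshold micro-model, the mean-distance unsupervised model, the column name `risk_0`, and all data values are hypothetical and do not come from the Spec. The sketch only shows the order of operations (train a micro-model on set A, score set B, enrich set B, then determine a model on the enriched set B).

```python
from statistics import mean

def train_micromodel(rows, labels, feature):
    """Toy supervised micro-model (assumption): threshold on one feature,
    placed midway between the labeled normal and anomalous means."""
    normal = [r[feature] for r, y in zip(rows, labels) if y == 0]
    anomalous = [r[feature] for r, y in zip(rows, labels) if y == 1]
    threshold = (mean(normal) + mean(anomalous)) / 2
    # Risk score is 1.0 above the threshold, otherwise 0.0.
    return lambda row: 1.0 if row[feature] > threshold else 0.0

def enrich(rows, micromodels):
    """Return copies of the rows with one risk-score column per micro-model."""
    return [dict(r, **{f"risk_{i}": m(r) for i, m in enumerate(micromodels)})
            for r in rows]

def fit_unsupervised(rows, columns):
    """Toy unsupervised model (assumption): anomaly score is the summed
    absolute distance from the per-column mean."""
    centre = {c: mean(r[c] for r in rows) for c in columns}
    return lambda row: sum(abs(row[c] - centre[c]) for c in columns)

# Second (auxiliary) data set "A": labeled, used only to train the micro-model.
set_a = [{"amount": 10.0}, {"amount": 12.0}, {"amount": 900.0}, {"amount": 950.0}]
labels_a = [0, 0, 1, 1]
# First data set "B": unlabeled, the target of the first machine learning model.
set_b = [{"amount": 11.0}, {"amount": 9.0}, {"amount": 13.0}, {"amount": 1000.0}]

micromodel = train_micromodel(set_a, labels_a, "amount")
enriched_b = enrich(set_b, [micromodel])
first_model = fit_unsupervised(enriched_b, ["amount", "risk_0"])
scores = [first_model(r) for r in enriched_b]
```

In this sketch the last row of set B receives the only nonzero risk score and the largest anomaly score, and the original rows of set B are left unmodified.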

2 (Claim 11).	The artificial intelligence system of claim 1, wherein the modeling operations further comprise:
determining a second machine learning model for the first data set (Fig. 2 Item “Data set B”) using the machine learning technique (0034 “trains a machine learning model [for data set B]”), wherein the second machine learning model is for an unenriched data set corresponding to the first data set (0016 “risk scores are used to enriched the second data set”).

3 (Claim 12).	The artificial intelligence system of claim 2, wherein the modeling operations further comprise:
generating a model explanation output based on the first machine learning model and the second machine learning model, wherein the model explanation output comprises a comparison between each feature of each classification task within the first machine learning model and the second machine learning model (Fig. 2 Item 9; 0043; see also Claims 4, 13 (originally filed)).

4 (Claim 13).	The artificial intelligence system of claim 3, wherein generating the model explanation output comprises:
obtaining an importance ranking of each feature in each classification task of the first machine learning model and the second machine learning model; and (Fig. 2 Item 9; 0043)
averaging the importance ranking of each feature to each classification task to obtain the comparison (0043 “an average of those contributions”).
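As an illustrative sketch of the ranking-and-averaging operations mapped for claims 3 and 4, the following ranks each feature within each classification task and averages the ranks per model before placing the two models side by side. The function names, the feature names, and the importance values are hypothetical; the Spec. (0043) refers only to "an average of those contributions" and does not prescribe this particular procedure.

```python
from statistics import mean

def rank_features(importances):
    """Rank features by descending importance (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def average_rank(per_task_importances):
    """Average each feature's rank over all classification tasks of one model."""
    rankings = [rank_features(task) for task in per_task_importances]
    features = rankings[0].keys()
    return {f: mean(r[f] for r in rankings) for f in features}

# Hypothetical per-task importances; the values are made up for illustration.
enriched_model = [
    {"amount": 0.2, "country": 0.1, "risk_0": 0.7},
    {"amount": 0.5, "country": 0.1, "risk_0": 0.4},
]
unenriched_model = [
    {"amount": 0.8, "country": 0.2},
    {"amount": 0.6, "country": 0.4},
]

avg_enriched = average_rank(enriched_model)
avg_unenriched = average_rank(unenriched_model)
# Comparison: average rank in the enriched model next to the unenriched model,
# limited to the features that both models share.
comparison = {f: (avg_enriched[f], avg_unenriched[f])
              for f in avg_enriched if f in avg_unenriched}
```

The risk-score column exists only in the enriched model, so it is excluded from the side-by-side comparison in this sketch.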

5 (Claim 14).	The artificial intelligence system of claim 1, wherein, before accessing the at least one micro-model, the modeling operations further comprise:
receiving the at least one second data set for the at least one micro-model, wherein the first data set and each of the at least one second data set comprise segregated data sets for a federated training system (0021 “federated transfer learning”), and wherein the at least one second data set comprises at least one auxiliary data set; and (Fig. 2 Item 5 “Micromodel Creation”; 0016 “micromodel allows for risk score detection using features from other data sets”; compare Fig. 2 Item “set B” with Fig. 2 Item “set A”; see also Fig. 2 Item “Data set A”; 0015 “second data set or auxiliary data set”)
generating the at least one micro-model using the at least one second data set (0021 “federated transfer learning”) and at least one supervised machine learning technique (Fig. 2 Item 5 “Micromodel Creation”; 0016 “micromodel allows for risk score detection using features from other data sets”; compare Fig. 2 Item “set B” with Fig. 2 Item “set A”; see also Fig. 2 Item “Data set A”; 0015 “second data set or auxiliary data set”).

6 (Claim 15).	The artificial intelligence system of claim 5, wherein the risk scores are determined based on intersecting features between the first data set and the at least one second data set for the at least one micro-model (0023).
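The notion of "intersecting features" in claim 6 can be pictured with a short sketch. It assumes, hypothetically, that each data set exposes a list of column names and that the micro-model is a simple weighted sum; neither assumption comes from the Spec. (0023), and the column names and weights are invented.

```python
def intersecting_features(first_columns, second_columns):
    """Features present in both data sets, kept in the first data set's order."""
    shared = set(second_columns)
    return [c for c in first_columns if c in shared]

def score_on_shared(row, weights, shared):
    """Hypothetical linear micro-model restricted to the shared features."""
    return sum(weights[c] * row[c] for c in shared)

first_columns = ["amount", "hour", "merchant_risk"]      # first data set (set B)
second_columns = ["amount", "merchant_risk", "device"]   # second data set (set A)
shared = intersecting_features(first_columns, second_columns)

# Made-up weights learned on set A; "device" is never used on set B.
weights = {"amount": 0.001, "merchant_risk": 0.5, "device": 0.3}
row_b = {"amount": 200.0, "hour": 3, "merchant_risk": 0.8}
risk = score_on_shared(row_b, weights, shared)
```

Only the columns that both data sets share contribute to the risk score, which is what lets a micro-model trained on one data set be applied to the other.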

7 (Claim 16).	The artificial intelligence system of claim 5, wherein, before generating the at least one micro-model, the modeling operations further comprise:
pre-processing the at least one second data set (Fig. 2 Item “set A”; 0061 “auxiliary data set”) to reduce a first dimensionality of the at least one second data set; and (0061 “reduce dimensionality of the first transaction data set”)
sampling the pre-processed at least one second data set based on one or more anomalous transactions within the pre-processed at least one second data set, wherein the sampling is used for generating the at least one micro-model, and (0061 “sampled during pre-processing so that a sufficient number (e.g., all or a significant portion) of anomalous transactions are select with a small portion of the non-anomalous transactions.”; see also 0037 “sufficient anomalous transitions are selected”, 0037 “small amount (e.g., predefined threshold)”)
wherein, before determining the first machine learning model for the enriched first data set using the machine learning technique, the modeling operations further comprise:
reducing a second dimensionality of the enriched first data set (Fig. 2 Item “Data set B”) using a dimensionality reduction technique (Fig. 2 Item 7; 0042 “dimensionality reduction is further performed.”).
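The sampling operation mapped for claim 7 can be sketched as below. This is an assumption-laden illustration: the Spec. passages quoted above speak of selecting "all or a significant portion" of anomalous transactions with a "small portion" of non-anomalous ones, but give no numeric value, so the `normal_fraction` parameter, the seed, and the synthetic rows are hypothetical.

```python
import random

def sample_for_micromodel(rows, is_anomalous, normal_fraction, seed=0):
    """Keep every anomalous row plus a random fraction of non-anomalous rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    anomalous = [r for r in rows if is_anomalous(r)]
    normal = [r for r in rows if not is_anomalous(r)]
    keep = round(len(normal) * normal_fraction)
    return anomalous + rng.sample(normal, keep)

# Synthetic transactions: every 50th row is flagged as fraud (20 of 1000).
rows = [{"id": i, "fraud": i % 50 == 0} for i in range(1000)]
sample = sample_for_micromodel(rows, lambda r: r["fraud"], normal_fraction=0.05)
```

With these hypothetical inputs the sample contains all 20 anomalous rows and 49 of the 980 non-anomalous rows.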

8 (Claim 17).	The artificial intelligence system of claim 5, wherein each of the at least one micro-model is trained using one of a different algorithm (0041 “different models may be utilized”) or a different data set for the at least one second data set, and wherein the each of the at least one micro-model is not optimized after training (0022 “different data sets”).

Additionally, Examiner notes that “machine learning” for claimed “machine learning technique” found in at least claims 1 and 10 is a term of art that is broad.1
Similarly, the term “micromodel” is a term of art used within the prior art in conjunction with anomaly detection. This term of art can be found in “Model Aggregation for Distributed Content Anomaly Detection” by Whalen et al. Compare Spec. at TITLE (ARTIFICIAL INTELLIGENCE SYSTEM FOR ANOMALY DETECTION IN TRANSACTION DATA SETS). Specifically, Whalen uses this term of art in Section 3.7 on page 37. Whalen cites back to [8] Cretu et al., “Casting out Demons: Sanitizing Training Data for Anomaly Sensors”. Compare Spec. at TITLE (same). Cretu provides a good definition for micromodel as an anchor2 in Sections 1.4 and 2.2.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-9 of the invention are directed to non-statutory subject matter.  
Claims 1 to 9 do not fall within at least one of the four categories of patent eligible subject matter because, given the broadest reasonable interpretation (BRI), the claimed computer-readable medium may be interpreted as a signal. See, e.g., In re Nuijten, 84 USPQ2d 1495 (Fed. Cir. 2007). Under BRI, claim 1 recites: “a [transitory or nontransitory] computer readable medium”.
By comparative analysis, claim 19 (originally filed and therefore part of the Spec.) by contrast discloses “non-transitory computer-readable medium.” Reading in light of the Spec. in the Written Description, para. 0068 discloses: “Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.” Transmission media is further defined in 0068 as: “In one example, transmission media may take the form of acoustic or light waves, such as those generated during radio wave, optical, and infrared data communications.”
Therefore, claim 1 under BRI is a signal per se.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7-8 and 16-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

Claimed Genus of “optimized” is a Relative Term or the Genus is without Bounds.
Claim 8 (and similarly claim 17) recites: “wherein each of the at least one micro-model is trained using one of a different algorithm or a different data set for the at least one second data set, and wherein the each of the at least one micro-model is not optimized after training.” 
Broadly, paragraph 0016 of the Spec. discloses: “[T]he micromodels are not required to be optimized to refine to a fully verified ML model.” Id. (emphasis added); see also Spec. at 0061 (high level). Para. 0023 goes deeper with the disclosure of: “Micromodels 121 are trained to have multiple hyper-parameter settings where instead of optimizing certain hyper-parameters…[these] are instead trained and selected based on data set and scenario.” (Emphasis added.) Lastly, the originally filed claims, which are part of the Spec., are at a higher level of abstraction than para. 0023.
Genus (Claimed) | Species (Spec.)
Optimization | Hyperparameter Optimization


As additional evidence, “Hyperparameter Optimization” is a term of art that can be applied to, for example, Neural Networks as evidenced by the work of Hinz et al. in “Speeding Up the Hyperparameter Optimization of Deep Convolutional Neural Networks.”3
Following MPEP 2173.05(b), claims may be definite if the specification provides examples or teachings that can be used to measure a degree. MPEP 2173.05(b)(I) (citing Interval Licensing LLC v. AOL, Inc., 766 F.3d 1364, 1371-72, 112 USPQ2d 1188, 1193 (Fed. Cir. 2014)). While the Spec. does provide the example of Hyperparameter Optimization (a term of art), the claims are originally filed, and therefore part of the Spec. As such, the claimed language is at the Genus level, not Species level. This genus language is without metes and bounds. Put another way, reading the claims as “hyperparameter optimization” would err on the side of reading claim limitations from the Spec. into the claim language. MPEP 2111.01(II)(reading limitations from Spec. [which includes originally filed claims] into claim).
Given that the Spec. provides no other examples or algorithms (i.e., a representative number of Species to ascertain the Genus) and given that claimed language is at the Genus level,4 the language is indefinite as relative. See MPEP 2173.05(b); Intellectual Ventures I LLC v. T-Mobile USA, Inc., 902 F.3d 1372, 1381 (holding that claimed “optimize” and “optimizing” are relative terms); see also MPEP 2173.04 (“But a genus claim could be interpreted in such a way that it is not clear which species [plural] are covered would be indefinite[.]”) (emphasis added).

Claimed “sampling” is a Relative Term because the Spec. Provides No Examples as Anchors.
Claims 7 and 16 recite: “sampling the pre-processed at least one second data set based on one or more anomalous transactions….” Support may be found in 0022, 0036, 0037, 0040, 0058, and 0061. None of these paragraphs provides examples, and the presence of examples is a factor to be considered in determining definiteness. MPEP 2173.05(b)(I).
Further, paragraph 0037 discloses: “A sampling step may be performed [such that] sufficient anomalous transactions are selected.” (Emphasis added.) Para. 0037 goes on to disclose that “sampling…is conducted where all or a significant portion [of] anomalous transactions are selected[.]”5 Similarly, paragraph 0061 discloses: “[T]he first transaction dataset is sampled…so that a sufficient number (e.g., all or a significant portion)…are selected[.]” (Emphasis added.)
Reading in light of the Spec., the claimed term is relative because the Spec. provides no examples to anchor it. MPEP 2173.05(b)(I).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 9, 10, 11, and 18 are rejected under AIA  35 U.S.C. 103(a) as being unpatentable over (A) Cretu US-20080262985-A1 in view of (B) Fogel US-20090319346.

Regarding Claims 1 and 10 [n.6]
[preamble] An artificial intelligence system configured to detect anomalies in transaction data sets, the artificial intelligence system comprising:
a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform modeling operations which comprise:
[a] receiving a first data set for training a first machine learning model to detect anomalies in the transaction data sets using a machine learning technique;

Examiner is discussing the [preamble] and element [a] together. In the preamble, the claimed “anomalies” are taught by the language of normal and abnormal in Cretu (0022 “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”). Cretu’s solution to a data cleaning problem is: creating a model that may be used for detection purposes. (Fig. 1 Items 160, 170; 0024 showing equations for Model 150, where Model 170 is the “anomaly detection model”; see also 0020 (explaining that 195 “can [also] be used to detect anomalies”)). As such, the preamble language of “detect” is taught. Similarly, element [a]’s language of “to detect” is also taught by the same mapping. See also Cretu at TITLE “…ANOMALY DETECTION MODELS”, 0020 “model 170 and/or model 195 can be used to detect anomalies”.
Element [a] differs from the preamble in its “receiving” operation on the data, which is taught by Cretu at (Fig. 1 Items 125, 126 (small units); 0020 “training data subset 126”, 0021 (on training with “110 (T)”)). Lastly, the limitation of “for training a first machine learning model” is a type of intended use, which the Examiner is taking as limiting. Support for this limiting language can be found here: (Fig. 1 Items 125, 126 (small units); 0020 “training data subset 126”, 0021 (on training with “110 (T)”)).7
Cretu is lacking in that there is no mention of a financial “transaction” as in the claimed “transaction data set”, although Cretu’s “TECHNICAL FIELD” is broad, covering anomaly detection in general. Fogel remedies this deficiency.
Like Cretu, Fogel uses anomaly detection. When an anomaly is detected, a notification is sent to the user to apprise them of the risk. (Fig. 4 Item 430 (anomaly detection), 432 (notify user), Fig. 5 (showing risk level); 0056 “detect[ and] notify the user”, 0058 “risk score”; Claim 57, preamble “evaluating the risk of fraud”).
Fogel teaches data from entities (Fig. 3 Items 306, 308, 310; 0026). This data may be used in fraud detection for a transaction (0006, 0007, 0028 “fraud…with a credit transaction”). Therefore, the type of data of “transaction data” is taught in the preamble and in element [a]. 

[b] accessing at least one micro-model previously trained using at least one second data set separate from the first data set;

Cretu teaches: [b] accessing at least one micro-model (Fig. 1 Item 130; 0021 referred to as “135 (M)”) previously trained using at least one second data set separate from the first data set (Fig. 1 Item 125 “Training Data Subsets”, 0021 “ssi is the subset”);

[c] determining risk scores from the first data set using the at least one micro-model;
[d] enriching the first data set with the risk scores; and

Elements [c] and [d] are discussed together.
For element [c], Cretu teaches generating a score following the micromodel (0022 “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”, 0023 using “a score can be calculated” (emphasis added)). For element [d] and the operation of “enriching,” this language is understood in light of the Spec. (0016 “risk scores are used to enriched the second data set”). This language is broad under BRI. Cretu’s Species of data labeling on the specified dataset therefore meets this Genus language here: (0022 “labeled data set with each training dataset item” (emphasis added) & “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”, Lj,i is the generated label).
As with elements [a] and [b], Cretu is silent on financial uses for data processing in elements [c] and [d]. Specifically, the claimed “risk scores” are not taught by Cretu. Fogel remedies this deficiency with the following: (Fig. 4 Item 430 (anomaly detection), 432 (notify user), Fig. 5 (showing risk level); 0056 “detect[ and] notify the user”, 0058 “risk score”; Claim 57, preamble “evaluating the risk of fraud”). Therefore, in combination with Cretu, the whole limitation “risk score” is taught in elements [c] and [d].
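To make the Cretu mapping for elements [c] and [d] concrete, the labeling relation Lj,i=TEST(Pj,Mi) can be sketched as follows. The sketch is illustrative only. The range-check micro-models, the item values, and in particular the scoring rule (fraction of micro-models voting "abnormal") are assumptions; the cited passage states only that "a score can be calculated" and the exact formula is not reproduced in this action.

```python
def apply_test(item, micromodel):
    """L[j][i] = TEST(P_j, M_i): 0 if micro-model i deems item j normal, else 1."""
    return 0 if micromodel(item) else 1

def label_matrix(items, micromodels):
    """One row of labels per item, one column per micro-model."""
    return [[apply_test(p, m) for m in micromodels] for p in items]

def score_items(labels):
    """Assumed scoring rule: fraction of micro-models that voted 'abnormal'."""
    return [sum(row) / len(row) for row in labels]

# Hypothetical micro-models: each accepts values inside the range it "learned".
micromodels = [
    lambda v: 0 <= v <= 100,
    lambda v: 0 <= v <= 120,
    lambda v: 10 <= v <= 90,
]
items = [50, 110, 500]

labels = label_matrix(items, micromodels)
item_scores = score_items(labels)
# "Enriching" in the broad sense discussed above: attach the score to each item.
enriched = [{"value": p, "risk_score": s} for p, s in zip(items, item_scores)]
```

Under these assumptions an item accepted by every micro-model scores 0, an item rejected by every micro-model scores 1, and mixed votes fall in between.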

[e] determining the first machine learning model for the enriched first data set using the machine learning technique.

Element [e] is understood as a whole in view of element [a]’s “for training a first machine learning model” which the Examiner is taking as limiting as noted above. The use of this model (i.e., detection) is taught as follows: (Fig. 1 Items 160, 170; 0024 showing equations for Model 150, where Model 170 is the “anomaly detection model”; see also 0020 (explaining that 195 “can [also] be used to detect anomalies”)).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the filing date to modify the micromodel anomaly system of Cretu by altering the datasets to be used in financial transactions, like those found in Fogel’s anomaly detection, in order to use machine learning in fraud analysis, since “human experts” have shortcomings when it comes to fraud detection in financial applications. Fogel at 0003-0007. Put another way, using machine learning methods on transaction sets would be obvious because it would remove the manual inspection activity performed by a data analyst. MPEP 2144.04(III) (Automating a Manual Activity).

Regarding claims 2 and 11 Cretu teaches: 
determining a second machine learning model for the first data set using the machine learning technique (Fig. 2 Item 220; 0025), wherein the second machine learning model is for an unenriched data set corresponding to the first data set (0025-0026).

Regarding claims 9 and 18: 
While Cretu teaches multiple data subsets (Fig. 1 Item 125), Fogel teaches “one or more data sources” from the data origin. (Fig. 3; 0027.) 
Cretu teaches “unannotated data sets” before processing the micromodel: (0022 “labeled data set with each training dataset item” (emphasis added) & “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”, Lj,i is the generated label). Lastly, “fraudulent transactions” are taught by Fogel (0029; Claim 57, preamble).

Claims 3-4 and 12-13 are rejected under AIA  35 U.S.C. 103(a) as being unpatentable over (A) Cretu and (B) Fogel in view of (C) Inakoshi US-20210117830-A1.
Regarding claims 3 and 12 Cretu teaches first and second model generations, and compares a plurality of models:
generating a model… based on the first machine learning model and the second machine learning model (0025-0026), wherein the model explanation output comprises a comparison (Fig. 2 Item 210 “Compare Models”; 0025-0026 (on comparing))…of each classification task within the first machine learning model and the second machine learning model (0025 “common set of features 230”)
Neither Cretu nor Fogel teach “explanation output” (which is akin to the term of art “XAI” or “Explainable Artificial Intelligence” found in Inakoshi (0006)) based on features. Inakoshi remedies with: (Fig. 3 Item 22; 0045-0046 “explainable model generator 22”; EQ1 at 0046); see also (0097 “contribution of the features”).
            
(EQ1)  $f(x) = g(x') = \phi_0 - \sum \phi_i x_i'$, where $\phi_i$ is the contribution.

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the filing date to take the combined teachings of Cretu-Fogel, specifically the post-processing found in Cretu of comparing models, and substitute in XAI methods found in Inakoshi in order to determine which features are causally related to the output. Inakoshi at 0005–0006.

Regarding claims 4 and 13 Cretu teaches model classification: 
in each classification task of the first machine learning model and the second machine learning model; and (0025-0026)… to each classification task to obtain the comparison (0025-0026).
Neither Cretu nor Fogel discloses explanation machine with ranking based on averages. Inakoshi teaches SHAP values8: 
obtaining an importance ranking of each feature (0097 “contribution of the features”) and (0097 “average ranks….median of the numerical ranks of the SHAP values”).

Claims 5, 6, 8, 14, 15, and 17 are rejected under AIA  35 U.S.C. 103(a) as being unpatentable over (A) Cretu and (B) Fogel in view of (D) Sinn US-20210312336-A1.
Regarding claims 5 and 14 Cretu teaches: 
receiving the at least one second data set for the at least one micro-model, wherein the first data set and each of the at least one second data set comprise segregated data sets (Fig. 1 Items 136; 0021 “ssi is the subset”)…and wherein the at least one second data set comprises at least one auxiliary data set; and (0021-0022)
generating the at least one micro-model using the at least one second data set (0021-0022).

Cretu does not teach (i) a Species of machine learning of supervised learning and (ii) federated systems (which is a term of art):
Fogel teaches supervised learning:
and at least one supervised machine learning technique (0051 “supervised learning”)
Neither Cretu nor Fogel teach federated learning. Sinn teaches:
for a federated training system (TITLE; Fig. 5A (showing whole system); 0010 (explaining Fig. 5), 0067)…

Taking into account the Graham factor of “content” (the scope and contents of the prior art), Cretu discloses segmented data called “Training Data Subsets.” Similarly, Sinn discloses a similar structure in the “AGGREGATOR” of Fig. 5A, which yokes together local data from P1 to PN, whereby the yoked (aggregated) data is segmented by user.
As such, the modification is simple. It is obvious to take Cretu’s data structure and allow the sourcing from more than one user. A POSITA would be motivated to source from more than one user in order to gather data “[a]cross boundaries” from different users. (Sinn at 0014; see also 0019-0020 (multiple banks).) Sourcing data this way would allow for a type of resource pooling. Sinn at 0020, 0026.
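The general idea of sourcing from several parties through an aggregator can be pictured with a generic federated-averaging sketch. This is not Sinn's disclosed method and is offered only as an assumption-based illustration: the one-parameter "model" (a mean), the sample-count weighting, and the party names and values are all hypothetical.

```python
def local_update(local_data):
    """Each party fits a toy one-parameter model (the mean) on its own data
    and reports only the parameter and its sample count."""
    return sum(local_data) / len(local_data), len(local_data)

def aggregate(updates):
    """Aggregator combines the parameters, weighted by each party's sample count."""
    total = sum(n for _, n in updates)
    return sum(param * n for param, n in updates) / total

# Hypothetical local data held by parties P1..P3; raw values stay with each party.
parties = {"P1": [1.0, 3.0], "P2": [10.0], "P3": [4.0, 4.0, 4.0]}
updates = [local_update(data) for data in parties.values()]
global_param = aggregate(updates)
```

In this toy case the aggregated parameter equals the mean over all parties' data, even though the aggregator only ever receives per-party summaries.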

Regarding claims 6 and 15 Cretu teaches: 
wherein the scores are determined based on intersecting features between the first data set and the at least one second data set for the at least one micro-model (0025 “Model 250” is based in part on intersection of “Models 220 ∩Model 200 [as opposed to the UNION operator ∪]”).
Cretu does not teach “risk score”. Fogel remedies with: (Fig. 4 Item 430 (anomaly detection), 432 (notify user), Fig. 5 (showing risk level); 0056 “detect[ and] notify the user”, 0058 “risk score”). 

Regarding claims 8 and 17 Cretu teaches: 
wherein each of the at least one micro-model is trained (Fig. 1 Item 125 “Training Data Subsets”, 0021 “ssi is the subset”)…for the at least one second data set (Fig. 1 Item 125 “Training Data Subsets”, 0021 “ssi is the subset”), and wherein the each of the at least one micro-model is not optimized after training (Compare Fig. 1 Item 135 (after creation) with Fig. 1 Items 120,125 (before creation with training)).
Cretu does not teach different algorithms and datasets. Sinn remedies with local data sets and functions based on features within those datasets: using one of a different algorithm or a different data set (Fig. 5A Item “x” of “Local Data” inside P’s; 0068 “local data”, 0070 “feature functions”).

Claims 19 and 20 are rejected under AIA  35 U.S.C. 103(a) as being unpatentable over (A) Cretu in view of (B) Fogel in view of (C) Inakoshi US-20210117830-A1 in view of (D) Sinn US-20210312336-A1.
Regarding claim 19, Cretu teaches:
A non-transitory computer-readable medium having stored thereon computer-readable instructions executable to detect anomalies…based on one or more machine learning models, the computer-readable instructions executable to perform modeling operations which comprises (0022 “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”):
receiving a first data set for training a first machine learning model (Fig. 1 Items 125, 126 (small units); 0020 “training data subset 126”, 0021 (on training with “110 (T)”)) to detect anomalies in the…data sets (TITLE “…ANOMALY DETECTION MODELS”, 0020 “model 170 and/or model 195 can be used to detect anomalies”) using a machine learning technique; (0024 showing equations for Model 150)
accessing at least one micro-model previously trained (Fig. 1 Item 130; 0021 referred to as “135 (M)”) using at least one second data set separate from the first data set (Fig. 1 Item 125 “Training Data Subsets”, 0021 “ssi is the subset”)…
determining…scores from the first data set using the at least one micro-model; enriching the first data set with the…scores; (0022 “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”, 0023 using “a score can be calculated” (emphasis added))
determining the first machine learning model (Fig. 1 Items 160, 170; 0024 showing equations for Model 150, where Model 170 is the “anomaly detection model”; see also 0020 (explaining that 195 “can [also] be used to detect anomalies”)) for the enriched first data set using the machine learning technique; (0022 “labeled data set with each training dataset item” (emphasis added) & “Lj,i=TEST(Pj,Mi)” for “0 if…normal or 1 [if] abnormal”, Lj,i is the generated label)
determining a second machine learning model for the first data set using the machine learning technique (Fig. 2 Item 220; 0025), wherein the second machine learning model is for an unenriched data set corresponding to the first data set; and (0025-0026)
…based on the first machine learning model and the second machine learning model (0025-0026), wherein the model…comprises a comparison (Fig. 2 Item 210 “Compare Models”; 0025-0026 (on comparing)) between each feature of each classification task within the first machine learning model and the second machine learning model (0025 “common set of features 230”).

Cretu does not teach the following three limitations:
in transaction data sets…risk scores…
wherein the first data set and each of the at least one second data set comprise segregated data sets for a federated training system;
generating a model explanation output… explanation output
Fogel teaches:
in transaction data sets (Fig. 3 Items 306, 308, 310; 0026)…risk scores… (Fig. 4 Item 430 (anomaly detection), 432 (notify user), Fig. 5 (showing risk level); 0056 “detect[ and] notify the user”, 0058 “risk score”; Claim 57, preamble “evaluating the risk of fraud”)
Neither Cretu nor Fogel teaches (i) a federated learning system with split datasets or (ii) explanation machine learning.
Inakoshi teaches explanation machine learning:
generating a model explanation output (Fig. 3 Item 22; 0045-0046 “explainable model generator 22”; EQ1 at 0046, 0097)… explanation output (Fig. 3 Item 22; 0045-0046 “explainable model generator 22”; EQ1 at 0046, 0097)
            
(EQ1) f(x) = g(x') = φ_0 - Σ φ_i x'_i, where φ_i is the contribution.

Neither Cretu, Fogel, nor Inakoshi teaches federated learning.
Sinn teaches:
wherein the first data set and each of the at least one second data set comprise segregated data sets for a federated training system; (TITLE; Fig. 5A (showing whole system); 0010 (explaining Fig. 5), 0067)

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the filing date to modify the micromodel anomaly system of Cretu by altering the datasets to be used in financial transactions, like those found in Fogel’s anomaly detection, in order to use machine learning in fraud analysis, since “human experts” have shortcomings when it comes to fraud detection in financial applications. Fogel at 0003-0007. Put another way, using machine learning methods on transaction sets would be obvious as it would remove the data analyst’s manual activity of inspecting data. MPEP 2144.04(III) (Automating a Manual Activity).
Following the combination above, it would have been obvious to one of ordinary skill in the art at the time of the filing date to take the combined teachings of Cretu-Fogel, specifically the post-processing found in Cretu of comparing models, and substitute in the XAI methods found in Inakoshi in order to determine which features are most causally related to the output. Inakoshi at 0005–0006.
Taking the Graham factor of “content” into account, Cretu discloses a segmented data structure called “Training Data Subsets.” Similarly, Sinn discloses similar structural content in the “AGGREGATOR” of Fig. 5A, which yokes together local data from P1 to PN, whereby the aggregated data is segmented based on each user.
As such, the modification is simple. It is obvious to take Cretu’s data structure and allow sourcing from more than one user. A POSITA would be motivated to use this model in order to gather data “[a]cross boundaries” from different users. (Sinn at 0014; see also 0019-0020 (multiple banks)). Gathering more data through multiple sources would allow for a type of resource pooling. Sinn at 0020, 0026.
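For illustration only, the idea of pooling across segregated data sets without moving the raw data can be sketched as follows. The sketch uses generic federated averaging on a one-parameter model (a mean); this is a swapped-in textbook technique and is not asserted to be the method disclosed in Sinn. The party data, function names, and the weighting by sample count are all hypothetical.

```python
from typing import List, Sequence, Tuple


def local_fit(values: Sequence[float]) -> Tuple[float, int]:
    """Party-side step: fit a one-parameter model (the mean) on local data.

    Only the fitted parameter and the sample count leave the party;
    the raw values stay segregated.
    """
    return sum(values) / len(values), len(values)


def aggregate(updates: Sequence[Tuple[float, int]]) -> float:
    """Aggregator-side step: average parameters weighted by sample count."""
    total = sum(count for _, count in updates)
    return sum(param * count for param, count in updates) / total


# Hypothetical segregated data sets held by three parties (P1..P3).
party_data: List[List[float]] = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]

updates = [local_fit(values) for values in party_data]
global_param = aggregate(updates)
```

In this toy case the weighted average equals the mean that would be obtained by pooling all values centrally, which is the sense in which the parties share resources while each data set remains with its owner.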

Viewing (4) References as a Whole 
Following the combination of the (4) reference rejection, the current modification includes a small coloration of taking Cretu’s dataset and applying it to financial endeavors. Second, there is a structural modification to Cretu’s sub-set of data and how the data is sourced prior to the micromodel. These modifications are a type of preprocessing. The remaining teachings (i.e., “generating”), viewed as a whole, are still obvious to integrate as the claimed “explanation output” occurs in a post-processing phase. As such, the principal function (i.e., overall flow from pre-processing, in-processing (i.e., training), and to post-processing) of Cretu would NOT change. MPEP 2145(III) (citing MPEP 2143.01).

Regarding claim 20, Cretu teaches:
wherein the at least one micro-model is generated using the at least one second data set (0021-0022)…wherein the…scores are determined based on intersecting features between the first data set and the at least one second data set for the at least one micro-model (0025 “Model 250” is based in part on intersection of “Models 220 ∩Model 200 [as opposed to the UNION operator ∪]”), and wherein the at least one second data set comprises at least one auxiliary data set (0021-0022).
Cretu does not teach “risk scores” and a specialized machine learning method of supervised learning. Fogel teaches:
risk score (Fig. 4 Item 430 (anomaly detection), 432 (notify user), Fig. 5 (showing risk level); 0056 “detect[ and] notify the user”, 0058 “risk score”; Claim 57, preamble “evaluating the risk of fraud”)…and at least one supervised machine learning technique (0051 “supervised learning”)…
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS G KERITSIS whose telephone number is (313)446-6591.  The examiner can normally be reached on Mon-Fri 9:00-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hayes John can be reached on (571) 272-6708.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENNIS G KERITSIS/Examiner, Art Unit 3685

        1 Steven M. Kaplan, WILEY ELECTRICAL & ELECTRONICS ENGINEERING DICTIONARY 605 (2004) (“Wiley EE Dictionary”), p. 441 (defined as “The ability of devices and systems to apply knowledge that has been previously programmed and recorded, in order to analyze and better face new situations.”) (enclosed as NPL).
        2 This is possibly the earliest disclosed whitepaper reference for this term of art.
        3 A copy will be furnished and cited in the PTO-892.
        4 See also The American Heritage Dictionary at optimize, def. (“1. To make as perfect or effective as possible. 2. Computers To increase the computing speed and efficiency of (a program), as by rewriting instructions.”), available at https://www.ahdictionary.com/word/search.html?q=optimize .
        5 Paragraph 0037 does go on to provide examples both of “MSE” and “XGBoost”; however, these relate to normalization and imputation (respectively), not the step of sampling. See also 0036 (outlining other preprocessing steps in second sentence), 0040 (outlining other preprocessing steps in first sentence).
        6 Independent Claim 19 is narrower.
        7 Examiner is additionally viewing the claim as a whole. That is, element [a]’s “for a first machine learning model” is viewed in terms of element [e] whereby the same language of “first machine learning model” shows up again. Detailed discussion of elements [a] and [e] together can be found below.
        8 The Species of “SHAP” and “LIME” are found in the Spec. 0017. Therefore, the Species teaches the Genus.