DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This Office Action is in response to the amendments filed.
The amendments filed have been accepted and are hereby entered.
Claims 1-2, 5-6, 8-10, 12-13, and 15-20 have been amended.
No claims have been canceled or withdrawn.
No claims have been added.
Claims 1-20 are pending and have been examined.
This action is made FINAL.









Response to Amendment
The 35 U.S.C. § 112(b) rejection of claim 9, directed to “current payment transaction”, is withdrawn in view of the claim amendment received.

Applicant’s arguments with respect to the 35 U.S.C. § 101 rejection of claims 1-20 have been fully considered but are not persuasive.

Applicant’s arguments with respect to the 35 U.S.C. § 103 rejection of amended claims 1-20 have been fully considered and are deemed persuasive; however, they are moot in view of new grounds of rejection.

As required by M.P.E.P. § 707.07(f), a response to these arguments appears below.










Response to Arguments
With respect to the § 101 arguments, Applicant asserts the claims are patent eligible for the following reasons, with which Examiner respectfully disagrees (analysis provided inline, below):

Page 9 of Remarks, addressing step 2A Prong II: […] the present claims necessarily require the use of computer technology and specific computing devices so as to be "rooted in computer technology" and therefore are not directed to an abstract idea.

Examiner respectfully disagrees with the above argument. With respect to the “rooted in computer technology” statement, Examiner believes Applicant’s argument relies upon DDR Holdings v. Hotels.com, which found certain claims patent eligible as being “necessarily rooted in computer technology in order to overcome a problem specifically arising in the realm of computer networks”. Examiner respectfully submits that this rationale is not applicable to the instant claims, as the additional elements are merely applied (as described in the § 101 rejection below; see MPEP 2106.05(f)). Examiner respectfully maintains this position in view of at least ¶¶57-58 of Applicant’s specification, which suggest that the computers (e.g., additional elements) implementing the process may be general-purpose computers (e.g., a personal computer). More specifically, Examiner fails to see how the claim limitations are directed to overcoming a problem arising in the realm of computer networks, and instead contends that the problem being solved is detecting transaction fraud via fraud models. Examiner also fails to see how the claimed details of the recurrent neural network (RNN) of the fraud model distinguish it from any generic, commercial off-the-shelf (COTS) RNN, as each of the following claimed elements is a feature or process known to define, or otherwise be included in, generic COTS RNNs:

(1) Training of an RNN with features based on previous features/input – Examiner notes that any generic COTS RNN (or, more generally, any COTS neural network) is trained with (typically labeled) data, and that an RNN retains information from previous calculations when calculating a given (current) output.

(2) A plurality of hidden nodes and corresponding encoded states – Examiner notes that generic COTS RNNs are generally known to comprise a hidden layer with nodes comprising encoded hidden states and cell states.

(3) Output nodes / layers – See (2) above.

(4) Nodes previously calculated based on prior (input) data – See (1) above. Examiner notes that generic COTS RNNs generally use a hidden layer comprising nodes to calculate their predictions / estimations based on prior data in a sequence.

(5) Storing RNN model calculations (in memory) – It is generally known that generic COTS RNN models perform calculations in the prediction/training process, and that the results of any such process are stored in memory, because generic COTS RNNs are computer implemented.

(6) Encoding a new set of encoded states for a plurality of nodes based on the current / most recent input (i.e., updating encoded states in response to new inputs) – Examiner notes that any generic COTS RNN comprises a hidden layer having a plurality of nodes with encoded hidden states and cell states.

(7) Addressing claim 2, a feedback loop that updates one or more hidden states in a hidden layer for a plurality of hidden nodes – Examiner notes, as suggested even by ¶2 of Applicant’s specification, that recurrent neural networks, by definition, have a feedback loop in their hidden layer, as opposed to a simpler feed-forward design (e.g., neural network (NN)). That is, the limitation fails to distinguish the claimed RNN from any generic COTS RNN.

Similarly, Examiner fails to see how the claimed details of the claimed cache distinguish it from any other generic, commercial off-the-shelf cache, as none of the following functionalities of the claimed cache differs from those of a generic cache:

Cache storing (data)

Updating Cache (i.e., updating data in cache)

Caches using address spaces (i.e., addressing via a range of memory addresses) – Examiner also notes this aspect addresses Applicant’s claim 8, as it is known that computers determine address spaces when accessing cache, based on what information is to be processed by computer.

Furthermore, Examiner respectfully submits that caches are generally known to speed up computer processing via faster data retrieval (relative to slower read operations in persistent/main memory), and further maintains that the cache in both the claims and Applicant’s specification fails to distinguish itself from any generic, general-purpose cache, as the claims fail to provide any details about how the cache operates / caches (or other pertinent details such as structure / architecture), beyond merely stating that the cache stores data, is updated, and utilizes address spaces. Moreover, although Applicant’s claims recite a “cache”, there is effectively no difference between the claimed functionality performed by the cache and that of any generic computer memory, as the claims fail to provide any cache-specific implementation details, architecture, or caching methodologies that differentiate a cache from a generic, general-purpose memory.

In view of the generality of the additional elements, and the solved problem being directed to abstract subject matter (e.g., detecting transaction fraud based on individual information), Examiner respectfully maintains that, while the claims are limited to embodiments including computers, merely adding a generic computer, generic computer components, or a programmed computer to perform generic computer functions does not automatically overcome an eligibility rejection. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 223-24, 110 USPQ2d 1976, 1983-84 (2014). See In re Alappat, 33 F.3d 1526, 1545 (Fed. Cir. 1994); In re Bilski, 545 F.3d 943 (Fed. Cir. 2008) (see MPEP §2106.05(b)).

Page 9 of Remarks, addressing Step 2A Prong II: Applicant submits that the limitations of the amended claims integrate the exception into a practical application, and thus the amended claims are statutory under Step 2A, Prong Two. […] the limitations relate to a specific usage (e.g., intelligent recurrent neural network (RNN) usage and optimization) in a specific situation (e.g., when analyzing large data sets for transactions having many features for feature inputs at an input layer) to provide a particular practical application that improves technology (e.g., to optimize data storage and RNN run-time resource usage through the use of encoded hidden states stored in a cache in place of large transaction data sets).

Examiner respectfully disagrees with the above argument, as Examiner fails to see the claim limitations reciting anything other than generic computer components / algorithms (e.g., RNN), which are merely applied.

Furthermore, with respect to “optimize data storage” and “RNN run-time resource usage”, Examiner notes that caches generally provide improved efficiency of data retrieval (and correspondingly provide more timely data processing, since retrieval is a bottleneck). Examiner fails to see how Applicant is improving a technology or technical field, particularly when Applicant fails to provide any details of the additional elements (e.g., the cache and RNNs) beyond what is generic / inherent for general-purpose caches/memory and generic COTS RNNs, and beyond the general benefits they are known to inherently provide when implemented properly.

In other words, Examiner fails to find any improvement in a technical field (see MPEP 2106.05(a)), as the claims fail to differentiate the general benefits of the merely applied computer components from the purported improvement of the invention, specifically because RNNs are known to be run in memory, and caches are generally known as a form of memory that may speed up data retrieval (and therefore improve processing times / resource utilization).

With respect to the statement, “[optimize] RNN run-time resource usage through the use of encoded hidden states stored in a cache in place of large transaction data sets”, Examiner, in viewing the claims, likewise fails to find any improvement in a technical field (see MPEP 2106.05(a)), for the same reason: the claims fail to differentiate the general benefits of the merely applied components from the purported improvement.

Furthermore, Examiner fails to see how the claims are commensurate in scope with Applicant’s basis for improvement. The claims, as written, do not suggest, require, or otherwise indicate that “large transaction data sets” are omitted or unnecessary in Applicant’s invention. Examiner respectfully submits that training data is generally used in RNN processes: the encoded state values (e.g., hidden and cell states) are known to be determined via training, so Examiner fails to see how large transaction data sets are, for lack of a better phrase, ‘done away with’ in Applicant’s invention. While Examiner concedes that omitting large training data sets when processing individual inputs may amount to successful integration, as an improvement to the functioning of a computer or other technology (MPEP 2106.05(a)), Examiner respectfully maintains that the claims are not limited to the narrower embodiment (i.e., a system free of large transaction data sets in per-input processing) that Applicant asserts as the basis for improvement. The claims do not preclude embodiments where both the cache of hidden states and “large transaction data sets” are used, and therefore fail to distinguish themselves from embodiments merely applying generic / COTS computer elements. Examiner notes this analysis is required by the 2019 PEG (see page 13 of the 2019 PEG, under § “An Improvement in the Functioning of a Computer or an Improvement to other Technology or Technical Field” (emphasis added)):

During examination, the examiner should analyze the “improvements” consideration by evaluating the specification and the claims to ensure that a technical explanation of the asserted improvement is present in the specification, and that the claim reflects the asserted improvement. Generally, examiners are not expected to make a qualitative judgment on the merits of the asserted improvement. If the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 C.F.R. § 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification. For example, in response to a rejection under 35 U.S.C. § 101, an applicant could submit a declaration under § 1.132 providing testimony on how one of ordinary skill in the art would interpret the disclosed invention as improving technology and the underlying factual basis for that conclusion.

Examiner respectfully submits that the aforementioned rationales are sufficient to show that the 35 U.S.C. § 101 rejection is proper. Arguendo, with respect to the claims storing the hidden states / hidden layer values of the RNN in cache, Examiner notes that caching the hidden layer calculations (e.g., hidden states) adds insignificant extra-solution activity (MPEP 2106.05(g)), and is well-understood, routine, and conventional activity (MPEP 2106.05(d)) in the field of neural networks:

Non-Patent Literature, “Conditional Computation in Deep and Recurrent Neural Networks”, disclosing in §2.7, “Other Methods of Accelerating Neural Networks”, that caching of weights in neural networks generally improves the speed / utilization of the overall solution:

“In the case that weights or activations cannot be so aggressively quantized, there are still caching benefits, where larger proportions of the model weights can be stored in L1 or L2 cache, providing faster accesses to model parameters when computing activations.”

Non-Patent Literature, “DeepCPU: Serving RNN-based Deep Learning Models 10x Faster” (“Minjia”), disclosing, in the Introduction section (page 952):

“[matrix multiplications] in RNNs are usually much smaller, fitting entirely in shared L3 cache, but with minimal data reuse: data movement from shared L3 cache to private L2 cache is the main bottleneck”.

Non-Patent Literature, “Building your Recurrent Neural Network - Step by Step” (“Berhane”), disclosing steps to help someone learning data science to “implement [their] first Recurrent Neural Network in numpy [sic]”, and further suggesting that encoded hidden layer values (e.g., hidden states) are cached (see at least §1.1, “RNN cell”, disclosing a<t> and a<t-1> (i.e., the hidden states) as stored in a cache in step 3). Examiner respectfully asserts that the purely instructive nature of Berhane, together with the fact that the disclosure is meant to help one “implement your first [RNN][…]”, suggests a basic, bare-bones implementation, and indicates that the disclosed caching is a basic operation in the forward propagation / backpropagation steps of recurrent neural networks.
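By way of illustration only, the kind of basic hidden-state caching discussed with respect to Berhane can be sketched as follows. This is a minimal sketch in plain Python and is not Berhane’s code: the function names rnn_cell_forward and rnn_forward, the scalar weights, and the layout of the cache tuple are all assumptions made for illustration.

```python
import math

def rnn_cell_forward(x_t, a_prev, params):
    """One step of a basic RNN cell on scalar values (hypothetical names)."""
    w_ax, w_aa, b_a = params["w_ax"], params["w_aa"], params["b_a"]
    # The new hidden state depends on the current input and the previous hidden state.
    a_next = math.tanh(w_ax * x_t + w_aa * a_prev + b_a)
    # Store the values a later backpropagation pass would need.
    cache = (a_next, a_prev, x_t)
    return a_next, cache

def rnn_forward(xs, a0, params):
    """Run the cell over a sequence, collecting one cache entry per step."""
    a, caches = a0, []
    for x_t in xs:
        a, cache = rnn_cell_forward(x_t, a, params)
        caches.append(cache)
    return a, caches

# Assumed toy parameters and inputs, chosen only to make the sketch runnable.
params = {"w_ax": 0.5, "w_aa": 0.8, "b_a": 0.0}
final_state, caches = rnn_forward([1.0, 0.0, -1.0], 0.0, params)
```

In this sketch, the previous hidden state cached at each step is exactly the hidden state produced by the step before it, which is the feedback behavior that characterizes a recurrent network.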

Accordingly, in view of the above rationales, and 101 rejection (below), Examiner respectfully asserts the argument is not persuasive. 

Page 9 of Remarks, addressing Step 2A Prong II: The claims provide specific improvements in technology so as to be limited to a practical application that improves over the prior systems. For example, the present application notes the issues in the technology at paragraphs [0003] and [0004], which state: Typically, the features used by the feed-forward neural network are derived via feature engineering by those knowledgeable about the data used and produced by these transactions. However, a feed-forward neural network uses only information pertaining to the current transaction as input to determine its output.

Examiner respectfully disagrees with the above argument, particularly in view of the reasons given in response to previous arguments ‘a.’ and ‘b.’ (above). In other words, Examiner respectfully disagrees that the claims provide specific improvements to the field of machine learning, as the additional elements are merely applied, or otherwise amount to extra-solution activity.

With respect to the statement, “However, a feed-forward neural network uses only information pertaining to the current transaction as input to determine its output”, Examiner fails to find the argument convincing, as any given COTS RNN is not limited to forward-only travel (i.e., any generic COTS RNN has feedback loops, by definition).

With respect to the statement, “Typically, the features used by the feed-forward neural network are derived via feature engineering by those knowledgeable about the data used and produced by these transactions”, Examiner fails to find the argument convincing, as the claims are not commensurate in scope. More specifically, Examiner fails to see how the RNN of Applicant’s claims recites automatically performing feature engineering. Similarly, Examiner fails to see how the claim limitations surrender claim scope of RNNs whose features were determined manually by those knowledgeable about the data used and produced by the transactions. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). (See also page 13 of the 2019 PEG, under § “An Improvement in the Functioning of a Computer or an Improvement to other Technology or Technical Field” (emphasis added)):

During examination, the examiner should analyze the “improvements” consideration by evaluating the specification and the claims to ensure that a technical explanation of the asserted improvement is present in the specification, and that the claim reflects the asserted improvement. Generally, examiners are not expected to make a qualitative judgment on the merits of the asserted improvement. If the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 C.F.R. § 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification. For example, in response to a rejection under 35 U.S.C. § 101, an applicant could submit a declaration under § 1.132 providing testimony on how one of ordinary skill in the art would interpret the disclosed invention as improving technology and the underlying factual basis for that conclusion.

See also MPEP §2106.04(d)(1) and the quote from Intellectual Ventures I LLC v. Symantec Corp. (respectively):

    Evaluating Improvements in the Functioning of a Computer, or an Improvement to Any Other Technology or Technical Field in Step 2A Prong Two: The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology.

But when a claim directed to an abstract idea “contains no restriction on how the result is accomplished . . . [and] [t]he mechanism . . . is not described, although this is stated to be the essential innovation[,]” id. at 1348, then the claim is not patent-eligible.

Accordingly, Examiner fails to find argument convincing. 

Page 9 of Remarks, addressing Step 2A Prong II: […] RNNs typically do not use engineered features and depending on implementation, RNNs can consume a relatively large amount of computing and storage resources. 

With respect to “engineered features”, Examiner fails to find the argument convincing, as the claims are not commensurate in scope. More specifically, Examiner fails to see how the RNN of Applicant’s claims recites engineered (i.e., derived, aggregated, augmented, etc.) features, and respectfully maintains that the claimed features of the claimed RNN model do not surrender the scope of non-derived / non-aggregated / non-augmented features. Examiner further fails to see how the claim limitations surrender claim scope of RNNs whose features are simply labeled data corresponding to aspects of original transaction fields, and further notes that instant claims 9 and 14 imply the exact opposite of Applicant’s position that the features are limited to derived features, as claims 9 and 14 suggest the features come directly from the transaction information. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). (See also page 13 of the 2019 PEG, under § “An Improvement in the Functioning of a Computer or an Improvement to other Technology or Technical Field” (emphasis added)):

During examination, the examiner should analyze the “improvements” consideration by evaluating the specification and the claims to ensure that a technical explanation of the asserted improvement is present in the specification, and that the claim reflects the asserted improvement. Generally, examiners are not expected to make a qualitative judgment on the merits of the asserted improvement. If the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 C.F.R. § 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification. For example, in response to a rejection under 35 U.S.C. § 101, an applicant could submit a declaration under § 1.132 providing testimony on how one of ordinary skill in the art would interpret the disclosed invention as improving technology and the underlying factual basis for that conclusion.

Furthermore, Examiner generally disagrees with the characterization that RNNs typically do not use engineered features, in view of the following prior art:

Context-Aware Credit Card Fraud Detection to Jurgovsky (“Jurgovsky”), disclosing:

On page 58, ¶1: feature level augmentation appears to be a more promising strategy for credit card fraud detection as it enables us to probe the fraudulent or genuine character of transactions directly by including either manually crafted or automatically derived features in the classification pipeline.

On page 108, ¶3: Feature aggregates can be easily integrated in any existing classification process and they seem to readily improve the prediction accuracy. [i.e., Examiner notes that the aforementioned suggests feature aggregation is largely known to be implementation agnostic, as it is a pre-processing step.]

See also at least page 127, caption of Fig. 6.4, disclosing aggregated features selected based on a greedy forward selection algorithm, and § A.2 on pages 137-138, disclosing automatically selected aggregates in the feature selection process.

Non-Patent Literature, “Sequence classification for credit-card fraud detection” to Jurgovsky (Jurgovsky-2), disclosing:
On page 236, within §2.1 “Feature Engineering for Temporal Sequences” - more traditional approaches aim to extract features that indicate the degree to which the current transaction differs from previous transactions. Whitrow, Hand, Juszczak, Weston, and Adams (2009) proposed feature aggregation strategies to summarize the historic purchase activities of card holders. The authors create an activity record for each transaction and account by summing up the amounts spent in previous transactions within time periods spanning the past 1, 3 and 7 days. The record contains several of such aggregate variables – one for each value of a categorical feature. For instance, the aggregate variable MCC1434_3 would denote the sum of amounts of all transactions from the previous 3 days, in which the merchant category code was 1434. For all transactions they compute the corresponding activity record and feed the new feature vector to a classifier, that is trained to predict whether the status of the account is compromised or normal.
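For illustration only, the aggregate variable described in the passage above (e.g., MCC1434_3) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Jurgovsky-2 or Whitrow et al.: the function name aggregate_amount, the dictionary field names, and the sample transactions are hypothetical.

```python
from datetime import datetime, timedelta

def aggregate_amount(transactions, current, days, mcc):
    """Sum the amounts of the same account's earlier transactions that fall
    within `days` days before `current` and carry merchant category code `mcc`."""
    window_start = current["time"] - timedelta(days=days)
    return sum(
        t["amount"]
        for t in transactions
        if t["account"] == current["account"]
        and t["mcc"] == mcc
        and window_start <= t["time"] < current["time"]
    )

# Hypothetical transaction history (field names are assumptions).
history = [
    {"account": "A", "time": datetime(2024, 1, 1, 12), "amount": 20.0, "mcc": 1434},
    {"account": "A", "time": datetime(2024, 1, 3, 9), "amount": 35.5, "mcc": 1434},
    {"account": "A", "time": datetime(2024, 1, 3, 10), "amount": 10.0, "mcc": 5411},
    {"account": "B", "time": datetime(2024, 1, 3, 11), "amount": 99.0, "mcc": 1434},
]
current = {"account": "A", "time": datetime(2024, 1, 4, 8), "amount": 12.0, "mcc": 1434}

# One aggregate per look-back window of 1, 3 and 7 days, as in the passage.
features = {f"MCC1434_{d}": aggregate_amount(history, current, d, 1434) for d in (1, 3, 7)}
print(features)  # {'MCC1434_1': 35.5, 'MCC1434_3': 55.5, 'MCC1434_7': 55.5}
```

The resulting values would be appended to the current transaction's feature vector before classification, which is why such aggregation is a pre-processing step independent of the classifier used.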

With respect to “RNNs can consume a relatively large amount of computing and storage resources”, Examiner respectfully maintains that the argument is not convincing for the same reasons given above (e.g., in responding to arguments ‘a.’ and ‘b.’), as the purported improvement is drawn to additional elements which are merely applied.

Accordingly, Examiner fails to find argument convincing. 

Page 9 of Remarks, addressing Step 2A Prong II, emphasis added: [...] Storing only one set of hidden states for a user account reduces the amount of storage required for maintaining respective set of hidden states for multiple user accounts. Such an implementation enables execution of the RNN fraud model to scale to service provider systems 102 having a large number of user accounts.

Examiner respectfully disagrees with the above argument, as the claims are not commensurate in scope with Applicant’s narrower embodiment (in addition to the reasons provided in response to the previous arguments). Specifically, Examiner notes that the claims recite that the cache further stores a plurality of sets of encoded states for a plurality of user accounts. Furthermore, Examiner fails to see the claims reciting that the cache is limited to storing only hidden/encoded states; the cache is accordingly merely applied, as explained in the rationales above.

Page 10 of Remarks, addressing Step 2A Prong II: The limitations of the present claims provide technical solutions directed to the above to provide faster and more efficient RNN usage for fraud detection and other data processing of large data sets by efficiently storing encoded hidden state data instead of large data sets and features for multiple past data instances (e.g., transactions).

Examiner respectfully disagrees with the above argument for the same reasons provided in responding to arguments ‘a.’ and ‘b.’ (above), as the additional elements are merely applied.

Page 10 of Remarks, addressing Step 2A Prong II: […] the specific limitations of the claims are directed to improving neural network and RNN engines and systems for more efficient and faster predictive outputs. […] Here, like claim 1 in Example 42, the claimed subject matter is directed to improved data processing and RNN outputs by accurately configuring and weighing data prior to a current transaction, which may be stored as hidden state data and which allows for fast and efficient data processing.

Examiner respectfully disagrees with the above argument for the same reasons provided in responding to arguments ‘a.’ and ‘b.’ (above), as the additional elements are merely applied. More specifically, Examiner respectfully submits that Example 42 does not apply here, as the additional elements do not provide a technical solution to a technical problem, but rather merely apply generic COTS hardware/software to solve the problem of fraud detection, where the generic and known benefits of the generic additional elements are afforded to the solution, generally (i.e., Examiner fails to see an improvement in the technical field of machine learning being realized by the claim limitations).

Page 11 of Remarks, addressing Step 2B: […] upon a specific enhancing limitation that necessarily incorporates the invention's distributed architecture - an architecture providing a technological solution to a technological problem. This provides the requisite 'something more' than the performance of 'well-understood, routine, [and] conventional activities previously known to the industry.' See page 24 of Amdocs. As the present claims further provide a technical solution to a technological problem, Applicant traverses and respectfully submits that the currently amended claims add "significantly more" to the alleged abstract idea.

Examiner respectfully disagrees with the above argument, as Examiner fails to find any meaningful differences between the additional elements claimed and generic, general-purpose forms of them. Furthermore, Examiner fails to see any particular “distributed architecture” reflected in the claims, beyond general-purpose and otherwise COTS additional elements. Examiner also notes that caching the hidden layer calculations (e.g., hidden states) adds insignificant extra-solution activity (MPEP 2106.05(g)), and is well-understood, routine, and conventional activity (MPEP 2106.05(d)) in the field of neural networks:

Non-Patent Literature, “Conditional Computation in Deep and Recurrent Neural Networks”, disclosing in §2.7, “Other Methods of Accelerating Neural Networks”, that caching of weights in neural networks generally improves the speed / utilization of the overall solution: “In the case that weights or activations cannot be so aggressively quantized, there are still caching benefits, where larger proportions of the model weights can be stored in L1 or L2 cache, providing faster accesses to model parameters when computing activations.”

Non-Patent Literature, “DeepCPU: Serving RNN-based Deep Learning Models 10x Faster” (“Minjia”), disclosing, in the Introduction section (page 952): “[matrix multiplications] in RNNs are usually much smaller, fitting entirely in shared L3 cache, but with minimal data reuse: data movement from shared L3 cache to private L2 cache is the main bottleneck”.

Non-Patent Literature, “Building your Recurrent Neural Network - Step by Step” (“Berhane”), disclosing steps to help someone learning data science to “implement [their] first Recurrent Neural Network in numpy [sic]”, and further suggesting that encoded hidden layer values (e.g., hidden states) are cached (see at least §1.1, “RNN cell”, disclosing a<t> and a<t-1> (i.e., the hidden states) as stored in a cache in step 3). Examiner respectfully asserts that the purely instructive nature of Berhane, together with the fact that the disclosure is meant to help one “implement your first [RNN][…]”, suggests a basic, bare-bones implementation, and indicates that the disclosed caching is a basic operation in the forward propagation / backpropagation steps of recurrent neural networks.

As mentioned in prior argument responses, Examiner fails to see a technical solution to a technical problem, and instead sees the mere application of generic COTS hardware/software to solve the problem of fraud detection, where the generic and known benefits of the generic additional elements are afforded to the solution, generally (i.e., Examiner fails to see an improvement in the technical field of machine learning being realized by the claim limitations).

Accordingly, in view of the above rationales, and 101 rejection (below), Examiner respectfully asserts the argument is not persuasive. 

Page 11 of Remarks, addressing Step 2B: the present application, in some embodiments, utilizes pre-encoded hidden state data, stored in a cache, to quickly provide predictive outputs in a faster, more coordinated, and more efficient manner. Thus, the data from the large data sets may be immediately used as the hidden state data from previous transactions without being required to load those individual transactions, and the current transaction's features may be utilized as the input features for an RNN. In particular, the present claims provide operations to improve RNN data processing systems for recommendations and predictive outputs.

Examiner respectfully disagrees with the above argument for the reasons already mentioned in responses to arguments g and h. Furthermore, as similarly stated above, Examiner fails to see how the claims are commensurate in scope with Applicant’s basis for improvement. The claims, as written, do not provide any indication that “pre-encoded hidden state data” is used, which Applicant relies upon in explaining the advantage (i.e., that data from large data sets may be immediately used without needing to load individual transactions). Examiner respectfully submits that encoded state values (e.g., hidden and cell states) are known to be determined via processing, generally. Examiner respectfully maintains that the claims are not limited to the narrower embodiment that Applicant asserts as the basis for improvement. The claims do not preclude embodiments where the hidden state data is not pre-encoded, and therefore fail to distinguish themselves from embodiments merely applying generic / COTS computer elements. Examiner notes this analysis is required by the 2019 PEG (see page 13 of the 2019 PEG, under § “An Improvement in the Functioning of a Computer or an Improvement to other Technology or Technical Field” (emphasis added)):

During examination, the examiner should analyze the “improvements” consideration by evaluating the specification and the claims to ensure that a technical explanation of the asserted improvement is present in the specification, and that the claim reflects the asserted improvement. Generally, examiners are not expected to make a qualitative judgment on the merits of the asserted improvement. If the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 C.F.R. § 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification. For example, in response to a rejection under 35 U.S.C. § 101, an applicant could submit a declaration under § 1.132 providing testimony on how one of ordinary skill in the art would interpret the disclosed invention as improving technology and the underlying factual basis for that conclusion.


Pages 11-12 of Remarks, addressing Step 2B: the recited amended claim elements are directed to data analysis and intelligent predictive operations that differ from the generic, routine, and conventional sequence of events normally conducted when training and executing RNN and other neural networks, similar to the unconventional sequence of events in Bascom. Applicant thus respectfully submits that, similar to the analysis of Bascom, "[a]s is the case here, an inventive concept can be found in the non-conventional and non-generic arrangement of known, conventional pieces."

With respect to Applicant argument asserting claims are patent eligible under Step 2B analysis per amounting to significantly more than the Judicial Exception, citing Bascom, Examiner respectfully disagrees. Examiner notes MPEP §2106.06(b) states Bascom is pertinent when “the technology-based solution [is considered] to be an improvement to computer functionality”. Examiner fails to see which aspects of claimed invention recite an “unconventional and unique combination” of technical elements realizing an improvement to computer functionality, as the computing devices used may merely be either general purpose or COTS solutions, and includes well-understood, routine and conventional activities in the field of machine learning (e.g., feature engineering, and caching of machine learning calculations). 

Accordingly, in view of the rationales / responses, and rationales within 101 rejection (below), Examiner respectfully maintains the 101 rejection.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA  the inventor(s), at the time the application was filed, had possession of the claimed invention (See MPEP §2106). 

Specifically, claims 1, 10, and 17 recite “the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes and the at least one output node”.  

One of ordinary skill in the art would not be apprised of how Applicant’s Specification provides written support showing Applicant’s possession of the claimed invention, because the Specification does not reflect that Applicant contemplated how encoded states would comprise output nodes. The claim limitations noted herein define the invention by “merely specifying a desired result”, such that the Specification fails to “sufficiently identify how a function is performed or the result is achieved”, and also appear to describe a ‘narrow species’ without evidence that the genus was contemplated (MPEP §2163.03). Similarly, the figures of Applicant’s disclosure fail to show, to one of ordinary skill in the art, that the invention as defined by the claim limitations in question (above) was contemplated by Applicant; they merely specify the desired result, without describing how the output nodes are also encoded states. 

See MPEP §2163.03 (“An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved or (2) a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. See Ariad Pharms., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349-50 (Fed. Cir. 2010) (en banc). The written description requirement is not necessarily met when the claim language appears in ipsis verbis in the specification. "Even if a claim is supported by the specification, the language of the specification, to the extent possible, must describe the claimed invention so that one skilled in the art can recognize what is claimed. The appearance of mere indistinct words in a specification or a claim, even an original claim, does not necessarily satisfy that requirement." Enzo Biochem, Inc. v. Gen-Probe, Inc., 323 F.3d 956, 968, 63 USPQ2d 1609, 1616 (Fed. Cir. 2002).”).

Examiner notes the same rationale applies to dependent claims 6, 12, and 18, mutatis mutandis.

Claims 2-9, 11-16, and 18-20 are rejected by virtue of dependency upon the aforementioned parent claims rejected under 112(a) rationale (above). 

Furthermore, claim 5 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA  the inventor(s), at the time the application was filed, had possession of the claimed invention.  

Claim 5 recites the limitations of “wherein the cache stores only the single set corresponding to the set of encoded states corresponding to the user account at a first time, and wherein the single set is updated to the new set of encoded states at a second time.” The Examiner fails to find support in the specification for this feature. Therefore, it is new matter.  Claims 9 and 16 recite similar limitations.

Claim Rejections - 35 USC § 112(b)
 The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
 
With respect to claims 1, 10, and 17, they recite: “executing, the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes and the at least one output node;”. It is unclear as to what is occurring in the limitation, as two distinct, non-overlapping interpretations are possible with the limitations, specifically: 

 “executing, the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes and the at least one output node;” (i.e., the “the at least one output node;” is further limiting the executing step)

“executing, the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes and the at least one output node;” (i.e., the “the at least one output node;” is further limiting the comprising limitation)


Claims 2-9, 11-16, and 18-20 are rejected by virtue of dependency upon the aforementioned unclear/rejected claims. 

Furthermore, with respect to claim 5, it recites: The system of claim 1 [wherein the cache further stores a plurality of sets of encoded states for a plurality of user accounts], wherein the cache stores only the single set corresponding to the set of encoded states corresponding to the user account at a first time. It is unclear as to how the cache both stores a plurality of sets of encoded states for a plurality of user accounts and also stores only the single set corresponding to the set of encoded states corresponding to the user account at a first time. This is exacerbated by the specification failing to provide any examples of how this occurs.

Claim Rejections - 35 USC § 112(d)
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


Claim 5 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Parent claim 1 discloses “wherein the cache further stores a plurality of sets of encoded states for a plurality of user accounts”, whereas dependent claim 5 broadens parent claim 1 by stating: wherein the cache stores only the single set corresponding to the set of encoded states corresponding to the user account [...].  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Based upon consideration of all relevant factors with respect to the claims as a whole, claims 1-20 are determined to be directed to an abstract idea. The Examiner has identified system claim 1 as the claim that represents the claimed invention for analysis and is analogous to method claim 10 and non-transitory computer readable storage medium of claim 17 (i.e., same rationale of claim 1 (below), is similarly applied to claims 10 and 17 (mutatis mutandis)). The rationale for the aforementioned determination of patent ineligibility under 35 USC §101 is explained below:

With respect to Step 1 of 2019 PEG analysis, the claims are either directed to a system, article of manufacture, or method, which are statutory categories of invention (Step 1 of 2019 PEG analysis: YES).

With respect to Step 2A Prong I of 2019 PEG analysis, claims 1-20 recite as a whole a method of organizing human activity because the claims recite a method of (additional elements emphasized in bold and bracketed are considered to be parsed from the remaining elements which are reciting the abstract idea): 

A system, [comprising]: 

one or more hardware processors; 

[and] a memory [storing computer-executable] instructions, that [in response to execution by the] one or more hardware processors, causes the system to perform operations comprising: 

receiving a request to process a current payment transaction between a payment provider and a user having a user account with the payment provider; 

accessing a cache [storing] a set of encoded states for a plurality of nodes of a recurrent neural network (RNN) fraud model, the set of encoded states being previously calculated based on execution of the RNN fraud model with respect to a prior transaction of the user account and [being a single set] stored [by] the cache for the user account, 

wherein the prior transaction is an immediately preceding transaction to the current payment transaction, 

[and wherein] the cache [further stores a plurality of sets of encoded states for a plurality of user accounts;] 

determining a state of the RNN fraud model for the user account [based on the set of encoded states]; 

executing the RNN fraud model based on data associated with the current payment transaction [and the set of encoded states stored in the cache],

wherein the executing comprises: 

inputting the data associated with the current payment transaction [to at least one input node of the plurality of nodes], 

and calculating a new hidden value [for each of a plurality of hidden nodes] and at least one output node using an input value [from at least one previous node] based on the data [and the set of encoded states], 

encoding a new set of encoded states for the plurality of nodes [based on the executing], 

[encoding based on the executing] the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes and the at least one output node; 

[updating] the cache [to store] the new set of encoded states [in place of the set of encoded states]; 

and determining a risk level corresponding to the current payment transaction based on an output of the executing the RNN fraud model.  

Under broadest reasonable interpretation, these are fundamental economic principles and/or practices of mitigating risk by determining risk levels of payment transactions / transaction requests. Thus, the claim recites an abstract idea (Step 2A Prong I: Yes).
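
For illustration only, the sequence of operations recited in claim 1 above (reading the cached states for the account, executing the RNN on the current transaction, overwriting the cached states, and deriving a risk level) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not a characterization of Applicant's implementation or of any cited reference: the names `StateCache`, `rnn_step`, and `score_transaction`, the two-node hidden layer, the weight values, and the use of a single transaction amount as the input are all hypothetical, and the sketch caches only the hidden values as a simplification.

```python
import math


def rnn_step(x, h_prev, w_in, w_rec, w_out):
    """One step of a tiny Elman-style RNN with a scalar input.

    All weights are hypothetical illustration values, not taken from the record.
    """
    n = len(h_prev)
    # New hidden value for each hidden node, from the input and the prior state.
    h_new = [
        math.tanh(w_in[i] * x + sum(w_rec[i][j] * h_prev[j] for j in range(n)))
        for i in range(n)
    ]
    # Single output node squashed to (0, 1) and read here as a risk level.
    z = sum(w_out[i] * h_new[i] for i in range(n))
    risk = 1.0 / (1.0 + math.exp(-z))
    return h_new, risk


class StateCache:
    """Maps each account identifier to exactly one set of encoded states."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self._states = {}

    def get(self, account_id):
        # An account with no prior transaction starts from an all-zero state.
        return self._states.get(account_id, [0.0] * self.hidden_size)

    def put(self, account_id, state):
        # Overwrites the prior set, so only the latest set is kept per account.
        self._states[account_id] = state


# Hypothetical fixed weights for a two-node hidden layer.
W_IN = [0.5, -0.3]
W_REC = [[0.1, 0.2], [-0.2, 0.1]]
W_OUT = [1.0, -1.0]


def score_transaction(cache, account_id, amount):
    h_prev = cache.get(account_id)           # access the cached states
    h_new, risk = rnn_step(amount, h_prev, W_IN, W_REC, W_OUT)
    cache.put(account_id, h_new)             # store new states in place of the old
    return risk
```

In this sketch, calling `score_transaction` twice for the same account with the same amount yields different risk values, because the second call starts from the state left by the first; no earlier transaction is reloaded.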

Addressing Step 2A Prong II of 2019 PEG analysis, this judicial exception is not integrated into a practical application. The claims as a whole merely describe how to generally apply the generic computer components including a system, processor, [non-transitory] memory, and cache (including generic address spaces) (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent). Furthermore, the claims as a whole merely describe how to generally apply the generic RNN (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), as the RNN is indistinguishable from a generic Commercial-off-the-shelf solution, beyond the abstract determinations it performs, based on abstract data (e.g., transaction information). The generic corresponding elements of the COTS RNN, such as encoded states, hidden values, output nodes, etc., also amount to mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), as the RNN is still indistinguishable from a generic Commercial-off-the-shelf solution. Additionally, sending / receiving of request information and caching hidden states add insignificant extra-solution activity to the judicial exception (See MPEP 2106.05(g)).  Simply implementing the abstract idea on the aforementioned generic hardware is not a practical application of the abstract idea. Accordingly, when considered separately and as an ordered combination, these additional elements do not integrate the abstract idea into a practical application. The claims are directed to an abstract idea. (Step 2A Prong II: NO, the additional claimed elements are not integrated into a practical application).

Addressing Step 2B of 2019 PEG analysis, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As previously discussed, the claims merely describe how to generally apply the generic computer components including a system, processor, [non-transitory] memory, and cache (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent). Furthermore, the claims as a whole merely describe how to generally apply the generic RNN (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), as the RNN is indistinguishable from a generic Commercial-off-the-shelf solution, beyond the abstract determinations it performs, based on abstract data (e.g., transaction information). The generic corresponding elements of the COTS RNN, such as encoded states, hidden values, output nodes, etc., also amount to mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), as the RNN is still indistinguishable from a generic Commercial-off-the-shelf solution. The step of the system receiving a request, previously considered extra-solution activity, has been further evaluated here and determined to be well-understood, routine, and conventional activity in the field. The specification does not provide any indication that the claimed receiving of the request is performed by anything other than a generic form of data transmission, and the OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) court decisions (MPEP 2106.05 (d)(II)) indicate that a computer merely sending/receiving information over a network is a well-understood, routine, and conventional function when claimed at a high level of generality (as is the case here). 

Furthermore, with respect to claims storing the hidden states / hidden layer values of the RNN in cache previously determined as extra-solution, Examiner notes that caching the hidden layer calculations (e.g., hidden states) is well-understood, routine, and conventional activity (MPEP 2106.05(d)) in the field of neural networks:

  Non-Patent-Literature, “Conditional Computation in Deep and Recurrent Neural Networks”, disclosing in §2.7, “Other Methods of Accelerating Neural Networks”, that generally, caching of weights in neural networks accelerates speed / utilization of overall solution: “In the case that weights or activations cannot be so aggressively quantized, there are still caching benefits, where larger proportions of the model weights can be stored in L1 or L2 cache, providing faster accesses to model parameters when computing activations.”

Non-Patent-Literature, “DeepCPU: Serving RNN-based Deep Learning Models 10x Faster” (“Minjia”), disclosing, in Introduction section (page 952): “[matrix multiplications] in RNNs are usually much smaller, fitting entirely in shared L3 cache, but with minimal data reuse: data movement from shared L3 cache to private L2 cache is the main bottleneck”.

Non-Patent Literature, “Building your Recurrent Neural Network - Step by Step” (“Berhane”), disclosing steps to help someone learning data science how to “implement [their] first Recurrent Neural Network in numpy [sic]”, further suggesting encoded hidden layer values (e.g., hidden states) being cached (See at least §1.1 – RNN cell, disclosing a<t> and a<t-1> (i.e., the hidden states) as stored in cache in step 3). Examiner respectfully asserts that the purely instructive nature of Berhane, in tandem with the fact that the disclosure is to help one “implement your first [RNN][…]”, suggests a basic, bare-bones implementation, and indicates that the disclosed caching is a basic operation in feed-forward / backpropagation steps of recurrent neural networks.
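
By way of illustration of the kind of caching discussed in the references above, a single forward step of a basic RNN cell that returns its hidden states in a cache can be sketched as follows. This is a minimal sketch and not a reproduction of Berhane or any other cited reference: the function name `rnn_cell_forward`, the parameter keys, the scalar (rather than matrix) values, and the sigmoid output are assumptions made to keep the example self-contained.

```python
import math


def rnn_cell_forward(x_t, a_prev, params):
    """Single forward step of a basic RNN cell using scalar values.

    Returns the next hidden state, a prediction, and a cache tuple holding
    the values a later backward pass would need.
    """
    # Next hidden state from the previous hidden state and the current input.
    a_next = math.tanh(
        params["w_aa"] * a_prev + params["w_ax"] * x_t + params["b_a"]
    )
    # Scalar prediction; a sigmoid is used here as a simplifying assumption.
    y_pred = 1.0 / (1.0 + math.exp(-(params["w_ya"] * a_next + params["b_y"])))
    # Both the new and the previous hidden state are kept in the cache.
    cache = (a_next, a_prev, x_t, params)
    return a_next, y_pred, cache


# Hypothetical parameter values for illustration.
example_params = {"w_aa": 0.1, "w_ax": 0.5, "b_a": 0.0, "w_ya": 1.0, "b_y": 0.0}
a_1, y_1, cache_1 = rnn_cell_forward(1.0, 0.0, example_params)
```

The first two entries of `cache_1` are the current and previous hidden states, which is the caching pattern the passage refers to.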

Accordingly, when considered separately and as an ordered combination, nothing in the claim adds significantly more (i.e. an inventive concept) to the abstract idea. Thus, claims 1, 10, and 17 are not patent eligible. (Step 2B: NO. The claims do not amount to significantly more).

With respect to the dependent claims, the dependent claims have been given the full analysis including analyzing the additional limitations both individually and as an ordered combination. The dependent claims, when analyzed both individually and in combination, are also held to be patent ineligible under 35 U.S.C. 101 because of the same reasoning as above and because the additional limitations recited fail to establish that the claims are not directed to an abstract idea. The additional limitations of the dependent claims, when considered individually and as an ordered combination, do not recite additional elements outside of the abstract idea that integrate the judicial exception into a practical application, and do not amount to significantly more than the abstract idea.

With respect to claims 2-3, 6-9, 12-14, and 18-19, they also merely apply the RNNs (See MPEP 2106.05(f)), and accordingly do not indicate that the previously mentioned additional elements are successfully integrated / amounting to significantly more, either alone or in combination. For these reasons these dependent claims are also not patent eligible.

With respect to claims 4 and 11, they do not recite any further additional elements outside of the abstract idea, and accordingly do not indicate that the previously mentioned additional elements are successfully integrated / amounting to significantly more, either alone or in combination. For these reasons these dependent claims are also not patent eligible.

With respect to claims 5 and 16, they generally apply the generic cache (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent) and accordingly do not indicate that the previously mentioned additional elements are successfully integrated / amounting to significantly more, either alone or in combination. For these reasons these dependent claims are also not patent eligible.

With respect to claims 15 and 20, they generally apply the generic cache (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), and  generally apply the generic RNN (See MPEP 2106.05(f)), such that it amounts to no more than mere instructions to implement the abstract idea by adding the words “apply it” (or an equivalent), and accordingly do not indicate that the previously mentioned additional elements are successfully integrated / amounting to significantly more, either alone or in combination. For these reasons these dependent claims are also not patent eligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over United States Application Publication No.  US-20200314101-A1 to Zhang (hereinafter Zhang) in further view of United States Application Publication No.  US-20200065812-A1 to Walters (“Walters”).


[Figure omitted: media_image1.png]
With respect to claim 1, Zhang discloses: A system, (Fig. 1, 104 Processing Computer):


[Figure omitted: media_image2.png]
comprising: one or more hardware processors (Fig. 9 in further view of ¶¶107, 118 of Zhang):

¶107 of Zhang: FIG. 9 shows a block diagram of a processing computer 1200 that may be used in embodiments of the invention. Processing computer 1200 may be for example, processing computer 104 of FIG. 1. Processing computer 1200 may comprise a memory 1220, a processor 1240, […]

and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising (¶107 of Zhang): 

¶107 of Zhang: […] Processing computer 1200 may be for example, processing computer 104 of FIG. 1. Processing computer 1200 may comprise a memory 1220, a processor 1240, […]. The processing computer 1200 may also comprise a computer readable medium 1280, which may comprise code, executable by the processor 1240, for implementing methods according to embodiments. […]

receiving a request to process a current payment transaction (authorization request message, ¶27 of Zhang) between a payment provider (authorizing computer 106) and a user having a user account (user, ¶¶21, 50 of Zhang) with the payment provider (authorizing entity, ¶23 of Zhang); (See Fig. 1, circled 1 of Zhang, in further view of ¶¶50-51, and ¶¶21, 23, 25, 27 of Zhang):

[Figure omitted: media_image3.png]

¶¶50-51 of Zhang: In step 1, a user may use the access device 102 to initiate a transaction with a resource provider, and the user may input payment credentials into the access device 102 […] The access device 102 may then generate an authorization request message. [¶51] […] the processing computer 104 can receive the authorization request message from the access device 102. […]

¶21 of Zhang: A “user” may include an individual or a computational device. […] the user may be a cardholder, account holder, or consumer.

¶23 of Zhang: An “authorizing entity” may be an entity that authorizes a request, typically using an authorizing computer to do so. An authorizing entity may be an issuer, a governmental agency, a document repository, an access administrator, etc.

¶25 of Zhang [In view of ¶23 above]: An “issuer” may be a financial institution, such as a bank, that creates and maintains financial accounts for account holders. An issuer or issuing bank may issue and maintain financial accounts for consumers. The issuer of a particular consumer account may determine whether or not to approve or deny specific transactions. An issuer may authenticate a consumer and release funds to an acquirer if transactions are approved (e.g., a consumer's account has sufficient available balance and meets other criteria for authorization or authentication).

¶27 of Zhang: An “authorization request message” may be a message that is sent to request authorization for an interaction. […] An authorization request message according to some embodiments may comply with ISO 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a user using a payment device or payment account.

Examiner’s Note: Examiner interprets the limitation as stating the “between a payment provider and a user” recitation is further qualifying the “receiving a request of a current payment transaction”.

accessing […] (memory) storing a set of encoded states (hidden states, h(t-1), and/or cell states c(t-1)) for a plurality of nodes (cells) of a recurrent neural network (RNN) fraud model, 

(Examiner notes ¶¶17, 47 [disclosing fraud RNN model]; Fig. 5 in further view of ¶¶85-87 [disclosing structure where states of RNNs are determined]; and Fig. 3, refs 315, 304, 306, 335, 312, 345, in further view of ¶71 of Zhang [disclosing how the state of the RNN fraud model is determined from a set of previous states]. Examiner interprets the “plurality of nodes” claim limitation to include the one or more LSTM cells.)

¶¶17, 47 of Zhang: Embodiments of the invention include[s] […] approach to incorporating authorization decisions from an authorizing computer into an analytical model residing at a processing computer. The analytical model can be a deep recurrent neural network (RNN) with long short-term memory (LSTM) where authorization decisions are embedded into the inner structure of the deep recurrent neural network. An LSTM is a unit of an RNN that can effectively retain information, […] Authorization decisions for interactions may include […] fraud flags. [¶47 of Zhang:] For example, the processing computer 104 may use the analytical model to predict if the authorization request is fraudulent.


[Figure omitted: media_image4.png]


¶85 of Zhang: FIG. 5 shows a block diagram of an analytical model 500 according to embodiments. The analytical model may be a deep recurrent neural network (RNN). The analytical model 500 may comprise an embedding layer 510, one or more LSTM cells 530A and 530B, and a predictive layer 540.

¶87 of Zhang: […] The first LSTM cell 530A may maintain a cell state c1 (t) and a hidden state h1(t) for each user in the network. The cell state c(t) may be a vector that stores information about a user's interactions over a long time scale (i.e., a long period of time) and the hidden state h(t) may be a vector that stores information about the user's interactions over a short time scale (i.e., a short period of time).


[Figure omitted: media_image5.png]

¶¶71, 73 of Zhang [in view of Figs. 3 and 5]: The input vector x(t) 305 and the hidden state h(t−1) 315 may also pass through an input activation layer 330 that is a tanh neural network layer. The input activation layer 330 may use the tanh function to transform the inputs to values between −1 and 1. The information in the cell state c(t−1) 325 and the hidden state h(t−1) 315 may be within the range of −1 to 1 already, thus in order to meaningfully add new information to the cell state c(t−1) 325, the input can be scaled to that range as well. Other embodiments may use a different activation function to scale the inputs. The input gate 304 may be a pointwise multiplication of the output of the input activation layer 330 and the output of the input gate layer 340, which results in a vector of information that should be added to the cell state c(t−1) 325. A pointwise addition operation 306 can add this vector of information from the input gate 304 to the cell state c(t−1) 325. The cell state c(t−1) 325 is thus updated to an updated cell state c(t) 335 by removing information with the forget gate 302 and adding information with the input gate 304. […][¶73 of Zhang:] The updated cell state c(t) 335 can pass through a pointwise tanh function 308 to transform the values of the updated cell state c(t) 335 between −1 and 1. As with the input activation layer 330, this may be to ensure that the output is scaled correctly. The output gate 312 may perform a pointwise multiplication of the tanh function 308 and the output of the output gate layer 350 to generate an updated hidden vector h(t) 345. […]. The updated hidden vector h(t) 345 may also be output from the LSTM cell, and may be sent to another LSTM cell or a neural network layer.
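
For readers less familiar with the gate structure described in the quoted paragraphs, the update from c(t−1), h(t−1) to c(t), h(t) can be sketched as a single scalar LSTM step. This is a minimal sketch of a conventional LSTM cell and not Zhang's code: the function names, the dictionary `w` of hypothetical (input weight, recurrent weight, bias) triples, the scalar values, and the use of a sigmoid in the forget, input, and output gate layers are assumptions; the quoted text itself specifies only the tanh layers and the pointwise operations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w):
    """One scalar LSTM step; `w` maps a layer name to (w_x, w_h, bias)."""

    def layer(name, squash):
        w_x, w_h, b = w[name]
        return squash(w_x * x_t + w_h * h_prev + b)

    f = layer("forget", sigmoid)        # fraction of c(t-1) to keep
    i = layer("input", sigmoid)         # input gate layer output
    g = layer("candidate", math.tanh)   # input activation layer, in (-1, 1)
    o = layer("output", sigmoid)        # output gate layer output

    # Forget gate removes information; input gate result is added pointwise.
    c_t = f * c_prev + i * g
    # Pointwise tanh of c(t), then pointwise multiplication by the output layer.
    h_t = o * math.tanh(c_t)
    return h_t, c_t


# Hypothetical all-zero weights: every sigmoid layer outputs 0.5, candidate is 0.
zero_w = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h_1, c_1 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, w=zero_w)
```

With the all-zero weights above, the step halves the prior cell state and gates its tanh by one half, which makes the roles of the forget and output gates easy to check by hand.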

Examiner’s Note (1): With respect to the “encoded” limitation, Examiner notes that the states of Zhang are understood to be “encoded” because the input (embedding) layer encodes the data forwarded to the hidden layer containing the LSTM cells (Fig. 5 in further view of ¶86 of Zhang). Examiner further takes the position that information within the hidden layers, which sit after the embedding (i.e., encoding) layer and before the predictive (i.e., output/softmax/decoding) layer, is generally understood to be encoded:

    [Figure: media_image6.png]

¶86 of Zhang: The embedding layer 510 may encode the inputs.
Examiner’s Note (2): With respect to the “for a plurality of nodes” limitation, Examiner interprets “node” to refer to any given LSTM cell. Examiner also notes that calculating inputs/outputs using the associated vectors to determine subsequent states is understood to involve accessing the vectors of the long short-term memory (LSTM) cells in computer memory, the process being computer implemented. At least Fig. 3 shows a plurality of nodes (i.e., inputs/outputs of cells, e.g., c(t-1), h(t-1), etc.). Furthermore, ¶87 explicitly states: “each LSTM cell 530 may comprise 256 hidden nodes”.

the set of encoded states (e.g., c(t-1), h(t-1) of LSTM cells of RNN) being previously calculated based on execution of the RNN fraud model with respect to a prior transaction of the user account, and being a single set […] for the user account (Fig. 3 in further view of ¶¶61, 64, 67, 101 of Zhang [and aforementioned calculations at LSTM gates]):

    [Figure: media_image7.png]

¶61 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶64 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: the authorization decision features 220 may still be used to update the analytical model 230, which can be immediately available to analyze the next interaction. Therefore, authorization decisions can not only be used during training but may also be stored and updated at runtime. This may be done in real time, or substantially close to real time.

¶67 in further view of ¶101 of Zhang: time t […] time step t−1. [106] Each time step represents an interaction […]

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

Examiner’s Note: Examiner notes that the “interactions” of Zhang are understood to include authorization requests/responses indicative of pending transactions corresponding to the user’s account, in view of at least ¶¶27-28 of Zhang:

¶¶27-28 of Zhang: An “authorization request message” may be a message that is sent to request authorization for an interaction. […] An authorization request message according to some embodiments may comply with ISO 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a user using a payment device or payment account. […] An “authorization response message” may be a message reply to an authorization request message. The authorization response message may be generated, for example, by a secure data server, an issuing financial institution, a payment processing network, a processing gateway, etc. The authorization response message may include, for example, one or more of the following status indicators: Approval—interaction was approved; Decline—interaction was not approved;

Examiner’s Note (3): With respect to the “for the user account” limitation, Examiner notes that the transaction is performed with respect to a user (as shown above), and that Zhang’s definition of “user” includes account holders. The user submitting the payment authorization request is therefore understood to be performing the transaction with respect to an account of the user (i.e., the fraud detection by the model is for the user account) (see ¶21 of Zhang):

	¶21 of Zhang: A “user” may include an individual or a computational device. In some embodiments, a user may be associated with one or more personal accounts and/or devices. In some embodiments, the user may be a cardholder, account holder, or consumer.


wherein the prior transaction is an immediately preceding transaction to the current payment transaction; (Fig. 3, noting the notation c(t-1) -> c(t) and h(t-1) -> h(t); see also ¶76, which elucidates that the updated states (e.g., c(t), h(t)) correspond to the current input (e.g., the current transaction authorization request), in further view of ¶¶67, 101 of Zhang):

    [Figure: media_image8.png]

¶76 of Zhang [in view of Fig. 3 (above)]: […] the current input vector x(t) […]

¶67 in further view of ¶101 of Zhang: time t […] time step t−1. [106] Each time step represents an interaction […]

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

determining a state of the RNN fraud model for the user account based on the set of encoded states; (cell state c(t) and/or hidden state h(t), Fig. 3); (Examiner notes ¶¶17, 47 [disclosing the fraud RNN model], Fig. 5 in further view of ¶¶85-87 [disclosing the structure in which states of the RNN are determined], and Fig. 3, refs. 315, 304, 306, 335, 312, 345, in further view of ¶71 of Zhang [disclosing how the state of the RNN fraud model is determined from a set of previous states]):

¶¶17, 47 of Zhang: Embodiments of the invention include[s] […] approach to incorporating authorization decisions from an authorizing computer into an analytical model residing at a processing computer. The analytical model can be a deep recurrent neural network (RNN) with long short-term memory (LSTM) where authorization decisions are embedded into the inner structure of the deep recurrent neural network. An LSTM is a unit of an RNN that can effectively retain information, […] Authorization decisions for interactions may include […] fraud flags. [¶47 of Zhang:] For example, the processing computer 104 may use the analytical model to predict if the authorization request is fraudulent.


    [Figure: media_image4.png]


¶85 of Zhang: FIG. 5 shows a block diagram of an analytical model 500 according to embodiments. The analytical model may be a deep recurrent neural network (RNN). The analytical model 500 may comprise an embedding layer 510, one or more LSTM cells 530A and 530B, and a predictive layer 540.

¶87 of Zhang: […] The first LSTM cell 530A may maintain a cell state c1 (t) and a hidden state h1(t) for each user in the network. The cell state c(t) may be a vector that stores information about a user's interactions over a long time scale (i.e., a long period of time) and the hidden state h(t) may be a vector that stores information about the user's interactions over a short time scale (i.e., a short period of time).


    [Figure: media_image5.png]

¶¶71, 73 of Zhang [in view of Figs. 3 and 5]: The input vector x(t) 305 and the hidden state h(t−1) 315 may also pass through an input activation layer 330 that is a tanh neural network layer. The input activation layer 330 may use the tanh function to transform the inputs to values between −1 and 1. The information in the cell state c(t−1) 325 and the hidden state h(t−1) 315 may be within the range of −1 to 1 already, thus in order to meaningfully add new information to the cell state c(t−1) 325, the input can be scaled to that range as well. Other embodiments may use a different activation function to scale the inputs. The input gate 304 may be a pointwise multiplication of the output of the input activation layer 330 and the output of the input gate layer 340, which results in a vector of information that should be added to the cell state c(t−1) 325. A pointwise addition operation 306 can add this vector of information from the input gate 304 to the cell state c(t−1) 325. The cell state c(t−1) 325 is thus updated to an updated cell state c(t) 335 by removing information with the forget gate 302 and adding information with the input gate 304. […][¶73 of Zhang:] The updated cell state c(t) 335 can pass through a pointwise tanh function 308 to transform the values of the updated cell state c(t) 335 between −1 and 1. As with the input activation layer 330, this may be to ensure that the output is scaled correctly. The output gate 312 may perform a pointwise multiplication of the tanh function 308 and the output of the output gate layer 350 to generate an updated hidden vector h(t) 345. […]. The updated hidden vector h(t) 345 may also be output from the LSTM cell, and may be sent to another LSTM cell or a neural network layer.

Examiner’s Note (1): With respect to the “encoded” limitation, Examiner notes that the states of Zhang are understood to be “encoded” because the input (embedding) layer encodes the data forwarded to the hidden layer containing the LSTM cells (Fig. 5 in further view of ¶86 of Zhang), and further takes the position that information within the hidden layer is generally understood to be encoded:

    [Figure: media_image6.png]

¶86 of Zhang: The embedding layer 510 may encode the inputs.
Examiner’s Note (2): With respect to the “for a plurality of nodes” limitation, Examiner interprets “node” to refer to an LSTM cell. Examiner also notes that calculating inputs/outputs using the associated vectors to determine subsequent states is understood to involve accessing the vectors of the long short-term memory (LSTM) cells. At least Fig. 3 shows a plurality of nodes (with inputs/outputs of cells, e.g., c(t-1), h(t-1), etc.).

Examiner’s Note (3): With respect to the “for the user account” limitation, Examiner notes that the transaction is performed with respect to a user (as shown above), and that Zhang’s definition of “user” includes account holders. The user submitting the payment authorization request is therefore understood to be performing the transaction with respect to an account of the user (i.e., the fraud detection by the model is for the user account) (see ¶21 of Zhang):

	¶21 of Zhang: A “user” may include an individual or a computational device. In some embodiments, a user may be associated with one or more personal accounts and/or devices. In some embodiments, the user may be a cardholder, account holder, or consumer.

executing the RNN fraud model based on data associated with the current payment transaction and the set of encoded states stored […], (Fig. 3 in further view of Fig. 5 (above), and aforementioned citations indicating encoded states (above) in aforementioned calculations, and ¶¶74, 77 of Zhang):

¶74 of Zhang: the output gate layer 350 may receive information from the cell state c(t−1) 325 and/or the updated cell state c(t) 335 in addition to the input x(t) 305 and the hidden state h(t−1) 315 when determining what information to output.

¶77 of Zhang: The final prediction from the analytical model can be calculated from the hidden state h(t) […] a softmax function may be used to convert the hidden state h(t) into probabilities for each potential label. If there are only two possible categories for the prediction, other activation functions may be used to convert the hidden state h(t) into probabilities, such as a sigmoid function.
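For illustration only, the conversion of a hidden state into class probabilities described in ¶77 of Zhang can be sketched as follows. The linear layer (W, b) placed in front of the softmax and the name predict are assumptions made for this illustration and are not taken from Zhang. The sketch also shows the well-known identity that, for two categories, the softmax probability of one class equals a sigmoid applied to the difference of the two logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function; squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(h, W, b):
    """Map a hidden state h(t) to class probabilities.

    The linear layer (W, b) in front of the softmax is an assumption.
    """
    logits = [sum(w * v for w, v in zip(row, h)) + b_k for row, b_k in zip(W, b)]
    return softmax(logits)
```

With two categories, softmax([a, b])[1] and sigmoid(b - a) give the same probability, which is consistent with the quoted statement that a sigmoid may be used when there are only two possible categories.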

wherein the executing comprises: 

inputting the data associated with the current payment transaction (x(t)) to at least one input node of the plurality of nodes (a cell of the LSTM cells), and calculating a new hidden value (h(t)) for each of a plurality of hidden nodes and at least one output node using an input value from at least one previous node based on the data and the set of encoded states (c(t-1), h(t-1)); (Fig. 3 in further view of ¶¶61, 64, 67, 74, 77, 101 of Zhang [and aforementioned calculations at LSTM gates]; with respect to the new hidden value, see ¶¶74-75, 87, 101 of Zhang, in view of Fig. 3):

    [Figure: media_image7.png]

¶61 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶64 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: the authorization decision features 220 may still be used to update the analytical model 230, which can be immediately available to analyze the next interaction. Therefore, authorization decisions can not only be used during training but may also be stored and updated at runtime. This may be done in real time, or substantially close to real time.

¶67 in further view of ¶101 of Zhang: time t […] time step t−1. [106] Each time step represents an interaction […]

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

Examiner’s Note: Examiner notes that the “interactions” of Zhang are understood to include authorization requests/responses indicative of pending transactions corresponding to the user’s account, in view of at least ¶¶27-28 of Zhang:

¶¶27-28 of Zhang: An “authorization request message” may be a message that is sent to request authorization for an interaction. […] An authorization request message according to some embodiments may comply with ISO 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a user using a payment device or payment account. […] An “authorization response message” may be a message reply to an authorization request message. The authorization response message may be generated, for example, by a secure data server, an issuing financial institution, a payment processing network, a processing gateway, etc. The authorization response message may include, for example, one or more of the following status indicators: Approval—interaction was approved; Decline—interaction was not approved;

¶74 of Zhang: the output gate layer 350 may receive information from the cell state c(t−1) 325 and/or the updated cell state c(t) 335 in addition to the input x(t) 305 and the hidden state h(t−1) 315 when determining what information to output.

¶75 of Zhang: Mathematically, in a general LSTM, the state vectors c(t) and h(t) at time step t can be concatenated into (c(t), h(t)) which can be updated based on state vectors for the previous time step t−1, c(t−1) and h(t−1), as well as a current input vector x(t) 

¶87 of Zhang: The first LSTM cell 530A may update a cell state c1(t−1) and a hidden state h1(t−1) from a previous time step with the new input data x(t) using the method described with reference to FIG. 3. […] Each LSTM cell 530 may comprise 256 hidden nodes. […] [i.e., executing the RNN fraud model includes encoding a new set of encoded states for a plurality of nodes, such as nodes corresponding to c(t), h(t)]

¶101 of Zhang: A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step.


the new set of encoded states comprising the newly calculated hidden value for each of the plurality of hidden nodes (cells) and the at least one output node; (¶88 of Zhang, in further view of aforementioned citations of Fig. 3 (above) showing how encoded states are updated);(with respect to encoded states comprising output node, see ¶96 in further view of ¶61 of Zhang)

¶88 of Zhang: The second LSTM cell 530B may receive the hidden state h1(t) of the first LSTM cell 530A and may use it as an input vector. […] Adding additional LSTM cells before the predictive layer 540 may allow the analytical model to discover more complex dependencies in the interaction data. […]

¶96 of Zhang: the processing computer may extract authorization response data from the first authorization response message. For example, the authorization response data may comprise an authorization decision and a reason code, such as [approved, 00]. The processing computer may then input the authorization response data as authorization decision features and the analytical model may encode the authorization decision features. For example, the analytical model may encode the authorization decision features as [0, 00] where “0” represents an approved interaction as opposed to “1” for a declined transaction.
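For illustration only, the encoding example given in ¶96 of Zhang ([approved, 00] encoded as [0, 00]) can be sketched as a small lookup. The names DECISION_CODES and encode_authorization_decision are hypothetical, and keeping the reason code as a string is an assumption made so that the leading zero in “00” is preserved.

```python
# Hypothetical mapping: 0 represents an approved interaction, 1 a declined one,
# following the example quoted from paragraph 96 of Zhang.
DECISION_CODES = {"approved": 0, "declined": 1}

def encode_authorization_decision(decision, reason_code):
    """Encode [decision, reason code] as authorization decision features."""
    return [DECISION_CODES[decision.lower()], reason_code]
```

Under these assumptions, encode_authorization_decision("approved", "00") yields [0, "00"], mirroring the quoted example.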

¶61 of Zhang: Augmented with additional information and expertise on its own, along with information from the authorization request message provided by the processing computer, the authorizing computer is able to provide a more accurate decision on whether or not a authorization request should be approved or declined. After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions. More specifically, the authorization decisions may be used in two places simultaneously, serving as output labels and as input features. 

Page 2 of 15, Appl. No.: 16/723,746

updating the [data] to store the new set of encoded states […]; (at least ¶103 of Zhang):
¶103 of Zhang:  A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated with information from the interaction data features […] in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step.

and determining a risk level (e.g., risk score / risk probability) corresponding to the current payment transaction based on an output of the executing the RNN fraud model. (At least ¶¶85, 101 of Zhang disclose a risk score (i.e., risk level) corresponding to the current payment request (denoted as the t-th step, as explained above). Examiner also notes ¶¶17, 47, 65, 79, 85, 89, 101, 106 of Zhang as relevant):

¶85 of Zhang: […] One output ŷc(t) may be an interaction label, which may include a security risk score. […].

¶101 of Zhang: A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. The precursor cell state c(t−1) and the precursor hidden state h(t−1) may be updated by a method such as that described in FIG. 3. The analytical model may then output an interaction ŷc(t) […] For example, the interaction label ŷc(t) may be a security risk score […].

¶17 of Zhang: The analytical model can be trained using authorization decisions, which can be used as both inputs and auxiliary outputs during training, in addition to processing computer outputs such as fraud or non-fraud. […]

¶47 of Zhang: For example, the processing computer 104 may use the analytical model to predict if the authorization request is fraudulent.

¶89 of Zhang: For example, the predictive layer 540 may output a probability that an interaction is fraudulent. The predictive layer 540 may output a value for each possible output. For example, the analytical model may be configured to classify an interaction in one of six categories or risk labels: 0 for a normal interaction, 1 for a fraudulent interaction.

¶65 of Zhang: The risk labels may be based on or related to a risk score. The risk score can be the probability that an interaction is likely to be fraudulent. For example, the risk score may be a value between 0 and 1. A risk score value of close to 1 may indicate that the interaction has a very high likelihood being fraudulent. Because the analytical model may determine a classification for each interaction, the analytical model may be considered a machine learning classifier.

¶79 of Zhang: […] This allows the LSTM to utilize both current input and past information while making future predictions […].

¶106 of Zhang: The risk score may take on values between 0 and 1, with 0 representing an interaction that is likely not fraudulent and/or with minimal risk, and 1 representing an interaction that is high risk and/or likely fraudulent. When there is a fraudulent interaction, that may be represented by both the risk label and the risk score.
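For illustration only, the relationship between a risk score in the range 0 to 1 and a binary risk label, as described in ¶¶65, 89, 106 of Zhang, can be sketched as a simple threshold. The threshold value of 0.5 and the function name risk_label are assumptions made for this illustration; the quoted passages do not specify how a label is derived from a score.

```python
def risk_label(risk_score, threshold=0.5):
    """Return 1 (fraudulent) or 0 (normal) for a risk score in [0, 1].

    The 0.5 default threshold is an assumption, not a value from Zhang.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk score must lie between 0 and 1")
    return 1 if risk_score >= threshold else 0
```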

Zhang fails to disclose, but Walters suggests: 
Accessing a cache (Fraud Detection Logic Circuitry 1015) storing a set of encoded states […] / encoded states stored in the cache (see Fraud Detection Logic Circuitry, Fig. 1A, 1015, and Fig. 9, 1037, 1047, in further view of ¶¶9, 84 of Walters. At least Fig. 1A, the abstract, and ¶¶41, 12, 22, in further view of ¶47 of Walters, disclose that a cache (Fig. 1A, Fraud Detection Logic Circuitry 1015) may store multiple instances of models, such as LSTMs comprising hidden states);


    [Figure: media_image9.png]

¶9 of Walters: FIG. 4 […] a system including a multiple-processor platform, a chipset, buses, and accessories such as the server shown in FIGS. 1A-1B; […]

¶84 of Walters: The fraud detection logic circuitry 4026 may represent circuitry configured to implement the functionality of fraud detection for neural network support within the processor core(s) 4020 or may represent a combination of the circuitry within a processor and a medium to store all or part of the functionality of the fraud detection logic circuitry 4026 in memory such as cache,

abstract of Walters: Logic may detect fraudulent transactions. Logic may determine, by a neural network based on the data about a transaction, a deviation of the transaction from a range of purchases predicted for the customer, wherein the neural network is pretrained to predict purchases by the customer based on a purchase history of the customer. […]

¶12 of Walters: Thereafter, an instance of that neural network is assigned to a specific customer and retrains or continues to train based on the purchase history of that specific customer,

¶40 of Walters: An RNN is a class of artificial neural network where connections between nodes form a directed graph along a sequence. This allows the RNN to exhibit dynamic temporal behavior for a time sequence. RNNs can use their internal state (memory) to process sequences of inputs

¶22 of Walters: In one embodiment, the neural network is only trained with that customer's purchase history.

¶¶46-47 of Walters: the generative neural network 1605 and the discriminative neural network 1660 may comprise Long Short-Term Memory (LSTM) neural networks, […]. [¶47:] An LSTM is a basic deep learning model and capable of learning long-term dependencies. […] The LSTM internal units have a hidden state augmented with nonlinear mechanisms to allow the state to propagate […]

¶67 of Walters: […] payment instrument issuer. The payment instrument issuer may comprise a server to perform fraud detection based on the instance of the neural network 2010 that is trained for this specific customer […]

Examiner’s Note: Examiner notes that the encoded / hidden states are an element of the neural network and, accordingly, are stored in the fraud detection logic circuitry, which was disclosed to include cache storage.


    [Figure: media_image9.png]
The encoded states […] being a single set stored by the cache […] and wherein the cache further stores a plurality of sets of encoded states for a plurality of user accounts; (see Fig. 1A in view of the aforementioned citations of Walters in the mapping above):

Examiner’s Note: Examiner notes that the encoded / hidden states are an element of the neural network and, accordingly, are stored in the fraud detection logic circuitry, which was disclosed to include cache storage.

Accordingly, it would have been obvious to one having ordinary skill in the art, prior to the effective filing date of the claimed invention, that the model of Zhang could be copied, resulting in multiple instances (with multiple corresponding hidden states) stored in cache, as suggested by Walters (for providing customer-specific transaction fraud models). This would result in a second set of encoded states corresponding to a second instance of the RNN fraud model that has been executed with respect to a transaction between the payment provider and a second user having a second user account with the payment provider, in order to advantageously recognize transaction patterns specific to a given customer with a faster form of memory (e.g., cache) (see ¶13 of Walters, disclosing customization advantages):

¶13 of Walters: […] retrains or continues to train based on the purchase history of that specific customer, advantageously training the neural network to recognize specific transaction patterns of that specific customer. As a result, determinations by the neural network about non-fraudulent transactions are based on predicted transactions for each customer.
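For illustration only, the combination discussed above (a single set of states per user account, held in a store that covers a plurality of accounts, read before execution and overwritten afterward) can be sketched as follows. Neither Zhang nor Walters provides this code. The class StateCache, the function score_transaction, and the stand-in toy_step recurrence are hypothetical, and an in-process dictionary is used here merely as a stand-in for cache storage.

```python
import math

class StateCache:
    """Holds exactly one (h, c) pair per account id (hypothetical structure)."""

    def __init__(self, hidden_size):
        self._hidden_size = hidden_size
        self._states = {}

    def __len__(self):
        return len(self._states)

    def get(self, account_id):
        # Accounts with no prior transaction start from all-zero states.
        zero = [0.0] * self._hidden_size
        return self._states.get(account_id, (list(zero), list(zero)))

    def put(self, account_id, h, c):
        # Overwrites any earlier entry, so only a single set is kept per account.
        self._states[account_id] = (list(h), list(c))

def toy_step(x, h_prev, c_prev):
    """Stand-in for a trained recurrent model step (assumption, not an LSTM)."""
    c = [c_k + x[0] for c_k in c_prev]
    h = [math.tanh(c_k + 0.5 * h_k) for c_k, h_k in zip(c, h_prev)]
    return h, c

def score_transaction(cache, account_id, x, step_fn):
    """Read the prior states, run one model step, store the new states."""
    h_prev, c_prev = cache.get(account_id)
    h, c = step_fn(x, h_prev, c_prev)
    cache.put(account_id, h, c)
    return h
```

In this sketch, the states read for a given account are always those written by that account's immediately preceding transaction, and transactions on one account never alter the states held for another.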

With respect to claim 2, Zhang in view of Walters discloses: The system of claim 1, wherein the operations further comprise: 

training the RNN fraud model based on a set of transaction features that have been previously computed based on transaction information associated with previous transactions of a plurality of other user accounts with the payment provider, (Fig. 4, 308 of Zhang, in further view of ¶84 of Zhang):

    [Figure: media_image10.png]

¶84 of Zhang: In step 308, the analytical model may analyze the interaction data features and the authorization decision features. The analytical model may analyze the training data associated with each user. As the analytical model processes the training data, LSTM in the analytical model can update a cell state and a hidden state. For each interaction in the training data that the analytical model processes, it may output a predicted interaction label and a predicted authorization decision label. The output may be risk score and/or a risk label. The analytical model may then calculate classification loss by comparing the predicted interaction label to the actual interaction label and comparing the predicted authorization label to the actual interaction label. The analytical model can recursively process the training data to minimize the classification loss. When training the analytical model, dropouts may be applied in each LSTM, with a dropout probability of 0.5.
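For illustration only, the classification loss described in ¶84 of Zhang (one comparison for the predicted interaction label and one for the predicted authorization decision label) can be sketched with a standard cross-entropy. Summing the two terms with equal weight is an assumption made for this illustration, as are the function names; the quoted passage does not state how the two comparisons are combined, and dropout is omitted here.

```python
import math

def cross_entropy(probs, true_index):
    """Negative log-likelihood of the true class; clipped to avoid log(0)."""
    return -math.log(max(probs[true_index], 1e-12))

def joint_loss(pred_interaction, true_interaction,
               pred_authorization, true_authorization):
    """Sum of the two classification losses (equal weighting is an assumption)."""
    return (cross_entropy(pred_interaction, true_interaction)
            + cross_entropy(pred_authorization, true_authorization))
```

A training procedure would adjust the model weights to reduce this value over the training data; that optimization loop is outside the scope of the sketch.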

¶81 of Zhang: In step 302, the processing computer can receive prior authorization request data from a plurality of past interactions. The prior authorization request data may form part of a training dataset. […] For example, the prior authorization request data may be derived from interaction histories of a plurality of users […]

¶21 of Zhang: A “user” may include an individual or a computational device. In some embodiments, a user may be associated with one or more personal accounts and/or devices. In some embodiments, the user may be a cardholder, account holder, or consumer.

wherein the training includes implementing a feedback loop that updates one or more hidden states in a hidden layer for the plurality of hidden nodes. (See Examiner’s Note regarding “feedback loop”); (see also at least Fig. 3 in further view of ¶¶103, 101, and 96 of Zhang, realizing a feedback loop involving updating of hidden states):

    [Figure: media_image11.png]

¶103 of Zhang:  A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated with information from the interaction data features […] in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step.

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

¶61 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶96 of Zhang: the processing computer may extract authorization response data from the first authorization response message. For example, the authorization response data may comprise an authorization decision and a reason code, such as [approved, 00]. The processing computer may then input the authorization response data as authorization decision features and the analytical model may encode the authorization decision features. For example, the analytical model may encode the authorization decision features as [0, 00] where “0” represents an approved interaction as opposed to “1” for a declined transaction.

 
Examiner’s Note: Examiner takes official notice that any given RNN model has, by definition, a feedback loop in its hidden layers. See also ¶2 of Applicant’s specification.
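For illustration only, the feedback loop referred to in the note above can be shown with a minimal single-unit recurrence of the textbook form h(t) = tanh(w_x·x(t) + w_h·h(t−1)). The weights and the function name run_recurrence are arbitrary values chosen for this illustration and are not taken from Zhang or from Applicant's specification.

```python
import math

def run_recurrence(inputs, w_x=0.5, w_h=0.9, h0=0.0):
    """Run a one-unit recurrent layer and return the hidden value at each step."""
    h = h0
    history = []
    for x in inputs:
        # The previous hidden value h feeds back into the current computation.
        h = math.tanh(w_x * x + w_h * h)
        history.append(h)
    return history
```

Because of the feedback term, two identical inputs presented at successive steps produce different hidden values; setting w_h to 0 removes the loop and makes the outputs identical.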

With respect to claim 3, Zhang in view of Walters discloses: The system of claim 2, wherein the set of encoded states are updated in response to each new transaction processed by the RNN fraud model. (¶¶61, 64 of Zhang. Examiner also notes that ¶¶61, 64, 67, 73, 101 of Zhang disclose this at a more granular level in view of Fig. 3 of Zhang):

¶61 of Zhang: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶64 of Zhang: the authorization decision features 220 may still be used to update the analytical model 230, which can be immediately available to analyze the next interaction. Therefore, authorization decisions can not only be used during training but may also be stored and updated at runtime. This may be done in real time, or substantially close to real time.


    [Figure: media_image12.png]

¶67 in further view of ¶101 of Zhang: time t […] time step t−1. [106] Each time step represents an interaction [As noted in parent claim 1, Examiner notes it is understood the interactions may comprise transaction authorization requests (i.e., transactions)]

¶73 of Zhang: The output gate 312 may perform a pointwise multiplication of the tanh function 308 and the output of the output gate layer 350 to generate an updated hidden vector h(t) 345. In other embodiments, the operation of the tanh function 308 may correspond to the activation function of the input activation layer 330. The updated hidden vector h(t) 345 and the updated cell vector c(t) 335 can then be used by the LSTM cell at the next time step t+1. The updated hidden vector h(t) 345 may also be output from the LSTM cell, and may be sent to another LSTM cell or a neural network layer.

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]


With respect to claim 4, Zhang in view of Walters discloses all the claim elements of parent claim 2. Zhang further discloses:

wherein the transaction information comprises at least one of networking information associated with a user device of the user, a user identifier, a transaction identifier, or a geographic location of the user.  (At least ¶76 in view of ¶80 of Zhang discloses user identifiers included as elements of the transaction information used as input. Examiner also notes ¶¶98, 83, 63 of Zhang as relevant):

¶76 of Zhang: the current input vector x(t) may consist of interaction data features created from interaction information (e.g., user identifier, […]

¶80 of Zhang: The analytical model may be formed and trained using interaction data from prior authorization request messages and prior authorization response messages from an authorizing computer [.] The analytical model may be run on a processing computer such as processing computer 104 in FIG. 1.

¶98 of Zhang: The second authorization request message may have been received by the processing computer for a second interaction between the user and a resource provider. The resource provider may be the same resource provider as the first interaction or a different resource provider. Example second interaction data may comprise an interaction type, a timestamp, and a device identifier of an access device where the second authorization request originated. For example, a vector of interaction data for an a log-in authorization request may be [logInReq, 9 PM, mobile48207].

¶83 of Zhang: For example, the interaction data features may be [$20, 1 PM, Target, e-commerce].

¶63 of Zhang: Interaction data features 210 may include an interaction value, a time stamp, an interaction location, etc.

With respect to claim 5, Zhang in view of Walters discloses:  The system of claim 1, wherein the cache stores only the single set corresponding to the set of encoded states corresponding to the user account at a first time, (Examiner, in view of lack of clarity (per apparent contradiction), interprets the limitation as stating the cache storing a set of encoded states corresponding to a user account);(See Fig. 1 in further view of ¶¶9, 84 of Walters – Examiner notes that the neural networks are understood to include encoded states, of which correspond to user accounts in view of at least ¶67 of Walters):

[Image: media_image9.png]

¶9 of Walters: FIG. 4 […] a system including a multiple-processor platform, a chipset, buses, and accessories such as the server shown in FIGS. 1A-1B; […]

¶84 of Walters: The fraud detection logic circuitry 4026 may represent circuitry configured to implement the functionality of fraud detection for neural network support within the processor core(s) 4020 or may represent a combination of the circuitry within a processor and a medium to store all or part of the functionality of the fraud detection logic circuitry 4026 in memory such as cache,

¶67 of Walters: Once the fraud detection logic circuitry 2000 retrains one of the instances 2012 for a specific customer, the instance of the neural network 2010 can perform fraud detection for the specific customer and, in several embodiments, continue to train with new, non-fraudulent transactions completed by the customer. When the specific customer conducts a transaction, the vendor may transmit information related to that transaction to the payment instrument issuer. The payment instrument issuer may comprise a server to perform fraud detection based on the instance of the neural network 2010 that is trained for this specific customer or may hire a third party to perform the fraud detection. In either case, the fraud detection logic circuitry 2000 receives the transaction data 2005 as an input and provides the input to the instance of the neural network 2010 trained for the specific customer.

¶48 of Walters: An LSTM is a basic deep learning model and capable of learning long-term dependencies. A LSTM internal unit is composed of a cell, an input gate, an output gate, and a forget gate. The LSTM internal units have a hidden state augmented with nonlinear mechanisms to allow the state to propagate without modification, be updated, or be reset, using simple learned gating functions.

and wherein the single set is updated to the new set of encoded states at a second time.  (See aforementioned mapping of claim 1 in view of Zhang, showing Zhang’s LSTM updating encoded states, mutatis mutandis); (See also ¶48 of Walters disclosing update)

¶48 of Walters: An LSTM is a basic deep learning model and capable of learning long-term dependencies. A LSTM internal unit is composed of a cell, an input gate, an output gate, and a forget gate. The LSTM internal units have a hidden state augmented with nonlinear mechanisms to allow the state to […] be updated, […]
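For illustration only, and not drawn from either reference: the limitation as interpreted above (a cache holding a single set of encoded states per user account, with that set replaced by a new set at a later time) can be sketched as follows. The class name `StateCache`, the account identifier `acct-1`, and the placeholder update function `toy_step` are hypothetical and stand in for whatever RNN step an implementation would actually use.

```python
from typing import Callable, Dict, List

State = List[float]


class StateCache:
    """Holds exactly one set of encoded states per user account (hypothetical)."""

    def __init__(self, initial_state: State) -> None:
        self._initial = list(initial_state)
        self._states: Dict[str, State] = {}

    def get(self, account_id: str) -> State:
        # An account seen for the first time starts from the initial state.
        return list(self._states.get(account_id, self._initial))

    def replace(self, account_id: str, new_state: State) -> None:
        # Overwrite rather than append: only the latest set is retained.
        self._states[account_id] = list(new_state)


def process(cache: StateCache, account_id: str, features: State,
            step: Callable[[State, State], State]) -> State:
    """Read the cached state, run one model step, store the result."""
    prev = cache.get(account_id)
    new = step(prev, features)
    cache.replace(account_id, new)
    return new


def toy_step(prev: State, features: State) -> State:
    # Placeholder for an RNN step: NOT a real model, just a running average.
    return [0.5 * p + 0.5 * f for p, f in zip(prev, features)]


cache = StateCache(initial_state=[0.0, 0.0])
process(cache, "acct-1", [1.0, 2.0], toy_step)  # first time
process(cache, "acct-1", [3.0, 4.0], toy_step)  # second time
print(cache.get("acct-1"))  # [1.75, 2.5]
```

In this sketch the state read at the second time is the one written at the first time, and no earlier set survives the overwrite.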

With respect to claim 6, Zhang in view of Walters discloses:  The system of claim 4, wherein the new set of encoded states comprise updated RNN values for the plurality of hidden nodes and the at least one output node based on the data associated with the current payment transaction and the set of encoded states stored in the cache.  (See c(t−1) and h(t−1) in Fig. 3, in further view of ¶¶61, 64, 67, 74, 77, 101 of Zhang [and aforementioned calculations at LSTM gates]); (With respect to new hidden value, see ¶¶74-75, 87, 101 of Zhang, in view of Fig. 3); (With respect to output node, see ¶¶91, 61 of Zhang):

[Image: media_image7.png]

¶61 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶64 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: the authorization decision features 220 may still be used to update the analytical model 230, which can be immediately available to analyze the next interaction. Therefore, authorization decisions can not only be used during training but may also be stored and updated at runtime. This may be done in real time, or substantially close to real time.

¶67 in further view of ¶101 of Zhang: time t […] time step t−1. [106] Each time step represents an interaction […]

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

Examiner’s Note: Examiner notes the “interactions” of Zhang are understood to include authorization requests/responses indicative of pending transactions corresponding to a user’s account in view of at least ¶¶27-28 of Zhang:

¶¶27-28 of Zhang: An “authorization request message” may be a message that is sent to request authorization for an interaction. […] An authorization request message according to some embodiments may comply with ISO 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a user using a payment device or payment account. […] An “authorization response message” may be a message reply to an authorization request message. The authorization response message may be generated, for example, by a secure data server, an issuing financial institution, a payment processing network, a processing gateway, etc. The authorization response message may include, for example, one or more of the following status indicators: Approval—interaction was approved; Decline—interaction was not approved;

¶74 of Zhang: the output gate layer 350 may receive information from the cell state c(t−1) 325 and/or the updated cell state c(t) 335 in addition to the input x(t) 305 and the hidden state h(t−1) 315 when determining what information to output.

¶75 of Zhang: Mathematically, in a general LSTM, the state vectors c(t) and h(t) at time step t can be concatenated into (c(t), h(t)) which can be updated based on state vectors for the previous time step t−1, c(t−1) and h(t−1), as well as a current input vector x(t) 

¶87 of Zhang: The first LSTM cell 530A may update a cell state c.sub.1(t−1) and a hidden state h.sub.1(t−1) from a previous time step with the new input data x(t) using the method described with reference to FIG. 3. […] Each LSTM cell 530 may comprise 256 hidden nodes. […] [i.e., executing the RNN fraud model includes encoding a new set of encoded states for a plurality of nodes, such as nodes corresponding to c(t), h(t)]

¶101 of Zhang: A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step.

¶96 of Zhang: the processing computer may extract authorization response data from the first authorization response message. For example, the authorization response data may comprise an authorization decision and a reason code, such as [approved, 00]. The processing computer may then input the authorization response data as authorization decision features and the analytical model may encode the authorization decision features. For example, the analytical model may encode the authorization decision features as [0, 00] where “0” represents an approved interaction as opposed to “1” for a declined transaction.

¶61 of Zhang: Augmented with additional information and expertise on its own, along with information from the authorization request message provided by the processing computer, the authorizing computer is able to provide a more accurate decision on whether or not a authorization request should be approved or declined. After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions. More specifically, the authorization decisions may be used in two places simultaneously, serving as output labels and as input features. 

With respect to claim 7, Zhang in view of Walters discloses: The system of claim 1, wherein the RNN fraud model comprises at least one of a long short-term memory RNN or a gated recurrent units RNN.  (¶¶17, 38 of Zhang):

¶17 of Zhang: The analytical model can be a deep recurrent neural network (RNN) with long short-term memory (LSTM) where authorization decisions are embedded into the inner structure of the deep recurrent neural network.

¶38 of Zhang: An LSTM may be comprised of a cell and gates that control the flow information into and out of the cell.

With respect to claim 10, it is rejected under the same rationale as claim 1 (above), mutatis mutandis. (Examiner notes the encoded states of claim 1 correspond to hidden states of claim 10).

With respect to claim 11, Zhang in view of Walters discloses all the elements of parent claim 10. Furthermore, Zhang discloses: determining whether to process the current transaction based on the risk score. (¶52 of Zhang):

¶52 of Zhang: If the processing computer 104 determines that the risk score is too high, the processing computer 104 may not send the authorization request message to the authorizing computer 106 and may instead send a failure message or decline to the access device 102.
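For illustration only: the gating behavior described in the quoted paragraph (decline locally when the risk score is too high, otherwise forward the request) can be sketched as below. The function name, the returned action strings, and the cutoff value of 0.9 are assumptions; the quoted paragraph does not state a numeric threshold.

```python
def route_request(risk_score: float, threshold: float = 0.9) -> str:
    """Return the next action for an authorization request.

    `threshold` is a hypothetical cutoff; the cited paragraph does not give one.
    """
    if risk_score > threshold:
        # Too risky: do not forward; answer the access device directly.
        return "decline"
    return "forward_to_authorizing_computer"


print(route_request(0.95))  # decline
print(route_request(0.10))  # forward_to_authorizing_computer
```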

With respect to claim 12, it is rejected under the same rationale as claim 1 (above), mutatis mutandis. (See mapping of “new encoded states” limitation of claim 1)

With respect to claim 13, Zhang discloses:  The method of claim 10, further comprising: prior to the running the RNN fraud model, training the RNN fraud model using a plurality of features that were previously generated based on prior transactions of a plurality of other user accounts with the service provider, (¶¶84, 81 in further view of ¶21 of Zhang discloses features extracted from interaction (e.g., transaction) and authorization decision features corresponding to a plurality of other accountholders are used to train the model).

 ¶84 of Zhang: In step 308, the analytical model may analyze the interaction data features and the authorization decision features. The analytical model may analyze the training data associated with each user. As the analytical model processes the training data, LSTM in the analytical model can update a cell state and a hidden state. For each interaction in the training data that the analytical model processes, it may output a predicted interaction label and a predicted authorization decision label. The output may be risk score and/or a risk label. The analytical model may then calculate classification loss by comparing the predicted interaction label to the actual interaction label and comparing the predicted authorization label to the actual interaction label. The analytical model can recursively process the training data to minimize the classification loss. When training the analytical model, dropouts may be applied in each LSTM, with a dropout probability of 0.5.
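For illustration only: the quoted paragraph describes a classification loss computed by comparing predicted labels with actual labels for two outputs. A minimal sketch follows, assuming a cross-entropy loss and equal weighting of the two outputs; neither assumption is stated in the quoted paragraph, and the probability values are made up.

```python
import math
from typing import Sequence


def cross_entropy(predicted: Sequence[float], actual_index: int) -> float:
    """Negative log-probability assigned to the actual class."""
    return -math.log(predicted[actual_index])


def classification_loss(pred_interaction: Sequence[float], actual_interaction: int,
                        pred_decision: Sequence[float], actual_decision: int) -> float:
    # Sum of the two per-label losses; equal weighting is an assumption.
    return (cross_entropy(pred_interaction, actual_interaction)
            + cross_entropy(pred_decision, actual_decision))


# Made-up predictions: [fraud, non-fraud] and [approved, declined].
loss = classification_loss([0.95, 0.05], 0, [0.2, 0.8], 1)
print(loss)
```

Training would then adjust the model parameters to reduce this value over the training data.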

¶81 of Zhang: In step 302, the processing computer can receive prior authorization request data from a plurality of past interactions. The prior authorization request data may form part of a training dataset. […] For example, the prior authorization request data may be derived from interaction histories of a plurality of users […]

¶21 of Zhang: A “user” may include an individual or a computational device. […] the user may be a cardholder, account holder, or consumer.

wherein the training includes implementing a feedback loop that updates one or more node hidden states in a hidden layer for the, plurality of hidden nodes.  (See Examiner’s Note regarding “feedback loop”);(See also at least Fig. 3 in further view of ¶¶103, 101, and 96 of Zhang realizing a feedback loop involving updating of hidden states):

[Image: media_image11.png]

¶103 of Zhang:  A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated with information from the interaction data features […] in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step.

¶101 of Zhang: […] A precursor cell state from the previous time step c(t−1) and a precursor hidden state from the previous time step h(t−1) may be updated in the LSTM to form a cell state c(t) and a hidden state h(t) for the current time step. […]

¶61 of Zhang [addressing the “with respect to a prior transaction of the user account” limitation]: After the authorization decision is sent back to the processing computer, embodiments of the invention can allow the processing computer to incorporate the authorization decision into an analytical model to enhance the accuracy for the subsequent transactions.

¶96 of Zhang: the processing computer may extract authorization response data from the first authorization response message. For example, the authorization response data may comprise an authorization decision and a reason code, such as [approved, 00]. The processing computer may then input the authorization response data as authorization decision features and the analytical model may encode the authorization decision features. For example, the analytical model may encode the authorization decision features as [0, 00] where “0” represents an approved interaction as opposed to “1” for a declined transaction.

Examiner’s Note: Examiner takes official notice that it is known that any given RNN model has a feedback loop in its hidden layers, by definition. See also ¶2 of Applicant’s specification.

Zhang fails to teach, but Walters teaches: prior to the running the RNN fraud model [training the RNN fraud model], (¶¶12, 19 of Walters)

¶12 of Walters: Moreover, embodiments may train the neural networks based on a transaction history or purchase history for each specific customer. Many embodiments pretrain the neural network based on purchase histories of multiple customers or all customers. Thereafter, an instance of that neural network is assigned to a specific customer and retrains or continues to train based on the purchase history of that specific customer, advantageously training the neural network to recognize specific transaction patterns of that specific customer. As a result, determinations by the neural network about non-fraudulent transactions are based on predicted transactions for each customer.

¶19 of Walters: In many embodiments, a service on one or more servers may pretrain the neural networks with the multiple customers' transaction data and each specific customer's transaction data prior to operating the neural network in inference mode to detect fraudulent transactions for a customer.

Examiner’s Note: Examiner interprets the limitation of “prior to running” as meaning that the aforementioned running is with respect to a specific user, and that the prior training is not with respect to that specific user (i.e., it is with respect to the plurality of other users), which appears consistent with ¶32 of the Applicant Specification:

¶32 of Applicant Specification: According to certain embodiments, each user account may be provided with an initial set of hidden states (stored in the cache 112) before running the RNN fraud model for the first time with respect to transactions of the user account. The initial set of hidden states may the same or may be different for each user account. Over time, however, the set of hidden states for each user account are likely to change based on different transaction information for the different transactions in which each of the user accounts participate.

Accordingly, it would have been obvious to one having ordinary skill in the art prior to the effective filing date of the claimed invention that an instance of the model of Zhang could be assigned to each specific user, as suggested by Walters, where it is pretrained prior to the running of the RNN model with the specific assigned user, in order to advantageously increase the robustness of the training, enable the model to learn common sequences of transactions, and resultantly decrease the likelihood of false positives (¶17 of Walters):

¶17 of Walters: […] In several embodiments, the neural network is initially pretrained with sets of transactions from multiple customers to train the neural network about common sequences of transactions. Some embodiments select different sets of transactions from the multiple customers to train the neural network with transaction sequences that have different counts […], advantageously increasing the robustness of the neural network's ability to recognize non-fraudulent transactions. […]


With respect to claim 14, Zhang in view of Walters discloses the limitations of parent claim 10. Furthermore, Zhang discloses: wherein the current transaction information associated with the current transaction includes a features vector corresponding to the plurality of features being extracted from the current transaction. (at least ¶¶33, 67, 76, 83, of Zhang):

¶33 of Zhang: A “machine learning model” may include an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience without explicitly being programmed. A machine learning model may include a set of software routines and parameters that can predict an output of a process (e.g., identification of an attacker of a computer network, authentication of a computer, a suitable recommendation based on a user search query, etc.) based on a “feature vector” or other input data. […]

¶67 of Zhang [referring to Fig. 3 showing LSTM cell structure]: The input vector x(t) 305 may comprise interaction data features (e.g., interaction value, time stamp) and/or authorization decision features (e.g., reason codes).

¶76 of Zhang: […] the current input vector x(t) may consist of interaction data features created from interaction information (e.g., user identifier, resource provider identifier) and authorization decision features created from authorization response messages (e.g., authorization decision, reason code).

¶83 of Zhang: The analytical model can encode the interaction data features and the authorization decision features as embedding vectors. For example, the interaction data features may be [$20, 1 PM, Target, e-commerce]. The analytical model may encode that information as [20, 13, 5, 3] where 13 represents the time stamp in hours, 5 represents a resource provider identifier (e.g., Target is 5.sup.th on a list or resource providers), and 3 represents an interaction type (e.g., e-commerce is 3.sup.rd on a list of interaction types).
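For illustration only: the encoding example in the quoted paragraph ([$20, 1 PM, Target, e-commerce] becoming [20, 13, 5, 3]) can be reproduced with a simple lookup. The list contents other than "Target" and "e-commerce" are hypothetical placeholders, arranged only so that those two entries fall in the 5th and 3rd positions described in the paragraph.

```python
from typing import List

# Hypothetical lookup lists, arranged so that "Target" is 5th and
# "e-commerce" is 3rd, matching the example in the quoted paragraph.
RESOURCE_PROVIDERS = ["A", "B", "C", "D", "Target"]
INTERACTION_TYPES = ["in-store", "ATM", "e-commerce"]


def parse_hour(time_text: str) -> int:
    """Convert a string such as '1 PM' to a 24-hour integer."""
    hour_text, meridiem = time_text.split()
    hour = int(hour_text) % 12
    return hour + 12 if meridiem.upper() == "PM" else hour


def encode(features: List[str]) -> List[int]:
    amount, time_text, provider, interaction_type = features
    return [
        int(amount.lstrip("$")),
        parse_hour(time_text),
        RESOURCE_PROVIDERS.index(provider) + 1,         # 1-based position
        INTERACTION_TYPES.index(interaction_type) + 1,  # 1-based position
    ]


print(encode(["$20", "1 PM", "Target", "e-commerce"]))  # [20, 13, 5, 3]
```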

With respect to claim 15, Zhang in view of Walters discloses: The method of claim 10, wherein the storing the new hidden states in the cache further comprises replacing (e.g., updating) the hidden states that were previously generated based on the previous transaction.  (¶¶67, 71 of Zhang);(With respect to “in the cache” limitation, see obviousness rationale of Walters in claim 1 (above), mutatis mutandis):

¶67 of Zhang: An example LSTM cell is shown in FIG. 3. The inputs to an LSTM cell at time t include an input vector x(t) 305, and the cell state c(t−1) 325 and the hidden state h(t−1) 315 of the LSTM cell at the previous time step t−1. The input vector x(t) 305 may comprise interaction data features (e.g., interaction value, time stamp) and/or authorization decision features (e.g., reason codes). The input vector x(t) 305 and the hidden state h(t−1) 315 can be concatenated together, so that information about the present (via the input vector x(t) 305) and the recent past (via the hidden state h(t−1) 315) can be analyzed together. The two vectors can then pass through a forget gate 302, an input gate 304, and an output gate 312 to determine how to update the cell state c(t−1) 325 and what information to output.

¶71 of Zhang: […] A pointwise addition operation 306 can add this vector of information from the input gate 304 to the cell state c(t−1) 325. The cell state c(t−1) 325 is thus updated to an updated cell state c(t) 335 by removing information with the forget gate 302 and adding information with the input gate 304. At the next time step t+1, the updated cell state c(t) 335 can be updated again with new information.
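For illustration only: the update described in the two quoted paragraphs (concatenating x(t) with h(t−1), then using a forget gate and an input gate to turn c(t−1) into c(t)) can be sketched as one step of a textbook LSTM cell. The candidate layer and output gate below follow the standard LSTM formulation rather than anything quoted here, and the parameter layout, names, and all-zero weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights: Matrix, bias: Vector, inputs: Vector) -> Vector:
    """Compute weights @ inputs + bias with plain lists."""
    return [sum(w * a for w, a in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    z = h_prev + x  # concatenate hidden state h(t-1) with input x(t)
    f = [sigmoid(v) for v in affine(*params["forget"], z)]
    i = [sigmoid(v) for v in affine(*params["input"], z)]
    g = [math.tanh(v) for v in affine(*params["candidate"], z)]
    o = [sigmoid(v) for v in affine(*params["output"], z)]
    # Forget gate removes information; input gate adds new information.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Output gate scales tanh of the updated cell state to form h(t).
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# All-zero weights: each sigmoid gate outputs 0.5 and the candidate is 0.
zero = ([[0.0, 0.0, 0.0]], [0.0])
params = {name: zero for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=[1.0, 1.0], h_prev=[0.0], c_prev=[2.0], params=params)
print(h, c)
```

With these placeholder weights the cell state is simply halved (2.0 becomes 1.0), which makes the forget-then-add structure of the update easy to check by hand.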

With respect to claim 17, Zhang discloses: A non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a service provider system to perform operations comprising: (¶¶31, 107-108 in further view of ¶¶47, 55 of Zhang discloses memory, such as non-transitory memory, may implement operations that cause a service provider to perform operations)

¶31 of Zhang: A “memory” may be any suitable device or devices that can store electronic data. A suitable memory may comprise a non-transitory computer readable medium that stores instructions that can be executed by a processor to implement a desired method.

¶¶107-108 of Zhang: Processing computer 1200 may be for example, processing computer 104 of FIG. 1. Processing computer 1200 may comprise a memory 1220, a processor 1240, and a network interface 1260. The processing computer 1200 may also comprise a computer readable medium 1280, which may comprise code, executable by the processor 1240, for implementing methods according to embodiments. [¶108:] The memory 1220 may be implemented using any combination of any number of non-volatile memories (e.g., flash memory) […]

¶47 of Zhang: In some embodiments the processing computer 104 may be part of a payment processing network. In other embodiments the processing computer 104 may be part of an access gateway. The processing computer 104 may process authorization requests from the access device 102 using an analytical model.

¶55 of Zhang: the processing computer 104 may be a payment processing network (e.g., Visa), and the authorizing computer 106 may be an issuer computer.

Examiner’s Note: Examiner takes the stance that the Visa payment network is a form of payment provider.

With respect to the remaining claim limitations of claim 17, they are rejected under the same rationale as claim 1 (above), mutatis mutandis. 

With respect to claim 18, it is rejected under the same rationale as claim 1 (above), mutatis mutandis. (Examiner notes the RNN values include the hidden state values and authorization decision features)

With respect to claim 19, it is rejected under the same rationale as claim 13 (above), mutatis mutandis. (Particularly, see mapping of claim 13 pertaining to Walters, and corresponding obviousness statement).

With respect to claim 20, it is rejected under the same rationale as claim 15 (above), mutatis mutandis.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Walters, as applied in parent claims 1 and 10, in further view of Non-Patent Literature, “Casting Deep Nets on Financial Crime” to Abreu (“Abreu”).

With respect to claim 8, Zhang in view of Walters discloses:

RNN fraud model that has been executed with respect to a transaction between the payment provider and a second user having a second user account of the plurality of user accounts with the payment provider, (See Fig. 5 in view of ¶85 disclosing RNN used in conjunction with interaction (e.g., transaction auth) data; At least Fig. 1, circled 1 of Zhang, ¶¶50-51, and ¶¶2, 21, 23, 25, 27 of Zhang discloses the transaction is between the payment provider and one or more users of a plurality of user accounts associated with payment provider):

[Image: media_image4.png]

¶85 of Zhang: FIG. 5 shows a block diagram of an analytical model 500 according to embodiments. The analytical model may be a deep recurrent neural network (RNN). The analytical model 500 may comprise an embedding layer 510, one or more LSTM cells 530A and 530B, and a predictive layer 540. At time t, one input x.sub.c(t) may be interaction data features such as interaction values, a timestamp, an interaction location, and an interaction type. For example, x.sub.c(t) may be a vector including [$20, 1 PM, Target, e-commerce]. Another input x.sub.d(t) may be authorization decision features based on authorization decisions, such as an approval or a decline and reason codes. For example, x.sub.d(t) may be a vector including [declined, 05], where “05” is a particular reason code. One output custom-character(t) may be an interaction label, which may include a security risk score. For example, custom-character(t) may be a vector [0.95, 0.05] with probabilities of an interaction being fraud and non-fraud, respectively. In some embodiments, there may be more than two potential interaction labels. Another output custom-character(t) may be an authorization decision label, which may also be based on authorization decisions, like the authorization decision features. However, there may be more authorization decision features than authorization decision labels. FIG. 5 depicts an analytical model with two LSTM cells, however embodiments of the invention may have more or fewer LSTM cells.

[Image: media_image3.png]

¶2 of Zhang: […] The processing computer may receive millions of authorization requests and authorization decisions for various users at different times. […]

¶¶50-51 of Zhang: In step 1, a user may use the access device 102 to initiate a transaction with a resource provider, and the user may input payment credentials into the access device 102 […] The access device 102 may then generate an authorization request message. [¶51] […] the processing computer 104 can receive the authorization request message from the access device 102. […]

¶21 of Zhang: A “user” may include an individual or a computational device. […] the user may be a cardholder, account holder, or consumer.

¶23 of Zhang: An “authorizing entity” may be an entity that authorizes a request, typically using an authorizing computer to do so. An authorizing entity may be an issuer, a governmental agency, a document repository, an access administrator, etc.

¶25 of Zhang [In view of ¶23 above]: An “issuer” may be a financial institution, such as a bank, that creates and maintains financial accounts for account holders. An issuer or issuing bank may issue and maintain financial accounts for consumers. The issuer of a particular consumer account may determine whether or not to approve or deny specific transactions. An issuer may authenticate a consumer and release funds to an acquirer if transactions are approved (e.g., a consumer's account has sufficient available balance and meets other criteria for authorization or authentication).

¶27 of Zhang: An “authorization request message” may be a message that is sent to request authorization for an interaction. […] An authorization request message according to some embodiments may comply with ISO 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a user using a payment device or payment account.

Zhang fails to teach, but Walters discloses (the same obviousness rationale of claim 1 is applied herein, mutatis mutandis):

wherein the plurality of sets of encoded states stored by the cache (Fraud Detection Logic circuitry; see Fig. 1 of Walters) comprises a second set of encoded states corresponding to a second instance of the RNN fraud model […], (At least abstract, ¶¶12, 22, in further view of ¶¶46-47 of Walters discloses multiple instances associated with customers may be instantiated);

abstract of Walters: Logic may detect fraudulent transactions. Logic may determine, by a neural network based on the data about a transaction, a deviation of the transaction from a range of purchases predicted for the customer, wherein the neural network is pretrained to predict purchases by the customer based on a purchase history of the customer. […]

¶12 of Walters: Thereafter, an instance of that neural network is assigned to a specific customer and retrains or continues to train based on the purchase history of that specific customer,

¶22 of Walters: In one embodiment, the neural network is only trained with that customer's purchase history.

¶¶46-47 of Walters: the generative neural network 1605 and the discriminative neural network 1660 may comprise Long Short-Term Memory (LSTM) neural networks, […]. [¶47:] An LSTM is a basic deep learning model and capable of learning long-term dependencies. […] The LSTM internal units have a hidden state augmented with nonlinear mechanisms to allow the state to propagate […]

Examiner’s Note: Examiner notes that, because the multiple instances of RNNs of Walters are customer specific, and RNNs (such as those of Zhang/Walters) are understood to comprise encoded states, Walters silently discloses a plurality of encoded states (executed with respect to transactions between the payment provider and a second user of the plurality, as disclosed by Zhang).

and wherein the operations further comprise: 

determining an address space in the cache that is associated with the user account and a second address space in the cache that is associated with the second user account, (See Examiner’s Note in view of Fig. 1A of Walters. Examiner notes the silently disclosed address spaces are associated with the user accounts per storing their corresponding, customer-specific RNN).

[Image: media_image9.png]


Examiner’s Note: Examiner takes official notice that applications, such as the neural networks of Walters / Zhang, are understood, per requiring non-zero memory space and being memory addressable (per being used in processing), to have address spaces. Examiner notes that the RNN processing of Walters generally requires the server to access (i.e., determine) the memory spaces to be processed.

wherein the cache (fraud detection logic circuitry) is updated to store the new set of encoded states in association with the address space, and wherein the second set of encoded states is stored in association with the second address space. (¶48 of Walters, in further view of the models being stored in fraud detection logic circuitry (e.g., cache, as previously explained in parent claim 1)); (See also claim 1 mapping showing Zhang teaching updating of encoded states based on transaction inputs):

¶48 of Walters: An LSTM is a basic deep learning model and capable of learning long-term dependencies. A LSTM internal unit is composed of a cell, an input gate, an output gate, and a forget gate. The LSTM internal units have a hidden state augmented with nonlinear mechanisms to allow the state to propagate without modification, be updated, or be reset, using simple learned gating functions.
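
For illustration only, and not drawn from Walters or Zhang: the gating behavior quoted above (state propagated without modification, updated, or reset) can be sketched as a single step of a scalar LSTM cell. The function name `lstm_step`, the weight layout, and the reduction to scalars are assumptions made for this sketch; a practical model would use vectors and learned weights.

```python
import math
from typing import Dict, Tuple

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(
    x: float,
    h_prev: float,
    c_prev: float,
    w: Dict[str, Tuple[float, float, float]],
) -> Tuple[float, float]:
    """One step of a scalar LSTM cell (hypothetical helper).

    w maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state and the new cell state.
    """
    def pre(name: str) -> float:
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new content is written
    f = sigmoid(pre("forget"))       # forget gate: how much old state is kept
    o = sigmoid(pre("output"))       # output gate: how much state is exposed
    g = math.tanh(pre("candidate"))  # candidate content

    c = f * c_prev + i * g           # cell state: kept, updated, or reset
    h = o * math.tanh(c)             # hidden state
    return h, c
```

With the forget gate saturated open and the input gate saturated closed, the cell state passes through unchanged; with both saturated closed, the cell state is reset toward zero.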

While Examiner maintains that Zhang in view of Walters renders obvious determining an address space in the cache that is associated with the user account, Examiner understands a narrower interpretation of the “determining … associated” limitations may disagree. Arguendo, Abreu discloses: determining an address space in the cache that is associated with the user account, (Page 7 of Abreu discloses accessing a memory location corresponding to a given card number of a new transaction, in order to access a state vector of a recurrent neural network, so as to feed the transaction into the model, yielding a new state (i.e., state is updated), and providing a prediction of credit card fraud for the inputted transaction); (See also page 9 in view of page 6 of Abreu disclosing that there are a plurality of cards with corresponding neural networks)

[Image: media_image13.png]

Page 7 of Abreu: when a new transaction arrives we just need to […] fetch the current state for the given card number from memory.
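
For illustration only, and not taken from Abreu: the fetch-by-card-number flow quoted above can be sketched as a keyed in-memory store of recurrent states. The names `CardStateCache`, `toy_update`, and `score_transaction` are hypothetical, and `toy_update` is a stand-in arithmetic rule rather than a trained recurrent network.

```python
from typing import Dict, Tuple

State = Tuple[float, float]  # (hidden, cell); a toy two-number state

class CardStateCache:
    """Hypothetical in-memory store mapping a card number to its latest state."""

    def __init__(self) -> None:
        self._states: Dict[str, State] = {}

    def fetch(self, card_number: str) -> State:
        # A card not seen before starts from a zero state.
        return self._states.get(card_number, (0.0, 0.0))

    def store(self, card_number: str, state: State) -> None:
        self._states[card_number] = state

def toy_update(state: State, amount: float) -> State:
    """Stand-in for one recurrent step; not a real RNN."""
    _, cell = state
    cell = 0.5 * cell + amount
    hidden = cell / (1.0 + abs(cell))
    return hidden, cell

def score_transaction(cache: CardStateCache, card_number: str, amount: float) -> float:
    state = cache.fetch(card_number)       # fetch the current state for this card
    new_state = toy_update(state, amount)  # feed the transaction into the model
    cache.store(card_number, new_state)    # keep the updated state for next time
    return new_state[0]                    # toy score derived from the hidden state
```

Each card number maps to its own state entry, so processing a transaction for one card leaves the stored states of other cards untouched.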

[Image: media_image14.png]

Page 6 of Abreu: We skipped most of the details, but in fact each of these two blocks is a neural network

Accordingly, it would have been obvious to one having ordinary skill in the art prior to the effective filing date of the claimed invention to have the incoming transactions of Zhang result in the system determining the cache address of the corresponding RNN model, in order to advantageously ensure that the customer-specific transactions are properly processed by the corresponding RNN of Walters.

With respect to claim 16, it is rejected under the same rationale as claim 8 (above), mutatis mutandis. (Particularly, see mapping of claim 8 pertaining to Walters, and corresponding obviousness / silently disclosed statement).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Walters, as applied in parent claim 1, in further view of Non-Patent Literature, “Context-Aware Credit Card Fraud Detection” to Jurgovsky (“Jurgovsky”).

With respect to claim 9, Zhang in view of Walters discloses: The system of claim 1, wherein the data associated with the current payment transaction includes RNN model features extracted from the current payment transaction, (¶63 of Zhang. Examiner also notes ¶¶96-97 of Zhang as relevant):

¶63 of Zhang: Interaction data features 210 and authorization decision features 220 may be extracted from a set of training data. For example, the training data may be the interaction history of a user, including authorization responses from an issuer for each transaction.

¶96 of Zhang: the processing computer may extract authorization response data from the first authorization response message. For example, the authorization response data may comprise an authorization decision and a reason code, such as [approved, 00]. The processing computer may then input the authorization response data as authorization decision features and the analytical model may encode the authorization decision features. For example, the analytical model may encode the authorization decision features as [0, 00] where “0” represents an approved interaction as opposed to “1” for a declined transaction.

¶97 of Zhang: the analytical model may be updated by the processing computer with the first authorization response message (as authorization decision features) and the first authorization request message (as interaction features) to form an updated analytical model. The authorization response data may be associated in the analytical model with the first interaction data. One or more LSTM cells of the analytical model may determine whether to add the authorization response data to cell states and hidden states.

and wherein the RNN model features were previously generated for the RNN fraud model by one of the payment provider or another device. (See Examiner’s Note)

Examiner’s Note: Examiner notes that the features extracted from the current payment transaction of Walters are “previously generated”, as they must be generated prior to extraction / model processing, generally. Examiner further notes ¶47 of Applicant specification: 

¶47 of Applicant specification: Referring back to Fig. 2, at step 208, a user application, such as application 122 communicates with the service provider system 102 to complete a current transaction between a user account and the service provider. At step 210, the model processing module 108 extracts the set of features (e.g., the features previously selected by the training module 106 to train the RNN fraud model 220) from current transaction information associated with the current transaction

Furthermore, Examiner notes that, per being computer implemented, the features must be generated by a device, generally.

While Examiner maintains that Zhang in view of Walters silently discloses model features previously generated for the RNN fraud model by [a] device for reasons shown above, Examiner notes a narrower interpretation of claims may disagree, as the features may have been determined manually. Examiner, arguendo, notes Jurgovsky teaches automated feature determination (e.g., feature selection): RNN model features were previously generated for the RNN fraud model by one of the payment provider or another device. (Page 58 of Jurgovsky);(See also §6.2.1 of Jurgovsky);

Page 58 of Jurgovsky: feature level augmentation appears to be a more promising strategy for credit card fraud detection as it enables us to probe the fraudulent or genuine character of transactions directly […] automatically derived features in the classification pipeline.

§6.2.1 of Jurgovsky (pages 115-117): Filters consider feature selection as a pre-processing step and they score and rank features independently of the chosen algorithm. And Wrappers utilize the classifier as black box and repeatedly probe its predictive performance under varying features on a validation set. […] In order to lift the restriction of assessing a feature’s relevance in isolation, several authors have developed holistic filter-based feature selection algorithms, with built-in feature scoring functions and search strategies. Kira et al. [KR92] proposed the RELIEF algorithm, that determines the relevance of a feature based on feature value differences between nearest neighbor instance pairs. [Examiner notes other automated feature selection methods are disclosed, but for brevity included the nearest neighbors implementation as an example]
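
For illustration only, and not reproduced from Jurgovsky or Kira et al.: the nearest-neighbor idea behind RELIEF-style feature scoring can be sketched as below. This is a simplified variant under stated assumptions (every instance is visited rather than a random sample, features are range-scaled, and absolute differences are used); the function name `relief_scores` and the toy data in the usage note are hypothetical.

```python
from typing import List, Sequence

def relief_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Simplified Relief-style relevance score per feature (hypothetical helper)."""
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class instance
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class instance
        for j in range(d):
            # Reward features that differ across classes and agree within a class.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights
```

On a toy set such as `X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]` with labels `y = [0, 0, 1, 1]`, the first feature separates the classes and receives the higher score, while the second behaves like noise and scores lower.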

Accordingly, it would have been obvious to one having ordinary skill in the art prior to the effective filing date of the claimed invention to have the features of Zhang in view of Walters selected automatically by a computer device running feature selection algorithms, as suggested by Jurgovsky, in order to advantageously and automatically determine the most relevant features for transaction fraud, so as to save time and improve the accuracy of the overall solution.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 

Non-Patent Literature, “IMPROVING NEURAL LANGUAGE MODELS WITH A CONTINUOUS CACHE” to Grave (“Grave”)2, disclosing use of cache on neural network hidden states (Fig. 1 caption) to pre-train (Page 1, last paragraph), improving prediction without additional training (Page 1, last paragraph).

United States Patent Publication No.  US-10067944-B2 to Bitincka, disclosing system with cache-aware searching of buckets in remote storage (title), where data intake and query system (Fig. 2) may query disparate data sources for remote data via customer IDs (Fig. 5), and access the data in a cache via a cache manager (Fig. 8A, particularly refs 800, 802, 816, 818, 830, and 812). Examiner notes the following citations as also relevant in showing how identifiers, such as the customer identifier, can be used as an index to access files stored in cache (corresponding to, for example, customer identifier):
 Col 11, lines 9 – 32, Col 12, lines 37-39, and 42-43  
Col 15, lines 43 – 64, 
Col 16 lines 1-2
Col 23, lines 8-11,
Col 24, lines 1-5, 13-19, 34-37, and 41-48

Non-Patent literature, “4 Major Challenges facing Fraud Detection; Ways to Resolve Them using Machine Learning” to Razorthink (“RazorThink”)3, disclosing use case of LSTM to detect fraud based on IP address (i.e., network identifier associated with payment device of user) and city (i.e., geolocation) to determine whether or not a transaction is fraudulent: “For example, an LSTM (Long Short Term Memory) deep learning model is useful for detecting fraud in a sequence of events. If a user logs in with a new IP address from a different city, changes his street address on file, then purchases an expensive item on an e-commerce site, LSTM might flag this transaction as fraudulent. None of these events alone is indicative of fraud, but the sequence of all three is.”

“Sequence classification for credit-card fraud detection” to Jurgovsky (“Jurgovsky-2”), disclosing examples of aggregated features in §2.1. “Feature engineering for temporal sequences” used in credit card fraud detection.

United States Application Publication No.  US 20190258733 A1 to Brown, disclosing a plurality of data streams stored in cache memory according to a first key to aggregate events (abstract), where the data streams may pertain to fraud detection (¶2). See also ¶24 in further view of Fig. 2 disclosing a segmented cache, with each segment corresponding to an event stream.

United States Application Publication No.  US 20190370800 A1 to Song, disclosing in-memory caching (¶113), involving transaction evaluation based on aggregations (¶141).

United States Application Publication No.  US-20080172356-A1 to Bruno, disclosing querying of data stores by account/user identifiers (¶¶4-5).

Examiner notes that Application No. 17/174,046, available on Public PAIR, has IDS references supplied which contain subject matter relevant to Applicant’s claimed subject matter.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK A MALKOWSKI whose telephone number is (313)446-6624.  The examiner can normally be reached on Monday - Thursday 7:30AM-5:00PM, Alternating Fridays.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ryan Donlon, can be reached on (571) 270-3602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.A.M./Examiner, Art Unit 3695                                                                                                                                                                                                        
 

/RYAN D DONLON/Supervisory Patent Examiner, Art Unit 3695

        1 Examiner notes arguments drawn to cache storing states of RNN are addressed in next argument “b”.
        2 See PTO-892 reference “V”
        3 See PTO-892 reference “W”