Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
The instant application, Application No. 16437050, has a total of 30 claims pending, of which claims 19-30 have been withdrawn as a non-elected group. Claims 1-18 are examined below.

I. ACKNOWLEDGEMENT OF REFERENCES CITED BY APPLICANT
Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant’s submission of the Information Disclosure Statement dated 10/1/2020 is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant Office action.

II. REJECTIONS NOT BASED ON PRIOR ART

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 8 recites the limitation “the at least one” in line 2. There is insufficient antecedent basis for this limitation in the claim. The phrase does not refer to any noun or object, so it is unclear what “the at least one” refers to. This renders the claim indefinite under 35 U.S.C. 112(b) for failure to particularly point out and distinctly claim the subject matter which the inventor regards as the invention.



III. REJECTIONS BASED ON PRIOR ART
Examiner's Note: Some citations below are followed by ‘EN’, which denotes an examiner's note placed to further explain the rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Trivedi et al. (“Stochastic Multi-path Routing Problem with Non-stationary Rewards: Building PayU’s Dynamic Routing”) in view of Rougier (“Reinforcement Learning”) and Rowen et al. (US 20110238510 A1).
As per claims 1 and 10, Trivedi discloses “A method of routing a transaction” (Abstract; EN: this denotes routing transactions through multiple payment gateways, where a payment gateway is the means of processing online card payments through the various available networks), “at a server system” (Pg.1707, C1, Introduction section; EN: this denotes that the routing occurs online, which inherently makes use of a server), “from a client system” (Pg.1707, C1, Introduction section; EN: this denotes the payments coming from merchants, i.e., clients), “to a destination payment network in a plurality of payment networks” (Abstract; EN: this denotes the use of multiple payment gateways to process the card payments; the appropriate gateway among the multiple gateways is the destination payment network of a plurality of payment networks), “the plurality of payment networks in network communication with the server system, comprising” (Abstract; EN: as above, the multiple payment gateways used to process the card payments are the plurality of payment networks).
“Providing, at the server system, a machine learning routing model, the machine learning routing model comprising” (Pg.1708, particularly section 4; EN: this denotes the use of a reinforcement learning model). 
“a plurality of states defining a state space, the state space having a plurality of dimensions” (Pg.1707, Figure 2 and C2, Environment section; Pg.1709, C2, last paragraph; EN: this denotes aspects of the environment such as successes and failures at particular gateways. Pg.1710, particularly section 5; EN: this denotes aspects of the environment such as eligible gateways and the payment method used. Pg.1711, particularly C2, last bullet; EN: this denotes cost as a consideration in which gateway to use. Each of these is a different dimension of the state, since each is used to decide where to route a transaction).
“a plurality of actions corresponding to the plurality of payment networks” (Pg.1707-Pg.1708, particularly section 3; EN: this denotes selecting a particular gateway, with the potential actions being selecting that particular gateway). 
“At least one reward function” (pg.1709-1710, particularly section 4.3 and 4.4; EN: this denotes various optional reward functions). 
“Receiving, from the client system, a transaction request…” (Pg.1708), “the transaction request comprising a plurality of data” (Pg.1709-1710, particularly sections 4.2-4.4; EN: this denotes the system receiving and responding to transactions).
“determining, at the server system, the destination payment network by” (Abstract; EN: this denotes the use of multiple payment gateways to process the card payments; the appropriate gateway among the multiple gateways is the destination payment network of a plurality of payment networks).
“determining a routing decision from the machine learning routing model based on the transaction request and the machine learning model” (Pg.1708, particularly section 4; EN: this denotes the use of a reinforcement learning model to select the appropriate payment gateway/payment network for incoming transactions). “wherein the routing decision is determined based on the at least one reward function calculated for each action in the plurality of actions” (Pg.1707-Pg.1708, particularly section 3; EN: this denotes selecting a particular gateway, with the potential actions being selecting that particular gateway. pg.1709-1710, particularly section 4.3 and 4.4; EN: this denotes various optional reward functions).
“Creating, at the server system, a transaction corresponding to the transaction request…” (Pg.1709-1710 particularly section 4.2-4.4; EN: this denotes the system receiving and responding to transactions).  
“Transmitting the transaction to the destination payment network corresponding to the routing decision” (Pg.1709-1710 particularly section 4.2-4.4; EN: this denotes the system receiving and responding to transactions).  
However, Trivedi does not explicitly disclose “a plurality of state transition tables”; “the transaction request comprising a plurality of data, wherein the plurality of data comprises transaction data and transaction metadata”; or “the transaction comprising at least a payor, a payee, a transaction price, a transaction status, and a transaction time”.
Rougier discloses “a plurality of state transition tables” (Pg.5-6; EN: this discloses the use of a Markov decision process with reinforcement learning, including multiple transition tables for state-action pairs).
Rowen discloses “the transaction request comprising a plurality of data, wherein the plurality of data comprises transaction data” (Pg.8, particularly paragraphs 0089-0093; EN: this denotes transaction data associated with the transaction, such as the merchant/store identifier codes, the amount purchased, and the billing/shipping address) “and transaction metadata” (Pg.8, particularly paragraph 0098; EN: this includes metadata as well, such as the time and date of the transaction).
“the transaction comprising at least a payor” (Pg.8, particularly paragraph 0090; EN: this denotes the transaction identifying the payor), “A payee” (Pg.8, particularly paragraph 0096; EN: this denotes who is being paid, i.e., the merchant or store), “a transaction price” (Pg.8, paragraph 0097; EN: this denotes the transaction price), “a transaction status” (Pg.8, particularly paragraph 0106; EN: this denotes the approval/refusal status of the transaction), “and a transaction time” (Pg.9, particularly paragraph 0098; EN: this includes metadata as well, such as the time and date of the transaction).
Trivedi and Rougier are analogous art because both involve reinforcement learning. 
Before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to keep track of the states and actions of a reinforcement algorithm. 
	The motivation for doing so would be to assist in “learning what to do – how to map situations to actions so as to maximize a numerical reward signal” (Rougier, Pg.4, second quote); using state transition tables allows this mapping to be clearly represented.
Therefore before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to keep track of the states and actions of a reinforcement algorithm.
Trivedi and Rowen are analogous art because both involve purchase transactions.  
Before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction. 
	The motivation for doing so would be to allow the system to properly approve or disapprove the purchase for the safety and security of the card holder and the store, i.e., “to provide superior fraud detection and prevention performance for the payment card industry authorization process” (see Rowen, Pg.9, paragraph 0111).
Therefore before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction.
As per claims 2 and 11, Trivedi discloses, “creating at the server system, a transaction response corresponding to the transaction request and a response of the destination payment network to the transaction and”  (Pg.1708, C2, particularly the reward section; EN: this denotes the transactions being rewarded upon success or failure, which inherently includes some sort of response to the transaction request). 
However, Trivedi fails to explicitly disclose, “Transmitting the transaction response to the client.”
Rowen discloses, “Transmitting the transaction response to the client” (Pg.8, particularly paragraph 0107; EN: this denotes returning the approval or decline from the transaction to the merchant/client). 
Trivedi and Rowen are analogous art because both involve purchase transactions.  
Before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction. 
	The motivation for doing so would be to allow the system to properly approve or disapprove the purchase for the safety and security of the card holder and the store, i.e., “to provide superior fraud detection and prevention performance for the payment card industry authorization process” (see Rowen, Pg.9, paragraph 0111).
Therefore before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction.
As per claims 3 and 12, Trivedi discloses “storing, at a database of the server system, the transaction request, the transaction, and the transaction response” (Pg.1709, particularly C2, last paragraph; EN: this denotes tracking rewards and successes of the transactions, including the goal and the response. Pg.1710, particularly section 4.4; EN: this denotes keeping track of the transactions over time).


Claim Rejections - 35 USC § 103
Claims 4-9 and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Trivedi et al. (“Stochastic Multi-path Routing Problem with Non-stationary Rewards: Building PayU’s Dynamic Routing”) in view of Rougier (“Reinforcement Learning”) and Rowen et al. (US 20110238510 A1), and further in view of Varghese et al. (US 20060282660 A1).
As per claims 4 and 13, Trivedi discloses “Wherein the state space has at least a price dimension corresponding to the transaction price” (Pg.1711, particularly C2, last bullet; EN: this denotes cost as a consideration in which gateway to use; each such consideration is a different dimension of the state, since each is used to decide where to route a transaction) and “a reliability dimension corresponding to the transaction status” (Pg.1707, Figure 2 and C2, Environment section; Pg.1709, C2, last paragraph; EN: this denotes aspects of the environment such as successes and failures at particular gateways).
However, Trivedi does not explicitly disclose “a network speed dimension corresponding to the transaction time”, “a compliance dimension”, or “and a geographic dimension”.
Rowen discloses “a compliance dimension” (Pg.8, particularly paragraph 0102; EN: this denotes the consideration of fraud and the like when performing payment transactions).
Varghese discloses “a network speed dimension corresponding to the transaction time” and “and a geographic dimension” (Pg.8, particularly paragraph 0090 and Table 3; EN: this denotes various information important for bank transaction authentication, including location (i.e., geographic information), connection type, speed, etc.).
Trivedi and Rowen are analogous art because both involve purchase transactions.  
Before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction. 
	The motivation for doing so would be to allow the system to properly approve or disapprove the purchase for the safety and security of the card holder and the store, i.e., “to provide superior fraud detection and prevention performance for the payment card industry authorization process” (see Rowen, Pg.9, paragraph 0111), or, in the case of compliance information, to give the system the information needed to determine whether the transaction is fraudulent when routing the transaction.
Therefore before the effective filing date it would have been obvious to one skilled in the art of purchase transactions to combine the work of Trivedi and Rowen in order to include purchase details and other data about the ongoing transaction.
Trivedi and Varghese are analogous art because both involve purchase transactions and fraud detection. 
Before the effective filing date it would have been obvious to one skilled in the art of purchase transactions and fraud detection to combine the work of Trivedi and Varghese in order to include speed and location information for fraud detection for purchase transactions. 
	The motivation for doing so would be to perform checks of “authenticity and security of a user-initiated request made to a service-provider application, e.g. an online store application, an online banking application, and the like” (Varghese, Pg.7, paragraph 0088) or, in the case of Trivedi, to allow the system to consider other aspects, such as proper fraud detection, when routing transactions to particular payment networks.
Therefore before the effective filing date it would have been obvious to one skilled in the art of purchase transactions and fraud detection to combine the work of Trivedi and Varghese in order to include speed and location information for fraud detection for purchase transactions.
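For illustration only, the following Python sketch shows one way a discretized routing state could carry the five dimensions recited in claims 4 and 13. It is not taken from the application or from Trivedi, Rowen, or Varghese; the class name, field names, and bucket boundaries are hypothetical assumptions.

```python
from dataclasses import dataclass


# Hypothetical routing state; names and buckets are illustrative assumptions.
@dataclass(frozen=True)
class RoutingState:
    price_bucket: int   # price dimension (from the transaction price)
    reliability: int    # reliability dimension (from the transaction status)
    speed_bucket: int   # network speed dimension (from the transaction time)
    compliant: bool     # compliance dimension (e.g., fraud screen passed)
    region: str         # geographic dimension


def bucket(value: float, edges: list[float]) -> int:
    """Return the index of the first edge that value falls below."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def make_state(price: float, approved: bool, latency_ms: float,
               fraud_flag: bool, region: str) -> RoutingState:
    """Map raw transaction fields onto the hypothetical state dimensions."""
    return RoutingState(
        price_bucket=bucket(price, [10.0, 100.0, 1000.0]),
        reliability=1 if approved else 0,
        speed_bucket=bucket(latency_ms, [200.0, 1000.0]),
        compliant=not fraud_flag,
        region=region,
    )


print(make_state(250.0, True, 150.0, False, "US"))
```

Discretizing each dimension keeps the state space finite, which is what a tabular reinforcement learning model of the kind discussed in these rejections would require.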
As per claims 5 and 14, Rougier discloses “wherein each state transition table in the plurality of state transition tables comprises a plurality of state transition entries” (Pg.6; EN: this denotes numerous transition matrices with multiple entries) and “each state transition entry describing a probability of a state transition from a first state to a second state for a given action” (Pg.5; EN: this denotes the values in the transition tables being probabilities).
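For illustration only, per-action state transition tables of the kind described for claims 5 and 14 can be held as a nested mapping in which each entry is a transition probability. The Python sketch below is hypothetical: the network names, state names, and probabilities are assumptions and are not values from Rougier or the application.

```python
# Hypothetical tables: tables[action][state][next_state] = transition probability.
TransitionTables = dict[str, dict[str, dict[str, float]]]

tables: TransitionTables = {
    "network_A": {"pending": {"approved": 0.90, "declined": 0.10}},
    "network_B": {"pending": {"approved": 0.75, "declined": 0.25}},
}


def is_valid(tables: TransitionTables, tol: float = 1e-9) -> bool:
    """Check that every row of every table is a probability distribution."""
    for per_state in tables.values():
        for row in per_state.values():
            if any(p < 0 for p in row.values()):
                return False
            if abs(sum(row.values()) - 1.0) > tol:
                return False
    return True


print(is_valid(tables))                            # True
print(tables["network_B"]["pending"]["approved"])  # 0.75
```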
As per claims 6 and 15, Trivedi discloses,  “Wherein determining the routing decision comprises: for each action in the plurality of actions” (pg.1708, particularly section 4.1; EN: this denotes the actions part of the reinforcement learning algorithm). 
“determining an expected action score via reinforcement learning” (Pg.1708, particularly section 4.1; EN: this denotes using reinforcement learning with the actions).
“selecting the action having the highest expected action score as the routing decision” (Pg.1708, particularly the Recommender section; EN: this denotes selecting the best gateway possible based upon the reinforcement learning). 
As per claims 7 and 16, Trivedi discloses “wherein the determining an expected action score further comprises …” (Pg.1707-1708, particularly sections 3 and 4; EN: this denotes the mathematics to perform the gateway selection via reinforcement learning), “probability of an action transitioning a transaction…” (Pg.1710, particularly C1, first paragraph; EN: this denotes selecting gateways via probability), “…is a reward of transitioning…calculated based on the at least one reward function” (Pg.1710, particularly section 4.4; EN: this denotes calculating rewards based upon transitions), and “… and is a discounted future action score” (Pg.1710, particularly section 4.4; EN: this denotes discounting rewards based upon how old they are).
However, Trivedi does not explicitly disclose “wherein the determining an expected action score further comprises $\sum_{s'} P_a(s,s')\left(R_a(s,s') + \gamma V(s')\right)$, wherein $P_a(s,s')$ is a probability of an action transitioning a transaction from state s to s’, $R_a(s,s')$ is a reward of transitioning from state s to s’ calculated based on the at least one reward function, and $\gamma V(s')$ is a discounted future action score.”
Rougier discloses “wherein the determining an expected action score further comprises $\sum_{s'} P_a(s,s')\left(R_a(s,s') + \gamma V(s')\right)$” (Pg.21, particularly the Bellman Equation section; EN: this denotes the same equation as recited in the claims), “wherein $P_a(s,s')$ is a probability of an action transitioning a transaction from state s to s’” (Pg.21; EN: this discloses that $P^{a}_{ss'}$ is the probability of transitioning to the next state), “$R_a(s,s')$ is a reward of transitioning from state s to s’ calculated based on the at least one reward function” (Pg.12; Pg.21; EN: this denotes the reward function aspects), and “and $\gamma V(s')$ is a discounted future action score” (Pg.15; Pg.21; EN: this denotes the last term being the discounted portion).
Trivedi and Rougier are analogous art because both involve reinforcement learning. 
Before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to use the particular equation found within the Rougier reference. 
	The motivation for doing so would be that, “for a given policy pi, we can valuate a state (using a scalar) such that this value expresses the sequences of future rewards. We thus need to define how to aggregate future rewards into a single value” (Rougier, Pg.12); in the case of Trivedi and Rougier, this allows the Trivedi reinforcement learning algorithm to use the reinforcement learning equations of Rougier to determine an optimal policy (see Rougier, Pg.22).
	 
Therefore before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to use the particular equation found within the Rougier reference.
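As a hedged illustration of the expected action score of claims 7 and 16 and the selection step of claims 6 and 15, the following Python sketch evaluates the sum over s’ of P_a(s,s’)(R_a(s,s’) + γV(s’)) for each action and selects the action with the highest score. It is a minimal sketch under stated assumptions: a single current state "s0", two hypothetical networks, and invented probabilities, rewards, values, and discount factor; it is not drawn from Trivedi, Rougier, or the application.

```python
# Hypothetical per-action tables for a single current state "s0";
# every number here is invented for illustration.
P = {  # P[a][s][s2]: probability that action a moves state s to s2
    "network_A": {"s0": {"approved": 0.9, "declined": 0.1}},
    "network_B": {"s0": {"approved": 0.8, "declined": 0.2}},
}
R = {  # R[a][s][s2]: reward for the transition s -> s2 under action a
    "network_A": {"s0": {"approved": 1.0, "declined": -1.0}},
    "network_B": {"s0": {"approved": 1.5, "declined": -1.0}},
}
V = {"approved": 1.0, "declined": 0.0}  # current value estimates V(s2)
GAMMA = 0.9  # discount factor (gamma)


def action_score(s: str, a: str) -> float:
    """Expected action score: sum over s2 of P_a(s,s2) * (R_a(s,s2) + GAMMA * V(s2))."""
    return sum(p * (R[a][s][s2] + GAMMA * V[s2]) for s2, p in P[a][s].items())


def route(s: str) -> str:
    """Return the action (payment network) with the highest expected action score."""
    return max(P, key=lambda a: action_score(s, a))


print(route("s0"))  # network_B (score about 1.72 vs about 1.61 for network_A)
```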
As per claim 8, Trivedi discloses, “Further comprising determining the routing decision based on the at least one, each reward function being a weighted linear combination of state components” (Pg.1710, particularly section 4.4; EN: this denotes the reward function being weighted).
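For illustration only, a reward function that is "a weighted linear combination of state components" can be written as r = Σ w_i·x_i, and a routing decision can then pick the network whose predicted next-state components score highest. The Python sketch below is hypothetical: the component names, weights, and candidate values are assumptions and are not taken from Trivedi section 4.4 or the application.

```python
# Hypothetical state components and weights, chosen only for illustration.
WEIGHTS = {"approved": 1.0, "cost": -0.5, "latency_s": -0.2}


def reward(components: dict[str, float],
           weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted linear combination: r = sum_i w_i * x_i over state components x_i."""
    return sum(w * components[name] for name, w in weights.items())


def route(candidates: dict[str, dict[str, float]]) -> str:
    """Pick the network whose predicted next-state components give the highest reward."""
    return max(candidates, key=lambda net: reward(candidates[net]))


candidates = {
    "network_A": {"approved": 1.0, "cost": 0.4, "latency_s": 0.5},  # r about 0.70
    "network_B": {"approved": 1.0, "cost": 1.0, "latency_s": 0.2},  # r about 0.46
}
print(route(candidates))  # network_A
```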
As per claims 9 and 18, Trivedi does not explicitly disclose “Wherein the machine learning routing model is modelled using a markov decision process”.
Rougier discloses “Wherein the machine learning routing model is modelled using a markov decision process” (Pg.5; EN: this denotes defining the problem in reinforcement learning as a Markov decision process).
Trivedi and Rougier are analogous art because both involve reinforcement learning. 
Before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to use a Markov decision process.
	The motivation for doing so would be to describe the problem of the Trivedi reference in terms of a finite set of states and actions, where the gateways are the limited set of actions and the current environment is the state; a further benefit of the MDP is that for any MDP there exists an optimal deterministic policy (see Rougier, Pg. 11).
Therefore before the effective filing date it would have been obvious to one skilled in the art of reinforcement learning to combine the work of Trivedi and Rougier in order to use a Markov decision process.
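For illustration only, the following Python sketch runs value iteration on a tiny, hypothetical finite MDP in which the actions are payment networks, then chooses the highest-scoring action in each state, yielding a deterministic policy that is greedy with respect to the converged values. The states, actions, probabilities, rewards, and discount factor are all invented assumptions, not data from Trivedi or Rougier.

```python
# Hypothetical finite MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    "pending": {
        "network_A": [(0.9, "done", 1.0), (0.1, "pending", -0.1)],
        "network_B": [(0.6, "done", 1.2), (0.4, "pending", -0.1)],
    },
    "done": {"stay": [(1.0, "done", 0.0)]},  # absorbing terminal state
}


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Compute state values by repeated Bellman backups, then a greedy policy."""
    V = {s: 0.0 for s in P}

    def q(s, a):
        # Expected action score: sum of p * (r + gamma * V(s2)).
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            best = max(q(s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Deterministic policy: one action per state, the one with the highest score.
    policy = {s: max(P[s], key=lambda a: q(s, a)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)  # {'pending': 'network_B', 'done': 'stay'}
```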
As per claim 17, Trivedi discloses, “determining the routing decision based on a weighted combination of state components” (Pg.1710, particularly section 4.4; EN: this denotes the reward function being weighted).

Conclusion
The examiner requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this Office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEN M RIFKIN whose telephone number is (571)272-9768. The examiner can normally be reached Monday-Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached on (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BEN M RIFKIN/Primary Examiner, Art Unit 2198