DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/10/20 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Allowable Subject Matter
Claims 1-12 are allowed.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Attorney Kelly Horn on 8/11/22.
The application has been amended as follows: 
1. 	(Currently Amended) A system for reinforcement based bidding in an energy market, comprising: a communication interface; a memory, wherein the memory comprises a Markovian Decision Process (MDP) based model of the energy market; one or more hardware processors, wherein the one or more hardware processors generate one or more bidding recommendations using Reinforcement Learning (RL), comprising: training a RL agent of the system, by modeling interaction of the RL agent with the MDP based model of the energy market as a Markovian Decision Process (MDP), comprising: defining a state space of the RL agent by using a plurality of state space parameters comprising (i) a forecasted demand Dt+h for a consumption slot t + h that is available at time slot t, (ii) a realized demand Dt+h-48 and a Maximum Clearing Price (MCP) Pt+h-48 at the time slot t+h-48 which precedes the consumption slot t +h by 24 hours, and (iii) a realized demand Dt+h-336 and MCP Pt+h-336 at the time slot which precedes the consumption slot by one week, as state space variables; and defining an action space of the RL agent, comprising defining one or more actions for each state 's' in the state space, wherein the one or more actions when executed, maximize an estimated state- action value function Q0(s,a) of each state in the state space to minimize error between the estimated state-action value function Q0(s, a) and a true state action value function Q(s, a) is minimum; and processing one or more real-time inputs by the RL agent, comprising: observing a state 's' while bidding at a time slot t for a consumption

determining an action from among a plurality of actions defined in the action space, for the state s being observed, as action that minimizes error between Q0(s, a) and Q(s, a) for the state s; executing the determined action to generate a bid; and recommending the generated bid to a user.

2. 	(Original) The system as claimed in claim 1, wherein the MDP based model of the energy market comprises: a first sub-model, wherein the first sub-model models a market clearing operation as an optimization problem as a function of price, maximum quantity of electric energy that can be supplied by a generator in a region m during time t, quantum of power flow through one or more transmission links in the region; and a second sub-model, wherein the second sub-model estimates bids from each competitor, for each generator being considered.

3. 	(Original) The system as claimed in claim 1, wherein the system determines the action for the state s being observed, by: determining error between Q0(s, a) and Q(s, a) for the state s, for each of the one or more actions in the action space; comparing the error determined for the one or more actions with one another; and determining the action having least value of error, as the action that minimizes error between Q0(s, a) and Q(s, a) for the state s.

4. 	(Currently Amended) The system as claimed in claim 1, wherein observing the state space s at time t for a consumption slot t+s by the system comprises obtaining value of each of the state space parameter from a historical information across a pre-defined time period corresponding to 

5. 	(Currently Amended) A processor implemented method for reinforcement based bidding in an energy market, comprising: training a RL agent of a system, via one or more hardware processors, by modeling interaction of the RL agent with a Markovian Decision Process (MDP) based model of the energy market as a Markovian Decision Process, comprising: defining a state space of the RL agent by using a plurality of state space parameters comprising (i) a forecasted demand Dt+h for a consumption slot t + h that is available at time slot t, (ii) a realized demand Dt+h-48 and a Maximum Clearing Price (MCP) Pt+h-48 at the time slot t +h-48 which precedes the consumption slot t +h by 24 hours, and (iii) a realized demand Dt+h-336 and MCP Pt+h-336 at the time slot which precedes the consumption slot by one week, as state space variables; and defining an action space of the RL agent, comprising defining one or more actions for each state s in the state space, wherein the one or more actions when executed, maximizes an estimated state- action value function Q0(s,a) of each state in the state space to minimize error between the estimated state-action value function Q0(s, a) and a true state action value function Q(s, a) is minimum; and processing one or more real-time inputs by the RL agent, via the one or more hardware processors, comprising: observing a state s while bidding at a time slot t for a consumption slot t+s; determining an action from among a plurality of actions defined in the action space, for the state s being observed, as action that minimizes error between Q0(s, a) and Q(s, a) for the state s; executing the determined action to generate a bid; and recommending the generated bid to a user.

6. 	(Original) The processor implemented method as claimed in claim 5, wherein the MDP based model of the electric market comprises: a first sub-model, wherein the first sub-model models a market clearing operation as an optimization problem as a function of price, maximum quantity of electric energy that can be supplied by a generator in a region m during time t, quantum of power flow through one or more transmission links in the region; and a second sub-model, wherein the second sub-model estimates bids from each competitor, for each generator being considered.

7. 	(Original) The processor implemented method as claimed in claim 5, wherein determining the action for the state s being observed, comprises: determining error between Q0(s, a) and Q(s, a) for the state s, for each of the one or more actions in the action space; comparing the error determined for the one or more actions with one another; and determining the action having least value of error, as the action that minimizes error between Q0(s, a) and Q(s, a) for the state s.

8. 	(Currently Amended) The processor implemented method as claimed in claim 5, wherein observing the state space s at time t for a consumption slot t+s comprises obtaining value of each of the state space parameter from a historical information across a pre-defined time period corresponding to 

9. 	(Currently Amended) A non-transitory computer readable medium for reinforcement based bidding in an energy market, wherein the reinforcement based bidding comprising: training a RL agent of a system, via one or more hardware processors, by modeling interaction of the RL agent with a Markovian Decision Process (MDP) based model of the energy market as a Markovian Decision Process, comprising: defining a state space of the RL agent by using a plurality of state space parameters comprising (i) a forecasted demand Dt+h for a consumption slot t + h that is available at time slot t, (ii) a realized demand Dt+h-48 and a Maximum Clearing Price (MCP) Pt+h-48 at the time slot t +h-48 which precedes the consumption slot t +h by 24 hours, and (iii) a realized demand Dt+h-336 and MCP Pt+h-336 at the time slot which precedes the consumption slot by one week, as state space variables; and defining an action space of the RL agent, comprising defining one or more actions for each state s in the state space, wherein the one or more actions when executed, maximizes an estimated state- action value function Q0(s,a) of each state in the state space to minimize error between the estimated state-action value function Q0(s, a) and a true state action value function Q(s, a) is minimum; and processing one or more real-time inputs by the RL agent, via the one or more hardware processors, comprising: observing a state s while bidding at a time slot t for a consumption slot t+s; determining an action from among a plurality of actions defined in the action space, for the state s being observed, as action that minimizes error between Q0(s, a) and Q(s, a) for the state s; executing the determined action to generate a bid; and recommending the generated bid to a user.

10. 	(Original) The non-transitory computer readable medium as claimed in claim 9, wherein the MDP based model of the electric market comprises: a first sub-model, wherein the first sub-model models a market clearing operation as an optimization problem as a function of price, maximum quantity of electric energy that can be supplied by a generator in a region m during time t, quantum of power flow through one or more transmission links in the region; and a second sub-model, wherein the second sub-model estimates bids from each competitor, for each generator being considered.

11. 	(Original) The non-transitory computer readable medium as claimed in claim 9, wherein determining the action for the state s being observed, comprises: determining error between Q0(s, a) and Q(s, a) for the state s, for each of the one or more actions in the action space; comparing the error determined for the one or more actions with one another; and determining the action having least value of error, as the action that minimizes error between Q0(s, a) and Q(s, a) for the state s.

12. 	(Currently Amended) The non-transitory computer readable medium as claimed in claim 9, wherein observing the state space s at time t for a consumption slot t+s comprises obtaining value of each of the state space parameter from a historical information across a pre-defined time period corresponding to 

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: 
Reasons the claims are subject matter eligible under 35 U.S.C. 101: Under Step 1 analysis, independent claims 1 (system), 5 (method), and 9 (non-transitory computer-readable medium) each fall within at least one of the four statutory categories of 35 U.S.C. 101: (i) process; (ii) machine; (iii) manufacture; or (iv) composition of matter. Claim 1 is directed to a system (i.e., machine), claim 5 is directed to a method (i.e., process), and claim 9 is directed to a non-transitory computer-readable medium (i.e., manufacture). Under Step 2A Prong 1, the independent claims are directed to reinforcement based bidding in an energy market. The claim elements are considered abstract ideas because they are directed to a method of organizing human activity, which includes fundamental economic practices, in addition to mathematical concepts in the form of mathematical calculations. If a claim limitation, under its broadest reasonable interpretation, covers fundamental economic practices, then it falls within the “method of organizing human activity” grouping of abstract ideas.
However, under Step 2A Prong 2, the claims integrate the judicial exception into a practical application by training a reinforcement learning agent by modeling interaction of the RL agent with the MDP based model of the energy market as a Markovian Decision Process (MDP), comprising: defining a state space of the RL agent by using a plurality of state space parameters comprising (i) a forecasted demand Dt+h for the consumption slot t+h that is available at time slot t, (ii) a realized demand Dt+h-48 and a Maximum Clearing Price (MCP) Pt+h-48 at the time slot t+h-48 which precedes the consumption slot t+h by 24 hours, and (iii) a realized demand Dt+h-336 and MCP Pt+h-336 at the time slot which precedes the consumption slot by one week, as state space variables; and defining an action space of the RL agent, comprising defining one or more actions for each state 's' in the state space, wherein the one or more actions when executed, maximize an estimated state-action value function Q0(s,a) of each state in the state space to minimize error between the estimated state-action value function Q0(s, a) and a true state action value function Q(s, a) is minimum. With reference to the 2019 PEG, this combination of elements applies the judicial exception in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment.
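As an illustration of the state space recited above, the following Python sketch assembles the five state variables for a consumption slot t+h. It assumes half-hourly slots (so that 48 slots span 24 hours and 336 slots span one week, consistent with the claim language). The function name `build_state` and its list-based inputs are hypothetical and do not come from the application.

```python
from typing import Sequence, Tuple

# Assumption: half-hourly slots, so 48 slots = 24 hours and 336 slots = one week.
SLOTS_PER_DAY = 48
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY  # 336

State = Tuple[float, float, float, float, float]


def build_state(
    forecast_demand: Sequence[float],
    realized_demand: Sequence[float],
    clearing_price: Sequence[float],
    t: int,
    h: int,
) -> State:
    """Return (D_fc[t+h], D[t+h-48], P[t+h-48], D[t+h-336], P[t+h-336]).

    All three sequences are indexed by slot number (hypothetical layout).
    """
    target = t + h                      # consumption slot being bid for
    day_ago = target - SLOTS_PER_DAY    # same slot, 24 hours earlier
    week_ago = target - SLOTS_PER_WEEK  # same slot, one week earlier
    if week_ago < 0:
        raise ValueError("need at least one week of history before slot t+h")
    return (
        forecast_demand[target],
        realized_demand[day_ago],
        clearing_price[day_ago],
        realized_demand[week_ago],
        clearing_price[week_ago],
    )
```

Each state is a fixed-length tuple, so it can be fed directly to whatever function approximator estimates Q0(s, a).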
Reasons the claims are patentable over the prior art: Claims 1-12, of which claims 1, 5, and 9 are independent, recite a system, method, and non-transitory computer-readable medium for reinforcement based bidding in an energy market.
The closest prior art of record is:
Gu et al. (US 20170228662 A1) – which discloses methods of training a policy subnetwork of a reinforcement learning system that is configured to compute Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions;
Sarker et al. (US 20190139159 A1) – which discloses a system to identify energy opportunities for an entity (e.g., supplier or consumer) to help ensure that an objective of the entity is met, in which ISO provides a day-ahead (“DA”) energy market in which offers and bids are collected for each target period (e.g., hour) of the following day;
Sun et al. (US 20190147551 A1) – which describes a power generation system of a power producer for generating and providing electricity to an electric power system operated by an independent system operator (ISO);
Catanzaro et al. (JP 5662446B2) – which discloses performing real-time advertisement bidding in advertisement switching using the competitive economic evaluation models including machine learning; 
Jin, Junqi, Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising, September 2018, Alibaba Group, University College London, Shanghai Jiao Tong University – discloses the use of multi-agent reinforcement learning on bid optimization of advertisements;
Krause, Thilo & Andersson, Göran & Ernst, Damien & Beck, E.V. & Cherkaoui, Rachid & Germond, Alain. (2012). Nash Equilibria and Reinforcement Learning for Active Decision Maker Modelling in Power Markets – modeling power supplier market bids using reinforcement learning algorithms.  
The prior art of record neither teaches nor suggests all particulars of the limitations as recited in claims 1, 5, and 9. While individual features may be known per se, there is no teaching or suggestion absent applicants’ own disclosure to combine these features other than with impermissible hindsight. Specifically, the claimed “training a RL agent of the system, by modeling interaction of the RL agent with the MDP based model of the energy market as a Markovian Decision Process (MDP), comprising: defining a state space of the RL agent by using a plurality of state space parameters comprising (i) a forecasted demand Dt+h for the consumption slot t+h that is available at time slot t, (ii) a realized demand Dt+h-48 and a Maximum Clearing Price (MCP) Pt+h-48 at the time slot t+h-48 which precedes the consumption slot t+h by 24 hours, and (iii) a realized demand Dt+h-336 and MCP Pt+h-336 at the time slot which precedes the consumption slot by one week, as state space variables; and defining an action space of the RL agent, comprising defining one or more actions for each state 's' in the state space, wherein the one or more actions when executed, maximize an estimated state-action value function Q0(s,a) of each state in the state space to minimize error between the estimated state-action value function Q0(s, a) and a true state action value function Q(s, a) is minimum; and processing one or more real-time inputs by the RL agent, comprising: observing a state 's' while bidding at a time slot t for a consumption [media_image1.png] determining an action from among a plurality of actions defined in the action space, for the state s being observed, as action that minimizes error between Q0(s, a) and Q(s, a) for the state s”, is not taught by the prior art. Therefore, the claims are allowable over the prior art noted above.
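The action-determination step quoted above (determine the error between Q0(s, a) and Q(s, a) for each action, compare the errors, and take the action with the least error) might be sketched in Python as follows. This is only a sketch under assumptions: the absolute difference as the error measure, the dictionary inputs, and the name `select_action` are illustrative choices not found in the record, and the true values Q(s, a) are simply supplied as reference numbers.

```python
from typing import Dict


def select_action(q_estimated: Dict[str, float], q_true: Dict[str, float]) -> str:
    """Pick the action whose estimated value is closest to the reference value.

    q_estimated maps each candidate action to Q0(s, a) for the observed state s;
    q_true maps the same actions to Q(s, a). Both mappings are hypothetical inputs.
    """
    # Error for each action (assumption: absolute difference).
    errors = {a: abs(q_estimated[a] - q_true[a]) for a in q_estimated}
    # Compare the errors and return the action with the least error.
    return min(errors, key=errors.get)


# Example with made-up values for three candidate bids.
q0 = {"bid_low": 1.0, "bid_mid": 2.5, "bid_high": 4.0}
q = {"bid_low": 1.8, "bid_mid": 2.4, "bid_high": 3.0}
chosen = select_action(q0, q)
```

When two actions tie on error, `min` returns the first one in dictionary order; the record does not specify a tie-breaking rule.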

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KRISTIN ELIZABETH GAVIN, whose telephone number is (571)270-7019. The examiner can normally be reached M-F, 7:30 AM-4:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Epstein, can be reached at 571-270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/K.E.G./Examiner, Art Unit 3683

/BRIAN M EPSTEIN/Supervisory Patent Examiner, Art Unit 3683