DETAILED ACTION
This action is in response to claims filed 28 May 2019 for application 16403388 filed 03 May 2019. Currently claims 2-21 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Terminal Disclaimer
The terminal disclaimer filed on 22 February 2022 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of US Patent 10,346,741 has been reviewed and is accepted.  The terminal disclaimer has been recorded.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: None of the prior art, alone or in combination, teaches the limitations of the claims, particularly:

for each observation received before the environment replica interacted with by the actor associated with the worker transitions into the state that satisfies the particular criteria:
generating, based on the observation and the current values of the parameters of the baseline network, a corresponding baseline score representing an estimated reward received by the agent starting from the state characterized by the observation;
determining an actual long-term reward corresponding to the observation; and

updating respective accumulated gradients for the baseline and policy networks based on the respective current gradients for the baseline and policy networks;

	The closest prior art of record, Mnih et al. (“Playing Atari with Deep Reinforcement Learning”), discloses using reinforcement learning and rewards in an iterative process, but does not disclose the difference between baseline and actual rewards or the particular method of the instant claims. Baird et al. (“Gradient Descent for General Reinforcement Learning”) discloses exemplary gradient descent algorithms. Tan (“Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents”) discloses using reinforcement learning with agents in environments.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Claims 2-21 are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246.  The examiner can normally be reached on M-F: 9-5.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571)-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ERIC NILSSON/Primary Examiner, Art Unit 2122