DETAILED ACTION
Notice of Pre-AIA or AIA Status
• The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
• This action is responsive to the following communication: US Patent Application filed on 4/27/2020.

Information Disclosure Statement
• The information disclosure statements (IDS) submitted on 8/7/2020 and 8/3/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

EXAMINER’S AMENDMENT
• An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee. 
The claims have been amended as follows: claim 13 is amended to depend from claim 12, because claim 13 is a method claim and not a system claim.
Claim 13 (Currently Amended). The computer-implemented method of claim [[1]] 12, further comprising: receiving an observation characterizing a current state of an instance of the environment, generating a transition starting from the received observation by selecting actions to be performed by the agent using the action selection neural network replica and in accordance with current values of the action selection parameters, and storing the transition in the memory.

Allowable Subject Matter
• Claims 1-10, 12, 13, and 15 are allowed and renumbered as claims 1-13, respectively.
---The following is an examiner’s statement of reasons for allowance: The prior art searched fails to yield any reference, whether taken singly or in combination with other references, that teaches or suggests “sampling a transition from a memory, wherein the transition includes an observation-action-reward triple and a last observation; processing the observation-action pair using a distributional Q network having critic parameters to generate, for the triple, a distribution over possible returns that could result if the action is performed in response to the observation; processing the last observation in the transition using a target action selection network to map the last observation to a next action, wherein the target action selection network has the same architecture as the action selection neural network but with different parameter values; processing the last observation and the next action using a target distributional Q network to generate a distribution over possible returns that could result if the next action is performed in response to the last observation, wherein the target distributional Q network has the same architecture as the distributional Q neural network but with different parameter values; determining a target distribution for the triple from the reward in the triple and the distribution over possible returns for the last observation; determining an update to the critic parameters of the distributional Q network by determining a gradient of a critic objective that depends on a distance between the target distribution for the triple and the distribution generated by the distributional Q network for the triple; and determining an update to the action selection parameters using the distributional Q network” as recited in claim 1. The same also applies to claims 12 and 15 (renumbered as claims 11 and 13, respectively).
---Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
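For illustration only, two of the steps recited in the limitation of claim 1 quoted above, namely determining a target distribution from the reward and the return distribution for the last observation, and measuring a distance between that target and the distribution generated by the distributional Q network, might be sketched in Python as follows. The sketch assumes a categorical return distribution on a fixed, evenly spaced support, a discount factor, and a cross-entropy distance; none of these choices is specified in the quoted claim language, and the names project_target, critic_distance, support, and discount are hypothetical.

```python
import numpy as np


def project_target(reward, next_probs, support, discount=0.99):
    """Build a target distribution from a reward and a next-step return distribution.

    Assumption: returns are modelled as a categorical distribution over the
    evenly spaced atoms in `support`. Each atom z is shifted to
    reward + discount * z, clipped to the support range, and its probability
    mass is split between the two nearest atoms.
    """
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]

    shifted = np.clip(reward + discount * support, v_min, v_max)
    pos = np.clip((shifted - v_min) / delta, 0, n - 1)  # fractional atom index
    lower = np.floor(pos).astype(int)
    upper = np.ceil(pos).astype(int)

    target = np.zeros(n)
    for j in range(n):
        if lower[j] == upper[j]:
            # Shifted atom lands exactly on a support atom.
            target[lower[j]] += next_probs[j]
        else:
            # Split mass linearly between the two neighbouring atoms.
            target[lower[j]] += next_probs[j] * (upper[j] - pos[j])
            target[upper[j]] += next_probs[j] * (pos[j] - lower[j])
    return target


def critic_distance(target, predicted, eps=1e-8):
    """Cross-entropy between target and predicted distributions (one assumed
    choice of distance; a critic objective could use another)."""
    return float(-np.sum(target * np.log(predicted + eps)))


# Hypothetical usage: 11 atoms on [0, 10], uniform next-step distribution.
support = np.linspace(0.0, 10.0, 11)
next_probs = np.full(11, 1.0 / 11)
target = project_target(reward=1.0, next_probs=next_probs, support=support, discount=0.5)
loss = critic_distance(target, next_probs)
```

In a training loop, the gradient of such a distance with respect to the critic parameters would supply the critic update; that step depends on the network framework and is omitted from this sketch.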

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THIERRY L PHAM whose telephone number is (571)272-7439. The examiner can normally be reached M-F, 11-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/THIERRY L PHAM/
Primary Examiner, Art Unit 2674