Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

Claims 21-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,445,641 to Srinivasan et al. (hereinafter Srinivasan or the Patent). Although the claims at issue are not identical, they are not patentably distinct from each other because the instant claims are broadened versions of the claims in the Patent; they therefore overlap in scope with, and are anticipated by, the claims of the Patent.
The nonstatutory double patenting anticipation analysis is demonstrated below for independent Claim 21 of the instant application vis-à-vis Claim 2 (which depends from Claim 1) of the Patent.


Instant Claim 21

Claim 21. A system for training a Q network used to control an agent that interacts with an environment by receiving observations characterizing a current state of the environment and performing an action from a predetermined set of actions in response to each observation, wherein the Q network is a deep neural network that is configured to receive as input an observation and an action and to generate a neural network output from the input in accordance with a set of parameters, wherein training the Q network comprises adjusting the values of the set of parameters of the Q network, and wherein the system comprises one or more computing units and one or more storage devices storing instructions that when executed by the one or more computing units cause the one or more computing units to implement:
  one or more actors, wherein each actor executes on a respective computing unit, wherein each actor is configured to operate independently of each other actor, wherein each actor interacts with a respective replica of the environment, wherein each actor maintains a respective actor Q network replica, and wherein each actor is further configured to repeatedly perform operations comprising:
    receiving, from a parameter server, current values of the parameters of the Q network;
    updating the values of the parameters of the actor Q network replica maintained by the actor using the current values;
    receiving an observation characterizing a current state of the environment replica interacted with by the actor;
    selecting an action to be performed in response to the observation using the actor Q network replica maintained by the actor;
    receiving a reward in response to the action being performed and a next observation characterizing a next state of the environment replica interacted with by the actor;
    generating an experience tuple that comprises the current observation, the action selected, the reward, and the next observation; and
    storing the experience tuple in a respective replay memory for the actor for use in training the Q network.

Patent Claim 2 (with dependence on claim 1)

Claim 1. A system for training a reinforcement learning system, the reinforcement learning system comprising an agent that interacts with an environment by receiving observations characterizing a current state of the environment and selecting an action to be performed from a predetermined set of actions, wherein the agent selects an action to be performed using a Q network, wherein the Q network is a deep neural network that is configured to receive as input an observation and an action and to generate a neural network output from the input in accordance with a set of parameters, wherein training the reinforcement learning system comprises adjusting the values of the set of parameters of the Q network, and wherein the system comprises:
  a plurality of computers configured to implement a plurality of learners, wherein each learner executes on a respective computing unit, wherein each learner is configured to operate independently of each other learner, wherein each learner maintains a respective learner Q network replica and a respective target Q network replica, and wherein each learner is further configured to repeatedly perform operations comprising:
    receiving, from a parameter server, current values of the parameters of the Q network;
    updating the parameters of the learner Q network replica maintained by the learner using the current values;
    selecting an experience tuple from a respective replay memory;
    computing a gradient from the experience tuple using the learner Q network replica maintained by the learner and the target Q network replica maintained by the learner; and
    providing the computed gradient to the parameter server.
Claim 2. The system of claim 1, wherein the one or more computers are further configured to implement one or more actors, wherein each actor executes on a respective computing unit, wherein each actor is configured to operate independently of each other actor, wherein each actor interacts with a respective replica of the environment, wherein each actor maintains a respective actor Q network replica, and wherein each actor is further configured to repeatedly perform operations comprising:
    receiving, from the parameter server, current values of the parameters of the Q network;
    updating the values of the parameters of the actor Q network replica maintained by the actor using the current values;
    receiving an observation characterizing a current state of the environment replica interacted with by the actor;
    selecting an action to be performed in response to the observation using the actor Q network replica maintained by the actor;
    receiving a reward in response to the action being performed and a next observation characterizing a next state of the environment replica interacted with by the actor;
    generating an experience tuple that comprises the current observation, the action selected, the reward, and the next observation; and
    storing the experience tuple in a respective replay memory.
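
For illustration only, the actor operations recited in both claim sets (receive parameters, update the replica, observe, select an action, receive a reward, generate and store an experience tuple) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or the Patent: the class names (ParameterServer, ToyEnvironment, Actor), the table-based stand-in for the deep Q network, the two-action set, and the epsilon exploration rate are all hypothetical.

```python
import random
from collections import deque, namedtuple

# All names here are hypothetical; the claims do not prescribe an implementation.
Experience = namedtuple(
    "Experience", ["observation", "action", "reward", "next_observation"]
)

ACTIONS = (0, 1)  # toy stand-in for the predetermined set of actions


class ParameterServer:
    """Holds the current values of the Q network parameters."""

    def __init__(self, params):
        self._params = dict(params)

    def current_values(self):
        return dict(self._params)  # copy, so each replica is independent


class ToyEnvironment:
    """Stand-in environment replica whose state is an integer counter."""

    def __init__(self):
        self.state = 0

    def observe(self):
        return self.state

    def step(self, action):
        # Reward 1.0 when the action matches the parity of the state.
        reward = 1.0 if action == self.state % 2 else 0.0
        self.state += 1
        return reward, self.state


def q_value(params, observation, action):
    """Toy Q function (a lookup table, not a deep neural network)."""
    return params.get((observation % 2, action), 0.0)


class Actor:
    def __init__(self, server, env, memory_size=1000, epsilon=0.1, seed=0):
        self.server = server
        self.env = env
        self.params = {}  # the actor's Q network replica
        self.replay_memory = deque(maxlen=memory_size)
        self.epsilon = epsilon  # exploration rate: an assumption, not claimed
        self.rng = random.Random(seed)

    def step(self):
        # Receive current parameter values and update the actor replica.
        self.params = self.server.current_values()
        # Receive an observation of the environment replica.
        obs = self.env.observe()
        # Select an action using the actor Q network replica.
        if self.rng.random() < self.epsilon:
            action = self.rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_value(self.params, obs, a))
        # Receive the reward and the next observation.
        reward, next_obs = self.env.step(action)
        # Generate the experience tuple and store it in the replay memory.
        experience = Experience(obs, action, reward, next_obs)
        self.replay_memory.append(experience)
        return experience
```

In this sketch, calling step() repeatedly corresponds to the "repeatedly perform operations" language; a learner would draw tuples from replay_memory, which is outside the scope of the sketch.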


Dependent Claim 22 of the instant application is rejected on the ground of nonstatutory double patenting as being unpatentable over Claim 2 of the Patent because all of its limitations are recited by Claim 2 of the Patent.

Claims 32-40 of the instant application are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 12-20 of the Patent under a nonstatutory double patenting analysis that parallels the one applied to instant claims 21-31.
Allowable Subject Matter
Claims 21-40 would be allowable if rewritten to overcome the Double Patenting rejection set forth in this Office action and to include all of the limitations of the base claim. 
The following is the statement of reasons for the indication of allowable subject matter: The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of independent claims 21, 32 and 40, particularly the details of training a Q network as highlighted in figs. 3-5 of the instant Drawings.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571)272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALAN CHEN/Primary Examiner, Art Unit 2125