DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1, 2, 4-19, and 32 are allowed.
As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).
The following is an examiner’s statement of reasons for allowance:
The prior art does not disclose:
Claim 1,
“storing, in a buffer, instances of robot experience data generated during the episodes by the robots, each of the instances of the robot experience data being generated during a corresponding one of the episodes, and being generated at least in part on corresponding output generated using the policy neural network with corresponding policy parameters for the policy neural network for the corresponding episode, wherein the instances of the experience data for a given robot of the plurality of robots are stored in the buffer at a first frequency; iteratively generating updated policy parameters of the policy neural network at a second frequency greater than the first frequency, wherein each of multiple iterations of the iteratively generating comprises generating the updated policy parameters using a group of one or more of the instances of the robot experience data in the buffer during the iteration; and by each of the robots in conjunction with a start of each of a plurality of the episodes performed by the 

Claim 13 (Patent Application No. 16/333,482, Attorney Docket No. ZS202-19890, Response to 12/09/2021 Office Action),
“providing, in one iteration of a plurality of experience data iterations of providing experience data from the given robot, first instances of robot experience data generated based on the policy network during the given episode, wherein the plurality of experience data iterations occur at a first frequency; prior to performance, by the given robot, of a subsequent episode of performing the task based on the policy network: replacing one or more of the policy parameters of the first group with updated policy parameters, wherein the updated policy parameters are generated based on training of the policy network based on additional instances of robot experience data, generated by an additional robot during an additional robot episode of explorations of performing the task by the additional robot, wherein the performing the task by the additional robot is based on the policy network, and wherein the training of the policy network comprises a plurality of training iterations occurring at a second frequency that is greater than the first frequency, the plurality of training iterations including; a first training iteration of training of the policy network based at least in part on the first instances and the additional instances; and one or more additional training iterations of the policy network based on yet further instances of experience data from the plurality of the robots; wherein the subsequent episode immediately follows the given episode, and wherein 
Claim 32, 
“iteratively receiving instances of experience data generated by a plurality of robots operating asynchronously and simultaneously, wherein each of the instances of experience data is generated by a corresponding robot of the plurality of robots during a corresponding episode of task exploration based on a policy neural network, and wherein the instances of experience data generated by a given robot of the plurality of robots are received at a first frequency; iteratively training the policy neural network at a second frequency based on the received experience data from the plurality of robots to generate one or more updated parameters of the policy neural network at each of the training iterations, wherein the second frequency is greater than the first frequency; and iteratively and asynchronously providing instances of the updated parameters to the robots for updating the policy neural network of the robots prior to the subsequent episodes of task explorations on which further instances of experience data are based.”
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
The invention is useful as a method implemented by one or more processors.




Communications
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RONNIE MANCHO whose telephone number is (571)272-6984.  The examiner can normally be reached on Mon-Thurs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Khoi Tran, can be reached at (571)272-6919.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RONNIE M MANCHO/
Primary Examiner, Art Unit 3664