DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application is a continuation of Application No. 16/380,125 (filed on 04/10/2019, now U.S. Patent No. 10632618), which is a continuation of PCT/US2017/055894 (filed on 10/10/2017), which claims benefit of provisional Application No. 62/406,363 (filed on 10/10/2016).
An electronic Terminal Disclaimer was filed and approved on 03/09/2022.
This action is in response to preliminary amendments and remarks filed on 03/26/2020. In the amendments, claims 1-20 are cancelled and claims 21-42 are added. Claims 21-42 are pending and have been examined.

Information Disclosure Statement
Information disclosure statements (IDSs) were submitted on 03/25/2020, 10/23/2020, 04/19/2021, and 02/23/2022. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Applicant’s Representative, Kim Bui (Registration No. 76843), on 03/09/2022.


The application has been amended as follows. Claim 23 is cancelled, and claims 21, 26, 27, 29-31, and 33-42 are amended:
21. (Currently Amended) A computer-implemented method comprising:
receiving an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task;
processing the observation using a neural network system to generate a policy output,
wherein the neural network system comprises a sequence of deep neural networks (DNNs), and
wherein the sequence of DNNs comprises:
a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, wherein:
the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer; and
		selecting an action to be performed by the robotic agent in response to the observation using the policy output.



26. (Currently Amended) The method of claim 25, wherein the recurrent neural network layer in the second plurality of indexed layers, implemented by the one or more computers, is configured to receive as input (i) a layer output of a layer preceding the recurrent neural network layer in the first plurality of indexed layers, (ii) an internal state of the recurrent neural network layer in the first plurality of indexed layers, and (iii) a layer output of a layer preceding the recurrent neural network layer in the second plurality of indexed layers.

27. (Currently Amended) The method of claim 21, wherein each of the one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, that are configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, implemented by the one or more computers, is further configured to:
apply a respective first set of parameters to the layer output generated by the preceding layer of the first robot-trained DNN; and
apply a respective second set of parameters to the layer output generated by the preceding layer of the simulation-trained DNN.

29. (Currently Amended) The method of claim [[21]] 28, wherein the plurality of degrees of freedom include one or more joints of the robotic agent and one or more actuators of the robotic agent.


30. (Currently Amended) The method of claim 21, wherein the sequence of DNNs further comprises:
a second robot-trained DNN, wherein:
the second robot-trained DNN comprises a third plurality of indexed layers, and
one or more of the layers in the third plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, (ii) a layer output generated by a preceding layer of the simulation-trained DNN, and (iii) a layer output generated by a preceding layer of the second robot-trained DNN.




the second robot-trained DNN, implemented by the one or more computers, is configured to receive different data characterizing the current state in conjunction with the observation; and
the second robot-trained DNN, implemented by the one or more computers, is configured to process the different data through the third plurality of indexed layers to generate a second policy output that defines an action to be performed by the robotic agent to perform a second, different robotic task.

33. (Currently Amended) One or more non-transitory computer-readable 
receiving an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task;
processing the observation using a neural network system to generate a policy output, wherein the neural network system comprises a sequence of deep neural networks (DNNs), and
wherein the sequence of DNNs comprises:
a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, wherein:
the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by the one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer 
selecting an action to be performed by the robotic agent in response to the observation using the policy output.

34. (Currently Amended) A system comprising:
one or more computers; and
one or more non-transitory computer-readable 
receiving an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task;
processing the observation using a neural network system to generate a policy output, wherein the neural network system comprises a sequence of deep neural networks (DNNs), and
wherein the sequence of DNNs comprises:
a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, wherein:
the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by the one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a 
selecting an action to be performed by the robotic agent in response to the observation using the policy output.

35. (Currently Amended) One or more non-transitory computer-readable , implemented by the one or more computers, is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation,
wherein the sequence of DNNs comprises:
a simulation-trained DNN, wherein the simulation-trained DNN, implemented by the one or more computers, comprises a first plurality of indexed layers and is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer,
wherein the operations comprise:


36. (Currently Amended) The one or more non-transitory computer-readable , implemented by the one or more computers, is configured to:
receive as input (i) a layer output generated by a layer preceding the output layer of the first robot-trained DNN, and (ii) a layer output generated by a layer preceding an output layer of the simulation-trained DNN;
apply a first set of parameters to the layer output generated by a layer preceding the output layer of the first robot-trained DNN; and
apply a second set of parameters to the layer output generated by a layer preceding the output layer of the simulation-trained DNN, and
wherein the method further comprises:
initializing values of the second set of parameters to match trained values of parameters of the output layer of the simulation-trained DNN.
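For illustration only, the initialization recited in claim 36 might be sketched as follows. This is a minimal sketch and not the applicant's implementation: the helper names (`matvec`, `init_output_layer`, `output_layer`), the toy weight values, the summation used to combine the two contributions, and the zero initialization of the first set of parameters are all assumptions chosen to make the effect of copying the simulation-trained output-layer parameters visible.

```python
import copy

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def init_output_layer(sim_output_W, n_robot_hidden):
    """Build the two parameter sets of the robot-trained output layer.

    second_set is initialized to match the trained parameters of the
    simulation-trained output layer (copied, so the originals stay fixed).
    first_set is zero here purely as an illustrative assumption.
    """
    second_set = copy.deepcopy(sim_output_W)
    first_set = [[0.0] * n_robot_hidden for _ in sim_output_W]
    return first_set, second_set

def output_layer(first_set, second_set, robot_prev, sim_prev):
    """Apply each parameter set to its input and sum the results (assumed)."""
    own = matvec(first_set, robot_prev)
    lateral = matvec(second_set, sim_prev)
    return [a + b for a, b in zip(own, lateral)]

# Hypothetical trained output-layer weights of the simulation-trained DNN.
sim_output_W = [[0.5, -1.0], [0.25, 2.0]]
first_set, second_set = init_output_layer(sim_output_W, n_robot_hidden=3)

robot_prev = [0.3, 0.7, -0.2]  # output of the layer preceding the robot output layer
sim_prev = [1.0, 2.0]          # output of the layer preceding the simulation output layer

policy_output = output_layer(first_set, second_set, robot_prev, sim_prev)
sim_only = matvec(sim_output_W, sim_prev)
```

Under these assumptions, the robot-trained output layer initially reproduces what the simulation-trained output layer would compute from the same simulation-side input, and later changes to `second_set` leave `sim_output_W` untouched.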
37. (Currently Amended) The one or more non-transitory computer-readable 


38. (Currently Amended) The one or more non-transitory computer-readable of the first robot-trained DNN to random values.

39. (Currently Amended) The one or more non-transitory computer-readable 

40. (Currently Amended) The one or more non-transitory computer-readable 


42. (Currently Amended) A system comprising:
one or more computers; and
one or more non-transitory computer-readable , implemented by the one or more computers, is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation,
wherein the sequence of DNNs comprises:
a simulation-trained DNN, wherein the simulation-trained DNN, implemented by the one or more computers, comprises a first plurality of indexed layers and is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer,
wherein the operations comprise: 
.

Allowable Subject Matter
Claims 21-22 and 24-42 are allowed. These claims are renumbered as claims 1-21 upon allowance.

REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance: 
Independent claim 21 is directed to a computer-implemented method. None of the prior art of record, either alone or in combination, teaches the following limitations:
...a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, wherein:
the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and


Independent claim 33 is directed to one or more non-transitory computer-readable media. None of the prior art of record, either alone or in combination, teaches the following limitations:
...a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, wherein:
the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by the one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer; and...

Independent claim 34 is directed to a system. None of the prior art of record, either alone or in combination, teaches the following limitations:

the simulation-trained DNN comprises a first plurality of indexed layers, and
the simulation-trained DNN, implemented by the one or more computers, is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer; and...

Independent claim 35 is directed to one or more non-transitory computer-readable media. None of the prior art of record, either alone or in combination, teaches the following limitations:
...a simulation-trained DNN, wherein the simulation-trained DNN, implemented by the one or more computers, comprises a first plurality of indexed layers and is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters 
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer,
wherein the operations comprise:
training the first robot-trained DNN on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed.

Independent claim 42 is directed to a system. None of the prior art of record, either alone or in combination, teaches the following limitations:
...a simulation-trained DNN, wherein the simulation-trained DNN, implemented by the one or more computers, comprises a first plurality of indexed layers and is configured to receive the observation and process the observation through each layer in the first plurality of indexed layers to generate a respective layer output for each layer in the first plurality of indexed layers; and
a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, wherein
the first robot-trained DNN comprises a second plurality of indexed layers,
the first robot-trained DNN, implemented by the one or more computers, is configured to receive the observation and to process the observation through each layer in the second plurality of indexed layers to generate the policy output, and
one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a 
wherein the operations comprise: 
training the first robot-trained DNN on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed.

The closest prior art references of record are the following:
Hausknecht et al. (“Deep Reinforcement Learning in Parameterized Action Space”) teaches reinforcement learning within the domain of simulated RoboCup soccer. This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer.  [“Other Documents” No. 10 on IDS submitted on 03/25/2020]
Levine et al. (“End-to-End Training of Deep Visuomotor Policies”) teaches deep convolutional neural networks (CNNs) based reinforcement learning. This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are [“Other Documents” No. 14 on IDS submitted on 03/25/2020]
Foerster et al. (“Learning to Communicate with Deep Multi-Agent Reinforcement Learning”) teaches Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer.  [“Other Documents” No. 6 on IDS submitted on 03/25/2020]
Lample et al. (“Playing FPS Games with Deep Reinforcement Learning”) teaches utilization of deep reinforcement learning in playing a First-Person-Shooting (FPS) game in a 3D environment. This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer. [“Other Documents” No. 13 on IDS submitted on 03/25/2020]
Francis et al. (US 2019/0025917 A1) teaches a reinforcement learning brain-machine interface. This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer.  [“U.S. Patent Documents” No. 1 on IDS submitted on 03/25/2020]
Fisher et al. (US 8,793,205 B1) teaches implementing robotic learning and evolution for an ecosystem of robots utilizing artificial neuron networks for implementing learning of new traits. This prior art does not teach at least the features of a simulation-trained DNN and a first robot-trained DNN wherein a first robot-trained DNN that has been trained on interactions of the robotic agent with the real-world environment to perform the robotic task to determine trained values of parameters of the first robot-trained DNN while holding trained values of the parameters of the simulation-trained DNN fixed, and one or more of the layers in the second plurality of indexed layers, implemented by the one or more computers, are each configured to receive as input (i) a layer output generated by a preceding layer of the first robot-trained DNN, and (ii) a layer output generated by a preceding layer of the simulation-trained DNN, wherein a preceding layer is a layer whose index is one less than the index of the layer.  

The primary reason for the allowance of the claims in this case is the inclusion of the features recited above, now included in the independent claims in combination with the other elements recited, which are not found in the prior art of record. Therefore, the present claims are allowable.
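For illustration only, the distinguishing arrangement (a simulation-trained DNN that produces an output at every indexed layer, and a robot-trained DNN whose layers also receive the output of the simulation-trained layer whose index is one less) might be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the layer sizes, the ReLU nonlinearity, the summation of the two contributions, the random weights, and the greedy action selection are hypothetical, and training of the robot-trained parameters is not shown. The sketch only reads the simulation-trained weights, consistent with holding them fixed.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def sim_forward(sim_weights, obs):
    """Simulation-trained column: return the output of every indexed layer."""
    outputs, h = [], obs
    for W in sim_weights:
        h = relu(matvec(W, h))
        outputs.append(h)
    return outputs

def robot_forward(own_weights, lateral_weights, sim_outputs, obs):
    """Robot-trained column.

    Layer i receives (i) the output of layer i-1 of this column and
    (ii) the output of layer i-1 of the simulation-trained column.
    """
    h = obs
    last = len(own_weights) - 1
    for i, W in enumerate(own_weights):
        z = matvec(W, h)
        if i > 0:
            lateral = matvec(lateral_weights[i - 1], sim_outputs[i - 1])
            z = [a + b for a, b in zip(z, lateral)]
        h = z if i == last else relu(z)
    return h  # policy output: one score per candidate action

def select_action(policy_output):
    """Greedy selection (an assumption; the claims only say 'using the policy output')."""
    return max(range(len(policy_output)), key=lambda a: policy_output[a])

rng = random.Random(0)
OBS, HID, ACT = 4, 5, 3  # hypothetical sizes
sim_weights = [rand_matrix(HID, OBS, rng), rand_matrix(HID, HID, rng),
               rand_matrix(ACT, HID, rng)]
robot_weights = [rand_matrix(HID, OBS, rng), rand_matrix(HID, HID, rng),
                 rand_matrix(ACT, HID, rng)]
lateral_weights = [rand_matrix(HID, HID, rng), rand_matrix(ACT, HID, rng)]

observation = [0.2, -0.1, 0.4, 0.0]
sim_outputs = sim_forward(sim_weights, observation)
policy_output = robot_forward(robot_weights, lateral_weights, sim_outputs, observation)
action = select_action(policy_output)
```

Both columns receive the same observation, and only the robot-trained column's final layer produces the policy output used to select the action.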

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/Examiner, Art Unit 2125