Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER’S AMENDMENT
Authorization for this examiner’s amendment was given in an interview with Lisa Schreihart on 4/27/22.

The application has been amended as follows: 

1. (Currently Amended) A computer-implemented method, comprising: 
obtaining, at a service provider network, a first set of parameters of a robotic device associated with a customer account and a second set of parameters specifying a reinforcement learning model, wherein the reinforcement learning model comprises a customer-defined reinforcement function; 
generating, in response to a first request to simulate execution of an application comprising the reinforcement learning model for the robotic device, a simulation environment; 
provisioning computing resources determined to be sufficient to execute the simulation environment and train the reinforcement learning model, wherein sufficiency of the computing resources is determined based, at least in part, on the second set of parameters specifying the reinforcement learning model; 
executing, in the simulation environment using the provisioned computing resources, the application for the robotic device to obtain data indicating simulated performance of the application; 
updating the reinforcement learning model based at least in part on the simulated performance and a customer-supplied update to the customer-defined reinforcement function during the simulated performance; 
providing the data and the updated reinforcement learning model to fulfill the first request; 
obtaining a second request to install the application comprising the updated reinforcement learning model on to a fleet of robotic devices associated with the customer account, the fleet including the robotic device; 
transmitting the application from the service provider network to the fleet of robotic devices associated with the customer account that includes the robotic device; 
monitoring performance of the fleet of robotic devices in a physical environment resulting from the execution of the application in the physical environment; and 
updating the reinforcement learning model based at least in part on the monitored performance in the physical environment.
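
As a non-limiting illustration of the "updating ... during the simulated performance" step of claim 1, the Python sketch below runs a toy value-learning loop in which the reward (reinforcement) function can be replaced partway through training. Every name here (`train_in_simulation`, `reward_updates`, the action set, the learning rate) is a hypothetical chosen for illustration and does not appear in the application; the sketch assumes a customer-supplied update arrives as a replacement function keyed by episode index.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical signature: maps a discrete action to a scalar reward.
RewardFn = Callable[[int], float]


def train_in_simulation(
    reward_fn: RewardFn,
    actions: List[int],
    episodes: int,
    reward_updates: Optional[Dict[int, RewardFn]] = None,
    learning_rate: float = 0.5,
    seed: int = 0,
) -> Dict[int, float]:
    """Toy stand-in for simulated training.

    reward_updates maps an episode index to a replacement reward
    function, modeling an update supplied while training is running.
    """
    rng = random.Random(seed)
    values = {a: 0.0 for a in actions}
    reward_updates = reward_updates or {}
    for episode in range(episodes):
        # Swap in the customer-supplied reward function, if one arrived.
        if episode in reward_updates:
            reward_fn = reward_updates[episode]
        action = rng.choice(actions)
        reward = reward_fn(action)
        # Move the value estimate toward the observed reward.
        values[action] += learning_rate * (reward - values[action])
    return values


if __name__ == "__main__":
    prefer_zero = lambda a: 1.0 if a == 0 else 0.0
    prefer_one = lambda a: 1.0 if a == 1 else 0.0
    learned = train_in_simulation(
        prefer_zero, [0, 1], episodes=200, reward_updates={50: prefer_one}
    )
    print(learned)
```

In this sketch the learned values follow whichever reward function was most recently in force, which is the behavior the amended limitation turns on.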

5. (Currently Amended) A system, comprising: 
one or more processors; and 
memory that stores computer-executable instructions that, if executed, cause the one or more processors to: 
obtain a first set of parameters of a robotic device associated with a customer account and a second set of parameters specifying a simulation environment for testing an application of the robotic device, the first set of parameters indicating a data storage location of the application and the customer account and the second set of parameters indicating a selection of the simulation environment from a set of simulation environments; 
obtain a reinforcement learning model for the application and a termination condition for completing training of the reinforcement learning model, wherein the reinforcement learning model comprises a customer-defined reinforcement function;
select, from a pool of resources, a set of resources on which to execute a set of simulations in the simulation environment, the set of resources selected based, at least in part, on sufficiency of the set of resources for executing the set of simulations and training the reinforcement learning model; 
obtain the application from the data storage location; 
load the application onto the set of resources; 
execute the set of simulations with the application to train the reinforcement learning model until the termination condition is reached;
monitor execution of the set of simulations to obtain data indicative of performance of the application; and 
provide the data to enable modification of the application and the reinforcement learning model during the execution of the set of simulations, wherein:
the modification of the reinforcement learning model comprises receiving, from a customer, an updated customer-defined reinforcement function; and
the modification of the application is based on at least one of the performance of the application, the updated customer-defined reinforcement function, and the termination condition.
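
As a non-limiting illustration of the "select, from a pool of resources" step of claim 5, the Python sketch below picks resources until an estimated demand is met. The claim does not state how sufficiency is measured; the sketch assumes it means aggregate vCPU and memory capacity, and the names `Resource` and `select_sufficient` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    name: str
    vcpus: int
    memory_gb: int


def select_sufficient(
    pool: List[Resource], need_vcpus: int, need_memory_gb: int
) -> List[Resource]:
    """Greedily take the largest resources until both needs are met."""
    chosen: List[Resource] = []
    vcpus = memory = 0
    for res in sorted(pool, key=lambda r: (r.vcpus, r.memory_gb), reverse=True):
        if vcpus >= need_vcpus and memory >= need_memory_gb:
            break
        chosen.append(res)
        vcpus += res.vcpus
        memory += res.memory_gb
    # Assumption: an insufficient pool is reported as an error.
    if vcpus < need_vcpus or memory < need_memory_gb:
        raise ValueError("pool cannot satisfy the estimated demand")
    return chosen


if __name__ == "__main__":
    pool = [Resource("a", 4, 16), Resource("b", 8, 32), Resource("c", 2, 8)]
    print([r.name for r in select_sufficient(pool, 10, 40)])
```

A greedy largest-first choice is only one possible policy; it is used here because it is short, not because the application describes it.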

13. (Currently Amended) A non-transitory computer-readable storage medium comprising executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: 
obtain a first set of parameters of a robotic device associated with a customer account and a second set of parameters specifying a simulation environment for testing an application of the robotic device; 
obtain a set of reinforcement learning models, wherein a reinforcement learning model of the set of reinforcement learning models comprises a customer-defined reinforcement function; 
select, from a pool of computer system instances, a set of computer system instances on which to execute a set of simulations in the simulation environment, the set of computing system instances determined to be sufficient for executing the set of simulations and training the set of reinforcement learning models; 
execute the set of simulations in the simulation environment, each simulation simulating a reinforcement learning model of the set of reinforcement learning models; 
obtain a selection from the set of reinforcement learning models indicating a selected reinforcement learning model; 
update the application based on the selected reinforcement learning model; 
select a fleet of robotic devices from a set of robotic devices associated with the customer account;
transmit, to the fleet of robotic devices, associated with the customer account, that includes the robotic device, a resource identifier indicating a location of the application data to cause the fleet of robotic devices to install and execute the application; 
obtain, from the robotic device, data generated in response to sensor inputs obtained by the robotic device; and 
update, using the data, a graphical user interface to enable monitoring of the robotic device and updating of the customer-defined reinforcement function with a customer-defined update during execution of the application.

20. (Currently Amended) The non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to execute the set of simulations in the simulation environment further cause the computer system to utilize the set of reinforcement learning models based on simulated performance of a simulated robotic device within the simulation environment, the instructions further causing the system to: 
generate a ranking of the set of reinforcement learning models utilized in the simulation environment; and 
present the ranking of the set of reinforcement learning models, 
wherein the ranking provides information usable, by a customer, for updating the customer-defined reinforcement function and defining a new customer-defined reinforcement function during the set of simulations.
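
As a non-limiting illustration of the ranking recited in claim 20, the Python sketch below orders a set of models by a single simulated-performance score. The scoring metric (mean reward per model) and the name `rank_models` are assumptions made for illustration; the claim does not specify how the ranking is computed.

```python
from typing import Dict, List, Tuple


def rank_models(mean_rewards: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, model name, score) tuples, best score first."""
    ordered = sorted(mean_rewards.items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


if __name__ == "__main__":
    for rank, name, score in rank_models({"a": 1.0, "b": 3.0, "c": 2.0}):
        print(rank, name, score)
```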

All other claims remain as previously presented. 

REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance:
The closest prior art (e.g. US 9,671,777 to Aichele et al., “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al., US 10,058,995 to Sampedro et al., US 8,418,000 to Salame, and US 9,110,496 to Michelsen) discloses or suggests:
obtaining, at a service provider network, a first set of parameters of a robotic device associated with a customer account and a second set of parameters specifying a reinforcement learning model, wherein the reinforcement learning model comprises a customer-defined reinforcement function (Aichele col. 7, lines 43-47 “data associated with … architecture for robots, … working environments”, col. 8, lines 24-28 “provide a learning process for optimization of robot behavior and task results”, Sampedro col. 18, lines 59-63 “a neural network model may be provided in a request along with the robot instructions”, Dosovitskiy pg. 5, §3.3. 1st par. “deep reinforcement learning … based on a reward signal provided by the environment”, Salame col. 11, lines 1-9 “on behalf of a plurality of customers”); 
generating, in response to a first request to simulate execution of an application comprising the reinforcement learning model for the robotic device, a simulation environment (Aichele col. 8, lines 32-34 “determining a physically plausible virtual environment for a simulated robot at operation 210”, Sampedro col. 12, lines 11-17 “the robot instruction may include instructions for utilizing the neural network model”); 
provisioning computing resources determined to be sufficient to execute the simulation environment and train the reinforcement learning model, wherein sufficiency of the computing resources is determined based, at least in part, on the second set of parameters specifying the reinforcement learning model (Michelsen col. 2, lines 1-3 “provisioned based, at least in part, on the predicted requirements of the test”); 
executing, in the simulation environment using the provisioned computing resources, the application for the robotic device to obtain data indicating simulated performance of the application (Aichele col. 9, lines 31-34 “the plurality of plausible tests may be executed substantially simultaneously on the simulated robot”); 
updating the reinforcement learning model based at least in part on the simulated performance and the customer-defined reinforcement function during the simulated performance (Aichele col. 8, lines 24-28 “provide a learning process for optimization of robot behavior and task results”, Dosovitskiy par. bridging pp. 5-6 “The reward is a weighted sum of five terms: positively weighted speed and distance … negatively weighted collision damage, overlap with the sidewalk, and overlap with the opposite lane”); 
providing the data and the updated reinforcement learning model to fulfill the first request (Aichele col. 9, lines 40-42 “test results … of the execution … may be obtained”); 
obtaining a second request to install the application comprising the updated reinforcement learning model on to a fleet of robotic devices associated with the customer account, the fleet including the robotic device (Aichele col. 11, lines 12-14 “In the case of the success result 420, a step 460 maybe performed by running the plausible test … on a physical robot”, Sampedro col. 17, lines 25-35 “operating each of a plurality of testing robots”, Salame col. 11, lines 1-9 “each customer [has] its own functionally complex system 470”); 
transmitting the application from the service provider network to the fleet of robotic devices associated with the customer account that includes the robotic device (Aichele col. 11, lines 12-14 “running the plausible test … on a physical robot”); 
monitoring performance of the fleet of robotic devices in a physical environment resulting from the execution of the application in the physical environment (Aichele col. 11, lines 12-14 “running the plausible test … on a physical robot”); and 
updating the reinforcement learning model based at least in part on the monitored performance in the physical environment (Aichele col. 10, lines 48-53 “The test parameters … may be adjusted automatically by employing machine learning”).

The closest prior art does not fairly disclose or suggest:
executing the application for the robotic device to obtain data indicating simulated performance of the application; and
updating the reinforcement learning model based at least in part on the simulated performance and a customer-supplied update to the customer-defined reinforcement function during the simulated performance.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON D MITCHELL whose telephone number is (571)272-3728. The examiner can normally be reached Monday through Thursday 7:00am - 4:30pm and alternate Fridays 7:00am - 3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached on (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JASON D MITCHELL/
Primary Examiner, Art Unit 2199