Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to a request for continued examination filed 9/16/21.
Claims 1-20 are pending.

Response to Arguments
Applicant's arguments filed 8/2/21 have been fully considered but they are not persuasive.

Claims 1-4
Specifically, Dosovitskiy, as referenced by the Office at p. 5, §3.3, 1st paragraph, discloses that the "deep reinforcement learning ... trains a deep network based on a reward signal provided by the environment, with no human driving interfaces." Dosovitskiy further discloses, in the same paragraph, using a specific "actor-critic (A3C) algorithm," not a customer defined reinforcement function. For example, Dosovitskiy discloses that it has no human in the loop ("no human driving interfaces"), by which Dosovitskiy’s algorithm could be customer defined or updated. Thus, Dosovitskiy, alone or in combination with Aichele and/or Sampedro, further does not teach or disclose "updating the reinforcement learning model based on at least one of the simulated performance or an update to the customer-defined reinforcement function," as recited in amended claim 1. (pg. 13, 1st full par.)

As noted by applicant, Dosovitskiy teaches reinforcement learning including a reinforcement function (see e.g. pg. 5, last full par. “deep reinforcement learning … based on a reward signal”), but does not explicitly teach the reinforcement learning model or reinforcement function being defined by a “customer”. Sampedro teaches a user defined learning model (col. 18, lines 59-63 “a neural network model may be provided in a request along with the robot instructions”).

Claims 5-12
Applicant submits that Dosovitskiy, alone or in combination with Aichele, does not teach or disclose "wherein the reinforcement model comprises a customer-defined reinforcement function," as recited in amended claim 5. Specifically, Dosovitskiy, as referenced by the Office at p. 5, §3.3, 1st paragraph, discloses that the "deep reinforcement learning ... trains a deep network based on a reward signal provided by the environment, with no human driving interfaces." Dosovitskiy further discloses, in the same paragraph, using an "actor-critic (A3C) algorithm," not a customer-defined reinforcement function. For example, Dosovitskiy discloses that it has no human in the loop ("no human driving interfaces"), by which the algorithm could be customer defined or updated. (2nd full par. on pg. 15)

Dosovitskiy teaches a reinforcement learning model comprising a reinforcement function (pg. 5, §3.3. 1st par. “deep reinforcement learning … based on a reward signal provided by the environment”). Salame teaches customers providing similar functions to a testing environment (col. 11, lines 1-9 “on behalf of a plurality of customers”, col. 11, lines 41-45 “users may create … one or more test rules 437”). Accordingly, as indicated in the rejection, the combination of Dosovitskiy and Salame teaches the claimed “customer-defined reinforcement function”.
Additionally, it is noted that Dosovitskiy’s disclosure that the training of the deep network is not based on “human driving traces” indicates only that no human was required to perform the simulation and/or training; it does not indicate that no human devised the reinforcement function (pg. 5, §3.3. 1st par. “a reward signal”, par. bridging pp. 5-6 “The reward is a weighted sum of five terms”).

Further, Dosovitskiy, pp. 5-6, §3.3, 2nd paragraph, does not teach or disclose "send a notification indicating that the termination condition was reached," as recited by amended claim 5. There is no teaching in Dosovitskiy, as cited by the Office, regarding a notification indicating that the termination condition was reached. (last full par. on pg. 15)

Aichele discloses sending a notification when a test terminates (col. 10, lines “visualizing results 370”). As indicated in the rejection, it would have been obvious to send such a notification when Dosovitskiy’s termination condition is reached (e.g. par. bridging pp. 5-6 “terminated when the vehicle reaches the goal, … collides with an obstacle, or when a time budget is exhausted”).

For example, claim 6 has been amended to recite "provide the data to enable modification of the application and the reinforcement learning model comprising the customer-defined reinforcement function, based on at least one of the performance of the application or the termination condition." Accordingly, the data for updating the customer-defined reinforcement function can additionally be based on the termination condition, which is not taught or otherwise rendered obvious by Aichele, Michelsen, Dosovitskiy, Salame, Sampedro, and Yeap, individually or in combination. (last full par. on pg. 16)

First, it is noted that the “termination condition” basis is recited in the alternative (i.e. “at least one of … or the termination condition”) and thus need not be shown in order to establish obviousness of the claim as a whole.
Further, Dosovitskiy teaches generating weights based on “distance traveled to the goal” and “collision damage” which correspond to termination conditions which include “reach[ing] the goal” and “collid[ing] with objects” (see e.g. par. bridging pp. 5-6). Accordingly, the combination of references teaches the claim. 

Claims 13-20
First, Aichele does not disclose "a resource identifier indicating a location of the application data," as recited in claim 13. Aichele’s description does not constitute a "resource identifier indicating a location of the application data," as recited in claim 13. As stated in Applicant's specification, at [0048]-[0049], a "resource identifier" may identify a location of a resource. For example, Applicant's Specification states, "rather than transmitting the application 106 to the robotic devices 112 specified by the customer, the robotic device management service 104 transmits, to each robotic device 112, the network address of the data object or other datastore utilized to store the application 106. This may cause each robotic device 112 to utilize the provided network address to access the data object or other datastore used to store the application 106 and obtain, from the data object or other datastore, the application 106 for installation and execution on the robotic device 112." App. Spec. at [0049]. As described in the Applicant's specification, by transmitting the location to the fleet of robotic devices, the robotic devices of the fleet have the information required to fetch the application for installation, as per claim 13, "to cause the fleet of robotic devices to install and execute the application." This is not disclosed by Aichele. None of the descriptions described by Aichele would be sufficient to "indicat[e] a location of the application data to cause the fleet of robotic devices to install and execute the application," as recited in claim 13. (pg. 18, 1st par.)

While applicant’s specification discloses a URI (see e.g. par. [0048]) the claim is not so limited, instead reciting only “a resource identifier” which indicates a “location” of the data. Aichele discloses an identifier of a resource (col. 9, lines 7-17 “description”) that allows the system to retrieve the data from, and thus identifies the location in, a database (col. 9, lines 7-17 “retrieved from the database”). Accordingly, Aichele discloses a resource identifier as claimed. Sampedro is only cited for its teaching of transmitting to a “fleet of robotic devices” (rather than Aichele’s single robot) and is not relied upon to teach the resource identifier.
Further, in the interest of furthering prosecution, it is noted that URIs were well known in the art and would have been an obvious alternative to Aichele’s “description”. In other words, using a known identifier for its intended purpose does not constitute a non-obvious distinction over otherwise anticipatory art.

Further, Applicant submits that Aichele, Sampedro, and Dosovitskiy, which the Office applied to reject the elements of the "obtain" step as obvious under § 103, do not in combination teach or disclose "wherein a reinforcement learning model of the set of reinforcement learning models comprises a customer-defined reinforcement function," as recited in amended claim 13. See Office Action, pp. 25-27. The Office acknowledges that the combination of Aichele and Sampedro does not teach a reinforcement learning model, but submits that Dosovitskiy teaches a reinforcement learning model. See Office Action, p. 27. Applicant submits that Dosovitskiy, alone or in combination with Aichele and/or Sampedro, does not teach or disclose "wherein a reinforcement learning model of the set of reinforcement learning models comprises a customer defined reinforcement function," as recited in amended claim 13. Specifically, Dosovitskiy, as referenced by the Office at p. 5, §3.3, 1st paragraph, discloses that the "deep reinforcement learning ... trains a deep network based on a reward signal provided by the environment, with no human driving interfaces." Dosovitskiy further discloses, in the same paragraph, using an "actor-critic (A3C) algorithm," not a customer-defined reinforcement function. For example, Dosovitskiy discloses that it has no human in the loop ("no human driving interfaces"), by which the algorithm could be customer-defined or updated. (pg. 19, 1st full par.)

Dosovitskiy teaches a reinforcement learning model comprising a reinforcement function (pg. 5, §3.3. 1st par. “deep reinforcement learning … based on a reward signal provided by the environment”). Salame teaches customers providing similar functions to a testing environment (col. 11, lines 1-9 “on behalf of a plurality of customers”, col. 11, lines 41-45 “users may create … one or more test rules 437”). Accordingly, as indicated in the rejection, the combination of Dosovitskiy and Salame teaches the claimed “customer-defined reinforcement function”.

Salame, Michelsen, and/or Caplan do not cure the shortcomings of the combination of Aichele, Sampedro, and Dosovitskiy. Salame is cited by the Office as teaching only a part of the claim 13 recitations of "a first set of parameters of a robotic device associated with a customer account;" and "select a fleet of robotic devices from a set of robotic devices associated with the customer account." See Office Action, pp. 27-28. … (last partial par. on pg. 19)



For example, claim 20 has been amended to recite "wherein the ranking provides information usable for at least one of updating the customer-defined reinforcement function or for defining a new customer-defined reinforcement function." The Office stated that "a ranking of the set of reinforcement learning models" and "presenting the ranking ... " are taught by Oliner. See Office Action, p. 36. Applicant submits that Oliner, as cited by the Office, does not teach or disclose "wherein the ranking provides information usable for at least one of updating the customer-defined reinforcement function or for defining a new customer-defined reinforcement function." Oliner, at [0395], appears to disclose information usable for "selecting from available options, indicating approval or acceptance, supplying authorization or credentials ... for identifying a model type for the prospective model" or "determining the sequence of processing used to arrive at an identification of a model type for the prospective model." Oliner at [0395]. Oliner does not disclose "information usable for at least one of updating the customer defined reinforcement function or for defining a new customer-defined reinforcement function." The recitation is not taught or otherwise rendered obvious by Aichele, Michelsen, Sampedro, Dosovitskiy, Salame, Caplan, and Oliner, individually or in combination. (last par. on pg. 20)

Oliner discloses using the ranking to select among “model types” including “reinforcement learning” models (e.g. par. [0395]). Dosovitskiy teaches reinforcement learning models include a reinforcement function (e.g. par. bridging pp. 5-6). Accordingly, in combination Oliner and Dosovitskiy teach selecting among reinforcement learning models including a reinforcement function based on a ranking of the performance of such models. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen).

Claim 1: Aichele discloses a computer-implemented method, comprising: 
obtaining, at a service provider network (col. 14, lines 26-31 “systems that provide cloud-based resources”), a first set of parameters of a robotic device (col. 5, lines 54-57 “receive task data”, col. 7, lines 43-47 “data associated with … architecture for robots, … working environments, mechanical and material properties of objects, kinematics …”) and a second set of parameters specifying a learning model (col. 8, lines 24-28 “provide a learning process for optimization of robot behavior and task results”); 

executing, in the simulation environment using provisioned computing resources, the application for the robotic device to obtain data indicating simulated performance of the application (col. 9, lines 31-34 “the plurality of plausible tests may be executed substantially simultaneously on the simulated robot”); and 
updating the learning model based on at least one of the simulated performance or an update to the customer-defined reinforcement function (col. 8, lines 24-28 “provide a learning process for optimization of robot behavior and task results”); 
providing the data to fulfill the first request (col. 9, lines 40-42 “test results … of the execution … may be obtained”); 
obtaining a second request to install the application comprising the updated learning model on to the robotic device (col. 11, lines 12-14 “In the case of the success result 420, a step 460 maybe performed by running the plausible test … on a physical robot”); 
transmitting the application from the service provider network to the fleet of robotic devices associated with the customer account that includes the robotic device (col. 11, lines 12-14 “running the plausible test … on a physical robot”); 
monitoring performance of the robotic device in a physical environment resulting from the execution of the application in the physical environment (col. 11, lines 12-14 “running the plausible test … on a physical robot”, those of ordinary skill in the art would have understood 
updating the learning model based at least in part on the monitored performance in the physical environment (col. 10, lines 48-53 “The test parameters … may be adjusted automatically by employing machine learning”).

Aichele does not disclose: 
the second set of parameters specifying a reinforcement learning model, wherein the reinforcement learning model comprises a reinforcement function; and
updating the reinforcement learning model based on at least one of the simulated performance or an update to the customer-defined reinforcement function.

Dosovitskiy teaches:
a reinforcement learning model comprising a reinforcement function (pg. 5, §3.3. 1st par. “deep reinforcement learning … based on a reward signal provided by the environment”); and
updating the reinforcement learning model based on the simulated performance (par. bridging pp. 5-6 “The reward is a weighted sum of five terms: positively weighted speed and distance … negatively weighted collision damage, overlap with the sidewalk, and overlap with the opposite lane”).

… (Dosovitskiy pg. 5, §3.3. 1st par. “deep reinforcement learning”). Those of ordinary skill in the art would have been motivated to do so as a known type of learning model which would have produced only the expected results (e.g. Dosovitskiy pg. 5, §3.3 “shown to perform well in simulated three-dimensional environment … enables running multiple simulation threads in parallel”).

Aichele and Dosovitskiy do not teach:
the application comprising the reinforcement learning model for the robotic device, wherein the reinforcement learning model comprises a reinforcement function;
the second request to install an application comprising the updated reinforcement learning model on to a fleet of robotic devices, the fleet including the robotic device.

Sampedro teaches:
an application comprising a learning model (col. 12, lines 11-17 “the robot instruction may include instructions for utilizing the neural network model”), wherein the reinforcement learning model comprises a user-defined learning function (col. 18, lines 59-63 “a neural network model may be provided in a request along with the robot instructions”);
a request to install an application comprising a learning model on to the fleet of robotic devices (col. 10, line 65-col. 11, line 3 “load the robot instruction … in memory”, col. 17, lines 25-35 “operating each of a plurality of testing robots”).

… (Dosovitskiy pg. 5, §3.3. 1st par. “deep reinforcement learning … based on a reward signal provided by the environment”). Those of ordinary skill in the art would have been motivated to do so as a known means of further ensuring the application functions as intended when executed on the robots (Aichele col. 11, lines 12-14 “running the plausible test … on a physical robot”, also see e.g. Sampedro col. 7, lines 1-21) while providing additional flexibility in the training of the model (see e.g. Sampedro col. 10, lines 45-49 “a machine learning model (and/or properties of a machine learning model”).

Aichele, Dosovitskiy and Sampedro do not explicitly teach:
a fleet of robotic devices associated with a customer account; and
a customer-defined reinforcement function.

Salame teaches:
an environment associated with a customer account (col. 11, lines 1-9 “each customer … being an owner or operator of its own functionally complex system 470”); and
a first set of parameters associated with a customer account (col. 11, lines 1-9 “testing services being delivered by test system 400 … on behalf of a plurality of customers”, e.g. col. 11, lines 41-45 “users may create … one or more test rules 437”). 

… (Dosovitskiy pg. 5, §3.3. 1st par. “a reward signal provided by the environment”), and fleet of robots (Sampedro col. 17, lines 25-35 “operating each of a plurality of testing robots”) with a customer account (Salame col. 11, lines 1-9 “on behalf of a plurality of customers”, col. 11, lines 41-45 “one or more test rules 437”, Sampedro col. 18, lines 59-63 “provided in a request along with the robot instructions”). Those of ordinary skill in the art would have been motivated to do so as a means of monetizing the testing.

Aichele, Dosovitskiy, Sampedro and Salame do not explicitly teach:
provisioning computing resources determined to be sufficient to execute the simulation environment and train the reinforcement learning model, wherein sufficiency of the computing resources is determined based, at least in part, on the second set of parameters associated with specifying the reinforcement learning model; 

Michelsen teaches:
provisioning computing resources determined to be sufficient to execute a simulation environment (col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”), wherein sufficiency of the computing resources is determined based, at least in part, on a set of parameters associated with specifying the testing (col. 2, lines 1-3 “provisioned based, at least in part, on the predicted requirements of the test”). 



Claim 2: Aichele, Dosovitskiy, Sampedro, Salame and Michelsen teach the computer-implemented method of claim 1, wherein the method further comprises:
obtaining sensor data corresponding to responses of the robotic device to a physical environment (Aichele col. 8, lines 36-38 “generate plausible sensory data”, Sampedro col. 18, lines 35-39 “sensor data from one or more sensors of the robot”); 
updating, based on the sensor data, the application resulting in an updated application (Aichele col. 9, lines 47-49 “At operation 260, the results of the execution may be analyzed to select an optimized robot control program”, Sampedro col. 8, lines 18-24 “the instructions … may be updated”); and 
transmitting, to the fleet of robotic devices, the updated application (Sampedro col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”).

Claim 3: Aichele, Dosovitskiy, Sampedro, Salame and Michelsen teach the computer-implemented method of claim 1, wherein the method further comprises:
establishing a communications channel to the service provider network, through the application, with the robotic device (Sampedro col. 10, lines 19-22 “one or more computing devices … in network communication with, robots 180A and 108B”);
transmitting, over the communications channel, executable instructions to cause the robotic device to perform a set of actions (Sampedro col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”); and 
obtaining, over the communications channel, additional data from the robotic device, the additional data specifying responses to execution of the executable instructions (Sampedro col. 18, lines 40-41 “provides at least some of the stored data”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen) in view of US 2017/0195483 to Gault (Gault).

Claim 4: Aichele, Dosovitskiy, Sampedro, Salame and Michelsen teach the computer-implemented method of claim 1, wherein the provisioning further comprises:

performing parallel simulations, in the simulation environment, of the application using the computing resources (Aichele col. 9, lines 31-34 “the plurality of plausible tests may be executed substantially simultaneously”, Michelsen col. 2, lines 49-51 “a plurality of tests … executed at least partially in parallel”).

Aichele, Dosovitskiy, Sampedro, Salame and Michelsen do not teach the network connection is a virtual private network.

Gault teaches establishing a virtual private network connected to a component (par. [0020] “creating at least one tunnel, particularly of the VPN type”).

It would have been obvious at the time of filing to establish a virtual private network connected to the customer component (Sampedro col. 10, lines 19-22 “in network communication with, robots 180A and 108B”, Gault par. [0020] “creating at least one tunnel, particularly of the VPN type”). Those of ordinary skill in the art would have been motivated to do so as a known means of providing secure communications which would have produced only the expected results.

Claims 5-7 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen).
 
Claim 5: Aichele discloses a system, comprising: 
one or more processors (e.g. Fig. 6 “Processor Units” 610); and 
memory that stores computer-executable instructions (e.g. Fig. 6, “Main Memory” 620) that, if executed, cause the one or more processors to: 
obtain a first set of parameters of a robotic device and a second set of parameters specifying a simulation environment for testing an application of the robotic device (col. 5, lines 54-57 “receive task data”, col. 7, lines 43-47 “data associated with … architecture for robots, … working environments, mechanical and material properties of objects, kinematics …”), the first set of parameters indicating a data storage location of the application (col. 9, lines 7-17 “the task … associated with the robot control program can be retrieved from the database based on a description of the at least one task in the task data”) and the second set of parameters indicating a selection of the simulation environment from a set of simulation environments (col. 9, lines 18-19 “a description of the working environment of a physical robot may be obtained from the database”); 
obtain a learning model for the application (col. 10, lines 48-53 “test parameters … may be adjusted automatically by employing machine learning … based on results of the execution … and the given criteria for correctness”);

load the application (col. 5, lines 54-57 “create a plurality of plausible tests”), 
execute the set of simulations with the application to train the learning model (col. 10, lines 48-53 “test parameters … may be adjusted automatically by employing machine learning … based on results of the execution … and the given criteria for correctness”); and
send a notification indicating that the execution is complete (col. 10, lines “visualizing results 370”).

Aichele does not explicitly disclose:
obtaining a reinforcement learning model for the application and a termination condition for completing training of the reinforcement learning model, wherein the reinforcement learning model comprises a customer-defined reinforcement function. 

Dosovitskiy teaches:
obtaining a reinforcement learning model (pg. 5, §3.3. 1st par. “deep reinforcement learning”) and a set of termination conditions for completing training of the reinforcement learning model (pp. 5-6, §3.3, 2nd par. “the vehicle reaches the goal … collides with an obstacle, or when a time budget is exhausted”), wherein the reinforcement learning model comprises a reinforcement function (pg. 5, §3.3. 1st par. “a reward signal provided by the environment”);
executing a simulation with an application to train the reinforcement learning model until the termination condition is reached (pp. 5-6, §3.3, 2nd par. “The episode is terminated when the vehicle reaches the goal … collides with an obstacle, or when a time budget is exhausted”).

It would have been obvious at the time of filing to obtain a reinforcement learning model (Aichele col. 10, lines 48-53 “employing machine learning”, Dosovitskiy pg. 5, §3.3. 1st par. “deep reinforcement learning”) comprising a reinforcement function (Dosovitskiy pg. 5, §3.3. 1st par. “a reward signal provided by the environment”) and termination conditions (Dosovitskiy pp. 5-6, §3.3, 2nd par. “The episode is terminated”). Those of ordinary skill in the art would have been motivated to do so as a known type of learning model which would have produced only the expected results (e.g. Dosovitskiy pg. 5, §3.3 “shown to perform well in simulated three-dimensional environment … enables running multiple simulation threads in parallel”).

Aichele and Dosovitskiy do not explicitly teach:
the first set of parameters of a robotic device associated with a customer account, the first set of parameters indicating the customer account; and
wherein the reinforcement learning model comprises a customer-defined reinforcement function. 


Salame teaches:
wherein a testing model comprises a customer-defined function (col. 11, lines 41-45 “one or more test rules 437”).

It would have been obvious at the time of filing to associate the first set of parameters, including an indication of the customer account, with a customer account (e.g. Salame col. 12, lines 41-45 “Using test manager 430, users may …”) and further to obtain a customer-defined reinforcement function (Salame col. 11, lines 41-45 “one or more test rules 437”, Dosovitskiy pg. 5, §3.3. 1st par. “a reward signal provided by the environment”). Those of ordinary skill in the art would have been motivated to do so as a means of monetizing the testing.

Aichele, Dosovitskiy and Salame do not teach:
selecting, from a pool of resources, a set of resources on which to execute a set of simulations in the simulation environment, the set of resources selected based, at least in part, 
loading the application onto the set of resources.

Michelsen teaches:
selecting, from a pool of resources, a set of resources on which to execute a set of simulations in the simulation environment (col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”, e.g. col. 4, lines 20-27 “a pool of available cloud servers”), the set of resources selected based, at least in part, on sufficiency of the set of resources for executing the test (col. 2, lines 1-3 “provisioned based, at least in part, on the predicted requirements of the test”); and 
loading the application onto the set of resources (col. 9, lines 13-18 “provision one or more allocated computing devices with the test lab resources”).

It would have been obvious at the time of filing to select and load the application onto a set of resources for simulating the execution of the application (Michelsen col. 9, lines 13-18 “dynamically provision one or more allocated computing devices with the test lab resources”) based on the second set of parameters (Aichele e.g. col. 7, lines 43-47 “data associated with … architecture for robots, … working environments, mechanical and material properties of objects, kinematics …”, Sampedro col. 4, lines 9-14 “the request further includes a machine learning model”). Those of ordinary skill in the art would have been motivated to provide an 

Claim 6: Aichele, Dosovitskiy, Salame and Michelsen teach the system of claim 5, wherein the computer-executable instructions further cause the one or more processors to: 
monitor execution of the set of simulations in the simulation environment to obtain data indicative of performance of the application (Aichele col. 9, lines 40-42 “test results … of the execution … may be obtained”); and 
provide the data to enable modification of the application and the reinforcement learning model comprising the customer-defined reinforcement function, based on at least one of the performance of the application or the termination condition (Aichele col. 10, lines 48-52 “machine learning … based on results of the execution”, Dosovitskiy par. bridging pp. 5-6 “terminated when the vehicle reaches the goal, when the vehicle collides with an obstacle … positively weighted … distance traveled towards the goal, and negatively weighted collision damage”).

Claim 7: Aichele, Dosovitskiy, Salame and Michelsen teach the system of claim 5, wherein the computer-executable instructions further cause the one or more processors to: 
identify, based on the first set of parameters and the second set of parameters, a number of simulations to be performed for testing of the application in the simulation environment (Aichele col. 5, lines 55-57 “create a plurality of plausible tests for a robot control 
select, based on the number of simulations to be performed, the set of resources on which to execute the set of simulations (Michelsen col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”).

Claim 12: Aichele, Dosovitskiy, Salame and Michelsen teach the system of claim 5, wherein the computer-executable instructions further cause the one or more processors to: 
monitor execution of the simulation in the simulation environment (Aichele col. 9, lines 40-42 “test results … of the execution … may be obtained”, Michelsen col. 4, “Statistical data can be collected during the test”); 
determine, prior to the termination condition being reached, that additional resources are required to support continued execution of the simulation in the simulation environment (Michelsen col. 4, lines 20-23 “determine whether the set of hardware originally-provisioned for the test is adequate”); and 
provision the additional resources to enable the continued execution of the simulation (Michelsen col. 4, lines 23-27 “additional hardware can be quickly and dynamically … allocated and provisioned”).

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro).

Claim 8: Aichele, Dosovitskiy, Salame and Michelsen teach the system of claim 5, wherein the computer-executable instructions further cause the one or more processors to transmit, to the robotic device associated with the customer account (Salame col. 11, lines 1-9 “each customer … being an owner or operator of its own functionally complex system 470”), the application to cause the robotic device to install and execute the application (Aichele col. 11, lines 12-14 “In the case of the success result 420, a step 460 may be performed by running the plausible test … on a physical robot”).

Aichele, Dosovitskiy, Salame and Michelsen do not teach transmitting to a fleet of robotic devices that includes the robotic device, the application to cause the fleet of robotic devices to install and execute the application.

Sampedro teaches transmitting to a fleet of robotic devices that includes a robotic device, the application to cause the fleet of robotic devices to install and execute the application (col. 17, lines 25-35 “operating each of a plurality of testing robots”, col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”).



Claim 9: Aichele, Dosovitskiy, Salame, Michelsen and Sampedro teach the system of claim 8, wherein the computer-executable instructions further cause the one or more processors to: 
detect an issue with the application that prevents further execution of the application on the fleet of robotic devices (Sampedro col. 6, lines 26-34 “data that indicates error conditions(s) encountered during operation of the robot (e.g. due to … safety limits to be exceeded)”; it would at least have been obvious to halt execution when a safety limit is exceeded in order to ensure safe operation); 
identify a different version of the application (Aichele col. 9, lines 47-49 “At operation 260, the results of the execution may be analyzed to select an optimized robot control program”, Sampedro col. 8, lines 18-24 “the instructions … may be updated”); and 
transmit, to the fleet of robotic devices, the different version of the application to cause the fleet of robotic devices to install and execute the different version of the application 

Claim 10: Aichele, Dosovitskiy, Salame, Michelsen and Sampedro teach the system of claim 8, wherein the computer-executable instructions further cause the one or more processors to: 
establish a communications channel with the robotic device (Sampedro col. 10, lines 19-22 “in network communication with, robots 180A and 180B”); 
obtain a set of executable instructions that, if executed by the robotic device, cause the robotic device to perform a set of operations (Aichele col. 5, lines 54-57 “receive task data”, Sampedro col. 18, lines 15-20 “load the robot instructions”); and 
transmit, over the communications channel, the set of executable instructions (Sampedro col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of US 2006/0080534 to Yeap et al. (Yeap).

Claim 11: Aichele, Dosovitskiy, Salame, Michelsen and Sampedro teach the system of claim 10, but do not explicitly teach wherein the computer-executable instructions further cause the one or more processors to: 
obtain, from the robotic device over the communications channel, a request to access a second set of resources, the request including a digital certificate of the robotic device;
authenticate, based on the digital certificate, the robotic device; and 
provide access to the second set of resources in accordance with access control policies associated with the digital certificate.

Yeap teaches obtaining, from a device over a communications channel, a request to access a set of resources, the request including a digital certificate of the robotic device (par. [0008] “receiving a digital certificate from a device”);
authenticating, based on the digital certificate, the device (par. [0011] “if the identifier is determined to be valid”); and 
providing access to the resources in accordance with access control policies associated with the digital certificate (par. [0011] “permitting the device to access the resource”).

It would have been obvious at the time of filing to obtain and authenticate a digital certificate (Yeap par. [0011] “if the identifier is determined to be valid”) to provide access to a second set of resources (Yeap par. [0011] “permitting the device to access the resource”, Michelsen col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”). Those of ordinary skill in .

Claims 13-15 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 2016/0042292 to Caplan et al. (Caplan) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen).

Claim 13: Aichele discloses a non-transitory computer-readable storage medium comprising executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: 
obtain a first set of parameters of a robotic device and a second set of parameters specifying a simulation environment for testing an application of the robotic device (col. 5, lines 54-57 “receive task data”, col. 7, lines 43-47 “data associated with … architecture for robots, … working environments, mechanical and material properties of objects, kinematics …”); 
obtain a set of learning models (col. 8, lines 24-28 “provide a learning process”, note that a set may contain a single element); 
execute the set of simulations in the simulation environment, each simulation simulating a learning model of the set of learning models (col. 9, lines 31-34 “the plurality of plausible tests may be executed substantially simultaneously”); 

transmit, to the robotic device, a resource identifier indicating a location of the application data to cause the fleet of robotic devices to install and execute the application (col. 11, lines 12-14 “running the plausible test … on a physical robot”, col. 9, lines 7-17 “the task … associated with the robot control program can be retrieved from the database based on a description of the at least one task in the task data”; note that here the “description” is disclosed as sufficient to identify the task resource and thus constitutes a “resource identifier”).

Aichele does not disclose:
a set of reinforcement learning models, wherein a reinforcement learning model of the set of reinforcement learning models comprises a reinforcement function.

Dosovitskiy teaches:
a set of reinforcement learning models, wherein a reinforcement learning model of the set of reinforcement learning models comprises a reinforcement function (pg. 5, §3.3. 1st par. “deep reinforcement learning … a reward signal provided by the environment”).

It would have been obvious at the time of filing to obtain a set of reinforcement learning models (Aichele col. 10, lines 48-53 “employing machine learning”, Dosovitskiy pg. 5, §3.3. 1st 

Aichele and Dosovitskiy do not explicitly teach 
obtaining a selection from the set of reinforcement learning models indicating a selected reinforcement learning model; 
updating the application based on the selected learning model.

Caplan teaches:
a plurality of learning models (par. [0030] “models are generated … stored in the Model Database 120”);
obtaining a selection from the set of reinforcement learning models indicating a selected reinforcement learning model (par. [0031] “identify one or more preferred models”); 
updating the application based on the selected learning model (par. [0031] “the preferred model … may be applied to the new example”).

It would have been obvious at the time of filing to obtain, simulate and select a set of reinforcement learning models (Caplan par. [0031] “identify one or more preferred models”, Dosovitskiy pg. 5, §3.3. 1st par. “deep reinforcement learning”) to update the application (Aichele col. 10, lines 48-53 “test parameters … may be adjusted automatically by employing 

Aichele, Dosovitskiy and Caplan do not teach:
selecting a fleet of robotic devices from a set of robotic devices; and 
transmitting the resource identifier to the fleet of robotic devices.

Sampedro teaches:
selecting a fleet of robotic devices from a set of robotic devices (col. 18, lines 6-10 “a quantity of testing robots that are configured at block 454”); and 
transmitting application data to the fleet of robotic devices (col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”).

It would have been obvious at the time of filing to select a fleet of robotic devices (Sampedro col. 17, lines 25-35 “operating each of a plurality of testing robots”, Aichele col. 11, lines 12-14 “running the plausible test … on a physical robot”), and to transmit the resource identifier to the fleet of robotic devices (Aichele col. 11, lines 12-14 “running the plausible test … on a physical robot”, col. 9, lines 7-17 “a description of the at least one task in the task data”, Sampedro col. 18, lines 15-20 “load the robot instructions”). Those of ordinary skill in the art would have been motivated to do so as a means of further ensuring the application functions as 

Aichele, Dosovitskiy, Caplan and Sampedro do not teach:
a first set of parameters of a robotic device associated with the customer account; 
a set of robotic devices associated with the customer account; and 
a customer-defined reinforcement function.

Salame teaches: 
a first set of parameters associated with a customer account (col. 11, lines 1-9 “testing services being delivered by test system 400 … on behalf of a plurality of customers”, e.g. col. 11, lines 41-45 “one or more test rules 437”); 
a testing system associated with the customer account (col. 11, lines 1-9 “each customer … being an owner or operator of its own functionally complex system 470”); and
a customer-defined testing function (col. 11, lines 41-45 “one or more test rules 437”).

It would have been obvious at the time of filing to associate the first set of parameters (e.g. Aichele col. 7, lines 43-47 “data associated with … architecture for robots”), fleet of robotic devices (Sampedro col. 17, lines 25-35 “operating each of a plurality of testing robots”) and reinforcement function (Dosovitskiy pg. 5, §3.3. 1st par. “a reward signal”) with a customer account (Salame col. 11, lines 1-9 “on behalf of a plurality of customers”). Those of ordinary skill in the art would have been motivated to do so as a means of monetizing the testing.

Aichele, Dosovitskiy, Caplan, Sampedro and Salame do not teach:
selecting, from a pool of computer system instances, a set of computer system instances on which to execute a set of simulations in the simulation environment, the set of computing system instances determined to be sufficient for executing the set of simulations and training the set of reinforcement learning models. 

Michelsen teaches:
selecting, from a pool of computer system instances, a set of computer system instances on which to execute a set of tests (col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”, e.g. col. 4, lines 20-27 “a pool of available cloud servers”), the set of computing system instances determined to be sufficient for executing the tests (col. 2, lines 1-3 “provisioned based, at least in part, on the predicted requirements of the test”).

It would have been obvious at the time of filing to select and load the application onto a set of resources sufficient for simulating the execution of the application and training the set of learning models (Michelsen col. 9, lines 13-18 “dynamically provision one or more allocated computing devise with the test lab resources”) based on the second set of parameters (Aichele e.g. col. 7, lines 43-47 “data associated with … architecture for robots, … working environments, mechanical and material properties of objects, kinematics …”, Sampedro col. 4, lines 9-14 “the request further includes a machine learning model”). Those of ordinary skill in the art would 

Claim 14: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to execute the set of simulations in the simulation environment further cause the computer system to: 
determine, based on the first set of parameters and the second set of parameters, a number of simulations to be performed (Aichele col. 5, lines 55-57 “create a plurality of plausible tests for a robot control program”, Michelsen col. 2, lines 49-51 “a plurality of tests … executed at least partially in parallel”); 
identify, for each simulation of the number of simulations, a number of computer system instances for execution of the simulation (Michelsen col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”); and 
based on the number of computer system instances for execution of each simulation of the number of simulations, identify the set of computer system instances (Michelsen col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of pre-production resources … needed for a particular virtual test lab”, e.g. col. 4, lines 20-27 “a pool of available cloud servers”).

Claim 15: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, wherein the instructions further cause the computer system to: 
obtain an indication, from a subset of the fleet of robotic devices, that an error impacting execution of the application has been detected by the subset of the fleet of robotic devices (Sampedro col. 6, lines 26-34 “data that indicates error conditions(s) encountered during operation of the robot (e.g. due to … safety limits to be exceeded)”); and 
in response to the indication, transmit second application data corresponding to a different version of the application to the subset of the fleet of robotic devices to cause the subset of the fleet of robotic devices to install and execute the different version of the application (Aichele col. 9, lines 47-49 “At operation 260, the results of the execution may be analyzed to select an optimized robot control program”, Sampedro col. 8, lines 18-24 “the instructions … may be updated”).

Claim 17: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, wherein the instructions further cause the computer system to: 
obtain, from the robotic device, data generated in response to sensor inputs obtained by the robotic device (Sampedro col. 18, lines 40-43 “provides at least some of the stored data in response to the request”); and 
update, using the data, a graphical user interface to enable monitoring of the robotic device during execution of the application (Aichele col. 9, lines human-machine interface (HMI) 

Claim 18: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 17, wherein the instructions further cause the computer system to: 
obtain, via the graphical user interface, computer-executable instructions that, if executed by the robotic device, cause the robotic device to perform a set of operations (Aichele col. 9, “human-machine interface (HMI) operable to receive the task data”, also see e.g. col. 13, lines 40-44, Sampedro e.g. col. 15, lines 25-28 “an example graphical interface that may be utilized to define environmental parameters for the operation of testing robots”);
transmit the computer-executable instructions to the robotic device (Sampedro col. 18, lines 15-20 “load the robot instructions … in one or more computer readable media associated with the testing robots for execution”); and 
monitor execution of the computer-executable instructions using additional data generated in response to second sensor inputs obtained by the robotic device as a result of the execution of the computer-executable instructions (Aichele col. 9, lines 40-42 “test results … of the execution … may be obtained”, Sampedro col. 15, lines 25-28 “an example graphical interface that may be utilized … for the operation of testing robots”).

Claim 19: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, wherein the instructions further cause the computer system to: 
monitor execution of the set of simulations in the simulation environment (Aichele col. 9, lines 40-42 “test results … of the execution … may be obtained”, Michelsen col. 4, “Statistical data can be collected during the test”); 
determine that additional computer system instances are needed to enable continued execution of the set of simulations (Michelsen col. 4, lines 20-23 “determine whether the set of hardware originally-provisioned for the test is adequate”); and 
provision, from the pool of computer system instances, the additional computer system instances to allow the continued execution of the set of simulations (Michelsen col. 4, lines 23-27 “additional hardware can be quickly and dynamically … allocated and provisioned”).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen) in view of US 2016/0042292 to Caplan et al. (Caplan) in view of US 2006/0080534 to Yeap et al. (Yeap).

Claim 16: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, but do not explicitly teach wherein the instructions further cause the computer system to: 
obtain, from the robotic device, a request to access a set of resources, the request including a digital certificate of the robotic device; 
evaluate the digital certificate of the robotic device to identify a set of access control policies specifying a level of access to the set of resources; and 
allow the robotic device to access the set of resources subject to the set of access control policies.

Yeap teaches obtaining, from a device over a communications channel, a request to access a set of resources, the request including a digital certificate of the robotic device (par. [0008] “receiving a digital certificate from a device”);
authenticating, based on the digital certificate, the device (par. [0011] “if the identifier is determined to be valid”); and 
providing access to the resources in accordance with access control policies associated with the digital certificate (par. [0011] “permitting the device to access the resource”).

It would have been obvious at the time of filing to obtain and authenticate a digital certificate (Yeap par. [0011] “if the identifier is determined to be valid”) to provide access to a second set of resources (Yeap par. [0011] “permitting the device to access the resource”, Michelsen col. 9, lines 8-13 “test manager 210 can access and identify, from a catalog … a set of .

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over US 9,671,777 B1 to Aichele et al. (Aichele) in view of US 10,058,995 B1 to Sampedro et al. (Sampedro) in view of “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (Dosovitskiy) in view of US 8,418,000 to Salame (Salame) in view of US 9,110,496 B1 to Michelsen (Michelsen) in view of US 2016/0042292 to Caplan et al. (Caplan) in view of US 2018/0349482 to Oliner et al. (Oliner).

Claim 20: Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen teach the non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to execute the set of simulations in the simulation environment further cause the computer system to utilize the set of reinforcement learning models based on performance of a simulated robotic device within the simulation environment (e.g. Aichele col. 10, lines 48-53 “test parameters … may be adjusted automatically by employing machine learning … based on results of the execution … and the given criteria for correctness”).

Aichele, Dosovitskiy, Caplan, Sampedro, Salame and Michelsen do not teach the instructions further causing the system to:

present the ranking of the set of reinforcement learning models,
wherein the ranking provides information usable for at least one of updating the customer-defined reinforcement function or for defining a new customer-defined reinforcement function.

Oliner teaches generating a ranking of a set of reinforcement learning models (par. [0395] “determine a … ranked list of model types”, par. [0395] “reinforcement learning models”); and 
presenting the ranking of the set of reinforcement learning models (par. [0395] “solicit user input for selecting from available options”),
wherein the ranking provides information usable for updating the learning model (par. [0395] “selecting from available options”).

It would have been obvious to rank and present a set of reinforcement learning models (Oliner par. [0395] “ranked list of model types”) and to use this ranking to update or define a new reinforcement function (Aichele col. 10, lines 48-53 “test parameters … may be adjusted … based on results of the execution”, Sampedro col. 8, lines 18-24 “operated based on the updated instructions”, Dosovitskiy pg. 5, §3.3. 1st par. “a reward signal provided by the environment”). Those of ordinary skill in the art would have been motivated to do so as a means of selecting a particular model to use (e.g. Oliner par. [0395] “identifying a model type for the prospective model”).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON D MITCHELL whose telephone number is (571)272-3728. The examiner can normally be reached Monday through Thursday 7:00am - 4:30pm and alternate Fridays 7:00am - 3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached on (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JASON D MITCHELL/Primary Examiner, Art Unit 2199