DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 11-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art of record does not teach or fairly suggest the method in which processing the representation of the candidate next simulation state using a discriminator neural network to generate the discriminative score characterizing the likelihood that the candidate next simulation state is a realistic simulation state comprises:
processing, for each agent, the representation of the next state of the agent at the next time step corresponding to the candidate next simulation state using the discriminator neural network to generate an agent-specific discriminative score characterizing a likelihood that the next state of the agent is a realistic agent state; and generating the discriminative score characterizing the likelihood that the candidate next simulation state is a realistic simulation state based on the agent-specific discriminative scores.
Rather, the closest prior art of record, Siddiqui (US 20200353943 A1), teaches processing, for each agent, the representation of the next state of the agent at the next time step using the discriminator neural network to generate a discriminative score characterizing a likelihood that the next state of the agent is a realistic agent state. Siddiqui’s discriminator score indicates the extent to which the generated data (generated trajectories) looks realistic but is silent on generating a discriminative score characterizing the likelihood that the candidate next simulation state is a realistic simulation state based on the agent-specific discriminative scores.
The closest non-patent literature (NPL) of record, Li, “Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning”, teaches a method of tracking multiple interactive agents and modelling the joint distribution of their future behaviors at the current time step. The method uses a generator and a discriminator neural network, in which the goal of the discriminator is to distinguish whether a given sample (trajectory) is real or produced by the generator. Li’s discriminator, however, does not generate a discriminative score characterizing the likelihood that the candidate next simulation state is a realistic simulation state based on the agent-specific discriminative scores.
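For illustration only, the distinguishing limitation of claims 11-12 (agent-specific discriminative scores combined into one score for the candidate next simulation state) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names AgentFeatures, toy_discriminator, and simulation_state_score are hypothetical, the discriminator neural network is replaced by a trivial stand-in function, and averaging is only one assumed way of basing the state-level score on the agent-specific scores.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical per-agent representation: (position, speed).
AgentFeatures = Tuple[float, float]


def toy_discriminator(agent: AgentFeatures) -> float:
    """Stand-in for a trained discriminator network (assumption).

    Returns 1.0 when the agent's speed lies in a plausible range, else 0.0.
    """
    _, speed = agent
    return 1.0 if 0.0 <= speed <= 40.0 else 0.0


def simulation_state_score(
    next_agent_states: Sequence[AgentFeatures],
    agent_discriminator: Callable[[AgentFeatures], float],
) -> float:
    """Score a candidate simulation state from its agent-specific scores."""
    # One agent-specific discriminative score per agent.
    agent_scores = [agent_discriminator(a) for a in next_agent_states]
    # Assumption: combine by averaging; the claim language only requires
    # that the state-level score be based on the agent-specific scores.
    return mean(agent_scores)
```

In this sketch, a candidate state containing one plausible agent and one implausible agent receives a state-level score of 0.5.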

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Siddiqui (US 20200353943 A1) in view of Isele (US 20200391738 A1).

Regarding claim 1, Siddiqui teaches
A method performed by one or more data processing apparatus for generating a simulation of an environment that is being interacted with by a plurality of agents over a plurality of time steps, wherein the simulation comprises a respective simulation state for each time step that specifies a respective state of each agent at the time step, the method comprising, for each time step: see at least FIG. 1, FIGS. 3A-3B, and the Abstract, where a system 100 includes computer programs encoded on computer storage media for generating a driving scenario machine learning network and providing a simulated driving environment. [0029] discusses that system 100 uses high definition map data and driving scenarios to allow simulated control of vehicles. Also see at least [0038] where the system 100 evaluates a dynamic object’s velocity and direction from one time instant to the next time instant. Also see [0048] where at each time instant, the vehicle has particular states in which the vehicle operates.
obtaining a current simulation state for the current time step; see at least FIG. 2 and [0030] where a simulated 3D environment for autonomous driving is provided (block 240) and then an interaction of autonomous vehicle with dynamic objects is simulated (block 250).
determining, for each candidate next simulation state, a discriminative score characterizing a likelihood that the candidate next simulation state is a realistic simulation state; and see at least [0068]-[0069] where the system 100 may use a Generative Adversarial Network (GAN) which is “a process to make a generative model by having two machine learning networks try to compete with one another. A discriminator tries to distinguish real data from unrealistic data created by a generator.” The discriminator computes a score indicating the extent to which the generated data looks realistic.
selecting a candidate next simulation state as the simulation state for the next time step based on the discriminative scores for the candidate next simulation states. See [0069]-[0070] where the system 100 uses a discriminator to evaluate the generated data by comparing the generated data against the real data (driving scenario data 122) to determine if the generated data looks similar to the real data. The discriminator computes a score indicating the extent to which the generated data looks realistic and based on the scores, the system may perform a policy update to the machine learning network such that the trajectories of dynamic objects are selected based on their ability to appear more and more like realistic trajectory data.

Siddiqui teaches all of the elements of the current invention as stated above except
generating a plurality of candidate next simulation states for a next time step based on the current simulation state, wherein generating each candidate next simulation state comprises: 
sampling, for each agent, a respective action from a set of possible actions that can be performed by the agent; and 
determining, for each agent, a respective next state of the agent at the next time step if the agent performs the corresponding sampled action at the current time step; 
Isele teaches it is known to provide the above elements. See at least FIG. 2 and [0059] where the system 100 for autonomous vehicle interactive decision making may be implemented. The system 100 may analyze possible actions for the autonomous vehicle and other vehicles at a certain time (plurality of candidate next simulation states). The stars [FIG. 2] may represent a successful merge or maneuver (a respective action from a set of possible actions). Also see at least [0037]-[0038] where the system 100 makes predictions of traffic participant behaviors which may be represented as probability distributions to allow for variations in driver motions. Lastly, see [0051] where the system 100 computes the probability of an agent’s action based on its current state, including the continuity or how likely they are to continue doing what they were doing (respective next state).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have modified Siddiqui to incorporate the teachings of Isele and provide the method comprising generating a plurality of candidate next simulation states for a next time step based on the current simulation state, wherein generating each candidate next simulation state comprises: sampling, for each agent, a respective action from a set of possible actions that can be performed by the agent; and determining, for each agent, a respective next state of the agent at the next time step if the agent performs the corresponding sampled action at the current time step. In doing so, by “using the predictions of other agent (e.g., other traffic participants) motions, a set of safe intentions is generated, taking into account the possible different ways in which other agents may respond. Each trajectory may then be evaluated against several metrics (risk, efficiency, etc.) and a trajectory may be selected.” [0037]
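For illustration only, the per-time-step procedure recited in claim 1 (generate candidates by sampling an action for each agent, determine each agent's next state, score each candidate, select based on the scores) can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the implementation of the applicant, Siddiqui, or Isele: all names are hypothetical, each agent's state is reduced to a single number, actions are sampled uniformly, the scoring function is a stand-in for a discriminator, and selection of the top-scoring candidate is assumed (the narrower rule of claim 14).

```python
import random
from typing import Callable, List, Sequence

# Hypothetical simulation state: one scalar (e.g. speed) per agent.
SimState = List[float]


def next_simulation_state(
    current: SimState,
    actions: Sequence[float],
    transition: Callable[[float, float], float],
    score: Callable[[SimState], float],
    num_candidates: int,
    rng: random.Random,
) -> SimState:
    """Generate candidate next states, score them, and select one."""
    candidates = []
    for _ in range(num_candidates):
        # Sample one action per agent, then compute that agent's next state.
        candidate = [transition(s, rng.choice(actions)) for s in current]
        candidates.append(candidate)
    # Assumption: select the candidate with the highest score.
    return max(candidates, key=score)


def toy_transition(speed: float, acceleration: float) -> float:
    """Stand-in state update over one unit time step (assumption)."""
    return speed + acceleration


def toy_score(state: SimState) -> float:
    """Stand-in for a discriminative score: prefers speeds near 10.0."""
    return -sum(abs(s - 10.0) for s in state)
```

A call such as next_simulation_state([9.0, 11.0], [-1.0, 0.0, 1.0], toy_transition, toy_score, 5, random.Random(1)) returns one of the five sampled candidates, namely the one that toy_score ranks highest.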


Regarding claim 2, Siddiqui in view of Isele teaches
The method of claim 1, wherein the agents are vehicles in the environment. See at least Siddiqui [0036] where dynamic objects include a car, a truck, a semi with a trailer, etc.

Regarding claim 3, Siddiqui in view of Isele teaches
The method of claim 2, wherein the set of possible actions that can be performed by an agent comprise actions that adjust a steering angle of the agent. See at least Isele [0059] and FIG. 2 where the system 100 may analyze possible actions of an agent (“corresponding vehicles 212a, 214a, 216a, etc.”) and the stars may represent a successful merge or maneuver (adjusting a steering angle of the agent).

Regarding claim 4, Siddiqui in view of Isele teaches
The method of claim 2, wherein the set of possible actions that can be performed by an agent comprise actions that adjust an acceleration of the agent. See at least Isele [0057] where inputs to the system 100 may include attributes associated with the autonomous vehicle and identified traffic participant(s) including a position, acceleration, velocity, intention prediction, and predicted behavior.

Regarding claim 5, Siddiqui in view of Isele teaches
The method of claim 1, wherein the state of an agent at a time step comprises: (i) a position of the agent at the time step, and (ii) a motion of the agent at the time step. See at least Siddiqui [0050]-[0051] where the system 100 may use various inputs for a particular time instant, for example, a current state of dynamic objects in proximity to the Ego vehicle (i.e. speed and heading). Also see at least [0058] where, at each time instant, the section Dyn Obj 750 represents positions, indicated as polygons, for each dynamic object within the area.

Regarding claim 6, Siddiqui in view of Isele teaches
The method of claim 5, wherein the position of the agent comprises: (i) a spatial location of the agent in the environment see at least Siddiqui [0096] where the system 100 may determine state information for a dynamic object such as a geo-spatial coordinate (GPS/GNSS coordinate or latitudinal/longitudinal coordinates), and (ii) a heading of the agent in the environment see at least Siddiqui [0042] where the system 100 may generate heading information for one or more dynamic objects.

Regarding claim 7, Siddiqui in view of Isele teaches
The method of claim 5, wherein the motion of the agent comprises: (i) a speed of the agent, and (ii) an acceleration of the agent. See at least Siddiqui [0096] where the system 100 may determine state information for a dynamic object including an acceleration and rate of speed.

Regarding claim 8, Siddiqui in view of Isele teaches
The method of claim 1, further comprising, for each agent: 
obtaining a representation of the current state of the agent in the environment See at least Siddiqui FIG. 8 which illustrates an example 3D simulated environment and FIG. 9B which illustrates an example user interface depicting a 3D simulated environment; and 
processing the representation of the current state of the agent in the environment using a policy neural network to generate a corresponding probability distribution over the set of possible actions for the agent; See at least Siddiqui FIG. 1 where the representation of the 3D simulated environment is processed using a machine learning network 130. According to [0062]-[0063], the machine learning network 130 may be an artificial recurrent neural network (RNN) or other neural network types which includes policies that are determined over many different dynamic objects. Also see at least Isele [0037]-[0038] where trajectory predictions “may be represented as probability distributions to allow for variations in driver motions.” The sequences of actions may be condensed into intentions which follow a distribution. “Because running forward simulations for all possible ego (e.g., autonomous vehicle) actions and all possible combinations of other agents’ (e.g., all traffic participants) actions is prohibitively expensive, it may be desirable to reduce the branching factor.” This is done by using a probabilistic tree search in which a smaller set of sub-goal intentions is then used for forward simulation.
wherein sampling, for each agent, a respective action from the set of possible actions comprises, for each agent: sampling an action from the set of possible actions in accordance with the probability distribution over the set of possible actions for the agent. See at least Isele [0039]-[0040] where the prediction set of other agents, including all possible intentions for the targeted interactive agent, are compared against generated samples from the selected ego intention class. The action predictor 152 may sample from a limited set of possible actions associated with the identified traffic participant. Also see at least [0049] where, in order to predict the agent’s intentions, predictions from kinematic models of the participants are used. Also see [0031] where probabilities may be utilized to model the expected behavior of other agents.
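For illustration only, sampling an action in accordance with a probability distribution over the set of possible actions, as recited in claim 8, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name sample_action and the action labels are hypothetical, and the probabilities are supplied directly rather than produced by a policy neural network.

```python
import random
from typing import Sequence, TypeVar

Action = TypeVar("Action")


def sample_action(
    actions: Sequence[Action],
    probabilities: Sequence[float],
    rng: random.Random,
) -> Action:
    """Draw one action according to the given probability distribution."""
    if len(actions) != len(probabilities):
        raise ValueError("need exactly one probability per action")
    # random.Random.choices performs weighted sampling with replacement;
    # k=1 draws a single action.
    return rng.choices(actions, weights=probabilities, k=1)[0]
```

For example, with the hypothetical actions "brake", "keep", and "accelerate" and probabilities 0.0, 1.0, and 0.0, the function always returns "keep".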

Regarding claim 9, Siddiqui in view of Isele teaches
The method of claim 1, wherein determining, for each agent, a respective next state of the agent at the next time step if the agent performs the corresponding sampled action at the current time step comprises, for each agent: 
processing data characterizing: (i) a current state of the agent, and (ii) the sampled action for the agent, using a motion model to generate the next state of the agent at the next time step.
See at least Isele [0036] where the action predictor 152 may calculate the coarse probability of the successful merge between the autonomous vehicle and the corresponding traffic participant(s) based on one or more possible actions of the identified traffic participant and one or more possible actions of the autonomous vehicle. This enables the intention predictor 154 and model updater 156 to calculate probabilities and rewards associated with those various possible actions to determine the action or maneuver to be implemented.
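For illustration only, a motion model of the kind recited in claim 9, which maps (i) a current state of the agent and (ii) the sampled action to the next state of the agent, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Isele or the application: the names are hypothetical, the state is reduced to a one-dimensional position and speed, the action is an acceleration, and a simple forward-Euler update is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    position: float  # metres along the lane (hypothetical 1-D simplification)
    speed: float     # metres per second


def motion_model(state: AgentState, acceleration: float, dt: float = 0.1) -> AgentState:
    """Advance one agent by one time step of length dt (forward Euler)."""
    # Position advances using the speed held at the current time step.
    new_position = state.position + state.speed * dt
    # Speed changes according to the sampled acceleration action.
    new_speed = state.speed + acceleration * dt
    return AgentState(position=new_position, speed=new_speed)
```

With a one-second time step, an agent at position 0.0 travelling at 10.0 m/s that accelerates at 2.0 m/s² ends at position 10.0 with speed 12.0.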



Regarding claim 10, Siddiqui in view of Isele teaches
The method of claim 1, wherein determining a discriminative score characterizing a likelihood that a candidate next simulation state is a realistic simulation state comprises: 
obtaining a representation of the candidate next simulation state; and see at least Siddiqui FIG. 6 and [0051] where, based on the calculated trajectories, the system 100 may then determine what the control 650 should be at the next time instant. Based on the inputs, the system 100 determines the output of the next control for the speed and heading of the vehicle.
processing the representation of the candidate next simulation state using a discriminator neural network to generate the discriminative score characterizing the likelihood that the candidate next simulation state is a realistic simulation state. See at least Siddiqui [0069] where the system 100 may use a discriminator neural network to evaluate the generated data by comparing the generated data against the real data (i.e. the driving scenario data 122) to determine if the generated data looks similar to the real data. The system 100 may compute a score indicating the extent to which the generated data looks realistic. “Over multiple GAN updates, newly generated trajectories created by the generator would look more and more like realistic trajectory data from the original driving scenario data 122.”

Regarding claim 13, Siddiqui in view of Isele teaches
The method of claim 10, wherein the discriminator neural network is trained to generate discriminative scores that characterize an environment state as being realistic if the environment state is a real-world environment state. See at least Siddiqui [0069] where the system 100 may evaluate a machine learning network 130 that was generated from the driving scenario data. The system 100 may evaluate the generated data by comparing the generated data against the real data (i.e. the driving scenario data 122) to determine if the generated data looks similar to the real data. According to [0067], the driving scenario data 122 includes one or more vehicle types and one or more driving categories so that the machine learning network 130 may learn how to drive in different road network topographies, ambient conditions, weather situations, and other types of real-world environments.

Regarding claim 14, Siddiqui in view of Isele teaches
The method of claim 1, wherein selecting a candidate next simulation state as the simulation state for the next time step based on the discriminative scores for the candidate next simulation states comprises: selecting the candidate next simulation state with the highest discriminative score as the simulation state for the next time step. See at least Siddiqui [0069]-[0070] where the system 100 updates the GAN continuously in order for newly generated trajectories created by the generator to look more and more like realistic trajectory data from the original driving scenario data 122. In order for the generated trajectories to look more realistic, the system 100 selects the candidate next simulation state with the highest discriminative score. The system 100 makes a policy update to the machine learning network 130 that best averages achievement of goals with how well the goal beat the current discriminator.


Regarding claim 15, Siddiqui in view of Isele teaches
A system comprising: 
one or more computers; and 
one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for generating a simulation of an environment that is being interacted with by a plurality of agents over a plurality of time steps, see at least Siddiqui FIG. 1 & 10 and Abstract where computer programs encoded on computer storage media are used for generating a driving scenario machine learning network and providing a simulated driving environment.
wherein the simulation comprises a respective simulation state for each time step that specifies a respective state of each agent at the time step, the operations comprising, for each time step: obtaining a current simulation state for the current time step; generating a plurality of candidate next simulation states for a next time step based on the current simulation state, wherein generating each candidate next simulation state comprises: sampling, for each agent, a respective action from a set of possible actions that can be performed by the agent; and determining, for each agent, a respective next state of the agent at the next time step if the agent performs the corresponding sampled action at the current time step; determining, for each candidate next simulation state, a discriminative score characterizing a likelihood that the candidate next simulation state is a realistic simulation state; and selecting a candidate next simulation state as the simulation state for the next time step based on the discriminative scores for the candidate next simulation states. See preceding logic for claim 1.

Regarding claim 16, Siddiqui in view of Isele teaches
The system of claim 15, wherein the agents are vehicles in the environment. See preceding logic for claim 2.

Regarding claim 17, Siddiqui in view of Isele teaches
The system of claim 16, wherein the set of possible actions that can be performed by an agent comprise actions that adjust a steering angle of the agent. See preceding logic for claim 3.


Regarding claim 18, Siddiqui in view of Isele teaches
The system of claim 16, wherein the set of possible actions that can be performed by an agent comprise actions that adjust an acceleration of the agent. See preceding logic for claim 4.

Regarding claim 19, Siddiqui in view of Isele teaches
One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for generating a simulation of an environment that is being interacted with by a plurality of agents over a plurality of time steps, wherein the simulation comprises a respective simulation state for each time step that specifies a respective state of each agent at the time step, the operations comprising, for each time step: obtaining a current simulation state for the current time step; generating a plurality of candidate next simulation states for a next time step based on the current simulation state, wherein generating each candidate next simulation state comprises: sampling, for each agent, a respective action from a set of possible actions that can be performed by the agent; and determining, for each agent, a respective next state of the agent at the next time step if the agent performs the corresponding sampled action at the current time step; determining, for each candidate next simulation state, a discriminative score characterizing a likelihood that the candidate next simulation state is a realistic simulation state; and selecting a candidate next simulation state as the simulation state for the next time step based on the discriminative scores for the candidate next simulation states. See preceding logic for claim 1.

Regarding claim 20, Siddiqui in view of Isele teaches
The non-transitory computer storage media of claim 19, wherein the agents are vehicles in the environment. See preceding logic for claim 2.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brittany Renee Peko whose telephone number is (408)918-7506. The examiner can normally be reached Monday - Thursday 7:30-5:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Elaine Gort, can be reached at (571)272-6781. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.P./ 07/30/2022 Examiner, Art Unit 3661

/Elaine Gort/ Supervisory Patent Examiner, Art Unit 3661