DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the patent application filed on August 24, 2020. 
Claims 21-40 are currently pending and have been examined.
This action is made Non-FINAL.
The examiner would like to note that this application is being handled by examiner Christine Huynh.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on December 16, 2020, June 22, 2021, January 27, 2022, and August 29, 2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: reference character 138, in FIG. 1.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 21-26 and 33-40 are rejected under 35 U.S.C. 103 as being unpatentable over Siddiqui et al. (US 20200353943 A1) in view of Palanisamy et al. (US 20200033869 A1).
Regarding claims 21-26 and 33-40: 
With respect to claims 21 and 36, Siddiqui teaches: 
training, using at least one processor, a machine learning model using first samples of simulated driving scenarios; (“FIG. 1 illustrates a block diagram of an example system 100 for training a machine learning network 130 with driving scenario data 122 and for providing a simulated driving environment… the simulation module 108 generates a real-world simulated driving environment for training self-driving cars with multiple driving scenarios.” [0029]) 
obtaining, using the at least one processor, tuning data including second samples of simulated driving scenarios; (“The system 100 may combine GAN updates and reinforcement learning (RL) updates by defining an RL score that combines achievement of goals with the GAN score (i.e. the discrimination score of each trajectory). At each update iteration in reinforcement learning, training samples are generated by the system 100 by playing out driving scenarios.” [0070]) This shows iterative learning to tune the system using more samples of driving scenarios. 
tuning, using the at least one processor, the trained machine learning model with the tuning data; (“At each update iteration in reinforcement learning, training samples are generated by the system 100 by playing out driving scenarios. Then the system 100 makes a policy update to the machine learning network 130 that best averages achievement of goals with how well the goal beat the current discriminator (e.g., as determined by an RL reward function). After a number of iterations, the system updates 100 the discriminator.” [0070]) The machine learning model is updated with the reinforcement learning using the more than one training samples. 
at least one processor; and a memory storing instructions; (“A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein.” [0028]) 
Siddiqui does not teach, but Palanisamy teaches: 
optimizing, using the at least one processor, a loss function of predictions output by the tuned machine learning model; (“a loss function configured to process the learning targets output by the corresponding learning target module and the output of the corresponding DRL algorithm to compute an overall output loss…  The updated parameters are available to be used by the driver agents, and can be used by the driving learner modules to retrain and optimize neural network parameters of the DRL algorithm.” [0023]) This shows that the system uses a loss function to update the parameters which are then output. 
operating, using the at least one processor, a vehicle in an environment using the tuned machine learning model; (“The low-level controller is configured to process each action to generate control signals for controlling the vehicle to control the vehicle when operating in that specific driving environment.” [0016]) This shows that after training the system, the vehicle can be operated in a physical environment. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system with Palanisamy’s machine learning parameters because “The parallel and distributed architecture of the autonomous driving policy generation and server system allows the driver agents and driving policy learner modules to find novel driving policies and behaviors faster and more efficiently.” (See Palanisamy [0073].)

With respect to claims 22 and 37, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claims 21 and 36. The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claims 21 and 36. Siddiqui further teaches: 
obtaining, using the at least one processor, log data from a database; (“The system 100 may store the driving scenario data 122 in a database or other file storage system. The system 100 determines trajectories, velocities, goals and other information about the dynamic objects identified in the obtained aerial video data 120. Via the network training module 106, the system 100 trains the machine learning network 130 using imitation and/or reinforcement learning methods based on the generated driving scenario data 122 (block 230).” [0030]). The training uses data that is stored in a database. 
tuning, using the at least one processor, the trained machine learning model by the log data; (“The system 100 using a discriminator may evaluate the generated data by comparing the generated data against the real data (i.e., the driving scenario data 122) to determine if the generated data looks similar to the real data. The system 100 may use the discriminator to compute a score indicating the extent to which the generated data looks realistic. The system 100, based on the computed scores, may perform a policy update to the machine learning network 130 such that the trajectories of dynamic objects are better enabled to achieve their goals. Over multiple GAN updates, newly generated trajectories created by the generator would look more and more like realistic trajectory data from the original driving scenario data 122.” [0069]) Using the data from the database, the machine learning network can be updated, or tuned, in a reiterative process. 

With respect to claims 23 and 38, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claims 21 and 36. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claims 21 and 36. Siddiqui does not teach, but Palanisamy teaches optimizing, using the at least one processor, the loss function of predictions based on ground truth data; (“a loss function configured to process the learning targets output by the corresponding learning target module and the output of the corresponding DRL algorithm to compute an overall output loss…  The updated parameters are available to be used by the driver agents, and can be used by the driving learner modules to retrain and optimize neural network parameters of the DRL algorithm.” [0023], “the controller 34 implements machine learning techniques to assist the functionality of the controller 34, such as feature detection/classification, obstruction mitigation, route traversal, mapping, sensor integration, ground-truth determination” [0071]) This shows that the system uses a loss function to update the parameters which are then output. Ground-truth data can also be used in the machine learning network to validate the model. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system with Palanisamy’s machine learning parameters because “The parallel and distributed architecture of the autonomous driving policy generation and server system allows the driver agents and driving policy learner modules to find novel driving policies and behaviors faster and more efficiently.” (See Palanisamy [0073].)

With respect to claims 24 and 39, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claims 21 and 36. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claims 21 and 36. Siddiqui does not teach, but Palanisamy teaches wherein the loss function is a mean squared error (MSE) loss function; (“The loss function can be implemented using any known type of loss function such as Mean Squared Error (MSE) (or quadratic) loss function” [0118]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system with Palanisamy’s machine learning parameters because “The parallel and distributed architecture of the autonomous driving policy generation and server system allows the driver agents and driving policy learner modules to find novel driving policies and behaviors faster and more efficiently.” (See Palanisamy [0073].)

With respect to claims 25 and 40, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claims 24 and 39. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claims 24 and 39. Siddiqui does not teach, but Palanisamy teaches optimizing, using the at least one processor, the loss function using stochastic gradient descent or an Adam optimizer; (“Each DRL algorithm is configured to process data relating to driving experiences using stochastic gradient updates to train a neutral network comprising more than one layer of hidden units between its inputs and outputs… Each of the driving policy learner modules further comprises a gradient descent optimizer configured to process the gradient data for each parameter to compute updated parameters (e.g., updates for each parameter) representing a policy.” [0023]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system with Palanisamy’s machine learning parameters because “The parallel and distributed architecture of the autonomous driving policy generation and server system allows the driver agents and driving policy learner modules to find novel driving policies and behaviors faster and more efficiently.” (See Palanisamy [0073].)

With respect to claim 26, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claim 21. The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claim 21. Siddiqui further teaches:
embedding, using the at least one processor, each sample of simulated driving scenarios into a pseudo-image; (“Via the driving scenario data generation module 104, the system 100 generates driving scenario data 122 based on the obtained video data (block 220)” [0030], “FIGS. 9A and 9B illustrate an example user interface depicting a 3-dimensional simulated environment 900. The system 100 may generate a simulated 3-dimensional environment 900 from the perspective of the primary agent 910 (e.g., the autonomous vehicle) as shown in FIG. 9A, or a birds-eye view of the primary agent 910 from various perspectives.” [0098]) This shows that the driving scenario data is embedded into a 3-dimensional environment, which is a pseudo-image. 
training, using the at least one processor, the machine learning model using the pseudo-image; (“Via the simulation module 108 and API module 110, the system 100 provides a simulated 3-dimensional environment for an external autonomous driving system (block 240). Then the system 100 simulates interaction of an autonomous vehicle with the dynamic objects based on the trained machine learning network (block 250).” [0030])

With respect to claim 33, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claim 21. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claim 21. Siddiqui further teaches wherein the initial physical state of each object includes an initial position and an initial acceleration, and the initial position and the initial acceleration are assigned according to a random number outputted by a random number generator; (“the system 100 trains the machine learning network 130, with a reinforcement learning method by processing the driving scenario data 122 where the driving scenario data 122 includes dynamic objects with trajectories, an initial condition (e.g. a state) and goals (e.g., an intended destination).” [0064], “The generator uses random noise or data perturbations to create imitations of the real data in an attempt to trick the discriminator into believing the data is realistic.” [0068]) This shows that the system generates and assigns random data for aspects of the driving scenario, such as the initial conditions of the vehicle and objects, to imitate realistic data. In turn, because the system as claimed has the properties predicted by the prior art, it would have been obvious to make the system or product where a random number determines the initial physical state of the objects. 

With respect to claim 34, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claim 21. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claim 21. Siddiqui does not teach, but Palanisamy further teaches wherein the machine learning model is a deep neural network for object motion prediction; (“the agent uses a deep neural network to learn the longterm value of a state/action. The DRL based agent can also use a deep neural network to learn the mappings between state and actions. By performing an action, the agent transitions from state to state.” [0104]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system with Palanisamy’s machine learning parameters because “The parallel and distributed architecture of the autonomous driving policy generation and server system allows the driver agents and driving policy learner modules to find novel driving policies and behaviors faster and more efficiently.” (See Palanisamy [0073].)

With respect to claim 35, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claim 21. 
The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claim 21. Siddiqui further teaches wherein each sample of simulated driving scenarios includes a plurality of time stamps and associated position, velocity or acceleration of a virtual vehicle; (“The system 100 may use various inputs for a particular time instant. These inputs may include, but are not limited to, an observations map 610 (i.e., a current state of dynamic objects around or in proximity to the Ego vehicle 612), a local map layer 620 (i.e., a local vector map located indicating the road network and structures around the Ego vehicle 612), a local goal 630 (i.e., an intended destination 614 of the Ego vehicle 612), and a current vehicle state 640 (i.e., a speed and heading of the Ego vehicle 612).” [0050]) This shows that each time instant, or time stamp, would have an associated vehicle state.

Claims 27-29 and 31-32 are rejected under 35 U.S.C. 103 as being unpatentable over Siddiqui et al. (US 20200353943 A1) in view of Palanisamy et al. (US 20200033869 A1) and Atsom (US 20190228571 A1).
Regarding claims 27-29 and 31-32: 
With respect to claim 27, Siddiqui in combination with Palanisamy, as shown in the rejection above, discloses the limitations of claim 21. The combination of Siddiqui and Palanisamy teaches the machine learning driving scenario of claim 21. Siddiqui further teaches: 
simulating a plurality of driving scenarios using a map, an initial physical state of each object in the map, and the mental state of the virtual driver of the virtual vehicle; (“This allows the system to generate random topologies and vehicle positions which can be used in the above RL/GAN training. The trained policy network 130 then can be used to drive all these vehicles. The system 100 may generate new driving scenarios using the GAN by generating the maps, initial positions, and goals using the GAN and applying the trained policy network 130 to generate the vehicle trajectories within that context.” [0071]) 
However, Siddiqui and Palanisamy do not teach simulating the mental state of the virtual driver of the virtual vehicle. Atsom does teach:
assigning, using the at least one processor, a mental state of a virtual driver of a virtual vehicle; (“movement of one or more ground vehicles inserted into the virtual realistic model may be controlled according to driver behavior data received from a driver behavior simulator. The driver behavior data may be adjusted according to one or more driver behavior patterns and/or driver behavior classes exhibited by a plurality of drivers in the certain geographical area, i.e. driver behavior patterns and/or driver behavior classes that may be typical to the certain geographical area.” [0085]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system and Palanisamy’s machine learning parameters with Atsom’s mental state of a virtual driver because “Enhancing the driver behavior pattern(s) may allow a more accurate characterization of the driver prototype(s) detected in the geographical area and may thus improve the simulation created using the virtual realistic model.” (See Atsom [0068].)

With respect to claim 28, Siddiqui in combination with Palanisamy and Atsom, as shown in the rejection above, discloses the limitations of claim 27. 
The combination of Siddiqui, Palanisamy, and Atsom teaches the machine learning driving scenario of claim 27. Siddiqui and Palanisamy do not teach, but Atsom teaches wherein the mental state of the virtual driver includes an acceleration preference of the virtual driver; (“The driver behavior pattern(s) may include one or more motion parameters, for example, a speed parameter, an acceleration parameter, a breaking parameter, a direction parameter, an orientation parameter and/or the like… In another example, the driver behavior pattern(s) may describe a direction and/or orientation parameter for one or more phase while exiting the interchange on the exit ramp. In another example, the driver behavior pattern(s) may describe an acceleration parameter for one or more phases while entering the interchange entrance ramp.” [0130]) This shows that the driver behavior, or mental state of the driver, includes specifying the acceleration to be within certain limits depending on the situation, which would be an acceleration preference of the virtual driver.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system and Palanisamy’s machine learning parameters with Atsom’s mental state of a virtual driver because “Enhancing the driver behavior pattern(s) may allow a more accurate characterization of the driver prototype(s) detected in the geographical area and may thus improve the simulation created using the virtual realistic model.” (See Atsom [0068].)

With respect to claim 29, Siddiqui in combination with Palanisamy and Atsom, as shown in the rejection above, discloses the limitations of claim 27. 
The combination of Siddiqui, Palanisamy, and Atsom teaches the machine learning driving scenario of claim 27. Siddiqui and Palanisamy do not teach, but Atsom teaches wherein the mental state of the virtual driver includes a preference to maintain a gap between the other virtual vehicle and at least one other object in the set of objects; (“based on analysis of sensory ranging data received from the range sensor(s), the driver behavior simulator 214 may identify space keeping parameters, in-lane position parameters and may thus associate one or more of the driver behavior patterns with the tailgating characteristic, the in-lane position characteristic and/or the like.” [0132])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system and Palanisamy’s machine learning parameters with Atsom’s mental state of a virtual driver because “Enhancing the driver behavior pattern(s) may allow a more accurate characterization of the driver prototype(s) detected in the geographical area and may thus improve the simulation created using the virtual realistic model.” (See Atsom [0068].)

With respect to claim 31, Siddiqui in combination with Palanisamy and Atsom, as shown in the rejection above, discloses the limitations of claim 27. 
The combination of Siddiqui, Palanisamy, and Atsom teaches the machine learning driving scenario of claim 27. Siddiqui further teaches wherein the mental state of the virtual driver includes a goal of the virtual driver; (“In training the machine learning network 130, with given inputs 610, 620, 630, 640 for the Ego vehicle 612, the system 100 may learn how to control the Ego vehicle 612 in such a way that that the Ego vehicle 612 achieves its high-level goal (e.g., the vehicle's intended destination)… For example, the dynamic object's local goal may be the final position of the dynamic object at a particular time interval.” [0052]) This shows that the state of the virtual driver includes a goal, which can be the vehicle’s intended destination, or reaching an intended destination within a particular time interval. 

With respect to claim 32, Siddiqui in combination with Palanisamy and Atsom, as shown in the rejection above, discloses the limitations of claim 27. 
The combination of Siddiqui, Palanisamy, and Atsom teaches the machine learning driving scenario of claim 27. Siddiqui and Palanisamy do not teach, but Atsom teaches wherein the mental state of the virtual driver includes a politeness factor that determines how much the virtual driver is willing to inconvenience other virtual drivers of other virtual vehicles in the driving scenario; (“the driver behavior simulator 214 may classify at least some of the plurality of drivers to one or more driver behavior classes according to the driver behavior pattern(s) associated with each of the drivers. The driver behavior classes may include, for example, an aggressive driver prototype, a normal driver prototype, a patient driver prototype, a reckless driver prototype and/or the like.” [0134]) This shows that the driver behaviors can be classified with how the virtual drivers interact with other drivers, which is comparable to a politeness factor which determines a virtual driver’s interactions with other drivers.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system and Palanisamy’s machine learning parameters with Atsom’s mental state of a virtual driver because “Enhancing the driver behavior pattern(s) may allow a more accurate characterization of the driver prototype(s) detected in the geographical area and may thus improve the simulation created using the virtual realistic model.” (See Atsom [0068].)

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Siddiqui et al. (US 20200353943 A1) in view of Palanisamy et al. (US 20200033869 A1), Atsom (US 20190228571 A1), and Zhang et al. (US 20190318267 A1).
Regarding claim 30: 
With respect to claim 30, Siddiqui in combination with Palanisamy and Atsom, as shown in the rejection above, discloses the limitations of claim 27.
The combination of Siddiqui, Palanisamy, and Atsom teaches the machine learning driving scenario of claim 27. Siddiqui, Palanisamy, and Atsom do not teach, but Zhang teaches wherein the mental state of the virtual driver includes a preference for a particular route; (“Decision module 304 and/or planning module 305 examine all of the possible routes to select and modify one of the most optimal route in view of other data provided by other modules such as traffic conditions from localization module 301, driving environment perceived by perception module 302, and traffic condition predicted by prediction module 303.” [0041]) This shows that an optimal route, or preferred route, can be determined based on parameters such as traffic or environmental conditions in a driving simulation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Siddiqui’s driving scenario system, Palanisamy’s machine learning parameters, Atsom’s mental state of a virtual driver, and Zhang’s preferred route so that “the autonomous driving software component can be improved to handle difficult driving scenarios (e.g., cornering scenarios), thereby ensuring the safety of the human user of an autonomous driving vehicle” (see Zhang [0017]), and therefore to better train the autonomous driving software.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Christine N Huynh whose telephone number is (571)272-9980. The examiner can normally be reached Monday - Friday 8 am - 4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aniss Chad, can be reached at (571)270-3832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHRISTINE NGUYEN HUYNH/Examiner, Art Unit 3662
/ANISS CHAD/Supervisory Patent Examiner, Art Unit 3662