Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant's submission filed on 23 August 2022 has been entered. Claim 1 has been amended. Claims 1-20 are pending, of which claim 1 is in independent form. Accordingly, this action has been made FINAL.
Amendments to the Specification and Drawings have been considered.
Response to Arguments
Applicant's arguments with respect to claims 1-20 have been considered but are moot in view of the new ground(s) of rejection.  
The Office's Note:
The Office has cited particular paragraphs / columns and line numbers in the reference(s) applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim(s), other passages and figures may apply as well. The Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the cited passages as taught by the prior art or relied upon by the Examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rosman (US 20200086863, hereinafter Rosman), in view of Shalev-Shwartz (US 20190369637, hereinafter Shalev-Shwartz) and further in view of Moustafa (US 20220126864, hereinafter Moustafa).
Claim 1 is rejected. Rosman teaches a method of running a simulation in order to test software used to control a vehicle in an autonomous driving mode, the method comprising (Rosman, abstract):
identifying, by one or more processors, logged data for the simulation, the logged data having been collected by a first vehicle, the logged data further identifying an agent, wherein the agent is a road user (Rosman, US 20200086863, fig. 4, paragraph [0058-0059], With further reference to FIG. 4, additional aspects are illustrated relating to selection of information for training. That is, as previously mentioned, the prediction system 170, in one embodiment, trains the model 260 from multiple data sources (logged datasets, vehicles actively on the road, simulated data, etc.) at the same time. The plate diagram 400 as illustrated in FIG. 4 indicates this combined use of information. For example, the plate 410 and the plate 420 represent different experience sources. By way of example, source plate 410 is real data acquired from vehicles traveling along roads and logging sensor data about perceived aspects of the environment. By contrast, the source 420 is simulated data that is computer-generated by, for example, the model 260 or another model that produces simulated observations used for learning and/or validation.  Paragraph [0060], the data sources can include agent vehicle observations, ego vehicle data, and simulated data logs. Incorporating the simulated information alongside the real-world collected data provides for improving on data that covers rare or non-existing circumstances in the real-world data thereby improving the overall training.  Fig. 7 and paragraph [0076], At 710, the training module 270 acquires one or more sources of data for training the model 260. As previously explained in relation to FIG. 4, the logged data includes past observations from one or more vehicles that have logged observations, and/or simulated data that is computer-generated. In one embodiment, the simulated data is generated using the model 260 and/or another simulation system (e.g., generative neural network). 
In either case, the training data is collected to acquire a set of training examples having a sufficient quantity, quality, and diversity that the process of training the model 260 produces understandings within the model 260 that can be extrapolated into accurate predictions of future states using present and past observations of agents.  Paragraph [0077], At 720, the training module 270 trains the model 260 using one or more of the logged data and the simulated data. In one embodiment, the training module 270 trains the model 260 using reinforcement learning to generate backwards updates of the agent model. For example, the training module 270 computes a prior probability of a previously given state of the road agent at a previous point in time according to the previous observations and the present observations. In this way, the training module 270 can improve on the previous determinations through the perspective of hindsight to update the prior probability by computing a refined probability according to a determined next state and future observations of the road agent.  Fig. 5 and paragraph [0064-0068], At 520, the input module 220, in response to receiving the sensor data 250 including present observations of at least one road agent of dynamic agents that are being tracked in the surrounding environment, identifies previous observations of the road agent. That is, for example, the input module 230 correlates the road agent with previous observations to maintain continuity in understanding of the surrounding environment. In this way, the road agent can be continuously tracked, and estimates of future states provided accordingly.); 
analyzing, by the one or more processors, the logged data to identify one or more signals of intent of the agent including a logged path of the agent (Rosman, fig. 5 and paragraph [0064-0068], At 520, the input module 220, in response to receiving the sensor data 250 including present observations of at least one road agent of dynamic agents that are being tracked in the surrounding environment, identifies previous observations of the road agent. That is, for example, the input module 230 correlates the road agent with previous observations to maintain continuity in understanding of the surrounding environment. In this way, the road agent can be continuously tracked, and estimates of future states provided accordingly.); 
determining, by the one or more processors, one or more characteristics based on the one or more signals (Rosman, paragraph [0065-0068], At 530, the input module 220 retrieves the previous observations from an electronic data store (e.g., memory 210). In one embodiment, the previous observations are maintained in memory, and thus the input module 220 may simply link the observations to create the noted correlation. As a further note, while the input module 220 is discussed as identifying and retrieving observations, in further aspects, the input module 220 may also track and manage further aspects about information relating to the dynamic agents such as actions, states, and so on.); and 
The Office additionally relies on Shalev-Shwartz to support Rosman in teaching the limitation:
a logged path(Shalev-Shwartz, US 20190369637, fig. 5E and paragraph [0140-0142], FIG. 5E is a flowchart showing an exemplary process 500E for causing one or more navigational responses in vehicle 200 based on a vehicle path, consistent with the disclosed embodiments. At step 570, processing unit 110 may construct an initial vehicle path associated with vehicle 200. The vehicle path may be represented using a set of points expressed in coordinates (x, z), and the distance d.sub.i between two points in the set of points may fall in the range of 1 to 5 meters. In one embodiment, processing unit 110 may construct the initial vehicle path using two polynomials, such as left and right road polynomials. Processing unit 110 may calculate the geometric midpoint between the two polynomials and offset each point included in the resultant vehicle path by a predetermined offset (e.g., a smart lane offset), if any (an offset of zero may correspond to travel in the middle of a lane). The offset may be in a direction perpendicular to a segment between any two points in the vehicle path. In another embodiment, processing unit 110 may use one polynomial and an estimated lane width to offset each point of the vehicle path by half the estimated lane width plus a predetermined offset (e.g., a smart lane offset).  Paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. 
Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.).
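For illustration only, and not as part of the cited disclosure, the path construction described in Shalev-Shwartz, paragraph [0140-0142], can be sketched as follows. The function name `midpoint_path`, the sample curves, and the sample spacing are hypothetical. As a simplifying assumption, the lane offset is applied along the x axis, whereas the cited passage applies it perpendicular to the segment between path points.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, z) coordinates, as in the cited passage


def midpoint_path(left: Callable[[float], float],
                  right: Callable[[float], float],
                  z_values: List[float],
                  lane_offset: float = 0.0) -> List[Point]:
    """Return path points midway between two road-edge curves.

    left and right map a forward distance z to a lateral position x.
    lane_offset shifts every point along x (assumption: a simplification
    of the perpendicular offset in the cited passage); zero keeps the
    path in the middle of the lane.
    """
    path: List[Point] = []
    for z in z_values:
        x_mid = 0.5 * (left(z) + right(z))  # geometric midpoint of the two curves
        path.append((x_mid + lane_offset, z))
    return path


# Hypothetical left and right road polynomials, sampled every 2 meters,
# which falls within the 1 to 5 meter point spacing mentioned in the passage.
left_edge = lambda z: -1.8 + 0.01 * z * z
right_edge = lambda z: 1.8 + 0.01 * z * z
points = midpoint_path(left_edge, right_edge, [0.0, 2.0, 4.0])
```

The sketch covers only the two-polynomial embodiment. The single-polynomial embodiment, which offsets by half an estimated lane width, is not shown.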
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references. One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Shalev-Shwartz into Rosman in order to receive multiple images representative of an environment of the host vehicle. The multiple images are analyzed to identify navigational state information associated with the host vehicle. Multiple potential trajectories for the host vehicle are determined based on the navigational state information. A preliminary analysis is performed relative to each of the potential trajectories. A subset of the multiple potential trajectories is selected based on an indicator of relative ranking assigned to each of the potential trajectories, as suggested by Shalev-Shwartz (see abstract and summary).
Rosman and Shalev-Shwartz do not explicitly teach:
running, by the one or more processors, the simulation using the logged data by replacing the agent with an interactive agent having the one or more characteristics of the agent, wherein the interactive agent is capable of responding to actions performed by a simulated vehicle using the software in the simulation.
However, Moustafa teaches:
running, by the one or more processors, the simulation using the logged data by replacing the agent with an interactive agent having the one or more characteristics of the agent, wherein the interactive agent is capable of responding to actions performed by a simulated vehicle using the software in the simulation(Moustafa, US 20220126864, paragraph [0200-0210], In higher level autonomous vehicles, the in-vehicle processing system implementing an autonomous driving stack allows driving decisions to be made and controlled without the direct input of the passengers in the vehicle, with the vehicle's system instead relying on the application of models, including machine learning models, which may take as inputs data collected automatically by sensors on the vehicle, data from other vehicles or nearby infrastructure (e.g., roadside sensors and cameras, etc.), and data (e.g., map data) describing the geography and maps of routes the vehicle may take. The models relied upon by the autonomous vehicle's systems may also be developed through training on data sets that describe other preceding trips (by the vehicle or other vehicles), whose ground truth may also be based on the perspective of the vehicle and the results it observes or senses through its sensors. In some implementations, the “success” of an autonomous vehicle's operation can thus be machine-centric or overly pragmatic-rightfully focused on providing safe and reliable transportation from point A to point B, while potentially being agnostic to the unique preferences and variable human contexts of the passengers.  Paragraph [0214],  In some implementations, data generated by these various sensors may be provided for consumption and/or hosting by one or more cloud- or fog-based computing environments (e.g., 835). In some implementations, such a solution may serve to democratize and/or generalize data collected within environments in which autonomous vehicles are present. 
Aggregating collection of at least a portion of this data may further allow additional processing to be performed and data collection and offload to maximized, such as to address specific needs of varying “client” vehicles interfacing with these services (e.g., 835), for instance, to obtain types of sensor-induced data for the type missing, demographics, context, delivery of agglomerated results, results unique to a particular region or area, among other examples. Such components, or a subset of them, may provide the cloud (e.g., 835) with various levels of sensory information, to help cross-correlate information collected within the environment, and effectively integrate the various data and sensors as a service for client autonomous vehicles with lower-end sensor capabilities, permanently or temporarily damaged/disabled sensors (including on high-end autonomous vehicles), or vehicles possessing less than a full suite of high-level sensing or compute capabilities, among other examples.  Paragraph [0215-0228], data collection.  Paragraph [0794-0795], In various embodiments, simulation and techniques such as reinforcement learning can also be used to automatically learn the context-based sampling policies (e.g., rates) and sensor fusion weights. Determining how frequently to sample different sensors and what weights to assign to which sensors is challenging due to the large number of driving scenarios. The complexity of context-based sampling is also increased by the desire to achieve different objectives such as high object tracking accuracy and low power consumption without compromising safety. Simulation frameworks which replay sensor data collected in the real-world or simulate virtual road networks and traffic conditions provide safe environments for training context-based models and exploring the impact of adaptive policies.  Fig. 
129 and paragraph [0827-0829], At 12908, the position information obtained at 12906 is used in a sensor fusion process of the autonomous vehicle. For example, the autonomous vehicle may use the position information in a perception phase of an autonomous driving pipeline.  Paragraph [1172],  Example 148 includes the subject matter of any one of Examples 143-147, where the instructions are further executable to cause the machine to: apply a machine learning model to inputs at the autonomous vehicle to predict a likelihood that one or more of the sensors on the autonomous vehicle will be compromised during the trip; and configure a recommender system based on the likelihood.  Paragraph [1174-1178],  Example 150 includes the subject matter of any one of Examples 143-149, where the particular sensor is one of a suite of sensors on the autonomous vehicle to generate collection of sensor data for use as inputs to support autonomous driving, and the storage medium further includes filtering the recommendation data to keep the portion of the recommendation data corresponding to a portion of the collection of sensor data missing as a result of the particular sensor being compromised.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references. One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Moustafa into the combination of Rosman and Shalev-Shwartz in order to receive sensor data from multiple sensors. The sensors comprise a first set of sensors and a second set of sensors, and a portion of the multiple sensors is coupled to a vehicle. Control of the vehicle is automated, using a processor in the vehicle, based on a portion of the sensor data generated by the first set of sensors. Passenger attributes of passengers within the autonomous vehicle are determined, using a processor in the vehicle, from sensor data generated by the second set of sensors. Vehicle attributes of the vehicle are modified based on the passenger attributes and the sensor data generated by the first set of sensors, as suggested by Moustafa (see abstract and summary).
Claim 2 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1,
wherein the agent is a vehicle (Rosman, paragraph [0031-0035],  Thereafter, the tracking module 230 can control one or more of the vehicle systems 140 according to the future state of the road agent. For example, the tracking module 230 in combination with the autonomous driving module 160 identify, plan, and execute maneuvers of the vehicle 100 in relation to the determined future state of the road agent.).  
Claim 3 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1,
wherein the logged path is defined by changes in pose of the agent over a course of the logged data(Rosman, paragraph [0039-0040], In a further example, the model 260 as implemented via the tracking module 230 provides for tracking a car with road semantics. For example, S represents a location and dynamics, waypoint. The action A represents steering. Here, φ embodies lane information of the vehicle (e.g., vehicle 100) a nearby vehicle (cars that are external to the agent being tracked) location in car coordinate frame. Z represents sensor data 250 associated location observations and semantic map values at the estimated locations, S—location/dynamics, A—steering, T—dynamics in car's coordinate frame+update of waypoint.   Paragraph [0071-0072], In general, since we normalize with respect to a pose of the vehicle relevant features include speeds, relative speeds, legal speed, and lane positions.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. 
As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication).  
Claim 4 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1,
wherein the one or more characteristics includes following the logged path(Rosman, paragraph [0042-0045], Accordingly, the tracking module 230 can use the trained model 260 to predict future states and actions associated with a road agent for which, for example, only partial observations are available. In this way, the tracking module 230 leverages the model 260 to improve predictions about the road agents, which then influence, in an improved manner, how the prediction system 170 affects autonomous control of the vehicle 100. This improvement flows from improved knowledge of the road agents and thus a better ability to account for movements of the road/dynamic agents. Accordingly, through the unconventional arrangement of relationships and indicated computations, the tracking module 230 in combination with the model 260 improve tracking/prediction of dynamic behaviors for the road agents, which are realized through improved planning and control for the vehicle 100.    Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. 
As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and break distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation—a characteristic that may be important in achieving a smooth drive.  Paragraph [0193-0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training.).  
Claim 5 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1,
wherein the one or more signals of intent includes a following distance of the agent(Rosman, paragraph [0084-0086],  In one or more arrangements, the one or more data stores 115 can include map data 116. The map data 116 can include maps of one or more geographic areas. In some instances, the map data 116 can include information or data on roads, traffic control devices, road markings, structures, features, and/or landmarks in the one or more geographic areas. The map data 116 can be in any suitable form. In some instances, the map data 116 can include aerial views of an area. In some instances, the map data 116 can include ground views of an area, including 360-degree ground views. The map data 116 can include measurements, dimensions, distances, and/or information for one or more items included in the map data 116 and/or relative to other items included in the map data 116. The map data 116 can include a digital map with information about road geometry. The map data 116 can be high quality and/or highly detailed.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. 
As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and break distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation—a characteristic that may be important in achieving a smooth drive.  Paragraph [0193-0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training.).  
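For illustration only, and not as part of the cited disclosure, the following-distance relationship described in Shalev-Shwartz, paragraph [0302], can be sketched as follows. The function names are hypothetical. Two points are assumptions rather than statements from the reference: the "distance of 1.5 seconds" is interpreted as the current speed multiplied by 1.5 seconds, and the linear penalty outside the [0.7, 1.3] band is chosen only to make the sketch concrete, since the quoted passage does not give the form of the penalty.

```python
def desired_distance(speed_mps: float,
                     headway_s: float = 1.5,
                     floor_m: float = 1.0) -> float:
    """Desired gap: the larger of 1 meter and the distance covered in 1.5 seconds.

    Assumption: the 1.5 second distance is computed as speed * headway_s.
    """
    return max(floor_m, speed_mps * headway_s)


def distance_penalty(gap_m: float,
                     speed_mps: float,
                     low: float = 0.7,
                     high: float = 1.3) -> float:
    """Penalty based on the ratio of actual gap to desired gap.

    No penalty while the ratio stays inside [low, high]. Outside the band
    the penalty grows linearly (assumption: illustrative form only).
    """
    ratio = gap_m / desired_distance(speed_mps)
    if low <= ratio <= high:
        return 0.0
    return (low - ratio) if ratio < low else (ratio - high)


# Hypothetical example: following at 20 m/s gives a desired gap of 30 m.
penalty_on_target = distance_penalty(gap_m=30.0, speed_mps=20.0)
penalty_too_close = distance_penalty(gap_m=15.0, speed_mps=20.0)
```

In this sketch a gap equal to the desired distance incurs no penalty, while a gap of half the desired distance falls below the band and is penalized.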
Claim 6 is rejected for the reasons set forth hereinabove for claim 5. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 5,
wherein the one or more characteristics is a following distance for following another agent or the simulated vehicle in the simulation(Rosman, paragraph [0086-0090], In one or more arrangements, the map data 116 can include one or more static obstacle maps 118. The static obstacle map(s) 118 can include information about one or more static obstacles located within one or more geographic areas. A “static obstacle” is a physical object whose position does not change or substantially change over a period of time and/or whose size does not change or substantially change over a period of time. Examples of static obstacles include trees, buildings, curbs, fences, railings, medians, utility poles, statues, monuments, signs, benches, furniture, mailboxes, large rocks, hills. The static obstacles can be objects that extend above ground level. The one or more static obstacles included in the static obstacle map(s) 118 can have location data, size data, dimension data, material data, and/or other data associated with it. The static obstacle map(s) 118 can include measurements, dimensions, distances, and/or information for one or more static obstacles. The static obstacle map(s) 118 can be high quality and/or highly detailed. The static obstacle map(s) 118 can be updated to reflect changes within a mapped area.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. 
Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and break distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation—a characteristic that may be important in achieving a smooth drive.  Paragraph [0193-0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training).  
Claim 7 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1,
wherein the one or more signals of intent includes an overtaking intent of the agent with respect to one of another agent or the first vehicle (Shalev-Shwartz, paragraph [0189-0190], In FIG. 11B, the situation is slightly different. Here, host vehicle 1105 senses one or more target vehicles 1107 entering the main roadway 1112 from merge lane 1111. In this situation, once driving policy module 803 encounters merge node 913, it may choose to initiate an overtake left maneuver in order to avoid the merging situation.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and break distance of 1.5 seconds. 
In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation—a characteristic that may be important in achieving a smooth drive.  Paragraph [0193-0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training).  
Claim 8 is rejected for the reasons set forth hereinabove for claim 7, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 7, 
wherein the overtaking intent is determined by observing whether the agent overtook one of the another agent or the first vehicle in the logged data(Shalev-Shwartz, paragraph [0189-0190],  In FIG. 11B, the situation is slightly different. Here, host vehicle 1105 senses one or more target vehicles 1107 entering the main roadway 1112 from merge lane 1111. In this situation, once driving policy module 803 encounters merge node 913, it may choose to initiate an overtake left maneuver in order to avoid the merging situation.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. 
The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive.  Paragraph [0193-0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training).
 
Claim 9 is rejected for the reasons set forth hereinabove for claim 7, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 7, 
wherein the overtaking intent is determined from a behavior prediction for the agent during and beyond a period of the logged data (Shalev-Shwartz, paragraph [0189-0190], In FIG. 11B, the situation is slightly different. Here, host vehicle 1105 senses one or more target vehicles 1107 entering the main roadway 1112 from merge lane 1111. In this situation, once driving policy module 803 encounters merge node 913, it may choose to initiate an overtake left maneuver in order to avoid the merging situation.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication).  
Claim 10 is rejected for the reasons set forth hereinabove for claim 7, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 7, 
wherein the one or more characteristics includes a characteristic of wanting to overtake another agent or the simulated vehicle in the simulation(Shalev-Shwartz, paragraph [0189-0190],  In FIG. 11B, the situation is slightly different. Here, host vehicle 1105 senses one or more target vehicles 1107 entering the main roadway 1112 from merge lane 1111. In this situation, once driving policy module 803 encounters merge node 913, it may choose to initiate an overtake left maneuver in order to avoid the merging situation.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication).  
Claim 11 is rejected for the reasons set forth hereinabove for claim 1, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, 
wherein the one or more signals of intent includes a cut-in aggressiveness of the agent with respect to the first vehicle(Shalev-Shwartz, paragraph [0186-0190], 1) not relevant: indicating that the sensed vehicle in the scene is currently not relevant; 2) next lane: indicating that the sensed vehicle is in an adjacent lane and an appropriate offset should be maintained relative to this vehicle (the exact offset may be calculated in the optimization problem that constructs the trajectory given the Desires and hard constraints, and can potentially be vehicle dependent—the stay leaf of the options graph sets the target vehicle's semantic type, which defines the Desire relative to the target vehicle); 3) give way: the host vehicle will attempt to give way to the sensed target vehicle by, for example, reducing speed (especially where the host vehicle determines that the target vehicle is likely to cut into the lane of the host vehicle); 4) take way: the host vehicle will attempt to take the right of way by, for example, increasing speed; 5) follow: the host vehicle desires to maintain smooth driving following after this target vehicle; 6) takeover left/right: this means the host vehicle would like to initiate a lane change to the left or right lane. Overtake left node 917 and overtake right node 915 are internal nodes that do not yet define Desires.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. 
Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication).
Claim 12 is rejected for the reasons set forth hereinabove for claim 11, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 11, 
wherein the cut-in aggressiveness of the agent is determined by comparing the logged path to a route the first vehicle was following in the logged data(Shalev-Shwartz, paragraph [0186-0190], 1) not relevant: indicating that the sensed vehicle in the scene is currently not relevant; 2) next lane: indicating that the sensed vehicle is in an adjacent lane and an appropriate offset should be maintained relative to this vehicle (the exact offset may be calculated in the optimization problem that constructs the trajectory given the Desires and hard constraints, and can potentially be vehicle dependent—the stay leaf of the options graph sets the target vehicle's semantic type, which defines the Desire relative to the target vehicle); 3) give way: the host vehicle will attempt to give way to the sensed target vehicle by, for example, reducing speed (especially where the host vehicle determines that the target vehicle is likely to cut into the lane of the host vehicle); 4) take way: the host vehicle will attempt to take the right of way by, for example, increasing speed; 5) follow: the host vehicle desires to maintain smooth driving following after this target vehicle; 6) takeover left/right: this means the host vehicle would like to initiate a lane change to the left or right lane. Overtake left node 917 and overtake right node 915 are internal nodes that do not yet define Desires.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. 
In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication).  
Claim 13 is rejected for the reasons set forth hereinabove for claim 12, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 12, 
wherein the cut-in aggressiveness of the agent is determined based on whether the logged path overlaps with the route(Shalev-Shwartz, paragraph [0186-0190], 1) not relevant: indicating that the sensed vehicle in the scene is currently not relevant; 2) next lane: indicating that the sensed vehicle is in an adjacent lane and an appropriate offset should be maintained relative to this vehicle (the exact offset may be calculated in the optimization problem that constructs the trajectory given the Desires and hard constraints, and can potentially be vehicle dependent—the stay leaf of the options graph sets the target vehicle's semantic type, which defines the Desire relative to the target vehicle); 3) give way: the host vehicle will attempt to give way to the sensed target vehicle by, for example, reducing speed (especially where the host vehicle determines that the target vehicle is likely to cut into the lane of the host vehicle); 4) take way: the host vehicle will attempt to take the right of way by, for example, increasing speed; 5) follow: the host vehicle desires to maintain smooth driving following after this target vehicle; 6) takeover left/right: this means the host vehicle would like to initiate a lane change to the left or right lane. Overtake left node 917 and overtake right node 915 are internal nodes that do not yet define Desires.   Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. 
Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], braking.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 14 is rejected for the reasons set forth hereinabove for claim 13, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 13, further comprising, 
when the logged path overlaps with the route, determining a minimum braking amount required for the first vehicle to avoid a collision with the agent(Rosman, paragraph [0100-0105], The processor(s) 110, the prediction system 170, and/or the autonomous driving module(s) 160 can cause the vehicle 100 to accelerate (e.g., by increasing the supply of fuel provided to the engine), decelerate (e.g., by decreasing the supply of fuel to the engine and/or by applying brakes) and/or change direction (e.g., by turning the front two wheels).  Paragraph [0106].   Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraph [0226 and 0231], break.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. 
The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 15 is rejected for the reasons set forth hereinabove for claim 14, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 14, 
wherein determining the one or more characteristics is further based on the minimum braking amount (Rosman, paragraph [0106-0110], The autonomous driving module(s) 160 either independently or in combination with the prediction system 170 can be configured to determine travel path(s), current autonomous driving maneuvers for the vehicle 100, future autonomous driving maneuvers and/or modifications to current autonomous driving maneuvers based on data acquired by the sensor system 120, driving scene models, and/or data from any other suitable source such as determinations from the sensor data 250 as implemented by the tracking module 230. “Driving maneuver” means one or more actions that affect the movement of a vehicle. Examples of driving maneuvers include: accelerating, decelerating, braking, turning, moving in a lateral direction of the vehicle 100, changing travel lanes, merging into a travel lane, and/or reversing, just to name a few possibilities. The autonomous driving module(s) 160 can be configured to implement determined driving maneuvers.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. 
Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], braking.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 16 is rejected for the reasons set forth hereinabove for claim 1, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, 
wherein the one or more signals of intent include an intent of the agent to go off-road (Rosman, paragraph [0106-0110], The autonomous driving module(s) 160 either independently or in combination with the prediction system 170 can be configured to determine travel path(s), current autonomous driving maneuvers for the vehicle 100, future autonomous driving maneuvers and/or modifications to current autonomous driving maneuvers based on data acquired by the sensor system 120, driving scene models, and/or data from any other suitable source such as determinations from the sensor data 250 as implemented by the tracking module 230. “Driving maneuver” means one or more actions that affect the movement of a vehicle. Examples of driving maneuvers include: accelerating, decelerating, braking, turning, moving in a lateral direction of the vehicle 100, changing travel lanes, merging into a travel lane, and/or reversing, just to name a few possibilities. The autonomous driving module(s) 160 can be configured to implement determined driving maneuvers.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. 
Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], braking.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 17 is rejected for the reasons set forth hereinabove for claim 1, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, 
wherein the one or more signals includes whether the agent intended to run a red light, and the one or more characteristics includes a characteristic of running a red light(Rosman, paragraph [0071-0072], In a further example, consider the presence of a traffic light, and a car that is occluding the traffic light. Further consider the previous circumstance of T1 and T2. Thus, in such a circumstance, the tracking module 230 infers the presence of the status of the traffic light, and consequently, determines not to pass T2 since the light is the cause of the behaviors of T2. In general, features (i.e., observations) employed by the prediction system 170 for training and predicting on this circumstance include speeds, relative speeds, legal speed, lane positions, as well as the state of the traffic light. The state of the traffic light may be acquired through various means such as prediction or direct observation.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. 
As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], braking.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 18 is rejected for the reasons set forth hereinabove for claim 1, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, 
wherein the one or more signals of intent include a maximum velocity of the agent, and the one or more characteristics includes the maximum velocity as a limit on a velocity of the interactive agent(Rosman, paragraph [0071-0072], By way of example, consider a circumstance where ado-car T1 (e.g., nearby vehicle/agent) is traveling on a roadway with a slower car T2 in front of T1. The tracking module 230 predicts whether T1 is likely to pass T2 by changing a lane before approaching too closely to T2. In general, since we normalize with respect to a pose of the vehicle relevant features include speeds, relative speeds, legal speed, and lane positions. The training module 270 trains the model 260 using, for example, imitation learning through updates with respect to available features (i.e., sensor data/observations of the noted features), and sampling based on noisy features (i.e. sampling positions that are observed). Thereafter, the tracking module 230 acquires similar information in real-time from sensors of the vehicle 100 to predict whether vehicle T1 will pass and thereby produce the estimate to inform maneuvers of the vehicle 100.  Paragraph [0091], the vehicle sensor(s) 121 can include a speedometer to determine a current speed of the vehicle 100.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. 
For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], braking.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and braking distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation, a characteristic that may be important in achieving a smooth drive).
Claim 19 is rejected for the reasons set forth hereinabove for claim 1, Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, 
wherein the one or more signals of intent include the maximum acceleration of the agent, and the one or more characteristics includes the maximum acceleration as a limit on acceleration of the interactive agent(Rosman, paragraph [0071-0072], By way of example, consider a circumstance where ado-car T1 (e.g., nearby vehicle/agent) is traveling on a roadway with a slower car T2 in front of T1. The tracking module 230 predicts whether T1 is likely to pass T2 by changing a lane before approaching too closely to T2. In general, since we normalize with respect to a pose of the vehicle relevant features include speeds, relative speeds, legal speed, and lane positions. The training module 270 trains the model 260 using, for example, imitation learning through updates with respect to available features (i.e., sensor data/observations of the noted features), and sampling based on noisy features (i.e. sampling positions that are observed). Thereafter, the tracking module 230 acquires similar information in real-time from sensors of the vehicle 100 to predict whether vehicle T1 will pass and thereby produce the estimate to inform maneuvers of the vehicle 100.  Paragraph [0091], the vehicle sensor(s) 121 can include a speedometer to determine a current speed of the vehicle 100.  Shalev-Shwartz, paragraph [0197-0198], For the training of ŝ.sub.t+1 given s.sub.t, a.sub.t, supervised learning may be used together with real data. For training the policy of nodes simulators can be used. Later, fine tuning of a policy can be accomplished using real data. Two concepts may make the simulation more realistic. First, using imitation, an initial policy can be constructed using the “behavior cloning” paradigm, using large real world data sets. In some cases, the resulting agents may be suitable. In other cases, the resulting agents at least form very good initial policies for the other agents on the roads. Second, using self-play, our own policy may be used to augment the training. 
For example, given an initial implementation of the other agents (cars/pedestrians) that may be experienced, a policy may be trained based on a simulator. Some of the other agents may be replaced with the new policy, and the process may be repeated. As a result, the policy can continue to improve as it should respond to a larger variety of other agents that have differing levels of sophistication.  Paragraphs [0226] and [0231], brake.  Paragraph [0302], The first term may result in a penalty for non-zero accelerations, thus encouraging smooth driving. The second term depends on the ratio between the distance to the target car, x.sub.t, and the desired distance, x.sub.t*, which is defined as the maximum between a distance of 1 meter and break distance of 1.5 seconds. In some cases, this ratio may be exactly 1, but as long as this ratio is within [0.7, 1.3], the policy may forego any penalties, which may allow the host vehicle some slack in navigation—a characteristic that may be important in achieving a smooth drive.  Paragraphs [0193]-[0194], There are many examples of potential hard constraints. For example, a hard constraint may be defined in conjunction with a guardrail on an edge of a road. In no situation may the host vehicle be allowed to pass the guardrail. Such a rule induces a hard lateral constraint on the trajectory of the host vehicle. Another example of a hard constraint may include a road bump (e.g., a speed control bump), which may induce a hard constraint on the speed of driving before the bump and while traversing the bump. Hard constraints may be considered safety critical and, therefore, may be defined manually rather than relying solely on a trained system learning the constraints during training.).
Claim 20 is rejected for the reasons set forth hereinabove for claim 1. Rosman, Shalev-Shwartz and Moustafa teach the method of claim 1, further comprising: 
evaluating results of running the simulation (Rosman, paragraphs [0058]-[0065], the model 260 or another model that produces simulated observations used for learning and/or validation.); and 
flagging the simulation for review based on the evaluation (Rosman, paragraphs [0058]-[0065], the model 260 or another model that produces simulated observations used for learning and/or validation.).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUY KHUONG THANH NGUYEN whose telephone number is (571) 270-7139. The examiner can normally be reached M-F 8 to 5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at (571) 272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY KHUONG T NGUYEN/           Primary Examiner, Art Unit 2199