DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
Paragraph [0039] line 14 “40a-40” should read “40a-40n”.
Paragraph [0040] line 1 “panned” should read “planned”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 9-13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tram et al. (US2020/0326719A1), hereinafter referred to as Tram.

Regarding Claim 1, Tram teaches a method for controlling an autonomous vehicle, comprising: receiving, by a controller of the autonomous vehicle, a sensor input from a plurality of sensors of the autonomous vehicle ([0011] “In order to receive data comprising information about the surrounding environment the control device may be in communicative connection (wired or wireless) to perception systems having one or more sensors”); determining a plurality of possible planned movements of the autonomous vehicle in a future using a plurality of autonomous driving techniques and the sensor input from the plurality of sensors ([0008] “The first module is configured to receive data comprising information about a surrounding environment of the ego-vehicle, and to determine, by means of the trained self-learning model, an action to be executed by the ego-vehicle based on the received data.”); grading each of the plurality of possible planned movements to obtain a plurality of scores each corresponding to one of the plurality of possible planned movements, wherein the plurality of scores includes a highest score ([0040] “If the self-learning model is in the form of a trained reinforcement learning algorithm, then training can be realized as deep Q-learning. In more detail, reinforcement learning is when an agent/a model observes the states, of the environment, takes an action at, and receives a reward rt, at every time step t. 
Through "experience" the agent updates a policy π in a way that maximizes the accumulated reward T in order to find the optimal policy π*.”); selecting one of the plurality of possible planned movements that corresponds with the highest score of the plurality of scores ([0040] “The optimal policy is given by the action that gives the highest Q-value.”); determining a predicted movement of at least one other vehicle based on the selected one of the plurality of possible planned movements ( [0042] “The second module 24 also receives environmental observations in order to predict a future environmental state (e.g. predict trajectories of the ego-vehicle and surrounding vehicles), and to generate a suitable trajectory for the vehicle given the determined action.”); determining a plurality of possible reactive movements of the autonomous vehicle in the future based on the predicted movement of the at least one other vehicle ([0051] “a trajectory for the ego-vehicle is determined 104 based on the received action for the finite time horizon and on the predicted 103 environmental state for the first time period,” and [0057] “of the second module generates future motions of other vehicles 2a, 2e in the environment.”); modifying the plurality of possible planned movements to include the plurality of possible reactive movements to obtain a plurality of modified planned movements in the future ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. 
The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle. For example, the trained self-learning model learned that in this specific environmental state (based on the received data), with a feedback signal indicating that the "confidence level" of the prediction made by the second module 22 is low, the previously determined action is still feasible and will therefore allow the second module to keep providing a control signal to the ego-vehicle based on the previously determined action. Thus, the first module 21 can compensate for model errors of the second module 22, and avoid unnecessary stops or jerking due to constant generation of new actions.”); regrading each of the plurality of modified planned movements to obtain a plurality of updated scores each corresponding to one of the plurality of modified planned movements, wherein the plurality of updated scores includes a highest updated score ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.” and [0040] “If the self-learning model is in the form of a trained reinforcement learning algorithm, then training can be realized as deep Q-learning. In more detail, reinforcement learning is when an agent/a model observes the states, of the environment, takes an action at, and receives a reward rt, at every time step t. 
Through "experience" the agent updates a policy π in a way that maximizes the accumulated reward T in order to find the optimal policy π*.” ); selecting one of the plurality of modified planned movements that corresponds with the highest updated score of the plurality of scores ([0040] “The optimal policy is given by the action that gives the highest Q-value.” ); and commanding, by the controller, the autonomous vehicle to move according to the selected one of the plurality of modified planned movements with the highest updated score ([0042] “the second module being arranged as a controller.” and [0044] “the second module 22 sends the signal in order to control the ego vehicle according to the determined trajectory during the first time period while the determined action is feasible based on the at least one predefined criteria”). 
	Tram does not explicitly teach a plurality of possible planned or reactive movements, but implies this limitation ([0049] “The self-learning model of the first module 21 aids the MPC 22 by solving the mixed integer problem and by implicitly approximating which action is the optimal one so that the MPC 22 only needs to compute a prediction for a single action.”). Examiner interprets the expression “which action” as indicating a plurality of actions from which one is chosen.
It would have been obvious to one having ordinary skill in the art, based on the teaching of Tram that the first module determines which action is the optimal action, that the learning algorithm also taught in Tram would be able to compare multiple planned movements. One would be motivated to compare multiple planned movements in order to execute movements more precisely, as certain movements will not be executed if they do not meet certain standards. One would want to specify certain standards in order to better ensure passenger safety.

Examiner interprets control device to be synonymous with the controller. Examiner interprets autonomous driving technique to be “one or more rule-based models and/or machine-learning models” ([0039] of Applicant’s specification as filed). Examiner interprets a planned movement to be an operation that a vehicle can perform, for example a left turn. Examiner interprets action to be synonymous with planned movement (see Tram [0012]).
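
For illustration only, the selection step relied upon from Tram [0040] (choosing the action that gives the highest Q-value) can be sketched as follows. The action names and Q-values are hypothetical and do not appear in Tram or in Applicant’s specification; the sketch shows only how comparing scores across a plurality of candidate actions yields a single selected action.

```python
# Illustrative sketch only: the action names and Q-values are hypothetical
# and are not taken from Tram or from Applicant's specification.

def select_action(q_values):
    """Return the candidate action whose Q-value is highest."""
    if not q_values:
        raise ValueError("at least one candidate action is required")
    # max() over the keys, ranked by each key's Q-value
    return max(q_values, key=q_values.get)


# A plurality of candidate actions, each with a score (Q-value)
candidate_q_values = {"keep_lane": 0.42, "turn_left": 0.77, "stop": 0.10}
selected = select_action(candidate_q_values)
```

Under these assumed values, the comparison necessarily considers every candidate before one action is selected, which is the sense in which “which action is the optimal one” is read as a plurality of actions.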
Regarding Claim 2, Tram teaches the method of claim 1 as detailed above.
	Tram does not explicitly teach wherein the plurality of autonomous driving techniques includes a rule-based model but does suggest this limitation ([0065] “software implementations could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps.”).  
	It would have been obvious to one having ordinary skill in the art, based on the suggestion of a program using rule-based logic to perform connection steps, processing steps, comparison steps, and decision steps, that rule-based logic could be included in the autonomous driving techniques. One would be motivated to use a rule-based model because it can be used when training data is not readily available. Therefore, using a rule-based model would increase the reliability of the system.

Regarding Claim 3, Tram teaches the method of claim 1 as detailed above.
	Tram does not explicitly teach wherein the plurality of autonomous driving techniques includes a machine learning tree-regression model but does suggest this limitation ([0029] “the second module can be in the form of a sample based tree search controller.”).  
	It would have been obvious to one of ordinary skill in the art, based on the suggestion that the second module can be in the form of a sample based tree search controller, that the autonomous driving techniques could include a machine learning tree-regression model. One would have been motivated to implement a tree-regression model to reduce calculation time, as it requires less data pre-processing. One would want to reduce calculation time in order to allow an autonomous vehicle to respond quickly to the surrounding environment and thereby ensure passenger safety.



Regarding Claim 9, Tram teaches the method of claim 1 as detailed above.
Tram does not explicitly teach wherein the future is 3.5 seconds ahead of a current time.
However, Tram does teach that the future time can be selected based on a specific application or an associated specification (Tram, [0012] “A time period is in the present context to be construed as a sub-portion of or the complete the finite time horizon. For example, if the finite time horizon is ten seconds, the time period may be two seconds, four seconds, or ten seconds. The length of the finite time horizon and the time period can be arbitrarily selected depending on a desired application, or associated specifications.”).
Examiner interprets time period to be a future timeslot.
It would have been obvious to one having ordinary skill in the art, based on the teaching of Tram that a future time can be chosen arbitrarily depending on the desired application, to choose the future time to be 3.5 seconds. One would be motivated to choose the future time to be specifically 3.5 seconds in order to make specific adjustments and to account for changes in the nearby environment. One would want to make specific adjustments and account for changes in the nearby environment in order to promote passenger safety.

Regarding Claim 10, Tram teaches the method of claim 1 as detailed above.
Tram does not explicitly teach wherein the predicted movement of at least one other vehicle is determined using a kinematic prediction model but does suggest this limitation ([0057] “For simplicity, one can assume a constant velocity prediction model. The motion is predicted at every time instant for prediction times k∈[0,N] and can be used to form the collision avoidance constraints. As the skilled reader readily understands, there are other more or less complex prediction methods, however, the following simplified model is used to show the overall potential of the proposed solution.”). 
It would have been obvious to one having ordinary skill in the art, based on the suggestion that the predicted motion can be based on a constant velocity model, that the predicted movement can be determined using a kinematic model. One would be motivated to use a kinematic model to predict the movement of at least one other vehicle in order to yield accurate calculations, for example with respect to position. One would want to accurately determine the position of a vehicle in order to reduce the possibility of a collision.
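
As a non-limiting illustration of why a constant velocity prediction model is regarded as a kinematic model, the sketch below propagates a position using only position, velocity, and elapsed time. The function name, time step, and numeric values are assumptions chosen for illustration and are not taken from Tram.

```python
# Illustrative sketch only: names, time step, and values are hypothetical.

def predict_constant_velocity(x, y, vx, vy, dt, steps):
    """Predict (x, y) at prediction times k = 0..steps, assuming the
    velocity (vx, vy) of the other vehicle stays constant."""
    return [(x + vx * k * dt, y + vy * k * dt) for k in range(steps + 1)]


# Other vehicle at (0 m, 2 m) travelling at 10 m/s along x,
# predicted over 4 steps of 0.5 s each
trajectory = predict_constant_velocity(x=0.0, y=2.0, vx=10.0, vy=0.0, dt=0.5, steps=4)
```

The prediction depends only on motion quantities (position, velocity, time) and not on forces, which is the characteristic of a kinematic model.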

Regarding Claim 11,  Tram teaches a control system for an autonomous vehicle, comprising: a plurality of sensors ([0011] perception systems having one or more sensors); a controller in communication with the plurality of sensors, wherein the controller is programmed to: receive input from a plurality of sensors of the autonomous vehicle ( [0011] “In order to receive data comprising information about the surrounding environment the control device may be in communicative connection (wired or wireless) to perception systems having one or more sensors” ); determine a plurality of possible planned movements of the autonomous vehicle in a future using a plurality of autonomous driving techniques and input from the plurality of sensors ([0008] “The first module is configured to receive data comprising information about a surrounding environment of the ego-vehicle, and to determine, by means of the trained self-learning model, an action to be executed by the ego-vehicle based on the received data.”); grade each of the plurality of possible planned movements to obtain a plurality of scores each corresponding to one of the plurality of possible planned movements, wherein the plurality of scores includes a highest score ([0040] “If the self-learning model is in the form of a trained reinforcement learning algorithm, then training can be realized as deep Q-learning. In more detail, reinforcement learning is when an agent/a model observes the states, of the environment, takes an action at, and receives a reward rt, at every time step t. 
Through "experience" the agent updates a policy π in a way that maximizes the accumulated reward T in order to find the optimal policy π*.”); select one of the plurality of the possible planned movements that corresponds with the highest score of the plurality of scores ([0040] “The optimal policy is given by the action that gives the highest Q-value.”); determine a predicted movement of at least one other vehicle based on the selected one of the plurality of possible planned movements ([0042] “The second module 24 also receives environmental observations in order to predict a future environmental state (e.g. predict trajectories of the ego-vehicle and surrounding vehicles), and to generate a suitable trajectory for the vehicle given the determined action.”); determine a plurality of possible reactive movements of the autonomous vehicle in the future based on the predicted movement of the at least one other vehicle ([0051] “a trajectory for the ego-vehicle is determined 104 based on the received action for the finite time horizon and on the predicted 103 environmental state for the first time period,” and [0057] “of the second module generates future motions of other vehicles 2a, 2e in the environment.”); modify the plurality of possible planned movements to include the plurality of possible reactive movements to obtain a plurality of modified planned movements in the future ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.
For example, the trained self-learning model learned that in this specific environmental state (based on the received data), with a feedback signal indicating that the "confidence level" of the prediction made by the second module 22 is low, the previously determined action is still feasible and will therefore allow the second module to keep providing a control signal to the ego-vehicle based on the previously determined action. Thus, the first module 21 can compensate for model errors of the second module 22, and avoid unnecessary stops or jerking due to constant generation of new actions.”); regrade each of the plurality of modified planned movements to obtain a plurality of updated scores each corresponding to one of the plurality of modified planned movements, wherein the plurality of updated scores includes a highest updated score ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. 
The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.” and [0040]); and select one of the plurality of modified planned movements that corresponds with the highest updated score of the plurality of scores ([0040] “The optimal policy is given by the action that gives the highest Q-value.”); and command the autonomous vehicle to move according to the selected one of the plurality of modified planned movements with the highest updated score ([0042] “the second module being arranged as a controller.” and [0044] “the second module 22 sends the signal in order to control the ego vehicle according to the determined trajectory during the first time period while the determined action is feasible based on the at least one predefined criteria”).  

Examiner interprets that the first and second modules are incorporated in the control device and that the control device is synonymous with the controller.

Tram does not explicitly teach that the evaluation function is applied to multiple actions but implies this limitation ([0049] “The self-learning model of the first module 21 aids the MPC 22 by solving the mixed integer problem and by implicitly approximating which action is the optimal one so that the MPC 22 only needs to compute a prediction for a single action.”). 
It would have been obvious to one having ordinary skill in the art, based on the teaching of Tram that the first module determines which action is the optimal action, that the learning algorithm also taught in Tram would be able to compare multiple planned movements. One would be motivated to compare multiple planned movements in order to execute movements more precisely, as certain movements will not be executed if they do not meet certain standards. One would want to specify certain standards in order to better ensure passenger safety.

Regarding Claim 12, Tram teaches the control system of claim 11 as detailed above.
Tram does not explicitly teach wherein the plurality of autonomous driving techniques includes a rule-based model but does suggest this limitation ([0065] “software implementations could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps.”).  
	It would have been obvious to one having ordinary skill in the art, based on the suggestion of a program using rule-based logic to perform connection steps, processing steps, comparison steps, and decision steps, that rule-based logic could be included in the autonomous driving techniques. One would be motivated to use a rule-based model because it can be used when training data is not readily available. Therefore, using a rule-based model would increase the reliability of the system.

Regarding Claim 13, Tram teaches the control system of claim 11 as detailed above. 
	Tram does not explicitly teach wherein the plurality of autonomous driving techniques includes a machine learning tree-regression model but does suggest this limitation ([0029] “the second module can be in the form of a sample based tree search controller.”).   
It would have been obvious to one of ordinary skill in the art, based on the suggestion that the second module can be in the form of a sample based tree search controller, that the autonomous driving techniques could include a machine learning tree-regression model. One would have been motivated to implement a tree-regression model to reduce calculation time, as it requires less data pre-processing. One would want to reduce calculation time in order to allow an autonomous vehicle to respond quickly to the surrounding environment and thereby ensure passenger safety.

Regarding Claim 19, Tram teaches the control system of claim 11 as detailed above. 
Tram does not explicitly teach wherein the future is 3.5 seconds ahead of a current time.
However, Tram does teach that the future time can be selected based on a specific application or an associated specification ([0012] “A time period is in the present context to be construed as a sub-portion of or the complete the finite time horizon. For example, if the finite time horizon is ten seconds, the time period may be two seconds, four seconds, or ten seconds. The length of the finite time horizon and the time period can be arbitrarily selected depending on a desired application, or associated specifications.”).
Examiner interprets time period to be a future timeslot.
It would have been obvious to one having ordinary skill in the art, based on the teaching of Tram that a future time can be chosen arbitrarily depending on the desired application, to choose the future time to be 3.5 seconds ahead of a current time. One would be motivated to choose the future time to be specifically 3.5 seconds in order to make specific adjustments and to account for changes in the nearby environment. One would want to make specific adjustments and account for changes in the nearby environment in order to promote passenger safety.

Regarding Claim 20, Tram teaches the control system of claim 11 as detailed above.
Tram does not explicitly teach wherein the predicted movement of at least one other vehicle is determined using a kinematic prediction model but does suggest this limitation ([0057] “For simplicity, one can assume a constant velocity prediction model. The motion is predicted at every time instant for prediction times k∈[0,N] and can be used to form the collision avoidance constraints. As the skilled reader readily understands, there are other more or less complex prediction methods, however, the following simplified model is used to show the overall potential of the proposed solution.”). 
It would have been obvious to one having ordinary skill in the art, based on the suggestion that the predicted motion can be based on a constant velocity model, that the predicted movement can be determined using a kinematic model. One would be motivated to use a kinematic model to predict the movement of at least one other vehicle in order to yield accurate calculations, for example with respect to position. One would want to accurately determine the position of a vehicle in order to reduce the possibility of a collision.

Claims 4-8 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Tram et al. (US2020/0326719A1) in view of Liu et al. (US2019/0346851A1), hereinafter referred to as Tram and Liu respectively.

Regarding Claim 4, Tram teaches the method of claim 1 as detailed above.
Tram teaches wherein grading each of the possible planned movements includes determining a speed of the autonomous vehicle for each of the plurality of possible planned movements ([0039] “The decision making process for the trained self-learning model in the first module can be modelled as a Partially Observable Markov Decision Process (POMDP). A POMPDP is defined by the seven-tuple (S, A, Τ, R, Ω, O, γ), where S is a state space, A is an action space, T is the transmission function, T is the reward function T: S x A ->R, Ω is the observation space, O is the probability of being in state st, given the observation O, and γ is the discount factor. At each time instant, an action a-t∈A, is taken, which will change the environment state st to a new state vector st+1· Each transition to st with action at has a reward rt given by a reward function Tt.” and [0032] “Still further, the received data may comprise a current state of the ego-vehicle 1 and a current state of any surrounding vehicles 2a-f relative to the ego-vehicle 1. In more detail, the received data may comprise a pose (position and heading angle) and a speed of the ego-vehicle 1, as well as a pose (position and heading angle) and speed of any surrounding vehicles 2a-f”). Tram suggests certain criteria that can be considered when determining an action to be performed by the autonomous vehicle ([0045]-[0046] “In more detail, the second module is configured to continuously reassess if the predicted environmental states correlates with the perceived "current" environmental state during the first time period, and to provide a feedback based on this assessment to the first module 21. The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible.
The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle…The wording that an action is feasible is to be understood as that a determined trajectory can be executed given the constraints defined by the one or more predefined criteria. In more detail, the predefined criteria may for example be one or more of the following: to execute the determined trajectory without collisions with other vehicles, without exceeding predefined acceleration thresholds, without exceeding predefined time thresholds, without breaking any predefined traffic rules, and so forth. A predefined passenger comfort model comprising an acceleration/deceleration limit of e.g. 5 m/s2 may define the acceleration thresholds.”). 
Tram does not explicitly teach wherein grading each of the possible planned movements includes determining a distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements.  
Liu more explicitly teaches wherein grading each of the possible planned movements includes determining a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements ([0043] “A score for each candidate maneuver in the set of candidate maneuvers can be generated, and a selected maneuver can be determined based at least in part on the scores for each candidate maneuver in the set of candidate maneuvers…The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation. 
Example scoring factors can include, for example, a dynamics cost for given dynamics ( e.g., jerk, acceleration) associated with the candidate maneuver, a buffer cost associated with proximity of a candidate maneuver to one or more constraints within the multi-dimensional space, a constraint violation cost associated with violating one or more constraints, a reward or discount for one or more achieved performance objectives” and [0092] “To provide an example cost function 324 for the purpose of illustration: a first example cost function can provide a first cost that is negatively correlated to a magnitude of a first distance from the autonomous vehicle to a proximate object of interest.” and [0097]-[0102] “In general, constraints can be generated by scenario generator 206 (e.g., via constraint generator 410) relative to one or more objects of interest determined by object classifier 400…Objects of interest, can include, for example, one or more of a vehicle, a pedestrian, a bicycle, a traffic light, a stop sign, a crosswalk, and a speed zone. In some implementations, an object classifier 400 can more particularly include a blocking classifier 404, a yield zone generator 406, and a side classifier 408 (as illustrated in FIG. 14). More particularly, an object classifier 400 can make determinations based on world state data determined by an autonomy computing system ( e.g., by world state generator 204). For example, the world state generator 204 can determine one or more features associated with an object and/or the surrounding environment. For example, the features can be determined based at least in part on the state data associated with the object.”).

Examiner interprets the object classifier as being responsible for grading a possible planned movement with respect to the presence of pedestrians and stop signs.

	It would have been obvious to one having ordinary skill in the art, based on the suggestion in Tram that the learning algorithm can incorporate certain criteria or constraints in order to determine a planned movement among a plurality of possible planned movements, that any number of constraints relating to the autonomous vehicle’s environment could be added. Therefore, it would have been obvious to combine the learning algorithm taught in Tram with the constraints taught in Liu that are associated with the presence of objects such as pedestrians and stop signs and their respective distances to the autonomous vehicle. One would have been motivated to combine the added constraints with the learning algorithm in order to better represent the autonomous vehicle’s surrounding environment. One would want to better represent the vehicle’s surrounding environment in order to better respond to various events in the environment and thereby increase passenger safety.

Regarding Claim 5, the combination of Tram and Liu teaches the method of claim 4 as detailed above.
	Tram does not explicitly teach wherein grading each of the possible planned movements includes assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements.
Liu teaches wherein grading each of the possible planned movements includes assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements ([0085] “a selected maneuver 305 can be determined based at least in part on the scores for each candidate maneuver in the set of candidate maneuvers…The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation. Example scoring factors can include, for example, a dynamics cost for given dynamics (e.g., jerk, acceleration) associated with the candidate maneuver, a buffer cost associated with proximity of a candidate maneuver to one or more constraints within the multi-dimensional space,” and [0089] “the total cost can be based at least in part on one or more cost functions 324. In one example implementation, the total cost equals the sum of all costs minus the sum of all rewards and the optimization planner attempts to minimize the total cost. The cost functions 324 can be evaluated by a penalty/reward generator 322.” also see [0092]-[0093]).
It would have been obvious to one having ordinary skill in the art, based on the teachings of the combination of Tram and Liu with respect to the different factors that are included in the scoring, and on the further teaching in Liu that individual costs can be summed together in cost functions which then form the basis of the generated score, that one would be able to assign partial scores with respect to the different factors. One would have been motivated to incorporate partial scores to have more variables to manipulate. One would want more variables to manipulate to create a more accurate depiction of events that can happen in the autonomous vehicle’s environment.
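
For illustration only, the cost summation quoted from Liu [0089] above (the total cost equals the sum of all costs minus the sum of all rewards, and the planner attempts to minimize the total cost) can be sketched as follows. This is a minimal sketch and is not code from Tram or Liu; the class CandidateManeuver, the functions total_cost and select_maneuver, and every factor name and value are hypothetical and assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CandidateManeuver:
    """Hypothetical container: one candidate movement and its partial scores."""
    name: str
    costs: Dict[str, float] = field(default_factory=dict)    # e.g. dynamics, buffer
    rewards: Dict[str, float] = field(default_factory=dict)  # e.g. achieved objectives


def total_cost(maneuver: CandidateManeuver) -> float:
    """Sum of all partial costs minus the sum of all partial rewards."""
    return sum(maneuver.costs.values()) - sum(maneuver.rewards.values())


def select_maneuver(candidates: List[CandidateManeuver]) -> CandidateManeuver:
    """Return the candidate with the lowest total cost."""
    if not candidates:
        raise ValueError("at least one candidate maneuver is required")
    return min(candidates, key=total_cost)
```

In this sketch each entry in costs or rewards plays the role of a partial score, and the total is a function of those partial scores. A lower total is treated as better, so a "highest score" formulation would simply negate the total.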

Regarding Claim 6, the combination of Tram and Liu teaches the method of claim 5 as detailed above.
Tram does not explicitly teach wherein each of the plurality of scores is a function of the plurality of partial movement scores. 
However, Liu more explicitly teaches wherein each of the plurality of scores is a function of the plurality of partial movement scores ([0043] “The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation.” and [0089] “the total cost can be based at least in part on one or more cost functions 324. In one example implementation, the total cost equals the sum of all costs minus the sum of all rewards and the optimization planner attempts to minimize the total cost. The cost functions 324 can be evaluated by a penalty/reward generator 322.”).  
It would have been obvious to one having ordinary skill in the art, based on the teachings in Liu that the score is a function of total costs or scoring factors, that one would have been able to create a scoring function that is a function of a plurality of partial scores. One would have been motivated to do so in order to weigh the different aspects of the planned movement. One would want to weigh the different aspects of the planned movement to provide the best plan of action according to a specific environment and thereby increase the overall safety of the passengers.

Regarding Claim 7, the combination of Tram and Liu teaches the method of claim 6 as detailed above.
	The combination of Tram and Liu teaches wherein grading each of the possible planned movements includes determining a speed of the autonomous vehicle for each of the plurality of possible planned movements, a distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements as detailed in claim 4 above. 
	Tram teaches regrading the plurality of possible planned movements ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.”).
	Tram does not explicitly teach wherein regrading each of the plurality of possible planned movements includes determining an updated speed of the autonomous vehicle for each of the plurality of modified planned movements, an updated distance from the autonomous vehicle to another object for each of the plurality of modified planned movements, an updated presence of a stop sign for each of the plurality of modified planned movements, an updated distance from the autonomous vehicle to the stop sign for each of the plurality of modified planned movements, an updated presence of a pedestrian for each of the plurality of modified planned movements, and an updated distance from the autonomous vehicle to the pedestrian for each of the plurality of modified planned movements.  
	It would have been obvious to one having ordinary skill in the art to combine the regrading of the possible planned movements as taught in Tram with the scores taught by the combination of Tram and Liu. One would have been motivated to do so in order to have a score that reflects the current environment. One would want to score the planned movements based on the current environment to avoid maneuvers that will lead to collisions. 
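
For illustration only, regrading against updated environment data can be sketched as re-running the same scoring on updated per-movement features. This is a minimal sketch and is not code from Tram or Liu; the class MovementFeatures, the functions partial_scores, grade, and best, and all weights and formulas are hypothetical and assumed for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MovementFeatures:
    """Hypothetical per-movement inputs mirroring the factors recited in the claims."""
    speed_mps: float
    dist_object_m: float
    stop_sign_present: bool
    dist_stop_sign_m: float
    pedestrian_present: bool
    dist_pedestrian_m: float


def _proximity(distance_m: float) -> float:
    # Cost grows as distance shrinks; clamp at 1 m to avoid division by zero.
    return 1.0 / max(distance_m, 1.0)


def partial_scores(f: MovementFeatures) -> Dict[str, float]:
    """One partial cost per factor (lower is better); the weights are arbitrary."""
    return {
        "speed": 0.1 * f.speed_mps,
        "object": _proximity(f.dist_object_m),
        "stop_sign": _proximity(f.dist_stop_sign_m) if f.stop_sign_present else 0.0,
        "pedestrian": 5.0 * _proximity(f.dist_pedestrian_m) if f.pedestrian_present else 0.0,
    }


def grade(movements: Dict[str, MovementFeatures]) -> Dict[str, float]:
    """Total cost per movement, computed as the sum of its partial costs."""
    return {name: sum(partial_scores(f).values()) for name, f in movements.items()}


def best(scores: Dict[str, float]) -> str:
    """Name of the movement with the lowest total cost."""
    return min(scores, key=scores.get)
```

Under these assumptions, regrading is simply a second call to grade with the updated features, so a movement that was preferred before a pedestrian was detected can lose its ranking afterward.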

Regarding Claim 8, the combination of Tram and Liu teaches the method of claim 7 as detailed above.
The combination of Tram and Liu teaches wherein grading each of the possible planned movements includes assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements as detailed above with regard to claim 5.
Tram teaches regrading the plurality of possible planned movements ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.”).
Tram does not explicitly teach wherein regrading each of the possible modified movements includes assigning an updated partial score to each of the updated speed of the autonomous vehicle for each of the plurality of modified planned movements, the updated distance from the autonomous vehicle to another object for each of the plurality of modified planned movements, the updated presence of the stop sign for each of the plurality of modified planned movements, the updated distance from the autonomous vehicle to the stop sign for each of the plurality of modified planned movements, the updated presence of a pedestrian for each of the plurality of modified planned movements, and the updated distance from the autonomous vehicle to the pedestrian for each of the plurality of modified planned movements in order to obtain a plurality of partial modified scores for each of the plurality of modified planned movements.  
	It would have been obvious to one having ordinary skill in the art to combine the regrading of the possible planned movements as taught in Tram with the partial scores taught in Liu. One would have been motivated to update the partial scores so that they reflect the current environment. One would want the partial scores to reflect the current environment in order to have a more accurate grading system and thereby reduce the possibility of dangerous traffic situations.

Regarding Claim 14, Tram teaches the control system of claim 11 as detailed above.
	Tram teaches wherein the controller grades each of the possible planned movements by determining a speed of the autonomous vehicle for each of the plurality of possible planned movements ([0039] “The decision making process for the trained self-learning model in the first module can be modelled as a Partially Observable Markov Decision Process (POMDP). A POMPDP is defined by the seven-tuple (S, A, Τ, R, Ω, O, γ), where S is a state space, A is an action space, T is the transmission function, T is the reward function T: S x A ->R, Ω is the observation space, O is the probability of being in state st, given the observation O, and γ is the discount factor. At each time instant, an action a-t∈A, is taken, which will change the environment state st to a new state vector st+1· Each transition to st with action at has a reward rt given by a reward function Tt.” and [0032] “Still further, the received data may comprise a current state of the ego-vehicle 1 and a current state of any surrounding vehicles 2a-f relative to the ego-vehicle 1. In more detail, the received data may comprise a pose (position and heading angle) and a speed of the ego-vehicle 1, as well as a pose (position and heading angle) and speed of any surrounding vehicles 2a-f.”). Tram suggests certain criteria that can be considered when determining an action to be performed by the autonomous vehicle ([0045]-[0046] “In more detail, the second module is configured to continuously reassess if the predicted environmental states correlates with the perceived "current" environmental state during the first time period, and to provide a feedback based on this assessment to the first module 21. The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. 
The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle…The wording that an action is feasible is to be understood as that a determined trajectory can be executed given the constraints defined by the one or more predefined criteria. In more detail, the predefined criteria may for example be one or more of the following: to execute the determined trajectory without collisions with other vehicles, without exceeding predefined acceleration thresholds, without exceeding predefined time thresholds, without breaking any predefined traffic rules, and so forth. A predefined passenger comfort model comprising an acceleration/deceleration limit of e.g. 5 m/s2 may define the acceleration thresholds.”).  
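
For illustration only, the feasibility criteria quoted from Tram [0046] above (no collisions, no exceeded acceleration or time thresholds, no broken traffic rules) can be sketched as a simple predicate. This is a minimal sketch and is not code from Tram; the class Trajectory, the function is_feasible, and the default time threshold are hypothetical, and only the 5 m/s2 limit is taken from the quoted example.

```python
from dataclasses import dataclass
from typing import List

# Example acceleration/deceleration limit quoted from Tram [0046].
ACCEL_LIMIT_MPS2 = 5.0


@dataclass
class Trajectory:
    """Hypothetical summary of a determined trajectory."""
    accelerations_mps2: List[float]  # signed samples along the trajectory
    duration_s: float
    collides: bool
    breaks_traffic_rule: bool


def is_feasible(traj: Trajectory, max_duration_s: float = 10.0) -> bool:
    """True when every predefined criterion holds; max_duration_s is an assumed value."""
    within_accel = all(abs(a) <= ACCEL_LIMIT_MPS2 for a in traj.accelerations_mps2)
    within_time = traj.duration_s <= max_duration_s
    return (not traj.collides
            and not traj.breaks_traffic_rule
            and within_accel
            and within_time)
```

In this sketch a trajectory that fails any one criterion is treated as infeasible, which corresponds to the case in which a new action would be generated.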
	Tram does not explicitly teach wherein the controller grades each of the possible planned movements by determining a distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements.  
	Liu more explicitly teaches wherein the controller (see at least Fig.1 and [0021] “The autonomous vehicle can include an autonomy computing system that assists in controlling the autonomous vehicle. In some implementations, the autonomy computing system can include a perception system, a prediction system, and a motion planning system that cooperate to perceive the surrounding environment of the autonomous vehicle and determine a motion plan for controlling the motion of the autonomous vehicle accordingly.”) grades each of the possible planned movements by determining a speed of the autonomous vehicle for each of the plurality of possible planned movements, a distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements ([0043] “A score for each candidate maneuver in the set of candidate maneuvers can be generated, and a selected maneuver can be determined based at least in part on the scores for each candidate maneuver in the set of candidate maneuvers…The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation. 
Example scoring factors can include, for example, a dynamics cost for given dynamics (e.g., jerk, acceleration) associated with the candidate maneuver, a buffer cost associated with proximity of a candidate maneuver to one or more constraints within the multi-dimensional space, a constraint violation cost associated with violating one or more constraints, a reward or discount for one or more achieved performance objectives” and [0092] “To provide an example cost function 324 for the purpose of illustration: a first example cost function can provide a first cost that is negatively correlated to a magnitude of a first distance from the autonomous vehicle to a proximate object of interest.” and [0097]-[0102] “In general, constraints can be generated by scenario generator 206 (e.g., via constraint generator 410) relative to one or more objects of interest determined by object classifier 400…Objects of interest, can include, for example, one or more of a vehicle, a pedestrian, a bicycle, a traffic light, a stop sign, a crosswalk, and a speed zone. In some implementations, an object classifier 400 can more particularly include a blocking classifier 404, a yield zone generator 406, and a side classifier 408 (as illustrated in FIG. 14). More particularly, an object classifier 400 can make determinations based on world state data determined by an autonomy computing system (e.g., by world state generator 204). For example, the world state generator 204 can determine one or more features associated with an object and/or the surrounding environment. For example, the features can be determined based at least in part on the state data associated with the object.”).
Examiner interprets the autonomy computing system 102 as being synonymous with the controller.
	It would have been obvious to one having ordinary skill in the art, based on the suggestion in Tram that the learning algorithm can incorporate certain criteria or constraints in order to determine a planned movement among a plurality of possible planned movements, that one would be able to add any number of constraints relating to the autonomous vehicle’s environment. Therefore, it would have been obvious to combine the learning algorithm taught in Tram with the constraints taught in Liu that are associated with the presence of objects such as pedestrians and stop signs and their respective distances to the autonomous vehicle. One would have been motivated to combine the added constraints with the learning algorithm in order to better represent the autonomous vehicle’s surrounding environment. One would want to better represent the vehicle’s surrounding environment to better respond to various events in the environment and thereby increase passenger safety.

Regarding Claim 15, the combination of Tram and Liu teaches the control system of claim 14 as detailed above.
Tram does not explicitly teach wherein the controller grades each of the possible planned movements by assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements.  
Liu more explicitly teaches wherein the controller (see at least Fig. 1, Fig. 2, and Fig. 4) grades each of the possible planned movements by assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements ([0085] “a selected maneuver 305 can be determined based at least in part on the scores for each candidate maneuver in the set of candidate maneuvers…The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation. Example scoring factors can include, for example, a dynamics cost for given dynamics (e.g., jerk, acceleration) associated with the candidate maneuver, a buffer cost associated with proximity of a candidate maneuver to one or more constraints within the multi-dimensional space,” and [0089] “the total cost can be based at least in part on one or more cost functions 324. In one example implementation, the total cost equals the sum of all costs minus the sum of all rewards and the optimization planner attempts to minimize the total cost. The cost functions 324 can be evaluated by a penalty/reward generator 322.” also see [0092]-[0093]).
It would have been obvious to one having ordinary skill in the art, based on the teachings of the combination of Tram and Liu with respect to the different factors that are included in the scoring, and on the further teaching in Liu that individual costs can be summed together in cost functions which then form the basis of the generated score, that one would be able to assign partial scores with respect to the different factors. One would have been motivated to incorporate partial scores to have more variables to manipulate. One would want more variables to manipulate to create a more accurate depiction of events that can happen in the autonomous vehicle’s environment.

Regarding Claim 16, the combination of Tram and Liu teaches the control system of claim 15 as detailed above.
 	Tram does not explicitly teach wherein each of the plurality of scores is a function of the plurality of partial movement scores.  
	However, Liu more explicitly teaches wherein each of the plurality of scores is a function of the plurality of partial movement scores ([0043] “The score generated for each candidate maneuver can include one or more scoring factors, including but not limited to costs, discounts and/or rewards associated with aspects of a candidate maneuver for use in evaluation of a cost function or other scoring equation.” and [0089] “the total cost can be based at least in part on one or more cost functions 324. In one example implementation, the total cost equals the sum of all costs minus the sum of all rewards and the optimization planner attempts to minimize the total cost. The cost functions 324 can be evaluated by a penalty/reward generator 322.”).
	It would have been obvious to one having ordinary skill in the art, based on the teachings in Liu that the score is a function of total costs or scoring factors, that one would have been able to create a scoring function that is a function of a plurality of partial scores. One would have been motivated to do so in order to weigh the different aspects of the planned movement. One would want to weigh the different aspects of the planned movement to provide the best plan of action according to a specific environment and thereby increase the overall safety of the passengers.

Regarding Claim 17, the combination of Tram and Liu teaches the control system of claim 16 as detailed above.
	The combination of Tram and Liu teaches wherein the controller grades each of the possible planned movements by determining a speed of the autonomous vehicle for each of the plurality of possible planned movements, a distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, a presence of a stop sign for each of the plurality of possible planned movements, a distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, a presence of a pedestrian for each of the plurality of possible planned movements, and a distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements as detailed in claim 14 above.
	Tram teaches regrading the plurality of possible planned movements ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.”).
Tram does not explicitly teach wherein regrading each of the plurality of possible planned movements includes determining an updated speed of the autonomous vehicle for each of the plurality of modified planned movements, an updated distance from the autonomous vehicle to another object for each of the plurality of modified planned movements, an updated presence of a stop sign for each of the plurality of modified planned movements, an updated distance from the autonomous vehicle to the stop sign for each of the plurality of modified planned movements, an updated presence of a pedestrian for each of the plurality of modified planned movements, and an updated distance from the autonomous vehicle to the pedestrian for each of the plurality of modified planned movements.  
It would have been obvious to one having ordinary skill in the art to combine the regrading of the possible planned movements as taught in Tram with the scores taught by the combination of Tram and Liu. One would have been motivated to do so in order to have a score that reflects the current environment. One would want to score the planned movements based on the current environment to avoid maneuvers that will lead to collisions.  

Regarding Claim 18, the combination of Tram and Liu teaches the control system of claim 17 as detailed above.
	The combination of Tram and Liu teaches wherein the controller grades each of the possible planned movements by assigning a partial score to each of the speed of the autonomous vehicle for each of the plurality of possible planned movements, the distance from the autonomous vehicle to another object for each of the plurality of possible planned movements, the presence of the stop sign for each of the plurality of possible planned movements, the distance from the autonomous vehicle to the stop sign for each of the plurality of possible planned movements, the presence of a pedestrian for each of the plurality of possible planned movements, and the distance from the autonomous vehicle to the pedestrian for each of the plurality of possible planned movements in order to obtain a plurality of partial movement scores for each of the plurality of possible planned movements as detailed above with regard to claim 15.
	Tram teaches regrading the plurality of possible planned movements ([0045] “The first module 21 is configured to receive this feedback signal and run it through the self-learning model in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The first module 21 also receives data comprising information about the surrounding environment during the first time period, in order to evaluate if a new action is to be generated or if the previously determined action still is feasible. The feedback accordingly acts an additional input to the trained self-learning model to generate actions for the ego vehicle.”). 
Tram does not explicitly teach wherein regrading each of the possible modified movements includes assigning an updated partial score to each of the updated speed of the autonomous vehicle for each of the plurality of modified planned movements, the updated distance from the autonomous vehicle to another object for each of the plurality of modified planned movements, the updated presence of the stop sign for each of the plurality of modified planned movements, the updated distance from the autonomous vehicle to the stop sign for each of the plurality of modified planned movements, the updated presence of a pedestrian for each of the plurality of modified planned movements, and the updated distance from the autonomous vehicle to the pedestrian for each of the plurality of modified planned movements in order to obtain a plurality of partial modified scores for each of the plurality of modified planned movements.  
It would have been obvious to one having ordinary skill in the art to combine the regrading of the possible planned movements as taught in Tram with the partial scores taught in Liu. One would have been motivated to update the partial scores so that they reflect the current environment. One would want the partial scores to reflect the current environment in order to have a more accurate grading system and thereby reduce the possibility of dangerous traffic situations.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US2020/0249674A1 (Dally et al.) Discloses Sensors measure information about actors or other objects near an object, such as a vehicle or robot, to be maneuvered. Sensor data is used to determine a sequence of possible actions for the maneuverable object to achieve a determined goal. For each possible action to be considered, one or more probable reactions of the nearby actors or objects are determined. This can take the form of a decision tree in some embodiments, with alternative levels of nodes corresponding to possible actions of the present object and probable reactive actions of one or more other vehicles or actors. Machine learning can be used to determine the probabilities, as well as to project out the options along the paths of the decision tree including the sequences. A value function is used to generate a value for each considered sequence, or path, and a path having a highest value is selected for use in determining how to navigate the object. 
US2022/0048535A1 (Niendorf et al.) Discloses a method that includes receiving environment data associated with an environment detected by a vehicle, generating goal states of the environment for the vehicle by using the observed driving data associated with the environment, wherein each goal state corresponds to a region that the vehicle is capable of navigating through in the environment, generating candidate trajectories for the vehicle based on at least the goal states of the environment, wherein each candidate trajectory is associated with at least one goal state, assigning candidate values to the candidate trajectories based on the observed driving data, and selecting a candidate trajectory associated with at least one goal state from the candidate trajectories for the vehicle to navigate through the environment based on the candidate values.
US2021/0253128A1 (Nister et al.) Discloses technology that selects a preferred trajectory for an autonomous vehicle based on an evaluation of multiple hypothetical trajectories by different components within a planning system. The various components provide an optimization score for each trajectory according to the priorities of the component and scores from multiple components may form a final optimization score. 
US2020/0269871A1 (Schmidt et al.) discloses a method for determining a driving maneuver including obtaining vehicle parameters from a driving maneuver planning module of the vehicle. Schmidt also discloses determining at least one possible driving maneuver via the driving maneuver planning module based on the vehicle parameters received by means of at least one decision-making submodule of the driving maneuver planning module.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALYSSA N RORIE whose telephone number is (571)272-6962. The examiner can normally be reached Monday - Friday (out of office every other Friday) 7:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jelani Smith, can be reached at 571-270-3969. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.R./Examiner, Art Unit 3662        

/JELANI A SMITH/Supervisory Patent Examiner, Art Unit 3662