DETAILED ACTION
Remarks
This Non-Final Office action is in response to the application filed on 07/28/2020. Claims 1-21 are pending and examined below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
As of the date of this action, the IDS filed has been annotated and considered.
Claim Objections
Claims 7 and 16 are objected to because of the following informalities:
Claim 7, line 1, “the set of data” should be “the set of state data”.
Claim 16, line 1, “the set of data” should be “the set of state data”.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 19-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter.
Regarding claim 19, the claim recites “a computer readable medium” storing instructions, but does not explicitly state that the computer readable medium is "non-transitory". The currently recited language is not permissible under 101 because it may encompass both transitory and non-transitory computer readable media (see, e.g., In re Nuijten, Fed. Cir. Sept. 20, 2007) (slip. Op. at 18) ("A transitory, propagating signal ... is not a process, machine, manufacture, or composition of matter. Thus, such a signal cannot be patentable subject matter.").
In view of the ordinary and customary meaning of computer readable media, a computer readable medium typically covers both non-transitory tangible media and transitory media in the form of propagating signals per se (see MPEP 2111.05).
However, the submitted specification describes the computer readable medium as a "non-transitory computer readable medium"; see [0112] of the PGPUB of the submitted specification. Examiner suggests adding "non-transitory" to the preamble of current claim 19 to overcome the 101 rejection.
Dependent claims 20 and 21 are also rejected because they do not specify that the computer readable medium is non-transitory; they are therefore rejected for the same reasons as stated for the claim from which they depend.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7-9 and 16-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Regarding claim 7 (and similarly claims 9, 16 and 18), the recitation “the candidate trajectory” is unclear and lacks antecedent basis. The recitation is indefinite because no candidate trajectory is previously mentioned in claim 1. It is not clear whether the selected trajectory is selected from multiple candidate trajectories, nor whether the candidate trajectory refers to the selected trajectory.
Dependent claim 8 is also rejected because it does not resolve the deficiencies of its parent claim 7.
Dependent claim 17 is also rejected because it does not resolve the deficiencies of its parent claim 16.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 5-8, 10, 14-17 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 2017/0277194 (“Frazzoli”).
Regarding claim 1, Frazzoli discloses a system for training a motion planner for an autonomous vehicle (see [0047], where “the computer system aboard the self-driving vehicle employs an algorithmic process 28 to automatically generate and execute a trajectory 30 through the environment toward a designated goal 32. We use the term trajectory broadly to include, for example, a path or route from one place to another, e.g., from a pickup location to a drop off location. In some implementations, a trajectory can comprise a sequence of transitions each from one world state to a subsequent world state.”), the system comprising a processing unit configured to execute instructions (see fig 1, where a system for generating control instruction for an autonomous vehicle is shown. see also [0177]) to cause the system to: 
receive, as input to a trajectory evaluator agent of the motion planner, a first set of state data defining a current state of the autonomous vehicle and an environment at a current time step (see [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application.”; see also [0070], where “Quantities expressed as part of the world state include, but are not limited to, statistics on: the current position, velocity, and acceleration of the ego vehicle; estimates of the types, positions, velocities, and current intents of other nearby vehicles, pedestrians, cyclists, scooters, carriages, carts, and other moving objects or obstacles; the positions and types of nearby static obstacles (e.g., poles, signs, curbs, traffic marking cones and barrels, road dividers, trees); and the positions, types and information content of road markings, road signs, and traffic signals. The world state can also include information about the roadway's physical properties, such as the number of vehicular and cyclist travel lanes, lane width, lane traffic direction, lane marker type and location, and the spatial locations of road features such as crosswalks, traffic signs, and traffic signals. The world state 88 contains probabilistic estimates of the states of the ego vehicle and of nearby vehicles, including maximum likelihood estimate, error covariance, and sufficient statistics for the variables of interest.”; computer system 18 is interpreted as trajectory evaluator agent); 
select, based on the current state, a selected trajectory (see [0008], where “A finite set of candidate trajectories of the vehicle is generated that begin at a location of the vehicle as of a given time. The candidate trajectories are based on a state of the vehicle and on possible behaviors of the vehicle and of the environment as of the location of the vehicle and the given time. A putative optimal trajectory is selected from among the candidate trajectories based on costs associated with the candidate trajectories. The costs include costs associated with violations of rules of operation of the vehicle. The selected putative optimal trajectory is used to facilitate the operation related to control of the vehicle.”; see also [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application. The algorithms, among other things, process data provided by the above sources and (in addition to other results discussed below), compute a predicted optimal trajectory 61 that encompasses a safe driving action in a current scenario that can be taken over a short future time horizon (the time horizon can be, for example, on the order of, for example, 2-5 seconds although in some cases the time horizon can be shorter (for example, fractions of seconds) or longer (for example tens of seconds, minutes, or many minutes)”; see also fig 17, where generation of an optimal trajectory is shown. see also [0127], where “As shown in FIG. 14, processes 202 (of the kind discussed earlier with respect to self-driving vehicles) running on the computer 18 generate candidate trajectories 204 (e.g., time-parameterized paths) that the ego vehicle may follow through the environment during the configurable time horizon T.”); 
compute reward for the selected trajectory based on performance of the selected trajectory in the current state (per the submitted specification, the reward of the selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative; see [0065] of the PGPUB of the submitted specification. Frazzoli teaches a system that selects the trajectory by considering the feasibility of the trajectory at the vehicle’s current operating speed (the vehicle’s current state). The feasibility check considers the operating speed of the vehicle, whether a collision occurs on the trajectory, and whether the vehicle is operated in compliance with local rules, etc. The feasibility check is interpreted as computing the reward. See Frazzoli [0049-52], where “The automatically generated trajectory should ideally possess at least the following properties: [0050] 1) It should be feasible, meaning that the trajectory can be followed by the vehicle with a reasonable degree of precision at the vehicle's current or expected operating speed; [0051] 2) It should be collision free, meaning that, were the vehicle to travel along the trajectory, it would not collide with any objects; and [0052] 3) It should obey a predefined set of rules, which may include local rules of operation or rules of the road, common driving practices 17, or the driving preferences 19 of a general class of passenger or a particular passenger or a combination of any two or more of those factors. Together these and possibly other similar factors are sometimes referred to generally as rules of operation (and we sometimes refer to rules of operation as driving rules). When no trajectory exists that obeys all predefined driving rules, the trajectory should minimize the severity and extent of rule violation.”); 
receive a second set of state data defining a next state of the autonomous vehicle and the environment at a next time step (see [0124], where “For driver performance purposes, each of the quantities is calculated at each time step k while the vehicle is in operation. The intervals that separate successive time instants when the quantities are calculated can range from 0.2 to 2 seconds, indicatively.”; see also [0126], where “The future positions 246 of all moving objects (e.g., vehicles, cyclists, pedestrians, etc.) are predicted over a configurable time horizon T (e.g., a period of time from the current time step k to a future time step k+T) using known techniques”; see also [0157], where “With reference to FIG. 13 and the left side of FIG. 15, at each time step k, the system also knows and records the actual position of the ego vehicle and the actual motion characteristics of other vehicles, cyclists, pedestrians, and other obstacles in the environment of the vehicle. Together this information amounts to, among other things, and actual trajectory of the ego vehicle during the time period T.”; see also [0158-163]); and 
update parameters of the trajectory evaluator agent based on the current state, selected trajectory, computed reward and next state, the parameters of the agent of the trajectory evaluator being updated to assign an evaluation value for the selected trajectory that reflects the computed reward and expected performance of the selected trajectory in the future states (per the submitted specification, an evaluation value is assigned to each candidate trajectory. The evaluation value reflects whether the current trajectory successfully achieves the goal of safe, comfortable and speedy driving; see [0056] of the PGPUB of the submitted specification. Per the submitted specification, the reward of a selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path at the current time step. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative; see [0065] of the PGPUB of the submitted specification. The evaluation value is assigned to the selected trajectory based on the overall performance from start to goal, and the reward value is assigned (computed) to the selected trajectory based on the current (state/time) performance. That is, the reward is computed for the selected trajectory at the current state, and the evaluation value is assigned to the selected trajectory for overall performance. Frazzoli teaches a system that selects the trajectory based on the current state of the vehicle and the surrounding environment, and the selected trajectory is also updated based on the states of the vehicle and the surrounding environment. The trajectories are evaluated and ranked based on quality/desirability/cost. The most desirable (highest-ranked quality, minimum-cost path) trajectory is selected from among many candidate trajectories. Thus, a value for desirability/quality is assigned. 
The value of desirability/quality/cost is interpreted as evaluation value. As reward of the selected trajectory is calculated based on current states, so reward, current state and next state also reflect on the evaluation value. see Frazzoli [0024], where “A computational element iteratively updates (a) a set of world states, each of the world states representing a combination of a state of the vehicle, a state of an environment of the vehicle, and a state of at least one other object in the environment based at least in part on the information about world states, and (b) a set of world trajectories, each of the world trajectories representing a temporal transition between one of the world states and another of the world states. Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states. A minimum-cost path is determined through the updated set of world states and the updated set of world trajectories.”; see also [0154], where “an optimal trajectory 250 is identified as one that is deemed most desirable, as determined by analysis of some combination (e.g., a weighted sum) of the quantitative metrics described in a through c. Typically, the candidate trajectory that exhibits the minimum value of the weighted sum of all performance metrics is deemed the optimal trajectory.”; see also [0130-139], where “The candidate ego vehicle trajectories are evaluated and ranked according to their quality or desirability. 
More precisely, each candidate trajectory is evaluated according to a set of performance metrics that may include, but are not limited to, any one or more of the following”; see also [0016], [0060], [0093], [0100] and [0105]).
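For illustration only, the update limitation of claim 1 as interpreted above (current state, selected trajectory, computed reward and next state feeding an update of the evaluator's parameters) can be sketched as a single temporal-difference step. This is a minimal sketch under stated assumptions: the linear evaluator, the feature vectors, the learning rate, the discount factor and the reward shaping are all hypothetical and are not taken from the application or from Frazzoli.

```python
# Hypothetical sketch: one temporal-difference (TD) update of a linear
# trajectory evaluator. All names and constants are illustrative assumptions.

def evaluate(weights, features):
    """Evaluation value of a (state, trajectory) pair under a linear model."""
    return sum(w * f for w, f in zip(weights, features))

def compute_reward(collided, speed, speed_limit):
    """Assumed reward: negative on collision, otherwise higher near the limit."""
    if collided:
        return -1.0
    return max(0.0, 1.0 - abs(speed_limit - speed) / speed_limit)

def td_update(weights, features, reward, next_features, lr=0.1, gamma=0.9):
    """Move the evaluation value toward reward + discounted next-state value."""
    td_error = (reward
                + gamma * evaluate(weights, next_features)
                - evaluate(weights, features))
    return [w + lr * td_error * f for w, f in zip(weights, features)]

# Example: the selected trajectory keeps the lane exactly at the speed limit.
reward = compute_reward(collided=False, speed=15.0, speed_limit=15.0)
new_weights = td_update([0.0, 0.0], [1.0, 0.0], reward, [0.0, 1.0])
```

In this sketch the updated evaluation value reflects both the immediate reward and the discounted value of the next state, which is one possible reading of "expected performance of the selected trajectory in the future states".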
Regarding claim 5, Frazzoli further discloses a system wherein the evaluation value is generated as a set of statistical metrics defining a probability distribution of a probabilistic evaluation value (per the submitted specification, the evaluation value is generated as a set of statistical metrics, e.g., mean, variance, maximum, minimum; see [0100] of the PGPUB of the submitted specification. See Frazzoli [0106], where “As a feature of the steps of the assessment process, the cost of each edge can be influenced by statistical, probabilistic, or worst-case estimates of events such as the ego vehicle colliding with other vehicles or obstacles, the ego vehicle violating a driving rule, or other events relevant to the operation of the vehicle.”; see also [0107], where “given the set of candidate trajectories, the assessment process can quickly find which one is the best according to criteria that are encoded in a cost that can be comprised of several components. The cost can be expressed as an array of the form (10.1, 2, 0), where each component gives the cost incurred for a particular criterion. For example, the first component could be the path length, the second could be the number of lane boundaries to be crossed, and the third could be the number of expected collisions. The costs are compared following a lexicographic ordering in which, for example, the later entries have higher priority than the earlier ones. For example a trajectory with cost (25, 4, 0) is considered preferable to one with cost (10, 2, 1), because the latter will cause a collision, even though it is shorter. A trajectory with cost (12, 0, 0) will be preferable to both. This concept allows the system to systematically compute trajectories that satisfy all driving rules that the vehicle is able to satisfy (allowing for some minimal violation), thus providing predictable and graceful performance degradation instead of, e.g., aborting, when some rule needs to be violated.”; see also [0069]).
Regarding claim 6, Frazzoli further discloses a system wherein the selected trajectory is selected according to a selection criteria based on one or more statistical metrics (per the submitted specification, the trajectory selection criteria may be based on safer motion planning; see [0097] of the PGPUB of the submitted specification. Safer motion is the selection criterion. Frazzoli teaches a system that selects a trajectory based on whether a collision occurs on the route, which corresponds to safer motion. See Frazzoli [0107], where “given the set of candidate trajectories, the assessment process can quickly find which one is the best according to criteria that are encoded in a cost that can be comprised of several components. The cost can be expressed as an array of the form (10.1, 2, 0), where each component gives the cost incurred for a particular criterion. For example, the first component could be the path length, the second could be the number of lane boundaries to be crossed, and the third could be the number of expected collisions. The costs are compared following a lexicographic ordering in which, for example, the later entries have higher priority than the earlier ones. For example a trajectory with cost (25, 4, 0) is considered preferable to one with cost (10, 2, 1), because the latter will cause a collision, even though it is shorter. A trajectory with cost (12, 0, 0) will be preferable to both. This concept allows the system to systematically compute trajectories that satisfy all driving rules that the vehicle is able to satisfy (allowing for some minimal violation), thus providing predictable and graceful performance degradation instead of, e.g., aborting, when some rule needs to be violated.”).
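For illustration only, the lexicographic cost comparison quoted from Frazzoli [0107], in which later entries of the cost array have higher priority than earlier ones, can be sketched as below. The function names are hypothetical; the three cost arrays are the ones given in the quoted passage, with components (path length, lane boundaries crossed, expected collisions).

```python
# Hypothetical sketch of a lexicographic ordering in which later cost
# components dominate earlier ones, as described in the quoted passage.

def priority_key(cost):
    """Reverse the cost array so the last component is compared first."""
    return tuple(reversed(cost))

def rank_trajectories(costs):
    """Return the cost arrays ordered from most to least preferable."""
    return sorted(costs, key=priority_key)

costs = [(25, 4, 0), (10, 2, 1), (12, 0, 0)]
ranked = rank_trajectories(costs)
best = ranked[0]
```

Under this ordering (12, 0, 0) ranks first and (25, 4, 0) ranks ahead of (10, 2, 1), consistent with the preferences stated in the quoted passage.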
Regarding claim 7, as best understood in view of indefiniteness rejection explained above, Frazzoli further discloses a system wherein the set of data defining the candidate trajectory is a set of parameters defining the candidate trajectory according to a trajectory generation function (per submitted specification, the selected trajectory is calculated by using a set of parameters in a predefined trajectory generation function. The set of parameters is determined based on current state data, see [0100] of PGPUB of submitted specification. A set of parameters include initial speed, final speed, initial orientation, trajectory horizon etc. see [0085] of PGPUB of submitted specification. see Frazzoli [0024], where “Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states.”; World states (trajectory generation function) include states of the vehicle/environment of the vehicle (a set of parameters).).
Regarding claim 8, Frazzoli further discloses a system wherein the processing unit is configured to execute instructions to cause the system to: generate the selected trajectory from the set of parameters, according to the trajectory generation function (see [0024], where “Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states.”; candidate trajectory is generated based on a set of world states (trajectory generation function). World states include states of the vehicle/environment of the vehicle (set of parameters).).
Regarding claim 10, Frazzoli further discloses a method for training a motion planner for an autonomous vehicle (see [0047], where “the computer system aboard the self-driving vehicle employs an algorithmic process 28 to automatically generate and execute a trajectory 30 through the environment toward a designated goal 32. We use the term trajectory broadly to include, for example, a path or route from one place to another, e.g., from a pickup location to a drop off location. In some implementations, a trajectory can comprise a sequence of transitions each from one world state to a subsequent world state.”; see also fig 1, where a system for generating control instruction for an autonomous vehicle is shown. see also [0177]), the method comprising: 
receiving, as input to a trajectory evaluator of the motion planner, a first set of state data defining a current state of the autonomous vehicle and an environment at a current time step (see [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application.”; see also [0070], where “Quantities expressed as part of the world state include, but are not limited to, statistics on: the current position, velocity, and acceleration of the ego vehicle; estimates of the types, positions, velocities, and current intents of other nearby vehicles, pedestrians, cyclists, scooters, carriages, carts, and other moving objects or obstacles; the positions and types of nearby static obstacles (e.g., poles, signs, curbs, traffic marking cones and barrels, road dividers, trees); and the positions, types and information content of road markings, road signs, and traffic signals. The world state can also include information about the roadway's physical properties, such as the number of vehicular and cyclist travel lanes, lane width, lane traffic direction, lane marker type and location, and the spatial locations of road features such as crosswalks, traffic signs, and traffic signals. The world state 88 contains probabilistic estimates of the states of the ego vehicle and of nearby vehicles, including maximum likelihood estimate, error covariance, and sufficient statistics for the variables of interest.”; computer system 18 is interpreted as trajectory evaluator agent); 
selecting, based on the current state, a selected trajectory (see [0008], where “A finite set of candidate trajectories of the vehicle is generated that begin at a location of the vehicle as of a given time. The candidate trajectories are based on a state of the vehicle and on possible behaviors of the vehicle and of the environment as of the location of the vehicle and the given time. A putative optimal trajectory is selected from among the candidate trajectories based on costs associated with the candidate trajectories. The costs include costs associated with violations of rules of operation of the vehicle. The selected putative optimal trajectory is used to facilitate the operation related to control of the vehicle.”; see also [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application. The algorithms, among other things, process data provided by the above sources and (in addition to other results discussed below), compute a predicted optimal trajectory 61 that encompasses a safe driving action in a current scenario that can be taken over a short future time horizon (the time horizon can be, for example, on the order of, for example, 2-5 seconds although in some cases the time horizon can be shorter (for example, fractions of seconds) or longer (for example tens of seconds, minutes, or many minutes)”; see also fig 17, where generation of an optimal trajectory is shown. see also [0127], where “As shown in FIG. 14, processes 202 (of the kind discussed earlier with respect to self-driving vehicles) running on the computer 18 generate candidate trajectories 204 (e.g., time-parameterized paths) that the ego vehicle may follow through the environment during the configurable time horizon T.”); 
computing a reward for the selected trajectory based on performance of the selected trajectory in the current state (per the submitted specification, the reward of the selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative; see [0065] of the PGPUB of the submitted specification. Frazzoli teaches a system that selects the trajectory by considering the feasibility of the trajectory at the vehicle’s current operating speed (the vehicle’s current state). The feasibility check considers the operating speed of the vehicle, whether a collision occurs on the trajectory, and whether the vehicle is operated in compliance with local rules, etc. The feasibility check is interpreted as computing the reward. See Frazzoli [0049-52], where “The automatically generated trajectory should ideally possess at least the following properties: [0050] 1) It should be feasible, meaning that the trajectory can be followed by the vehicle with a reasonable degree of precision at the vehicle's current or expected operating speed; [0051] 2) It should be collision free, meaning that, were the vehicle to travel along the trajectory, it would not collide with any objects; and [0052] 3) It should obey a predefined set of rules, which may include local rules of operation or rules of the road, common driving practices 17, or the driving preferences 19 of a general class of passenger or a particular passenger or a combination of any two or more of those factors. Together these and possibly other similar factors are sometimes referred to generally as rules of operation (and we sometimes refer to rules of operation as driving rules). When no trajectory exists that obeys all predefined driving rules, the trajectory should minimize the severity and extent of rule violation.”); 
receiving a second set of state data defining a next state of the autonomous vehicle and the environment at a next time step (see [0124], where “For driver performance purposes, each of the quantities is calculated at each time step k while the vehicle is in operation. The intervals that separate successive time instants when the quantities are calculated can range from 0.2 to 2 seconds, indicatively.”; see also [0126], where “The future positions 246 of all moving objects (e.g., vehicles, cyclists, pedestrians, etc.) are predicted over a configurable time horizon T (e.g., a period of time from the current time step k to a future time step k+T) using known techniques”; see also [0157], where “With reference to FIG. 13 and the left side of FIG. 15, at each time step k, the system also knows and records the actual position of the ego vehicle and the actual motion characteristics of other vehicles, cyclists, pedestrians, and other obstacles in the environment of the vehicle. Together this information amounts to, among other things, and actual trajectory of the ego vehicle during the time period T.”; see also [0158-163]); and 
updating parameters of the trajectory evaluator agent based on the current state, selected trajectory, computed reward and next state, the parameters of the trajectory evaluator agent being updated to assign an evaluation value for the selected trajectory that reflects the computed reward and expected performance of the selected trajectory in the future states (per the submitted specification, an evaluation value is assigned to each candidate trajectory. The evaluation value reflects whether the current trajectory successfully achieves the goal of safe, comfortable and speedy driving; see [0056] of the PGPUB of the submitted specification. Per the submitted specification, the reward of a selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path at the current time step. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative; see [0065] of the PGPUB of the submitted specification. The evaluation value is assigned to the selected trajectory based on the overall performance from start to goal, and the reward value is assigned (computed) to the selected trajectory based on the current (state/time) performance. That is, the reward is computed for the selected trajectory at the current state, and the evaluation value is assigned to the selected trajectory for overall performance. Frazzoli teaches a system that updates the trajectory based on the current state of the vehicle and the surrounding environment. The trajectories are evaluated and ranked based on quality/desirability/cost. The most desirable (highest-ranked quality, minimum-cost path) trajectory is selected from among many candidate trajectories. Thus, a value for desirability/quality is assigned. The value of desirability/quality/cost is interpreted as the evaluation value. 
As reward of the selected trajectory is calculated based on current states, so reward, current state and next state also reflect on the evaluation value. see Frazzoli [0024], where “A computational element iteratively updates (a) a set of world states, each of the world states representing a combination of a state of the vehicle, a state of an environment of the vehicle, and a state of at least one other object in the environment based at least in part on the information about world states, and (b) a set of world trajectories, each of the world trajectories representing a temporal transition between one of the world states and another of the world states. Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states. A minimum-cost path is determined through the updated set of world states and the updated set of world trajectories.”; see also [0154], where “an optimal trajectory 250 is identified as one that is deemed most desirable, as determined by analysis of some combination (e.g., a weighted sum) of the quantitative metrics described in a through c. Typically, the candidate trajectory that exhibits the minimum value of the weighted sum of all performance metrics is deemed the optimal trajectory.”; see also [0130-139], where “The candidate ego vehicle trajectories are evaluated and ranked according to their quality or desirability. 
More precisely, each candidate trajectory is evaluated according to a set of performance metrics that may include, but are not limited to, any one or more of the following”; see also [0016], [0060], [0093], [0100] and [0105]).
Regarding claim 14, Frazzoli further discloses a method wherein the evaluation value is generated as a set of statistical metrics defining a probability distribution of a probabilistic evaluation value (per submitted specification, the evaluation value is generated as a set of statistical metrics, e.g., mean, variance, maximum, minimum, see [0100] of PGPUB of submitted specification. see Frazzoli [0106], where “As a feature of the steps of the assessment process, the cost of each edge can be influenced by statistical, probabilistic, or worst-case estimates of events such as the ego vehicle colliding with other vehicles or obstacles, the ego vehicle violating a driving rule, or other events relevant to the operation of the vehicle.”; see also [0107], where “given the set of candidate trajectories, the assessment process can quickly find which one is the best according to criteria that are encoded in a cost that can be comprised of several components. The cost can be expressed as an array of the form (10.1, 2, 0), where each component gives the cost incurred for a particular criterion. For example, the first component could be the path length, the second could be the number of lane boundaries to be crossed, and the third could be the number of expected collisions. The costs are compared following a lexicographic ordering in which, for example, the later entries have higher priority than the earlier ones. For example a trajectory with cost (25, 4, 0) is considered preferable to one with cost (10, 2, 1), because the latter will cause a collision, even though it is shorter. A trajectory with cost (12, 0, 0) will be preferable to both. This concept allows the system to systematically compute trajectories that satisfy all driving rules that the vehicle is able to satisfy (allowing for some minimal violation), thus providing predictable and graceful performance degradation instead of, e.g., aborting, when some rule needs to be violated.”; see also [0069]).
Regarding claim 15, Frazzoli further discloses a method wherein the selected trajectory is selected according to a selection criteria based on one or more statistical metrics (per submitted specification, the trajectory selection criteria may be based on safer motion planning, see [0097] of PGPUB of submitted specification. Safer motion is the selection criterion. Frazzoli teaches a system that selects a trajectory based on whether a collision would occur along the route, which corresponds to safer motion. see Frazzoli [0107], where “given the set of candidate trajectories, the assessment process can quickly find which one is the best according to criteria that are encoded in a cost that can be comprised of several components. The cost can be expressed as an array of the form (10.1, 2, 0), where each component gives the cost incurred for a particular criterion. For example, the first component could be the path length, the second could be the number of lane boundaries to be crossed, and the third could be the number of expected collisions. The costs are compared following a lexicographic ordering in which, for example, the later entries have higher priority than the earlier ones. For example a trajectory with cost (25, 4, 0) is considered preferable to one with cost (10, 2, 1), because the latter will cause a collision, even though it is shorter. A trajectory with cost (12, 0, 0) will be preferable to both. This concept allows the system to systematically compute trajectories that satisfy all driving rules that the vehicle is able to satisfy (allowing for some minimal violation), thus providing predictable and graceful performance degradation instead of, e.g., aborting, when some rule needs to be violated.”).
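For illustration only, the lexicographic cost comparison quoted from Frazzoli [0107] can be sketched as follows. This is a minimal sketch and not Frazzoli's implementation: the function names `priority_key`, `rank_by_cost` and `best_cost` are hypothetical, and the cost layout (path length, lane boundaries crossed, expected collisions) is taken from the quoted example, with later entries given higher priority.

```python
from typing import List, Tuple

# Assumed cost layout from the quoted example:
# (path length, lane boundaries crossed, expected collisions)
Cost = Tuple[float, ...]


def priority_key(cost: Cost) -> Cost:
    """Reverse the cost tuple so that later entries are compared first."""
    return tuple(reversed(cost))


def rank_by_cost(costs: List[Cost]) -> List[Cost]:
    """Return the costs ordered from most to least preferable."""
    return sorted(costs, key=priority_key)


def best_cost(costs: List[Cost]) -> Cost:
    """Return the most preferable cost under the lexicographic ordering."""
    return min(costs, key=priority_key)


if __name__ == "__main__":
    candidates = [(25, 4, 0), (10, 2, 1), (12, 0, 0)]
    print(rank_by_cost(candidates))
```

Under this ordering, the cost (12, 0, 0) ranks first, (25, 4, 0) second, and (10, 2, 1) last, which agrees with the preferences stated in the quoted passage.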
Regarding claim 16, as best understood in view of indefiniteness rejection explained above, Frazzoli further discloses a method wherein the set of data defining the candidate trajectory is a set of parameters defining the candidate trajectory according to a trajectory generation function (per submitted specification, the selected trajectory is calculated by using a set of parameters in a predefined trajectory generation function. The set of parameters is determined based on current state data, see [0100] of PGPUB of submitted specification. A set of parameters include initial speed, final speed, initial orientation, trajectory horizon etc. see [0085] of PGPUB of submitted specification. see Frazzoli [0024], where “Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states.”; World states (trajectory generation function) include states of the vehicle/environment of the vehicle (a set of parameters).).
Regarding claim 17, Frazzoli further discloses a method comprising: generating the selected trajectory from the set of parameters, according to the trajectory generation function (see [0024], where “Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states.”; candidate trajectory is generated based on a set of world states (trajectory generation function). World states include states of the vehicle/environment of the vehicle (set of parameters).).
Regarding claim 19, Frazzoli further discloses a computer-readable medium storing instructions for execution by a processing unit of a system for training a motion planner for an autonomous vehicle (see [0047], where “the computer system aboard the self-driving vehicle employs an algorithmic process 28 to automatically generate and execute a trajectory 30 through the environment toward a designated goal 32. We use the term trajectory broadly to include, for example, a path or route from one place to another, e.g., from a pickup location to a drop off location. In some implementations, a trajectory can comprise a sequence of transitions each from one world state to a subsequent world state.”; see also fig 1, where a system for generating control instruction for an autonomous vehicle is shown. see also [0177]), the instructions when executed causing the system to: 
receive, as input to a trajectory evaluator of the motion planner, a first set of state data defining a current state of the autonomous vehicle and an environment at a current time step (see [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application.”; see also [0070], where “Quantities expressed as part of the world state include, but are not limited to, statistics on: the current position, velocity, and acceleration of the ego vehicle; estimates of the types, positions, velocities, and current intents of other nearby vehicles, pedestrians, cyclists, scooters, carriages, carts, and other moving objects or obstacles; the positions and types of nearby static obstacles (e.g., poles, signs, curbs, traffic marking cones and barrels, road dividers, trees); and the positions, types and information content of road markings, road signs, and traffic signals. The world state can also include information about the roadway's physical properties, such as the number of vehicular and cyclist travel lanes, lane width, lane traffic direction, lane marker type and location, and the spatial locations of road features such as crosswalks, traffic signs, and traffic signals. The world state 88 contains probabilistic estimates of the states of the ego vehicle and of nearby vehicles, including maximum likelihood estimate, error covariance, and sufficient statistics for the variables of interest.”; computer system 18 is interpreted as trajectory evaluator agent); 
select, based on the current state, a selected trajectory (see [0008], where “A finite set of candidate trajectories of the vehicle is generated that begin at a location of the vehicle as of a given time. The candidate trajectories are based on a state of the vehicle and on possible behaviors of the vehicle and of the environment as of the location of the vehicle and the given time. A putative optimal trajectory is selected from among the candidate trajectories based on costs associated with the candidate trajectories. The costs include costs associated with violations of rules of operation of the vehicle. The selected putative optimal trajectory is used to facilitate the operation related to control of the vehicle.”; see also [0063], where “A computer system 18 (data processor) located on the vehicle that is capable of executing algorithms 69. e.g., as described in this application. The algorithms, among other things, process data provided by the above sources and (in addition to other results discussed below), compute a predicted optimal trajectory 61 that encompasses a safe driving action in a current scenario that can be taken over a short future time horizon (the time horizon can be, for example, on the order of, for example, 2-5 seconds although in some cases the time horizon can be shorter (for example, fractions of seconds) or longer (for example tens of seconds, minutes, or many minutes)”; see also fig 17, where generation of an optimal trajectory is shown. see also [0127], where “As shown in FIG. 14, processes 202 (of the kind discussed earlier with respect to self-driving vehicles) running on the computer 18 generate candidate trajectories 204 (e.g., time-parameterized paths) that the ego vehicle may follow through the environment during the configurable time horizon T.”); 
compute a reward for the selected trajectory based on performance of the selected trajectory in the current state (per submitted specification, the reward of the selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative, see [0065] of PGPUB of submitted specification. Frazzoli teaches a system that selects the trajectory by considering the feasibility of the trajectory at the vehicle’s current operating speed (the vehicle’s current state). The feasibility check considers the operating speed of the vehicle, whether a collision would occur on the trajectory, and whether the vehicle would be operated in compliance with local rules. The feasibility check is interpreted as computing the reward. see Frazzoli [0049-52], where “The automatically generated trajectory should ideally possess at least the following properties: [0050] 1) It should be feasible, meaning that the trajectory can be followed by the vehicle with a reasonable degree of precision at the vehicle's current or expected operating speed; [0051] 2) It should be collision free, meaning that, were the vehicle to travel along the trajectory, it would not collide with any objects; and [0052] 3) It should obey a predefined set of rules, which may include local rules of operation or rules of the road, common driving practices 17, or the driving preferences 19 of a general class of passenger or a particular passenger or a combination of any two or more of those factors. Together these and possibly other similar factors are sometimes referred to generally as rules of operation (and we sometimes refer to rules of operation as driving rules). When no trajectory exists that obeys all predefined driving rules, the trajectory should minimize the severity and extent of rule violation.”); 
receive a second set of state data defining a next state of the autonomous vehicle and the environment at a next time step (see [0124], where “For driver performance purposes, each of the quantities is calculated at each time step k while the vehicle is in operation. The intervals that separate successive time instants when the quantities are calculated can range from 0.2 to 2 seconds, indicatively.”; see also [0126], where “The future positions 246 of all moving objects (e.g., vehicles, cyclists, pedestrians, etc.) are predicted over a configurable time horizon T (e.g., a period of time from the current time step k to a future time step k+T) using known techniques”; see also [0157], where “With reference to FIG. 13 and the left side of FIG. 15, at each time step k, the system also knows and records the actual position of the ego vehicle and the actual motion characteristics of other vehicles, cyclists, pedestrians, and other obstacles in the environment of the vehicle. Together this information amounts to, among other things, and actual trajectory of the ego vehicle during the time period T.”; see also [0158-163]); and 
update parameters of the trajectory evaluator based on the current state, selected trajectory, computed reward and next state, the parameters of the trajectory evaluator being updated to assign an evaluation value for the selected trajectory that reflects the calculated reward and expected performance of the selected trajectory in the future states (per submitted specification, an evaluation value is assigned to each candidate trajectory. The evaluation value reflects whether the current trajectory successfully achieves the goal of safe, comfortable and speedy driving, see [0056] of PGPUB of submitted specification. Per submitted specification, the reward of a selected trajectory is calculated based on closeness to a desired goal of a safe, comfortable and fast path at the current time step. If the selected trajectory follows a lane at a speed close to the speed limit of the lane, then the reward is positive. However, if the selected trajectory gets the vehicle into an accident, then the reward is negative, see [0065] of PGPUB of submitted specification. The evaluation value is assigned to the selected trajectory based on the overall performance from start to goal, and the reward value is assigned (computed) to the selected trajectory based on the current (state/time) performance. That is, the reward is computed for the selected trajectory at the current state, and the evaluation value is assigned to the selected trajectory for overall performance. Frazzoli teaches a system that updates the trajectory based on the current state of the vehicle and the surrounding environment. The trajectories are evaluated and ranked based on quality/desirability/cost. The most desirable (highest ranked quality, minimum cost path) trajectory is selected out of many candidate trajectories. So, a value for desirability/quality is assigned. The value of desirability/quality/cost is interpreted as the evaluation value. 
As reward of the selected trajectory is calculated based on current states, so reward, current state and next state also reflect on the evaluation value. see Frazzoli [0024], where “A computational element iteratively updates (a) a set of world states, each of the world states representing a combination of a state of the vehicle, a state of an environment of the vehicle, and a state of at least one other object in the environment based at least in part on the information about world states, and (b) a set of world trajectories, each of the world trajectories representing a temporal transition between one of the world states and another of the world states. Each of the iterations of the updating includes, for each of one or more of the world states and for a corresponding vehicle control policy, simulating a candidate trajectory from the world state to a subsequent world state. If the simulated candidate trajectory does not violate a constraint, the trajectory is added to the set of world trajectories to form an updated set of world trajectories. If necessary, a new world state is added to the set of world states corresponding to the transition represented by the simulated candidate trajectory to form an updated set of world states. A minimum-cost path is determined through the updated set of world states and the updated set of world trajectories.”; see also [0154], where “an optimal trajectory 250 is identified as one that is deemed most desirable, as determined by analysis of some combination (e.g., a weighted sum) of the quantitative metrics described in a through c. Typically, the candidate trajectory that exhibits the minimum value of the weighted sum of all performance metrics is deemed the optimal trajectory.”; see also [0130-139], where “The candidate ego vehicle trajectories are evaluated and ranked according to their quality or desirability. 
More precisely, each candidate trajectory is evaluated according to a set of performance metrics that may include, but are not limited to, any one or more of the following”; see also [0016], [0060], [0093], [0100] and [0105]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 2017/0277194 (“Frazzoli”), as applied to claims 1 and 10 above, and further in view of US 2021/0096241 (“Bongio Karrman”). 
Regarding claim 2, Frazzoli further discloses a system wherein the first set of state data and the second set of state data each independently (see [0009], where “Each of the trajectories represents a temporal transition from the state of the vehicle at the given time to a state of the vehicle at a later time. For each of a succession of times after the given time, a subsequent finite set of candidate trajectories of the vehicle is generated that began at a location of the vehicle as of the succeeding time. The candidate trajectories of the subsequent finite set are based on a state of the vehicle and on possible behaviors of the vehicle and of the environment as of the location of the vehicle at the succeeding time.”; see also [0024], where “A computational element iteratively updates (a) a set of world states, each of the world states representing a combination of a state of the vehicle, a state of an environment of the vehicle, and a state of at least one other object in the environment based at least in part on the information about world states”) includes 
Frazzoli does not disclose the following limitation: 
wherein the state data encoded in the form of 2D images.
However, Bongio Karrman discloses a system wherein the state data encoded in the form of 2D images (see [0010], where “the radar-based perception system may utilize a top-down or two-dimensional machine learned radar perception update process. For instance, the radar-based perception system may receive radar-based point cloud data from one or more sensors positioned on the vehicle and convert the raw radar-based point cloud data into object level representations that may be processed or utilized by a planning and/or prediction system of the vehicle in making operational decisions for the vehicle. In one specific example, the radar-based perception system may convert the radar-based point cloud data (which may represent at least three dimensions) into a point cloud representation (also referred to generally as a discretized data representation) usable for feature extraction and/or instance detection. The discretized data representation may represent three-dimensional data in a two-dimensional manner. In some examples, the three-dimensional data can be associated with a discretized region of an environment (e.g., a portion of a grid) whereby the three-dimensional data can be collapsed or otherwise represented in a two-dimensional manner. In some examples, such a two-dimensional representation may be referred to as a top-down representation. In some examples, a top-down or two-dimensional discretized data representations may store detections represented in the radar-based point cloud data as vectors, pillars, or collections. In some cases, a top-down representation may include a two-dimensional “image” of an environment, whereby each pixel of the image may represent a grid location (or other discretized region) that has a fixed size, while in other cases, the grid locations or discretized regions may be associated with variable number of points.”).
Both Frazzoli and Bongio Karrman are in the same field of endeavor, namely autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Bongio Karrman by including the above feature, wherein the state data is encoded in the form of 2D images, in order to provide faster guidance by reducing computational complexity during autonomous travel.
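For illustration only, the top-down representation described in the quoted passage of Bongio Karrman [0010], in which three-dimensional points are collapsed onto a two-dimensional grid whose pixels correspond to fixed-size grid locations, can be sketched as follows. This is a minimal sketch under stated assumptions and not Bongio Karrman's implementation: the function name `rasterize_top_down`, its parameters, and the choice of storing a per-cell point count are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def rasterize_top_down(
    points: Sequence[Point3D],
    x_min: float,
    y_min: float,
    cell_size: float,
    width: int,
    height: int,
) -> List[List[int]]:
    """Collapse 3D points onto a 2D grid by dropping the height coordinate.

    Each pixel holds the number of points falling into its fixed-size cell
    (an assumed encoding; other per-cell features are equally possible).
    """
    image = [[0] * width for _ in range(height)]
    for x, y, _z in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points outside the discretized region are ignored.
        if 0 <= row < height and 0 <= col < width:
            image[row][col] += 1
    return image


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 1.0), (0.6, 0.4, 2.0), (1.5, 0.5, 0.0), (5.0, 5.0, 0.0)]
    print(rasterize_top_down(cloud, 0.0, 0.0, 1.0, 2, 2))
```

In this sketch the two points near (0.5, 0.5) share one pixel regardless of their height, and the point at (5.0, 5.0) lies outside the grid and is dropped.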
Regarding claim 11, Frazzoli further discloses a method wherein the first set of state data and the second set of state data each independently (see [0009], where “Each of the trajectories represents a temporal transition from the state of the vehicle at the given time to a state of the vehicle at a later time. For each of a succession of times after the given time, a subsequent finite set of candidate trajectories of the vehicle is generated that began at a location of the vehicle as of the succeeding time. The candidate trajectories of the subsequent finite set are based on a state of the vehicle and on possible behaviors of the vehicle and of the environment as of the location of the vehicle at the succeeding time.”; see also [0024], where “A computational element iteratively updates (a) a set of world states, each of the world states representing a combination of a state of the vehicle, a state of an environment of the vehicle, and a state of at least one other object in the environment based at least in part on the information about world states”) includes 
Frazzoli does not disclose the following limitation: 
wherein the state data encoded in the form of 2D images.
However, Bongio Karrman further discloses a method wherein the state data encoded in the form of 2D images (see [0010], where “the radar-based perception system may utilize a top-down or two-dimensional machine learned radar perception update process. For instance, the radar-based perception system may receive radar-based point cloud data from one or more sensors positioned on the vehicle and convert the raw radar-based point cloud data into object level representations that may be processed or utilized by a planning and/or prediction system of the vehicle in making operational decisions for the vehicle. In one specific example, the radar-based perception system may convert the radar-based point cloud data (which may represent at least three dimensions) into a point cloud representation (also referred to generally as a discretized data representation) usable for feature extraction and/or instance detection. The discretized data representation may represent three-dimensional data in a two-dimensional manner. In some examples, the three-dimensional data can be associated with a discretized region of an environment (e.g., a portion of a grid) whereby the three-dimensional data can be collapsed or otherwise represented in a two-dimensional manner. In some examples, such a two-dimensional representation may be referred to as a top-down representation. In some examples, a top-down or two-dimensional discretized data representations may store detections represented in the radar-based point cloud data as vectors, pillars, or collections. In some cases, a top-down representation may include a two-dimensional “image” of an environment, whereby each pixel of the image may represent a grid location (or other discretized region) that has a fixed size, while in other cases, the grid locations or discretized regions may be associated with variable number of points.”).
Both Frazzoli and Bongio Karrman are in the same field of endeavor, namely autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Bongio Karrman by including the above feature, wherein the state data is encoded in the form of 2D images, in order to provide faster guidance by reducing computational complexity during autonomous travel.

Claim(s) 3, 12 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 2017/0277194 (“Frazzoli”), as applied to claims 1, 10 and 19 above, and further in view of US 2022/0196414 (“Wang”). 
Regarding claim 3, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step, and a discount factor (see [0066-67] of PGPUB of submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator agent are updated according to the equation: 
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, \tau_{t+1})$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the calculated reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $V(s_{t+1}, \tau_{t+1})$ is an evaluation value for the selected trajectory at the next time step and the next state, and $\gamma$ is a discount factor.
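For illustration only, the update rule recited in the claim can be sketched in tabular form as follows. This is a minimal sketch under stated assumptions and is not taken from the submitted specification or from any cited reference: the function name `td_update`, the dictionary-based value table keyed by (state, trajectory) pairs, and the string labels in the usage lines are all hypothetical. The claim recites updating parameters of a trajectory evaluator, which need not be a lookup table.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

ValueTable = DefaultDict[Tuple[Hashable, Hashable], float]


def td_update(
    values: ValueTable,
    state: Hashable,
    trajectory: Hashable,
    reward: float,
    next_state: Hashable,
    next_trajectory: Hashable,
    gamma: float,
) -> float:
    """Assign V(s_t, tau_t) <- r_t + gamma * V(s_{t+1}, tau_{t+1})."""
    target = reward + gamma * values[(next_state, next_trajectory)]
    values[(state, trajectory)] = target
    return target


if __name__ == "__main__":
    # Unseen (state, trajectory) pairs default to an evaluation value of 0.0.
    table: ValueTable = defaultdict(float)
    table[("s1", "keep_lane")] = 2.0
    print(td_update(table, "s0", "keep_lane", 1.0, "s1", "keep_lane", 0.5))
```

With a reward of 1.0, a discount factor of 0.5 and a next-step evaluation value of 2.0, the updated evaluation value is 1.0 + 0.5 × 2.0 = 2.0.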
Frazzoli further discloses a system wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a system wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a system wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see citations above for claim 1). 
Frazzoli does not disclose a system that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang discloses a system wherein a discount factor is included for calculating a reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning system for an unmanned vehicle that includes a state of the vehicle and the surrounding environment, a reward and a discount factor, see [0032]. The discount factor/coefficient is determined based on the quality of the planned path, e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discounted reward is given. The discount factor is used to adjust the output of the value function, see [0043-50]. So, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang thus discloses a system that characterizes the planned path, or evaluates the selected trajectory, by applying a discount factor.
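For illustration only, the distance-based reward scheme quoted from Wang [0043] and the attenuation role of the discount factor γ quoted from Wang [0099] can be sketched as follows. This is a minimal sketch under stated assumptions and not Wang's implementation: the function names `distance_reward` and `discounted_return` are hypothetical, and the linear form of the discount coefficient (1 − distance/threshold) is an assumption, since the quoted passage states only that the coefficient is set based on the distance.

```python
from typing import Sequence


def distance_reward(distance: float, threshold: float, max_reward: float) -> float:
    """Reward by distance to the target, following the quoted three cases."""
    if distance <= 0.0:
        # Target reached: the positive maximum reward is given.
        return max_reward
    if distance < threshold:
        # ASSUMPTION: linear discount coefficient; the source gives no formula.
        coefficient = 1.0 - distance / threshold
        return coefficient * max_reward
    # At or beyond the threshold: no reward is given.
    return 0.0


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Accumulate rewards over multiple actions, attenuated by gamma per step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(distance_reward(5.0, 10.0, 100.0))
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

The second function shows how γ weights later rewards less than earlier ones, which is the same role γ plays in the evaluation value update recited in claim 3.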
Nevertheless, applying a mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it applies known mathematical means for deriving/updating an evaluation value, as shown by Wang. Since the applicant has not shown that use of the claimed formula provides novel or unexpected results, use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli and Wang are both directed to autonomous vehicle navigation systems, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Because Frazzoli and Wang in the same field of endeavor of autonomous vehicle navigation system. Thus, before the effective filling date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Wang by including the above feature of introducing discount factor for characterizing evaluation value for avoiding collision by providing updated trajectory.
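For illustration only, the three reward cases quoted from Wang at [0043] can be sketched as follows. This is a minimal sketch, not Wang's implementation: the function name `distance_reward`, the tolerance `reached_tol`, and in particular the linear form of the discount coefficient are assumptions, since the quoted passage states only that the coefficient is "set based on the distance".

```python
def distance_reward(distance, threshold, max_reward=1.0, reached_tol=1e-6):
    """Hypothetical reward following the three cases quoted from Wang [0043]."""
    # Case 1: the vehicle has reached the target position -> maximum reward.
    if distance <= reached_tol:
        return max_reward
    # Case 2: within the distance threshold -> discounted reward.
    # ASSUMPTION: a linear coefficient in (0, 1); the quoted text does not
    # give the actual form of the discount coefficient.
    if distance < threshold:
        coeff = 1.0 - distance / threshold
        return coeff * max_reward
    # Case 3: beyond the distance threshold -> no reward.
    return 0.0


at_target = distance_reward(0.0, threshold=10.0)
halfway = distance_reward(5.0, threshold=10.0)
far_away = distance_reward(20.0, threshold=10.0)
```

Under these assumptions the reward shrinks as the vehicle gets farther from the target and vanishes past the threshold, which is the behavior the quoted passage describes.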
Regarding claim 12, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step, and a discount factor (see [0066-67] of the PGPUB of the submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator agent are updated according to the equation:
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, \tau_{t+1})$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the calculated reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $V(s_{t+1}, \tau_{t+1})$ is an evaluation value for the selected trajectory at the next time step and the next state, and $\gamma$ is a discount factor.
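For illustration only, the update recited in claim 12 can be sketched as a single assignment over a lookup table. This is a minimal sketch under stated assumptions: the tabular store `V`, the function name `update_value`, and the state and trajectory labels are hypothetical, and the application may instead update the parameters of a learned function approximator.

```python
from collections import defaultdict


def update_value(V, s_t, tau_t, r_t, s_next, tau_next, gamma=0.9):
    """Set V(s_t, tau_t) to r_t + gamma * V(s_next, tau_next)."""
    # ASSUMPTION: V is a table keyed by (state, trajectory); unseen pairs
    # default to 0.0.
    V[(s_t, tau_t)] = r_t + gamma * V[(s_next, tau_next)]
    return V[(s_t, tau_t)]


V = defaultdict(float)
V[("s1", "keep_lane")] = 2.0  # evaluation value already held for the next step

# Reward of 1.0 at the current step, discount factor 0.5:
# new value = 1.0 + 0.5 * 2.0
new_value = update_value(V, "s0", "keep_lane", 1.0, "s1", "keep_lane", gamma=0.5)
```

The discount factor scales how much the next-step evaluation value contributes to the current one; with `gamma=0` only the immediate reward would remain.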
Frazzoli further discloses a method wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include the states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a method wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a method wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see the citation above for claim 10).
Frazzoli does not disclose a method that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang further discloses a method wherein a discount factor is included for calculating the reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning method for an unmanned vehicle that includes the state of the vehicle and the surrounding environment, a reward, and a discount factor (see [0032]). The discount factor/coefficient is determined based on the quality of the planned path; e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discount reward is given. The discount factor is used to adjust the output of the value function (see [0043-50]). Thus, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang discloses a system that characterizes the planned path, or evaluates the selected trajectory, by adding a discount factor.
Nevertheless, applying any mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it employs known mathematical means for deriving/updating an evaluation value, as shown by Wang. Since the invention does not provide novel or unexpected results from the use of the claimed formula, use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli and Wang are directed to autonomous vehicle navigation systems, it would have been obvious for a person with ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Frazzoli and Wang are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Wang by including the above feature of introducing a discount factor for characterizing the evaluation value, in order to avoid collisions by providing an updated trajectory.
Regarding claim 20, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step, and a discount factor (see [0066-67] of the PGPUB of the submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator agent are updated according to the equation:
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, \tau_{t+1})$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the calculated reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $V(s_{t+1}, \tau_{t+1})$ is an evaluation value for the selected trajectory at the next time step and the next state, and $\gamma$ is a discount factor.
Frazzoli further discloses a system wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include the states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a system wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a system wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see the citation above for claim 19).
Frazzoli does not disclose a system that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang discloses a system wherein a discount factor is included for calculating the reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning system for an unmanned vehicle that includes the state of the vehicle and the surrounding environment, a reward, and a discount factor (see [0032]). The discount factor/coefficient is determined based on the quality of the planned path; e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discount reward is given. The discount factor is used to adjust the output of the value function (see [0043-50]). Thus, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang discloses a system that characterizes the planned path, or evaluates the selected trajectory, by adding a discount factor.
Nevertheless, applying any mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it employs known mathematical means for deriving/updating an evaluation value, as shown by Wang. Since the invention does not provide novel or unexpected results from the use of the claimed formula, use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli and Wang are directed to autonomous vehicle navigation systems, it would have been obvious for a person with ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Frazzoli and Wang are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Wang by including the above feature of introducing a discount factor for characterizing the evaluation value, in order to avoid collisions by providing an updated trajectory.

Claim(s) 4, 13 and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 2017/0277194 (“Frazzoli”), as applied to claims 1, 10 and 19 above, in view of US 2021/0380099 (“Lee”), and further in view of US 2022/0196414 (“Wang”).
Regarding claim 4, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step with a different trajectory, and a discount factor (see [0093] of the PGPUB of the submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator are updated according to the equation:
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, TS(s_{t+1}))$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the computed reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $TS(s_{t+1})$ is a next selected trajectory at the next time step, $V(s_{t+1}, TS(s_{t+1}))$ is an evaluation value for the next selected trajectory and the next state, and $\gamma$ is a discount factor.
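For illustration only, the update recited in claim 4 differs from a fixed-trajectory update in that the next trajectory is obtained from a selector TS applied to the next state. The sketch below makes several assumptions that are not in the claim: the tabular store `V`, the names `update_with_selector` and `greedy_selector`, the candidate list, and the greedy selection rule itself, since the claim says only that TS(st+1) is a next selected trajectory.

```python
from collections import defaultdict


def update_with_selector(V, select_trajectory, s_t, tau_t, r_t, s_next, gamma=0.9):
    """Set V(s_t, tau_t) to r_t + gamma * V(s_next, TS(s_next))."""
    # TS(s_next): the trajectory chosen for the next state.
    tau_next = select_trajectory(s_next)
    V[(s_t, tau_t)] = r_t + gamma * V[(s_next, tau_next)]
    return tau_next, V[(s_t, tau_t)]


# Hypothetical candidates and table values, for demonstration only.
CANDIDATES = ("keep_lane", "change_left")
V = defaultdict(float)
V[("s1", "keep_lane")] = 1.0
V[("s1", "change_left")] = 4.0


def greedy_selector(state):
    # ASSUMPTION: TS picks the candidate with the highest evaluation value.
    return max(CANDIDATES, key=lambda tau: V[(state, tau)])


# The current trajectory is "keep_lane", but the selector chooses a
# different trajectory at the next state: 1.0 + 0.5 * 4.0.
tau_next, new_value = update_with_selector(
    V, greedy_selector, "s0", "keep_lane", 1.0, "s1", gamma=0.5
)
```

Passing the selector as a callable keeps the update rule independent of how the next trajectory is chosen, which matches the claim's silence on the selection rule.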
Frazzoli further discloses a system wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include the states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a system wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a system wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see the citation above for claim 1).
Frazzoli does not disclose a system that includes an evaluation value at the next (future) time step with a different trajectory and a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Lee discloses a system wherein an updated path is generated based on an initial path and a constraint on the initial path (e.g., obstacles on the path); see [0066-69] and Fig. 6. Lee discloses a system wherein the selected path is updated in case any uncertainty arises on the path; see [0044-45].
Frazzoli and Lee are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Lee by including the above feature in order to provide a collision-free path/trajectory.
Frazzoli in view of Lee does not disclose a system that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang further discloses a system wherein a discount factor is included for calculating the reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning system for an unmanned vehicle that includes the state of the vehicle and the surrounding environment, a reward, and a discount factor (see [0032]). The discount factor/coefficient is determined based on the quality of the planned path; e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discount reward is given. The discount factor is used to adjust the output of the value function (see [0043-50]). Thus, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang discloses a system that characterizes the planned path, or evaluates the selected trajectory, by adding a discount factor.
Nevertheless, applying any mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it employs known mathematical means for deriving/updating an evaluation value, as shown by Wang. Since the invention does not provide novel or unexpected results from the use of the claimed formula, use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli, Lee and Wang are directed to autonomous vehicle navigation systems, it would have been obvious for a person with ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Frazzoli, Lee and Wang are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli in view of Lee to incorporate the teachings of Wang by including the above feature of introducing a discount factor for characterizing the evaluation value, in order to avoid collisions by providing an updated trajectory.
Regarding claim 13, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step with a different trajectory, and a discount factor (see [0093] of the PGPUB of the submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator are updated according to the equation:
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, TS(s_{t+1}))$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the computed reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $TS(s_{t+1})$ is a next selected trajectory at the next time step, $V(s_{t+1}, TS(s_{t+1}))$ is an evaluation value for the next selected trajectory and the next state, and $\gamma$ is a discount factor.
Frazzoli further discloses a method wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include the states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a method wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a method wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see the citation above for claim 10).
Frazzoli does not disclose a method that includes an evaluation value at the next (future) time step with a different trajectory and a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Lee further discloses a method wherein an updated path is generated based on an initial path and a constraint on the initial path (e.g., obstacles on the path); see [0066-69] and Fig. 6. Lee also discloses a method wherein the selected path is updated in case any uncertainty arises on the path; see [0044-45].
Frazzoli and Lee are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Lee by including the above feature in order to provide a collision-free path/trajectory.
Frazzoli in view of Lee does not disclose a method that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang further discloses a method wherein a discount factor is included for calculating the reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning method for an unmanned vehicle that includes the state of the vehicle and the surrounding environment, a reward, and a discount factor (see [0032]). The discount factor/coefficient is determined based on the quality of the planned path; e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discount reward is given. The discount factor is used to adjust the output of the value function (see [0043-50]). Thus, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang discloses a system that characterizes the planned path, or evaluates the selected trajectory, by adding a discount factor.
Nevertheless, applying any mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it employs known mathematical means for deriving/updating an evaluation value, as shown by Wang. Since the invention does not provide novel or unexpected results from the use of the claimed formula, use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli, Lee and Wang are directed to autonomous vehicle navigation systems, it would have been obvious for a person with ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Frazzoli, Lee and Wang are in the same field of endeavor, autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli in view of Lee to incorporate the teachings of Wang by including the above feature of introducing a discount factor for characterizing the evaluation value, in order to avoid collisions by providing an updated trajectory.
Regarding claim 21, the claim limitation is interpreted to mean that the evaluation value of the trajectory at the current state is updated based on the reward value of the current state, the evaluation value at the next (future) time step with a different trajectory, and a discount factor (see [0093] of the PGPUB of the submitted specification). The claim limitation is reproduced below:
wherein the parameters of the trajectory evaluator are updated according to the equation:
$$V(s_t, \tau_t) \leftarrow r_t + \gamma V(s_{t+1}, TS(s_{t+1}))$$
where $s_t$ is the current state at the current time step $t$, $\tau_t$ is the selected trajectory, $r_t$ is the computed reward, $V(s_t, \tau_t)$ is the evaluation value for the selected trajectory at the current time step and the current state, $t + 1$ is the next time step, $TS(s_{t+1})$ is a next selected trajectory at the next time step, $V(s_{t+1}, TS(s_{t+1}))$ is an evaluation value for the next selected trajectory and the next state, and $\gamma$ is a discount factor.
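For illustration only, the recited update can be sketched as a single tabular assignment. The sketch below is not taken from the application or from any cited reference: the dictionary-based evaluator, the function name `td_update`, the state and trajectory labels, and the hypothetical trajectory selector `select_trajectory` are all assumptions chosen to make the arithmetic of the equation concrete.

```python
from collections import defaultdict


def td_update(V, s_t, tau_t, r_t, s_next, select_trajectory, gamma=0.9):
    """Apply V(s_t, tau_t) <- r_t + gamma * V(s_next, TS(s_next)) in place.

    V is a table keyed by (state, trajectory); this tabular form is an
    assumption made for illustration, not the claimed evaluator.
    """
    tau_next = select_trajectory(s_next)  # TS(s_{t+1}), hypothetical selector
    V[(s_t, tau_t)] = r_t + gamma * V[(s_next, tau_next)]
    return V[(s_t, tau_t)]


def select_trajectory(state):
    # Hypothetical trajectory selector that always proposes the same trajectory.
    return "keep_lane"


# Evaluation table with unseen (state, trajectory) pairs defaulting to 0.0.
V = defaultdict(float)
V[("s1", "keep_lane")] = 10.0

# Reward 1.0 at the current step, discount factor 0.5.
new_value = td_update(V, "s0", "change_lane", 1.0, "s1", select_trajectory, gamma=0.5)
```

With these assumed numbers, the evaluation value for the pair ("s0", "change_lane") becomes 1.0 + 0.5 × 10.0 = 6.0, i.e., the current reward plus the discounted evaluation value of the next selected trajectory at the next state.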
Frazzoli further discloses a system wherein the parameters of the trajectory evaluator agent are updated based on world states. World states include states of the vehicle and the surrounding environment at different time steps (current and future). The states of the vehicle and the surrounding environment at different time steps, e.g., current and future, are the parameters for updating the trajectory. Frazzoli also discloses a system wherein rewards (feasibility) of candidate trajectories at the current state are calculated to pick the optimal trajectory. Frazzoli also discloses a system wherein an evaluation value (overall quality/cost/feasibility) is assigned to the optimal trajectory that reflects the calculated reward of the selected trajectory and the parameters for the current state and future state of the vehicle and the surrounding environment (see citation above for claim 19).
Frazzoli does not disclose a system that includes an evaluation value at the next (future) time step with a different trajectory and a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Lee further discloses a system wherein an updated path is generated based on an initial path and a constraint on the initial path (e.g., obstacles on the path), see [0066-69] and Fig. 6. Lee discloses a system wherein the selected path is updated in case any uncertainty arises on the path, see [0044-45].
Frazzoli and Lee are in the same field of endeavor of autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Lee by including the above feature for providing a collision-free path/trajectory.
Frazzoli in view of Lee does not disclose a system that includes a discount factor for updating the parameters (evaluation value) of the selected trajectory.
However, Wang further discloses a system wherein a discount factor is included for calculating a reward value when a vehicle reaches a target position. The discount factor is calculated based on distance and other factors (see [0032], where “In the application scenario of global path planning for an unmanned vehicle in the present embodiment, the object model includes: a state s, an action a, a state transition model p, a reward r, and a discount factor γ.”; see also [0043], where “Specifically, the first description type is set in the following way: when the unmanned vehicle reaches the target position, a positive maximum reward value is given; a discount coefficient is set based on the distance, a discount reward value is calculated from the discount coefficient and the maximum reward value; when the distance between the unmanned vehicle and the target position is less than the distance threshold, the discount reward value is given; and when the distance between the unmanned vehicle and the target position is greater than the distance threshold, no reward is given.”; see also [0099], where “the discount factor γ is an attenuation factor used when calculating rewards acquired by the unmanned vehicle for performing multiple actions, and is used to adjust an output of the value function.”). Wang discloses a path planning system for an unmanned vehicle that includes the state of the vehicle and the surrounding environment, a reward, and a discount factor, see [0032]. The discount factor/coefficient is determined based on the quality of the planned path, e.g., when the distance between the unmanned vehicle and the target position is less than a threshold distance, a discount reward is given. The discount factor is used to adjust the output of the value function, see [0043-50]. Thus, the discount factor disclosed by Wang adjusts the value based on the distance to the target position. Wang discloses a system that characterizes the planned path, or evaluates the selected trajectory, by adding a discount factor.
Nevertheless, applying any mathematical formula, including that of the claimed invention, would have been an obvious design choice for one of ordinary skill in the art because it relies on known mathematical means for deriving/updating an evaluation value, as shown by Wang. Because the application does not show novel or unexpected results from the use of the claimed formula, the use of any mathematical means, including that of the claimed invention, would be an obvious matter of design choice within the skill of the art. In addition, because Frazzoli, Lee and Wang are all directed to autonomous vehicle navigation systems, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have substituted the claimed evaluation value equation in order to provide a faster route to the destination while maintaining the safety and comfort of the passengers.
Frazzoli, Lee and Wang are in the same field of endeavor of autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli in view of Lee to incorporate the teachings of Wang by including the above feature of introducing a discount factor for characterizing the evaluation value, in order to avoid collisions by providing an updated trajectory.
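For illustration only, the distance-based reward scheme quoted from Wang at [0043] can be sketched as a three-branch function. Wang, as quoted, does not state how the discount coefficient is computed from the distance, so the linear form used below is an assumption, as are the function name `distance_reward`, the parameter names, and the treatment of a distance exactly equal to the threshold.

```python
def distance_reward(distance, threshold, max_reward=1.0):
    """Sketch of a distance-thresholded reward in the style quoted from Wang.

    - target reached (distance <= 0): the maximum reward is given
    - within the threshold: a discounted reward is given
    - otherwise: no reward is given
    """
    if distance <= 0.0:
        return max_reward
    if distance < threshold:
        # ASSUMPTION: linear discount coefficient in (0, 1); the quoted
        # passage only says the coefficient is "set based on the distance".
        coefficient = 1.0 - distance / threshold
        return coefficient * max_reward
    return 0.0


at_target = distance_reward(0.0, threshold=10.0)
near_target = distance_reward(5.0, threshold=10.0)
far_away = distance_reward(20.0, threshold=10.0)
```

Under the assumed linear coefficient, a vehicle halfway inside the threshold receives half of the maximum reward, while a vehicle beyond the threshold receives none.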

Claim(s) 9 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 2017/0277194 (“Frazzoli”), as applied to claims 1 and 10 above, and further in view of US 2019/0243370 (“Li”). 
Regarding claim 9, as best understood in view of the indefiniteness rejection explained above, Frazzoli does not disclose the following limitation: 
wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps.
However, Li discloses a system wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps (per the submitted specification, the candidate trajectory is in the form of a 2D image defining the trajectory as a set of waypoints in the 2D image, see [0077] of the PGPUB of the submitted specification; see Fig. 7C, where waypoints along the route of the vehicle are shown; see also [0060], where “For example, in one embedment, the rough path profile is generated by a cost function consisting of costs based on: a curvature of path and a distance from the reference line and/or reference points to obstacles. Points on the reference line are selected and are moved to the left or right of the reference lines as candidate movements representing path candidates. Each of the candidate movements has an associated cost. The associated costs for candidate movements of one or more points on the reference line can be solved using dynamic programming for an optimal cost sequentially, one point at a time. In one embodiment, SL maps generator 509 generates a station-lateral map as part of the rough path profile. A station-lateral map is a two-dimensional geometric map (similar to an x-y coordinate plane) that includes obstacles information perceived by the ADV.”; see also [0064], where “Aggregator 411 performs the function of aggregating the path and speed planning results. For example, in one embodiment, aggregator 411 can combine the two-dimensional ST graph and SL map into a three-dimensional SLT graph. In another embodiment, aggregator 411 can interpolate (or fill in additional points) based on 2 consecutive points on a SL reference line or ST curve. In another embodiment, aggregator 411 can translate reference points from (S, L) coordinates to (x, y) coordinates. Trajectory generator 413 can calculate the final trajectory to control the ADV. For example, based on the SLT graph provided by aggregator 411, trajectory generator 413 calculates a list of (x, y, T) points indicating at what time should the ADC pass a particular (x, y) coordinate.”; see also [0054] and [0056]).
Both Frazzoli and Li are in the same field of endeavor of autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Li by including the above feature, wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps, in order to provide a collision-free trajectory by monitoring the position of the vehicle over future time steps.
Regarding claim 18, as best understood in view of the indefiniteness rejection explained above, Frazzoli does not disclose the following limitation: 
wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps.
However, Li further discloses a method wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps (per the submitted specification, the candidate trajectory is in the form of a 2D image defining the trajectory as a set of waypoints in the 2D image, see [0077] of the PGPUB of the submitted specification; see Fig. 7C, where waypoints along the route of the vehicle are shown; see also [0060], where “For example, in one embedment, the rough path profile is generated by a cost function consisting of costs based on: a curvature of path and a distance from the reference line and/or reference points to obstacles. Points on the reference line are selected and are moved to the left or right of the reference lines as candidate movements representing path candidates. Each of the candidate movements has an associated cost. The associated costs for candidate movements of one or more points on the reference line can be solved using dynamic programming for an optimal cost sequentially, one point at a time. In one embodiment, SL maps generator 509 generates a station-lateral map as part of the rough path profile. A station-lateral map is a two-dimensional geometric map (similar to an x-y coordinate plane) that includes obstacles information perceived by the ADV.”; see also [0064], where “Aggregator 411 performs the function of aggregating the path and speed planning results. For example, in one embodiment, aggregator 411 can combine the two-dimensional ST graph and SL map into a three-dimensional SLT graph. In another embodiment, aggregator 411 can interpolate (or fill in additional points) based on 2 consecutive points on a SL reference line or ST curve. In another embodiment, aggregator 411 can translate reference points from (S, L) coordinates to (x, y) coordinates. Trajectory generator 413 can calculate the final trajectory to control the ADV. For example, based on the SLT graph provided by aggregator 411, trajectory generator 413 calculates a list of (x, y, T) points indicating at what time should the ADC pass a particular (x, y) coordinate.”; see also [0054] and [0056]).
Both Frazzoli and Li are in the same field of endeavor of autonomous vehicle navigation systems. Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Frazzoli to incorporate the teachings of Li by including the above feature, wherein the selected trajectory is defined by a set of 2D images defining waypoints of the candidate trajectory over multiple time steps, in order to provide a collision-free trajectory by monitoring the position of the vehicle over future time steps.
Examiner Note
The following references are not relied upon in the current rejection but are relevant to the claimed invention:
US 2021/0264795 (“Mguni”) discloses a UAV control system based on observation data.
US 2020/0132488 (“Slutskyy”) discloses optimal trajectory generation for an autonomous vehicle and control of the vehicle from start to destination.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOHANA TANJU KHAYER whose telephone number is (408)918-7597.  The examiner can normally be reached Monday through Thursday, 7:00 am to 5:30 pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abby Lin, can be reached at 571-270-3976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SOHANA TANJU KHAYER/Examiner, Art Unit 3664