DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-21 are presented for examination.
Claims 1-21 are rejected.

Response to Arguments
Applicant's arguments filed 12/10/2021 have been fully considered but they are not persuasive. 
Applicant argues that some of the claimed subject matter, e.g., “applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV”, is not taught by the prior art of record, i.e., LIN (see pages 1-4 of the Remarks filed 12/10/2021).
The Examiner respectfully directs Applicant’s attention to the fact that LIN clearly teaches the claimed subject matter in question. Concerning “applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment”, LIN teaches “the candidate planned trajectory sets are generated based on the current position, the current heading direction, the current speed and the a current acceleration of the self-driving vehicle that are currently detected by the vehicle detecting device 11, the path end points from the trajectory calculating device 14, and the width and the curvature of the road…”, as disclosed in ¶ [0039]-¶ [0041], and exhibited in Figs. 2-4 steps 202-225. Therefore, LIN’s planning of the trajectory sets clearly teaches the claimed “reinforcement learning (RL) algorithm”.
Regarding “…determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV”, LIN clearly teaches “…determines a target point r* for the candidate path based on whether an obstacle is detected on a corresponding lane of the road, where the target point r* can be expressed as (x*, y*, θ*, k*)…”, as disclosed in ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0048], and exhibited in Figs. 2-4 steps 202-225. Therefore, LIN’s determining of a target point r* for the candidate path clearly teaches “…determine a plurality of controls for the ADV…”.
Concerning claims 7, 14, and 21, Choi is relied upon to teach the claimed subject matter “wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks”, where Choi teaches “uses an RNN that is augmented with a fusion layer that incorporates interaction between agents and a convolutional neural network (CNN) that provides scene information…”, as disclosed in ¶ [0020]-¶ [0022], and exhibited in Figs. 2-3 steps 202-304.
Accordingly, this Office action is made final, with the above explanation provided to clarify the Examiner’s position.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 8-13, and 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by LIN et al. (US Pub. No.: 2020/0156631 A1: hereinafter “LIN”).

          Consider claim 1:
                   LIN teaches a computer-implemented method for operating an autonomous driving vehicle (See LIN, e.g., “A method for planning a trajectory for a self-driving vehicle on a road: generating multiple target planned trajectory sets based on information concerning the self-driving vehicle; …” of Abstract, ¶ [0007], and Figs. 2-4 steps 202-225), the method comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV) (See LIN, e.g., “the candidate planned trajectory sets are generated based on the current position, the current heading direction, the current speed and the a current acceleration of the self-driving vehicle that are currently detected by the vehicle detecting device 11, the path end points from the trajectory calculating device 14, and the width and the curvature of the road…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment (See LIN, e.g., “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225) to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV (See LIN, e.g., “…determines a target point r* for the candidate path based on whether an obstacle is detected on a corresponding lane of the road, where the target point r* can be expressed as (x*, y*, θ*, k*)…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0048], and Figs. 2-4 steps 202-225); determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state (See LIN, e.g., “sets a target curvature k* for the candidate path and a target current heading direction θ* based on the location (x*, y*) of the target point, thereby obtaining the expression of the target point r*. Using the target point r*…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], and Figs. 2-4 steps 202-225); and generating a first trajectory from the trajectory states by maximizing the reward predictions to control the ADV autonomously according to the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set. The candidate path and the candidate speed curve of the candidate planned trajectory set that is selected as the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set.…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 2:
                   LIN teaches everything claimed as implemented above in the rejection of claim 1. In addition, LIN teaches further comprising applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 3:
                   LIN teaches everything claimed as implemented above in the rejection of claim 2. In addition, LIN teaches wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles in the perceived environment (See LIN, e.g., “…for each of the candidate planned trajectory sets, the processor 151 calculates a lateral acceleration based on a curvature of the candidate path and the candidate speed curve, calculates a time to collision (TTC) for the self-driving vehicle to collide the front obstacle of the at least one obstacle based on the distance, on the relative velocity and on the relative acceleration of the front obstacle, and calculates the TTB for the self-driving vehicle based on the current speed of the self-driving vehicle…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 4:
                   LIN teaches everything claimed as implemented above in the rejection of claim 3. In addition, LIN teaches further comprising, if the judgment score is below a predetermined threshold, generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory (See LIN, e.g., “…the processor 151 may compare the lateral acceleration of the candidate planned trajectory set with a pre-determined threshold (e.g., three tenths of gravitational acceleration), and compare the TTB and the TTC that are calculated for the candidate planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).
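
For illustration only, the threshold comparison quoted from LIN above can be sketched as follows. Only the threshold of three tenths of gravitational acceleration is taken from the quotation. The formulas (lateral acceleration as speed squared times curvature, TTC as the smallest positive root of a constant-acceleration gap equation, TTB as speed divided by an assumed braking deceleration), the acceptance rule (TTC must exceed TTB), and all function and parameter names are assumptions that do not appear in the quoted passages.

```python
import math

G = 9.81                       # gravitational acceleration in m/s^2
LATERAL_ACCEL_LIMIT = 0.3 * G  # "three tenths of gravitational acceleration"


def lateral_acceleration(speed, curvature):
    """Assumed model: a_lat = v^2 * |k| for speed v (m/s) and curvature k (1/m)."""
    return speed ** 2 * abs(curvature)


def time_to_collision(distance, rel_velocity, rel_accel):
    """Smallest t > 0 with distance + rel_velocity*t + 0.5*rel_accel*t^2 = 0.

    rel_velocity and rel_accel are obstacle minus ego, so negative values
    mean the gap is closing. Returns math.inf if the gap never closes.
    """
    if abs(rel_accel) < 1e-9:
        # Constant relative velocity: the gap closes only if it is negative.
        return -distance / rel_velocity if rel_velocity < 0 else math.inf
    disc = rel_velocity ** 2 - 2.0 * rel_accel * distance
    if disc < 0:
        return math.inf
    root = math.sqrt(disc)
    times = [(-rel_velocity - root) / rel_accel,
             (-rel_velocity + root) / rel_accel]
    positive = [t for t in times if t > 0]
    return min(positive) if positive else math.inf


def time_to_brake(speed, decel=6.0):
    """Assumed model: time to stop at a constant deceleration (hypothetical 6 m/s^2)."""
    return speed / decel


def candidate_is_acceptable(speed, curvature, distance, rel_velocity, rel_accel):
    """Hypothetical acceptance rule: comfortable lateral acceleration and TTC > TTB."""
    comfortable = lateral_acceleration(speed, curvature) <= LATERAL_ACCEL_LIMIT
    safe = time_to_collision(distance, rel_velocity, rel_accel) > time_to_brake(speed)
    return comfortable and safe
```

Under these assumptions, a candidate driven at 12 m/s on a path of curvature 0.01 1/m, with an obstacle 20 m ahead closing at 5 m/s, would pass both checks, whereas the same candidate on a path of curvature 0.05 1/m would exceed the lateral acceleration threshold.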

          Consider claim 5:
                   LIN teaches everything claimed as implemented above in the rejection of claim 4. In addition, LIN teaches wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 6:
                   LIN teaches everything claimed as implemented above in the rejection of claim 4. In addition, LIN teaches wherein the open space optimization model includes a vehicle dynamic model for the ADV (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 8:
                   LIN teaches a non-transitory machine-readable medium having instructions stored therein (Fig. 1 elements 1-15, 151-152), which when executed by a processor, cause the processor to perform operations (Fig. 1 elements 1-15, 151-152), the operations comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV) (See LIN, e.g., “the candidate planned trajectory sets are generated based on the current position, the current heading direction, the current speed and the a current acceleration of the self-driving vehicle that are currently detected by the vehicle detecting device 11, the path end points from the trajectory calculating device 14, and the width and the curvature of the road…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory (See LIN, e.g., “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225) to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV (See LIN, e.g., “…determines a target point r* for the candidate path based on whether an obstacle is detected on a corresponding lane of the road, where the target point r* can be expressed as (x*, y*, θ*, k*)…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0048], and Figs. 2-4 steps 202-225); determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state (See LIN, e.g., “sets a target curvature k* for the candidate path and a target current heading direction θ* based on the location (x*, y*) of the target point, thereby obtaining the expression of the target point r*. Using the target point r*…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], and Figs. 2-4 steps 202-225); and generating a first trajectory from the trajectory states by maximizing the reward predictions to control the ADV autonomously according to the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set. The candidate path and the candidate speed curve of the candidate planned trajectory set that is selected as the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set.…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 9:
                   LIN teaches everything claimed as implemented above in the rejection of claim 8. In addition, LIN teaches wherein the operations further comprise applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 10:
                   LIN teaches everything claimed as implemented above in the rejection of claim 9. In addition, LIN teaches wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment (See LIN, e.g., “…for each of the candidate planned trajectory sets, the processor 151 calculates a lateral acceleration based on a curvature of the candidate path and the candidate speed curve, calculates a time to collision (TTC) for the self-driving vehicle to collide the front obstacle of the at least one obstacle based on the distance, on the relative velocity and on the relative acceleration of the front obstacle, and calculates the TTB for the self-driving vehicle based on the current speed of the self-driving vehicle…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 11:
                   LIN teaches everything claimed as implemented above in the rejection of claim 10. In addition, LIN teaches wherein the operations further comprise, if the judgment score is below a predetermined threshold, generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory (See LIN, e.g., “…the processor 151 may compare the lateral acceleration of the candidate planned trajectory set with a pre-determined threshold (e.g., three tenths of gravitational acceleration), and compare the TTB and the TTC that are calculated for the candidate planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 12:
                   LIN teaches everything claimed as implemented above in the rejection of claim 11. In addition, LIN teaches wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 13:
                   LIN teaches everything claimed as implemented above in the rejection of claim 11. In addition, LIN teaches wherein the open space optimization model includes a vehicle dynamic model for the ADV (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 15:
                   LIN teaches a data processing system, comprising: a processor; and a memory coupled to the processor to store instructions (Fig. 1 elements 1-15, 151-152), which when executed by the processor, cause the processor to perform operations, the operations including (Fig. 1 elements 1-15, 151-152), perceiving an environment surrounding an autonomous driving vehicle (ADV) (See LIN, e.g., “the candidate planned trajectory sets are generated based on the current position, the current heading direction, the current speed and the a current acceleration of the self-driving vehicle that are currently detected by the vehicle detecting device 11, the path end points from the trajectory calculating device 14, and the width and the curvature of the road…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory (See LIN, e.g., “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225) to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV (See LIN, e.g., “…determines a target point r* for the candidate path based on whether an obstacle is detected on a corresponding lane of the road, where the target point r* can be expressed as (x*, y*, θ*, k*)…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0048], and Figs. 2-4 steps 202-225); determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state (See LIN, e.g., “sets a target curvature k* for the candidate path and a target current heading direction θ* based on the location (x*, y*) of the target point, thereby obtaining the expression of the target point r*. Using the target point r*…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], and Figs. 2-4 steps 202-225); and generating a first trajectory from the trajectory states by maximizing the reward predictions to control the ADV autonomously according to the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set. The candidate path and the candidate speed curve of the candidate planned trajectory set that is selected as the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set.…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 16:
                   LIN teaches everything claimed as implemented above in the rejection of claim 15. In addition, LIN teaches wherein the operations further comprise applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 17:
                   LIN teaches everything claimed as implemented above in the rejection of claim 16. In addition, LIN teaches wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment (See LIN, e.g., “…for each of the candidate planned trajectory sets, the processor 151 calculates a lateral acceleration based on a curvature of the candidate path and the candidate speed curve, calculates a time to collision (TTC) for the self-driving vehicle to collide the front obstacle of the at least one obstacle based on the distance, on the relative velocity and on the relative acceleration of the front obstacle, and calculates the TTB for the self-driving vehicle based on the current speed of the self-driving vehicle…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 18:
                   LIN teaches everything claimed as implemented above in the rejection of claim 17. In addition, LIN teaches wherein the operations further comprise, if the judgment score is below a predetermined threshold, generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory (See LIN, e.g., “…the processor 151 may compare the lateral acceleration of the candidate planned trajectory set with a pre-determined threshold (e.g., three tenths of gravitational acceleration), and compare the TTB and the TTC that are calculated for the candidate planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 19:
                   LIN teaches everything claimed as implemented above in the rejection of claim 18. In addition, LIN teaches wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…the target planned trajectory set are then employed to serve respectively as a projected (predicted) path and a projected speed curve of the target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

          Consider claim 20:
                   LIN teaches everything claimed as implemented above in the rejection of claim 18. In addition, LIN teaches wherein the open space optimization model includes a vehicle dynamic model for the ADV (See LIN, e.g., “…selects the candidate planned trajectory set as a target planned trajectory set…” of ¶ [0039]-¶ [0042], ¶ [0047]-¶ [0049], ¶ [0051]-¶ [0060], and Figs. 2-4 steps 202-225).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over LIN in view of Choi et al. (US Pub. No.: 2018/0124423 A1: hereinafter “Choi”).

          Consider claim 7:
                   LIN teaches everything claimed as implemented above in the rejection of claim 1. In addition, LIN teaches “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225. However, LIN does not explicitly teach wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
                     In an analogous field of endeavor, Choi teaches wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks (See Choi, e.g., “uses an RNN that is augmented with a fusion layer that incorporates interaction between agents and a convolutional neural network (CNN) that provides scene information. Block 204 uses training in a multi-task learning framework where the ranking objective is formulated using inverse optimal control (IOC) and the refinement objective is obtained by regression. In a testing phase, the ranking/refinement of block 204 is iterated to obtain more accurate refinements of the prediction of future trajectories…” of ¶ [0020]-¶ [0022], and Figs. 2-3 steps 202-304).
                   Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of LIN, as taught by Choi, so as to ensure that the operation of the vehicle is performed smoothly and robustly.

          Consider claim 14:
                   LIN teaches everything claimed as implemented above in the rejection of claim 8. In addition, LIN teaches “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225. However, LIN does not explicitly teach wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
                     In an analogous field of endeavor, Choi teaches wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks (See Choi, e.g., “uses an RNN that is augmented with a fusion layer that incorporates interaction between agents and a convolutional neural network (CNN) that provides scene information. Block 204 uses training in a multi-task learning framework where the ranking objective is formulated using inverse optimal control (IOC) and the refinement objective is obtained by regression. In a testing phase, the ranking/refinement of block 204 is iterated to obtain more accurate refinements of the prediction of future trajectories…” of ¶ [0020]-¶ [0022], and Figs. 2-3 steps 202-304).
                   Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of LIN, as taught by Choi, so as to ensure that the operation of the vehicle is performed smoothly and robustly.

          Consider claim 21:
                   LIN teaches everything claimed as implemented above in the rejection of claim 15. In addition, LIN teaches “Each of the candidate planned trajectory sets includes a candidate path from the current position of the self-driving vehicle to one of the path end points (i.e., each candidate path corresponds with a specific lane of the road), and a candidate speed curve indicating estimated change of the speed of the self-driving vehicle within a driving time period during which the self-driving vehicle moves along the candidate path…” of ¶ [0039]-¶ [0041], and Figs. 2-4 steps 202-225. However, LIN does not explicitly teach wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
                     In an analogous field of endeavor, Choi teaches wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks (See Choi, e.g., “uses an RNN that is augmented with a fusion layer that incorporates interaction between agents and a convolutional neural network (CNN) that provides scene information. Block 204 uses training in a multi-task learning framework where the ranking objective is formulated using inverse optimal control (IOC) and the refinement objective is obtained by regression. In a testing phase, the ranking/refinement of block 204 is iterated to obtain more accurate refinements of the prediction of future trajectories…” of ¶ [0020]-¶ [0022], and Figs. 2-3 steps 202-304).
                   Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to modify the system of LIN, as taught by Choi, so as to ensure that the operation of the vehicle is performed smoothly and robustly.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

          Beller et al. (US Pub. No.: 2020/0139967 A1) teaches “Techniques for determining to modify a trajectory based on an object are discussed herein. A vehicle can determine a drivable area of an environment, capture sensor data representing an object in the environment, and perform a spot check to determine whether or not to modify a trajectory. Such a spot check may include processing to incorporate an actual or predicted extent of the object into the drivable area to modify the drivable area. A distance between a reference trajectory and the object can be determined at discrete points along the reference trajectory, and based on a cost, distance, or intersection associated with the trajectory and the modified area, the vehicle can modify its trajectory. One trajectory modification includes following, which may include varying a longitudinal control of the vehicle, for example, to maintain a relative distance and velocity between the vehicle and the object.”

          Zhang et al. (US Pub. No.: 2019/0235516 A1) teaches “According to some embodiments, a system calculates a first trajectory based on a map and a route information. The system performs a path optimization based on the first trajectory, traffic rules, and an obstacle information describing obstacles perceived by the ADV. The path optimization is performed by performing a spline curve based path optimization on the first trajectory, determining whether a result of the spline curve based path optimization satisfies a first predetermined condition, performing a finite element based path optimization on the first trajectory in response to determining that the result of the spline curve based path optimization does not satisfy the first predetermined condition, performing a speed optimization based on a result of the path optimization, and…”

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
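
The reply-period rule stated above can be sketched as a short calculation. This is a minimal sketch under stated assumptions: the function names and the example dates are hypothetical, month arithmetic clamps to the last day of a shorter month, weekends and federal holidays are not considered, and extensions of time under 37 CFR 1.136(a) are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted by whole months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(mailing, first_reply=None, advisory=None):
    """Expiry of the shortened statutory period for reply to a final action.

    mailing     : mailing date of the final action
    first_reply : filing date of the first reply, if any
    advisory    : mailing date of the advisory action, if any
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    expiry = three_months
    replied_within_two_months = (
        first_reply is not None and first_reply <= add_months(mailing, 2)
    )
    if replied_within_two_months and advisory is not None and advisory > three_months:
        # The period runs to the date the advisory action is mailed.
        expiry = advisory
    # In no event later than six months from the mailing date.
    return min(expiry, six_months)


# Hypothetical dates, chosen only to exercise the three branches of the rule.
print(shortened_period_expiry(date(2022, 1, 15)))
print(shortened_period_expiry(date(2022, 1, 15), date(2022, 3, 1), date(2022, 5, 2)))
print(shortened_period_expiry(date(2022, 1, 15), date(2022, 3, 1), date(2022, 8, 1)))
```

The final `min` keeps the six-month ceiling separate from the advisory-action branch, so a late-mailed advisory action cannot push the computed date past that ceiling.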

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BABAR SARWAR whose telephone number is (571)270-5584.  The examiner can normally be reached on Mon-Fri 9:00 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faris S. Almatrahi, can be reached at (313)446-4821.  The fax phone 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BABAR SARWAR/Primary Examiner, Art Unit 3667