DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Zhan John Cao on 9/8/21.

The application has been amended as follows: 



1. (Currently Amended) A computer-implemented method to generate a motion planning cost function for an autonomous driving vehicle (ADV), the method comprising: 

collecting information for a driving environment surrounding the ADV using a plurality of sensors of the ADV; 

generating a plurality of sample trajectories from a trajectory sample space for the driving environment; 



wherein the reward model is also generated by: 

generating a Siamese network for the reward model based on the plurality of sample trajectories in the trajectory sample space and a target trajectory, wherein the target trajectory is an expert trajectory; and 
applying an inverse reinforcement learning algorithm to the Siamese network to determine one or more weighting factors for the reward model to place the target trajectory in a highest ranking among the plurality of sample trajectories, wherein each of the weighting factors corresponds to a respective feature for the reward model;


ranking the sample trajectories based on the determined rewards; 

determining a highest ranked trajectory from the sample trajectories based on the ranking; and 

controlling the ADV autonomously according to the highest ranked trajectory.  

6. (Cancelled) 


10. (Currently Amended) A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations, the operations comprising: 

collecting information for a driving environment surrounding the ADV using a plurality of sensors of the ADV; 

generating a plurality of sample trajectories from a trajectory sample space for the driving environment; 

determining a reward based on a reward model for each of the sample trajectories, wherein the reward model is generated using a rank based conditional inverse reinforcement learning algorithm, wherein the rank based conditional inverse reinforcement learning algorithm is a rank based inverse reinforcement learning conditional on a driving scenario such that the conditional inverse reinforcement learning algorithm is trainable scenario-wise, wherein the driving scenario includes a frame of a planning cycle; 

wherein the reward model is also generated by: 

generating a Siamese network for the reward model based on the plurality of sample trajectories in the trajectory sample space and a target trajectory, wherein the target trajectory is an expert trajectory; and 

applying an inverse reinforcement learning algorithm to the Siamese network to determine one or more weighting factors for the reward model to place the target trajectory in a highest ranking among the plurality of sample trajectories, wherein each of the weighting factors corresponds to a respective feature for the reward model;

ranking the sample trajectories based on the determined rewards; 

determining a highest ranked trajectory from the sample trajectories based on the ranking; and 

controlling the ADV autonomously according to the highest ranked trajectory.  


11. (Cancelled)

12. (Currently Amended) The non-transitory machine-readable medium of claim 10 

15. (Currently Amended) A computer-implemented method to train a reward model for an autonomous driving vehicle (ADV), the method comprising: 

determining a target trajectory based on driven trajectories collected from one or more vehicles; 

generating a plurality of sample trajectories from a trajectory sample space for a driving environment of the target trajectory; and 

generating a reward model by applying a rank based conditional inverse reinforcement learning algorithm to the sample trajectories and the target trajectory, wherein the reward model is used by an ADV to generate a driving trajectory to control the ADV, wherein the rank based conditional inverse reinforcement learning algorithm is a rank based inverse reinforcement learning conditional on a driving 

wherein generating the reward model by applying the rank based conditional inverse reinforcement learning algorithm comprises: generating a Siamese network for the reward model based on the plurality of sample trajectories in the trajectory sample space and the target trajectory; and 

applying an inverse reinforcement learning algorithm to the Siamese network to determine one or more weighting factors for the reward model to place the target trajectory in a highest ranking among the plurality of sample trajectories, wherein each of the weighting factors corresponds to a respective feature for the reward model, and

ranking the sample trajectories based on the determined rewards; 

determining a highest ranked trajectory from the sample trajectories based on the ranking; and 

controlling the ADV autonomously according to the highest ranked trajectory.  

20. (Cancelled)

21. (Currently Amended) The method of claim 15 
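The amended claims combine two steps: training a reward model so that the expert (target) trajectory ranks above every sampled trajectory in its scenario, and then scoring, ranking, and selecting among sampled trajectories. The following is a minimal, non-authoritative Python sketch of that flow under stated assumptions. It assumes a linear reward over three hypothetical features (mean speed, squared speed changes, mean lateral offset) and uses a logistic pairwise ranking loss with shared weights as a simple stand-in for the claimed Siamese network and inverse reinforcement learning step. The trajectory representation, the feature set, the loss, and every function name (`features`, `reward`, `train_reward_model`, `select_trajectory`) are illustrative assumptions, not taken from the application. Scenario-wise training is approximated by summing the loss over independent (target, samples) groups.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a list of (x, y, speed) points, y = lateral offset.
Trajectory = List[Tuple[float, float, float]]
# Hypothetical scenario (e.g. one planning-cycle frame): (target, samples).
Scenario = Tuple[Trajectory, List[Trajectory]]


def features(traj: Trajectory) -> List[float]:
    """Hypothetical feature vector; the claims do not specify the features."""
    n = len(traj)
    mean_speed = sum(p[2] for p in traj) / n
    # Sum of squared speed changes between consecutive points (comfort proxy).
    speed_change = sum((traj[i + 1][2] - traj[i][2]) ** 2 for i in range(n - 1))
    mean_lateral = sum(abs(p[1]) for p in traj) / n
    return [mean_speed, speed_change, mean_lateral]


def reward(w: Sequence[float], traj: Trajectory) -> float:
    """Linear reward: one weighting factor per feature."""
    return sum(wi * fi for wi, fi in zip(w, features(traj)))


def loss_grad(margin: float) -> float:
    """d/dm of log(1 + exp(-m)), computed without overflow."""
    if margin >= 0:
        e = math.exp(-margin)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(margin))


def train_reward_model(scenarios: List[Scenario], lr: float = 0.05,
                       epochs: int = 200) -> List[float]:
    """Gradient descent that pushes each target above every sample in its scenario.

    The same weights score both members of each (target, sample) pair, i.e. a
    shared-weight (Siamese-style) comparison in its simplest linear form.
    """
    dim = len(features(scenarios[0][0]))
    w = [0.0] * dim
    n_pairs = sum(len(samples) for _, samples in scenarios)
    for _ in range(epochs):
        grad = [0.0] * dim
        for target, samples in scenarios:
            phi_t = features(target)
            for s in samples:
                diff = [a - b for a, b in zip(phi_t, features(s))]
                margin = sum(wi * d for wi, d in zip(w, diff))
                g = loss_grad(margin)
                for k in range(dim):
                    grad[k] += g * diff[k]
        w = [wi - lr * gk / n_pairs for wi, gk in zip(w, grad)]
    return w


def select_trajectory(w: Sequence[float], samples: List[Trajectory]) -> Trajectory:
    """Rank samples by reward (highest first) and return the top-ranked one."""
    ranked = sorted(samples, key=lambda t: reward(w, t), reverse=True)
    return ranked[0]


# Usage with made-up trajectories: a smooth, centered target and three worse samples.
target = [(float(i), 0.0, 10.0) for i in range(10)]
weaving = [(float(i), 1.0 if i % 2 else -1.0, 10.0) for i in range(10)]
jerky = [(float(i), 0.0, 12.0 if i % 2 else 8.0) for i in range(10)]
slow = [(float(i), 0.0, 5.0) for i in range(10)]

w = train_reward_model([(target, [weaving, jerky, slow])])
best = select_trajectory(w, [weaving, jerky, slow, target])
print(best is target)  # True
```

Because the loss only depends on feature differences within a pair, adding the same constant to every trajectory's reward leaves the ranking unchanged; this is why a pairwise formulation fits a rank-based objective.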

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: The closest prior art, newly discovered reference Halder (US 2019/0310649), discloses, in one aspect, a computer-implemented method useful for managing autonomous vehicle application operations with reinforcement learning (RL).
 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS K WILTEY whose telephone number is (571)272-7193.  The examiner can normally be reached on M-F 7-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Olszewski can be reached on (571)272-2706.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


NICHOLAS K. WILTEY
Primary Examiner
Art Unit 3669



/NICHOLAS K WILTEY/Primary Examiner, Art Unit 3669