DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4 December 2020 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-3, 5-12, and 14-22 are pending in this application.
Claims 1, 10 and 18 are amended.
Claims 4 and 13 are cancelled. 
Claims 1-3, 5-12, and 14-22 are presented for examination. 

Response to Amendments
Claim Rejections - 35 USC § 103
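In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.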
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 5-10, 12, 14-18 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Silva et al. (US Publication 2019/0384309 A1) in view of Shalev-Shwartz et al. (US Publication 2019/0377354 A1).
Regarding claim 1, Silva teaches a processor-implemented method in an autonomous vehicle (AV) for executing a maneuver at an intersection, the method comprising: determining, by a processor from vehicle sensor data and road geometry data, a plurality of range measurements, each range measurement determined from a unique ray extending from a starting point on the AV to an ending point that is terminated by an obstacle in the path of that ray or a pre-determined maximum distance (Silva: Para. 0014, 0052, 0067, 0088, 0112, 0113; processor; image data can be captured in an environment, and the image data can be segmented to generate segmented image data indicating drivable regions e.g., road surfaces that are not occupied by an object reads on road geometry data; a first number of expected LIDAR returns associated with the occlusion field and/or a second number of actual LIDAR returns associated with the occlusion field reads on a unique ray extending from a starting point on the AV to an ending point that is terminated by an obstacle in the path of that ray or a pre-determined maximum distance); determining, by the processor from vehicle sensor data, obstacle velocity data, wherein the obstacle velocity data comprises a velocity of an obstacle determined to be at the ending point of the rays (Silva: Para. 0014, 0088, 0112; processor; receiving LIDAR data, image data, and/or other sensor data to measure, as a measured trace, a trajectory e.g., positions, orientations, and velocities of the dynamic object in the environment reads on obstacle velocity data comprises a velocity of an obstacle; LIDAR returns reads on ending point of the rays).
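For illustration only, the range measurement recited in claim 1 (a ray from a starting point on the AV that ends at an obstacle or at a pre-determined maximum distance, together with the velocity of the obstacle at the ray's ending point) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Silva or from the application: obstacles are modeled as circles, and the names `Obstacle` and `ray_range` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Obstacle:
    # Hypothetical circular obstacle model (an assumption for this sketch).
    x: float
    y: float
    radius: float
    speed: float  # obstacle speed in m/s


def ray_range(origin, angle, obstacles, max_dist):
    """Return (range, obstacle_speed) for a single ray.

    The ray ends at the nearest obstacle it hits, or at max_dist if it
    hits nothing. The speed is 0.0 when no obstacle terminates the ray.
    """
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    best, best_speed = max_dist, 0.0
    for ob in obstacles:
        # Vector from the ray origin to the obstacle centre.
        cx, cy = ob.x - ox, ob.y - oy
        # Distance along the ray to the point of closest approach.
        proj = cx * dx + cy * dy
        # Squared perpendicular distance from the centre to the ray line.
        perp_sq = cx * cx + cy * cy - proj * proj
        r_sq = ob.radius ** 2
        if perp_sq > r_sq:
            continue  # the ray line misses this obstacle
        hit = proj - math.sqrt(r_sq - perp_sq)
        # Ignore hits behind the origin and hits beyond the current best.
        if 0.0 <= hit < best:
            best, best_speed = hit, ob.speed
    return best, best_speed
```

A set of such rays at distinct angles yields the plurality of range measurements and the associated obstacle velocity data discussed above.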
Silva doesn’t explicitly teach determining, by the processor, vehicle state data, the vehicle state data including a velocity of the AV, a distance to a stop line, a distance to a midpoint of an intersection, and a distance to a goal; applying the plurality of range measurements, the obstacle velocity data and the vehicle state data as inputs to a neural network (NN) that is trained to determine a set of discrete behavior actions for the AV to perform comprising a trust option candidate and a do not trust option candidate wherein the trust option candidate comprises one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection and the do not trust option candidate comprises a different one of the traversing straight through the intersection, turning left at the intersection, or turning right at the intersection, wherein the NN is further trained to determine a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV.
However, Silva is interpreted to disclose an equivalent teaching. The system measures a distance to a stop line, a check line, and an exit of the intersection (Silva: Para. 0093). The check line can represent the furthest into the intersection the vehicle can navigate without interrupting the flow of the traffic (Silva: Para. 0093). The exit point for the intersection is equivalent to a goal distance. It would have been obvious to one of ordinary skill in the art to have a midpoint of the intersection instead of the check line and the intersection exit in order to represent a line or point within the intersection as a reference.
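Silva includes information received from various sensors to measure positions, orientations, and velocity of dynamic objects in the environment (Silva: Para. 0112). The system determines characteristics of the area including a distance across the intersection and the acceleration and velocity of the autonomous vehicle (Silva: Para. 0020). Silva discloses left turns, right turns and traversing straight through the intersection as autonomous vehicle navigation maneuvers (Silva: Para. 0016) using intersection region characteristics such as a distance across the intersection and dynamic object trajectories and selecting and controlling the acceleration and velocity in order for the autonomous vehicle to traverse the intersection (Silva: Para. 0020, 0040-0041, 0112) decided by a neural network algorithm (Silva: Para. 0074). Silva discloses multiple trajectories through an intersection being simultaneously generated based on the occlusion grids and using logic to evaluate the trajectories, selecting one for the vehicle to navigate (Silva: Para. 0058-0059).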
A generic intersection includes vehicle traversing options of right, left and straight. Silva's disclosed straight, right, and left generated trajectories are equivalent to the claimed option candidates. The act of selecting one of the trajectories (right, left, or straight) creates a trusted option, while the two options not selected would be do not trust options. The multiple trajectory options include occlusion information, dynamic trajectory calculations for objects in the environment, and velocity and acceleration for the autonomous vehicle to traverse the intersection. Silva considers straight, right, and left navigation of a vehicle through an intersection, where trajectories are generated, and one trajectory with velocity and acceleration control is enacted for the vehicle to enter and exit the intersection. Silva discloses the equivalent of a trust option candidate in the chosen trajectory of the multiple trajectories. The do not trust option candidates are the non-selected but generated trajectories.

In the following limitation, Silva teaches determining, using the NN, the set of discrete behavior actions for the AV at the intersection comprising the trust option candidate and the do not trust option candidate and the unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV (Silva: Para. 0016, 0020, 0041, 0058-0059, 0074, 0112; distance across the intersection; acceleration level and/or average velocity of the autonomous vehicle; unprotected left turns, free right turns, or negotiating complex intersections; multiple trajectories can be substantially simultaneously generated in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle to navigate).
Silva doesn’t explicitly teach applying a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform.
However, Shalev-Shwartz, in the same field of endeavor, teaches applying a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform (Shalev-Shwartz: Para. 0003, 0070, 0192, 0248, 0256, 0298, 0328, Fig. 11E; hierarchical set of decisions organized as a Directed Acyclic Graph; two layers in the long range planning analysis, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis; the value of Q^π(s, a) may indicate the effect of performing action a on the future; change lanes (e.g., to the left or to the right side) or whether to stay in the current lane; travel from one road to another road at appropriate intersections or interchanges).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
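For illustration only, the role of a Q value function in choosing a discrete behavior action and its associated trajectory control action can be sketched as below. This is a minimal sketch under stated assumptions, not the method of Shalev-Shwartz or of the application: the function names, the action labels, the numeric values, and the discount factor `gamma` are all hypothetical.

```python
def bellman_backup(reward, next_q_values, gamma=0.9):
    """One-step estimate Q(s, a) = r + gamma * max over a' of Q(s', a').

    reward is the immediate reward for taking action a in state s, and
    next_q_values holds the Q values available in the next state s'.
    """
    return reward + gamma * max(next_q_values)


def choose_action(q_values, accelerations):
    """Greedy choice: pick the discrete behavior with the highest Q value.

    q_values maps a behavior label to its Q value; accelerations maps the
    same labels to the acceleration (positive) or deceleration (negative)
    associated with that behavior. Both mappings are assumptions.
    """
    best = max(q_values, key=q_values.get)
    return best, accelerations[best]


# Hypothetical values for the three behaviors named in the claims.
example_q = {"straight": 1.2, "left": 0.4, "right": -0.3}
example_acc = {"straight": 0.5, "left": -1.0, "right": -0.5}
chosen_behavior, chosen_acc = choose_action(example_q, example_acc)
```

In this sketch the selected pair would be the "straight" behavior with its associated acceleration, since that behavior carries the highest Q value.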
In the following limitation, Silva teaches communicating, by the processor, a message to vehicle controls conveying the chosen unique trajectory control action associated with the discrete behavior action (Silva: Para. 0058, 0060, 0088; communicate with and/or control corresponding systems of the drive module reads on communicating; processor; determine a path for the vehicle to follow to traverse through an environment reads on the chosen unique trajectory control action); and executing, by the AV, the unique trajectory control action at the intersection (Silva: Para. 0050; vehicle can be controlled to follow a trajectory to traverse the intersection).
Regarding claim 3, Silva teaches the method of claim 1, wherein the determining the set of discrete behavior actions and the unique trajectory control action associated with each discrete behavior action (Silva: Para. 0058, 0060; distance across the intersection reads on range measurements; acceleration level and/or average velocity of the autonomous vehicle reads on vehicle state data; unprotected left turns, free right turns, or negotiating complex intersections reads on a set of discrete behavior actions; the operation can include selecting an acceleration level and/or velocity for the vehicle reads on unique trajectory control action associated with each discrete behavior action) comprises: generating a state vector including the vehicle state data, the distance of each ray, and the velocity of obstacles at the end-points of the rays (Silva: Para. 0020, 0051, 0056; determine a position and/or orientation of the vehicle e.g., one or more of an x-, y-, z-position, roll, pitch, or yaw reads on generating a state vector including the vehicle state data; determining an extent of an un-occluded and unoccupied region relative to a distance of an intersection for an autonomous vehicle to travel and/or relative to a velocity, acceleration, and/or time for the autonomous vehicle to traverse a portion of the intersection reads on the distance of each ray, and the velocity of obstacles at the end-points of the rays).
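For illustration only, the state vector recited in claim 3 (vehicle state data, the distance of each ray, and the velocity of obstacles at the end-points of the rays) can be sketched as a simple concatenation. This is a minimal sketch under stated assumptions: the ordering of the elements and the names `VehicleState` and `build_state_vector` are hypothetical and do not come from Silva or from the application.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VehicleState:
    # The four vehicle state quantities named in claim 1.
    velocity: float
    dist_to_stop_line: float
    dist_to_midpoint: float
    dist_to_goal: float


def build_state_vector(
    vehicle: VehicleState,
    ray_ranges: Sequence[float],
    ray_obstacle_speeds: Sequence[float],
) -> List[float]:
    """Concatenate vehicle state, per-ray ranges, and per-ray obstacle speeds.

    The element ordering is an assumption made for this sketch.
    """
    if len(ray_ranges) != len(ray_obstacle_speeds):
        raise ValueError("each ray needs exactly one range and one speed")
    return [
        vehicle.velocity,
        vehicle.dist_to_stop_line,
        vehicle.dist_to_midpoint,
        vehicle.dist_to_goal,
        *ray_ranges,
        *ray_obstacle_speeds,
    ]
```

With n rays, the resulting vector has 4 + 2n elements.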
Regarding claim 5, Silva teaches the method of claim 1, wherein the neural network comprises: a hierarchical options network configured to produce two hierarchical option candidates, the two hierarchical option candidates each including a trust option candidate and a do not trust option candidate (Silva: Para. 0015, 0074, 0075; clustering algorithms e.g., k-means, k-medians, expectation maximization EM, hierarchical clustering reads on hierarchical options; determine the occlusion state and occupancy state of an occlusion field with a confidence level also referred to as a confidence value that meets or exceeds a threshold value reads on a trust option candidate and a do not trust option candidate); an actions network configured to produce lower level continuous action choices for acceleration and deceleration (Silva: Para. 0041, 0074; neural network reads on actions network; selecting an acceleration level and/or velocity for the vehicle reads on produce lower level continuous action choices for acceleration and deceleration).
Silva doesn’t explicitly teach a Q values network configured to produce Q values corresponding to the lower level continuous action choices for acceleration and deceleration.
However, Shalev-Shwartz, in the same field of endeavor, teaches a Q values network configured to produce Q values corresponding to the lower level continuous action choices for acceleration and deceleration (Shalev-Shwartz: Para. 0070, 0177, 0192, 0298; given that a host vehicle is now in state, s, the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on produce Q values; where S is a set of states and A ⊂ ℝ^2 is the action space e.g., desired speed, acceleration, yaw commands, etc. reads on corresponding to the lower level continuous action choices for acceleration and deceleration).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0297-0298).
Regarding claim 6, Silva teaches the method of claim 5, further comprising: deciding using the hierarchical option candidates that the AV can trust the environment (Silva: Para. 0015, 0074, 0075; clustering algorithms e.g., k-means, k-medians, expectation maximization EM, hierarchical clustering reads on hierarchical options; determine the occlusion state and occupancy state of an occlusion field with a confidence level also referred to as a confidence value that meets or exceeds a threshold value reads on that the AV can trust the environment); and deciding to implement the unique trajectory control action provided by the neural network (Silva: Para. 0073, 0074; the components in the memory can be implemented as a neural network reads on deciding to implement the unique trajectory control action provided by the neural network).
Regarding claim 7, Silva doesn’t explicitly teach a hierarchical options network wherein an input state vector s_i is followed by three fully connected (FC) layers to generate a Q-values matrix O_t corresponding to two hierarchical option candidates; an actions network wherein the input state vector s_i is followed by four FC layers to produce a continuous action vector a_t; and a Q values network that receives the output of a concatenation of the input state vector s_i followed by an FC layer with the continuous action vector a_t followed by one FC layer, wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Q_t which corresponds to the action vector a_t.
However, Shalev-Shwartz, in the same field of endeavor, teaches a hierarchical options network wherein an input state vector s_i is followed by three fully connected (FC) layers to generate a Q-values matrix O_t corresponding to two hierarchical option candidates (Shalev-Shwartz: Para. 0070, 0192, 0328, 0298; hierarchical set of decisions organized as a Directed Acyclic Graph reads on hierarchical options network; two layers in the long range planning analysis e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to projected future states, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis reads on three fully connected layers; the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on Q-values); an actions network wherein the input state vector s_i is followed by four FC layers to produce a continuous action vector a_t (Shalev-Shwartz: Para. 0070, 0328; neural network; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis could be used in selecting from among available potential actions in response to a current navigational state reads on four FC layers to produce a continuous action vector a_t); and a Q values network that receives the output of a concatenation of the input state vector s_i followed by an FC layer with the continuous action vector a_t followed by one FC layer, wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Q_t which corresponds to the action vector a_t (Shalev-Shwartz: Para. 0070, 0328, 0298; neural network; long range planning analysis (e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to projected future states), analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis could be used in 
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
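For illustration only, the three-network arrangement recited in claim 7 can be sketched as a forward pass in plain Python. This is a minimal sketch under stated assumptions and is not drawn from either reference or from the application's specification: the layer widths, ReLU activations on hidden layers, random initial weights, output sizes (two option values, one acceleration value, one Q value), and all function names are hypothetical. Only the layer counts and the concatenation follow the claim language.

```python
import random


def dense(inputs, weights, biases, relu):
    """One fully connected (FC) layer, optionally followed by ReLU."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def build(sizes, rng):
    """Create len(sizes) - 1 FC layers with small random weights."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def stack(x, layers):
    """Apply FC layers in sequence; ReLU on all but the last layer."""
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b, relu=(i < len(layers) - 1))
    return x


def build_nets(state_dim, rng, hidden=16, n_options=2, action_dim=1):
    return {
        # Options network: state followed by three FC layers.
        "options": build([state_dim, hidden, hidden, n_options], rng),
        # Actions network: state followed by four FC layers.
        "actions": build([state_dim, hidden, hidden, hidden, action_dim], rng),
        # Q values network: one FC layer on the state, one on the action,
        # then four FC layers on their concatenation.
        "q_state": build([state_dim, hidden], rng),
        "q_action": build([action_dim, hidden], rng),
        "q_head": build([2 * hidden, hidden, hidden, hidden, action_dim], rng),
    }


def forward(state, nets):
    options_q = stack(state, nets["options"])
    action = stack(state, nets["actions"])
    s_feat = stack(state, nets["q_state"])
    a_feat = stack(action, nets["q_action"])
    q_values = stack(s_feat + a_feat, nets["q_head"])  # list concatenation
    return options_q, action, q_values
```

For example, `forward([0.1] * 8, build_nets(8, random.Random(0)))` returns one value per option candidate, one continuous action value, and one Q value for that action.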
Regarding claim 8, Silva doesn’t explicitly teach learning an optimal policy via the neural network using reinforcement learning.
However, Shalev-Shwartz, in the same field of endeavor, teaches learning an optimal policy via the neural network using reinforcement learning (Shalev-Shwartz: Para. 0070, 0297, 0298; given that a host vehicle is now in state, s, the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on modeling a choice of actions; Markov Decision Process (MDP); a trained system, such as a neural network).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
In the following limitation, Silva teaches implementing the optimal policy to complete the maneuver at the intersection (Silva: Para. 0014, 0040; the vehicle can be 
Regarding claim 9, Silva teaches the method of claim 1, wherein the maneuver comprises one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection (Silva: Para. 0016; unprotected left turns, free right turns, or negotiating complex intersections reads on traversing straight through the intersection, turning left at the intersection, or turning right at the intersection). 
Regarding claim 10, Silva teaches a system in an autonomous vehicle (AV) for executing a maneuver at an intersection, the system comprising an intersection maneuver module that comprises one or more processors configured by programming instructions encoded in non-transitory computer readable media, the intersection maneuver module (Silva: Para. 0088; the processors of the computing devices can be any suitable processor capable of executing instructions to process data and perform operations as described herein reads on one or more processors configured by programming instructions encoded in non-transitory computer readable media) configured to: determine, from vehicle sensor data and road geometry data, a plurality of range measurements, each range measurement determined from a unique ray extending from a starting point on the AV to an ending point that is terminated by an obstacle in the path of that ray or a pre-determined maximum distance (Silva: Para. 0014, 0052, 0067, 0088, 0112, 0113; processor; image data can be captured in an environment, and the image data can be segmented to generate segmented image data indicating drivable regions e.g., road surfaces that are not occupied by an object reads on road geometry data; a first number of expected LIDAR returns associated with the occlusion field and/or a second number of actual LIDAR returns associated with the occlusion field reads on a unique ray extending from a starting point on the AV to an ending point that is terminated by an obstacle in the path of that ray or a pre-determined maximum distance); determine, from vehicle sensor data, obstacle velocity data, wherein the obstacle velocity data comprises a velocity of an obstacle determined to be at the ending point of the rays (Silva: Para. 0014, 0088, 0112; processor; receiving LIDAR data, image data, and/or other sensor data to measure, as a measured trace, a trajectory e.g., positions, orientations, and velocities of the dynamic object in the environment reads on obstacle velocity data comprises a velocity of an obstacle; LIDAR returns reads on ending point of the rays).
Silva doesn’t explicitly teach determine vehicle state data wherein the vehicle state data includes a velocity of the AV, a distance to a stop line, a distance to a midpoint of an intersection, and a distance to a goal; apply the plurality of range measurements, the obstacle velocity data and the vehicle state data as inputs to a neural network (NN) that is trained to determine a set of discrete behavior actions for the AV to perform comprising a trust option candidate and a do not trust option candidate wherein the trust option candidate comprises one of the traversing straight through the intersection, turning left at the intersection, or turning right at the intersection and the do not trust option candidate comprises a different one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection, wherein the NN is further trained to determine a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV. 
However, Silva is interpreted to disclose an equivalent teaching. The system measures a distance to a stop line, a check line, and an exit of the intersection (Silva: Para. 0093). The check line can represent the furthest into the intersection the vehicle can navigate without interrupting the flow of the traffic (Silva: Para. 0093). The exit point for the intersection is equivalent to a goal distance. It would have been obvious to one of ordinary skill in the art to have a midpoint of the intersection instead of the check line and the intersection exit in order to represent a line or point within the intersection as a reference.
Silva includes information received from various sensors to measure positions, orientations, and velocity of dynamic objects in the environment (Silva: Para. 0112). The system determines characteristics of the area including a distance across the intersection and the acceleration and velocity of the autonomous vehicle (Silva: Para. 0020). Silva discloses left turns, right turns and traversing straight through the intersection as autonomous vehicle navigation maneuvers (Silva: Para. 0016) using intersection region characteristics such as a distance across the intersection and dynamic object trajectories and selecting and controlling the acceleration and velocity in order for the autonomous vehicle to traverse the intersection (Silva: Para. 0020, 0040-0041, 0112) decided by a neural network algorithm (Silva: Para. 0074). Silva discloses multiple trajectories through an intersection being simultaneously generated based on the occlusion grids and using logic to evaluate the trajectories, selecting one for the vehicle to navigate (Silva: Para. 0058-0059). From these inputs, Silva discloses 
A generic intersection includes vehicle traversing options of right, left and straight. Silva's disclosed straight, right, and left generated trajectories are equivalent to the claimed option candidates. The act of selecting one of the trajectories (right, left, or straight) creates a trusted option, while the two options not selected would be do not trust options. The multiple trajectory options include occlusion information, dynamic trajectory calculations for objects in the environment, and velocity and acceleration for the autonomous vehicle to traverse the intersection. Silva considers straight, right, and left navigation of a vehicle through an intersection, where trajectories are generated, and one trajectory with velocity and acceleration control is enacted for the vehicle to enter and exit the intersection. Silva discloses the equivalent of a trust option candidate in the chosen trajectory of the multiple trajectories. The do not trust option candidates are the non-selected but generated trajectories.
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to determine the velocity of the AV, a distance to a stop line, a distance to a midpoint of an intersection, and a distance to a goal including trust and do not trust option candidates with unique acceleration or deceleration trajectory control action in order to capture image data of the environment and segment the data into drivable regions (Silva: Para. 0052).
In the following limitation, Silva teaches determine, using the NN, the set of discrete behavior actions for the AV at the intersection comprising the trust option candidate and the do not trust option candidate and the unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV (Silva: Para. 0016, 0020, 0041, 0058-0059, 0074, 0112; distance across the intersection; acceleration level and/or average velocity of the autonomous vehicle; unprotected left turns, free right turns, or negotiating complex intersections; multiple trajectories can be substantially simultaneously generated in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle to navigate).
Silva doesn’t explicitly teach apply a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform.
However, Shalev-Shwartz, in the same field of endeavor, teaches apply a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform (Shalev-Shwartz: Para. 0003, 0070, 0192, 0248, 0256, 0298, 0328, Fig. 11E; hierarchical set of decisions organized as a Directed Acyclic Graph; two layers in the long range planning analysis, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis; the value of Q^π(s, a) may indicate the effect of performing action a on the future; change lanes (e.g., to the left or to the right side) or whether to stay in the current lane; travel from one road to another road at appropriate intersections or interchanges).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
In the following limitation, Silva teaches communicate a message to vehicle controls conveying the chosen unique trajectory control action associated with the discrete behavior action (Silva: Para. 0058, 0060, 0088; communicate with and/or control corresponding systems of the drive module reads on communicating; processor; determine a path for the vehicle to follow to traverse through an environment reads on the chosen unique trajectory control action); and cause the AV to execute the unique trajectory control action at the intersection (Silva: Para. 0073, 0074; the components in the memory (and the memory, discussed below) can be implemented as a neural network reads on applying the state vector as an input to a neural network).
Regarding claim 12, Silva teaches the system of claim 10, wherein the intersection maneuver module is configured to determine a set of discrete behavior actions and a unique trajectory control action associated with each discrete behavior action by: generating a state vector including the vehicle state data, the distance of each ray, and the velocity of obstacles at the end-points of the rays (Silva: Para. 0020, 
Regarding claim 14, Silva teaches the system of claim 10, wherein the neural network comprises: a hierarchical options network configured to produce two hierarchical option candidates, the two hierarchical option candidates each including a trust option candidate and a do not trust option candidate (Silva: Para. 0015, 0074, 0075; clustering algorithms e.g., k-means, k-medians, expectation maximization EM, hierarchical clustering reads on hierarchical options; determine the occlusion state and occupancy state of an occlusion field with a confidence level also referred to as a confidence value that meets or exceeds a threshold value reads on a trust option candidate and a do not trust option candidate); an actions network configured to produce lower level continuous action choices for acceleration and deceleration (Silva: Para. 0041, 0074; neural network reads on actions network; selecting an acceleration level and/or velocity for the vehicle reads on produce lower level continuous action choices for acceleration and deceleration).
Silva doesn’t explicitly teach a Q values network configured to produce Q values corresponding to the lower level continuous action choices for acceleration and deceleration.
However, Shalev-Shwartz, in the same field of endeavor, teaches a Q values network configured to produce Q values corresponding to the lower level continuous action choices for acceleration and deceleration (Shalev-Shwartz: Para. 0070, 0177, 0192, 0298; given that a host vehicle is now in state, s, the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on produce Q values; where S is a set of states and A ⊂ ℝ^2 is the action space e.g., desired speed, acceleration, yaw commands, etc. reads on corresponding to the lower level continuous action choices for acceleration and deceleration).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0297-0298).
Regarding claim 15, Silva teaches the system of claim 14, wherein the intersection maneuver module is further configured to: decide using the hierarchical option candidates that the AV can trust the environment (Silva: Para. 0015, 0074, 0075; clustering algorithms e.g., k-means, k-medians, expectation maximization EM, hierarchical clustering reads on hierarchical options; determine the occlusion state and occupancy state of an occlusion field with a confidence level also referred to as a confidence value that meets or exceeds a threshold value reads on that the AV can trust the environment); and decide to implement the unique trajectory control action provided by the neural network (Silva: Para. 0073, 0074; the components in the memory can be implemented as a neural network reads on deciding to implement the unique trajectory control action provided by the neural network).
Regarding claim 16, Silva doesn’t explicitly teach a hierarchical options network wherein an input state vector si is followed by three fully connected (FC) layers to generate a Q-values matrix Ot corresponding to two hierarchical option candidates; an actions network wherein the input state vector si is followed by four FC layers to produce a continuous action vector at; and a values network that receives the output of a concatenation of the input state vector si followed by an FC layer with the continuous action vector at followed by one FC layer, wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Qt which corresponds to the action vector at.
However Shalev-Shwartz, in the same field of endeavor, teaches a hierarchical options network wherein an input state vector si is followed by three fully connected (FC) layers to generate a Q-values matrix Ot corresponding to two hierarchical option candidates (Shalev-Shwartz: Para. 0070, 0192, 0328, 0298; hierarchical set of decisions organized as a Directed Acyclic Graph reads on hierarchical options network; two layers in the long range planning analysis e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to projected future states, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis reads on three fully connected layers; the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on Q-values); an actions network wherein the input state vector si is followed by four FC layers to produce a continuous action vector at (Shalev-Shwartz: Para. 0070, 0328; neural network; long range planning analysis e.g., a first ; and a values network that receives the output of a concatenation of the input state vector si followed by an FC layer with the continuous action vector at followed by one FC layer, wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Qt which corresponds to the action vector at (Shalev-Shwartz: Para. 0070, 0328, 0298; neural network; long range planning analysis e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to projected future states, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis could be used in selecting from among available potential actions in response to a current navigational state reads on four FC layers to produce a continuous action vector at; the Q function may provide a local measure of the quality of an action reads on a Q-values vector Qt which corresponds to the action vector).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
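For illustration only, the three-network arrangement recited in claim 16 can be sketched as follows. This is a minimal sketch and is not taken from the application, Silva, or Shalev-Shwartz: the layer width (`hidden`), the ReLU activation, the random initialization, and all function names (`fc`, `mlp`, `build`, `forward`) are assumptions chosen only to make the recited layer counts concrete.

```python
import random

def fc(n_in, n_out, rng):
    """Create one fully connected (FC) layer as (weights, biases)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def apply_fc(layer, x, relu=True):
    """Apply one FC layer, optionally followed by ReLU (an assumed activation)."""
    w, b = layer
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y

def mlp(sizes, rng):
    """Stack FC layers; len(sizes) - 1 layers are created."""
    return [fc(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def run(layers, x):
    """Run a stack of FC layers; the last layer is left linear."""
    for i, layer in enumerate(layers):
        x = apply_fc(layer, x, relu=(i < len(layers) - 1))
    return x

def build(state_dim, action_dim, hidden=16, seed=0):
    """Assemble the networks; widths are hypothetical."""
    rng = random.Random(seed)
    return {
        # state -> three FC layers -> Q-values for two option candidates
        "options": mlp([state_dim, hidden, hidden, 2], rng),
        # state -> four FC layers -> continuous action vector
        "actions": mlp([state_dim, hidden, hidden, hidden, action_dim], rng),
        # one FC layer on the state and one on the action, then concatenation
        "q_state": mlp([state_dim, hidden], rng),
        "q_action": mlp([action_dim, hidden], rng),
        # concatenation -> four FC layers -> Q-values for the action vector
        "q_head": mlp([2 * hidden, hidden, hidden, hidden, action_dim], rng),
    }

def forward(net, s):
    """Return (option Q-values, action vector, action Q-values) for state s."""
    o_t = run(net["options"], s)
    a_t = run(net["actions"], s)
    joined = run(net["q_state"], s) + run(net["q_action"], a_t)
    q_t = run(net["q_head"], joined)
    return o_t, a_t, q_t
```

Calling `forward(build(state_dim=6, action_dim=1), [0.0] * 6)` yields two option values, one action value, and one Q value under these assumed dimensions.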
Regarding claim 17, Silva teaches the system of claim 16, wherein the intersection maneuver module is configured to choose a discrete behavior action and a unique trajectory control action to perform by: learning an optimal policy via the neural network using reinforcement learning (Silva: Para. 0073; neural network; machine learning algorithm reads on reinforcement learning); and implementing the optimal policy to complete the maneuver at the intersection (Silva: Para. 0014, 0040; the vehicle can be controlled to follow a trajectory to traverse the intersection reads on implementing the optimal policy to complete the maneuver at the intersection).
Regarding claim 18, Silva teaches an autonomous vehicle (AV), comprising: one or more sensing devices configured to generate vehicle sensor data (Silva: Para. 0022; using multiple sensor modalities e.g., LIDAR sensors, image sensors, RADAR sensors, etc. can improve an overall confidence level associated with an occlusion state or occupancy state of an occlusion field reads on one or more sensing devices configured to generate vehicle sensor data); and an intersection maneuver module (Silva: Para. 0065; occlusion monitoring component reads on intersection maneuver module) configured to: determine, from vehicle sensor data and road geometry data, a plurality of range measurements, each range measurement determined from a unique ray extending from a starting point on the AV to an ending point that is terminated by an obstacle in the path of that ray or a pre-determined maximum distance (Silva: Para. 0014, 0052, 0067, 0088, 0112, 0113; processor; image data can be captured in an environment, and the image data can be segmented to generate segmented image data indicating drivable regions e.g., road surfaces that are not occupied by an object reads on road geometry data; a first number of expected LIDAR returns associated with the ; determine, from vehicle sensor data, obstacle velocity data, wherein the obstacle velocity data comprises a velocity of an obstacle determined to be at the ending point of the rays (Silva: Para. 0014, 0088, 0112; processor; receiving LIDAR data, image data, and/or other sensor data to measure, as a measured trace, a trajectory e.g., positions, orientations, and velocities of the dynamic object in the environment reads on obstacle velocity data comprises a velocity of an obstacle; LIDAR returns reads on ending point of the rays).
Silva doesn’t explicitly teach determine vehicle state data wherein the vehicle state data includes a velocity of the AV, a distance to a stop line, a distance to a midpoint of an intersection, and a distance to a goal; apply the plurality of range measurements, the obstacle velocity data and the vehicle state data as inputs to a neural network (NN) that is trained to determine a set of discrete behavior actions for the AV to perform comprising a trust option candidate and a do not trust option candidate wherein the trust option candidate comprises one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection and the do not trust option candidate comprises a different one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection, wherein the NN is further trained to determine a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV. 

Silva includes information received from various sensors to measure positions, orientations, and velocity of dynamic objects in the environment (Silva: Para. 0112). The system determines characteristics of the area including a distance across the intersection and the acceleration and velocity of the autonomous vehicle (Silva: Para. 0020). Silva discloses left turns, right turns and traversing straight through the intersection as autonomous vehicle navigation maneuvers (Silva: Para. 0016) using intersection region characteristics such as a distance across the intersection and dynamic object trajectories and selecting and controlling the acceleration and velocity in order for the autonomous vehicle to traverse the intersection (Silva: Para. 0020, 0040-0041, 0112) decided by a neural network algorithm (Silva: Para. 0074). Silva discloses multiple trajectories through an intersection being simultaneously generated based on the occlusion grids and using logic to evaluate the trajectories, selecting one for the vehicle to navigate (Silva: Para. 0058-0059). From these inputs, Silva discloses processors (Silva: Para. 0088) using a number of decision layers in a neural network to generate a chosen trajectory based on learned parameters (Silva: Para. 0074, 0016). 
A generic intersection includes vehicle traversing options of right, left and straight. Silva's disclosed straight, right, and left generated trajectories are equivalent to the claimed option candidates. The act of selecting one of the trajectories (right, left, or straight) creates a trusted option, while the two options not selected would be do not trust options. The multiple trajectory options include occlusion information, dynamic trajectory calculations for objects in the environment and velocity and acceleration for the autonomous vehicle to traverse the intersection. Silva considers straight, right, and left navigation of a vehicle through an intersection, where trajectories are generated, and one trajectory with velocity and acceleration control is enacted for the vehicle to enter and exit the intersection. Silva discloses the equivalent of a trust option candidate in the chosen trajectory of the multiple trajectories. The do not trust option candidates are the non-selected but generated trajectories.
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to determine the velocity of the AV, a distance to a stop line, a distance to a midpoint of an intersection, and a distance to a goal including trust and do not trust option candidates with unique acceleration or deceleration trajectory control action in order to capture image data of the environment and segment the data into drivable regions (Silva: Para. 0052).
In the following limitation, Silva teaches determine, using the NN, the set of discrete behavior actions for the AV at the intersection comprising the trust option candidate and the do not trust option candidate and the unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV (Silva: Para. 0016, 0020, 0041, 0058-0059, 0074, 0112; distance across the intersection; acceleration level and/or average velocity of the autonomous vehicle; unprotected left turns, free right turns, or negotiating complex intersections; multiple trajectories can be substantially simultaneously generated in accordance with a receding horizon technique, wherein one of the multiple trajectories is selected for the vehicle to navigate).
Silva doesn’t explicitly teach apply a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform.
However Shalev-Shwartz, in the same field of endeavor, teaches apply a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning right at the intersection to trust and the associated unique trajectory control action comprising acceleration or deceleration for the AV to perform (Shalev-Shwartz: Para. 0003, 0070, 0192, 0248, 0256, 0298, 0328, Fig.11E; hierarchical set of decisions organized as a Directed Acyclic Graph; two layers in the long range planning analysis, analysis based on more layers may be possible; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis; the value of Q^π(s, a) may indicate the effect of performing action a on 
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
In the following limitation, Silva teaches communicate a message to vehicle controls conveying the chosen unique trajectory control action associated with the discrete behavior action (Silva: Para. 0058, 0060, 0088; communicate with and/or control corresponding systems of the drive module reads on communicating; processor; determine a path for the vehicle to follow to traverse through an environment reads on the chosen unique trajectory control action); and cause the AV to execute the unique trajectory control action at the intersection (Silva: Para. 0050; vehicle can be controlled to follow a trajectory to traverse the intersection).
Regarding claim 21, Silva teaches the autonomous vehicle of claim 18, wherein the neural network comprises: a hierarchical options network wherein an input state vector si is followed by three fully connected (FC) layers to generate a Q-values matrix Ot corresponding to two hierarchical option candidates (Silva: Para. 0058, 0059, 0088, 0298; hierarchical set of decisions organized as a Directed Acyclic Graph reads on hierarchical options network; two layers in the long range planning analysis (e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to  
Silva doesn’t explicitly teach an actions network wherein the input state vector si is followed by four FC layers to produce a continuous action vector at.
However Shalev-Shwartz, in the same field of endeavor, teaches an actions network wherein the input state vector si is followed by four FC layers to produce a continuous action vector at (Shalev-Shwartz: Para. 0070, 0328; neural network; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis could be used in selecting from among available potential actions in response to a current navigational state reads on four FC layers to produce a continuous action vector at).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).
Regarding claim 22, Silva doesn’t explicitly teach a Q values network that receives the output of a concatenation of the input state vector si followed by an FC layer with the continuous action vector at followed by one FC layer, wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Qt which corresponds to the action vector at.
However Shalev-Shwartz, in the same field of endeavor, teaches a Q values network that receives the output of a concatenation of the input state vector si followed by an FC layer with the continuous action vector at followed by one FC layer (Shalev-Shwartz: Para. 0328, 0298; a host vehicle is now in state, s, the value of Q^π(s, a) may indicate the effect of performing action a on the future reads on a Q values network that receives the output of a concatenation of the input state vector si; two layers in the long range planning analysis e.g., a first layer considering the rewards resulting from potential actions to a current state, and a second layer considering the rewards resulting from future action options in response to projected future states reads on FC layer with the continuous action vector at followed by one FC layer), wherein the Q values network is configured to produce, through four FC layers, a Q-values vector Qt which corresponds to the action vector at (Shalev-Shwartz: Para. 0070, 0328, 0298; neural network; basing the long range planning analysis upon one or two layers, three, four or more layers of analysis could be used in selecting from among available potential actions in response to a current navigational state reads on four FC layers to produce a continuous action vector at).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Shalev-Shwartz into Silva in order to employ a Markov Decision Process using Q value functions to indicate the effect of performing an action on the future (Shalev-Shwartz: Para. 0298).


Claims 2, 11 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Silva et al. (US Publication 2019/0384309 A1) and Shalev-Shwartz et al. (US Publication 2019/0377354 A1) and in further view of Zeng (US Publication 2012/0310516 A1). 
Regarding claim 2, Silva and Shalev-Shwartz don’t explicitly teach constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV; dividing the virtual grid into a plurality of sub-grids.
However Zeng, in the same field of endeavor, teaches constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV (Zeng: Para. 0039, Fig. 3; sub-map is created, origin may be defined as the point where vehicle is located when sub-map is created reads on constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV); dividing the virtual grid into a plurality of sub-grids (Zeng: Para. 0037; large map may be divided into a plurality of sub-maps reads on dividing the virtual grid into a plurality of sub-grids).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Zeng into Silva and Shalev-Shwartz in order to create a sub-map of measured objects relative to a vehicle (Zeng: Para. 0035).
In the following limitation, Silva teaches assigning an occupied characteristic to a sub-grid when an obstacle or moving object is present in the area represented by the sub-grid (Silva: Para. 0014, 0019, 0067; determine the occupancy state of the occlusion field e.g., occupied, unoccupied, or indeterminate reads on assigning an occupied characteristic to a sub-grid).
Silva and Shalev-Shwartz don’t explicitly teach tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance.
However Zeng, in the same field of endeavor, teaches tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance. 
Zeng includes relative object location data being stored as a vector (Zeng: Para. 0032, 0035), where a LIDAR device measures the relative angle between the object and the vehicle (Zeng: Para. 0033). The sub-map may be limited by the effective range of the sensors, where the LIDAR sensor has a range of 80 to 140 meters (Zeng: Para. 0038). The system creates a sub-map where the origin is defined as the point where the vehicle is located when the sub-map is created (Zeng: Para. 0039). Figure 3 includes a map where rays extend from the middle front of the vehicle and stop at the obstacles around the vehicle (Zeng: Fig. 3). The prior art's rays depicted in Figure 3, created using LIDAR detectors to determine the location of obstacles around the vehicle, are interpreted as tracing the rays from the middle front of the vehicle indicating an obstacle.

In the following limitation, Silva teaches determining, for each ray, the distance of that ray and the velocity of an obstacle at the end-point of that ray (Silva: Para. 0067, 0112; ray casting component can include functionality to receive LIDAR data and to ray cast the LIDAR data through the occlusion fields to determine an occlusion state and/or an occupancy state of a particular occlusion field reads on determining, for each ray, the distance of that ray;  LIDAR data, image data, and/or other sensor data to measure, as a measured trace, a trajectory e.g., positions, orientations, and velocities of the dynamic object in the environment reads on velocity of an obstacle at the end-point of that ray).
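For illustration only, the ray tracing limitation of claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of the application, Silva, or Zeng: the grid is a list of lists of booleans, the vehicle faces toward decreasing row indices, rays are sampled at a fixed step rather than traversed cell by cell, and the function name `trace_rays` and the `velocities` lookup table are hypothetical.

```python
import math

def trace_rays(grid, origin, angles_deg, max_range, velocities, step=0.5):
    """Return one (range, obstacle velocity) pair per ray.

    grid       -- 2D list of booleans, True where a sub-grid is occupied
    origin     -- (row, col) of the sub-grid at the middle front of the vehicle
    angles_deg -- ray angles in degrees, 0 meaning straight ahead (assumed)
    max_range  -- pre-determined maximum ray length, in sub-grid units
    velocities -- hypothetical dict mapping an occupied (row, col) to a speed
    """
    rows, cols = len(grid), len(grid[0])
    results = []
    for angle in angles_deg:
        theta = math.radians(angle)
        dist = 0.0
        hit = (max_range, 0.0)  # default: no obstacle within the maximum range
        while dist <= max_range:
            r = int(round(origin[0] - dist * math.cos(theta)))
            c = int(round(origin[1] + dist * math.sin(theta)))
            if not (0 <= r < rows and 0 <= c < cols):
                break  # the ray left the virtual grid without hitting anything
            if grid[r][c]:
                # the ray ends at the first occupied sub-grid it reaches
                hit = (dist, velocities.get((r, c), 0.0))
                break
            dist += step
        results.append(hit)
    return results
```

Each returned pair corresponds to one ray: its length up to the first occupied sub-grid (or the maximum range) and the velocity recorded for the obstacle at that end point, with 0.0 used here as an assumed placeholder when no obstacle is found.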
Regarding claim 11, Silva and Shalev-Shwartz don’t explicitly teach constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV; dividing the virtual grid into a plurality of sub-grids.
However Zeng, in the same field of endeavor, teaches constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV (Zeng: Para. 0039, Fig. 3; sub-map is created, origin may be defined as the point where vehicle is located when sub-map is created reads on constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV, the virtual grid being interpreted as a map whose origin is located at the vehicle); dividing the virtual grid into a plurality of sub-grids (Zeng: Para. 0037; large map may be divided into a plurality of sub-maps reads on dividing the virtual grid into a plurality of sub-grids).

In the following limitation, Silva teaches assigning an occupied characteristic to a sub-grid when an obstacle or moving object is present in the area represented by the sub-grid (Silva: Para. 0014, 0019, 0067; determine the occupancy state of the occlusion field e.g., occupied, unoccupied, or indeterminate reads on assigning an occupied characteristic to a sub-grid).
Silva and Shalev-Shwartz don’t explicitly teach tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance. 
However Zeng, in the same field of endeavor, teaches tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance. 
Zeng includes relative object location data being stored as a vector (Zeng: Para. 0032, 0035), where a LIDAR device measures the relative angle between the object and the vehicle (Zeng: Para. 0033). The sub-map may be limited by the effective range of the sensors, where the LIDAR sensor has a range of 80 to 140 meters (Zeng: Para. 0038). The system creates a sub-map where the origin is defined as the point where the 
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Zeng into Silva and Shalev-Shwartz in order to create a sub-map of measured objects relative to a vehicle (Zeng: Para. 0035).
In the following limitation, Silva teaches determining, for each ray, the distance of that ray and the velocity of an obstacle at the end-point of that ray (Silva: Para. 0067, 0112; ray casting component can include functionality to receive LIDAR data and to ray cast the LIDAR data through the occlusion fields to determine an occlusion state and/or an occupancy state of a particular occlusion field reads on determining, for each ray, the distance of that ray; LIDAR data, image data, and/or other sensor data to measure, as a measured trace, a trajectory e.g., positions, orientations, and velocities of the dynamic object in the environment reads on velocity of an obstacle at the end-point of that ray).
Regarding claim 19, Silva and Shalev-Shwartz don’t explicitly teach wherein the intersection maneuver module is configured to determine a plurality of range measurements and determine obstacle velocity data by: constructing a computer-generated virtual grid around the AV with a center of the virtual grid located at a middle front of the AV; dividing the virtual grid into a plurality of sub-grids.
However Zeng, in the same field of endeavor, teaches wherein the intersection maneuver module is configured to determine a plurality of range measurements and determine obstacle velocity data (Zeng: Para. 0032, 0033; relative object location data may be stored as a vector reads on determine a plurality of range measurements and determine obstacle velocity data) by: constructing a computer-generated virtual grid around the AV with a center of the virtual grid located at a middle front of the AV (Zeng: Para. 0039, Fig. 3; sub-map is created, origin may be defined as the point where vehicle is located when sub-map is created reads on constructing a computer-generated virtual grid around the AV with the center of the virtual grid located at a middle front of the AV); dividing the virtual grid into a plurality of sub-grids (Zeng: Para. 0037; large map may be divided into a plurality of sub-maps reads on dividing the virtual grid into a plurality of sub-grids).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Zeng into Silva and Shalev-Shwartz in order to create a sub-map of measured objects relative to a vehicle (Zeng: Para. 0035).
In the following limitation, Silva teaches assigning an occupied characteristic to a sub-grid when an obstacle or moving object is present in the area represented by the sub-grid (Silva: Para. 0014, 0019, 0067; determine the occupancy state of the occlusion field e.g., occupied, unoccupied, or indeterminate reads on assigning an occupied characteristic to a sub-grid).
Silva and Shalev-Shwartz don’t explicitly teach tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance.
However Zeng, in the same field of endeavor, teaches tracing, through the virtual grid, a plurality of linear rays emitted from the middle front of the AV at a plurality of unique angles that covers a front of the AV, wherein each ray begins at the middle front of the AV and ends when it reaches an occupied sub-grid indicating an obstacle or a pre-determined distance. 
Zeng includes relative object location data being stored as a vector (Zeng: Para. 0032, 0035), where a LIDAR device measures the relative angle between the object and the vehicle (Zeng: Para. 0033). The sub-map may be limited by the effective range of the sensors, where the LIDAR sensor has a range of 80 to 140 meters (Zeng: Para. 0038). The system creates a sub-map where the origin is defined as the point where the vehicle is located when the sub-map is created (Zeng: Para. 0039). Figure 3 includes a map where rays extend from the middle front of the vehicle and stop at the obstacles around the vehicle (Zeng: Fig. 3). The prior art's rays depicted in Figure 3, created using LIDAR detectors to determine the location of obstacles around the vehicle, are interpreted as tracing the rays from the middle front of the vehicle indicating an obstacle.
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to have incorporated Zeng into Silva and Shalev-Shwartz in order to create a sub-map of measured objects relative to a vehicle (Zeng: Para. 0035).
In the following limitation, Silva teaches determining, for each ray, the distance of that ray and the velocity of an obstacle at the end-point of that ray (Silva: Para. 0067, 
Regarding claim 20, Silva teaches the autonomous vehicle of claim 19, wherein: the intersection maneuver module is configured to determine a set of discrete behavior actions and a unique trajectory control action associated with each discrete behavior action (Silva: Para. 0016, 0020, 0041, 0088, 0112; distance across the intersection reads on range measurements; acceleration level and/or average velocity of the autonomous vehicle reads on vehicle state data; unprotected left turns, free right turns, or negotiating complex intersections reads on a set of discrete behavior actions; the operation can include selecting an acceleration level and/or velocity for the vehicle reads on unique trajectory control action associated with each discrete behavior action) by: generating a state vector including the vehicle state data, the distance of each ray, and the velocity of obstacles at the end-points of the rays (Silva: Para. 0020, 0051, 0056; determine a position and/or orientation of the vehicle e.g., one or more of an x-, y-, z-position, roll, pitch, or yaw reads on generating a state vector including the vehicle state data; determining an extent of an un-occluded and unoccupied region relative to a distance of an intersection for an autonomous vehicle to travel and/or relative to a velocity, acceleration, and/or time for the autonomous vehicle to traverse a portion of  


Response to Arguments
Applicant’s arguments, filed 4 December 2020, with respect to the rejection of claims 1-3, 5-12, and 14-22 under 35 U.S.C. §103 have been fully considered, but are not persuasive. 
Applicant argues that Silva does not disclose "a neural network (NN) that is trained to determine a set of discrete behavior actions for the AV to perform comprising a trust option candidate and a do not trust option candidate wherein the trust option candidate comprises one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection and the do not trust option candidate comprises a different one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection, wherein the NN is further trained to determine a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV."
In response to the applicant’s argument above, the system determines characteristics of the area including a distance across the intersection and the acceleration and velocity of the autonomous vehicle (Silva: Para. 0020). Silva discloses left turns, right turns and traversing straight through the intersection as autonomous vehicle navigation maneuvers (Silva: Para. 0016) using intersection regions characteristics such as a distance across the intersection and dynamic object 
A generic intersection includes vehicle traversing options of right, left and straight. Silva's disclosed straight, right, and left generated trajectories are equivalent to the claimed option candidates. The act of selecting one of the trajectories (right, left, or straight) creates a trusted option, while the two options not selected would be do not trust options. The multiple trajectory options include occlusion information, dynamic trajectory calculations for objects in the environment and velocity and acceleration for the autonomous vehicle to traverse the intersection. Silva considers straight, right, and left navigation of a vehicle through an intersection, where trajectories are generated, and one trajectory with velocity and acceleration control is enacted for the vehicle to enter and exit the intersection. Silva discloses the equivalent of a trust option candidate in the chosen trajectory of the multiple trajectories. The do not trust option candidates are the non-selected but generated trajectories.
13 suggestion by Silva of providing a neural network that is trained to determine both a trust option candidate and a do not trust option candidate. Because Silva does not disclose or suggest both a trust option candidate and a do not trust option candidate, Silva cannot disclose or suggest a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV (each discrete behavior action comprising the trust option candidate and the do not trust option candidate).
In response to the applicant’s argument above, Silva discloses left turns, right turns and traversing straight through the intersection as autonomous vehicle navigation maneuvers (Silva: Para. 0016), using intersection region characteristics such as a distance across the intersection and dynamic object trajectories, and selecting and controlling the acceleration and velocity in order for the autonomous vehicle to traverse the intersection (Silva: Para. 0020, 0040-0041, 0112), as decided by a neural network algorithm (Silva: Para. 0074). Silva discloses multiple trajectories through an intersection being simultaneously generated based on the occlusion grids, and using logic to evaluate the trajectories and select one for the vehicle to navigate (Silva: Para. 0058-0059). A generic intersection includes vehicle traversing options of right, left and straight. Silva's disclosed straight, right, and left generated trajectories are equivalent to the claimed option candidates. The act of selecting one of the trajectories (right, left, or straight) creates a trusted option, while the two non-selected options would be do not trust options. The multiple trajectory options include occlusion information, dynamic trajectory calculations for objects in the environment, and velocity and acceleration for the autonomous vehicle to traverse the intersection. Silva considers straight, right, and 
Applicant next argues that Silva does not and cannot disclose or suggest "determining, using the NN, the set of discrete behavior actions for the AV at the intersection comprising the trust option candidate and the do not trust option candidate and the unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV."
In response to the applicant’s argument above, the system simultaneously generates multiple trajectories, selects one trajectory for the vehicle to navigate, and determines the locations and instructions for guiding the AV along each possible trajectory (Silva: Para. 0058, 0059). From these inputs, Silva discloses processors (Silva: Para. 0088) using a number of decision layers in a neural network to generate a chosen trajectory based on learned parameters (Silva: Para. 0074, 0016). The neural network determines guiding instructions for multiple AV trajectories.
Applicant next argues that the Shalev-Shwartz reference does not make up for the lack of disclosure by the Silva reference of features of claim 1. The Shalev-Shwartz reference does not disclose "applying a Markov Decision Process, by the processor, to choose a discrete behavior action from the set of discrete behavior actions comprising traversing straight through the intersection, turning left at the intersection, or turning 
In response to the applicant’s argument above, Shalev-Shwartz was used only to reject claim 1’s limitation above. Shalev-Shwartz discloses a neural network that uses a Markov Decision Process (MDP) model to determine the unique trajectory based on a Q function. The quality is enhanced by taking into account the future effect of performing any given navigational instruction of the multiple trajectories in order to determine the best unique chosen trajectory for the vehicle (Shalev-Shwartz: Para. 0070, 0298).
Applicant next argues that there would be no motivation to combine the claimed feature with the Silva reference since Silva would have no need or use for a Markov Decision Process since it does not disclose or suggest "determining, using the NN ... the trust option candidate and the do not trust option candidate and the unique trajectory control action comprising acceleration or deceleration associated with each" and the need to select one of the trust option candidate and the do not trust option candidate to trust. The only reason to add the feature would be impermissible hindsight reconstruction.
In response to applicant's argument above that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
Silva teaches a neural network system to determine multiple trajectories and choose a unique trajectory (Silva: Para. 0058, 0074). Shalev-Shwartz’s neural network uses four layers of analysis and the Markov Decision Process’s Q function to indicate the future effect of each trajectory in order to determine the trajectory of an autonomous vehicle (Shalev-Shwartz: Para. 0070, 0328, 0298). Because Shalev-Shwartz’s Markov Decision Process in a neural network chooses an autonomous vehicle’s trajectory, Shalev-Shwartz is analogous art.
Applicant next argues that Shalev-Shwartz fails to teach or suggest: (a) "applying the plurality of range measurements, the obstacle velocity data and the vehicle state data as inputs to a neural network (NN) that is trained to determine a set of discrete behavior actions for the AV to perform comprising a trust option candidate and a do not trust option candidate wherein the trust option candidate comprises one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection and the do not trust option candidate comprises a different one of traversing straight through the intersection, turning left at the intersection, or turning right at the intersection, wherein the NN is further trained to determine a unique trajectory control action comprising acceleration or deceleration associated with each discrete behavior action for the AV"; and (b) "determining, using the NN, the set of discrete behavior actions for the AV at the intersection comprising the trust option candidate and the do not trust option candidate and the unique trajectory control 
In response to the applicant’s argument above, Shalev-Shwartz was not used to reject those claimed limitations.
Applicant next argues that the Zeng reference does not make up for the lack of disclosure by the Silva and Shalev-Shwartz references of features of claim 1.
In response to the applicant’s argument above, Zeng was not used to reject claim 1.
Applicant argues that “independent claims 10 and 18 recite limitations similar to the features discussed above and are patentable for the same reasons.”
Independent claim 1 is rejected over the prior art; therefore, claims 10 and 18 are rejected for the same reasons.
Applicant argues that “because the independent claims are patentable, the dependent claims are also patentable.”
The independent claims are rejected over the prior art; therefore, the dependent claims are also rejected.
The applicant’s arguments have failed to point out the distinguishing characteristics of the amended claim language over the prior art. For the above reasons, Silva’s autonomous vehicle path planning reads on applicant’s decisions at intersections using hierarchical options Markov decision process. The rejection is maintained. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LAURA E LINHARDT whose telephone number is (571) 272-8325.  The examiner can normally be reached on M-TR, M-F: 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Angela Ortiz can be reached on (571) 272-1206.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866) 217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.




/L.E.L./
Examiner, Art Unit 3663

/ADAM D TISSOT/
Primary Examiner, Art Unit 3663