DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Office Action is in response to the application filed on 11/30/2020. Claims 1-15 are presently pending and are presented for examination. 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
As per claim 1
Step 1: The claim is directed to an apparatus as it recites (a controller for an agent).
Step 2A Prong 1: The claim is directed to an abstract idea of a mental process. The claim recites the limitations (calculate a desired trajectory), (calculate commands for the agent), and (integrate historic system states). These limitations as drafted are simple processes that under their broadest reasonable interpretations cover the performance of these limitations in the mind or by hand.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. The claim recites no additional elements of structure.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to Step 2A Prong 2, there are no additional elements. Calculating a desired trajectory utilizing previously accumulated data is well-understood, routine and conventional in the art, as indicated in the following rejections under 103. For these reasons, claim 1 is not patent eligible under 35 U.S.C. § 101.
As per claims 2-9
These apparatus claims further define the abstract ideas of the mental processes illustrated in claim 1. They do not recite any additional elements or other limitations that would transform the recursive determinations based on the desired trajectories and previously accumulated data into patent-eligible subject matter, and these elements are well-understood, routine and conventional in the art, as indicated in the following rejections under 103.
As per claim 11
Step 1: The claim is directed to a method as it recites (a method for training).
Step 2A Prong 1: The claim is directed to an abstract idea of a mental process. The claim recites the limitation (training the temporal deep network). This limitation as drafted is a simple process that under its broadest reasonable interpretation covers the performance of this limitation in the mind or by hand.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. The claim recites the additional element of acquiring trajectories. The instruction to acquire trajectories is recited at a high level of generality (i.e., acquiring trajectories from manually driving agents), and amounts to mere data gathering, which is a form of insignificant extra-solution activity.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to Step 2A Prong 2, the additional elements amount to no more than mere instructions to gather data. Learning using inverse learning techniques and past experiences are well-understood, routine and conventional in the art, as indicated in the following rejections under 103. For these reasons, claim 11 is not patent eligible under 35 U.S.C. § 101.
As per claim 12
This method claim further defines the abstract ideas of the mental processes illustrated in claim 11. It does not recite any additional elements or other limitations that would transform the recursive determinations based on the desired trajectories and previously accumulated data into patent-eligible subject matter, and these elements are well-understood, routine and conventional in the art, as indicated in the following rejections under 103.
As per claim 13
Step 1: The claim is directed to an apparatus as it recites (a computer program code).
Step 2A Prong 1: The claim is directed to an abstract idea of a mental process. The claim recites the limitation (training the temporal deep network). This limitation as drafted is a simple process that under its broadest reasonable interpretation covers the performance of this limitation in the mind or by hand.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. The claim recites the additional elements of acquiring trajectories and at least one processor. The instruction to acquire trajectories is recited at a high level of generality (i.e., acquiring trajectories from manually driving agents), and amounts to mere data gathering, which is a form of insignificant extra-solution activity. Further, the processor is recited at a high level of generality and merely applies the exception using generic computer components to automate the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to Step 2A Prong 2, the additional elements amount to no more than mere instructions to gather data using generic computer components. Learning using inverse learning techniques, past experiences, and generic computer components are well-understood, routine and conventional in the art, as indicated in the following rejections under 103. For these reasons, claim 13 is not patent eligible under 35 U.S.C. § 101.
As per claim 14
Step 1: The claim is directed to an apparatus as it recites (an apparatus).
Step 2A Prong 1: The claim is directed to an abstract idea of a mental process. The claim recites the limitation (train the temporal deep network). This limitation as drafted is a simple process that under its broadest reasonable interpretation covers the performance of this limitation in the mind or by hand.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. The claim recites the additional element of a processor. The processor is recited at a high level of generality and merely applies the exception using generic computer components to automate the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to Step 2A Prong 2, the additional elements amount to no more than mere instructions to apply the exception using generic computer components. Learning using inverse learning techniques, past experiences, and generic computer components are well-understood, routine and conventional in the art, as indicated in the following rejections under 103. For these reasons, claim 14 is not patent eligible under 35 U.S.C. § 101.
As per claim 15
Step 1: The claim is directed to an apparatus as it recites (a controller for an agent).
Step 2A Prong 1: The claim is directed to an abstract idea of a mental process. The claim recites the limitations (calculate a desired trajectory), (calculate commands for the agent), and (integrate historic system states). These limitations as drafted are simple processes that under their broadest reasonable interpretations cover the performance of these limitations in the mind or by hand.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. The abstract ideas are merely being applied in a vehicle environment.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to Step 2A Prong 2, the additional elements amount to no more than mere instructions to apply the exception in a vehicle environment. Calculating a desired trajectory utilizing previously accumulated data, and the vehicle environment are well-understood, routine and conventional in the art, as indicated in the following rejections under 103. For these reasons, claim 15 is not patent eligible under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 13-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US-20190072959-A1, Palanisamy et al., hereinafter referred to as Palanisamy 959.
As per claim 13
Palanisamy 959 discloses [a] computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to acquire trajectories from manually driving agents (An) in a test environment (In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module., processor (shared, dedicated, or group) and memory that executes one or more software or firmware programs - Palanisamy 959 ¶22 & ¶45); 
train a temporal deep network using inverse reinforcement learning based at least in part on the trajectories acquired from the manually driving agents (An) in the test environment (deep convolutional neural network, In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module. - Palanisamy 959 ¶12 & ¶22).
As per claim 14
Palanisamy 959 discloses [a]n apparatus for training a temporal deep network comprising: a processor configured to train the temporal deep network using inverse reinforcement learning based on trajectories acquired from manually driving agents (An) in a test environment (deep convolutional neural network, In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module. - Palanisamy 959 ¶12 & ¶22).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over US-20200139973-A1, Palanisamy et al., hereinafter referred to as Palanisamy 973, in view of US-20200180647-A1, Anthony, hereinafter referred to as Anthony.
As per claim 1
Palanisamy 973 discloses [a] controller for an agent (Ai) of a group of agents (An), the controller comprising: a temporal deep network configured to calculate a desired trajectory for the agent (Ai) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶102 & ¶106 & ¶123); 
a nonlinear model predictive controller configured to calculate commands for the agent (Ai) based at least in part on the desired trajectory and desired trajectories of other agents (An) of the group of agents (An) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., Another important concept of CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions to implement pooling including max pooling. Max pooling layers can be inserted between successive convolutional layers of the CNN architecture., For example, the input to the first convolutional layer 224 can be convoluted with a bank of convolutional kernels to generate output neural activations through a non-linear activation function such as a rectified linear unit (ReLU) function., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶85 & ¶102 & ¶105 & ¶106 & ¶123);
Palanisamy 973 does not disclose an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network.
However, Anthony teaches an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods of Palanisamy 973, which control autonomous vehicles using spatial and temporal attention-based deep reinforcement learning, to employ neural networks that develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, in order to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
As per claim 2
Palanisamy 973 does not specifically disclose wherein the historic system states of the group of agents (An) comprise historic states and observations of the group of agents (An).
However, Anthony teaches wherein the historic system states of the group of agents (An) comprise historic states and observations of the group of agents (An) (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods of Palanisamy 973, which control autonomous vehicles using spatial and temporal attention-based deep reinforcement learning, to employ neural networks that develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, in order to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
As per claim 3
Palanisamy 973 further discloses wherein the temporal deep network comprises a long short-term memory recurrent neural network (The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160). – Palanisamy 973 ¶102).
As per claim 5
Palanisamy 973 further discloses wherein the controller is configured to share the desired trajectory of the agent (Ai) and observations of the agent (Ai) with the other agents (An) of the group of agents (An) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., Another important concept of CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions to implement pooling including max pooling. Max pooling layers can be inserted between successive convolutional layers of the CNN architecture., For example, the input to the first convolutional layer 224 can be convoluted with a bank of convolutional kernels to generate output neural activations through a non-linear activation function such as a rectified linear unit (ReLU) function., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶85 & ¶105 & ¶106 & ¶123).
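For illustration, the two non-linear operations named in the passages quoted from Palanisamy 973 ¶85 and ¶105 (a rectified linear unit activation and max pooling as non-linear down-sampling) can be sketched as follows. This is a minimal sketch only: the function names `relu` and `max_pool_1d`, the one-dimensional input, and the window size are assumptions chosen for brevity and are not taken from the reference or from the application.

```python
# Illustrative sketch only; names and the 1-D input shape are assumptions,
# not code from Palanisamy 973 or from the application under examination.

def relu(values):
    """Rectified linear unit: negative activations are clamped to zero."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values, window=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


activations = relu([-1.5, 0.5, 2.0, -0.25])   # [0.0, 0.5, 2.0, 0.0]
pooled = max_pool_1d(activations, window=2)   # [0.5, 2.0]
```

Both operations are non-linear element-wise or window-wise functions inside a convolutional network; the sketch does not model a predictive controller or any optimization over a trajectory.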
As per claim 6
Palanisamy 973 discloses [a] computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to implement a controller for an agent (Ai) of a group of agents (An), the controller comprising (processor configured to execute instructions of a computer program for learning lane-change policies via an actor-critic network architecture. – Palanisamy 973 ¶18): 
a temporal deep network configured to calculate a desired trajectory for the agent (Ai) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶102 & ¶106 & ¶123); 
a nonlinear model predictive controller configured to calculate commands for the agent (Ai) based at least in part on the desired trajectory and desired trajectories of other agents (An) of the group of agents (An) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., Another important concept of CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions to implement pooling including max pooling. Max pooling layers can be inserted between successive convolutional layers of the CNN architecture., For example, the input to the first convolutional layer 224 can be convoluted with a bank of convolutional kernels to generate output neural activations through a non-linear activation function such as a rectified linear unit (ReLU) function., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶85 & ¶102 & ¶105 & ¶106 & ¶123);
Palanisamy 973 does not disclose an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network.
However, Anthony teaches an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods of Palanisamy 973, which control autonomous vehicles using spatial and temporal attention-based deep reinforcement learning, to employ neural networks that develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, in order to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
As per claim 7
Palanisamy 973 further discloses for an agent (Ai) of a group of agents (An), wherein the temporal deep network is configured to (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48 , such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10 a - 10 n as described with regard to FIG. 1., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶102 & ¶106 & ¶123): 
calculate a desired trajectory for the agent (Ai) based at least in part on historic observations of the agent (Ai) (Each of the sensorimotor primitive modules generates a vehicle trajectory and speed profile, The LSTM network 150-3 will process the temporal information in the network instead of just having stacked historical observations as input., By this definition, each learned weight is dependent on the previous time step's information and current state information. The learned weights can be interpreted as the importance of the LSTM output at a given frame. – Palanisamy 973 ¶56 & ¶114 & ¶116);
calculate a reference trajectory for the agent (Ai) (execute one or more control actions to be performed to automatically control the autonomous vehicle and automate the autonomous driving task encountered in the particular driving scenario (e.g., to achieve one or more particular vehicle trajectory and speed profiles), Each of the sensorimotor primitive modules generates a vehicle trajectory and speed profile – Palanisamy 973 ¶51 & ¶56).
Palanisamy 973 does not disclose calculate historic states of all agents (An) of the group of agents.
However, Anthony teaches calculate historic states of all agents (An) of the group of agents (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, by employing neural networks to develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
As per claim 8
Palanisamy 973 further discloses wherein the temporal deep network comprises a long short-term memory recurrent neural network (The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160). – Palanisamy 973 ¶102).
As per claim 10
Palanisamy 973 discloses [a] computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to implement a temporal deep network to (processor configured to execute instructions of a computer program for learning lane-change policies via an actor-critic network architecture., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160). – Palanisamy 973 ¶18 & ¶102):
calculate a desired trajectory for an agent (Ai) based at least in part on historic observations of the agent (Ai) (Each of the sensorimotor primitive modules generates a vehicle trajectory and speed profile, The LSTM network 150-3 will process the temporal information in the network instead of just having stacked historical observations as input., By this definition, each learned weight is dependent on the previous time step's information and current state information. The learned weights can be interpreted as the importance of the LSTM output at a given frame. – Palanisamy 973 ¶56 & ¶114 & ¶116);
calculate a reference trajectory for the agent (Ai) (execute one or more control actions to be performed to automatically control the autonomous vehicle and automate the autonomous driving task encountered in the particular driving scenario (e.g., to achieve one or more particular vehicle trajectory and speed profiles), Each of the sensorimotor primitive modules generates a vehicle trajectory and speed profile – Palanisamy 973 ¶51 & ¶56).
Palanisamy 973 does not disclose calculate historic states of all agents (An).
However, Anthony teaches calculate historic states of all agents (An) (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, by employing neural networks to develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
As per claim 15
Palanisamy 973 discloses [a]n autonomous or semi-autonomous vehicle, characterized in that the autonomous or semi-autonomous vehicle comprises (System, methods and a controller are provided for controlling an autonomous vehicle – Palanisamy 973 ¶9): 
a controller comprising: a temporal deep network configured to calculate a desired trajectory for an agent (Ai) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48, such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10a-10n as described with regard to FIG. 1., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160)., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶102 & ¶106 & ¶123);
a nonlinear model predictive controller configured to calculate commands for the agent (Ai) based at least in part on the desired trajectory and desired trajectories of other agents (An) of a group of agents (An) (At each particular time step the spatial attention module processes the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving, The communication system 36 is configured to wirelessly communicate information to and from other entities 48, such as but not limited to, other vehicles (“V2V” communication,), For example, the autonomous vehicle 10 may be associated with an autonomous vehicle based remote transportation system. FIG. 2 illustrates an exemplary embodiment of an operating environment shown generally at 50 that includes an autonomous vehicle based remote transportation system 52 that is associated with one or more autonomous vehicles 10a-10n as described with regard to FIG. 1., Another important concept of CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions to implement pooling including max pooling. Max pooling layers can be inserted between successive convolutional layers of the CNN architecture., For example, the input to the first convolutional layer 224 can be convoluted with a bank of convolutional kernels to generate output neural activations through a non-linear activation function such as a rectified linear unit (ReLU) function., The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle., The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160)., Once the lane-change policies 172 have been generated they can be deployed to AI driver agent systems used in vehicles to control operation of the vehicles as will be described below with reference to FIG. 10 – Palanisamy 973 ¶9 & ¶52 & ¶57 & ¶85 & ¶102 & ¶105 & ¶106 & ¶123);
Palanisamy 973 does not disclose an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network.
However, Anthony teaches an augmented memory configured to integrate historic system states of the group of agents (An) for the temporal deep network (The motion planner is configured to adjust motion of an autonomous vehicle according to the hidden context of traffic entities encountered by the autonomous vehicle while driving in traffic., Accordingly, the symbolic representation of the annotated traffic entities is stored and may be transmitted to systems that are being tested/developed. The motion planner or any other module of the autonomous vehicles that is being tested/developed receives the annotated symbolic representations of the entities and is tested/developed using the simulation data., a long-short-term memory (LSTM) neural network with linear or nonlinear kernels that are two dimensional or three dimensional., The weight parameters can then be adjusted such that their estimated contribution to the overall error is reduced. This process can be repeated for each image (or for each combination of pixel data and human observer summary statistics) in the training set collected. – Anthony ¶22 & ¶25 & ¶53 & ¶54).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Anthony teaches employing neural networks to develop models for autonomous vehicles utilizing human and historical data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, by employing neural networks to develop models for autonomous vehicles utilizing human and historical data, as taught by Anthony, to improve the performance of autonomous vehicles; see Anthony ¶61 for details.
Claims 4, 9, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Palanisamy 973 and Anthony, as per claims 1, 7, and 11, respectively, and further in view of US-20190072959-A1, Palanisamy et al., hereinafter referred to as Palanisamy 959. 
As per claim 4
Palanisamy 973 does not specifically disclose wherein the controller is configured to consider a collision avoidance constraint for each agent (An).
However, Palanisamy 959 teaches wherein the controller is configured to consider a collision avoidance constraint for each agent (An) (The reward function R may be expressible as a linear or non-linear combination of weighted features θ, and, in autonomous driving applications, the reward function may be associated with a desired driving behavior (such as collision avoidance behavior). - Palanisamy 959 ¶60).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Palanisamy 959 teaches systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, with the systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions, as taught by Palanisamy 959, to make driving decisions equivalent to or better than those of a human driver; see Palanisamy 959 ¶6 for details.
As per claim 9
Palanisamy 973 does not specifically disclose wherein the temporal deep network is trained based on inverse reinforcement learning.
However, Palanisamy 959 teaches wherein the temporal deep network is trained based on inverse reinforcement learning (In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module. - Palanisamy 959 ¶22).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Palanisamy 959 teaches systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, with the systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions, as taught by Palanisamy 959, to make driving decisions equivalent to or better than those of a human driver; see Palanisamy 959 ¶6 for details.
As per claim 11
Palanisamy 973 discloses [a] method for training a temporal deep network, the method comprising: acquiring trajectories from manually driving agents (An) in a test environment (The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160)., In one embodiment, the policies 172 can be initially sampled from a pool of policies that can be obtained from human driving data or a simulation environment. The policies 172 can then be improved over time using the actor-critic network architecture 102 based on a Deep Recurrent Deterministic Policy Gradient (DRDPG) algorithm. – Palanisamy 973 ¶102 & ¶124);
Palanisamy 973 does not specifically disclose training the temporal deep network using inverse reinforcement learning based at least in part on trajectories acquired from the manually driving agents (An) in the test environment.
However, Palanisamy 959 teaches training the temporal deep network using inverse reinforcement learning based at least in part on trajectories acquired from the manually driving agents (An) in the test environment (deep convolutional neural network, In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module. - Palanisamy 959 ¶12 & ¶22).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Palanisamy 959 teaches systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, with the systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions, as taught by Palanisamy 959, to make driving decisions equivalent to or better than those of a human driver; see Palanisamy 959 ¶6 for details.
As per claim 12
Palanisamy 973 does not specifically disclose wherein parameters of the temporal deep network are learned by minimizing a loss function in a maximum likelihood estimation setup.
However, Palanisamy 959 teaches wherein parameters of the temporal deep network are learned by minimizing a loss function in a maximum likelihood estimation setup (deep convolutional neural network, In embodiments, the deep inverse reinforcement learning (DIRL) module recovers the reward map from human driving data logs (which contain environmental states and/or actions/demonstrations). The discriminator module uses the recovered reward map together with a true environment state to discriminate an output of the generator module. - Palanisamy 959 ¶12 & ¶22).
Palanisamy 973 discloses systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning. Palanisamy 959 teaches systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Palanisamy 973, namely systems and methods for controlling autonomous vehicles that utilize spatial and temporal attention-based deep reinforcement learning, with the systems and methods for controlling autonomous vehicles that train neural networks using human driving logs to avoid collisions, as taught by Palanisamy 959, to make driving decisions equivalent to or better than those of a human driver; see Palanisamy 959 ¶6 for details.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIS ASIM SHAIKH whose telephone number is (571)272-6426. The examiner can normally be reached 8:00-5:30 M-F EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fadey S. Jabr, can be reached at 571-272-1516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/F.A.S./Examiner, Art Unit 3668
/Thomas Ingram/Primary Examiner, Art Unit 3668