Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
The information disclosure statement filed 03/15/2020 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.
No copy has been provided for Non-Patent Literature Cite No. 5 “Drogon: A Causal reasoning framework for future trajectory forecast”.


Claim Objections
Claim 19 is objected to because of the following informalities: “A non-transitory computer readable storage medium storing instructions that when executed by a computer, which includes a processor perform a method, the method comprising” should read -- A non-transitory computer readable storage medium storing instructions that when executed by a computer, which includes a processor, perform a method, the method comprising --, with a comma inserted after “processor” to close the clause “which includes a processor”.
Appropriate correction is required.



Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-8 and 15-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 6 and 15, the phrase “wherein an encoding function is completed to encode and output multi-attention features that are associated with predicted attention weights that are respectively associated with each of the agents” renders the claims indefinite.
The phrase “predicted attention weights” is indefinite because it is unclear whether it refers to the “attention weights” previously recited in the claims from which claims 6 and 15 depend, or to a different element.
Additionally, the phrase “multi-attention features” is indefinite because it is unclear whether it refers to the “features that are associated with the agents” previously recited in the same claim.
For the purposes of examination, the examiner interprets “wherein an encoding function is completed to encode and output multi-attention features that are associated with predicted attention weights that are respectively associated with each of the agents” as -- wherein an encoding function is completed to encode and output the features that are associated with the attention weights that are respectively associated with each of the agents --. This interpretation is based on the claims from which claims 6 and 15 depend, FIG. 6, and ¶[0060] of the specification: “...The interaction encoder 112 may perform interaction encoding to attend to useful features by executing the multi-attention function to highlight important interactions in space and in time. In one configuration, the interaction encoder 112 may be configured to predict attention weights using convolution options with a soft-max function”. It is further based on ¶[0029]-¶[0030] of the specification: “...As discussed below, the neural network 108 may be configured to utilize an interaction encoder 112 to encode meaningful interactions into encoded features. The interaction encoder 112 may be configured to execute a multi-attention function to highlight important interactions in space and in time that occur with respect to the agents 202 within the surrounding environment 200 of the ego vehicle 102... The neural network 108 may additionally utilize a decoder 114 to decode the encoded features into multi-modal trajectories (represented in FIG. 2 by the exemplary arrows). The multi-modal trajectories may include a set of plausible deterministic trajectories that are respectively associated with each of the agents 202. The multi-modal trajectories may be outputted by the decoder 114 as predicted future trajectories of the agents 202.
As discussed below, the predicted future trajectories may be output with corresponding probabilities that pertain to a likelihood that each respective agent 202 utilizes each respective predicted trajectory associated with that particular agent 202 in one or more future time steps”.
Dependent claims 7-8 and 16-17 inherit and do not cure the deficiencies of claims 6 and 15 and are therefore rejected on the same basis as outlined above.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 10-11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over McGill, JR. et al. (US 20200089246 A1) in view of Kum et al. (US 20210261167 A1), hereinafter referred to as McGill and Kum, respectively.

Regarding claim 1, the claim recites a method having limitations similar to those of claim 10 and is therefore rejected on the same basis, as outlined below.

Regarding claim 2, the claim recites a method having limitations similar to those of claim 11 and is therefore rejected on the same basis, as outlined below.

Regarding claim 10, McGill discloses:
A system for providing social-stage spatio-temporal multi-modal future forecasting comprising: 
(McGill, FIG. 2; FIG. 3B; FIG. 4; 
¶[0011]: “FIG. 2 illustrates one embodiment of a trajectory prediction system”;
¶[0014]: “FIG. 4 is a block diagram of a trajectory prediction module, in accordance with an illustrative embodiment of the invention”;
Where the trajectory prediction system 170 (A system) predicts a plurality of potential future trajectories of road agents external to the vehicle (for providing social-stage spatio-temporal multi-modal future forecasting))

a memory storing instructions when executed by a processor cause the processor to: 
(McGill, FIG. 2;
¶[0039]: “...The trajectory prediction system 170 is shown as including one or more processors 110 from the vehicle 100 of FIG. 1... the trajectory prediction system 170 includes a memory 210 that stores a trajectory-prediction module 220, a control module 230, and a model-training module 240. The memory 210 is a random-access memory (RAM), read-only memory (ROM), a hard-disk drive, a flash memory, or other suitable memory for storing the modules 220, 230, and 240. The modules 220, 230, and 240 are, for example, computer-readable instructions that when executed by the one or more processors 110, cause the one or more processors 110 to perform the various functions disclosed herein”;
Where the trajectory prediction system 170 includes memory 210 (a memory) storing computer-readable instructions (storing instructions) that when executed by the one or more processors 110, cause the one or more processors 110 to perform the various functions disclosed herein (when executed by a processor cause the processor to:))

receive environment data associated with a surrounding environment of an ego vehicle; 
(McGill, FIG. 1; FIG. 2;
¶[0040]: “...trajectory prediction system 170 receives image data from one or more cameras 126. Trajectory prediction system 170 may also receive LIDAR data from LIDAR sensors 124, radar data from radar sensors 123, and/or sonar data from sonar sensors 125... ”;
¶[0102]: “...the sensor system 120 can include one or more environment sensors 122 configured to acquire, and/or sense driving environment data. “Driving environment data” includes and data or information about the external environment in which an autonomous vehicle is located or one or more portions thereof”;
Where the trajectory prediction system 170 receives data from environment sensors 122 (receive environment data) associated with the external environment of vehicle 100 (associated with a surrounding environment of an ego vehicle))

implement graph convolutions to obtain [...] weights that are respectively associated with agents that are located within the surrounding environment; 
(McGill, FIG. 3B; FIG. 4; FIG. 10; ¶[0006]: external road agent;
¶[0058]: “...the embodiment shown in FIG. 10, variational trajectory predictor 1020 includes a model that assumes future road-agent trajectories, projected onto a polynomial basis, form a Gaussian mixture model (GMM) with diagonal covariance matrices. Given a trajectory τx(t): [0, T]custom-character2 and function basis B, the projection coefficients cx can be computed as cx=ProjB(τ), and the trajectory r can be computed as τx=Bc. The bold typeface of certain variables indicates that these are vector quantities. Analogous relationships apply to the trajectory τy and projection coefficients cy. Thus, a probability distribution over future trajectory can be transformed from a set of projection coefficients, and each projection coefficient is represented as a GMM. The number of components represents the distribution of the likely movements of a detected road agent. For instance, using four components may yield two μ (mean) components that are nearly identical, and two other more distinct components. This would indicate that there are three distinct likely trajectories for the road agent. The GMM parameters 1050 produced by variational trajectory predictor 1020 include the weights w, the means μx and μy, and the variances σx² and σy² of the projection coefficients associated with a future road-agent trajectory...”;
Where the trajectory prediction system 170 forms a Gaussian mixture model (GMM) using diagonal covariance matrices (implement graph convolutions) in order to obtain weights w (to obtain [...] weights) associated with a future road-agent trajectory, i.e. associated with an external road-agent in the environment of vehicle 100 (that are respectively associated with agents that are located within the surrounding environment)
For clarity, the broadest reasonable interpretation of a convolution is a transformation of two functions; in the instance above, the future road agent trajectories and the covariance matrix are the first and second functions, respectively)

[...] multi modal trajectories and probabilities for each of the agents, wherein predicted trajectories are determined for each of the agents and rankings associated with probabilities that are associated with each of the predicted trajectories are outputted; and 
(McGill, FIG. 3B; FIG. 4; FIG. 10;
¶[0043]: “FIG. 3B illustrates... At intersection 300, road agent 305 (in this example, another automobile) has at least three choices: (1) proceed straight (trajectory 320a); (2) turn right (trajectory 320b); or (3) turn left (trajectory 320c)... trajectory-prediction module 220 produces, for the various possible road-agent trajectories, probability distributions conditioned on the road agent's past trajectory and current vehicle sensor inputs... trajectory-prediction module 220 also generates a confidence estimate for each predicted road-agent trajectory. Any or all of this information (statistical parameters defining probability distributions, predicted trajectories sampled from the probability distributions, and confidence scores) can be output to other functional units of trajectory prediction system 170”;
¶[0057]: “...Though FIG. 10 focuses on road-agent trajectory prediction subsystem 415 as a specific example, the description below also applies to the other K−1 road-agent trajectory prediction subsystems referenced in FIG. 4...”;
¶[0060]: “...mixture predictor 1070 selects as the most likely predicted road-agent trajectory the one having a confidence score 1060 indicating the highest level of confidence among the candidate predicted road-agent trajectories... the trajectory predictions and their respective confidence scores 1060 (shown as trajectory predictions 450 in FIG. 10) are output to other functional units of vehicle 100... trajectory predictions 450 include the parameters defining the trajectory probability distributions in addition to... specific trajectory predictions obtained by sampling the distributions and their associated confidence scores”;
Where the trajectory prediction system 170 predicts multiple trajectories for each external road agent, as shown in FIG. 3B ([...] multi modal trajectories), and determines a probability distribution for each trajectory of each agent (and probabilities for each of the agents), wherein the predicted trajectories are determined for each of the agents, as shown in FIG. 3B and FIG. 4 (wherein predicted trajectories are determined for each of the agents); and where the highest confidence score indicates the most likely predicted trajectory, i.e., the confidence scores are rankings associated with the probabilities (and rankings associated with probabilities), each associated with a trajectory (that are associated with each of the predicted trajectories), and the confidence scores are output as part of trajectory predictions 450 (are outputted))

control at least one vehicle system of the ego vehicle based on the predicted trajectories associated with each of the agents and the rankings associated with the probabilities.
(McGill, FIG. 13;
¶[0086]: “At block 1340, control module 230 controls the operation of the ego vehicle (vehicle 100) based, at least in part, on at least one of (1) the iteratively updated predicted trajectories 465 of vehicle 100; and (2) the iteratively updated predicted trajectories 450 of the external road agent... control module 230 can control the operation of vehicle 100 (the ego vehicle) by planning a trajectory for the ego vehicle based, at least in part, on some or all of the kinds of trajectory-prediction information discussed herein (specific predicted trajectories sampled from probability distributions, parameters defining trajectory probability distributions, and confidence scores associated with predicted trajectories)”;
¶[0087]: “Though method 1300 was described above in terms of a single road agent whose trajectory is predicted by road-agent trajectory prediction subsystem 415, in other embodiments, method 1300 can be generalized to the case of K road agents (refer to elements 415, 420, and 425 in FIG. 4)...”;
Where control module 230 controls the operation of the ego vehicle 100 (control at least one vehicle system of the ego vehicle) based on the predicted trajectories of the external road agents (based on the predicted trajectories associated with each of the agents) and the confidence scores associated with the probabilities of each trajectory (and the rankings associated with the probabilities)).

McGill fails to explicitly disclose “implement graph convolutions to obtain attention weights” and “decode... trajectories... for each of the agents”; in particular, McGill does not explicitly disclose that the obtained weights are attention weights or that the trajectories are decoded.
However, in the same field of endeavor, Kum teaches:
[implement graph convolutions to obtain] attention [weights that are respectively associated with agents that are located within the surrounding environment]; 
(Kum, FIG. 1: Processor 180, Memory 170; FIG. 6 (a), (b); FIG. 7; FIG. 8 (a), (b);
¶[0026]: “...The processor 180 may predict the future trajectory of the surrounding objects for planning driving trajectory of the electronic device 100 based on a surrounding situation of the electronic device 100...”;
¶[0038]: “Referring to FIG. 7, at operation 710, the electronic device 100 may assign importance to each of the surrounding objects using a scalable attention mechanism... the processor 180 may relatively evaluate two nodes connected by each edge in the graph model based on state information on the electronic device 100 and the surrounding objects... the processor 180 may assign importance to each of the nodes based on the results of the evaluation”;
¶[0039]: “At operation 720, the electronic device 100 may compute an adjacency matrix based on the importance of each of the surrounding objects...”;
¶[0040]: “At operation 730, the electronic device 100 may configure a graph convolution neural network using the adjacency matrix... as illustrated in FIG. 8(b), the processor 180 may configure the graph convolution neural network by performing a graph convolution operation”;
Where electronic device 100, including a processor 180 and memory 170, performs a graph convolution operation ([implement graph convolutions]) in order to assign an amount of importance ([to obtain] attention [weights]) to each of the surrounding objects ([that are respectively associated with agents that are located within the surrounding environment]))

decode [... trajectories... for each of the agents...]
(Kum, FIG. 5; FIG. 7; 
¶[0041]: “At operation 740, the electronic device 100 may predict the future trajectories of the surrounding objects based on the long short-term memory network LSTM. The processor 180 may predict the future trajectories from the graph model and the graph convolution neural network based on the LSTM. In this case, the LSTM may be configured with an encoding structure and a decoding structure... as illustrated in FIG. 9(b), the processor 180 may predict the future trajectories based on the state information on the electronic device 100 and the surrounding objects along with the hidden state information and the memory cell state information through the decoding structure of the LSTM”;
Where the electronic device 100 includes a decoding structure to predict the future trajectories of the surrounding objects (decode [multi modal trajectories... for each of the agents...])).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the system of McGill with the features taught by Kum because “The electronic device 100 may process unspecified number of multiple surrounding objects that vary in real time in an integrated way through the scalable attention mechanism” (Kum, ¶[0038]), and because “...the processor 180 may plan hidden state information and memory cell state information based on the motion characteristic varying over time and the interaction characteristic through the encoding structure of the LSTM” (Kum, ¶[0041]); an LSTM encoding structure of this kind would in turn require a corresponding decoding structure to output the predicted trajectories.
The features taught by Kum enable “...a computational load... smaller than a computational load according to the existing technologies and is also consistent regardless of the number of surrounding objects. Furthermore... a prediction error according to various embodiments is smaller than a prediction error according to the existing technologies and is also consistent regardless of the number of surrounding objects. Accordingly, performance according to various embodiments is more excellent than performance of the existing technologies and is also consistent regardless of the number of surrounding objects” (Kum, ¶[0045]; see also FIG. 10 (a), (b)).


Regarding claim 11, McGill and Kum teach the system of claim 10. Kum further teaches:
wherein receiving the environment data associated with the surrounding environment includes receiving images and LiDAR measurements captured of the agents that are located within the surrounding environment of the ego vehicle at a plurality of time steps.
(Kum, FIG. 1: Camera module 120, Sensor module 130; FIG. 5; FIG. 6 (a); 
¶[0021]: “...the sensor module 130 may include at least any one of... a LiDAR sensor...”;
¶[0026]: “...the processor 180 may collect the information on the surrounding situations of the electronic device 100 based on at least any one of image data obtained through the camera module 120 or sensing data obtained through the sensor module 130. The processor 180 may predict the future trajectory of the surrounding objects for planning driving trajectory of the electronic device 100 based on a surrounding situation of the electronic device 100”;
¶[0055]: “...the recognizing of the trajectories may include configuring a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, representing relative positions of the surrounding objects on the moving coordinate system, and recognizing the historical trajectories based on the time-serial positions...”;
Where electronic device 100 receives information on the surrounding situation (wherein receiving the environment data associated with the surrounding environment) through camera module 120 and LiDAR sensor in sensor module 130 (includes receiving images and LiDAR measurements) in order to detect surrounding objects in the surrounding situation of electronic device 100 (captured of the agents that are located within the surrounding environment of the ego vehicle) at time-serial positions used to determine historical trajectories (at a plurality of time steps)).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the system of McGill with the features taught by Kum so that “...the processor 180 may recognize historical trajectories of the surrounding objects based on the positions of the surrounding objects on the moving coordinate system” (Kum, ¶[0027]) and so that “...the processor 180 may predict future trajectories of surrounding objects in an integrated way based on the recognized historical trajectories of the surrounding objects” (Kum, ¶[0028]).
The features taught by Kum enable “...a computational load... smaller than a computational load according to the existing technologies and is also consistent regardless of the number of surrounding objects. Furthermore... a prediction error according to various embodiments is smaller than a prediction error according to the existing technologies and is also consistent regardless of the number of surrounding objects. Accordingly, performance according to various embodiments is more excellent than performance of the existing technologies and is also consistent regardless of the number of surrounding objects” (Kum, ¶[0045]; see also FIG. 10 (a), (b)).


Regarding claim 19, the claim recites a non-transitory computer readable storage medium having limitations similar to those of claim 1 and is therefore rejected on the same basis, as outlined above. Regarding the additional limitations recited in claim 19, McGill further discloses:
A non-transitory computer readable storage medium storing instructions that when executed by a computer, which includes a processor perform a method, the method comprising: 
(McGill, FIG. 2;
¶[0007]: “Another embodiment is a non-transitory computer-readable medium for controlling the operation of a vehicle and storing instructions that when executed by one or more processors cause the one or more processors to... generate predicted trajectories of a road agent that is external to the vehicle using second trajectory predictors based, at least in part, on second inputs including at least past trajectory information for the road agent and the sensor data...”;
¶[0039]: “...The trajectory prediction system 170 is shown as including one or more processors 110 from the vehicle 100 of FIG. 1... the trajectory prediction system 170 includes a memory 210 that stores a trajectory-prediction module 220, a control module 230, and a model-training module 240. The memory 210 is a random-access memory (RAM), read-only memory (ROM), a hard-disk drive, a flash memory, or other suitable memory for storing the modules 220, 230, and 240. The modules 220, 230, and 240 are, for example, computer-readable instructions that when executed by the one or more processors 110, cause the one or more processors 110 to perform the various functions disclosed herein”;
Where the trajectory prediction system 170 includes memory 210 in the form of a non-transitory computer-readable medium (A non-transitory computer readable storage medium) storing computer-readable instructions (storing instructions) that when executed by the one or more processors 110, cause the one or more processors 110 to perform the various functions disclosed herein (that when executed by a computer, which includes a processor perform a method, the method comprising)).


Claims 3-5 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over McGill and Kum as applied to claims 2 and 11 above, and further in view of Sen et al. (US 20210253131 A1), hereinafter referred to as Sen.

Regarding claim 3, the claim recites a method having limitations similar to those of claim 12 and is therefore rejected on the same basis, as outlined below.

Regarding claim 4, the claim recites a method having limitations similar to those of claim 13 and is therefore rejected on the same basis, as outlined below.

Regarding claim 5, the claim recites a method having limitations similar to those of claim 14 and is therefore rejected on the same basis, as outlined below.

Regarding claim 12, McGill and Kum teach the system of claim 11. Kum further teaches:
wherein receiving the environment data associated with the surrounding environment includes [...] image data and LiDAR data associated with the images and LiDAR measurements captured of the agents at the plurality of time steps,
(Kum, FIG. 1: Camera module 120, Sensor module 130; FIG. 5; FIG. 6 (a); 
¶[0021]: “...the sensor module 130 may include at least any one of... a LiDAR sensor...”;
¶[0026]: “...the processor 180 may collect the information on the surrounding situations of the electronic device 100 based on at least any one of image data obtained through the camera module 120 or sensing data obtained through the sensor module 130. The processor 180 may predict the future trajectory of the surrounding objects for planning driving trajectory of the electronic device 100 based on a surrounding situation of the electronic device 100”;
¶[0055]: “...the recognizing of the trajectories may include configuring a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, representing relative positions of the surrounding objects on the moving coordinate system, and recognizing the historical trajectories based on the time-serial positions...”;
Where electronic device 100 receives information on the surrounding situation (wherein receiving the environment data associated with the surrounding environment) through camera module 120 and LiDAR sensor in sensor module 130 (includes [...] image data and LiDAR data associated with the images and LiDAR measurements) in order to detect surrounding objects in the surrounding situation of electronic device 100 (captured of the agents) at time-serial positions used to determine historical trajectories (at the plurality of time steps))

wherein historic positions of the agents during the plurality of time steps are determined based on [...] image data and LiDAR data.
(Kum, FIG. 1: Camera module 120, Sensor module 130; FIG. 5; FIG. 6 (a); 
¶[0021]: “...the sensor module 130 may include at least any one of... a LiDAR sensor...”;
¶[0026]: “...the processor 180 may collect the information on the surrounding situations of the electronic device 100 based on at least any one of image data obtained through the camera module 120 or sensing data obtained through the sensor module 130”;
¶[0055]: “...the recognizing of the trajectories may include configuring a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, representing relative positions of the surrounding objects on the moving coordinate system, and recognizing the historical trajectories based on the time-serial positions...”;
Where electronic device 100 recognizes the historical trajectories of the surrounding objects at time-serial positions, shown in FIG. 6 (a) (wherein historic positions of the agents during the plurality of time steps are determined) based on the data from camera module 120 and LiDAR sensor in sensor module 130 (based on [...] image data and LiDAR data)).

The combination of McGill and Kum fails to explicitly teach “aggregating image data and LiDAR data” and that “historic positions of the agents... are determined based on aggregated image data and LiDAR data”; in particular, neither McGill nor Kum explicitly teaches aggregating the image data and LiDAR data.
However, in the same field of endeavor, Sen teaches:
[wherein receiving the environment data associated with the surrounding environment includes] aggregating [image data and LiDAR data associated with the images and LiDAR measurements captured of the agents at the plurality of time steps,]
(Sen, FIG. 1, ¶[0077]; FIG. 5;
¶[0080]: “The vehicle sensor(s) 125 can be configured to acquire the sensor data 140. This can include sensor data associated with the surrounding environment of the vehicle 105... The vehicle sensor(s) 125 can include a Light Detection and Ranging (LIDAR) system... one or more cameras (e.g., visible spectrum cameras, infrared cameras, etc.)... The sensor data 140 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 125...”;
¶[0081]: “... the sensor data 140 can be indicative of one or more objects within the surrounding environment of the vehicle 105. The object(s) can include, for example, vehicles, pedestrians, bicycles, and/or other objects. The object(s) can be located in front of, to the rear of, to the side of the vehicle 105, etc. The sensor data 140 can be indicative of locations associated with the object(s) within the surrounding environment of the vehicle 105 at one or more times. The vehicle sensor(s) 125 can provide the sensor data 140 to the autonomy computing system 130”;
¶[0085]: “The vehicle computing system 100 (e.g., the autonomy computing system 130) can identify one or more objects that are proximate to the vehicle 105 based at least in part on the sensor data 140 and/or the map data 145. For example, the vehicle computing system 100 (e.g., the perception system 155) can process the sensor data 140, the map data 145, etc. to obtain perception data 170 (e.g., fused perception data). The vehicle computing system 100 can generate perception data 170 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 105. For example, the perception data 170 for each object can describe (e.g., for a given time, time period) an estimate of the object's: current and/or past location (also referred to as position)...”;
¶[0093]: “...a path can depict motion (e.g., tracked motion) of an object by a time series of prior positions...”;
Where vehicle computing system 100 perceives the surrounding environment of vehicle 105 ([wherein receiving the environment data associated with the surrounding environment]) by fusing sensor data 140, which includes image data and LIDAR data, to obtain perception data 170 ([includes] aggregating [image data and LiDAR data]), wherein the image data and LIDAR data, i.e. the sensor data 140, indicate one or more objects within the surrounding environment of vehicle 105 ([associated with the images and LiDAR measurements captured of the agents]) at prior positions at one or more times, i.e. a time series ([at the plurality of time steps]))

[wherein historic positions of the agents during the plurality of time steps are determined based on] aggregated [image data and LiDAR data.]
(Sen, FIG. 1, ¶[0077]; FIG. 5;
¶[0080]: “... The sensor data 140 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 125...”;
¶[0081]: “...The sensor data 140 can be indicative of locations associated with the object(s) within the surrounding environment of the vehicle 105 at one or more times...”;
¶[0085]: “The vehicle computing system 100 (e.g., the autonomy computing system 130) can identify one or more objects that are proximate to the vehicle 105 based at least in part on the sensor data 140 and/or the map data 145. For example, the vehicle computing system 100 (e.g., the perception system 155) can process the sensor data 140, the map data 145, etc. to obtain perception data 170 (e.g., fused perception data). The vehicle computing system 100 can generate perception data 170 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 105. For example, the perception data 170 for each object can describe (e.g., for a given time, time period) an estimate of the object's: current and/or past location (also referred to as position)...”;
¶[0093]: “...A path can describe... how an object and/or actor has been moving (e.g., over a previous time interval)... a path can depict motion (e.g., tracked motion) of an object by a time series of prior positions...”;
Where vehicle computing system 100 determines how an object has been moving over a previous time interval using a time series of prior positions of the object ([wherein historic positions of the agents during the plurality of time steps are determined]) based on perception data 170, created by fusing sensor data 140 that includes image data and LIDAR data ([based on] aggregated [image data and LiDAR data.])).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill and Kum with the features taught by Sen so that “The computing system can fuse (e.g., combine and/or reconcile) secondary perception data describing objects (e.g., actors) and/or paths from the secondary perception system with primary perception data describing the classified objects from the primary perception system to create fused perception data that can be indicative of a better understanding of an environment of the autonomous vehicle than the primary perception data” (Sen, ¶[0056]). Perception data 170, obtained by fusing sensor data 140 (including the image data and LIDAR data), is an output of perception system 155 (see FIG. 1), which includes the primary and secondary perception systems. In summary, fusing, i.e. aggregating, sensor data 140 (image data and LIDAR data) allows a better understanding of an environment of the autonomous vehicle.
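For illustration only, the aggregation step mapped for claim 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Sen's fusion method or the applicant's: the Detection record, the per-detection position uncertainty sigma, and the inverse-variance weighting are all hypothetical choices made for the example. Camera and LiDAR detections of the same agent at the same time step are combined into one position, and the fused positions are collected in time order as that agent's historic positions.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical per-sensor detection of one agent at one time step."""
    agent_id: int
    t: int          # time-step index
    x: float
    y: float
    sigma: float    # assumed position standard deviation (metres)


def fuse_histories(camera, lidar):
    """Aggregate camera and LiDAR detections into per-agent historic positions.

    Detections of the same agent at the same time step are combined by
    inverse-variance weighting; a detection seen by only one sensor is kept as-is.
    Returns {agent_id: [(t, x, y), ...]} ordered by time step.
    """
    buckets = {}
    for det in list(camera) + list(lidar):
        buckets.setdefault((det.agent_id, det.t), []).append(det)

    histories = {}
    # Sorting by (agent_id, t) keeps each agent's history in time order.
    for (agent_id, t), dets in sorted(buckets.items()):
        weights = [1.0 / (d.sigma ** 2) for d in dets]
        total = sum(weights)
        x = sum(w * d.x for w, d in zip(weights, dets)) / total
        y = sum(w * d.y for w, d in zip(weights, dets)) / total
        histories.setdefault(agent_id, []).append((t, x, y))
    return histories
```

Inverse-variance weighting is used here only because it is a simple, well-known way to combine two noisy estimates of the same quantity; any other fusion rule could be substituted without changing the shape of the output.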


Regarding claim 13, McGill, Kum, and Sen teach the system of claim 12. Kum further teaches:
further including processing a spatio-temporal graph as a graphic representation of the historic positions of the agents at each time step of the plurality of time steps, 
(Kum, FIG. 2; FIG. 3 (a), (b); FIG. 6 (a), (b); 
¶[0027]: “...In FIG. 2, the electronic device 100 may be represented as a moving object... Accordingly, the processor 180 may recognize historical trajectories of the surrounding objects based on the positions of the surrounding objects on the moving coordinate system”;
¶[0028]: “As illustrated in FIG. 3... the processor 180 may predict the future trajectories by integrating and estimating interactions between the surrounding objects and the electronic device 100... the characteristics of the interactions may have... time variability... The time variability may indicate that an interaction is changed over time. To this end, the processor 180 may configure a graph model for a surrounding situation of the electronic device 100 by performing graph modeling, as illustrated in FIG. 3(b), based on trajectories of the surrounding objects, such as those illustrated in FIG. 3(a)...”;
¶[0069]: “...the processor 180 may be configured to configure a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, represent relative positions of the surrounding objects on the moving coordinate system, and recognize the historical trajectories based on the time-serial positions”;
Where electronic device 100 estimates interactions between the surrounding objects and the electronic device 100 over time by performing graph modeling, shown in FIG. 3 (b) (further including processing a spatio-temporal graph) in order to graphically represent the historical trajectories, including positions, of the surrounding objects (as a graphic representation of the historic positions of the agents), using time-serial positions (at each time step of the plurality of time steps); FIG. 6 (a), (b) shows in greater detail how electronic device 100 performs graph modeling)

wherein adjacency matrices from the plurality of time steps and graph vertices associated with the historic positions of the agents are output.
(Kum, FIG. 3 (a), (b); FIG. 6 (a), (b); FIG. 7;
¶[0028]: “As illustrated in FIG. 3... the processor 180 may predict the future trajectories by integrating and estimating interactions between the surrounding objects and the electronic device 100... the characteristics of the interactions may have... time variability... The time variability may indicate that an interaction is changed over time. To this end, the processor 180 may configure a graph model for a surrounding situation of the electronic device 100 by performing graph modeling, as illustrated in FIG. 3(b), based on trajectories of the surrounding objects, such as those illustrated in FIG. 3(a)...”;
¶[0035]: “...the graph model may be represented as illustrated in FIG. 6(b), and may include a plurality of nodes and a plurality of edges that connect the plurality of nodes. The nodes may indicate the electronic device 100 and surrounding objects, respectively. In this case, the nodes may represent at least any one of a position, speed or a heading angle, for example, based on state information on the electronic device 100 or the surrounding objects...”;
¶[0038]: “...the processor 180 may relatively evaluate two nodes connected by each edge in the graph model based on state information on the electronic device 100 and the surrounding objects. Furthermore, the processor 180 may assign importance to each of the nodes based on the results of the evaluation”;
¶[0039]: “At operation 720, the electronic device 100 may compute an adjacency matrix based on the importance of each of the surrounding objects. The processor 180 may compute an adjacency matrix to which the importance of each of the surrounding objects has been considered”;
¶[0069]: “...the processor 180 may be configured to configure a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, represent relative positions of the surrounding objects on the moving coordinate system, and recognize the historical trajectories based on the time-serial positions”;
Where an adjacency matrix for each of the surrounding objects (wherein adjacency matrices) from the time-serial positions (from the plurality of time steps) and nodes associated with the historical positions of the surrounding objects (and graph vertices associated with the historic positions of the agents) is computed by electronic device 100 (are output)).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill and Sen with the features taught by Kum so that “...electronic device 100 may process unspecified number of multiple surrounding objects that vary in real time in an integrated way through the scalable attention mechanism. In this case, the processor 180 may relatively evaluate two nodes connected by each edge in the graph model based on state information on the electronic device 100 and the surrounding objects...” (Kum, ¶[0038]). That is, the graphical representation using nodes, i.e. vertices, and edges allows the electronic device 100 to process an unspecified number of surrounding objects in real time. 
The features taught by Kum enable “...a computational load... smaller than a computational load according to the existing technologies and is also consistent regardless of the number of surrounding objects. Furthermore... a prediction error according to various embodiments is smaller than a prediction error according to the existing technologies and is also consistent regardless of the number of surrounding objects. Accordingly, performance according to various embodiments is more excellent than performance of the existing technologies and is also consistent regardless of the number of surrounding objects” (Kum, ¶[0045]; see also FIG. 10 (a), (b)).
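As an illustration of a spatio-temporal graph that outputs one adjacency matrix per time step together with graph vertices for the agents' historic positions, the following Python sketch may help. It assumes, hypothetically, that an edge exists between two agents within a fixed radius and that its weight is 1 / (1 + distance). Kum's ¶[0039] instead computes the adjacency matrix from the importance assigned to each surrounding object, so the weighting here should not be read as Kum's.

```python
import math


def build_spatio_temporal_graph(histories, radius=10.0):
    """Return (adjacency_per_step, vertices_per_step) for T time steps.

    histories: {agent_id: [(x, y), ...]} with one position per time step,
    all lists of equal length T. Edge weight is 1 / (1 + distance) for agents
    within `radius` metres of each other (a hypothetical rule); self-loops stay 0.
    """
    agent_ids = sorted(histories)
    num_steps = len(histories[agent_ids[0]])
    adjacency_per_step, vertices_per_step = [], []
    for t in range(num_steps):
        # Graph vertices: each agent's historic position at time step t.
        vertices = [histories[a][t] for a in agent_ids]
        n = len(vertices)
        adj = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(vertices[i], vertices[j])
                if d <= radius:
                    adj[i][j] = 1.0 / (1.0 + d)
        adjacency_per_step.append(adj)
        vertices_per_step.append(vertices)
    return adjacency_per_step, vertices_per_step
```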


Regarding claim 14, McGill, Kum, and Sen teach the system of claim 13. Kum further teaches:
wherein implementing the graph convolutions to obtain the attention weights includes inputting the adjacency matrices from the plurality of time steps and the graph vertices associated with the historic positions of the agents to graph convolutions to obtain the attention weights that are respectively associated with each of the agents.
(Kum, FIG. 3 (a), (b); FIG. 6 (a), (b); FIG. 7; FIG. 8 (a), (b);
¶[0028]: “As illustrated in FIG. 3... the processor 180 may predict the future trajectories by integrating and estimating interactions between the surrounding objects and the electronic device 100... the characteristics of the interactions may have... time variability... The time variability may indicate that an interaction is changed over time. To this end, the processor 180 may configure a graph model for a surrounding situation of the electronic device 100 by performing graph modeling, as illustrated in FIG. 3(b), based on trajectories of the surrounding objects, such as those illustrated in FIG. 3(a)...”;
¶[0035]: “...the graph model may be represented as illustrated in FIG. 6(b), and may include a plurality of nodes and a plurality of edges that connect the plurality of nodes. The nodes may indicate the electronic device 100 and surrounding objects, respectively. In this case, the nodes may represent at least any one of a position, speed or a heading angle, for example, based on state information on the electronic device 100 or the surrounding objects...”;
¶[0038]: “...the processor 180 may relatively evaluate two nodes connected by each edge in the graph model based on state information on the electronic device 100 and the surrounding objects. Furthermore, the processor 180 may assign importance to each of the nodes based on the results of the evaluation”;
¶[0039]: “At operation 720, the electronic device 100 may compute an adjacency matrix based on the importance of each of the surrounding objects...”;
¶[0040]: “At operation 730, the electronic device 100 may configure a graph convolution neural network using the adjacency matrix... as illustrated in FIG. 8(b), the processor 180 may configure the graph convolution neural network by performing a graph convolution operation”;
¶[0069]: “...the processor 180 may be configured to configure a moving coordinate system based on a constant velocity model using the state information on the electronic device 100, represent relative positions of the surrounding objects on the moving coordinate system, and recognize the historical trajectories based on the time-serial positions”;
Where, when electronic device 100 performs the graph convolution operations at operation 730, shown in FIG. 8 (b), to assign an amount of importance to each of the surrounding objects (wherein implementing the graph convolutions to obtain the attention weights), electronic device 100 computes an adjacency matrix for each of the surrounding objects (includes inputting the adjacency matrices) from the time-serial positions (from the plurality of time steps) and from the evaluation of nodes per ¶[0035] associated with the historical positions of the surrounding objects (and the graph vertices associated with the historic positions of the agents) in order to perform the graph convolution operation (to graph convolutions) and assign an appropriate amount of importance to each surrounding object (to obtain the attention weights that are respectively associated with each of the agents)).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill and Sen with the features taught by Kum so that “...electronic device 100 may process unspecified number of multiple surrounding objects that vary in real time in an integrated way through the scalable attention mechanism. In this case, the processor 180 may relatively evaluate two nodes connected by each edge in the graph model based on state information on the electronic device 100 and the surrounding objects...” (Kum, ¶[0038]). That is, the graphical representation using nodes, i.e. vertices, and edges allows the electronic device 100 to process an unspecified number of surrounding objects in real time. 
The features taught by Kum enable “...a computational load... smaller than a computational load according to the existing technologies and is also consistent regardless of the number of surrounding objects. Furthermore... a prediction error according to various embodiments is smaller than a prediction error according to the existing technologies and is also consistent regardless of the number of surrounding objects. Accordingly, performance according to various embodiments is more excellent than performance of the existing technologies and is also consistent regardless of the number of surrounding objects” (Kum, ¶[0045]; see also FIG. 10 (a), (b)).
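A single graph-convolution layer that turns per-time-step adjacency matrices and vertex features into one attention weight per agent might look like the sketch below. The symmetric normalisation of A + I is the form commonly used in graph convolutional networks; the layer weight matrix, the ReLU, the averaging over time steps, and the final softmax across agents are assumptions for the example and are not taken from Kum or from the instant application.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_attention_weights(adjacency_per_step, features_per_step, weight):
    """One GCN layer per time step, then a softmax over agents.

    adjacency_per_step: T adjacency matrices (n x n)
    features_per_step:  T vertex-feature matrices (n x d)
    weight:             hypothetical layer weights (d x d_out)
    Returns n attention weights, one per agent, summing to 1.
    """
    n = len(adjacency_per_step[0])
    steps = len(adjacency_per_step)
    scores = [0.0] * n
    for adj, feats in zip(adjacency_per_step, features_per_step):
        h = matmul(matmul(normalize_adjacency(adj), feats), weight)
        for i in range(n):
            # Agent score: sum of ReLU-activated outputs, averaged over time steps.
            scores[i] += sum(max(0.0, v) for v in h[i]) / steps
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```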


Claims 6-8 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over McGill, Kum, and Sen as applied to claims 5 and 14, above, and in further view of Choi et al. (US 20180124423 A1), henceforth known as Choi.

Regarding claim 6, the claim recites a method having limitations similar to those of claim 15 and is therefore rejected on the same basis, as outlined below.

Regarding claim 7, the claim recites a method having limitations similar to those of claim 16 and is therefore rejected on the same basis, as outlined below.

Regarding claim 8, the claim recites a method having limitations similar to those of claim 17 and is therefore rejected on the same basis, as outlined below.

Regarding claim 15, McGill, Kum, and Sen teach the system of claim 14. The combination of McGill, Kum, and Sen fails to explicitly teach the limitations of claim 15 as a whole. 
However, in the same field of endeavor, Choi teaches:
further including executing a multi-attention function to attend to features that are associated with the agents, 
(Choi, FIG. 1; FIG. 2; FIG. 3; FIG. 6, ¶[0051]; ¶[0047]-¶[0049];
¶[0018]: “Referring now to FIG. 1, an exemplary scene 100 is shown. The scene 100 depicts an intersection that is being monitored and includes a number of agents 102, which in this case may include both pedestrians and automobiles. The past positions of each agent are shown as dotted-line paths 104. Predictions as to an agent's future position are shown as solid lines 106, with thicker lines representing predictions having a higher likelihood and thinner lines representing predictions having a lower likelihood...”;
¶[0020]: “Referring now to FIG. 2, a method for trajectory prediction is shown. Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106...”;
¶[0022]: “Block 204 then determines the random prediction samples that are most likely to reflect future trajectories while incorporating scene context and interactions. Block 204 ranks the samples and refines them to incorporate contextual and interaction cues...”;
Where processing system 600 performs the method outlined in FIG. 2 through computer instructions, i.e. a function (further including executing a multi-attention function), wherein the function of FIG. 2 generates a diverse set of prediction samples, i.e. a diverse set of possible future trajectories (see FIG. 1) (to attend to features that are associated with the agents); this is similar to ¶[0029]-¶[0030] of the instant application: “... the neural network 108 may be configured to utilize an interaction encoder 112 to encode meaningful interactions into encoded features. The interaction encoder 112 may be configured to execute a multi-attention function to highlight important interactions in space and in time that occur with respect to the agents 202 within the surrounding environment 200 of the ego vehicle 102... The neural network 108 may additionally utilize a decoder 114 to decode the encoded features into multi-modal trajectories (represented in FIG. 2 by the exemplary arrows)... The multi-modal trajectories may be outputted by the decoder 114 as predicted future trajectories of the agents 202... with corresponding probabilities... ”)

wherein an encoding function is completed to encode and output multi-attention features that are associated with predicted attention weights that are respectively associated with each of the agents.
(Choi, FIG. 1; FIG. 2; FIG. 3; FIG. 4; FIG. 5; FIG. 6, ¶[0051]; ¶[0047]-¶[0049];
¶[0018]: “Referring now to FIG. 1, an exemplary scene 100 is shown. The scene 100 depicts an intersection that is being monitored and includes a number of agents 102, which in this case may include both pedestrians and automobiles. The past positions of each agent are shown as dotted-line paths 104. Predictions as to an agent's future position are shown as solid lines 106, with thicker lines representing predictions having a higher likelihood and thinner lines representing predictions having a lower likelihood...”;
¶[0020]: “Referring now to FIG. 2, a method for trajectory prediction is shown. Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106...”;
¶[0030]: “...An RNN decoder takes the output of the previous step... and generates K future prediction samples for each agent i: Ŷi(1), Ŷi(2), . . . , Ŷi(K)”;
¶[0033]: “Referring now to FIG. 4, a system-oriented view of prediction sample generation block 202 is shown. First RNN encoder 402 and second RNN encoder 404 accept inputs of X and Y respectively... ”;
¶[0005]: “...A ranking/refinement module includes a processor configured to rank the prediction samples according to a likelihood score that incorporates interactions between agents and semantic scene context...”;
Where processing system 600 implements RNN encoder 402 and RNN encoder 404 (wherein an encoding function is completed) to generate future trajectories with associated likelihood scores (to encode and output multi-attention features that are associated with predicted attention weights) for each agent (that are respectively associated with each of the agents)).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill, Kum, and Sen with the features taught by Choi because “...The present embodiments provide scalability (because deep learning enables end-to-end training and easy incorporation with multiple cues from past motion, scene context, and agent interactions), diversity (the stochastic output is combined with an encoding of past observations to generate multiple prediction hypotheses that resolve the ambiguities and multimodalities of future prediction), and accuracy (long-term future rewards are accumulated for sampled trajectories and a deformation of the trajectory is learned to provide accurate predictions farther into the future)” (Choi, ¶[0019]). See also ¶[0020]-¶[0021]. 
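For readers unfamiliar with multi-head attention, the following Python sketch shows the core arithmetic: agent features are split into per-head slices, each head applies scaled dot-product self-attention across agents, and the head outputs are concatenated into encoded features, with each head's weights kept per agent. It is a simplified, hypothetical illustration: learned query, key, and value projections are replaced by the identity, and it is not presented as Choi's RNN encoders 402 and 404 or as the applicant's interaction encoder 112.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(features, num_heads):
    """Scaled dot-product self-attention across agents, one head per feature slice.

    features: per-agent feature vectors of equal length d, d divisible by num_heads.
    Returns (encoded, weights): encoded[i] concatenates agent i's head outputs;
    weights[h][i][j] is how strongly agent i attends to agent j in head h.
    Query/key/value projections are the identity (a simplifying assumption).
    """
    n, d = len(features), len(features[0])
    if d % num_heads:
        raise ValueError("feature length must be divisible by num_heads")
    dh = d // num_heads
    encoded = [[] for _ in range(n)]
    weights = []
    for h in range(num_heads):
        head = [f[h * dh:(h + 1) * dh] for f in features]
        head_weights = []
        for i in range(n):
            scores = [sum(q * k for q, k in zip(head[i], head[j])) / math.sqrt(dh)
                      for j in range(n)]
            w = softmax(scores)
            head_weights.append(w)
            # Weighted sum over all agents' slices (values are the features themselves).
            encoded[i].extend(sum(w[j] * head[j][c] for j in range(n)) for c in range(dh))
        weights.append(head_weights)
    return encoded, weights
```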


Regarding claim 16, McGill, Kum, Sen, and Choi teach the system of claim 15. Choi further teaches:
wherein decoding the multi modal trajectories and the probabilities for each of the agents includes decoding the multi-attention features to decode and output multiple predicted trajectories as multi-modal trajectories for each mode and agent.
(Choi, FIG. 1; FIG. 2; FIG. 3; FIG. 4; FIG. 5; FIG. 6, ¶[0051]; 
¶[0018]: “Referring now to FIG. 1, an exemplary scene 100 is shown. The scene 100 depicts an intersection that is being monitored and includes a number of agents 102, which in this case may include both pedestrians and automobiles. The past positions of each agent are shown as dotted-line paths 104. Predictions as to an agent's future position are shown as solid lines 106, with thicker lines representing predictions having a higher likelihood and thinner lines representing predictions having a lower likelihood. The scene 100 may be built across many individual images, with agent positions being tracked from image to image”;
¶[0020]: “Referring now to FIG. 2, a method for trajectory prediction is shown. Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106...”;
¶[0030]: “...An RNN decoder takes the output of the previous step... and generates K future prediction samples for each agent i: Ŷi(1), Ŷi(2), . . . , Ŷi(K)”;
¶[0015]: “...the present embodiments predict the locations of agents and the evolution of scene elements at future times using observations of the past states of the scene, for example in the form of agent trajectories and scene context derived from image-based features or other sensory date (if available)”;
¶[0005]: “...A ranking/refinement module includes a processor configured to rank the prediction samples according to a likelihood score that incorporates interactions between agents and semantic scene context...”;
Where processing system 600 uses an RNN decoder to generate K future prediction samples, i.e. future predicted trajectories and associated likelihood scores for each agent (wherein decoding the multi modal trajectories and the probabilities for each of the agents), wherein processing system 600 decodes K future prediction samples for each agent, i.e. decodes image-based features such as an agent’s position tracked from image to image, to incorporate context from scene 100 (includes decoding the multi-attention features) in order to generate K future prediction samples, i.e. future predicted trajectories for each agent (to decode and output multiple predicted trajectories as multi-modal trajectories for each mode and agent); this is similar to ¶[0029]-¶[0030] of the instant application: see mapping for rejection of claim 15, above).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill, Kum, and Sen with the features taught by Choi because “...The present embodiments provide scalability (because deep learning enables end-to-end training and easy incorporation with multiple cues from past motion, scene context, and agent interactions), diversity (the stochastic output is combined with an encoding of past observations to generate multiple prediction hypotheses that resolve the ambiguities and multimodalities of future prediction), and accuracy (long-term future rewards are accumulated for sampled trajectories and a deformation of the trajectory is learned to provide accurate predictions farther into the future)” (Choi, ¶[0019]). See also ¶[0027]. 
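The notion of decoding an encoded agent feature into several trajectories, one per mode, each with a probability, can be sketched with a toy linear decoder. Everything here is hypothetical: each mode has its own displacement weights and logit weights, the decoded displacement is applied at every future step, and the mode probabilities come from a softmax over the logits. Choi's ¶[0030] describes an RNN decoder generating K samples per agent, which this sketch does not attempt to reproduce.

```python
import math


def decode_modes(feature, last_pos, mode_heads, horizon):
    """Decode one agent's encoded feature into K trajectories and K probabilities.

    mode_heads: one (disp_weights, logit_weights) pair per mode (hypothetical),
    where disp_weights is a 2 x D matrix mapping the feature to a per-step
    (dx, dy) and logit_weights is a length-D vector giving that mode's score.
    """
    def dot(w, v):
        return sum(a * b for a, b in zip(w, v))

    trajectories, logits = [], []
    for disp_weights, logit_weights in mode_heads:
        dx, dy = dot(disp_weights[0], feature), dot(disp_weights[1], feature)
        x, y = last_pos
        traj = []
        # Roll the decoded displacement forward for `horizon` future steps.
        for _ in range(horizon):
            x, y = x + dx, y + dy
            traj.append((x, y))
        trajectories.append(traj)
        logits.append(dot(logit_weights, feature))
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return trajectories, [e / total for e in exps]
```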


Regarding claim 17, McGill, Kum, Sen, and Choi teach the system of claim 16. Choi further teaches:
wherein decoding the multi modal trajectories and the probabilities for each of the agents includes utilizing a cross entropy loss for ranking modes by predicting the probabilities and outputting the predicted trajectories with the rankings associated with probabilities.
(Choi, FIG. 1; FIG. 2; FIG. 3; FIG. 4; FIG. 5; FIG. 6, ¶[0051]; 
¶[0018]: “Referring now to FIG. 1, an exemplary scene 100 is shown. The scene 100 depicts an intersection that is being monitored and includes a number of agents 102, which in this case may include both pedestrians and automobiles. The past positions of each agent are shown as dotted-line paths 104. Predictions as to an agent's future position are shown as solid lines 106, with thicker lines representing predictions having a higher likelihood and thinner lines representing predictions having a lower likelihood. The scene 100 may be built across many individual images, with agent positions being tracked from image to image”;
¶[0020]: “Referring now to FIG. 2, a method for trajectory prediction is shown. Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106...”;
¶[0030]: “...An RNN decoder takes the output of the previous step... and generates K future prediction samples for each agent i: Ŷi(1), Ŷi(2), . . . , Ŷi(K)”;
¶[0005]: “...A ranking/refinement module includes a processor configured to rank the prediction samples according to a likelihood score that incorporates interactions between agents and semantic scene context...”;
¶[0041]: “There are two loss terms in ranking and refinement block 204: a cross-entropy loss and a regression loss...”;
Where processing system 600 uses an RNN decoder to generate K future prediction samples, i.e. future predicted trajectories and associated likelihood scores for each agent (wherein decoding the multi modal trajectories and the probabilities for each of the agents), wherein processing system 600 utilizes a cross-entropy loss term in block 204 (includes utilizing a cross entropy loss) in order to rank the future predicted trajectories with a likelihood score (for ranking modes by predicting the probabilities), and where processing system 600 outputs the trajectories and their rankings according to their likelihood scores in block 204 (and outputting the predicted trajectories with the rankings associated with probabilities)).

It would have been obvious to a person having ordinary skill in the art prior to the effective filing date to combine the system of McGill, Kum, and Sen with the features taught by Choi because “...The present embodiments provide scalability (because deep learning enables end-to-end training and easy incorporation with multiple cues from past motion, scene context, and agent interactions), diversity (the stochastic output is combined with an encoding of past observations to generate multiple prediction hypotheses that resolve the ambiguities and multimodalities of future prediction), and accuracy (long-term future rewards are accumulated for sampled trajectories and a deformation of the trajectory is learned to provide accurate predictions farther into the future)” (Choi, ¶[0019]). See also ¶[0022]. 
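A common way to train mode probabilities with a cross-entropy loss, and then rank modes by those probabilities, is sketched below in Python. Selecting the target mode as the one with the smallest average displacement from the ground truth is an assumption made for illustration; neither Choi's ¶[0041] loss nor the claimed loss is reproduced here.

```python
import math


def average_displacement(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth points."""
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)


def mode_ranking_loss(trajectories, logits, truth):
    """Cross-entropy over modes plus a probability-ordered ranking.

    The target mode is the one whose trajectory is closest to the ground truth
    (assumed rule); the loss is -log softmax(logits)[target].
    Returns (loss, [(mode_index, trajectory, probability), ...]) ranked high to low.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    probs = [math.exp(z - log_z) for z in logits]
    target = min(range(len(trajectories)),
                 key=lambda k: average_displacement(trajectories[k], truth))
    loss = log_z - logits[target]
    ranking = sorted(range(len(logits)), key=lambda k: probs[k], reverse=True)
    return loss, [(k, trajectories[k], probs[k]) for k in ranking]
```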


Claims 9, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over McGill and Kum as applied to claims 1, 10, and 19, above, and in further view of Hong et al. (US 11195418 B1), henceforth known as Hong.

Regarding claim 9, the claim recites a method having limitations similar to those of claim 18 and is therefore rejected on the same basis, as outlined below.

Regarding claim 18, McGill and Kum teach the system of claim 10. The combination of McGill and Kum fails to explicitly teach the limitations of claim 18 as a whole.
However, in the same field of endeavor, Hong teaches:
wherein controlling the at least one vehicle system of the ego vehicle includes comparing the rankings associated with the probabilities to determine if the rankings are ranked higher than a predetermined probability threshold, 
(Hong, FIG. 1; FIG. 2; FIG. 3;  
Col 9, lines 58-64: “...the vehicle computing device 204 can include one or more system controllers 226, which can be configured to control steering, propulsion, braking, safety, emitters, communication, and other systems of the vehicle 202. These system controller(s) 226 can communicate with and/or control corresponding systems of the drive system(s) 214 and/or other components of the vehicle 202”;
Col 4, lines 37-67: “...the heat maps may comprise prediction probabilities that may represent a plurality of predicted trajectories for the agent in the environment. In the context of a vehicle traversing an environment, a first predicted trajectory can represent the vehicle making a left-turn through the intersection, while a second predicted trajectory can represent the vehicle going straight through the intersection. In a case where the first predicted trajectory has a higher probability than the second predicted trajectory (e.g., because the sensor data may have captured a left-turn indicator (e.g., a blinker or turn signal) of the vehicle), the operation can include masking, covering, or otherwise removing prediction probabilities of the heat maps that correspond to the first predicted trajectory. Next, the masked heat map may be normalized (e.g., the prediction probabilities can be scaled between 0 and 1) and the highest probability of the masked heat map can be determined as a prediction point. A second trajectory can be based at least in part on the prediction points associated with the masked heat map. That is, the second set of prediction points can be used to generate the second predicted trajectory by evaluating one or more cost functions to determine the second trajectory. This masking process and determining of predicted trajectories can be repeated until a probability of a trajectory does not meet or exceed a prediction threshold. The at least one predicted trajectory can be provided to a planning system of the autonomous vehicle whereby the autonomous vehicle can be controlled based at least in part on the at least one predicted trajectory. In at least other examples, all possible predicted trajectories and/or their corresponding uncertainties can be output to such a planning system.”;
Col 8, lines 22-26: “...the process 100 can be performed in parallel for each agent in the environment. In some instances, the process 100 can be performed on a single set of images to generate at least one trajectory for each agent of a plurality of agents in the environment”;
Where system 200 controls various vehicle systems of vehicle 202 (wherein controlling the at least one vehicle system of the ego vehicle) by determining that a first predicted trajectory has the highest probability, masking the first predicted trajectory, determining that a second predicted trajectory has the second highest probability, and repeating until the probability of a trajectory does not meet or exceed a prediction threshold (includes comparing the rankings associated with the probabilities to determine if the rankings are ranked higher than a predetermined probability threshold); the first and second trajectories are ranked by probability and are each compared to the prediction threshold such that only trajectories that meet or exceed the prediction threshold are output to the planning system; see FIG. 3)

wherein autonomous control parameters are output to autonomously control the ego vehicle within the surrounding environment of the ego vehicle based on the predicted trajectories associated with each of the agents and the rankings associated with the probabilities.
(Hong, FIG. 1; FIG. 2; FIG. 3;  
Col 9, lines 58-64: “...the vehicle computing device 204 can include one or more system controllers 226, which can be configured to control steering, propulsion, braking, safety, emitters, communication, and other systems of the vehicle 202. These system controller(s) 226 can communicate with and/or control corresponding systems of the drive system(s) 214 and/or other components of the vehicle 202”;
Col 4, lines 37-67: “...the heat maps may comprise prediction probabilities that may represent a plurality of predicted trajectories for the agent in the environment. In the context of a vehicle traversing an environment, a first predicted trajectory can represent the vehicle making a left-turn through the intersection, while a second predicted trajectory can represent the vehicle going straight through the intersection... the first predicted trajectory has a higher probability than the second predicted trajectory...The at least one predicted trajectory can be provided to a planning system of the autonomous vehicle whereby the autonomous vehicle can be controlled based at least in part on the at least one predicted trajectory. In at least other examples, all possible predicted trajectories and/or their corresponding uncertainties can be output to such a planning system”;
Col 19, lines 4-16: “At operation 342, the process can include outputting predicted trajector(ies) to a planning system to generate trajector(ies) to control an autonomous vehicle. An example 344 illustrates the predicted trajectory 324 (e.g., based at least in part on the predicted points 312, 314, and 318) and a predicted trajectory 346 (e.g., based at least in part on the candidate point 316, which may have been determined to be a predicted point for the predicted trajectory 346)... the predicted trajectories can be output to the planning component 224 of the autonomous vehicle 106. The planning system 224 can generate one or more trajectories for the autonomous vehicle 106 to follow based at least in part on the at least one predicted trajectory”;
Col 8, lines 22-26: “...the process 100 can be performed in parallel for each agent in the environment. In some instances, the process 100 can be performed on a single set of images to generate at least one trajectory for each agent of a plurality of agents in the environment”;
Where system 200 controls various vehicle systems of vehicle 202 through one or more system controllers 226 (wherein autonomous control parameters are output to autonomously control the ego vehicle) within the environment (within the surrounding environment of the ego vehicle) based on the predicted trajectories sent to planning component 224 of autonomous vehicle 106 in FIG. 3 and their associated rankings based on their respective probabilities (based on the predicted trajectories associated with each of the agents and the rankings associated with the probabilities); Col 8, lines 22-26, reproduced above, describes how the process 100 can be performed in parallel for each of a plurality of agents, i.e., predicted trajectories associated with each of the agents).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the system of McGill and Kum with the features taught by Hong because “The techniques discussed herein can improve a functioning of a computing device in a number of additional ways... generating the at least one predicted trajectory can be provided to a planner system of an autonomous vehicle, which may allow the autonomous vehicle to generate more accurate and/or safer trajectories for the autonomous vehicle to traverse an environment. For example, a predicted trajectory suggesting a likelihood of a collision or a near-collision may allow the autonomous vehicle to alter a trajectory (e.g., change lanes, stop, etc.) in order to safely traverse the environment” (Hong, Col 5, lines 4-22). That is, ranking the predicted trajectories and outputting only the trajectories that meet or exceed a probability threshold allows the autonomous vehicle to generate more accurate and safer trajectories for itself.

Regarding claim 20, the claim recites a non-transitory computer readable storage medium having limitations similar to those of claim 18 and is therefore rejected on the same basis as outlined above.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Brown et al. (US 20200117958 A1) discloses a method for predicting a future action of agents in a scene including assigning a fidelity level to agents observed in the scene. The method also includes recursively predicting future actions of the agents by traversing the scene. A different forward prediction model is used at each recursion level. The method further includes controlling an action of an ego agent based on the predicted future actions of the agents.
Smolyanskiy et al. (US 20210150230 A1) discloses a deep neural network(s) (DNN) that may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.
Abad et al. (WO 2020114780 A1) discloses a computer-implemented method comprising an operating phase comprising the steps of receiving one or several video frames from a plurality of modalities, so-called multi-modality video frames, of a vehicle's environment, corresponding to present and past timestamps; encoding into a latent representation, said multi-modality video frames by a spatial-temporal encoding convolutional neural network (E); combining into a composite representation (Z), said latent representation with encoded conditioning parameters corresponding to timestamps at the desired future time horizon; predicting multiple future multi-modality video frames corresponding to multiple future modes of a multi-modal future solution space associated with likelihood coefficients by a generative convolutional neural network (G) previously trained in a generative adversarial network training scheme.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tawri M Matsushige whose telephone number is (571)272-3715. The examiner can normally be reached M-Th (0830-1600).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Lee can be reached on (571)270-5965. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/T.M.M./Examiner, Art Unit 3668

/JAMES J LEE/Supervisory Patent Examiner, Art Unit 3668