DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  
Applicant's submission filed on 07/07/2022 has been entered.
Response to Arguments
	Applicant’s arguments have been fully considered but are moot in light of a new rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 6-12, 14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Towal et al. US 2019/0384304 in view of Chu, Qi, et al. "Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism." further in view of Audebert, Nicolas, Bertrand Le Saux, and Sébastien Lefèvre. ".
Regarding claims 1 and 11, Towal teaches “a method for predicting trajectories of objects to guide the driving of an autonomous vehicle, the method comprising” (abstract “In various examples, a deep learning solution for path detection is implemented to generate a more abstract definition of a drivable path without reliance on explicit lane-markings—by using a detection-based approach.”): 
“acquiring, by a camera mounted on the autonomous vehicle, a video” ([0034] “The sensor data 102 may include image data generated by one or more cameras of an autonomous vehicle (e.g., vehicle 1100”); 
“acquiring, by one or more sensors mounted on the autonomous vehicle, sensor data” ([0034] “the sensor data 102 may additionally or alternatively include other types of sensor data, such as LIDAR data from one or more LIDAR sensors 1164, RADAR data from one or more RADAR sensors 1160, audio data from one or more microphones 1196, etc”); 
“generating metadata based on (1) an image corresponding to a frame of the video and (2) the sensor data” ([0036] “the image data may be provided as input to a sensor data pre-processor (not shown) to generate pre-processed image data.” and [0181] “An alternative stereo camera(s) 1168 may include a compact stereo vision sensor(s) that may include two camera lenses (one each on the left and right) and an image processing chip that may measure the distance from the vehicle to the target object and use the generated information (e.g., metadata) to activate the autonomous emergency braking and lane departure warning functions.”); 
“encoding the metadata and generating a first set of features based on the encoded metadata” ([0038] “An example of a suitable pre-processing image pipeline may use a raw RCCB Bayer (e.g., 1-channel) type of image from the sensor and convert that image to a RCB (e.g., 3-channel) planar image stored in Fixed Precision (e.g., 16-bit-per-channel) format. The pre-processing image pipeline may include decompanding, noise reduction, demosaicing, white balancing, histogram computing, and/or adaptive global tone mapping (e.g., in that order, or in an alternative order).”); 
“using the trajectory prediction to guide the driving of the autonomous vehicle” ([0030] “determination of drivable paths may be useful for an autonomous vehicle for path planning (e.g., determining a path for the vehicle through the environment), lane keeping (e.g., staying within a certain lane of a driving surface), lane changing (e.g., to determine a trajectory between a first drivable path, such as a first lane, and a second drivable path, such as a second lane), path warnings in semi-autonomous vehicles (e.g., a warning may be output to a passenger or driver when a vehicle is exiting a drivable path, such as drifting to an adjacent lane), mapping (e.g., predicting all drivable paths in an image to map the environment represented by the image), and/or to perform other operations or functions.”)
Towal, however, does not explicitly teach the remaining limitations. Chu, however, teaches “generating explanation information based on the image, the metadata, and the first set of features” (Chu fig. 2 [media_image1.png], which shows explanation information derived from features); 
“generating an attention map based on the image and the metadata” (abstract “In the framework, we introduce spatial-temporal attention mechanism (STAM) to handle the drift caused by occlusion and interaction among targets. The visibility map of the target is learned and used for inferring the spatial attention map.”), “wherein the generating of the attention map comprises marking one or more areas of interest of the image, and wherein at least one of the one or more areas of interest correspond to an object in the image” (abstract “This framework utilizes the merits of single object trackers in adapting appearance models and searching for target in the next frame.”); 
“generating a trajectory prediction for the object based on the first set of features and the attention map” (§1 ¶1 “The goal of multi-object tracking (MOT) is to estimate the locations of multiple objects in the video and maintain their identities consistently in order to yield their individual trajectories.”); 
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Towal with that of Chu since “Tracking objects in videos is an important problem in computer vision which has attracted great attention.” Chu §1¶1 and “Experimental results on challenging MOT benchmarks demonstrate the effectiveness of the proposed online MOT algorithm.” §5 ¶1. This shows that by utilizing the techniques of Chu, one can achieve better multi-object tracking and subsequently have better autonomous navigation. 
Neither reference, however, explicitly teaches semantic segmentation or explanation images. Audebert, however, teaches “wherein generating explanation information includes generating a semantic segmentation image for each frame of the video” (pg. 3 item 1 “Semantic segmentation to infer pixel-level class masks using a fully convolutional network;”); 
“identifying an area of interest of the semantic segmentation image” (pg. 3 item 2 “2. Vehicle detection by regressing the bounding boxes of connected components;”); and 
“generating an explanation image based on affecting factors marked on the semantic segmentation image” (pg. 3, figure 1: the classified vehicles, which are interpreted as explanation images as per [0066] of the specification, “corresponding to each of the frames for testing, where the affecting factors for testing are marked by referring to results of said (i-2).”).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Towal and Chu with that of Audebert since a combination of known methods would yield predictable results. As shown in Audebert, it is known to have semantic segmentation in order to have images showing items of interest. By using these techniques, one would have more efficient computer vision allowing for broader applications.
Note that independent claim 11 recites substantially the same subject matter as independent claim 1, differing only in embodiment. As such, the claim is subject to the same rejection. The difference in embodiment, which includes a memory and a processor, is taught by Towal figure 12, which shows both.
Regarding claims 2 and 12, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “generating training metadata based on (a) a training image corresponding to a training frame of a training video, and (b) training sensor data associated with the training frame” ([0113] “In order to train the DNN of the present disclosure, ground truth data including a combination of anchor points, or anchor lines, and annotated paths may be used.”); 
“encoding the training metadata and generating one or more training features based on the encoded training metadata” ([0038] “An example of a suitable pre-processing image pipeline may use a raw RCCB Bayer (e.g., 1-channel) type of image from the sensor and convert that image to a RCB (e.g., 3-channel) planar image stored in Fixed Precision (e.g., 16-bit-per-channel) format. The pre-processing image pipeline may include decompanding, noise reduction, demosaicing, white balancing, histogram computing, and/or adaptive global tone mapping (e.g., in that order, or in an alternative order).”); 
“generating training explanation information based on the training image, the encoded training metadata, and the one or more training features” ([0114] “the annotated paths may indicate edges of the drivable paths in the images, or may indicate a centerline of the drivable paths in the images. The number of annotated paths may correspond to a training parameter of the DNN, for determining how many predicted paths to output by the DNN.”); 
 “minimizing an explanation loss, the explanation loss calculated based on the training explanation information and one or more explanation ground truths” ([0116] “The images of the training set of images may then be input into the DNN and the anchor points, or anchor lines, and the annotated paths (e.g., the delta values for the annotated paths with respect to the respective encoded anchor points) may be used as ground truth data to train the DNN”) and
“minimizing an attention loss, the attention loss calculated based on the training attention map and one or more attention ground truths” ([0116] “one or more loss functions (e.g., a single loss function, a loss function for each output type, etc.) may be used to compare the accuracy of the DNN's predicted paths with the annotated paths, as well as the predicted path type corresponding to each predicted path, and the parameters of the DNN may be updated (e.g., using backward passes, backpropagation, forward passed, etc.) until the accuracy reaches an optimal or acceptable level”)
Chu further teaches “generating a training attention map based on the training explanation information and the encoded training metadata, wherein the generating of the training attention map comprises marking a training area of interest of the training image” (Chu pg. 4836 right col. last ¶ “In single object tracking, the training samples for learning appearance model are collected online, where labels are based on tracking results. The appearance model is then used for finding the target in the next frame. When the target is occluded, the visual cue is unreliable for learning the appearance model.” and abstract “The visibility map of the target is learned and used for inferring the spatial attention map.”)
Regarding claims 4 and 14, the Towal, Chu, and Audebert references have been addressed above. Chu further teaches “further comprising: marking a target object as one of the areas of interest the image based on the trajectory prediction and the explanation information” (Chu abstract “This framework utilizes the merits of single object trackers in adapting appearance models and searching for target in the next frame.”)
Regarding claims 6 and 16, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “further comprising generating a decision tree based on the metadata” ([0040] “For example, and without limitation, the machine learning model(s) 104 described herein may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees”)  
Regarding claims 7 and 17, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “further comprising generating state information for the target object based on the decision tree and the explanation information” ([0064] “the control component(s) 130 may use a forward prediction model that takes control as an input variable, and produces predictions that may be compared with the desired state (e.g., compared with the desired lateral and longitudinal path requested by the planning component(s) 128). The control(s) that minimize discrepancy may be determined.”)
Regarding claims 8 and 18, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “further comprising detecting environment information associated with surroundings of the autonomous vehicle and position information associated with the autonomous vehicle, the detecting based on the image and the sensor data” (abstract “In various examples, a deep learning solution for path detection is implemented to generate a more abstract definition of a drivable path without reliance on explicit lane-markings—by using a detection-based approach.”)
Regarding claims 9 and 19, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “wherein the metadata comprises: an object bounding box corresponding to the object, pose information associated with the autonomous vehicle, and map information corresponding to a location of the autonomous vehicle” ([0212] “The neural network may take as its input at least some subset of parameters, such as bounding box dimensions, ground plane estimate obtained (e.g. from another subsystem), inertial measurement unit (IMU) sensor 1166 output that correlates with the vehicle 1100 orientation, distance, 3D location estimates of the object obtained from the neural network and/or other sensors (e.g., LIDAR sensor(s) 1164 or RADAR sensor(s) 1160), among others”)  
Regarding claims 10 and 20, the Towal, Chu, and Audebert references have been addressed above. Towal further teaches “wherein the trajectory prediction is generated using a Recurrent Neural Network (RNN), and wherein the RNN utilizes at least one of a Long Short-Term Memory algorithm or a Generative Adversarial Network algorithm” ([0040] “the machine learning model(s) 104 described herein may include any type of machine learning model, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naïve Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, Long/Short Term Memory (LSTM), Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models.”)
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Towal et al. US 2019/0384304 in view of Chu, Qi, et al. "Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism." further in view of Audebert, Nicolas, Bertrand Le Saux, and Sébastien Lefèvre. " and Urtasun et al. US 2020/0160151.
Regarding claims 3 and 13, the Towal, Chu, and Audebert references have been addressed above. These references do not explicitly teach the claim limitations. Urtasun, however, teaches “further comprising: reducing one or more dimensions of the image, the metadata, and the one or more features” (Urtasun [0197] “the transform operations can include one or more rigid two-dimensional transform operations to transform the three-dimensional LiDAR sensor outputs represented in the source feature representation 420 into the two-dimensional representation of the transformed source feature representation 424”); 
“generating a second set of features” ([0198] “The correlation operations 426 can be performed on the attended feature representation 414 and the transformed source feature representation 424. The correlation operations can include the use of Fourier Fast Transform based cross correlation to determine one or more correlations between the attended feature representation 414 and the transformed source feature representation 424.”); 
“reconstructing the second set of features” ([0065] “generating a reconstructed target feature representation based at least in part on the compressed target feature representation and a machine-learned reconstruction model”); and 
“generating the explanation information based on the reconstructed second set of features” ([0198] “based on the results of the correlation operations 426, the correlation score 428 can be determined based on one or more similarities between the features of the attended feature representation 414 and the transformed source feature representation 424.”)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Towal, Chu, and Audebert with that of Urtasun since a combination of known methods would yield predictable results. As shown in Urtasun, dimension reduction is a known technique in the art and has useful properties. Thus, these techniques would operate in a known and predictable manner with the systems above.
Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Primary Patent Examiner
Art Unit 2124



/Kevin W Figueroa/Primary Examiner, Art Unit 2124