Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-5, 7, 9-11, 14 and 16-17 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen”.
Claim 1. A method comprising: determining, from sensor data, first data representing a first view of an environment; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view)
extracting, using one or more Neural Networks (NNs), classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.2. 3D Proposal Network] teaches we use a multi-task loss to simultaneously classify object/background and do 3D box regression… Examiner interprets the 3D boxes to be the objects or scenery.
generating transformed classification data representing the one or more classifications in a second view of the environment based at least on projecting the one or more classifications from the first view to the second view; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via: (equations 2 and 3)
and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data. Chen [3.2. 3D Proposal Network] teaches given a bird’s eye view map, the network generates 3D box proposals from a set of 3D prior boxes. Each 3D box is parameterized by (x, y, z, l, w, h), which are the center and size (in meters) of the 3D box in LIDAR coordinate system. (applied in equation 2)
Chen Figure 6: Qualitative comparisons of 3D detection results: 3D Boxes are projected to the bird’s eye view and the images.
Claim 2. The method of claim 1, wherein the first view is a perspective view and the second view is a top-down view. Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via: (equations 2 and 3)
Claim 3. The method of claim 1, wherein the first data representing the first view of the environment comprises a projection of a LiDAR point cloud, the projection representing a perspective view of the environment, and wherein the projecting of the one or more classifications from the first view to the second view comprises using the LiDAR point cloud to project the one or more classifications from the perspective view to a top-down view of the environment. Chen [3.1. 3D Point Cloud Representation] teaches Existing work usually encodes 3D LIDAR point cloud into a 3D grid [26, 7] or a front view map [17]. While the 3D grid representation preserves most of the raw information of the point cloud, it usually requires much more complex computation for subsequent feature extraction. We propose a more compact representation by projecting 3D point cloud to the bird’s eye view and the front view. Fig. 2 visualizes the point cloud representation.
Chen [3.1. 3D Point Cloud Representation] teaches Bird’s Eye View Representation. The bird’s eye view representation is encoded by height, intensity and density. We discretize the projected point cloud into a 2D grid with resolution of 0.1m. For each cell, the height feature is computed as the maximum height of the points in the cell. To encode more detailed height information, the point cloud is divided equally into M slices. A height map is computed for each slice, thus we obtain M height maps. The intensity feature is the reflectance value of the point which has the maximum height in each cell. The point cloud density…
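For illustration only, the bird’s eye view encoding quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not Chen's implementation: the function name encode_bev, the (x, y, z, reflectance) point layout, and the dictionary-based grid are hypothetical. Only a single height map is computed (the M height slices are omitted), and density is kept as a raw per-cell point count because the quoted passage elides how Chen normalizes it.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical point layout: (x, y, z, reflectance) in LiDAR coordinates, meters.
Point = Tuple[float, float, float, float]
Cell = Tuple[int, int]


def encode_bev(points: Iterable[Point], resolution: float = 0.1) -> Dict[Cell, Dict[str, float]]:
    """Discretize points into a 2D ground-plane grid and compute per-cell features.

    For each non-empty cell this keeps:
      height    - maximum z of the points in the cell
      intensity - reflectance of the point with the maximum z
      count     - raw number of points (an un-normalized stand-in for density)
    """
    grid: Dict[Cell, Dict[str, float]] = {}
    for x, y, z, reflectance in points:
        cell = (math.floor(x / resolution), math.floor(y / resolution))
        entry = grid.get(cell)
        if entry is None:
            grid[cell] = {"height": z, "intensity": reflectance, "count": 1}
        else:
            entry["count"] += 1
            if z > entry["height"]:
                # The intensity feature follows the highest point in the cell.
                entry["height"] = z
                entry["intensity"] = reflectance
    return grid
```

A sparse dictionary is used here only to keep the sketch dependency-free; a dense array indexed by cell would be the more usual input to a convolutional network.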
Chen [3.3. Region based Fusion Network] teaches Multi-View ROI Pooling. Since features from different views/modalities usually have different resolutions, we employ ROI pooling [10] for each view to obtain feature vectors of the same length. Given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via:
Claim 4. The method of claim 1, wherein the first data represents a LiDAR range image of the first view, Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view)
and the determining of the first data comprises projecting a LiDAR point cloud into the LiDAR range image. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input.
Chen [3.1. 3D Point Cloud Representation] teaches we propose a more compact representation by projecting 3D point cloud to the bird’s eye view and the front view. Fig. 2 visualizes the point cloud representation.
Claim 5. The method of claim 1, wherein the first data represents a LiDAR range image of the first view, the LiDAR range image having a height in pixels corresponding to a number of horizontal scan lines of a LiDAR sensor that captured the sensor data. Chen [Figure 2] (b) Front view features [Front View Representation] teaches… are the horizontal and vertical resolution of laser beams, respectively. We encode the front view map with three-channel features, which are height, distance and intensity, as visualized in Fig. 2.
Claim 7. The method of claim 1, wherein the projecting of the one or more classifications from the first view to the second view comprises applying a differentiable transformation to 3D locations associated with the classification data. Chen [equation 2] is interpreted to be differentiable.
Claim 9. The method of claim 1, further comprising: decoding an output of one or more NNs to produce candidate bounding shapes for the one or more objects; identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of filtering or clustering of the candidate bounding boxes to remove duplicate candidates from the candidate bounding boxes; and assigning a class label for each of the one or more bounding shapes based on the output of the one or more NNs. Chen [Figure 1: Multi-View 3D object detection network (MV3D)]
Claim 10. The method of claim 1, wherein the determining of the second data representing the one or more bounding shapes comprises: decoding an output of the one or more NNs to produce candidate bounding shapes for the one or more objects; and identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of non-maximum suppression or density-based spatial clustering of applications with noise to remove duplicate candidates from the candidate bounding shapes. Chen [3.2. 3D Proposal Network] teaches for each non-empty anchor at each position of the last convolution feature map, the network generates a 3D box. To reduce redundancy, we apply Non-Maximum Suppression (NMS) on the bird’s eye view boxes. Different from [23], we did not use 3D NMS because objects should occupy different space on the ground plane. We use IoU threshold of 0.7 for NMS. The top 2000 boxes are kept during training, while in testing, we only use 300 boxes.
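For illustration only, the non-maximum suppression step quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not Chen's implementation: the names iou and nms are hypothetical, and the bird’s eye view boxes are assumed to be axis-aligned rectangles given as (x_min, y_min, x_max, y_max). The default IoU threshold of 0.7 and the limit of 300 kept boxes are the values stated in the quoted passage for NMS and for testing, respectively.

```python
from typing import List, Sequence, Tuple

# Hypothetical axis-aligned box in the bird's eye view plane.
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.7, top_k: int = 300) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any kept box beyond the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if len(kept) == top_k:
                break
    return kept
```

Passing top_k=2000 would correspond to the number of boxes the quoted passage says are kept during training.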

Claim 11. The method of claim 1, wherein an output of the one or more NNs comprises a tensor storing regressed geometry data for each detected object, wherein the determining of the second data representing the one or more bounding shapes comprises generating one or more 3D bounding shapes for the one or more objects from the regressed geometry data. Chen [Oriented 3D Box Regression]
Claim 14. Claim 14 recites limitations similar to those of claims 1 and 2. Therefore, claim 14 has been analyzed and rejected in the same way as claims 1 and 2. See the above analysis.
Claim 16. The method of claim 14, wherein the one or more NN's includes a first stage configured to evaluate the first data representing the perspective view and a second stage configured to evaluate the transformed classification data representing the top-down view. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] (3 stages) 
Claim 17. The method of claim 14, wherein the second data further represents a class label for each of the one or more bounding shapes the one or more objects. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 8, 12 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen” in view of US 2020/0193606 A1 to Douillard et al., hereinafter, “Douillard”.
Claim 8. Chen is silent on the limitations of claim 8; however, Douillard, in the field of analyzing 3D LiDAR data in images, teaches wherein the sensor data represents a LiDAR point cloud, wherein the transformed classification data represents one or more confidence maps in the second view, and the method further comprises: generating third data representing one or more height maps based at least on projecting the LiDAR point cloud into the second view; forming a tensor comprising a first set of one or more channels storing the transformed classification data representing the one or more confidence maps and a second set of one or more channels storing the third data representing the one or more height maps; and extracting, from the tensor using the one or more NNs, second classification data representing one or more second classifications in the second view and fourth data representing object instance geometry of the one or more objects. Douillard [0154-0164]
Douillard [0167] teaches D. The system of any one of example A through example C, wherein the second height is greater than the first height, and wherein the orientation of the rendering plane is selected to substantially maximize the second height associated with the extracted data relative to the rendering plane.
Douillard [0181] teaches K. The method of any one of example H through example J, wherein: [0182] the extracted data represents a first width and a first height relative to the first perspective; and [0183] wherein the orientation of the rendering plane is selected such that the extracted data represents a second width and a second height relative to the second perspective, and wherein the second width is greater than the first width. 
Douillard [0206] teaches T. A system of example R or example S, wherein: [0207] the extracted data represents a first width and a first height relative to the first perspective; and [0208] wherein the orientation of the rendering plane is selected such that the extracted data represents a second width and a second height relative to the second perspective, and wherein the second width is greater than the first width.

Claim 12. Chen is silent on the limitations of claim 12; however, Douillard, in the field of analyzing 3D LiDAR data in images, teaches further comprising training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data. Chen [Abstract] teaches we propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes.
Chen [Introduction] teaches 3D object detection plays an important role in the visual perception system of Autonomous driving cars. Modern self-driving cars are commonly equipped with multiple sensors, such as LIDAR and cameras. Laser scanners have the advantage of accurate depth information while cameras preserve much more detailed semantic information. The fusion of LIDAR point cloud and RGB images should be able to achieve higher performance and safety to self-driving cars.
Douillard [0058] teaches the classification module 316 may include functionality to receive segmented data and to identify a type of object represented by the data. For example, the classification module 316 may classify one or more objects, including but not limited to cars, buildings, pedestrians, bicycles, trees, free space, occupied space, street signs, lane markings, etc.
Douillard [0106] teaches the process may include receiving one or more images that have been segmented to create a set of images segmented for free space, while in some instances, the operation 1004 may include receiving indications of one or more objects identified in the segmented images to perform object tracking and/or object motion prediction. At operation 1206, the process can include inputting the images segmented for free space or inputting the identified and/or tracked objects into a planner system, to generate a trajectory for the autonomous vehicle. In some instances, the planner system may be incorporated into a computing system to receive free space segmented images or to receive objects to be tracked and to generate a trajectory based at least in part on the segmented images or tracked objects… In some instances, the trajectory generated in the operation 1206 may constrain the operation of the autonomous vehicle to operate within the free space segmented in the operation 1204, or to avoid objects identified and/or tracked by a planner system of the autonomous vehicle.
Douillard [0110] teaches the process can include applying the three-dimensional segmentation information to a three-dimensional dataset to identify a three-dimensional object. In some instances, the process 1300 may include performing classification on a per-object basis. Thus, the operation 1306 may include identifying, isolating, extracting, and/or segmenting the three-dimensional object from the three-dimensional dataset so that any subsequent processing can be optimized for the particular object.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the extracting, using one or more Neural Networks (NNs), of classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data, as taught by Chen, with Douillard’s teaching of training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data. One would have been motivated to perform this combination because it allows three-dimensional data to be used in computer vision contexts to locate and interact with objects in the physical world (Douillard, [0025]). In combination, Chen is not altered in that Chen continues to perform 3D object detection in autonomous driving. Douillard's teachings perform the same as they do separately, in that multi-dimensional data may include data captured by a LIDAR system for use in conjunction with a perception system for an autonomous vehicle.
Therefore, one of ordinary skill in the art, such as an individual working in the field of LiDAR data in images, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 12.
Claim 13. Douillard further teaches training the one or more NNs using training data generated using a link between object tracks generated for a particular object from corresponding sensor data from two or more sensors. Douillard [0058] teaches the classification module 316 may include functionality to receive segmented data and to identify a type of object represented by the data. For example, the classification module 316 may classify one or more objects, including but not limited to cars, buildings, pedestrians, bicycles, trees, free space, occupied space, street signs, lane markings, etc.
Douillard [0065] teaches FIG. 4A depicts a side view 400 of an example vehicle 402 having multiple sensor assemblies mounted to the vehicle 402. In some instances, datasets from the multiple sensor assemblies can be combined or synthesized to form a meta spin (e.g., LIDAR data representing a plurality of LIDAR sensors) or can be combined or fused using sensor fusion techniques to improve an accuracy or processing for segmentation, classification, prediction, planning, trajectory generation, etc.
Douillard [0070] teaches at operation 502, the process may include receiving three-dimensional data. In some instances, the three-dimensional data may include LIDAR data from one or more sensors. In some instances, the three-dimensional data may include fused sensor data including LIDAR data and Radar data.
Douillard [0090] teaches the segmentation data 1008 can be applied to the three-dimensional LIDAR data 1006 to identify and isolate data on a per object basis. For example, the LIDAR data 1006 may only include LIDAR data associated with a particular segmentation ID, or may include only the LIDAR data inside a three dimensional bounding box output from the segmentation algorithm.
Douillard [0105] teaches the converted data is input into a convolutional neural network that is trained to segment images based on free space (e.g., drivable or navigable space) in the input image. In some instances, the convolutional neural network may identify objects to be tracked by various systems of the autonomous vehicle. As an example, the converted data may be generated from an image capture system (e.g., a perception system) onboard an autonomous vehicle. In some instances, the image capture system may include any number of sensors, including but not limited to image sensors, LIDAR, radar, etc.
Douillard [0106] teaches the process may include receiving one or more images that have been segmented to create a set of images segmented for free space, while in some instances, the operation 1004 may include receiving indications of one or more objects identified in the segmented images to perform object tracking and/or object motion prediction. At operation 1206, the process can include inputting the images segmented for free space or inputting the identified and/or tracked objects into a planner system, to generate a trajectory for the autonomous vehicle. In some instances, the planner system may be incorporated into a computing system to receive free space segmented images or to receive objects to be tracked and to generate a trajectory based at least in part on the segmented images or tracked objects… In some instances, the trajectory generated in the operation 1206 may constrain the operation of the autonomous vehicle to operate within the free space segmented in the operation 1204, or to avoid objects identified and/or tracked by a planner system of the autonomous vehicle.
Douillard [0110] teaches the process can include applying the three-dimensional segmentation information to a three-dimensional dataset to identify a three-dimensional object. In some instances, the process 1300 may include performing classification on a per-object basis. Thus, the operation 1306 may include identifying, isolating, extracting, and/or segmenting the three-dimensional object from the three-dimensional dataset so that any subsequent processing can be optimized for the particular object.
Claim 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen” in view of US 2020/0193606 A1 to Douillard et al., hereinafter, “Douillard”.
Claim 18. A method comprising: generating, using one or more neural networks (NNs), classification data representing one or more classifications from image data representing an image of a first view of an environment; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view) 
projecting the labeled sensor data to a second view of the environment to generate transformed classification data representing the one or more classifications in the second view; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via: (equations 2 and 3)
and generating, using the one or more neural networks (NNs), second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data. Chen [3.2. 3D Proposal Network] teaches given a bird’s eye view map, the network generates 3D box proposals from a set of 3D prior boxes. Each 3D box is parameterized by (x, y, z, l, w, h), which are the center and size (in meters) of the 3D box in LIDAR coordinate system. (applied in equation 2)
Chen Figure 6: Qualitative comparisons of 3D detection results: 3D Boxes are projected to the bird’s eye view and the images.
While Chen discloses 3D locations, Chen fails to explicitly teach associating the classification data with corresponding three-dimensional (3D) locations identified from corresponding sensor data to generate labeled sensor data. Douillard, in the field of analyzing LiDAR data in images, teaches associating the classification data with corresponding three-dimensional (3D) locations identified from corresponding sensor data to generate labeled sensor data. Chen [Introduction] teaches methods based on LIDAR point cloud usually achieve more accurate 3D locations…we propose a Multi-View 3D object detection network (MV3D) which takes multimodal data as input and predicts the full 3D extent of objects in 3D space… Given the multi-view feature representation, the network performs oriented 3D box regression which predict accurate 3D location, size and orientation of objects in 3D space.
Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.2. 3D Proposal Network] teaches we use a multi-task loss to simultaneously classify object/background and do 3D box regression… Examiner interprets the 3D boxes to be the objects or scenery.
Douillard [0110] teaches Measurements of the LIDAR system may be represented as three-dimensional LIDAR data having coordinates (e.g., Cartesian, polar, etc.) corresponding to positions or distances captured by the LIDAR system.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the extracting, using one or more Neural Networks (NNs), of classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data, as taught by Chen, with Douillard’s teaching of associating the classification data with corresponding three-dimensional (3D) locations identified from corresponding sensor data to generate labeled sensor data. One would have been motivated to perform this combination because it allows three-dimensional data to be used in computer vision contexts to locate and interact with objects in the physical world (Douillard [0025]). In combination, Chen is not altered in that Chen continues to perform 3D object detection in autonomous driving. Douillard's teachings perform the same as they do separately, in that multi-dimensional data may include data captured by a LIDAR system for use in conjunction with a perception system for an autonomous vehicle.
Therefore, one of ordinary skill in the art, such as an individual working in the field of LiDAR data in images, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 18.

Claims 19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen” in view of US 2020/0193606 A1 to Douillard et al., hereinafter, “Douillard” and in further view of US 2020/0301013 A1 to Banerjee et al. hereinafter, “Banerjee”.
Claim 19. The combination of Chen and Douillard fails to explicitly teach the limitations of claim 19, however, Banerjee, in the field of object detection based on LiDAR data, teaches wherein the second data further represents one or more class labels of the one or more objects. Banerjee [0097] teaches it is proposed to use the fused RGBD information from the camera and LiDAR sensors (D being the depth information for each pixel in the camera image) for object detection. One approach for object detection is to use machine learning techniques with neural networks where labelled data can be used for training and evaluating the neural network. Labelled data in the context of object recognition is manually labelling the bounding boxes for each object of interest in an image and assigning a class label for each object of interest.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the extracting, using one or more Neural Networks (NNs), of classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data, as taught by Chen and Douillard, with Banerjee’s teaching that the second data further represents one or more class labels of the one or more objects. One would have been motivated to perform this combination because it allows three-dimensional data to be used in computer vision contexts to locate and interact with objects in the physical world (Banerjee [0025]). In combination, Chen is not altered in that Chen continues to perform 3D object detection in autonomous driving. Douillard's teachings perform the same as they do separately, in that multi-dimensional data may include data captured by a LIDAR system for use in conjunction with a perception system for an autonomous vehicle. Banerjee continues to teach that object detection in a scene is based on lidar data and radar data of the scene.
Therefore, one of ordinary skill in the art, such as an individual working in the field of LiDAR data in images, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 19.
Claim 20. The combination of Chen and Banerjee further teaches wherein the generating of the second data representing the one or more bounding shapes comprises generating one or more bounding shapes and associated class labels for the objects based on second classification data representing one or more second classifications in the second view and third data representing object instance geometry, the second classification data and the third data extracted by the one or more NNs. Chen [Figure 1: Multi-View 3D object detection network (MV3D)]
Chen [Introduction] teaches methods based on LIDAR point cloud usually achieve more accurate 3D locations…we propose a Multi-View 3D object detection network (MV3D) which takes multimodal data as input and predicts the full 3D extent of objects in 3D space… Given the multi-view feature representation, the network performs oriented 3D box regression which predict accurate 3D location, size and orientation of objects in 3D space.
Banerjee [0097] teaches it is proposed to use the fused RGBD information from the camera and LiDAR sensors (D being the depth information for each pixel in the camera image) for object detection. One approach for object detection is to use machine learning techniques with neural networks where labelled data can be used for training and evaluating the neural network. Labelled data in the context of object recognition is manually labelling the bounding boxes for each object of interest in an image and assigning a class label for each object of interest.
Allowable Subject Matter
Claims 6 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The innovation that makes claims 6 and 15 allowable is “converting the accumulated sensor data to motion-compensated sensor data corresponding to a position of the ego-actor at a particular time; and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate the first data representing a LiDAR range image of the perspective view of the environment.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds to Zhou et al. [3.2 Feature Fusion].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD, whose telephone number is (571)272-1681. The examiner can normally be reached from 8am to 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DELOMIA L GILLIARD/
Primary Examiner, Art Unit 2661