Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Response to Amendment
Claims 1, 5, 6, 8, 9, 16, 18 and 20 are currently amended. Claims 1-20 are pending. 
Response to Arguments
Applicant’s arguments, see Remarks, filed October 22, 2021, with respect to the rejections of claims 1, 14 and 18 under 35 U.S.C. 102 and 103 have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, new grounds of rejection are made in view of US 2018/0314253 A1 to Mercep et al., hereinafter, “Mercep”.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-7, 9-11 and 14-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen” in view of US 2018/0314253 A1 to Mercep et al., hereinafter, “Mercep”.
Claim 1. A method comprising: converting accumulated sensor data to motion-compensated sensor data corresponding to a position of an ego-actor at a particular time, projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate first data representing a first view of an environment; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view)
Chen fails to explicitly teach converting accumulated sensor data to motion-compensated sensor data corresponding to a position of an ego-actor at a particular time, and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate first data representing a first view of an environment. Mercep, in the field of driver assistance, teaches these limitations. Mercep [0007] teaches this application discloses a computing system to perform machine learning classification of sensor measurement data in an assisted or automated driving system of a vehicle. The computing system can receive sensor measurement data from multiple different sensor modalities, which the computing system can temporally and spatially align into an environmental model. The computing system can detect events, such as data point clusters or image features, in the sensor measurement data.
Mercep [0037] teaches the measurement integration system 310 also can temporally align the raw measurement data 301 from different sensors in the sensor system. In some embodiments, the measurement integration system 310 can include a temporal alignment unit 312 to assign time stamps to the raw measurement data 301 based on when the sensor captured the raw measurement data 301, when the raw measurement data 301 was received by the measurement integration system 310, or the like. In some embodiments, the temporal alignment unit 312 can convert a capture time of the raw measurement data 301 provided by the sensors into a time corresponding to the sensor fusion system 300. The measurement integration system 310 can annotate the raw measurement data 301 populated in the environmental model 315 with the time stamps for the raw measurement data 301. The time stamps for the raw measurement data 301 can be utilized by the sensor fusion system 300 to group the raw measurement data 301 in the environmental model 315 into different time periods or time slices. In some embodiments, a size or duration of the time periods or time slices can be based, at least in part, on a refresh rate of one or more sensors in the sensor system. For example, the sensor fusion system 300 can set a time slice to correspond to the sensor with a fastest rate of providing new raw measurement data 301 to the sensor fusion system 300.
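For illustration only, the time-stamp grouping described in Mercep [0037] can be sketched as follows. This is a minimal sketch and not Mercep's implementation; the function names, the (timestamp, payload) tuple layout, and the slice_duration parameter are hypothetical.

```python
from collections import defaultdict


def fastest_sensor_period(refresh_rates_hz):
    """Return the period (in seconds) of the sensor with the highest refresh rate."""
    return 1.0 / max(refresh_rates_hz)


def group_into_time_slices(measurements, slice_duration):
    """Group (timestamp, payload) pairs into consecutive time slices.

    slice_duration is a hypothetical parameter; the quoted paragraph ties the
    slice size to a sensor refresh rate, e.g. the period of the fastest sensor.
    """
    slices = defaultdict(list)
    for timestamp, payload in measurements:
        # Integer index of the slice that contains this time stamp.
        index = int(timestamp // slice_duration)
        slices[index].append(payload)
    return dict(slices)
```

Under these assumptions, a system with 10 Hz and 20 Hz sensors would use a 0.05 second slice, and each measurement would be assigned to the slice containing its time stamp.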
Mercep [0038] teaches the measurement integration system 310 can include an ego motion unit 313 to compensate for movement of at least one sensor capturing the raw measurement data 301, for example, due to the vehicle driving or moving in the environment. The ego motion unit 313 can estimate motion of the sensor capturing the raw measurement data 301, for example, by utilizing tracking functionality to analyze vehicle motion information, such as global positioning system (GPS) data, inertial measurements, vehicle odometer data, video images, or the like. The tracking functionality can implement a Kalman filter, a Particle filter, optical flow-based estimator, or the like, to track motion of the vehicle and its corresponding sensors relative to the environment surrounding the vehicle. 
Mercep [0039] teaches the ego motion unit 313 can utilize the estimated motion of the sensor to modify the correlation between the measurement coordinate field of the sensor to the environmental coordinate field for the environmental model 315. This compensation of the correlation can allow the measurement integration system 310 to populate the environmental model 315 with the raw measurement data 301 at locations of the environmental coordinate field where the raw measurement data 301 was captured as opposed to the current location of the sensor at the end of its measurement capture.
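For illustration only, the ego-motion compensation described in Mercep [0038]-[0039] can be sketched as re-expressing points captured at one vehicle pose in the frame of the vehicle at a chosen reference time. This sketch assumes a planar pose (x, y, yaw) and assumes that the poses are already known; the pose estimation itself (Kalman filter, particle filter, or the like) is not shown. All names are hypothetical.

```python
import numpy as np


def pose_matrix(x, y, yaw):
    """Homogeneous 2D transform from the vehicle frame to the world frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def motion_compensate(points, capture_pose, reference_pose):
    """Re-express (N, 2) points captured at capture_pose in the frame at reference_pose.

    Both poses are (x, y, yaw) tuples in a common world frame (an assumption).
    """
    points = np.asarray(points, dtype=float)
    to_world = pose_matrix(*capture_pose)
    from_world = np.linalg.inv(pose_matrix(*reference_pose))
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (from_world @ to_world @ homogeneous.T).T[:, :2]
```

Applying the same reference pose to every sweep in an accumulation window places all accumulated points in a single frame corresponding to the position of the vehicle at one particular time.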
Mercep [0083] teaches the management system 410 or the graph system 420 also may generate a perspective view, such as a bird's eye view or an image view, which can overlay sensor measurement data from multiple different sensor modalities. The management system 410 or the graph system 420 may generate perspective or parallel projections of the sensor measurement data 401, such as an orthographic projection onto a ground plane or a perspective projection to the front-facing camera image plane. The management system 410 or the graph system 420 may utilize these generated views or projections to perform cross-checking of the classification indicated from the match distances.
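For illustration only, the two projections named in Mercep [0083] can be sketched as follows. The sketch assumes points already expressed in the relevant frame and a simple pinhole camera with the z axis pointing forward; the function names and the focal_length, cx and cy parameters are hypothetical and do not appear in Mercep.

```python
import numpy as np


def orthographic_ground_projection(points_xyz):
    """Orthographic projection onto the ground plane: (x, y, z) -> (x, y)."""
    return np.asarray(points_xyz, dtype=float)[:, :2]


def perspective_projection(points_cam, focal_length, cx, cy):
    """Pinhole projection of camera-frame points (z forward) to pixel coordinates.

    Points at or behind the camera (z <= 0) are dropped.
    """
    pts = np.asarray(points_cam, dtype=float)
    pts = pts[pts[:, 2] > 0]
    u = focal_length * pts[:, 0] / pts[:, 2] + cx
    v = focal_length * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)
```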
extracting, using one or more Neural Networks (NNs), classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.2. 3D Proposal Network] teaches we use a multi-task loss to simultaneously classify object/background and do 3D box regression… Examiner interprets the 3D boxes to be the objects or scenery.
Mercep [0050] teaches the classification system 400 can perform the classification utilizing a machine learning object classifier. The machine learning object classifier can include multiple classification graphs or tensor graphs, for example, each to describe a different object model. In some embodiments, a classification graph can include multiple nodes, each configured to include matchable data corresponding to a subset of the various poses, orientations, transitional states, potential deformations, textural features, or the like, in the object model. The classification system 400 also can perform the classification utilizing other computational techniques, such as a feed-forward neural network, a support vector machine (SVM), or the like. 
Mercep [0070] teaches FIG. 5A illustrates an example classification graph 500 in a machine learning object classifier implementation of a classification system according to various examples.
Mercep [0088] teaches the unity characteristic can identify whether the sensor measurement data corresponds to a single possible object or multiple possible objects proximate to each other, which can help a machine learning classifier select other node or classification graphs corresponding to different portions of an object model. The velocity characteristic can identify at least one velocity associated with the sensor measurement data. The orientation characteristic can identify a directionality of the sensor measurement data and/or an angle associated with the possible object relative to the vehicle. The center of gravity characteristic can identify a center of the possible object or center of a bounding box corresponding to the sensor measurement data based on a density of the data points associated with the detection event.
generating transformed classification data representing the one or more classifications in a second view of the environment based at least on projecting the one or more classifications from the first view to the second view; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via : (equations 2 and 3)
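For illustration only, projecting a 3D proposal to a region of interest in the bird's eye view can be sketched as follows. This is not a reproduction of Chen's equations 2 and 3; it assumes an axis-aligned box given by its centre and size (x, y, z, l, w, h), with the length along x and the width along y. The function name is hypothetical.

```python
def box_to_bev_roi(box):
    """Project an axis-aligned 3D box (x, y, z, l, w, h) to a bird's eye view ROI.

    Returns (x_min, y_min, x_max, y_max) in the same metric units as the box.
    The height coordinate z and the box height h are discarded by the projection.
    """
    x, y, z, l, w, h = box
    return (x - l / 2.0, y - w / 2.0, x + l / 2.0, y + w / 2.0)
```

A proposal centred at (10, 2, 0) with size 4 x 2 x 1.5 metres would, under these assumptions, map to the rectangle from (8, 1) to (12, 3) on the ground plane.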
Mercep [0075] teaches the matchable data 550 can correspond to a vehicle classification, which can allow the sensor measurement data 530 to be viewed from a bird's eye view. The sensor measurement data 530 can be skeletonized in the bird's eye view, for example, by generating lines that connect edges of the sensor measurement data 530. See also Mercep [0083].
and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data. Chen [3.2. 3D Proposal Network] teaches given a bird’s eye view map, the network generates 3D box proposals from a set of 3D prior boxes. Each 3D box is parameterized by (x, y, z, l, w, h), which are the center and size (in meters) of the 3D box in LIDAR coordinate system. (applied in equation 2)
Chen Figure 6: Qualitative comparisons of 3D detection results: 3D Boxes are projected to the bird’s eye view and the images.
Mercep [0050] teaches the classification system 400 can perform the classification utilizing a machine learning object classifier. The machine learning object classifier can include multiple classification graphs or tensor graphs, for example, each to describe a different object model. In some embodiments, a classification graph can include multiple nodes, each configured to include matchable data corresponding to a subset of the various poses, orientations, transitional states, potential deformations, textural features, or the like, in the object model. The classification system 400 also can perform the classification utilizing other computational techniques, such as a feed-forward neural network, a support vector machine (SVM), or the like. 
Mercep [0070] teaches FIG. 5A illustrates an example classification graph 500 in a machine learning object classifier implementation of a classification system according to various examples.
Mercep [0088] teaches the unity characteristic can identify whether the sensor measurement data corresponds to a single possible object or multiple possible objects proximate to each other, which can help a machine learning classifier select other node or classification graphs corresponding to different portions of an object model. The velocity characteristic can identify at least one velocity associated with the sensor measurement data. The orientation characteristic can identify a directionality of the sensor measurement data and/or an angle associated with the possible object relative to the vehicle. The center of gravity characteristic can identify a center of the possible object or center of a bounding box corresponding to the sensor measurement data based on a density of the data points associated with the detection event.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify Chen's extracting, using one or more Neural Networks (NNs), classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data with Mercep’s teaching of converting accumulated sensor data to motion-compensated sensor data corresponding to a position of an ego-actor at a particular time, and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate first data representing a first view of an environment. One would have been motivated to perform this combination because it allows the computing system to receive sensor measurement data from multiple different sensor modalities, which the computing system can temporally and spatially align into an environmental model (Mercep, [0007]). In combination, Chen is not altered in that Chen continues to perform 3D object detection in autonomous driving. Mercep's teachings perform the same as they do separately, implementing perception in sensor data for an assisted or automated driving system of a vehicle.
Therefore one of ordinary skill in the art, such as an individual working in the field of object detection in image data could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 1.
Claim 2. The method of claim 1, wherein the first view is a perspective view and the second view is a top-down view. Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via: (equations 2 and 3)
Claim 3. The method of claim 1, wherein the first data representing the first view of the environment comprises a projection of a LiDAR point cloud, the projection representing a perspective view of the environment, and wherein the projecting of the one or more classifications from the first view to the second view comprises using the LiDAR point cloud to project the one or more classifications from the perspective view to a top-down view of the environment. Chen [3.1. 3D Point Cloud Representation] teaches Existing work usually encodes 3D LIDAR point cloud into a 3D grid [26, 7] or a front view map [17]. While the 3D grid representation preserves most of the raw information of the point cloud, it usually requires much more complex computation for subsequent feature extraction. We propose a more compact representation by projecting 3D point cloud to the bird’s eye view and the front view. Fig. 2 visualizes the point cloud representation.
Chen [3.1. 3D Point Cloud Representation] teaches Bird’s Eye View Representation. The bird’s eye view representation is encoded by height, intensity and density. We discretize the projected point cloud into a 2D grid with resolution of 0.1m. For each cell, the height feature is computed as the maximum height of the points in the cell. To encode more detailed height information, the point cloud is divided equally into M slices. A height map is computed for each slice, thus we obtain M height maps. The intensity feature is the reflectance value of the point which has the maximum height in each cell. The point cloud density…
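For illustration only, the height and intensity channels described in the quoted passage can be sketched as follows. The sketch covers only a single height map (not the M height slices) and omits the density channel, whose definition is elided in the quotation. The function name, the (x, y, z, reflectance) point layout, the range arguments, and the choice of zero for empty cells are assumptions.

```python
import numpy as np


def bev_height_intensity(points, x_range, y_range, resolution=0.1):
    """Encode (N, 4) points of x, y, z, reflectance into bird's eye view maps.

    Returns (height_map, intensity_map). Each cell holds the maximum point
    height in the cell and the reflectance of that highest point. Empty cells
    are left at 0 (an assumption).
    """
    pts = np.asarray(points, dtype=float)
    rows = int(round((x_range[1] - x_range[0]) / resolution))
    cols = int(round((y_range[1] - y_range[0]) / resolution))
    height = np.zeros((rows, cols))
    intensity = np.zeros((rows, cols))
    filled = np.zeros((rows, cols), dtype=bool)
    for x, y, z, reflectance in pts:
        i = int(np.floor((x - x_range[0]) / resolution))
        j = int(np.floor((y - y_range[0]) / resolution))
        if not (0 <= i < rows and 0 <= j < cols):
            continue  # Point lies outside the encoded area.
        if not filled[i, j] or z > height[i, j]:
            height[i, j] = z
            intensity[i, j] = reflectance
            filled[i, j] = True
    return height, intensity
```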
Chen [3.3. Region based Fusion Network] teaches Multi-View ROI Pooling. Since features from different views/modalities usually have different resolutions, we employ ROI pooling [10] for each view to obtain feature vectors of the same length. Given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via:
Claim 4. The method of claim 1, wherein the first data represents a LiDAR range image of the first view, Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view)
and the determining of the first data comprises projecting a LiDAR point cloud into the LiDAR range image. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input.
Chen [3.1. 3D Point Cloud Representation] teaches we propose a more compact representation by projecting 3D point cloud to the bird’s eye view and the front view. Fig. 2 visualizes the point cloud representation.
Claim 5. The method of claim 1, wherein the first data represents a LiDAR range image of the first view, the LiDAR range image having a height in pixels corresponding to a number of horizontal scan lines of a LiDAR sensor that captured the sensor data. Chen [Figure 2] (b) Front view features [Front View Representation] teaches… are the horizontal and vertical resolution of laser beams, respectively. We encode the front view map with three-channel features, which are height, distance and intensity, as visualized in Fig. 2.
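For illustration only, a range image whose height in pixels equals the number of scan lines can be sketched with a generic spherical projection. This is not Chen's front view formula, which is elided in the quotation above. The sketch assumes each point carries the index of the laser beam (scan line) that produced it; the function name and the ring, num_scan_lines and width parameters are hypothetical.

```python
import numpy as np


def lidar_range_image(points_xyz, ring, num_scan_lines, width):
    """Project (N, 3) LiDAR points into a (num_scan_lines, width) range image.

    The row is the scan-line index of each point (assumed to be provided); the
    column is the azimuth angle quantised into `width` bins. If several points
    fall into one pixel, only one of them is kept. Empty pixels are 0.
    """
    pts = np.asarray(points_xyz, dtype=float)
    ring = np.asarray(ring, dtype=int)
    distance = np.linalg.norm(pts, axis=1)
    azimuth = np.arctan2(pts[:, 1], pts[:, 0])  # In [-pi, pi].
    col = np.floor((azimuth + np.pi) / (2.0 * np.pi) * width).astype(int)
    col = np.clip(col, 0, width - 1)
    image = np.zeros((num_scan_lines, width))
    image[ring, col] = distance
    return image
```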
Claim 6. The method of claim 1, wherein the sensor data comprises accumulated sensor data from one or more LiDAR sensors of an ego-actor accumulated over a period of time, and the first data representing a LiDAR range image of the first view of the environment. Mercep [0004] teaches these vehicles typically include multiple sensors, such as one or more cameras, a Light Detection and Ranging (LIDAR) sensor, a Radio Detection and Ranging (RADAR) system, ultrasonic, or the like, to measure different portions of the environment around the vehicles. Each sensor processes their own measurements captured over time to detect an object within their field of view, and then provide a list of detected objects to an application in the advanced driver assistance systems or the autonomous driving systems to which the sensor is dedicated. In some instances, the sensors can also provide a confidence level corresponding to their detection of objects on the list based on their captured measurements.
Mercep [0075] teaches FIG. 5B illustrates an example flow for comparing sensor measurement data 530 to matchable data 550 in a node of classification graph according to various embodiments. Referring to FIG. 5B, the sensor measurement data 530 may include a set of data points, such as LIDAR points or point cloud, which a sensor fusion system identified as corresponding to a detection event.
Claim 7. The method of claim 1, wherein the projecting of the one or more classifications from the first view to the second view comprises applying a differentiable transformation to 3D locations associated with the classification data. Chen [equation 2] is interpreted to be differentiable 
Claim 9. The method of claim 1, further comprising: decoding an output of one or more NNs to produce candidate bounding shapes for the one or more objects; identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of filtering or clustering of the candidate bounding boxes to remove duplicate candidates from the candidate bounding boxes; and assigning a class label for each of the one or more bounding shapes based on the output of the one or more NNs. Chen [Figure 1: Multi-View 3D object detection network (MV3D)]
Claim 10. The method of claim 1, wherein the determining of the second data representing the one or more bounding shapes comprises: decoding an output of the one or more NNs to produce candidate bounding shapes for the one or more objects; and identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of non-maximum suppression or density-based spatial clustering of applications with noise to remove duplicate candidates from the candidate bounding shapes. Chen [3.2. 3D Proposal Network] teaches for each non-empty anchor at each position of the last convolution feature map, the network generates a 3D box. To reduce redundancy, we apply Non-Maximum Suppression (NMS) on the bird’s eye view boxes. Different from [23], we did not use 3D NMS because objects should occupy different space on the ground plane. We use IoU threshold of 0.7 for NMS. The top 2000 boxes are kept during training, while in testing, we only use 300 boxes.
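For illustration only, non-maximum suppression on bird's eye view boxes with an IoU threshold of 0.7 can be sketched as follows. The sketch assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and a per-box score; the function names and the top_k parameter are hypothetical, and this is not Chen's implementation.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.7, top_k=None):
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already kept box exceeds the
    threshold. top_k optionally caps the number of kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep
```

Under these assumptions, top_k would correspond to the cap on retained boxes mentioned in the quoted passage (2000 during training, 300 in testing).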

Claim 11. The method of claim 1, wherein an output of the one or more NNs comprises a tensor storing regressed geometry data for each detected object, wherein the determining of the second data representing the one or more bounding shapes comprises generating one or more 3D bounding shapes for the one or more objects from the regressed geometry data. Chen [Oriented 3D Box Regression]
Claim 14. Claim 14 recites limitations similar to those of claims 1 and 2. Therefore, claim 14 has been analyzed and reviewed in the same way as claims 1 and 2. See the above analysis.
Claim 15. The method of claim 14, wherein the generating of the first data representing the perspective view of the environment comprises: accessing accumulated sensor data, from the one or more LiDAR sensors of an ego-actor, accumulated over a period of time; Mercep [0004] teaches these vehicles typically include multiple sensors, such as one or more cameras, a Light Detection and Ranging (LIDAR) sensor, a Radio Detection and Ranging (RADAR) system, ultrasonic, or the like, to measure different portions of the environment around the vehicles. Each sensor processes their own measurements captured over time to detect an object within their field of view, and then provide a list of detected objects to an application in the advanced driver assistance systems or the autonomous driving systems to which the sensor is dedicated. In some instances, the sensors can also provide a confidence level corresponding to their detection of objects on the list based on their captured measurements.
Mercep [0029] teaches Referring back to FIG. 1, the autonomous driving system 100 can include a sensor fusion system 300 to receive the raw measurement data 115 from the sensor system 110 and populate an environmental model 121 associated with the vehicle with the raw measurement data 115. In some embodiments, the environmental model 121 can have an environmental coordinate field corresponding to a physical envelope surrounding the vehicle, and the sensor fusion system 300 can populate the environmental model 121 with the raw measurement data 115 based on the environmental coordinate field. In some embodiments, the environmental coordinate field can be a non-vehicle centric coordinate field, for example, a world coordinate system, a path-centric coordinate field, a coordinate field parallel to a road surface utilized by the vehicle, or the like. 
Mercep [0030] teaches FIG. 2B illustrates an example environmental coordinate field 220 associated with an environmental model for the vehicle 200 according to various embodiments. Referring to FIG. 2B, an environment surrounding the vehicle 200 can correspond to the environmental coordinate field 220 for the environmental model. The environmental coordinate field 220 can be vehicle-centric and provide a 360 degree area around the vehicle 200. The environmental model can be populated and annotated with information detected by the sensor fusion system 300 or inputted from external sources. Embodiments will be described below in greater detail. 
Mercep [0031] teaches Referring back to FIG. 1, to populate the raw measurement data 115 into the environmental model 121 associated with the vehicle, the sensor fusion system 300 can spatially align the raw measurement data 115 to the environmental coordinate field of the environmental model 121. The sensor fusion system 300 also can identify when the sensors captured the raw measurement data 115, for example, by time stamping the raw measurement data 115 when received from the sensor system 110. The sensor fusion system 300 can populate the environmental model 121 with the time stamp or other time-of-capture information, which can be utilized to temporally align the raw measurement data 115 in the environmental model 121. In some embodiments, the sensor fusion system 300 can analyze the raw measurement data 115 from the multiple sensors as populated in the environmental model 121 to detect a sensor event or at least one object in the environmental coordinate field associated with the vehicle. The sensor event can include a sensor measurement event corresponding to a presence of the raw measurement data 115 in the environmental model 121, for example, above a noise threshold. The sensor event can include a sensor detection event corresponding to a spatial and/or temporal grouping of the raw measurement data 115 in the environmental model 121. The object can correspond to spatial grouping of the raw measurement data 115 having been tracked in the environmental model 121 over a period of time, allowing the sensor fusion system 300 to determine the raw measurement data 115 corresponds to an object around the vehicle. The sensor fusion system 300 can populate the environment model 121 with an indication of the detected sensor event or detected object and a confidence level of the detection. Embodiments of sensor fusion and sensor event detection or object detection will be described below in greater detail. 
Mercep [0037] teaches the measurement integration system 310 also can temporally align the raw measurement data 301 from different sensors in the sensor system. In some embodiments, the measurement integration system 310 can include a temporal alignment unit 312 to assign time stamps to the raw measurement data 301 based on when the sensor captured the raw measurement data 301, when the raw measurement data 301 was received by the measurement integration system 310, or the like. In some embodiments, the temporal alignment unit 312 can convert a capture time of the raw measurement data 301 provided by the sensors into a time corresponding to the sensor fusion system 300. The measurement integration system 310 can annotate the raw measurement data 301 populated in the environmental model 315 with the time stamps for the raw measurement data 301. The time stamps for the raw measurement data 301 can be utilized by the sensor fusion system 300 to group the raw measurement data 301 in the environmental model 315 into different time periods or time slices. In some embodiments, a size or duration of the time periods or time slices can be based, at least in part, on a refresh rate of one or more sensors in the sensor system. For example, the sensor fusion system 300 can set a time slice to correspond to the sensor with a fastest rate of providing new raw measurement data 301 to the sensor fusion system 300.
Mercep [0075] teaches FIG. 5B illustrates an example flow for comparing sensor measurement data 530 to matchable data 550 in a node of classification graph according to various embodiments. Referring to FIG. 5B, the sensor measurement data 530 may include a set of data points, such as LIDAR points or point cloud, which a sensor fusion system identified as corresponding to a detection event.
converting the accumulated sensor data to motion-compensated sensor data corresponding to a position of the ego-actor at a particular time; Mercep [0038] teaches the measurement integration system 310 can include an ego motion unit 313 to compensate for movement of at least one sensor capturing the raw measurement data 301, for example, due to the vehicle driving or moving in the environment. The ego motion unit 313 can estimate motion of the sensor capturing the raw measurement data 301, for example, by utilizing tracking functionality to analyze vehicle motion information, such as global positioning system (GPS) data, inertial measurements, vehicle odometer data, video images, or the like. The tracking functionality can implement a Kalman filter, a Particle filter, optical flow-based estimator, or the like, to track motion of the vehicle and its corresponding sensors relative to the environment surrounding the vehicle. 
and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate the first data representing a LiDAR range image of the perspective view of the environment. Mercep [0039] teaches the ego motion unit 313 can utilize the estimated motion of the sensor to modify the correlation between the measurement coordinate field of the sensor to the environmental coordinate field for the environmental model 315. This compensation of the correlation can allow the measurement integration system 310 to populate the environmental model 315 with the raw measurement data 301 at locations of the environmental coordinate field where the raw measurement data 301 was captured as opposed to the current location of the sensor at the end of its measurement capture.
Claim 16. The method of claim 14, wherein the one or more NN's includes a first stage configured to evaluate the first data representing the perspective view and a second stage configured to evaluate the transformed classification data representing the top-down view. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] (3 stages) 
Claim 17. The method of claim 14, wherein the second data further represents a class label for each of the one or more bounding shapes the one or more objects. Chen [Figure 1: Multi-View 3D object detection network (MV3D)] 
Claim 18. A method comprising: generating, using one or more neural networks (NNs), classification data representing one or more classifications from image data representing an image of a first view of an environment; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] LiDAR Bird View (is interpreted as the first view) 
associating the classification data with corresponding three-dimensional (3D) locations identified from corresponding sensor data to generate labeled sensor data; Mercep [0036] teaches the measurement integration system 310 can include a spatial alignment unit 311 to correlate measurement coordinate fields of the sensors to an environmental coordinate field for the environmental model 315. The measurement integration system 310 can utilize this correlation to convert or translate locations for the raw measurement data 301 within the measurement coordinate fields into locations within the environmental coordinate field. The measurement integration system 310 can populate the environmental model 315 with the raw measurement data 301 based on the correlation between the measurement coordinate fields of the sensors to the environmental coordinate field for the environmental model 315.
Mercep [0038] teaches the measurement integration system 310 can include an ego motion unit 313 to compensate for movement of at least one sensor capturing the raw measurement data 301, for example, due to the vehicle driving or moving in the environment. The ego motion unit 313 can estimate motion of the sensor capturing the raw measurement data 301, for example, by utilizing tracking functionality to analyze vehicle motion information, such as global positioning system (GPS) data, inertial measurements, vehicle odometer data, video images, or the like. The tracking functionality can implement a Kalman filter, a Particle filter, optical flow-based estimator, or the like, to track motion of the vehicle and its corresponding sensors relative to the environment surrounding the vehicle. 
Mercep [0041] teaches the measurement integration system 310 can receive the object list 302 and populate one or more objects from the object list 302 into the environmental model 315 along with the raw measurement data 301. The object list 302 may include one or more objects, a time stamp for each object, and optionally include a spatial metadata associated with a location of objects in the object list 302. For example, the object list 302 can include speed measurements for the vehicle, which may not include a spatial component to be stored in the object list 302 as the spatial metadata. When the object list 302 includes a confidence level associated with an object in the object list 302, the measurement integration system 310 also can annotate the environmental model 315 with the confidence level for the object from the object list 302. 
Mercep [0043] teaches the object detection system 320 can analyze data stored in the environmental model 315 to detect at least one object. The sensor fusion system 300 can populate the environment model 315 with an indication of the detected object at a location in the environmental coordinate field corresponding to the detection. The object detection system 320 can identify confidence levels corresponding to the detected object, which can be based on at least one of a quantity, a quality, or a sensor diversity of raw measurement data 301 utilized in detecting the object. The sensor fusion system 300 can populate or store the confidence levels corresponding to the detected objects with the environmental model 315. For example, the object detection system 320 can annotate the environmental model 315 with object annotations 324 or the object detection system 320 can output the object annotations 324 to the memory system 330, which populates the environmental model 315 with the detected object and corresponding confidence level of the detection in the object annotations 324. 
Mercep [0066] teaches the management system 410 can include an event association unit 411 to determine whether one of the detection events 402 corresponds to one or more detection events 402 previously received by the management system 410. In some embodiments, the event association unit 411 can associate the detection event 402 with the one or more previously received detection events 402 based, at least in part, on their spatial locations relative to each other, sensor update rates, types of sensor measurement data correlated to the detection events 402, a visibility map 405, or the like.
Chen [Introduction] teaches methods based on LIDAR point cloud usually achieve more accurate 3D locations…we propose a Multi-View 3D object detection network (MV3D) which takes multimodal data as input and predicts the full 3D extent of objects in 3D space… Given the multi-view feature representation, the network performs oriented 3D box regression which predict accurate 3D location, size and orientation of objects in 3D space.
Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.2. 3D Proposal Network] teaches we use a multi-task loss to simultaneously classify object/background and do 3D box regression… Examiner interprets the 3D boxes to be the objects or scenery.
projecting the labeled sensor data to a second view of the environment to generate transformed classification data representing the one or more classifications in the second view; Chen [Figure 1: Multi-View 3D object detection network (MV3D)] teaches The network takes the bird’s eye view and front view of LIDAR point cloud as well as an image as input. It first generates 3D object proposals from bird’s eye view map and project them to three views. A deep fusion network is used to combine region-wise features obtained via ROI pooling for each view. The fused features are used to jointly predict object class and do oriented 3D box regression.
Chen [3.3. Region based Fusion Network] teaches given the generated 3D proposals, we can project them to any views in the 3D space. In our case, we project them to three views, i.e., bird’s eye view (BV), front view (FV), and the image plane (RGB). Given a 3D proposal p3D, we obtain ROIs on each view via: (equations 2 and 3).
and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data. Chen [3.2. 3D Proposal Network] teaches given a bird’s eye view map, the network generates 3D box proposals from a set of 3D prior boxes. Each 3D box is parameterized by (x, y, z, l, w, h), which are the center and size (in meters) of the 3D box in LIDAR coordinate system. (applied in equation 2)
Chen Figure 6: Qualitative comparisons of 3D detection results: 3D Boxes are projected to the bird’s eye view and the images.
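For illustration only, the projection of a 3D box parameterized as (x, y, z, l, w, h) onto a top-down grid can be pictured with the minimal Python sketch below. This is not Chen's implementation: the function name, the grid extents, the 0.5 m resolution, and the simplification to an axis-aligned (non-oriented) footprint are all assumptions introduced here for the example.

```python
import math


def box_to_bev_roi(box, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), resolution=0.5):
    """Map an axis-aligned 3D box (x, y, z, l, w, h) to a pixel ROI on a top-down grid.

    Returns (row_min, col_min, row_max, col_max), or None if the box lies
    entirely outside the grid. Grid extents and resolution are hypothetical.
    """
    x, y, _z, l, w, _h = box
    # Footprint of the box on the ground plane; height is dropped in this view.
    x_lo, x_hi = x - l / 2.0, x + l / 2.0
    y_lo, y_hi = y - w / 2.0, y + w / 2.0
    # Clip the footprint to the grid extents.
    x_lo, x_hi = max(x_lo, x_range[0]), min(x_hi, x_range[1])
    y_lo, y_hi = max(y_lo, y_range[0]), min(y_hi, y_range[1])
    if x_lo >= x_hi or y_lo >= y_hi:
        return None
    # Convert metric bounds to pixel indices, rounding outward.
    col_min = math.floor((x_lo - x_range[0]) / resolution)
    col_max = math.ceil((x_hi - x_range[0]) / resolution)
    row_min = math.floor((y_lo - y_range[0]) / resolution)
    row_max = math.ceil((y_hi - y_range[0]) / resolution)
    return (row_min, col_min, row_max, col_max)
```

Under these assumed grid settings, a box centered 10 m ahead of the sensor with a 4 m by 2 m footprint maps to a region 8 columns wide and 4 rows tall; projecting the same box into a front view or a camera image would use a different transform for each view.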
Claim 19. The method of claim 18, wherein the second data further represents one or more class labels of the one or more objects. Banerjee [0097] teaches it is proposed to use the fused RGBD information from the camera and LiDAR sensors (D being the depth information for each pixel in the camera image) for object detection. One approach for object detection is to use machine learning techniques with neural networks where labelled data can be used for training and evaluating the neural network. Labelled data in the context of object recognition is manually labelling the bounding boxes for each object of interest in an image and assigning a class label for each object of interest.
Mercep [Abstract] teaches the computing system can generate a matchable representation of sensor measurement data collected by sensors mounted in a vehicle, and compare the matchable representation of the sensor measurement data to an object model describing a type of an object capable of being located proximate to the vehicle. Based on the comparison, the computing system can classify the sensor measurement data as corresponding to the type of the object based, at least in part, on the comparison of the matchable representation of the sensor measurement data to the object model. A control system for the vehicle can be configured to control operation of the vehicle based, at least in part, on the classified type of the object for the sensor measurement data.
Mercep [0048] teaches the object detection system 320 can include a classification system 400 to classify sensor measurement data associated with the detection events 325. In some embodiments, the classification system 400 can assign classifications 327 to the detection events 325 based on the classification of the sensor measurement data associated with the detection events 325. The classifications 327 can correspond to a type of object associated with the detection events 325, such as another vehicle, a pedestrian, a cyclist, an animal, a static object, or the like, which may be identified or pointed to by the hypothesis information 326. The classifications 327 also can include a confidence level associated with the classification and/or include more specific information corresponding to a particular pose, orientation, state, or the like, of the object type. The object detection system 320 can annotate the environmental model 315 with classifications 327 or the object detection system 320 can output the classifications 327 to the memory system 330, which populates the environmental model 315 with the classifications 327. 
Mercep [0058] teaches FIG. 4 illustrates an example classification system 400 in a sensor fusion system according to various embodiments. Referring to FIG. 4, the classification system 400 can include a management system 410 and a graph system 420, which can operate in conjunction to generate a classification 406 for sensor measurement data 401. The classification 406 can identify a type of object associated with the sensor measurement data 401, such as another vehicle, a pedestrian, a cyclist, an animal, a static object, or the like. The classification 406 also can include a confidence level associated with the identification of the object type and/or include more specific information corresponding to a particular pose, orientation, state, or the like, of the object type.
Claim 20. The method of claim 18, wherein the generating of the second data representing the one or more bounding shapes comprises generating one or more bounding shapes and associated class labels for the objects based on second classification data representing one or more second classifications in the second view and third data representing object instance geometry, the second classification data and the third data extracted by the one or more NNs. Chen [Figure 1: Multi-View 3D object detection network (MV3D)]
Chen [Introduction] teaches methods based on LIDAR point cloud usually achieve more accurate 3D locations…we propose a Multi-View 3D object detection network (MV3D) which takes multimodal data as input and predicts the full 3D extent of objects in 3D space… Given the multi-view feature representation, the network performs oriented 3D box regression which predict accurate 3D location, size and orientation of objects in 3D space.
Mercep [0058] teaches FIG. 4 illustrates an example classification system 400 in a sensor fusion system according to various embodiments. Referring to FIG. 4, the classification system 400 can include a management system 410 and a graph system 420, which can operate in conjunction to generate a classification 406 for sensor measurement data 401. The classification 406 can identify a type of object associated with the sensor measurement data 401, such as another vehicle, a pedestrian, a cyclist, an animal, a static object, or the like. The classification 406 also can include a confidence level associated with the identification of the object type and/or include more specific information corresponding to a particular pose, orientation, state, or the like, of the object type.
Claims 8, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over US 2018/0314253 A1 to Mercep et al., hereinafter, “Mercep” in view of Multi-View 3D Object Detection Network for Autonomous Driving to Chen et al., hereinafter, “Chen” and in further view of US 2020/0193606 A1 to Douillard et al., hereinafter, “Douillard”.
Claim 8. Chen is silent on the limitations of claim 8; however, Douillard, in the field of analyzing 3D LiDAR data in images, teaches wherein the sensor data represents a LiDAR point cloud, wherein the transformed classification data represents one or more confidence maps in the second view, and the method further comprises: generating third data representing one or more height maps based at least on projecting the LiDAR point cloud into the second view; forming a tensor comprising a first set of one or more channels storing the transformed classification data representing the one or more confidence maps and a second set of one or more channels storing the third data representing the one or more height maps; and extracting, from the tensor using the one or more NNs, second classification data representing one or more second classifications in the second view and fourth data representing object instance geometry of the one or more objects. Douillard [0154-0164]
Douillard [0167] teaches D. The system of any one of example A through example C, wherein the second height is greater than the first height, and wherein the orientation of the rendering plane is selected to substantially maximize the second height associated with the extracted data relative to the rendering plane.
Douillard [0181] teaches K. The method of any one of example H through example J, wherein: [0182] the extracted data represents a first width and a first height relative to the first perspective; and [0183] wherein the orientation of the rendering plane is selected such that the extracted data represents a second width and a second height relative to the second perspective, and wherein the second width is greater than the first width. 
Douillard [0206] teaches T. A system of example R or example S, wherein: [0207] the extracted data represents a first width and a first height relative to the first perspective; and [0208] wherein the orientation of the rendering plane is selected such that the extracted data represents a second width and a second height relative to the second perspective, and wherein the second width is greater than the first width.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify the extracting, using one or more neural networks (NNs), of classification data representing one or more classifications of objects or scenery depicted in the first view based at least on the first data, as taught by Chen and Mercep, with Douillard’s teaching of training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data. One would have been motivated to perform this combination because it allows three-dimensional data to be used in computer vision contexts to locate and interact with objects in the physical world (Douillard, [0025]). In combination, Chen is not altered in that Chen continues to teach 3D object detection in autonomous driving. Mercep continues to teach implementing perception in sensor data for an assisted or automated driving system of a vehicle. Douillard's teachings perform the same as they do separately, in that multi-dimensional data may include data captured by a LIDAR system for use in conjunction with a perception system for an autonomous vehicle.
Therefore, one of ordinary skill in the art, such as an individual working in the field of LiDAR data in images, could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 8.
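For illustration only, the tensor recited in claim 8 (a first set of channels holding confidence maps and a second set of channels holding height maps, both in the second view) can be pictured with the minimal Python sketch below. This is neither Douillard's method nor Applicant's implementation: the function names, the maximum-height rule per cell, the 0.0 fill value for empty cells, and the (channels, rows, cols) layout are assumptions introduced here for the example.

```python
def height_map(points, rows, cols, cell_size, x_origin=0.0, y_origin=0.0):
    """Top-down grid holding the maximum point height (z) seen in each cell."""
    grid = [[float("-inf")] * cols for _ in range(rows)]
    for x, y, z in points:
        col = int((x - x_origin) // cell_size)
        row = int((y - y_origin) // cell_size)
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = max(grid[row][col], z)
    # Cells that received no points get a neutral fill value of 0.0.
    return [[0.0 if v == float("-inf") else v for v in r] for r in grid]


def stack_channels(confidence_maps, height_maps):
    """Concatenate confidence maps and height maps along the channel axis."""
    channels = list(confidence_maps) + list(height_maps)
    shape = (len(channels[0]), len(channels[0][0]))
    for channel in channels:
        if (len(channel), len(channel[0])) != shape:
            raise ValueError("all channels must share the same grid size")
    return channels  # nested lists with layout (channels, rows, cols)
```

A network operating on such a tensor would see, for every top-down cell, both the projected class confidences and the geometric height cue; how the network is structured is outside the scope of this sketch.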
Claim 12. Chen is silent on the limitations of claim 12; however, Douillard, in the field of analyzing 3D LiDAR data in images, teaches further comprising training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data. Chen [Abstract] teaches we propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes.
Chen [Introduction] teaches 3D object detection plays an important role in the visual perception system of Autonomous driving cars. Modern self-driving cars are commonly equipped with multiple sensors, such as LIDAR and cameras. Laser scanners have the advantage of accurate depth information while cameras preserve much more detailed semantic information. The fusion of LIDAR point cloud and RGB images should be able to achieve higher performance and safety to self-driving cars.
Douillard [0058] teaches the classification module 316 may include functionality to receive segmented data and to identify a type of object represented by the data. For example, the classification module 316 may classify one or more objects, including but not limited to cars, buildings, pedestrians, bicycles, trees, free space, occupied space, street signs, lane markings, etc.
Douillard [0106] teaches the process may include receiving one or more images that have been segmented to create a set of images segmented for free space, while in some instances, the operation 1004 may include receiving indications of one or more objects identified in the segmented images to perform object tracking and/or object motion prediction. At operation 1206, the process can include inputting the images segmented for free space or inputting the identified and/or tracked objects into a planner system, to generate a trajectory for the autonomous vehicle. In some instances, the planner system may be incorporated into a computing system to receive free space segmented images or to receive objects to be tracked and to generate a trajectory based at least in part on the segmented images or tracked objects… In some instances, the trajectory generated in the operation 1206 may constrain the operation of the autonomous vehicle to operate within the free space segmented in the operation 1204, or to avoid objects identified and/or tracked by a planner system of the autonomous vehicle.
Douillard [0110] teaches the process can include applying the three-dimensional segmentation information to a three-dimensional dataset to identify a three-dimensional object. In some instances, the process 1300 may include performing classification on a per-object basis. Thus, the operation 1306 may include identifying, isolating, extracting, and/or segmenting the three-dimensional object from the three-dimensional dataset so that any subsequent processing can be optimized for the particular object.
Claim 13. Douillard further teaches the method further comprising training the one or more NNs using training data generated using a link between object tracks generated for a particular object from corresponding sensor data from two or more sensors. Douillard [0058] teaches the classification module 316 may include functionality to receive segmented data and to identify a type of object represented by the data. For example, the classification module 316 may classify one or more objects, including but not limited to cars, buildings, pedestrians, bicycles, trees, free space, occupied space, street signs, lane markings, etc.
Douillard [0065] teaches FIG. 4A depicts a side view 400 of an example vehicle 402 having multiple sensor assemblies mounted to the vehicle 402. In some instances, datasets from the multiple sensor assemblies can be combined or synthesized to form a meta spin (e.g., LIDAR data representing a plurality of LIDAR sensors) or can be combined or fused using sensor fusion techniques to improve an accuracy or processing for segmentation, classification, prediction, planning, trajectory generation, etc.
Douillard [0070] teaches at operation 502, the process may include receiving three-dimensional data. In some instances, the three-dimensional data may include LIDAR data from one or more sensors. In some instances, the three-dimensional data may include fused sensor data including LIDAR data and Radar data.
Douillard [0090] teaches the segmentation data 1008 can be applied to the three-dimensional LIDAR data 1006 to identify and isolate data on a per object basis. For example, the LIDAR data 1006 may only include LIDAR data associated with a particular segmentation ID, or may include only the LIDAR data inside a three dimensional bounding box output from the segmentation algorithm.
Douillard [0105] teaches the converted data is input into a convolutional neural network that is trained to segment images based on free space (e.g., drivable or navigable space) in the input image. In some instances, the convolutional neural network may identify objects to be tracked by various systems of the autonomous vehicle. As an example, the converted data may be generated from an image capture system (e.g., a perception system) onboard an autonomous vehicle. In some instances, the image capture system may include any number of sensors, including but not limited to image sensors, LIDAR, radar, etc.
Douillard [0106] teaches the process may include receiving one or more images that have been segmented to create a set of images segmented for free space, while in some instances, the operation 1004 may include receiving indications of one or more objects identified in the segmented images to perform object tracking and/or object motion prediction. At operation 1206, the process can include inputting the images segmented for free space or inputting the identified and/or tracked objects into a planner system, to generate a trajectory for the autonomous vehicle. In some instances, the planner system may be incorporated into a computing system to receive free space segmented images or to receive objects to be tracked and to generate a trajectory based at least in part on the segmented images or tracked objects… In some instances, the trajectory generated in the operation 1206 may constrain the operation of the autonomous vehicle to operate within the free space segmented in the operation 1204, or to avoid objects identified and/or tracked by a planner system of the autonomous vehicle.
Douillard [0110] teaches the process can include applying the three-dimensional segmentation information to a three-dimensional dataset to identify a three-dimensional object. In some instances, the process 1300 may include performing classification on a per-object basis. Thus, the operation 1306 may include identifying, isolating, extracting, and/or segmenting the three-dimensional object from the three-dimensional dataset so that any subsequent processing can be optimized for the particular object.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds to Zhou et al. [3.2 Feature Fusion] 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681.  The examiner can normally be reached on 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661