DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Office Action is responsive to the amendment received 29 November 2021.

Claims 1-20 are as originally presented.

The amendment to paragraph [0022] of the specification is accepted and entered into the record.

Applicant’s arguments, see pages 9 and 10, filed 29 November 2021, with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. § 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Kong et al., which teaches multi-resolution feature extraction and combining methods that provide robust features to drive neural network object detection architectures tailored to LiDAR data.  The newly cited Zhou reference teaches voxel-based methods that address the features of claims 3, 4, 10 and 11, which were formerly viewed as allowable subject matter. Paragraph [0023] of the instant application provides Applicant Admitted Prior Art (AAPA) that also covers the same voxel and “pixel in a feature layer” features of the claims.

Applicant Admitted Prior Art (AAPA)
The Examiner notes that the specification of the instant application allows for any method to be used to implement the various processes described in the specification.  These processes are the main elements of the claimed invention and are subject to rejection as applicant admitted prior art.
The following key points are considered Applicant Admitted Prior Art (AAPA). Where AAPA is used in a rejection, the reader is referred to this section and to the specification of the instant application.  See MPEP 2129.
As described in the following paragraphs of the as-filed specification:
[0021] - The image sensor may be any known or yet-to-be-developed image sensor.

[0022] - Any known or yet-to-be-developed method of generating the pseudo-LiDAR point cloud 12 from the image data 10A and the depth data 10B may be utilized.

[0023] - Any known or yet-to-be-developed method of generating the bird's eye view map may be utilized. Non-limiting examples include PointPillars and VoxelNet.

[0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method.

[0032] It is noted that the features of the multiple bird's eye view maps may be added or otherwise combined by a linear function or a non-linear function. As another non-limiting example, a neural network may be utilized to learn the best combination of features for a particular application such that the resulting combined bird's eye view map includes some features from each individual bird's eye view map and/or combinations of features from the individual bird's eye view maps. The neural network may be any network that learns end-to-end with a detector the best combination of features for the final task (e.g., object detection).

[0037] - The pseudo-LiDAR point cloud data is used to generate a bird's eye view map at a resolution and having one or more bird's eye view map layers at block 206. Any known or yet-to-be-developed method for generating the bird's eye view map may be used. Methods to generate the bird's eye view map include PointPillars and VoxelNet. The bird's eye view map may have any number of layers, and may extract any number and type of features.

[0040] At block 214, the combined bird's eye view map is provided to an object detection algorithm. Any known or yet-to-be-developed object detection algorithm may be utilized to detect objects represented by the combined bird's-eye-view map. Example non-limiting object detection algorithms include retinaNet and a fully convolutional one-stage object detector (FCOS).
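
For illustration only, the point cloud generation that paragraph [0022] leaves open can be sketched with the back-projection approach summarized from Wang in the rejection of claim 1 (estimate per-pixel depth, then back-project pixels into 3D). The sketch below assumes a simple pinhole camera model; the function name and the intrinsic parameters fx, fy, cx and cy are hypothetical and are taken from neither the specification nor Wang.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud in the camera frame.

    fx, fy, cx, cy are assumed pinhole intrinsics (hypothetical values below).
    """
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Keep only pixels with a valid (positive) depth estimate.
    return points[points[:, 2] > 0]

# Toy depth map: every pixel at 10 units except one invalid pixel.
depth = np.full((4, 6), 10.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=3.0, cy=2.0)
```

In this toy input the pixel at the assumed principal point maps to the point (0, 0, 10), and the single pixel with zero depth is dropped from the cloud.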

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 8, 9 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving,” already of record) in view of Kong et al. (“HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection”).

Regarding claim 1 (Original), Wang teaches a method for multi-resolution fusion of pseudo-LiDAR features (Wang; p. 8447, col. 1, second paragraph; two-step approach by first estimating the dense pixel depth from stereo (or even monocular) imagery and then back-projecting pixels into a 3D point cloud resulting in a pseudo-LiDAR representation of the imaged scene), the method comprising: receiving image data from one or more image sensors (Wang; p. 8447, fig. 2; first box on the left of the figure; col. 1, last full paragraph; received input is captured from a pair of cameras with a horizontal offset (i.e., baseline) b); generating a point cloud from the image data (Wang; p. 8447, col. 1, second paragraph; two-step approach by first estimating the dense pixel depth from stereo (or even monocular) imagery and then back-projecting pixels into a 3D point cloud resulting in a pseudo-LiDAR representation of the imaged scene); generating, from the point cloud, a first bird's eye view map having a first resolution (Wang; p. 8448, col. 1, paragraph b; from a Bird’s Eye View (BEV). In particular, the 3D information (pseudo-LiDAR representation) is converted into a 2D image from the top-down view: width and depth become the spatial dimensions, and height is recorded in the channels); but does not teach generating, from the point cloud, a second bird's eye view map having a second resolution; and generating a combined bird's eye view map by combining features of the first bird's eye view map with features from the second bird's eye view map.
Kong, working in the same field of endeavor, however, teaches generating, from the point cloud, a second bird's eye view map having a second resolution (Kong; p. 847, fig. 2, network for Hyper Feature Extraction; conv 1-conv 5 generate three feature maps at three different resolutions equivalent to fig. 4 of the instant application); and generating a combined bird's eye view map by combining features of the first bird's eye view map with features from the second bird's eye view map (Kong; p. 847, fig. 2, network for Hyper Feature Extraction; conv 1-conv 5 generate three feature maps at three different resolutions equivalent to fig. 4 of the instant application; p. 847, section 3.1) for the benefit of providing proper fusion of coarse-to-fine CNN features which is more suitable for object region proposal generation and detection.
It would have been obvious to one of ordinary skill in the art prior to the effective date of the filing of the invention to have combined the multiple resolution feature maps as taught by Kong with the method for multi-resolution fusion of pseudo-LiDAR features as taught by Wang 

In regard to claim 2 (Original), Wang in view of Kong teach the method of claim 1 and further teach the method as further comprising generating, from the point cloud, one or more additional bird's eye view maps having one or more additional resolutions (Kong; p. 847, fig. 2, network for Hyper Feature Extraction; conv 1-conv 5 generate three feature maps at three different resolutions equivalent to fig. 4 of the instant application; p. 847, section 3.1), wherein generating the combined bird's eye view map further comprises combining features of the one or more additional bird's eye view maps with the features of the first bird's eye view map and the second bird's eye view map (Kong; p. 847, fig. 2, network for Hyper Feature Extraction; conv 1-conv 5 generate three feature maps at three different resolutions equivalent to fig. 4 of the instant application; p. 847, section 3.1).

Regarding claim 8 (Original), Wang in view of Kong teach the following as shown in the rejection of claim 1 above {A method of detecting an object, the method comprising: generating a bird's eye view of pseudo-LiDAR data by: receiving image data from one or more image sensors; generating a point cloud from the image data, wherein the point cloud comprises pseudo-LiDAR data; generating, from the point cloud, a first bird's eye view map having a first resolution; generating, from the point cloud, a second bird's eye view map having a second resolution; and generating a combined bird's eye view map by combining features of the first bird's eye view map with features from the second bird's eye view map;} and further teach detecting, using an object detection algorithm (Wang; fig. 2, right hand side showing Object detection; fig. 4).

Regarding claim 9 (Original), Wang in view of Kong teach the method of claim 8, further comprising generating, from the point cloud, one or more additional bird's eye view maps having one or more additional resolutions, wherein generating the combined bird's eye view map further comprises combining features of the one or more additional bird's eye view maps with the features of the first bird's eye view map and the second bird's eye view map (Kong; p. 847, fig. 2, network for Hyper Feature Extraction; conv 1-conv 5 generate three feature maps at three different resolutions equivalent to fig. 4 of the instant application; p. 847, section 3.1).

Regarding claim 15 (Original), Wang and Kong teach the method of claim 8 and further teach wherein the object detection algorithm comprises an object detection neural network (Wang; fig. 2, right hand side showing LiDAR Based Detection; page 8449; section 3D Object detection – two different neural net detection algorithms are disclosed; fig. 4).

Regarding claim 16, Wang in view of Kong teaches or suggests a vehicle ("Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving;" Wang Title) comprising: one or more image sensors that produce image data (Wang discloses the use of image sensors that capture monocular or stereo imagery data, stating "3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but 

Regarding claim 17 (Original), Wang in view of Kong teach the vehicle of claim 16 and further teach wherein the computer readable instructions further cause the one or more processors to control a movement of the vehicle based at least in part on the detected one or more objects ("Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving;” Wang Title and introduction).

Claims 3, 4, 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving,” already of record) as applied to claims 1, 2, 8, 9 and 15-17 above, and in view of Kong et al. (“HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection”) as applied to claims 1, 2, 8, 9 and 15-17 above, and further in view of Zhou (“VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection”).

Regarding claim 3 (Original), Wang in view of Kong teach the method of claim 1 but do not teach wherein: generating the first bird's eye view map comprises: subdividing the pseudo-LiDAR data into a plurality of voxels having a first volume that is determined by the first resolution; and generating an array of first cells comprising one or more layers by extracting one or more features from the pseudo-LiDAR data within the plurality of voxels, wherein each layer of the one or more layers comprises a plurality of pixels, each pixel comprises an individual feature extracted from an individual voxel, and a number of layers equals a number of features extracted from the pseudo-LiDAR data; generating the second bird's eye view map comprises: subdividing the pseudo-LiDAR data into a plurality of voxels having a second volume that is determined by the first resolution; and generating an array of second cells comprising one or more layers by extracting one or more features from the pseudo-LiDAR data within the plurality of voxels, wherein each layer of the one or more layers comprises a plurality of pixels, each pixel comprises an individual feature extracted from an individual voxel, and a number of layers equals a number of features extracted from the pseudo-LiDAR data.
Zhou, working in the same field of endeavor, however, teaches wherein: generating the first bird's eye view map comprises: subdividing the pseudo-LiDAR data into a plurality of voxels having a first volume that is determined by the first resolution (Zhou; p.4491; fig. 2, feature learning network expanded at the bottom row where the features are represented as feature vectors at a layer of the first resolution); and generating an array of first cells comprising one or more layers by extracting one of the one or more layers comprises a plurality of pixels (Zhou; p.4491; fig. 2, feature learning network expanded at the bottom row where the features are represented as feature vectors at additional layers and differing resolutions), each pixel comprises an individual feature extracted from an individual voxel (Zhou; p.4491; fig. 2, bottom 
It would have been obvious to one of ordinary skill in the art prior to the effective date of the filing of the invention to have combined Zhou's teaching of stacked multiple-resolution feature maps with the methods for multi-resolution fusion of pseudo-LiDAR features as taught by Wang in view of Kong for the benefit of providing proper fusion of coarse-to-fine CNN features which is more suitable for object region proposal generation and detection.

In regard to claim 4 (Original), Wang, Kong and Zhou further teach wherein the generating of the combined bird's eye view map comprises combining the one or more features from each first cell with the one or more features from each second cell that align with first cells (Zhou; figs. 2 and 3; pp 4490-4491; p.4492; column 2, Stacked Voxel Encoding section).
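
For illustration only, the voxel and cell-alignment language of claims 3 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Zhou, Kong, or the instant application: each voxel is treated as a vertical column whose footprint is set by the resolution, two hand-picked features (point count and maximum height) stand in for learned features so that the number of layers equals the number of features, points are assumed to be (x, height, z) with non-negative heights, and the coarse map is upsampled by an integer factor so that its cells align with the fine cells. All function names, ranges, and cell sizes are hypothetical.

```python
import numpy as np

def bev_map(points, cell, x_range=(0.0, 8.0), z_range=(0.0, 8.0)):
    """Build a (2, rows, cols) bird's eye view map from (N, 3) points.

    Assumed point layout: (x, height, z), heights >= 0.
    Layer 0 holds the point count per cell, layer 1 the maximum height per cell.
    """
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((z_range[1] - z_range[0]) / cell))
    bev = np.zeros((2, rows, cols))
    x, height, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the ground-plane extent.
    keep = (x >= x_range[0]) & (x < x_range[1]) & (z >= z_range[0]) & (z < z_range[1])
    x, height, z = x[keep], height[keep], z[keep]
    c = ((x - x_range[0]) / cell).astype(int)
    r = ((z - z_range[0]) / cell).astype(int)
    np.add.at(bev[0], (r, c), 1.0)         # one feature per layer: point count
    np.maximum.at(bev[1], (r, c), height)  # one feature per layer: max height
    return bev

def upsample(bev, factor):
    """Repeat each coarse cell so it aligns with the fine cells it covers."""
    return np.repeat(np.repeat(bev, factor, axis=1), factor, axis=2)

points = np.array([[1.0, 0.5, 1.0],
                   [1.2, 1.5, 1.1],
                   [6.0, 2.0, 7.0]])
fine = bev_map(points, cell=1.0)    # first resolution: 8 x 8 cells
coarse = bev_map(points, cell=2.0)  # second resolution: 4 x 4 cells
coarse_aligned = upsample(coarse, factor=2)

added = fine + coarse_aligned                                  # combination by addition
concatenated = np.concatenate([fine, coarse_aligned], axis=0)  # combination by concatenation
```

Addition keeps the layer count fixed, whereas concatenation stacks the layers of both maps. A learned combination, such as the neural network option addressed for claims 6, 13 and 19, would replace the last two lines and is not shown here.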

In regard to claim 10 (Original), Wang in view of Kong teach the method of claim 8 and further in view of Zhou teach the features of claim 10 as shown in the rejection of claim 3 above.

Claim 11 is rejected as for claim 4 above.

Claims 5-7, 12-14 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving,” already of record) as applied to claims 1-4, 8-11 and 15-17 above, and in view of Kong et al. (“HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection”) as applied to claims 1-4, 8-11 and 15-17 above, and further in view of AAPA.

Regarding claim 5 (Original), Wang and Kong teach the method of claim 1 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by an addition function ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method).

Regarding claim 6 (Original), Wang and Kong teach the method of claim 1 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by a neural network ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method).

Regarding claim 7 (Original), Wang and Kong teach the method of claim 1 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by concatenation ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method) (Kong; fig. 3; p 4493).

In regard to claim 12 (Original), Wang and Kong teach the method of claim 8 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by an addition function ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method).

Regarding claim 13 (Original), Wang in view of Kong teaches the method of claim 8 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by a neural network ([0030] - The first bird's eye 

In regard to claim 14 (Original), Wang in view of Kong teach the method of claim 8 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by concatenation ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method) (Kong; fig. 3; p 4493).

In regard to claim 18 (Original), Wang in view of Kong teach the vehicle of claim 16 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by an addition function ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method).

Regarding claim 19 (Original), Wang in view of Kong teach the vehicle of claim 16 and AAPA teaches wherein the features of the first bird's eye view map are combined with the features from the second bird's eye view map by a neural network ([0030] - The first bird's eye view map 16A may be combined with the second bird's eye view map 16B (and any additional bird's eye view maps) by any method).

In regard to claim 20 (Original), Wang in view of Kong teaches the vehicle of claim 16 and AAPA teaches the features of the first bird's eye view map are combined with the features 

Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the arguments apply to the previous art rejection presented in the Non-Final Rejection, mailed 31 August 2021, and do not reflect the current combination of references and citations used in the current prior art rejection, which includes new grounds of rejection as presented above.

The Examiner respectfully requests that Applicant look to the Office Action provided above, wherein the claims have now been examined and addressed with new grounds of rejection and, in particular, the newly cited prior art references of Kong and Zhou are now relied upon for showing features not explicitly taught by the Wang primary reference.

Independent claims 1, 8 and 16 are rejected as shown in the first claim rejection section above and are argued as shown immediately above.

Dependent claims 2-7, 9-15 and 17-20 are rejected for being dependent upon a rejected base claim and for the additional features that they add as shown in the claim rejection sections above.

Conclusion
The following prior art, made of record, was not relied upon but is considered pertinent to applicant's disclosure:
US 20210342609 A1	Top-Down Object Detection from Lidar Point Clouds – A deep neural network(s) (DNN) may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.  See at least figure 10 and the associated disclosure.

US 11100669 B1	Multimodal three-dimensional object detection – A method includes obtaining surface samples that represent three-dimensional locations of surfaces of an environment; generating a voxelized representation of the surfaces of the environment in three-dimensional space using the surface samples; obtaining an image that shows the surfaces of the environment; associating each of the surface samples with image information that corresponds to a portion of the image that is spatially correlated with a respective one of the surface samples; determining voxel features for voxels from the voxelized representation based on the surface samples and the image information using a first trained machine learning model, wherein the voxel features each describe three-dimensional shapes present within a respective one of the voxels; and detecting objects based on the voxel features.  LiDAR sensors enable accurate localization of objects in three-dimensional space. Methods of detecting objects using LiDAR sensor outputs (or other three-dimensional sensor outputs) typically rely on converting a three-dimensional point cloud into a two-dimensional feature representation, such as a depth map or a bird's eye view map. Two-dimensional methods, such as two-dimensional CNN-based A recently developed three-dimensional object detection network architecture, referred to herein as VoxelNet, addresses the memory usage limitations associated with processing a voxelized representation of a point cloud by encoding the voxels using stacks of voxel feature encoding (VFE) layers. By voxelization and encoding, VoxelNet enables the use of three-dimensional region proposal networks for detection. The systems and methods described herein expand on these techniques to use multiple modalities. For example, images provide dense texture information that can be combined with three-dimensional sensing modalities to improve detection performance.

Beltran et al.	"BirdNet: a 3D Object Detection Framework from LiDAR Information" – Understanding driving situations regardless the conditions of the traffic scene is a cornerstone on the path towards autonomous vehicles; however, despite common sensor setups already include complementary devices such as LiDAR or radar, most of the research on perception systems has traditionally focused on computer vision. We present a LiDAR based 3D object detection pipeline entailing three stages. First, laser information is projected into a novel cell encoding for bird’s eye view projection. Later, both object location on the plane and its heading are estimated through a convolutional neural network originally designed for image processing. Finally, 3D oriented detections are computed in a post-processing phase. Experiments on KITTI dataset show that the proposed framework achieves state-of-the-art results among comparable methods. Further tests with different LiDAR sensors in real scenarios assess the multi-device capabilities of the approach.

Zhongyang et al.	"Classification of LiDAR Point Cloud based on Multiscale Features and PointNet" – Aiming at classifying the feature of LiDAR point cloud data in complex scenario, this paper proposed 

Seferbekov et al.	"Feature Pyramid Network for Multi-Class Land Segmentation" – See Figure 2 for a multi-resolution feature encoder. Feature pyramid network with Resnet50 encoder pre-trained on ImageNet. As an input, we have an RGB image. The number of channels increases stage by stage on the left part of the scheme while the size of the feature maps decreases stage by stage. The arrows on top show transformations implemented between the layers. In the final step, feature maps upsample to the same size and concatenated. Then, the number of channels decreases to the number of classes, and the resulting image is upsampled to the original image size. FCN has been further improved and now known as UNet and Feature Pyramid (FPN) neural networks [18, 13, 12]. We approach the problem of multi-class land segmentation using FPN. FPN uses a pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. It consisted of bottom-up and top-down pathways. For the bottom-up 

Simonyan et al.	"Very Deep Convolutional Networks for Large-Scale Image Recognition" – In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localization and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Edward Martello whose telephone number is (571) 270-1883.  The examiner can normally be reached on M-F 7:30-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached on (571) 272-7761.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/EDWARD MARTELLO/
Primary Examiner, Art Unit 2613