Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
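As an illustrative sketch only, and not part of the test as stated in MPEP § 2181, the three prongs and the two presumptions above can be modeled as a simple predicate. The `Limitation` type and its Boolean fields are hypothetical names introduced here; each field stands for a determination made by reading the claim in light of the specification, not something computed from the claim text.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of the findings made for one claim limitation."""
    text: str
    uses_means_or_step: bool            # prong (A): literal "means" or "step"
    uses_generic_placeholder: bool      # prong (A): nonce term such as "module"
    modified_by_function: bool          # prong (B): functional language attached
    recites_sufficient_structure: bool  # prong (C) is met only when this is False


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True when all three prongs are met.

    With "means"/"step" present, the presumption in favor of 112(f) is
    rebutted by sufficient structure. Without those words, the presumption
    against 112(f) is rebutted by a generic placeholder coupled with
    function and lacking sufficient structure. Both cases reduce to the
    same conjunction of prongs (A), (B), and (C).
    """
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.modified_by_function
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c
```

Under these assumptions, a limitation such as "a temporal fusion processing module, configured to ..." would be recorded with `uses_generic_placeholder=True`, `modified_by_function=True`, and `recites_sufficient_structure=False`, so the predicate returns `True`.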
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each claim limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “a LiDAR data processing module,” “a temporal fusion processing module,” and “a camera data processing module” in claim 8.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.



Examiner's Amendment/Statement
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an email by Ryan O'Donnell on 7/4/2022 at 12:45pm PT.
The application has been amended as follows:
In claim 11, line 1, delete “A computer storage medium” and insert in its place --A non-transitory computer storage medium--.

Allowable Subject Matter
Claims 1-10 are allowed.
Claim 11 is allowed subject to the above examiner’s amendment.
The following is an examiner’s statement of reasons for allowance:
The prior art, alone or in reasonable combination, fails to teach the subject matter of claims 1-7 and 11, which specifically comprise the following limitations (in consideration of each claim as a whole):
determining a previous frame fused hidden state, and performing a temporal fusion process based on the previous frame fused hidden state and the current frame LiDAR three-dimensional embeddings to generate a current frame temporary hidden state and a current frame output result, wherein the previous frame fused hidden state is generated based on a previous frame temporary hidden state and previous frame camera three-dimensional embeddings; and 
obtaining current frame camera data, determining current frame camera three-dimensional embeddings of the current frame camera data, and generating a current frame fused hidden state based on the current frame camera three-dimensional embeddings and the current frame temporary hidden state, wherein the current frame fused hidden state is used for a process of performing a temporal fusion process based on the current frame fused hidden state and next frame LiDAR three-dimensional embeddings to generate a next frame temporary hidden state.
The prior art, alone or in reasonable combination, fails to teach the subject matter of claims 8-10, which specifically comprise the following limitations (in consideration of each claim as a whole):
a temporal fusion processing module, configured to determine a previous frame fused hidden state, and perform a temporal fusion process based on the previous frame fused hidden state and the current frame LiDAR three-dimensional embeddings to generate a current frame temporary hidden state and a current frame output result, wherein the previous frame fused hidden state is generated based on a previous frame temporary hidden state and previous frame camera three-dimensional embeddings; and
a camera data processing module, configured to obtain current frame camera data, determine current frame camera three-dimensional embeddings of the current frame camera data, and generate a current frame fused hidden state based on the current frame camera three-dimensional embeddings and the current frame temporary hidden state, wherein the current frame fused hidden state is used for a process of performing a temporal fusion process based on the current frame fused hidden state and next frame LiDAR three-dimensional embeddings to generate a next frame temporary hidden state. 
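For illustration only, the frame-to-frame data flow recited in the limitations above can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent applicant's disclosed implementation: the functions `temporal_fusion`, `camera_fusion`, and `run_frames` are hypothetical names, the embeddings are assumed to be precomputed vectors of equal length, and simple averaging stands in for whatever fusion operators the specification actually describes.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def temporal_fusion(prev_fused: Vector, lidar_emb: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the temporal fusion process.

    Combines the previous frame fused hidden state with the current frame
    LiDAR embeddings and returns (temporary hidden state, output result).
    """
    temp = [0.5 * h + 0.5 * x for h, x in zip(prev_fused, lidar_emb)]
    output = list(temp)  # assumption: the output is read directly from the state
    return temp, output


def camera_fusion(temp_hidden: Vector, camera_emb: Vector) -> Vector:
    """Hypothetical stand-in: fuse camera embeddings into the temporary state."""
    return [0.5 * t + 0.5 * c for t, c in zip(temp_hidden, camera_emb)]


def run_frames(
    lidar_embs: Sequence[Vector],
    camera_embs: Sequence[Vector],
    initial_fused: Vector,
) -> Tuple[List[Vector], Vector]:
    """Process frames in order, carrying the fused hidden state forward."""
    fused = list(initial_fused)
    outputs: List[Vector] = []
    for lidar_emb, camera_emb in zip(lidar_embs, camera_embs):
        # The output is produced from LiDAR data and the previous fused state,
        # before the current frame camera data is folded in.
        temp, out = temporal_fusion(fused, lidar_emb)
        outputs.append(out)
        # The current frame fused state is then used for the next frame.
        fused = camera_fusion(temp, camera_emb)
    return outputs, fused
```

The ordering inside the loop reflects the claim language: the current frame output result depends only on the previous frame fused hidden state and the current frame LiDAR embeddings, while the current frame camera embeddings affect the next frame through the fused hidden state.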
The closest prior art, Hu et al. ("A Soft Time Synchronization Framework for Multi-Sensors in Autonomous Localization and Navigation," 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2018, pp. 694-699, doi: 10.1109/AIM.2018.8452384), discloses a similar system and technique (see Abstract and Figs. 1 and 4, reproduced below) for synchronizing sensor data obtained from different sensors, such as a color camera and LiDAR, to achieve low latency and low synchronization error, but fails to anticipate or render obvious, either singly or in combination with the other cited references, the above limitations (as combined with the other claimed limitations).

[Images: Hu et al., Abstract, Fig. 1, and Fig. 4]

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fasano et al., "Multi-sensor data fusion: A tool to enable UAS integration into civil airspace," 2011 IEEE/AIAA 30th Digital Avionics Systems Conference, 2011, pp. 5C3-1-5C3-15, doi: 10.1109/DASC.2011.6096082.

(Abstract)
Sengupta et al., "A DNN-LSTM based Target Tracking Approach using mmWave Radar and Camera Sensor Fusion," 2019 IEEE National Aerospace and Electronics Conference (NAECON), 2019, pp. 688-693.



Wei et al., "LiDAR and camera detection fusion in a real-time industrial multi-sensor collision avoidance system," Electronics, vol. 7, no. 6, p. 84, May 2018, doi: 10.3390/electronics7060084.


Wang et al., "Multi-sensor fusion in automated driving: A survey." IEEE Access 8 (2019): 2847-2868.



Wang et al., "Integrating millimeter wave radar with a monocular vision sensor for on-road obstacle detection applications." Sensors (Basel). 2011;11(9):8992-9008. doi: 10.3390/s110908992. Epub 2011 Sep 21. PMID: 22164117; PMCID: PMC3231508.


Gale Bagi et al., "Sensing Structure for Blind Spot Detection System in Vehicles," 2019 International Conference on Control, Automation and Information Sciences (ICCAIS), 2019, pp. 1-6, doi: 10.1109/ICCAIS46528.2019.9074580.


De Silva et al., "Robust fusion of LiDAR and wide-angle camera data for autonomous mobile robots." Sensors 18, no. 8 (2018): 2730.


Liu et al., "Road Segmentation with Image-LiDAR Data Fusion." arXiv preprint arXiv:1905.11559 (2019).


Khan et al., "Towards self-supervised high level sensor fusion." arXiv preprint arXiv:1902.04272 (2019).


Banerjee et al., "Online Camera LiDAR Fusion and Object Detection on Hybrid Data for Autonomous Driving," 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 1632-1638, doi: 10.1109/IVS.2018.8500699.



Kumar et al., "LiDAR and camera fusion approach for object distance estimation in self-driving vehicles." Symmetry 12, no. 2 (2020): 324. Feb. 24, 2020.


Zhang et al., "Vehicle detection based on LiDAR and camera fusion." In 17th IEEE International Conference on Intelligent Transport Systems (ITSC), Special Session on Advanced Vehicle Active Safety Systems. 2014.


Xue et al. (WO 2021051220 A1): A point cloud fusion method, device, and system, and a storage medium. The method comprises: acquiring a depth map of a current image frame (S201); acquiring a three-dimensional point cloud corresponding to a first adjacent image frame adjacent to the position of the current image frame (S202); and projecting the three-dimensional point cloud corresponding to the first adjacent image frame onto the depth map of the current image frame, and performing a fusion processing of the three-dimensional point cloud corresponding to the first adjacent image frame and the three-dimensional point cloud corresponding to the current image frame (S203). The present invention can increase the efficiency and effectiveness of point cloud fusion and reduce the resource occupancy rate. (Abstract)
Xu et al. (US 20200082614 A1): The present invention generally relates to generating a three-dimensional representation of a physical environment, which includes dynamic scenarios. An exemplary device comprises one or more processors; a memory; and one or more programs that includes instructions for: obtaining a set of three-dimensional data of the physical environment, wherein the three-dimensional data is associated with a first sensing device; based on the set of three-dimensional data, generating a three-dimensional representation of the physical environment; obtaining a set of two-dimensional data of the physical environment, wherein the set of two-dimensional data is associated with a second sensing device and wherein the set of two-dimensional data comprises information of a dynamic object in the physical environment; generating an alignment between the three-dimensional representation of the physical environment and the set of two-dimensional data; and based on the alignment, obtaining a set of three-dimensional information associated with the dynamic object. (Abstract)
Luo et al. (US 20190066344 A1): A method of localization for a non-transitory computer readable storage medium storing one or more programs is disclosed. The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform by one or more autonomous vehicle driving modules execution of processing of images from a camera and data from a LiDAR using the following steps comprising: voxelizing a 3D submap and a global map into voxels; estimating distribution of 3D points within the voxels, using a probabilistic model; extracting features from the 3D submap and the global map; and classifying the extracted features into classes. (Abstract)


Xu (US 20210366202 A1): An interactive method for a movable platform, an interactive system, a movable platform and a storage medium including the interactive method. The interactive method may include projecting three-dimensional point cloud data collected by a sensor into image data collected by a camera for fusion processing to obtain a fused image; rendering the fused image to determine a three-dimensional visualization image of a surrounding environment where the movable platform is located; and outputting the three-dimensional visualization image of the surrounding environment where the movable platform is located on a display interface. (Abstract)
Wekel et al. (US 20210063578 A1): In various examples, a deep neural network (DNN) may be used to detect and classify animate objects and/or parts of an environment. The DNN may be trained using camera-to-LiDAR cross injection to generate reliable ground truth data for LiDAR range images. For example, annotations generated in the image domain may be propagated to the LiDAR domain to increase the accuracy of the ground truth data in the LiDAR domain—e.g., without requiring manual annotation in the LiDAR domain. Once trained, the DNN may output instance segmentation masks, class segmentation masks, and/or bounding shape proposals corresponding to two-dimensional (2D) LiDAR range images, and the outputs may be fused together to project the outputs into three-dimensional (3D) LiDAR point clouds. This 2D and/or 3D information output by the DNN may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle. (Abstract)


Groh et al. (US 20180060725 A1): A machine learning module may generate a probability distribution from training data including labeled modeling data correlated with reflection data. Modeling data may include data from a LIDAR system, camera, and/or a GPS for a target environment/object. Reflection data may be collected from the same environment/object by a radar and/or an ultrasonic system. The probability distribution may assign reflection coefficients for radar and/or ultrasonic systems conditioned on values for modeling data. A mapping module may create a reflection model to overlay a virtual environment assembled from a second set of modeling data by applying the second set to the probability distribution to assign reflection values to surfaces within the virtual environment. Additionally, a test bench may evaluate an algorithm, for processing reflection data to generate control signals to an autonomous vehicle, with simulated reflection data from a virtual sensor engaging reflection values assigned within the virtual environment. (Abstract)



Yang et al. (US 20210004613 A1): According to an aspect of an embodiment, a method may include obtaining multiple sets of camera images and light detection and ranging (LIDAR) point clouds along a track within a geographic sector of a map. The method may include applying a learning model to the camera images to characterize objects within the camera images within classes of objects to generate segmented images. The method may additionally include mapping the sets of camera images and the LIDAR point clouds to three dimensional points of the geographic sector of the map. The method may also include projecting the three dimensional points onto the segmented images to obtain corresponding classes for the three dimensional points of the geographic sector of the map. (Abstract)


Wang et al. (US 20200082554 A1): The embodiments of the present disclosure provide a method for processing 3D data, an apparatus for processing 3D data, a device and a storage medium. The method includes: acquiring three-dimensional point cloud data to be processed; mapping the three-dimensional point cloud data into a two-dimensional image space to obtain a first depth map; performing an expansion processing on the first depth map after changing an occlusion relation between target objects contained in the first depth map; and changing the occlusion relation between the target objects contained in the first depth map subject to the expansion processing to obtain a second depth map matched to the three-dimensional point cloud data. (Abstract)


Van Baar et al. (US 20140368615 A1): To generate a pixel-accurate depth map, data from a range-estimation sensor (e.g., a time-of flight sensor) is combined with data from multiple cameras to produce a high-quality depth measurement for pixels in an image. To do so, a depth measurement system may use a plurality of cameras mounted on a support structure to perform a depth hypothesis technique to generate a first depth-support value. Furthermore, the apparatus may include a range-estimation sensor which generates a second depth-support value. In addition, the system may project a 3D point onto the auxiliary cameras and compare the color of the associated pixel in the auxiliary camera with the color of the pixel in reference camera to generate a third depth-support value. The system may combine these support values for each pixel in an image to determine respective depth values. Using these values, the system may generate a depth map for the image. (Abstract)
Kauffmann et al. (US 20190318178 A1): A method for obtaining 3D information of objects shown in at least two images obtained by at least two on-vehicle sensors, a device for carrying out the respective steps of the method, a system including such a device, and a vehicle including such a device or such a system are described. (Abstract)
Jiang (US 20210012165 A1): A data processing method, device and multi-sensor fusion method for multi-sensor fusion, which can group data captured by different sensors in different probe dimensions to simultaneous interpreting deep learning data based on pixel elements in the multi-dimensional matrix structure, thereby realize the more effective data mining and feature extraction to support more effective ability of environment perception and target detection. (Abstract)
Zhu et al. (US 20150310281 A1): Methods and systems for object detection using multiple sensors are described herein. In an example embodiment, a vehicle's computing device may receive sensor data frames indicative of an environment at different rates from multiple sensors. Based on a first frame from a first sensor indicative of the environment at a first time period and a portion of a first frame that corresponds to the first time period from a second sensor, the computing device may estimate parameters of objects in the vehicle's environment. The computing device may modify the parameters in response to receiving subsequent frames or subsequent portions of frame of sensor data from the sensors even if the frames arrive at the computing device out of order. The computing device may provide the parameters of the objects to systems of the vehicle for object detection and obstacle avoidance. (Abstract)
Nehmadi et al. (US 20160292905 A1): A method and apparatus for generating 3D-maps for acquiring three-dimensional (3D) maps are presented. The method includes analyzing at least one image acquired by a passive sensor to identify a plurality of objects in the at least one image; classifying the plurality of objects; determining, based on the classification, whether to passively measure a distance to each of the plurality of objects; passively measuring the distance to at least one of the plurality of objects based on the determination; actively measuring a distance to some of the plurality of objects, wherein the distance to one of the same of the plurality of objects is actively measured when the distance to the object cannot be passively measured; and generating a 3D map of a scene based on the distance measured to each of the plurality of objects. (Abstract)
Shu et al. (US 20210012527 A1): An image processing method is provided for an electronic device. The method includes obtaining a target image comprising a target object; recognizing target two-dimensional location coordinates of the target object in the target image and a target attribute type corresponding to the target object; and obtaining a target three-dimensional point cloud associated with the target image. The method also include, according to a mapping relationship between the target three-dimensional point cloud and all pixels in the target image, obtaining three-dimensional location coordinates corresponding to pixels in the target two-dimensional location coordinates, as target three-dimensional location coordinates; determining a setting region in three-dimensional map data according to the target three-dimensional location coordinates; and setting the target attribute type for the target object in the setting region. Electronic device and non-transitory computer-readable storage medium counterparts are also contemplated. (Abstract)
Pacala (US 10809380 B2): Methods and systems can augment 360 degree panoramic LIDAR results (e.g., from a spinning LIDAR system) with color obtained from color cameras. A color-pixel-lookup table can specify the correspondence between LIDAR pixels (depth/ranging pixels) and color pixels, which may be done at different viewing object distances. The operation of the color cameras can be triggered by the angular positions of the LIDAR system. For example, a color image of a particular camera can be captured when the LIDAR system is at a particular angular position, which can be predetermined based on properties of the cameras (e.g., shutter speed). Alternatively or in addition, a common internal clock can be used to assign timestamps to LIDAR and color pixels as they are captured. The corresponding color pixel(s), e.g., as determined using a color-pixel-lookup table, with the closest timestamp can be used for colorization. (Abstract)


Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG NIU, whose telephone number is (571) 272-9592. The examiner can normally be reached Monday through Friday, 8am-5pm PT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG NIU/
Primary Examiner, Art Unit 2669