DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11164363. Although the claims at issue are not identical, they are not patentably distinct from each other. This application is a continuation of application 16/924,080 (issued as U.S. Patent No. 11164363), and its claims recite the same invention in broader terms: each claim of the patent includes every limitation of the corresponding claim of this application, so each instant claim encompasses the corresponding patent claim.
The claims map to each other as follows (for each claim number, the claim of the instant application is listed first, followed by the corresponding claim of U.S. Patent No. 11164363):
Claim 1
A method comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene; 

generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; and 

processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.
Claim 1
A method comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene; 

generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; 
generating a network input from the dynamic voxel representations corresponding to each of the one or more views; and 

processing the network input generated from the dynamic voxel representations corresponding to each of the one or more views using a neural network to generate a network output that characterizes the scene. (This limitation, together with the preceding limitation, reads on the more broadly recited limitation of the instant application: “processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.”)
Claim 2
The method of claim 1, wherein obtaining the point cloud data comprises: 
obtaining raw sensor data for each of the three-dimensional points; and 
processing the raw sensor data using an embedding neural network to generate the point cloud data.
Claim 2
The method of claim 1, wherein obtaining the point cloud data comprises: 
obtaining raw sensor data for each of the three-dimensional points; and 
processing the raw sensor data using an embedding neural network to generate the point cloud data.
Claim 3
The method of claim 1, wherein the output is an object detection output that identifies objects that are located in the scene.
Claim 3
The method of claim 1, wherein the neural network is an object detection neural network and 
the network output is an object detection output that identifies objects that are located in the scene. (This claim reads on the more broadly recited claim 3 of the instant application.)
Claim 4
The method of claim 1, wherein the sensor is a LiDAR sensor.
Claim 4
The method of claim 1, wherein the sensor is a LiDAR sensor.
Claim 5
The method of claim 1, wherein a first view of the one or more views is a birds-eye view, and wherein assigning each of the three-dimensional points to a respective one of the of voxels in the dynamic voxel representation corresponding to the birds-eye view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a Cartesian coordinate space.
Claim 5
The method of claim 1, wherein a first view of the one or more views is a birds-eye view, and wherein assigning each of the three-dimensional points to a respective one of the of voxels in the dynamic voxel representation corresponding to the birds-eye view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a Cartesian coordinate space.
Claim 6
The method of claim 1, wherein a second view of the one or more views is a perspective view, and wherein assigning each of the three-dimensional points to a respective one of the voxels in the dynamic voxel representation corresponding to the perspective view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a spherical coordinate space.
Claim 6
The method of claim 1, wherein a second view of the one or more views is a perspective view, and wherein assigning each of the three-dimensional points to a respective one of the voxels in the dynamic voxel representation corresponding to the perspective view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a spherical coordinate space.
Claim 7
The method of claim 1, wherein generating the network input comprises, for each of the one or more views: for each voxel in the dynamic voxel representation corresponding to the view, processing the feature representations of the three-dimensional points assigned to the voxel to generate respective voxel feature representations of each of the three-dimensional points assigned to the voxel.
Claim 7
The method of claim 1, wherein generating the network input comprises, for each of the one or more views: for each voxel in the dynamic voxel representation corresponding to the view, processing the feature representations of the three-dimensional points assigned to the voxel to generate respective voxel feature representations of each of the three-dimensional points assigned to the voxel.
Claim 8
The method of claim 7, wherein the one or more views comprise a plurality of views and wherein generating the network input comprises, for each of the three-dimensional points in the point cloud data: generating a combined feature representation of the three-dimensional point from at least the voxel feature representations for the three-dimensional point for each of the views; and generating the network input by combining the combined feature representations of the three-dimensional points.
Claim 8 
The method of claim 7, wherein the one or more views comprise a plurality of views and wherein generating the network input comprises, for each of the three-dimensional points in the point cloud data: generating a combined feature representation of the three-dimensional point from at least the voxel feature representations for the three-dimensional point for each of the views; and generating the network input by combining the combined feature representations of the three-dimensional points.
Claim 9
The method of claim 8, wherein generating the combined feature representation of the three-dimensional point comprises concatenating the voxel feature representations for the three-dimensional point for each of the views and the feature representation for the three-dimensional point in the point cloud data.
Claim 9
The method of claim 8, wherein generating the combined feature representation of the three-dimensional point comprises concatenating the voxel feature representations for the three-dimensional point for each of the views and the feature representation for the three-dimensional point in the point cloud data.
Claim 10
The method of claim 1, wherein, for each of the one or more views, the dynamic voxel representation corresponding to the view defines a bi-directional mapping between voxels in the dynamic voxel representation and the three-dimensional points in the point cloud data.
Claim 10
The method of claim 1, wherein, for each of the one or more views, the dynamic voxel representation corresponding to the view defines a bi-directional mapping between voxels in the dynamic voxel representation and the three-dimensional points in the point cloud data.
Claim 11
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene; 

generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; and 

processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.
Claim 11
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene;

generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; generating a network input from the dynamic voxel representations corresponding to each of the one or more views; and 
processing the network input generated from the dynamic voxel representations corresponding to each of the one or more views using a neural network to generate a network output that characterizes the scene. (This limitation, together with the preceding limitation, reads on the more broadly recited limitation of the instant application: “processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.”)
Claim 12
The system of claim 11, wherein obtaining the point cloud data comprises: obtaining raw sensor data for each of the three-dimensional points; and processing the raw sensor data using an embedding neural network to generate the point cloud data.
Claim 12
The system of claim 11, wherein obtaining the point cloud data comprises: obtaining raw sensor data for each of the three-dimensional points; and processing the raw sensor data using an embedding neural network to generate the point cloud data.
Claim 13
The system of claim 11, wherein the output is an object detection output that identifies objects that are located in the scene.
Claim 13

The system of claim 11, wherein the neural network is an object detection neural network and the network output is an object detection output that identifies objects that are located in the scene. (This claim reads on the more broadly recited claim 13 of the instant application.)
Claim 14
The system of claim 11, wherein a first view of the one or more views is a birds-eye view, and wherein assigning each of the three-dimensional points to a respective one of the of voxels in the dynamic voxel representation corresponding to the birds-eye view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a Cartesian coordinate space.
Claim 14
The system of claim 11, wherein a first view of the one or more views is a birds-eye view, and wherein assigning each of the three-dimensional points to a respective one of the of voxels in the dynamic voxel representation corresponding to the birds-eye view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a Cartesian coordinate space.
Claim 15
The system of claim 11, wherein a second view of the one or more views is a perspective view, and wherein assigning each of the three-dimensional points to a respective one of the voxels in the dynamic voxel representation corresponding to the perspective view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a spherical coordinate space.
Claim 15
The system of claim 11, wherein a second view of the one or more views is a perspective view, and wherein assigning each of the three-dimensional points to a respective one of the voxels in the dynamic voxel representation corresponding to the perspective view comprises assigning the three-dimensional points to voxels based on positions of the three-dimensional points in a spherical coordinate space.
Claim 16
The system of claim 11, wherein generating the network input comprises, for each of the one or more views: for each voxel in the dynamic voxel representation corresponding to the view, processing the feature representations of the three-dimensional points assigned to the voxel to generate respective voxel feature representations of each of the three-dimensional points assigned to the voxel.
Claim 16
The system of claim 11, wherein generating the network input comprises, for each of the one or more views: for each voxel in the dynamic voxel representation corresponding to the view, processing the feature representations of the three-dimensional points assigned to the voxel to generate respective voxel feature representations of each of the three-dimensional points assigned to the voxel.
Claim 17
The system of claim 16, wherein the one or more views comprise a plurality of views and wherein generating the network input comprises, for each of the three-dimensional points in the point cloud data: generating a combined feature representation of the three-dimensional point from at least the voxel feature representations for the three-dimensional point for each of the views; and generating the network input by combining the combined feature representations of the three-dimensional points.
Claim 17
The system of claim 16, wherein the one or more views comprise a plurality of views and wherein generating the network input comprises, for each of the three-dimensional points in the point cloud data: generating a combined feature representation of the three-dimensional point from at least the voxel feature representations for the three-dimensional point for each of the views; and generating the network input by combining the combined feature representations of the three-dimensional points.
Claim 18
The system of claim 17, wherein generating the combined feature representation of the three-dimensional point comprises concatenating the voxel feature representations for the three-dimensional point for each of the views and the feature representation for the three-dimensional point in the point cloud data.
Claim 18
The system of claim 17, wherein generating the combined feature representation of the three-dimensional point comprises concatenating the voxel feature representations for the three-dimensional point for each of the views and the feature representation for the three-dimensional point in the point cloud data.
Claim 19
The system of claim 11, wherein, for each of the one or more views, the dynamic voxel representation corresponding to the view defines a bi-directional mapping between voxels in the dynamic voxel representation and the three-dimensional points in the point cloud data.
Claim 19
The system of claim 11, wherein, for each of the one or more views, the dynamic voxel representation corresponding to the view defines a bi-directional mapping between voxels in the dynamic voxel representation and the three-dimensional points in the point cloud data.
Claim 20
One or more non-transitory computer-readable media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene; 
generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; and 

processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.
Claim 20
One or more non-transitory computer-readable media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene; 
generating, for each of one or more views of the scene, a corresponding dynamic voxel representation that assigns, to each voxel of a set of voxels for the view, a variable number of three-dimensional points, wherein each three-dimensional point in the point cloud data is assigned to a respective one of the voxels of the set of voxels in the corresponding dynamic voxel representation, and wherein the generating comprises: 

assigning, based on positions of the three-dimensional points in the point cloud data according to the view, each of the three-dimensional points to a respective one of the voxels of the set of voxels; 

generating a network input from the dynamic voxel representations corresponding to each of the one or more views; and 
processing the network input generated from the dynamic voxel representations corresponding to each of the one or more views using a neural network to generate a network output that characterizes the scene. (This limitation, together with the preceding limitation, reads on the more broadly recited limitation of the instant application: “processing the dynamic voxel representations corresponding to each of the one or more views to generate an output that characterizes the scene.”)


Allowable Subject Matter

Claims 1-20 would be allowable if the nonstatutory double patenting rejection set forth in this Office action is overcome.
The following is an examiner’s statement of reasons for allowance:
Regarding claim 1, Du et al. (US 20190213778 A1) discloses
A method (Du [0012], “a method”) comprising: 
obtaining point cloud data representing a sensor measurement of a scene captured by a sensor, the point cloud data comprising a respective feature representation for each of a plurality of three-dimensional points in the scene (Du [0058], “[0058] The video output of the camera pods (250) (e.g., streams images for texture values and streams of monochrome images for generating depth values)”; [0089], “a point cloud extracted from the depth map … A point cloud represents one or more objects in 3D space as a set of points.”); 
generating, for each of one or more views of the scene, a corresponding dynamic voxel representation (Du [0105], “dynamic voxels, or other data defining a deformable, volumetric representation for a dynamic 3D model.”).
However, none of the prior art of record, alone or in combination, discloses or suggests claim 1 as a whole.
Claims 2-10 would be allowable by virtue of their dependence on claim 1.
Claim 11 would be allowable for the same reasons as claim 1, because it recites subject matter similar to that of claim 1.
Claims 12-19 would be allowable by virtue of their dependence on claim 11.
Claim 20 would be allowable for the same reasons as claim 1, because it recites subject matter similar to that of claim 1.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
See the notice of references cited (PTO-892) for prior art made of record, including art that is not relied upon but considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JITESH PATEL whose telephone number is (571)270-3313. The examiner can normally be reached 8am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Greg Tryder, can be reached at (571) 270-7365. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JITESH PATEL/Primary Examiner, Art Unit 2616