DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Kevin Leung (Reg. No. 67,243) on March 17, 2022.

The application has been amended as follows: 

5.  	(Currently Amended) The method of claim 1, further comprising:
after passing the 3D voxel volume through all layers of the 3D CNN, passing the refined features in the 3D voxel volume and TSDF values at each voxel of the 3D voxel volume through a batch normalization (batchnorm) function and a rectified linear unit (reLU) function.

6.	(Currently Amended) The method of claim [[1]]5, wherein additive skip connections are included from an encoder to a decoder of the 3D CNN, and the method further comprises: 


7.	(Currently Amended) The method of claim 6, wherein one or more null voxels of the 3D voxel volume do not have features back-projected into them corresponding to voxels which were not observed during the sequence of frames of RGB images, and the method further comprises:
	not using the additive skip connections from the encoder for the null voxels
	passing the null voxels through the batchnorm function and the reLU a 

19.	(Currently Amended) The system of claim [[14]]18, wherein skip connections are included from an encoder to a decoder of the 3D CNN, and the process for generating a three-dimensional (3D) reconstruction of the scene from the sequence of frames of RGB images further comprises:
	using the additive skip connections to skip one or more features in the 3D voxel volume from the encoder to the decoder of the 3D CNN.

20.	(Currently Amended) The system of claim 19, wherein one or more null voxels of the 3D voxel volume do not have features back-projected into them corresponding to voxels which were not observed during the sequence of frames of RGB images, and the process for generating a three-dimensional (3D) reconstruction of the scene from the sequence of frames of RGB images further comprises:

	passing the null voxels through the batchnorm function and the reLU function to match a 


Allowable Subject Matter
Claims 1-21 are allowed.
The following is an examiner’s statement of reasons for allowance:
The cited prior art does not disclose or render obvious the combination of elements recited in the claims as a whole. Specifically, the cited prior art fails to disclose or render obvious the following limitations:
As per independent claim 1, the closest prior art teaches a method of generating a 3D reconstruction of a scene from multiview images as follows:
Ji et al., “SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis,” Cornell University Library/Computer Science/Computer Vision and Pattern Recognition, 5 Aug 2017, discloses obtaining a sequence of frames of red green blue (RGB) images (Page 2 of Ji, Section 4 Surface Net, Par. 1: two images of a scene with known camera parameters) and projecting a 3D voxel volume into each frame using known camera intrinsics and extrinsics, wherein each pixel of the voxel volume is mapped to a ray in the voxel volume (Page 2 of Ji, Section 4 Surface Net, Paras. 1-2, extending to page 3: two images of a scene with known camera parameters and a voxelization of the scene denoted by a 3D tensor C, where the 3D voxel representation encodes the camera parameters implicitly; Page 3, left col., 1st full paragraph: “For a ∈ C onto the image Iv and storing the RGB values ix for each voxel respectively”; see Fig. 2).
In other words, Ji teaches reconstructing a 2D surface from the 3D space by projecting a voxel volume into each 2D image frame and storing color values within the voxels. Claim 1 of the instant application, however, back-projects features extracted from the RGB images themselves into a 3D voxel volume; i.e., it regresses directly to 3D by first extracting features from the 2D images using a 2D CNN and then back-projecting those features into 3D space to form the 3D voxel volume. Furthermore, Ji fails to teach extracting features from the sequence of frames of RGB images using a two-dimensional convolutional neural network, fusing/accumulating the features from each frame into the 3D voxel volume, passing the 3D voxel volume through a 3D convolutional neural network having an encoder-decoder, and regressing output truncated signed distance function values at each voxel of the 3D voxel volume.
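For illustration only, the distinction drawn above (storing per-voxel colors versus back-projecting learned 2D features and fusing them in a shared 3D volume) can be sketched as follows. This is a minimal sketch, not the applicant's or Ji's implementation: it assumes a pinhole camera model, nearest-pixel sampling, and simple averaging across frames, and the helper `backproject_features` and all of its parameters are hypothetical. Constant arrays stand in for the 2D CNN features, and the 3D CNN refinement and TSDF regression are not shown.

```python
import numpy as np


def backproject_features(feats, K, world_to_cam, origin, voxel_size, dims):
    """Average per-frame 2D feature maps into a 3D voxel grid (hypothetical helper).

    feats:        list of (H, W, C) arrays, stand-ins for 2D CNN feature maps
    K:            (3, 3) pinhole intrinsics shared by all frames (assumption)
    world_to_cam: list of (4, 4) extrinsic matrices, one per frame
    origin:       world coordinates of the center of voxel (0, 0, 0)
    voxel_size:   edge length of each cubic voxel
    dims:         (X, Y, Z) number of voxels along each axis
    """
    C = feats[0].shape[2]
    origin = np.asarray(origin, dtype=float)
    # World coordinates of every voxel center, shape (N, 3).
    grid = np.meshgrid(*[np.arange(d) for d in dims], indexing="ij")
    idx = np.stack(grid, axis=-1).reshape(-1, 3)
    centers = origin + voxel_size * idx
    homog = np.hstack([centers, np.ones((len(centers), 1))])

    feat_sum = np.zeros((len(centers), C))
    count = np.zeros(len(centers))
    for f, T in zip(feats, world_to_cam):
        H, W, _ = f.shape
        cam = (T @ homog.T).T[:, :3]  # voxel centers in camera coordinates
        z = cam[:, 2]
        in_front = z > 1e-6
        pix = (K @ cam.T).T
        # Perspective divide only for voxels in front of the camera.
        u = np.full(len(z), -1.0)
        v = np.full(len(z), -1.0)
        u[in_front] = pix[in_front, 0] / z[in_front]
        v[in_front] = pix[in_front, 1] / z[in_front]
        ui = np.round(u).astype(int)
        vi = np.round(v).astype(int)
        valid = in_front & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
        # Every voxel whose center projects into pixel (ui, vi), i.e. lies on
        # that pixel's ray, receives that pixel's feature vector.
        feat_sum[valid] += f[vi[valid], ui[valid]]
        count[valid] += 1

    # Average over the frames that observed each voxel; voxels observed by
    # no frame (null voxels) keep an all-zero feature vector.
    seen = count > 0
    volume = np.zeros_like(feat_sum)
    volume[seen] = feat_sum[seen] / count[seen][:, None]
    return volume.reshape(*dims, C), seen.reshape(dims)


# Toy usage: two frames from the same pose, constant features 2.0 and 4.0.
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
frames = [np.full((3, 3, 1), 2.0), np.full((3, 3, 1), 4.0)]
poses = [np.eye(4), np.eye(4)]
vol, seen = backproject_features(frames, K, poses, origin=(0.0, 0.0, -1.0),
                                 voxel_size=1.0, dims=(1, 1, 3))
print(seen.ravel())         # voxels at z = -1 and z = 0 are not observed
print(vol[..., 0].ravel())  # the observed voxel holds the average feature 3.0
```

Projecting each voxel center into every frame is one common way to realize ray back-projection on a discrete grid; the voxels left at zero correspond to the unobserved null voxels recited in claims 7 and 20.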
Sadjadi et al. (US 8,547,374 B1) discloses 3D reconstruction using images obtained from a plurality of camera sensors in operable communication with the computing system (Col. 2, lines 41-51: images of the object are collected from various viewpoints using either a single moveable passive imaging sensor or an array of passive imaging sensors; Col. 1, lines 57-58: discloses the imaging sensor as a camera), wherein the computing system is configured to generate a three-dimensional (3D) reconstruction of the scene (Col. 1, lines 29-32: a system and method in which multiple views of a scene, taken by at least one passive imaging sensor, are utilized to detect and reconstruct object surfaces in three dimensions) from a sequence of frames of RGB images captured by the camera sensors by a process comprising: obtaining a sequence of frames of red green blue (RGB) images of a scene within a field of view of the camera sensors from the camera sensors (Col. 2, lines 41-51: images of the object are collected from various viewpoints); and back-projecting from each frame using known camera intrinsics and extrinsics (Col. 3, lines 43-54: the process needs to know the relative positions and orientations of the imaging sensor relative to the object and the z-direction towards the object) into a 3D voxel volume, wherein each pixel of the voxel volume is mapped to a ray in the voxel volume (Col. 4, lines 51-59: for each voxel, each image sensor votes on the contents of the voxel by reprojecting a ray, i.e., ray back-projection, from its optical center, at the location where the respective image was taken, through the voxel).
Sadjadi, however, does not disclose extracting features from the sequence of frames of RGB images using a two-dimensional convolutional neural network (2D CNN), or back-projecting those extracted features from each frame into a voxel volume. Furthermore, the reference is silent as to fusing/accumulating the features from each frame into the 3D voxel volume, and passing the 3D voxel volume through a 3D convolutional neural network (3D CNN) having an encoder-decoder to refine the features in the 3D voxel volume and regress output truncated signed distance function (TSDF) values at each voxel of the 3D voxel volume.
Ho (US 2018/0218513 A1) teaches refining the 3D voxel volume and regressing output truncated signed distance function (TSDF) values of the 3D voxel volume (Par. 71: refining a voxel grid by a weighted average of individual truncated signed distance functions computed from each depth image).
However, the prior art fails to teach the steps for generating a 3D reconstruction of a scene by extracting features from the sequence of frames of RGB images using a two-dimensional convolutional neural network (2D CNN), back-projecting the extracted features from each frame using known camera intrinsics and extrinsics into a 3D voxel volume wherein 
As per independent claim 14, the claim is allowed for the same reasons as independent claim 1.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM A BEUTEL whose telephone number is (571)272-3132. The examiner can normally be reached Monday-Friday 9:00 AM - 5:00 PM (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/WILLIAM A BEUTEL/Primary Examiner, Art Unit 2616