DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 7, 10-14, 17, 21-24 and 27 is/are rejected under 35 U.S.C. 103 as being unpatentable over PHALAK (US 20210279950 A1) in view of Chibane et al. (Neural Unsigned Distance Fields for Implicit Function Learning, 2020).
Regarding Claim 1, PHALAK teaches a computer vision system for generating a floorplan (PHALAK Abst: Methods, systems, and wearable extended reality devices for generating a floorplan of an indoor scene).
PHALAK discloses generating a floorplan but does not specifically teach generating a three-dimensional (3D) surface representation. Chibane teaches generating a three-dimensional (3D) surface representation (Chibane Abst: Neural Distance Fields (NDF), a neural network based model which predicts the unsigned distance field for arbitrary 3D shapes given sparse point clouds).
PHALAK in view of Chibane further teaches comprising:
a memory; and a processor in communication with the memory, the processor (PHALAK [0021] a system having a processor and memory is provided):
receiving data associated with the 3D surface (PHALAK [0265] the input image may be identified at 1502B. An image may be obtained from a scan of a scene (e.g., an interior environment having one or more rooms with one or more walls). For example, an input image may be obtained from a 3D scan of a scene);
processing the data based at least in part on one or more computer vision models to predict an unsigned distance field and a normal vector field (PHALAK [0312] the network takes the input partial scan as input (encoded as an TSDF in a volumetric grid) as well as the previous low-resolution TDF prediction (if not the base level) and any previous voxel group TDF predictions; [0381] Take N points x i with 3D coordinates P i=(x.sub.i, y.sub.i, z.sub.i), point normals=(x.sub.i.sup.(n),y.sub.i.sup.(n),z.sub.i.sup.(n)) and predicted cluster probability vector P(x)=(p.sup.(x), . . . , p.sub.k+1.sup.(x))), 
the unsigned distance field indicative of a proximity to the 3D surface (PHALAK [0305] A ScanComplete method takes as input a partial 3D scan, represented by a truncated signed distance field (TSDF) stored in a volumetric grid. The TSDF is generated from depth frames following the volumetric fusion approach, which has been widely adopted by modern RGB-D scanning methods. Some embodiments feed this partial TSDF into a new volumetric neural network, which outputs a truncated, unsigned distance field (TDF); [0306] As the network requires only depth input, some embodiments virtually scan depth data by generating scanning trajectories mimicking real-world scanning paths), the normal vector field indicative of a surface orientation of the 3D surface (PHALAK [0392] Various embodiments build a fully synthetic dataset along with normal labels, starting from a room perimeter skeleton randomly sampled from various shapes (rectangle, L-shaped, T-shaped, or U-shaped). Lengths and angular orientation of each edge and the height of the room are uniformly sampled; [0414] this procedure allows to propose amodal boundaries even from partial observations, as well as predicting other parameters like orientation, class, etc… To enable usage of local vote geometry, some embodiments transform vote locations to a local normalized coordinate system by z′.sub.i=(z.sub.i−z.sub.j)/r), 
wherein the unsigned distance field comprises a predicted closest unsigned distance to a surface point of the 3D surface from a given point in a 3D space (PHALAK [0309] For each voxel, these embodiments store a truncated distance value (no sign; truncation of 3× voxel size), as well as a semantic label of the closest object to the voxel center), and 
the normal vector field comprises a predicted normal vector to the surface point closest to the given point (PHALAK [0417] Objectness is supervised via a cross entropy loss normalized by the number of non-ignored proposals in the batch. For positive proposals, some embodiments further supervise the bounding box estimation and class prediction according to the closest ground truth bounding box); and
determining the 3D surface representation based at least in part on the unsigned distance field and the normal vector field (PHALAK [0199] FloorVoter is able to generate layouts of scenes without assumptions regarding the shape, size, number and configuration of rooms which renders it valuable for floorplan estimation from 3D data in the wild; Chibane pp7: 3D Shape Reconstruction of Closed Surfaces; 3D Shape Reconstruction of Complex Shape).
Chibane discloses a learnable output representation that allows continuous, high resolution outputs of arbitrary shape, which is analogous to the present patent application. 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified PHALAK to incorporate the teachings of Chibane and to apply the prediction of the unsigned distance field for arbitrary 3D shapes, as taught by Chibane, to the systems and methods for efficient floorplan generation from 3D scans of indoor scenes.
Doing so would have two main advantages:
First, NDF can learn directly from real world scan data without need to artificially close the surfaces before training. Second, and more importantly, NDF can represent a larger class of shapes including open surfaces, shapes with inner structures, as well as curves, manifold data and analytical mathematical functions in the computer vision systems and methods for high-fidelity representation of complex 3D surfaces using deep unsigned distance embeddings.
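The two fields recited in Claim 1 can be illustrated with a minimal sketch. It assumes the surface is given as a small set of oriented sample points and uses a brute-force nearest-neighbour search in place of a learned model; the function name `udf_and_normal` and the planar test patch are hypothetical and are not taken from PHALAK, Chibane, or the application.

```python
import math

def udf_and_normal(query, surface_points, surface_normals):
    """Return (unsigned distance, normal at the closest surface sample) for one query point."""
    best_d, best_n = math.inf, None
    for p, n in zip(surface_points, surface_normals):
        d = math.dist(query, p)  # Euclidean distance, never negative
        if d < best_d:
            best_d, best_n = d, n
    return best_d, best_n

# Hypothetical open surface: a 5x5 grid of samples on the plane z = 0.
points = [(x * 0.25, y * 0.25, 0.0) for x in range(5) for y in range(5)]
normals = [(0.0, 0.0, 1.0)] * len(points)

# Query points on opposite sides of the patch receive the same unsigned distance,
# which is why no inside/outside convention (and no closed surface) is needed.
above = udf_and_normal((0.5, 0.5, 0.3), points, normals)
below = udf_and_normal((0.5, 0.5, -0.3), points, normals)
```

Because the distance carries no sign, the same routine applies to open shapes, which is the property the rejection relies on Chibane for.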

Regarding Claim 2, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the data comprises one or more open shapes with arbitrary topology (Chibane pp4, FIG.3: For each point in 3D, the unsigned distance field is predicted from the input with NDF. This yields a continuously completed representation of arbitrary resolution and topology). The same motivation as claim 1 applies here.

Regarding Claim 3, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the data comprises a triangle soup having a plurality of triangles (PHALAK [0480] Given a set of m control points, whose 3-dimensional coordinates are known in some coordinate frame, and given an image in which some subset of the m control points is visible, determine the location (relative to the coordinate system of the control points) from which the image was obtained; [0484] For the case n=4, when all four control points lie in a common plane; [0486] Consider the tetrahedron in FIG. 13E-(a). The base ABC is an equilateral triangle and the “legs” (e.g., LA, LB, and LC) are all equal).

Regarding Claim 4, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the data comprises a plurality of point clouds (PHALAK [0378] Extracting Wall Point Clouds: [0379] Some embodiments utilize multiple observations of the same real-world scene from various poses to generate a per-frame dense depth map, through the state-of-the art Multiview Depth Estimation network).

Regarding Claim 7, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the processor further performs the steps of:
casting a plurality of rays from a viewpoint (Chibane pp6: Finding the intersection of a ray with the surface is necessary for direct image rendering);
processing each ray using sphere tracing to determine intersections of each ray and the 3D surface based at least in part on an unsigned distance field associated points along a ray direction of each ray and a normal vector field associated with stop points where iterative marching of the sphere tracing of each ray stops (Chibane pp7: Surface Normals and Differentiability of NDF: For rendering images, normals at the surface are needed, which can be derived from UDF gradients… Near the surface, it can be shown that the cut locus (points which are equidistant to at least two surface points) does not intersect a region of thickness r_max around the surface, if we can roll a closed ball B_r of radius r_max inside and outside the surface, such that it touches all points in the surface [7] (see the supplementary for a visualization). When this condition is met, UDFs are differentiable (C^1) in a region R(S) = {x ∈ R^d \ S | UDF(x, S) < r_max}, excluding points exactly on the surface. In practice, since NDF are learned, we compute gradients only at points in a region of  = 5mm < r_max from the surface, which guarantees that we are sufficiently far without intersecting the cut locus – for surfaces of curvature k < 1/.); and
rendering a view of the 3D surface representation based at least in part on the determined intersections (Chibane pp7: when rendering, we approximate the normal at the intersection point q ∈ S by traveling back along the ray  units and computing the gradient).
The same motivation as claim 1 applies here.
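The sphere tracing and gradient-based normal estimation cited from Chibane can be sketched as follows. This is an illustration under stated assumptions only: an exact analytic unsigned distance to the plane z = 0 stands in for a learned UDF, and the stopping threshold `eps`, the finite-difference step `h`, and the back-off distance `back_off` are hypothetical values chosen for the example rather than values taken from the cited reference.

```python
import math

def udf_plane(p):
    """Exact unsigned distance to the plane z = 0 (stand-in for a learned UDF)."""
    return abs(p[2])

def sphere_trace(origin, direction, udf, eps=1e-4, max_steps=64, t_max=100.0):
    """March along the ray in steps equal to the unsigned distance at the current point."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = udf(p)
        if dist < eps:          # stop point: close enough to the surface
            return t, p
        t += dist               # safe step: no surface lies closer than dist
        if t > t_max:
            break
    return None                 # ray missed the surface within the budget

def udf_gradient(p, udf, h=1e-5):
    """Central-difference gradient of the UDF, normalised to unit length."""
    g = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        g.append((udf(hi) - udf(lo)) / (2 * h))
    norm = math.sqrt(sum(c * c for c in g))
    return tuple(c / norm for c in g)

s = 1.0 / math.sqrt(2.0)
origin, direction = (0.0, 0.0, 1.0), (s, 0.0, -s)   # 45-degree ray toward the plane
hit = sphere_trace(origin, direction, udf_plane)
t_hit, p_hit = hit

# Step back along the ray before differentiating, since a UDF is not
# differentiable exactly on the surface.
back_off = 5e-3
p_back = tuple(c - back_off * d for c, d in zip(p_hit, direction))
normal = udf_gradient(p_back, udf_plane)
```

In this configuration the ray meets the plane at a ray parameter of about the square root of 2, and the recovered normal points along +z.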

Regarding Claim 10, PHALAK in view of Chibane teaches the system of claim 9, and further teaches wherein the second loss is selected from a loss between the estimated normal vector and a first ground truth surface normal and a loss between the estimated normal vector and a second ground truth surface normal, the second ground truth surface normal indicative of a modulo 180° of the first ground truth surface normal, wherein the ground truth surface normal comprises the first ground truth surface normal and the second surface normal.
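One way to read the second loss of Claim 10 is as a loss that does not depend on which of two opposite directions the ground truth normal takes. The sketch below assumes that the second ground truth surface normal is the first one rotated by 180 degrees (that is, negated) and that the smaller of the two losses is selected; both points, and the choice of an L2 distance, are assumptions made for illustration and are not confirmed by the claim language or the cited art.

```python
import math

def l2(a, b):
    """Euclidean distance between two 3-vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normal_loss(n_est, n_gt):
    """Smaller of the losses against the ground truth normal and its 180-degree flip."""
    n_flipped = tuple(-c for c in n_gt)
    return min(l2(n_est, n_gt), l2(n_est, n_flipped))

same = normal_loss((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))       # identical direction
opposite = normal_loss((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))  # flipped direction
tilted = normal_loss((0.0, 1.0, 0.0), (0.0, 0.0, 1.0))     # perpendicular direction
```

Under this reading an estimate pointing the opposite way incurs no penalty, which suits surfaces that have no consistent inside or outside.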

Regarding Claim 11, PHALAK in view of Chibane teaches a computer vision method for generating a three-dimensional (3D) surface representation (PHALAK Abst: Methods, systems, and wearable extended reality devices for generating a floorplan of an indoor scene; Chibane Abst: Neural Distance Fields (NDF), a neural network based model which predicts the unsigned distance field for arbitrary 3D shapes given sparse point clouds).
The metes and bounds of the limitations of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claims 12-14 and 17, PHALAK in view of Chibane teaches the method of claim 11. The metes and bounds of the limitations of the claims substantially correspond to the claims as set forth in Claims 2-4 and 7; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claim 21, PHALAK in view of Chibane teaches a non-transitory computer readable medium having instructions stored thereon for a three-dimensional (3D) surface representation (PHALAK Abst: Methods, systems, and wearable extended reality devices for generating a floorplan of an indoor scene; [0022] According to some embodiments, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium stores instructions thereon which, when executed by a processor, cause the processor to perform any of the methods described herein; Chibane Abst: Neural Distance Fields (NDF), a neural network based model which predicts the unsigned distance field for arbitrary 3D shapes given sparse point clouds).
The metes and bounds of the limitations of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claims 22-24 and 27, PHALAK in view of Chibane teaches the non-transitory computer readable medium of claim 21. The metes and bounds of the limitations of the claims substantially correspond to the claims as set forth in Claims 2-4 and 7; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Claim(s) 5, 6, 15, 16, 25 and 26 is/are rejected under 35 U.S.C. 103 as being unpatentable over PHALAK (US 20210279950 A1) in view of Chibane et al. (Neural Unsigned Distance Fields for Implicit Function Learning, 2020) further in view of Dean et al. (US 20060094951 A1).
Regarding Claim 5, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the processor further performs the steps of:
creating a voxel grid for the unsigned distance field at a first resolution, the voxel grid having a first plurality of voxels (PHALAK [0304] Some embodiments leverage fully-convolutional neural networks that can be trained on smaller sub-volumes but applied to arbitrarily-sized scene environments at test time. Some embodiments show examples with bounds of up to 1480×1230×64 voxels (≈70×60×3 m)… Some embodiments adopt a coarse-to-fine strategy in which the model predicts a multi-resolution hierarchy of outputs);
hierarchically dividing the voxel grid into a selected group of voxels and non-selected group of voxels, the selected group of voxels having a resolution higher than the first resolution, the non-selected group of voxels having the first resolution (PHALAK [0304] Some embodiments adopt a coarse-to-fine strategy in which the model predicts a multi-resolution hierarchy of outputs. The first hierarchy level predicts scene geometry and semantics at low resolution but large spatial context. Following levels use a smaller spatial context but higher resolution);
converting the selected group of voxels into a mesh using marching cubes (Chibane pp3: Several applications require extracting point clouds, meshes or directly rendering the implicit surface onto an image, which requires finding its zero-levelset. Most classical methods, such as marching cubes [48] and volume rendering). The same motivation as claim 1 applies here.
PHALAK in view of Chibane does not teach, but Dean teaches, extracting iso-surface of the 3D representation based at least in part on the mesh (Dean [0165] FIG. 3 which incorporates FIGS. 3A-3D shows the 3D ROI clipping process. FIG. 3A shows the complete skull polygonal mesh isosurface where n.sub.b is the normal vector of the best-fitting plane of selected contour, and c.sub.s is the centroid of the contour defined by operator-seeded points pi. FIG. 3B shows after n.sub.b is aligned with the Z-axis, points that have negative Z values are eliminated. FIG. 3C shows a series of linear inequality tests are performed to extract data that resides outside of the contour. FIG. 3D shows reconstructed data representing a 3D ROI).
Dean discloses a computer aided design method for producing an implant for a patient prior to operation, which is analogous to the present patent application. 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified PHALAK to incorporate the teachings of Dean and to apply the isosurface extraction method taught by Dean to the systems and methods for efficient floorplan generation from 3D scans of indoor scenes.
Doing so would make it possible to generate a consecutive minimum-cost path in the graph in the computer vision systems and methods for high-fidelity representation of complex 3D surfaces using deep unsigned distance embeddings.

Regarding Claim 6, PHALAK in view of Chibane teaches the system of claim 1, and further teaches wherein the processor hierarchically divides the voxel grid by:
selecting a first group of voxels from the first plurality of voxels as a first subdivision based at least in part on that at least one corner of each voxel of the first group of voxels has a predicted closest unsigned distance less than an edge length of a voxel of the voxel grid, the first group of voxels being more proximity to the 3D surface than non-selected voxels of the first plurality of voxels (PHALAK [0182] some embodiments construct a synthetic point cloud using the corner, edge and room annotations provided in the dataset. The samples from this dataset are clean, noise-free and contain a uniform sampling of points along all walls. The second version, which some embodiments shall refer to as BKE-struct may be obtained by retaining points in the original scan that are nearer than 0.15 m to the nearest corresponding point from the same scene in BKE-syn; [0189] some embodiments jointly transform and project both the ground truth and prediction corners and edges onto a 256×256 image grid and use the following rules for calculating metrics);
increasing a resolution of the first subdivision to a second resolution higher than the first resolution, the first subdivision having a second plurality of voxels, the number of the second plurality of voxels being greater than the first group of voxels (PHALAK [0310] For training, some embodiments uniformly sample subvolumes at 3 m intervals out of each of the train scenes. These embodiments keep all subvolumes containing any non-structural object voxels (e.g., tables, chairs), and randomly discard subvolumes that contain only structural voxels (e.g., wall/ceiling/floor) with 90% probability. This results in a total of 225, 414 training subvolumes. Some embodiments use voxel grid resolutions of [32×16×32], [32×32×32], and [32×64×32] for each level, resulting in spatial extents of [6 m×3 m×6 m], [3 m.sup.3], [1.5 m×3 m×1.5 m], respectively; [0312] At each hierarchy level, the network takes the input partial scan as input (encoded as an TSDF in a volumetric grid) as well as the previous low-resolution TDF prediction (if not the base level) and any previous voxel group TDF predictions); and
selecting a second group of voxels from the second plurality of voxels as a second subdivision based at least in part on that at least one corner of each voxel of the second group of voxels has a predicted closest unsigned distance less than an edge length of a voxel of the first subdivision, the second group of voxels being more proximity to the 3D surface than the first group of voxels wherein the second group of voxels comprise the selected group of voxels (PHALAK [0171] FIG. 6 illustrates the input 602 of a set of clustered wall points to a perimeter estimation module for a room, the ordering of the wall segment endpoints 602 determined by the shortest path algorithm, and a room perimeter 606 determined as a polygon by extruding or extending the line segments to generate the polygon vertices. The set of nodes through which some embodiments compute a shortest path is the set of start-points {p.sub.1.sup.m′,k}.sub.m′=1.sup.C.sup.mk and end-points {p.sub.2.sup.m′,k}.sub.m′=1.sup.C.sup.mk of the line segments; [0312] At each hierarchy level, the network takes the input partial scan as input (encoded as an TSDF in a volumetric grid) as well as the previous low-resolution TDF prediction (if not the base level) and any previous voxel group TDF predictions).
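The hierarchical division recited in Claim 6 can be sketched as a two-level refinement. The sketch assumes a unit cube, an exact analytic unsigned distance to the plane z = 0.5 in place of a predicted field, an octree-style split of each selected voxel into eight children, and a selection threshold equal to the edge length of the voxels at the level being tested; the helper names `corners`, `select`, and `subdivide` are hypothetical.

```python
from itertools import product

def udf_plane(p):
    """Exact unsigned distance to the plane z = 0.5 (stand-in for a predicted UDF)."""
    return abs(p[2] - 0.5)

def corners(origin, edge):
    """The eight corner points of an axis-aligned voxel given its minimum corner."""
    return [tuple(o + b * edge for o, b in zip(origin, bits))
            for bits in product((0, 1), repeat=3)]

def select(voxels, edge, udf):
    """Keep voxels with at least one corner closer to the surface than one edge length."""
    return [v for v in voxels if any(udf(c) < edge for c in corners(v, edge))]

def subdivide(voxels, edge):
    """Split each voxel into 8 children with half the edge length."""
    half = edge / 2
    children = [tuple(o + b * half for o, b in zip(v, bits))
                for v in voxels for bits in product((0, 1), repeat=3)]
    return children, half

res = 4                      # first resolution: 4 x 4 x 4 voxels over the unit cube
edge = 1.0 / res
grid = [(i * edge, j * edge, k * edge) for i, j, k in product(range(res), repeat=3)]

first_group = select(grid, edge, udf_plane)          # first subdivision
finer, fine_edge = subdivide(first_group, edge)      # second, higher resolution
second_group = select(finer, fine_edge, udf_plane)   # voxels passed on for meshing
```

Only voxels near the surface are refined, so the finer level covers a thinner slab around the plane than the coarse level does, while the non-selected voxels stay at the first resolution.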

Regarding Claims 15 and 16, PHALAK in view of Chibane teaches the method of claim 11. The metes and bounds of the limitations of the claims substantially correspond to the claims as set forth in Claims 5 and 6; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claims 25 and 26, PHALAK in view of Chibane teaches the non-transitory computer readable medium of claim 21. The metes and bounds of the limitations of the claims substantially correspond to the claims as set forth in Claims 5 and 6; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Allowable Subject Matter
Claim(s) 8-10, 18-20 and 28-30 is/are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Regarding Claim 8, PHALAK in view of Chibane teaches the system of claim 7, and further teaches wherein the processor processes each ray using the sphere tracing by:
processing each ray originating at a first point using the iterative marching to obtain a second point using a step size of a predicted closest unsigned distance to the 3D surface from the first point along the ray direction (Chibane pp6: We want to find  such that f(p) falls below a minimal threshold);
determining that the iterative marching stops at a stop point for each ray, the stop point is close to the 3D surface (Chibane pp6: The basic idea of sphere tracing is to march along the ray r in steps of size equal to the distance at the point f(p), which is theoretically guaranteed to converge to the correct solution for exact UDFs);
However, PHALAK in view of Chibane does not teach estimating an intersection of each ray and the 3D surface based at least in part on an angle between a predicted normal vector to the 3D surface closest to the stop point and the ray direction, wherein the determined intersections comprise the estimated intersection. Therefore, claim 8 in the context of claims 1 and 7 as a whole is allowable.
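For orientation, the limitation identified as missing from the prior art can be illustrated geometrically. The sketch below assumes the surface is locally planar near the stop point, so that the remaining distance along the ray equals the unsigned distance divided by the cosine of the angle between the surface normal and the ray direction. This is one plausible reading of "based at least in part on an angle", not a statement of the applicant's actual formula; the function name `refine_intersection` and the guard value `min_cos` are hypothetical.

```python
import math

def refine_intersection(stop_point, ray_dir, udf_value, normal, min_cos=1e-6):
    """Estimate the ray/surface intersection from a stop point under a local-plane assumption.

    ray_dir and normal are assumed to be unit vectors.
    """
    # |cos(theta)| between the predicted normal and the ray; abs() makes the
    # result independent of which way the unoriented normal points.
    cos_theta = abs(sum(n * r for n, r in zip(normal, ray_dir)))
    if cos_theta < min_cos:
        return None  # ray runs (nearly) parallel to the surface
    step = udf_value / cos_theta
    return tuple(p + step * r for p, r in zip(stop_point, ray_dir))

s = 1.0 / math.sqrt(2.0)
# Stop point 0.1 above the plane z = 0, approached along a 45-degree ray.
estimate = refine_intersection((0.9, 0.0, 0.1), (s, 0.0, -s), 0.1, (0.0, 0.0, 1.0))
```

In this example the estimate lands on the plane at approximately (1.0, 0.0, 0.0), whereas a further plain sphere-tracing step of length 0.1 would stop short of it.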

Regarding Claim 9, PHALAK in view of Chibane further in view of DUKA et al. (US 20200020155 A1) teaches the system of claim 1, and further teaches wherein the processor further trains the one or more computer vision models by:
sampling a set of training pairs from a given 3D shape represented by a noisy triangle soup, each training pair comprising a sampling surface point on a triangle face and a surface normal from the sampling surface point (DUKA [0054] In another embodiment, a Voronoi-based reconstruction is used. A point cloud is generated from all views, and then neighbor points are connected to get a final surface. To do this, a local two-dimensional (2D) Voronoi diagram can be employed. Each point in the point cloud contains a normal, which may be calculated from the capture information; [0067] In block 306, each of the loops is triangulated; [0071] For each voxel that exhibits an SDF sign change, Dual Contouring computes points of intersections of voxel edges with an SDF zero-value surface, and also computes normal vectors at those points. Those points are then used to compute final surface points as a minimizer of a special function);
constructing a set of training samples, each training sample comprising a sampling point in the 3D space, a ground truth distance, and a ground truth surface normal, wherein the ground truth distance is a distance between the sampling point and a nearest corresponding surface point in a training pair of the set of training pairs, and the ground truth surface normal is a surface normal from the training pair (PHALAK [0194] FloorVoter is able to generate accurate floorplans for a variety of shapes as shown in FIGS. 8A-8B. In FIG. 8A, 802A represents some example ground truth images);
However, PHALAK in view of Chibane further in view of DUKA does not teach
estimating, using the one or more computer vision models, an unsigned distance associated with each training sample;
estimating, using the one or more computer vision models, a normal vector associated with each training sample;
determining a first loss between the estimated unsigned distance and the ground truth distance;
determining a second loss between the estimated normal vector and the ground truth surface normal; and
training the one or more computer vision models based at least in part on minimizing the first loss and the second loss.
Therefore, claim 9 in the context of claim 1 as a whole is allowable.
Claim 10 in the context of claims 1 and 9 as a whole is allowable for the same reasons given for claim 9.

Regarding Claims 18-20, PHALAK in view of Chibane teaches the method of claim 11. Claim 18 in the context of claims 11 and 17 as a whole is allowable for the same reasons given for claim 8. Claims 19 and 20 are therefore allowable.

Regarding Claims 28-30, PHALAK in view of Chibane teaches the non-transitory computer readable medium of claim 21. Claim 28 in the context of claims 21 and 27 as a whole is allowable for the same reasons given for claim 8. Claims 29 and 30 are therefore allowable.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (YUEHAN) WANG, whose telephone number is (571) 270-5011.  The examiner can normally be reached on Monday-Friday, 8am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached on (571) 272-7794.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2611