DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, filed on 08/03/2022, with respect to rejection of claims 1-3, 6-7, 10-12, and 15-16 under 35 U.S.C. 102(a)(1), have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The amended independent claims 1 and 10 are now rejected under 35 U.S.C. 103 as being unpatentable over Wei et al (U.S. Pub. 2016/0012633 A1, hereinafter as “Wei_1”), and in view of Wei et al (U.S. Pub. 2017/0091996 A1, hereinafter as “Wei_2”). The plurality of images depicting a scene, which may be used to generate a 3D model, in Wei_1 (see paragraphs [0054], [0056]) are equivalent to a 3D digital object in the claim. Wei_2 is introduced to explicitly show the feature of a 3D digital object surrounded by virtual cameras (paragraph [0066], “The method also includes creating one or more virtual cameras for each 3D mesh in a local reference frame (416).”). The combination of Wei_1 and Wei_2 renders claims 1 and 10 obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. The same rationale applies to the dependent claims. Therefore claims 1-18 remain rejected.
Applicant's arguments, with respect to rejection of claims 19 and 22 under 35 U.S.C. 102(a)(1), have been fully considered but they are not persuasive. 
Regarding claims 19 and 22, the applicant argues that Newcombe fails to disclose, teach, or suggest the use of multiple virtual cameras directed at a digital object, a closed surface surrounding the digital object, or use of the digital object, closed surface, and virtual cameras to determine camera locations for the multiple virtual cameras, as disclosed by the present application and required by independent claims 19 and 22. More specifically, (1) rather than disclose use of multiple virtual cameras surrounding a digital object, Newcombe teaches use of images of a real-world environment captured by the same hardware camera as it moves within the real-world environment. In addition, the “dense 3D model” disclosed by Newcombe is a model of the real-world environment containing the hardware camera and through which the hardware camera moves. (2) Newcombe in fact merely describes spherical surfaces of real-world objects within the real-world environment being mapped to the 3D model. Hence, those spherical objects may be interpreted as real-world objects existing within the real-world environment, images captured by the hardware camera within the real-world environment, or virtual spherical objects contained within the 3D model of the real-world environment, but in no way can those spherical objects be interpreted as surrounding the 3D model described in Newcombe.
The examiner respectfully disagrees. (1) As explained by Newcombe in paragraph [0027], ‘The term "dense 3D model" is used in this document to refer to a representation of a three dimensional scene comprising objects and surfaces where that representation comprises detail about image elements of that scene.’ The dense 3D model may be created or derived from images of a real-world environment captured by the same hardware camera, or from other captured data, but it is no longer just images. It is a digital object. As pointed out in the claim rejection, in the process of relocalization, the camera poses, i.e., the positions and orientations, are computed. In this process, the hardware cameras are no longer relevant. As stated in paragraphs [0052]-[0053] of Newcombe, “[t]he relocalization engine also comprises a relocalization process 508 which may use keyframes or may operate without keyframes.” In particular, in the case without keyframes, there were no hardware cameras at those positions in the first place. Therefore the cameras in the relocalization process of Newcombe can be considered virtual cameras. (2) The dense 3D model is a representation of a three dimensional scene comprising objects and surfaces, so if the objects or the surfaces have closed surfaces, the model would have closed surfaces. Newcombe discloses spherical surfaces, see paragraph [0051]. A spherical surface is a closed surface. The dense 3D model of the spherical surface would have a closed surface surrounding the digital object. In summary, Newcombe discloses all limitations of the claims. The same rationale applies to the dependent claims. Therefore claims 19-24 remain rejected.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 19 and 22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Newcombe et al (U.S. Pub. 2012/0194644 A1).
Regarding claim 19, Newcombe et al teaches an image processing system (Fig. 14) comprising: 
              a computing platform including a processing hardware, and a system memory storing a software code configured to optimize a positioning of a plurality of virtual cameras oriented toward a digital object (Fig. 14, 1400, 1412, 1422. Note: In the relocalization process of Newcombe et al, the camera poses, i.e., the positions and orientations, are computed. Therefore the cameras there can be considered virtual cameras, although not explicitly stated.); 
              the processing hardware configured to execute the software code to: 
                    generate at least one closed surface surrounding the digital object (Fig. 1, 110, Fig. 3, 326, a dense 3D model. Paragraph [0051], spherical surface. paragraph [0027], ‘The term "dense 3D model" is used in this document to refer to a representation of a three dimensional scene comprising objects and surfaces where that representation comprises detail about image elements of that scene.’); 
                   cast a plurality of occlusion rays from each of a plurality of points on a surface of the digital object toward the at least one closed surface, resulting in a plurality of intersections (paragraphs [0052], [0079], “This is done by projecting a ray into the volume of the dense surface model. The ray is projected from an estimated camera position and orientation associated with the current depth map and into the 3D model through a point on a face of that 3D model which corresponds to a sample point in the current depth map.” “A first visible surface along that ray is found by stepping along the ray and assessing a surface density function to find a first positive to negative zero crossing. The associated sub pixel world point is found from an estimate of the intersection of the surface density function along the ray.”); 
              cluster the intersections, based on the plurality of virtual cameras and a surface density of the plurality of intersections on the at least one closed surface, to identify a respective plurality of camera locations for each of the plurality of virtual cameras (paragraphs [0053]-[0056], “A fast clustering algorithm such as a random decision forest is applied to patches of the current depth map and to patches of a plurality of previous depth maps obtained from the 3D model of the environment. The previous depth maps may be obtained from the 3D model of the environment by using a ray casting technique to render depth maps from the 3D model or in any other way.” Paragraph [0079], “A first visible surface along that ray is found by stepping along the ray and assessing a surface density function to find a first positive to negative zero crossing. The associated sub pixel world point is found from an estimate of the intersection of the surface density function along the ray.”); and 
              generate the plurality of virtual cameras at the respective plurality of camera locations (paragraphs [0056]-[0057], “The relocalization process selects a previous depth map which is similar to the current depth map in terms of a histogram of the textons output by the random decision forest classifier. The camera pose associated with the selected depth map is then used as the current camera pose and the camera is relocalized.”).
Regarding claim 22, Newcombe et al teaches a method for use by an image processing system including a computing platform having a processing hardware and a system memory storing a software code to optimize a positioning of a plurality of virtual cameras oriented toward a digital object (Fig. 14, 1400, 1412, 1422. Figs. 4 and 11, etc. Note: In the relocalization process of Newcombe et al, the camera poses, i.e., the positions and orientations, are computed. Therefore the cameras there can be considered virtual cameras, although not explicitly stated.), the method comprising: 
               generating, by the software code executed by the processing hardware, at least one closed surface surrounding the digital object (Fig. 1, 110, Fig. 3, 326, a dense 3D model. Paragraph [0051], spherical surface); 
              casting, by the software code executed by the processing hardware, a plurality of occlusion rays from each of a plurality of points on a surface of the digital object toward the at least one closed surface, resulting in a plurality of intersections (paragraphs [0052], [0079], “This is done by projecting a ray into the volume of the dense surface model. The ray is projected from an estimated camera position and orientation associated with the current depth map and into the 3D model through a point on a face of that 3D model which corresponds to a sample point in the current depth map.” “A first visible surface along that ray is found by stepping along the ray and assessing a surface density function to find a first positive to negative zero crossing. The associated sub pixel world point is found from an estimate of the intersection of the surface density function along the ray.”); 
              clustering the intersections, by the software code executed by the processing hardware based on the plurality of virtual cameras and a surface density of the plurality of intersections on the at least one closed surface, to identify a respective plurality of camera locations for each of the plurality of virtual cameras (paragraphs [0053]-[0056], “A fast clustering algorithm such as a random decision forest is applied to patches of the current depth map and to patches of a plurality of previous depth maps obtained from the 3D model of the environment. The previous depth maps may be obtained from the 3D model of the environment by using a ray casting technique to render depth maps from the 3D model or in any other way.” Paragraph [0079], “A first visible surface along that ray is found by stepping along the ray and assessing a surface density function to find a first positive to negative zero crossing. The associated sub pixel world point is found from an estimate of the intersection of the surface density function along the ray.”); and 
              generating, by the software code executed by the processing hardware, the plurality of virtual cameras at the respective plurality of camera locations (paragraphs [0056]-[0057], “The relocalization process selects a previous depth map which is similar to the current depth map in terms of a histogram of the textons output by the random decision forest classifier. The camera pose associated with the selected depth map is then used as the current camera pose and the camera is relocalized.”).
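For reference, the sequence of limitations recited in claims 19 and 22 (closed surface, rays cast from surface points, clustered intersections, camera locations) may be pictured with the following minimal sketch. It is not taken from Newcombe et al or from the application. Every name in it is hypothetical, and it assumes an origin-centred sphere as the closed surface, rays cast along given outward directions without any self-occlusion test, and plain k-means as the clustering step.

```python
import math

def ray_sphere_hit(origin, direction, radius):
    """Point where a ray starting inside an origin-centred sphere exits it."""
    norm = math.hypot(*direction)
    unit = tuple(v / norm for v in direction)
    b = sum(o * u for o, u in zip(origin, unit))
    c0 = sum(o * o for o in origin) - radius * radius  # negative when inside
    t = -b + math.sqrt(b * b - c0)                     # positive root of the quadratic
    return tuple(o + t * u for o, u in zip(origin, unit))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic, evenly spaced initialisation."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous centre.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def camera_locations(surface_points, directions, radius, num_cameras):
    """Cluster ray/sphere intersections and place one camera per cluster."""
    hits = [ray_sphere_hit(p, d, radius) for p, d in zip(surface_points, directions)]
    cameras = []
    for c in kmeans(hits, num_cameras):
        length = math.hypot(*c)  # assumes the cluster centre is not the sphere centre
        cameras.append(tuple(radius * v / length for v in c))
    return cameras

# Hypothetical object: two surface patches facing +x and -x.
points = [(1.0, 0.0, 0.0), (1.0, 0.2, 0.0), (-1.0, 0.0, 0.0), (-1.0, 0.2, 0.0)]
directions = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
cams = camera_locations(points, directions, radius=2.0, num_cameras=2)
```

In this example the four intersections fall into two groups on opposite sides of the sphere, so one camera location is produced on each side. The sketch does not account for the surface density weighting recited in the claims beyond what k-means implies.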
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 6-7, 10-12, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al (U.S. Pub. 2016/0012633 A1, hereinafter as “Wei_1”), and in view of Wei et al (U.S. Pub. 2017/0091996 A1, hereinafter as “Wei_2”).  
Regarding claim 1, Wei_1 suggests an image processing system (Fig. 14) comprising:
               a computing platform including a processing hardware, a display, and a system memory storing a software code (Fig. 14, 1402, 1404, 1406; Fig. 10); 
               the processing hardware configured to execute the software code to:     
                   surround a digital object with a plurality of virtual cameras oriented toward the digital object, the digital object being a three-dimensional (3D) digital object (paragraph [0057], “At (204) a pose can be determined for each of the plurality of images obtained at (202). For example, the pose for each image can describe a location and orientation in three-dimensional space at which such image was captured.” Also see paragraphs [0054], [0056]. The plurality of images depicting a scene, which may be used to generate a 3D model, in Wei_1 are equivalent to a 3D digital object in the claim. Note: the cameras in Wei_1 may be considered as virtual cameras because only the poses or the parameters of the cameras are relevant to the context.);
                   render, using each one of the plurality of virtual cameras, a depth map identifying a distance of each respective one of the plurality of virtual cameras from the digital object (paragraph [0060], “At (206) a depth map can be determined for each of the plurality of images. As an example, a stereo matching algorithm can be performed to respectively obtain a plurality of depth maps for the plurality of images.”); 
                   generate, using the depth map, a volumetric perspective of the digital object from a perspective of each respective one of the plurality of virtual cameras, resulting in a corresponding plurality of volumetric perspectives of the digital object (paragraph [0060], “Further, each depth map can describe a plurality of points in three-dimensional space that correspond to physical objects in the scene. For example, each depth map can provide a depth for each of a plurality of points (e.g. for each of the pixels of the corresponding image) relative to the pose associated with such depth map.” Also see Fig. 2, 208);
                   merge the plurality of volumetric perspectives of the digital object to form a volumetric representation of the digital object (paragraphs [0045], [0123], “Referring again FIG. 2, after depth map alignment at (208) and outlier identification at (210), a three-dimensional model can be generated based at least in part on the plurality of depth maps. In particular, at (212) a volumetric fusion technique can be performed to merge the plurality of depth maps.”);
                 convert the volumetric representation of the digital object to a renderable form (paragraphs [0048], [0127], “After the depth maps have been merged at (212), at (214) a mesh model can be generated. For example, the mesh model can be generated at (214) based at least in part on a signed distance function generated at (212). As an example, marching cubes or other mesh modeling techniques can be performed to generate a three-dimensional polygonal mesh model.”).
Wei_2 is introduced to explicitly show the feature of 3D digital object surrounded by virtual cameras (paragraph [0007], “The method further includes creating a 3D mesh for each of the one or more detected shapes”; paragraph [0066], “The method also includes creating one or more virtual cameras for each 3D mesh in a local reference frame (416).”). The combination of Wei_1 and Wei_2 renders the claim obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143.
Regarding claim 2, Wei_1 teaches wherein the volumetric representation of the digital object includes only externally visible details of the digital object (paragraphs [0045]-[0046], depth map merging. The volumetric representation of the digital object is the result of merging depth maps. That is, only the outside surface feature of the object is relevant.).
Regarding claim 3, Wei_1 teaches wherein to merge the plurality of volumetric perspectives of the digital object, the processing hardware is further configured to execute the software code to: determine a signed distance function (SDF) representation of each of the plurality of volumetric perspectives of the digital object, resulting in a corresponding plurality of SDF representations; and combine, using a Boolean intersection operation, the plurality of SDF representations to form the volumetric representation of the digital object (paragraph [0048], signed distance function).
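For reference, a Boolean intersection of SDF representations as recited in claim 3 is commonly computed as a pointwise maximum. The sketch below is illustrative only and is not taken from Wei_1. It assumes the convention that the SDF is negative inside a shape, and it uses two hypothetical spheres in place of per-camera volumetric perspectives.

```python
import math

def sphere_sdf(center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def intersect_sdfs(sdfs):
    """Boolean intersection of implicit solids via the pointwise maximum.

    A point is inside the result only if it is inside every input shape.
    The result has the correct sign everywhere but is in general only a
    bound on the true distance.
    """
    def sdf(p):
        return max(f(p) for f in sdfs)
    return sdf

a = sphere_sdf((0.0, 0.0, 0.0), 1.0)
b = sphere_sdf((1.0, 0.0, 0.0), 1.0)
both = intersect_sdfs([a, b])

inside_value = both((0.5, 0.0, 0.0))    # inside both spheres
outside_value = both((-0.5, 0.0, 0.0))  # inside a only
```

Here `inside_value` is negative because the point lies within both spheres, while `outside_value` is positive because the point lies outside sphere `b`.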
Regarding claim 6, Wei_1 teaches wherein the processing hardware is further configured to execute the software code to: convert the volumetric representation of the digital object to a mesh representation of the digital object (paragraphs [0048], [0127], “After the depth maps have been merged at (212), at (214) a mesh model can be generated. For example, the mesh model can be generated at (214) based at least in part on a signed distance function generated at (212). As an example, marching cubes or other mesh modeling techniques can be performed to generate a three-dimensional polygonal mesh model.”).
Regarding claim 7, Wei_1 teaches wherein a field of view of the perspective of each respective one of the plurality of virtual cameras includes a plurality of pixels of the digital object, wherein the depth map identifies a distance of each respective one of the plurality of virtual cameras from each of the plurality of pixels, and wherein the processing hardware is further configured to execute the software code to: perform a deep rendering for at least some of the plurality of pixels, such that the depth map identifies a plurality of depth values corresponding to the distance of each respective one of the plurality of virtual cameras from each of the at least some of the plurality of pixels (paragraph [0029], “After a pose has been determined for each image, a stereo matching algorithm can be performed to respectively obtain a plurality of depth maps for the plurality of images. In particular, the depth map determined for each image can inherit the pose from such image. Further, each depth map can describe a plurality of points in three-dimensional space that correspond to objects in the scene. For example, each depth map can provide a depth for each of a plurality of points (e.g. for each of the pixels of the corresponding image) relative to the pose associated with such depth map.”).
Regarding claim 10, Wei_1 suggests a method for use by an image processing system including a computing platform having a processing hardware, a display, and a system memory storing a software code (Fig. 2; Fig. 14, 1402, 1404, 1406; Fig. 10), the method comprising: 
                      surrounding, by the software code executed by the processing hardware, the digital object with a plurality of virtual cameras oriented toward the digital object, the digital object being a three-dimensional (3D) digital object (paragraph [0057], “At (204) a pose can be determined for each of the plurality of images obtained at (202). For example, the pose for each image can describe a location and orientation in three-dimensional space at which such image was captured.” Also see paragraphs [0054], [0056]. The plurality of images depicting a scene, which may be used to generate a 3D model, in Wei_1 are equivalent to a 3D digital object in the claim. Note: the cameras in Wei_1 may be considered as virtual cameras because only the poses or the parameters of the cameras are relevant to the context.);
                      rendering, by the software code executed by the processing hardware and using each one of the plurality of virtual cameras, a depth map identifying a distance of each respective one of the plurality of virtual cameras from the digital object (paragraph [0060], “At (206) a depth map can be determined for each of the plurality of images. As an example, a stereo matching algorithm can be performed to respectively obtain a plurality of depth maps for the plurality of images.”); 
                     generating, by the software code executed by the processing hardware and using the depth map, a volumetric perspective of the digital object from a perspective of each respective one of the plurality of virtual cameras, resulting in a plurality of volumetric perspectives of the digital object (paragraph [0060], “Further, each depth map can describe a plurality of points in three-dimensional space that correspond to physical objects in the scene. For example, each depth map can provide a depth for each of a plurality of points (e.g. for each of the pixels of the corresponding image) relative to the pose associated with such depth map.” Also see Fig. 2, 208);
                  merging, by the software code executed by the processing hardware, the corresponding plurality of volumetric perspectives of the digital object to form a volumetric representation of the digital object (paragraphs [0045], [0123], “Referring again FIG. 2, after depth map alignment at (208) and outlier identification at (210), a three-dimensional model can be generated based at least in part on the plurality of depth maps. In particular, at (212) a volumetric fusion technique can be performed to merge the plurality of depth maps.”); and
                converting, by the software code executed by the processing hardware, the volumetric representation of the digital object to a renderable form (paragraphs [0048], [0127], “After the depth maps have been merged at (212), at (214) a mesh model can be generated. For example, the mesh model can be generated at (214) based at least in part on a signed distance function generated at (212). As an example, marching cubes or other mesh modeling techniques can be performed to generate a three-dimensional polygonal mesh model.”).
Wei_2 is introduced to explicitly show the feature of 3D digital object surrounded by virtual cameras (paragraph [0007], “The method further includes creating a 3D mesh for each of the one or more detected shapes”; paragraph [0066], “The method also includes creating one or more virtual cameras for each 3D mesh in a local reference frame (416).”). The combination of Wei_1 and Wei_2 renders the claim obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143.
Regarding claim 11, Wei_1 teaches wherein the volumetric representation of the digital object includes only externally visible details of the digital object (paragraphs [0045]-[0046], depth map merging. The volumetric representation of the digital object is the result of merging depth maps. That is, only the outside surface feature of the object is relevant.).
Regarding claim 12, Wei_1 teaches wherein merging the plurality of volumetric perspectives of the digital object further comprises: determining, by the software code executed by the processing hardware, a signed distance function (SDF) representation of each of the plurality of volumetric perspectives of the digital object, resulting in a corresponding plurality of SDF representations; and combining, by the software code executed by the processing hardware and using a Boolean intersection operation, the plurality of SDF representations to form the volumetric representation of the digital object (paragraph [0048], signed distance function).
Regarding claim 15, Wei_1 teaches the method further comprising: converting, by the software code executed by the processing hardware, the volumetric representation of the digital object to a mesh representation of the digital object (paragraphs [0048], [0127], “After the depth maps have been merged at (212), at (214) a mesh model can be generated. For example, the mesh model can be generated at (214) based at least in part on a signed distance function generated at (212). As an example, marching cubes or other mesh modeling techniques can be performed to generate a three-dimensional polygonal mesh model.”).
Regarding claim 16, Wei_1 teaches wherein a field of view of the perspective of each respective one of the plurality of virtual cameras includes a plurality of pixels of the digital object, and wherein the depth map identifies a distance of each respective one of the plurality of virtual cameras from each of the plurality of pixels, the method further comprising: performing a deep rendering, by the software code executed by the processing hardware, for at least some of the plurality of pixels, such that the depth map identifies a plurality of depth values corresponding to the distance of each respective one of the plurality of virtual cameras from each of the at least some of the plurality of pixels (paragraph [0029], “After a pose has been determined for each image, a stereo matching algorithm can be performed to respectively obtain a plurality of depth maps for the plurality of images. In particular, the depth map determined for each image can inherit the pose from such image. Further, each depth map can describe a plurality of points in three-dimensional space that correspond to objects in the scene. For example, each depth map can provide a depth for each of a plurality of points (e.g. for each of the pixels of the corresponding image) relative to the pose associated with such depth map.”).

Claims 4-5 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Wei_1, and in view of Wei_2, as applied to claims 1 and 10 above, and further in view of PHALAK (U.S. Pub. 2021/0279950 A1).
Regarding claims 4 and 13, the combination of Wei_1 and Wei_2 remains as applied to claims 1 and 10 above, respectively. However, the combination does not explicitly teach wherein the volumetric representation of the digital object includes a first region having a first voxel size and a second region having a second voxel size different than the first voxel size. 
PHALAK, also in the same field of endeavor, teaches wherein the volumetric representation of the digital object includes a first region having a first voxel size and a second region having a second voxel size different than the first voxel size (paragraph [0305], “Small voxel sizes capture local detail but lack spatial context; large voxel sizes provide large spatial context but lack local detail. To get the best of both worlds while maintaining high resolution, some embodiments use a coarse-to-fine hierarchical strategy. The network first predicts the output at a low resolution in order to leverage more global information from the input. Subsequent hierarchy levels operate at a higher resolution and smaller context size.”). Using different voxel sizes would meet the resolution requirements in the different regions while conserving computational resources. When PHALAK is combined with Wei_1 and Wei_2, that is, when a variable voxel size is used, one would obtain the claimed feature. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the systems and the methods as shown in Wei_1, Wei_2, and PHALAK, wherein the volumetric representation of the digital object includes a first region having a first voxel size and a second region having a second voxel size different than the first voxel size.
Regarding claims 5 and 14, the combination of Wei_1, Wei_2, and PHALAK would suggest the image processing system of claim 1, wherein the processing hardware is further configured to execute the software code to: decimate the volumetric representation of the digital object to produce a down-sampled volumetric representation of the digital object; wherein the down-sampled volumetric representation of the digital object includes a first region and a second region, and wherein the first region is less decimated than the second region; or the method of claim 10, further comprising: decimating, by the software code executed by the processing hardware, the volumetric representation of the digital object to produce a down-sampled volumetric representation of the digital object; wherein the down-sampled volumetric representation of the digital object includes a first region and a second region, and wherein the first region is less decimated than the second region (PHALAK: paragraph [0160], “The set abstraction 304 (SA Layer(s) for down sampling) layers and the feature propagation 306 (e.g., FP layer(s) for up sampling) layers in the backbone 318 compute features at various scales to produce a subsampled version of the input (e.g., the seed points 308) denoted by s, with M points, M≤N having C additional feature dimensions such that {s_i}_{i=1}^{M} and s_i ∈ R^{3+C}.” Paragraphs [0359]-[0363], non-uniform sampling density, multi-scale grouping, multi-resolution grouping). The rationale of the combination for claims 4 and 13 above is incorporated herein.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wei_1, and in view of Wei_2, as applied to claims 1 and 10 above, and further in view of Newcombe et al.
Regarding claims 8 and 17, the combination of Wei_1 and Wei_2 remains as applied to claims 1 and 10 above, respectively. However, the combination does not explicitly teach wherein the plurality of virtual cameras is a predetermined plurality, and wherein to surround the digital object with the plurality of virtual cameras, the processing hardware is further configured to execute the software code to: generate at least one closed surface surrounding the digital object; cast a plurality of occlusion rays from each of a plurality of points on a surface of the digital object toward a surface of the at least one closed surface, resulting in a plurality of intersections; cluster the intersections, based on the predetermined plurality and a surface density of the plurality of intersections on the at least one closed surface, to identify a respective plurality of camera locations for each of the plurality of virtual cameras; and generate the plurality of virtual cameras at the respective plurality of camera locations, as in claim 8; or wherein the plurality of virtual cameras is a predetermined plurality, and wherein surrounding the digital object with the plurality of virtual cameras further comprises: generating, by the software code executed by the processing hardware, at least one closed surface surrounding the digital object; casting, by the software code executed by the processing hardware, a plurality of occlusion rays from each of a plurality of points on a surface of the digital object toward the at least one closed surface, resulting in a plurality of intersections; clustering, by the software code executed by the processing hardware using the predetermined plurality and a surface density of the plurality of intersections on the at least one closed surface, to identify a respective plurality of camera locations for each of the plurality of virtual cameras; and generating, by the software code executed by the processing hardware, the plurality of virtual cameras at the respective plurality of camera locations, as in claim 17.
Newcombe et al, also in the same field of endeavor, teaches the above claimed features that Wei_1 and Wei_2 fail to teach (Fig. 1, 110, Fig. 3, 326, a dense 3D model. Paragraph [0051], spherical surface. Paragraphs [0052], [0079], “This is done by projecting a ray into the volume of the dense surface model. The ray is projected from an estimated camera position and orientation associated with the current depth map and into the 3D model through a point on a face of that 3D model which corresponds to a sample point in the current depth map.” Paragraphs [0053]-[0056], “A fast clustering algorithm such as a random decision forest is applied to patches of the current depth map and to patches of a plurality of previous depth maps obtained from the 3D model of the environment. The previous depth maps may be obtained from the 3D model of the environment by using a ray casting technique to render depth maps from the 3D model or in any other way.” Paragraph [0079], “A first visible surface along that ray is found by stepping along the ray and assessing a surface density function to find a first positive to negative zero crossing. The associated sub pixel world point is found from an estimate of the intersection of the surface density function along the ray.” Paragraphs [0056]-[0057], “The relocalization process selects a previous depth map which is similar to the current depth map in terms of a histogram of the textons output by the random decision forest classifier. The camera pose associated with the selected depth map is then used as the current camera pose and the camera is relocalized.”). It is noted that Wei et al discloses a similar process for increasing the accuracy or confidence score (see paragraphs [0045]-[0047]). As Newcombe et al is combined with Wei_1 and Wei_2, that is, incorporating the relevant steps from Newcombe et al into Wei_1 and Wei_2, one would obtain the claimed features.
The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the systems and the methods as shown in Wei_1, Wei_2, and Newcombe et al to obtain the claimed features.
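For illustration only, the steps recited in claims 8 and 17 (a closed surface around the object, rays cast from surface points to that closed surface, and clustering of the intersections into a predetermined number of camera locations) can be sketched as follows. This is a minimal sketch and is not the implementation of the applicant, Wei_1, Wei_2, or Newcombe et al. It assumes an origin-centered sphere as the closed surface, random ray directions, no occlusion test, and plain k-means as a stand-in for clustering by surface density. All function and parameter names are hypothetical.

```python
import math
import random


def ray_sphere_exit(origin, direction, radius):
    """Return the point where a ray that starts inside an origin-centered
    sphere leaves that sphere (assumption: the sphere is the closed surface)."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm
    b = ox * dx + oy * dy + oz * dz
    c = ox * ox + oy * oy + oz * oz - radius * radius  # negative when inside
    t = -b + math.sqrt(b * b - c)  # positive root of the quadratic
    return (ox + t * dx, oy + t * dy, oz + t * dz)


def kmeans_on_sphere(points, k, radius, iterations=20, seed=0):
    """Plain k-means whose centers are pushed back onto the sphere after
    every update (a stand-in, not density-based clustering as claimed)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if not group:
                continue  # keep the previous center for an empty cluster
            mean = [sum(axis) / len(group) for axis in zip(*group)]
            length = math.sqrt(sum(m * m for m in mean))
            if length > 0:
                centers[i] = tuple(radius * m / length for m in mean)
    return centers


def camera_locations(surface_points, num_cameras, radius,
                     rays_per_point=8, seed=0):
    """Cast random rays from each surface point, collect the intersections
    with the sphere, and cluster them into num_cameras locations."""
    rng = random.Random(seed)
    hits = []
    for p in surface_points:
        for _ in range(rays_per_point):
            d = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
            hits.append(ray_sphere_exit(p, d, radius))
    return kmeans_on_sphere(hits, num_cameras, radius, seed=seed)


# Hypothetical usage: the eight corners of a cube stand in for the object.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
cameras = camera_locations(cube, num_cameras=4, radius=5.0)
```

In this sketch the predetermined plurality is simply the `num_cameras` argument, and each returned center lies on the sphere and serves as one camera location.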
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wei_1, and in view of Wei_2 and Newcombe et al, as applied to claims 8 and 17 above, and further in view of Strawn et al (U.S. Pub. 2017/0356755 A1).
Regarding claims 9 and 18, the combination of Wei_1, Wei_2, and Newcombe et al remains as applied to claims 8 and 17 above, respectively. However, the combination does not explicitly teach wherein the processing hardware is further configured to execute the software code to: adjust a lens parameter of at least one of the plurality of virtual cameras based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras, as in claim 9; or further comprising: adjusting, by the software code executed by the processing hardware based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras, a lens parameter of at least one of the plurality of virtual cameras, as in claim 18.
Strawn et al, also in the same field of endeavor, teaches adjusting a lens parameter of at least one of the plurality of virtual cameras based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras (paragraph [0012], “A framing bounding shape (e.g., bounding box) is then defined in the screen space about the projection of the collection of points. The VC engine then uses the puck's bounding shape to determine how much the virtual camera's origin can be offset to capture as many of the collection of points being framed. This operation clips the framing bounding shape. The zoom level of the virtual camera is then adjusted to align one of the sides of the framing bounding shape with one of the sides of the sub-region that represents the display screen's field of focus.”). As Strawn et al is combined with Newcombe et al, Wei_1, and Wei_2, that is, incorporating the relevant steps from Strawn et al into Wei_1, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the systems and the methods as shown in Wei_1, Wei_2, Newcombe et al, and Strawn et al to obtain the claimed features.
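For illustration only, adjusting a lens parameter from a two-dimensional bounding box can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Strawn et al or of the applicant: it assumes a pinhole camera looking along +z with its optical axis through the image center, takes focal length as the lens parameter, and scales it so the projected bounding box fills a chosen fraction of the frame. The function names and the `sensor_width`, `sensor_height`, and `margin` parameters are hypothetical.

```python
def bounding_box_2d(points_cam, focal_length):
    """Project points given in camera coordinates (z > 0 is in front of the
    camera) onto the image plane and return (min_x, min_y, max_x, max_y)."""
    xs = [focal_length * x / z for x, y, z in points_cam]
    ys = [focal_length * y / z for x, y, z in points_cam]
    return min(xs), min(ys), max(xs), max(ys)


def fit_focal_length(points_cam, focal_length, sensor_width, sensor_height,
                     margin=0.9):
    """Scale the focal length so the 2D bounding box of the object fills
    `margin` of the frame in its tighter dimension (hypothetical rule)."""
    min_x, min_y, max_x, max_y = bounding_box_2d(points_cam, focal_length)
    # Half extents measured from the image center; the tiny floor avoids a
    # division by zero when every point projects onto the optical axis.
    half_w = max(abs(min_x), abs(max_x), 1e-12)
    half_h = max(abs(min_y), abs(max_y), 1e-12)
    scale = min((margin * sensor_width / 2) / half_w,
                (margin * sensor_height / 2) / half_h)
    return focal_length * scale


# Hypothetical usage: two object points 10 units in front of the camera.
points = [(-1.0, -0.5, 10.0), (1.0, 0.5, 10.0)]
new_focal_length = fit_focal_length(points, 50.0, 36.0, 24.0)
```

Because projection is linear in focal length for a pinhole camera, scaling the focal length scales the bounding box by the same factor, which is why a single multiplication suffices in this sketch.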
Claims 20 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Newcombe et al, as applied to claims 19 and 22 above, and in view of Strawn et al.
Regarding claims 20 and 23, Newcombe et al remains as applied to claims 19 and 22 above, respectively. However, Newcombe et al does not explicitly teach wherein the processing hardware is further configured to execute the software code to: adjust a lens parameter of at least one of the plurality of virtual cameras based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras, as in claim 20; or further comprising: adjusting, by the software code executed by the processing hardware based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras, a lens parameter of at least one of the plurality of virtual cameras, as in claim 23.
Strawn et al, in the same field of endeavor, teaches adjusting a lens parameter of at least one of the plurality of virtual cameras based on a two-dimensional bounding box of the digital object from the perspective of the at least one of the plurality of virtual cameras (paragraph [0012], “A framing bounding shape (e.g., bounding box) is then defined in the screen space about the projection of the collection of points. The VC engine then uses the puck's bounding shape to determine how much the virtual camera's origin can be offset to capture as many of the collection of points being framed. This operation clips the framing bounding shape. The zoom level of the virtual camera is then adjusted to align one of the sides of the framing bounding shape with one of the sides of the sub-region that represents the display screen's field of focus.”). As Strawn et al is combined with Newcombe et al, that is, incorporating the relevant steps from Strawn et al into Newcombe et al, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the systems and the methods as shown in Newcombe et al, and Strawn et al to obtain the claimed features.
Claims 21 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Newcombe et al, as applied to claims 19 and 22 above, and in view of Mecca et al (U.S. Pub. 2021/0044788 A1).
Regarding claims 21 and 24, Newcombe et al remains as applied to claims 19 and 22 above, respectively. However, Newcombe et al does not explicitly teach wherein the at least one closed surface comprises a sphere, the sphere having a radius such that any of the plurality of virtual cameras sharing a same focal length and located on the sphere would view a boundary of the digital object from the perspective of any of the plurality of virtual cameras.
Mecca et al, in the same field of endeavor, teaches wherein the at least one closed surface comprises a sphere, the sphere having a radius such that any of the plurality of virtual cameras sharing a same focal length and located on the sphere would view a boundary of the digital object from the perspective of any of the plurality of virtual cameras (paragraph [0143], “The virtual object was scaled to have approximate radius 20 mm and the virtual camera of focal length 6 mm was placed in several locations on a sphere of 45 mm around the object.”). As Mecca et al is combined with Newcombe et al, that is, incorporating the relevant steps from Mecca et al into Newcombe et al, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the systems and the methods as shown in Newcombe et al, and Mecca et al to obtain the claimed features.
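For illustration only, the relationship between a shared focal length and a sphere radius from which every camera views the whole object can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Mecca et al or of the applicant: the object is approximated by a bounding sphere centered at the origin, each camera is a pinhole camera aimed at that center, and the `sensor_size` parameter (the sensor extent in the tighter dimension) is hypothetical because the cited paragraph does not state one.

```python
import math


def min_camera_sphere_radius(object_radius, focal_length, sensor_size):
    """Smallest distance from the object center at which a pinhole camera
    aimed at the center sees the entire bounding sphere of the object.

    The view cone is tangent to the bounding sphere when
    sin(half_fov) = object_radius / distance.
    """
    half_fov = math.atan((sensor_size / 2) / focal_length)
    return object_radius / math.sin(half_fov)


# Hypothetical usage: the object radius and focal length echo the values
# quoted from Mecca et al; the 6 mm sensor size is an assumption.
radius_mm = min_camera_sphere_radius(20.0, 6.0, 6.0)
```

Any camera with the same focal length placed on a sphere of at least this radius and aimed at the center would, under these assumptions, have the boundary of the object within its field of view.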
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIZE MA whose telephone number is (571)270-3709. The examiner can normally be reached 9AM-5PM EST M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TIZE MA/Primary Examiner, Art Unit 2613