DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 6-15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Pugh et al. (U. S. Patent Application Publication 2021/0142497, hereafter ‘497).

Regarding claim 1, Pugh teaches a method (‘497; fig. 1A; ¶ 0019), comprising: obtaining a virtual object (‘497; ¶ 0159; receiving user input identifying selection of at least one virtual object) and corresponding virtual depth information (‘497; ¶ 0159; receiving user input identifying a target location for at least one selected virtual object within the image; ¶ 0161; virtual objects positioned on the floor plane, mapping 2D pointing positions over a 2D image to 3D virtual positioning of an object on the floor plane, permitting 3D computations of virtual object depths and occlusion); capturing an image of at least one physical object (‘497; fig. 3 and 14; ¶ 0021; the method includes obtaining an image, that includes one or more objects) and corresponding physical depth information (‘497; ¶ 0021; determining a depth map (e.g., depth estimates for a set of image pixels; etc.) for the image (e.g., by using neural networks based on the image, the photogrammetry point cloud, hardware depth sensors, and/or any other suitable information)); generating an occlusion mask based at least in part on the virtual depth information and the physical depth information (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene)) and computing a depth measurement for the mask based on the corresponding portion of the 3D model (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene); ¶ 0123); generating an adjustment mask based at least in part on the occlusion mask and the virtual depth information (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask); and generating a composite image including at least a portion of the virtual object and at least a portion of the image using the occlusion mask and the adjustment mask (‘497; fig. 1G, process S600; ¶ 0143; ¶ 0146; Facilitate interactive virtual object insertion and movement in rendered scene; ¶ 0166; S640 includes rendering virtual objects using occlusion information generated at S630, using the generated occlusion information to perform occlusion processing for virtual objects that overlap real objects in the rendered scene).
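Pugh discloses the above elements of claim 1 across several embodiments. Because the embodiments are disclosed in a single reference, one of ordinary skill in the art who was aware of one embodiment prior to the effective filing date of the invention would also have been aware of the others. It would therefore have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined these elements from two or more embodiments into a single arrangement, sequenced to satisfy the order required for claim 1, because the embodiments are all related and their presence in the same disclosure provides the motivation to combine, with the benefit of enjoying the advantages of all the embodiments disclosed by Pugh.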

In regard to claim 2, Pugh teaches the method of claim 1 and further teaches wherein the occlusion mask indicates pixel locations at which the physical object is foreground to the virtual object (‘497; fig. 4; several examples showing virtual objects partially occluded by real objects in the image - the physical object is foreground to the virtual object).
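For illustration only, the following Python sketch shows one way an occlusion mask of the kind recited in claim 2 could be computed by a per-pixel depth comparison. It is not taken from Pugh or from the application. The function name occlusion_mask, the row-major list layout, and the use of None for pixels not covered by the virtual object are all assumptions made for the example.

```python
def occlusion_mask(physical_depth, virtual_depth):
    """Return 1 where the physical surface is nearer to the camera than the virtual object."""
    mask = []
    for phys_row, virt_row in zip(physical_depth, virtual_depth):
        row = []
        for p, v in zip(phys_row, virt_row):
            # Assumption: None marks a pixel the virtual object does not cover,
            # so nothing can be occluded there.
            row.append(1 if v is not None and p < v else 0)
        mask.append(row)
    return mask


# Hypothetical 2x3 depth maps (smaller value = closer to the camera).
physical = [[1.0, 1.0, 3.0],
            [1.0, 3.0, 3.0]]
virtual = [[None, 2.0, 2.0],
           [2.0, 2.0, None]]
mask = occlusion_mask(physical, virtual)
print(mask)
```

A mask value of 1 marks a pixel location at which the physical object is foreground to the virtual object; a renderer would keep the camera pixel at those locations.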

Regarding claim 3, Pugh teaches the method of claim 2 and further teaches the method as  further comprising generating an edge mask based on the virtual depth information (‘497; ¶ 0075; Estimating boundaries and depth discontinuities S410 preferably functions to estimate edges of objects (included within an image obtained at S100 or composited at S300), which can subsequently be used to guide semantic segmentation, to correct edges in the depth maps or point clouds (e.g., the dense depth map; sparse depth map; dense, scaled depth map, etc.), or otherwise used. S410 can be performed before S420, but can additionally or alternatively be performed at any other suitable time. S410 preferably determines edges based on information from S100-S300 (e.g., object information, metric scale information, metadata, visual information, depth discontinuities, extracted features, the raw set of images, pre-processed images, etc.), but can additionally or alternatively determine edges based on any other suitable set of data. The resultant edgemaps (generated by estimating edges of objects) are preferably associated with 

In regard to claim 6, Pugh teaches the method of claim 1 and further teaches wherein generating the composite image including the at least the portion of the virtual object and the at least the portion of the image using the occlusion mask and the adjustment mask comprises: 

Regarding claim 7, Pugh teaches the method of claim 6 and further teaches wherein generating the composite image further comprises: forming pixel values of the composite image from one or more of corresponding pixel values of a virtual image and corresponding pixel values of the image (‘497; fig. 1G, Rendering Scenes Interactively with Occlusion Masks S600; ¶ 0142-0143; forming pixel values of the composite image from one or more of corresponding pixel values of a virtual image and corresponding pixel values of the image), as determined based on corresponding parameters for corresponding pixels of the alpha mask (‘497; ¶ 0139).
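As a hedged illustration of the limitation addressed above, the following Python sketch forms each composite pixel from the corresponding virtual-image pixel and camera-image pixel, weighted by the corresponding alpha mask parameter. The linear blend, the function name composite, and the single-channel list layout are assumptions chosen for the example; neither the claim language nor the cited passages of Pugh specify this particular formula.

```python
def composite(virtual_image, camera_image, alpha_mask):
    """Blend per pixel: alpha = 1 keeps the virtual pixel, alpha = 0 keeps the camera pixel."""
    out = []
    for v_row, c_row, a_row in zip(virtual_image, camera_image, alpha_mask):
        # Assumed linear blend; other weightings are possible.
        out.append([a * v + (1.0 - a) * c for v, c, a in zip(v_row, c_row, a_row)])
    return out


# Hypothetical single-row, single-channel images.
virtual_image = [[200.0, 200.0, 200.0]]
camera_image = [[100.0, 100.0, 100.0]]
alpha_mask = [[1.0, 0.25, 0.0]]
result = composite(virtual_image, camera_image, alpha_mask)
print(result)
```

Intermediate alpha values, such as the 0.25 used here, yield a mix of the two sources, which is one common way to soften the boundary between a virtual object and an occluding physical object.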

In regard to claim 8, Pugh teaches the method of claim 1 and further teaches the method as further comprising: splitting the adjustment mask to form an interior adjustment mask (‘497; ¶ 0044; adjustment mask minus the exterior adjustment mask) and an exterior adjustment mask (‘497; ¶ 0044; contour map – exterior adjustment mask), wherein generating the composite image using the occlusion mask and the adjustment mask comprises generating the composite image using the occlusion mask and either the interior adjustment mask or the exterior adjustment mask (‘497; fig. 1G, process S600; ¶ 0143; ¶ 0146; Facilitate interactive virtual object insertion and movement in rendered scene; ¶ 0166; S640 includes rendering virtual objects using occlusion information generated at S630, using the generated occlusion 

Regarding claim 9, Pugh teaches the method of claim 8 and further teaches wherein splitting the adjustment mask comprises: generating the interior adjustment mask based on the occlusion mask and the adjustment mask; generating an inverse of the occlusion mask (‘497; ¶ 0148; Custom graphics shaders can include a fragment shader and/or a vertex shader, but can additionally or alternatively include any other suitable combination of texture format storage, precision, numerical encodings, use of multiple textures, use of stencil tests instead of and/or in addition to alpha tests, and/or using destination buffer stencil test operations, or any other suitable shader. In one example, the fragment shader converts depth and semantic segmentation information from texture memory and transfers the information to the framebuffer); and generating the exterior adjustment mask (‘497; ¶ 0145-0146; the depth occlusion information (e.g., 602 shown in FIG. 10)) based on the inverse of the occlusion mask and the adjustment mask.
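The three steps recited in claim 9 can be sketched as follows, for illustration only. The sketch assumes binary masks and assumes that "based on" means a per-pixel logical AND; the claim language and the cited passages do not confirm either assumption, and the function name split_adjustment_mask is hypothetical.

```python
def split_adjustment_mask(occlusion, adjustment):
    """Split a binary adjustment mask into interior and exterior parts."""
    # Interior: adjustment pixels that lie inside the occlusion mask (assumed AND).
    interior = [[o & a for o, a in zip(o_row, a_row)]
                for o_row, a_row in zip(occlusion, adjustment)]
    # Inverse of the occlusion mask.
    inverse = [[1 - o for o in o_row] for o_row in occlusion]
    # Exterior: adjustment pixels that lie outside the occlusion mask (assumed AND).
    exterior = [[i & a for i, a in zip(i_row, a_row)]
                for i_row, a_row in zip(inverse, adjustment)]
    return interior, exterior


# Hypothetical one-row masks: the adjustment band straddles the occlusion boundary.
occlusion = [[1, 1, 0, 0]]
adjustment = [[0, 1, 1, 0]]
interior, exterior = split_adjustment_mask(occlusion, adjustment)
print(interior, exterior)
```

Under these assumptions the interior and exterior masks are disjoint and together cover exactly the original adjustment mask.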

In regard to claim 10, Pugh teaches the method of claim 1 and further teaches the method as further comprising: splitting the adjustment mask to form an interior adjustment mask and an exterior adjustment mask, wherein generating the composite image using the occlusion mask and the adjustment mask comprises generating the composite image using the occlusion mask and both the interior adjustment mask and the exterior adjustment mask (‘497; fig. 1G, process S600; ¶ 0143; ¶ 0146; Facilitate interactive virtual object insertion and movement in rendered scene; ¶ 0166; S640 includes rendering virtual objects using occlusion information generated at S630, 

Regarding claim 11, Pugh teaches the method of claim 10 and further teaches wherein generating the composite image using the occlusion mask and both the interior adjustment mask and the exterior adjustment mask comprises: performing a correction associated with the image using the interior adjustment mask (‘497; fig. 1G, process S600; ¶ 0143; ¶ 0146; Facilitate interactive virtual object insertion and movement in rendered scene; ¶ 0166; S640 includes rendering virtual objects using occlusion information generated at S630, using the generated occlusion information to perform occlusion processing for virtual objects that overlap real objects in the rendered scene); and performing a correction associated with the virtual object using the exterior adjustment mask (‘497; fig. 1G, process S600; ¶ 0143; ¶ 0146; Facilitate interactive virtual object insertion and movement in rendered scene; ¶ 0166; S640 includes rendering virtual objects using occlusion information generated at S630, using the generated occlusion information to perform occlusion processing for virtual objects that overlap real objects in the rendered scene).

Regarding claim 12, Pugh teaches a device (‘497; fig. 1A and 2; ¶ 0019-0020; computing device such as a smartphone), comprising: a camera (‘497; fig. 2; ¶ 0030; one or more cameras); memory (‘497; fig. 1A and 2; ¶ 0019-0020; computing device such as a smartphone which includes memory); and one or more processors (‘497; fig. 1A and 2; ¶ 0019-0020; computing device such as a smartphone which includes one or more processors) configured to: obtain a virtual object (‘497; ¶ 0159; receiving user input identifying selection of at least one virtual object) and corresponding virtual depth information (‘497; ¶ 0159; receiving user input identifying a target location for at least one selected virtual object within the image; ¶ 0161; virtual objects positioned on the floor plane, mapping 2D pointing positions over a 2D image to 3D virtual positioning of an object on the floor plane, permitting 3D computations of virtual object depths and occlusion); capture, using at least the camera (‘497; fig. 2; ¶ 0030; one or more cameras), an image of at least one physical object (‘497; fig. 3 and 14; ¶ 0021; the method includes obtaining an image, that includes one or more objects) and corresponding physical depth information (‘497; ¶ 0021; determining a depth map (e.g., depth estimates for a set of image pixels; etc.) for the image (e.g., by using neural networks based on the image, the photogrammetry point cloud, hardware depth sensors, and/or any other suitable information)); generate an occlusion mask based at least in part on the virtual depth information and the physical depth information (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene)) and computing a depth measurement for the mask based on the corresponding portion of the 3D model (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene); ¶ 0123); generate an adjustment mask based at least in part on the occlusion mask and the virtual depth information (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask); and generate a composite image including at least a portion 
Pugh discloses the above elements of claim 12 across several embodiments. Because the embodiments are disclosed in a single reference, one of ordinary skill in the art who was aware of one embodiment prior to the effective filing date of the invention would also have been aware of the others. It would therefore have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined these elements from two or more embodiments into a single arrangement, sequenced to satisfy the order required for claim 12, because the embodiments are all related and their presence in the same disclosure provides the motivation to combine, with the benefit of enjoying the advantages of all the embodiments disclosed by Pugh.

Regarding claim 13, Pugh teaches the device of claim 12 and further teaches the device as further comprising a depth sensor configured to obtain the physical depth information (‘497; fig. 2, element 215, depth sensor; ¶ 0030), and a display configured to display the composite image (‘497; fig. 10, computer and display).

In regard to claim 14, Pugh teaches the device of claim 12 and further teaches wherein the one or more processors are configured to generate the composite image including the at least the portion 

Regarding claim 15, Pugh teaches the device of claim 14 and further teaches wherein the one or more processors are further configured to generate the composite image by forming pixel values of the composite image from one or more of corresponding pixel values of a virtual image and corresponding pixel values of the image (‘497; fig. 1G, Rendering Scenes Interactively with Occlusion Masks S600; ¶ 0142-0143; forming pixel values of the composite image from one or more of corresponding pixel values of a virtual image and corresponding pixel values of the image), as determined based on corresponding parameters for corresponding pixels of the alpha mask (‘497; ¶ 0139).

In regard to claim 18, Pugh teaches a non-transitory computer-readable medium storing instructions (‘497; fig. 2, programed device 200) which, when executed by one or more processors (‘497; fig. 1A and 2; ¶ 0019-0020; computing device such as a smartphone which permitting 3D computations of virtual object depths and occlusion); capturing an image of at least one physical object (‘497; fig. 3 and 14; ¶ 0021; the method includes obtaining an image, that includes one or more objects) and corresponding physical depth information (‘497; ¶ 0021; determining a depth map (e.g., depth estimates for a set of image pixels; etc.) for the image (e.g., by using neural networks based on the image, the photogrammetry point cloud, hardware depth sensors, and/or any other suitable information); generating an occlusion mask based at least in part on the virtual depth information and the physical depth information (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene)) and computing a depth measurement for the mask based on the corresponding portion of the 3D model (‘497; ¶ 0121; determine foreground occlusion masks and/or depths for the scene imagery (e.g., for each of a set of objects appearing in the scene); ¶ 0123); generating an adjustment mask based at least in part on the occlusion mask and the virtual depth information (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 
Pugh discloses the above elements of claim 18 across several embodiments. Because the embodiments are disclosed in a single reference, one of ordinary skill in the art who was aware of one embodiment prior to the effective filing date of the invention would also have been aware of the others. It would therefore have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined these elements from two or more embodiments into a single arrangement, sequenced to satisfy the order required for claim 18, because the embodiments are all related and their presence in the same disclosure provides the motivation to combine, with the benefit of enjoying the advantages of all the embodiments disclosed by Pugh.

Claims 4, 5, 16, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pugh et al. (U. S. Patent Application Publication 2021/0142497, hereafter ‘497) as applied to claims 1-3, 6-15 and 18 above, and in view of Jain et al. (U. S. Patent Application Publication 2019/0057513 A1, hereafter, ‘513).

In regard to claim 4, Pugh teaches the method of claim 3 and further teaches generating the adjustment mask based on the occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask) but does not explicitly teach the method as further comprising: dilating the edge mask prior to generating the adjustment mask; and correcting the edge depths based on the depth of the object that the edges belong to; and generating the adjustment mask based on the occlusion mask and the dilated edge mask.
Jain, working in the same field of endeavor, however, teaches dilating the edge mask prior to generating the adjustment mask (‘513; ¶ 0010; the image processing application further directs the processor to compute the second composite depth map by applying edge detection to the filtered first composite depth map to result in an edge-detected depth map); and dilating the edge-detected depth map to result in the edge map (‘513; ¶ 0093); and correcting the edge depths based on the depth of the object that the edges belong to (‘513; ¶ 0093-0095) for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined the edge boundary dilation and improvement techniques taught by Jain with the systems and methods for rendering virtual objects onto an image of a captured real scene as taught by Pugh for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.
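For illustration only, the following Python sketch shows a generic binary dilation of an edge mask of the kind discussed above. It is a textbook morphological dilation with a square neighborhood, not the specific procedure of Jain or Pugh; the function name dilate, the radius parameter, and the list-of-lists layout are assumptions made for the example.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2*radius + 1) x (2*radius + 1) square neighborhood."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                # Set every pixel within the square window around a set pixel,
                # clamping the window to the image bounds.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


# Hypothetical 3x5 edge mask containing a single edge pixel.
edge_mask = [[0, 0, 0, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 0, 0, 0]]
dilated = dilate(edge_mask)
print(dilated)
```

Dilation widens the band of pixels treated as boundary pixels, so that a later correction step covers depth estimates that are noisy near object edges.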

Regarding claim 5, Pugh and Jain teach the method of claim 4 and further teach wherein generating the adjustment mask based on the occlusion mask and the dilated edge mask comprises: dilating the occlusion mask prior to generating the adjustment mask (‘513; ¶ 0091); and generating the adjustment mask based on the dilated occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask).

In regard to claim 16, Pugh teaches the device of claim 12 and further teaches generating the adjustment mask based on the occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask) but does not explicitly teach wherein the one or more processors are further configured to: generate an edge mask based on the virtual depth information; dilate the edge mask prior to generating the adjustment mask.
Jain, working in the same field of endeavor, however, teaches dilating the edge mask prior to generating the adjustment mask (‘513; ¶ 0010; the image processing application further directs the processor to compute the second composite depth map by applying edge detection to the filtered first composite depth map to result in an edge-detected depth map); and dilating the edge-detected depth map to result in the edge map (‘513; ¶ 0093); and correcting the edge depths based on the depth of the object that the edges belong to (‘513; ¶ 0093-0095) for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined the edge boundary dilation and improvement techniques taught by Jain with the systems and methods for rendering virtual objects onto an image of a captured real scene as taught by Pugh for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.

Regarding claim 17, Pugh and Jain teach the device of claim 16 and further teach wherein the one or more processors are configured to generate the adjustment mask based on the occlusion mask and the dilated edge mask by: dilating the occlusion mask prior to generating the adjustment mask (‘513; ¶ 0091); and generating the adjustment mask based on the dilated occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask).

Regarding claim 19, Pugh teaches the non-transitory computer-readable medium of claim 18 and further teaches generate the adjustment mask based on the occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to 
Jain, working in the same field of endeavor, however, teaches dilating the edge mask prior to generating the adjustment mask (‘513; ¶ 0010; the image processing application further directs the processor to compute the second composite depth map by applying edge detection to the filtered first composite depth map to result in an edge-detected depth map); and dilating the edge-detected depth map to result in the edge map (‘513; ¶ 0093); and correcting the edge depths based on the depth of the object that the edges belong to (‘513; ¶ 0093-0095) for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to have combined the edge boundary dilation and improvement techniques taught by Jain with the systems and methods for rendering virtual objects onto an image of a captured real scene as taught by Pugh for the benefit of providing believable mixed-reality depth occlusions using improved and smoothed 3D depth estimates and improved 3D edge boundaries (which are both noisy in practice) which can dramatically improve user experience, as humans are particularly sensitive to errant boundary pixels.

In regard to claim 20, Pugh and Jain teach the non-transitory computer-readable medium of claim 19 and further teach wherein generating the adjustment mask based on the occlusion mask and the dilated edge mask includes: dilating the occlusion mask prior to generating the adjustment mask (‘513; ¶ 0091); and generating the adjustment mask based on the dilated occlusion mask and the dilated edge mask (‘497; ¶ 0146; the depth occlusion information (e.g., 602 shown in FIG. 10) and semantic segmentation information (e.g., 603 shown in FIG. 10) can be stored in the texture memory (e.g., 601) as components of a packed 3 or 4 component texture and used as a depth value and a write mask in a shader. The depth value can be written to the framebuffer (e.g., 604 shown in FIG. 10), where the semantic segmentation mask allows the framebuffer and the depth information to subsequently occlude 3D rendering – write mask is an adjustment mask).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Edward Martello whose telephone number is (571) 270-1883.  The examiner can normally be reached on M-F 7:30-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/EDWARD MARTELLO/
Primary Examiner, Art Unit 2613