Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 8, 10, 16, and 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 2, 10, and 16, the term “coarse geometric approximation” is a relative term which renders the claim indefinite. The term “coarse geometric approximation” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  That is, “coarse” denotes a level of approximation relative to “fine”, and because there is no claimed “fine geometric approximation” against which an objective level of approximation could be determined, the level of approximation required by “coarse” is subjective.
For purposes of applying prior art, the limitation will be read without “coarse”, i.e. any level of geometric approximation would be within the scope of the limitation.
Claims 8 and 21 recite that the image content includes “telepresence image data”.  The subjective adjective “telepresence” renders the claim indefinite, i.e. it is not definite whether “telepresence” image data necessarily must be used as part of a telepresence system in order to read on the claimed scope, or more broadly, includes any image data which could be used in a telepresence system.  
For the purposes of applying prior art, the broader interpretation is used.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-16, and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over “Neural Volumes: Learning Dynamic Renderable Volumes from Images” by Stephen Lombardi, et al. (hereinafter Lombardi) in view of “Deep Appearance Models for Face Rendering” by Stephen Lombardi, Jason Saragih, et al. (hereinafter Saragih).
Regarding claim 1, the limitations “A computer-implemented method utilizing at least one processing device to perform operations including: receiving a pose associated with an object in image content … receiving, from [a] neural renderer … a color image and an alpha mask representing an opacity of at least a portion of the object; and generating a composite image based on the pose, the color image, and the alpha mask” are taught by Lombardi (Lombardi, e.g. abstract, sections 1, 3-8, describes a system for learning volume representations of multi-view image sequences for the purpose of generating novel views from different viewpoints and/or with changed animations.  In particular, Lombardi teaches that generating an output image may include providing a control signal defining object pose, e.g. figure 2 caption, section 8.6, paragraph 2, where the neural network decoder uses the control signal c and a latent code z, e.g. section 3, paragraph 2, figure 2, section 5, to generate/decode a volume V, from which a color and alpha image are rendered using accumulative ray marching based in part on a viewpoint pose, e.g. section 6, which are finally composited onto the background image, e.g. section 7.2.  That is, a neural renderer generates the pixel colors by raymarching all pixels using equation 7, followed by merging each pixel with the corresponding background pixel according to the remaining opacity based on the viewpoint pose relative to the object and the control signal defining the object pose, with examples shown in figure 8.)
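For illustration only, the accumulate-then-composite sequence described above can be sketched as follows.  This is a minimal sketch under stated assumptions, not Lombardi's implementation: the function names march_ray and composite_over_background are hypothetical, each ray is assumed to be given as a front-to-back list of (color, opacity) samples, and the clamped additive accumulation is a simplified stand-in for Lombardi's equation 7.

```python
def march_ray(samples):
    """Accumulate color and opacity along one ray.

    samples: list of (rgb, alpha) tuples ordered front to back
             (hypothetical input format, assumed for this sketch).
    Returns (accumulated_rgb, accumulated_alpha).
    """
    acc_rgb = [0.0, 0.0, 0.0]
    acc_alpha = 0.0
    for rgb, alpha in samples:
        # A sample can contribute no more than the opacity still available.
        contrib = min(alpha, 1.0 - acc_alpha)
        acc_rgb = [c + contrib * s for c, s in zip(acc_rgb, rgb)]
        acc_alpha += contrib
        if acc_alpha >= 1.0:
            break  # the ray is saturated; later samples are hidden
    return acc_rgb, acc_alpha


def composite_over_background(acc_rgb, acc_alpha, background_rgb):
    """Merge the rendered pixel with the background pixel using the
    opacity that remains after ray marching."""
    remaining = 1.0 - acc_alpha
    return [c + remaining * b for c, b in zip(acc_rgb, background_rgb)]
```

In this sketch, the pair returned by march_ray plays the role of the color image and alpha mask for a single pixel, and composite_over_background plays the role of generating the composite image.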
The limitations “generating a plurality of three-dimensional (3D) proxy geometries of the object, the plurality of 3D proxy geometries being based on a shape of the object; generating, based on the plurality of 3D proxy geometries, a plurality of neural textures of the object, the neural textures defining a plurality of different shapes and appearances representing the object” are taught by Lombardi in view of Saragih (Lombardi, e.g. sections 6.2, 8.6, figure 9, teaches an alternative hybrid rendering technique which relies on both the neural volume encoder/decoder network and a textured surface mesh.  Lombardi further indicates that they used Saragih’s Deep Appearance Model (DAM) trained to reconstruct the same face as the neural volume network, with a modified ray marching algorithm that uses the mesh color when intersected by the ray.  Lombardi does not describe DAMs in detail, i.e. although it would be implicit to one of ordinary skill in the art that a learned mesh representation of a human head would include a plurality of 3D proxy geometries based on the shape of the head in the input video, i.e. the surfaces making up the mesh are learned from the input images, Lombardi does not discuss whether the DAM includes neural textures based on the human head defining a plurality of different shapes and appearances representing the object, per se.)  However, these limitations are taught by Saragih (Saragih, e.g. abstract, sections 1, 3-6, describes DAMs, which are created by generating personalized blendshape mesh models, e.g. section 3, figure 2, textured using the output of a neural network decoder, e.g. section 4.  That is, analogous to Lombardi’s neural volume renderer, Saragih’s DAM generates a mesh model and uses control parameters including view and object pose as input to a neural network decoder which generates a reconstructed mesh and texture for rendering an image, e.g. section 4, paragraphs 4-6, corresponding to the claimed neural textures generated based on the 3D proxy geometries.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement Lombardi’s hybrid neural volume rendering system using Saragih’s DAM both because Lombardi discloses doing so, and because of the noted advantages to combining the techniques, e.g. as shown in figure 9, neural volume rendering performs better for hair whereas textured meshes perform better for fine details of the face.
The limitation “providing the plurality of neural textures to a neural renderer, the plurality of neural textures being provided in a stacked formation; receiving, from the neural renderer and based on the plurality of neural textures, a color image and an alpha mask representing an opacity of at least a portion of the object” is taught by Lombardi in view of Saragih (As discussed above, the hybrid neural volume/surface renderer includes a mesh model having neural textures used in rendering the color and alpha images.  Further, in performing Lombardi’s ray marching, the mesh is placed into the volume, e.g. section 6.2, i.e. as the ray is marched through the scene it samples volume values until it reaches maximum opacity or it intersects a surface of the mesh.  This corresponds to the claimed “neural textures being provided in a stacked formation”, because the mesh represents all, or nearly all, of the surface area of the head, e.g. as shown in Figure 2 of Saragih, the mesh includes not only the face, but the ear, top of the head, and the side of the neck, and Lombardi figure 7 shows at least 3 sides of the face are captured.  Placing a mesh representing all, or nearly all, of the surface area into the scene and performing ray casting corresponds to the stacked formation, because roughly half of the mesh elements/neural textures will be behind one of the other mesh elements/neural textures, e.g. in the example viewpoint being rendered in figure 2, the mesh elements/neural textures for the head’s right ear and cheek are hidden behind the mesh elements/neural textures for the head’s left ear, cheek, etc., creating a stack of surfaces with respect to the viewpoint.)
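For illustration only, the hybrid behavior relied upon above, in which the ray samples the volume until it saturates or meets the mesh, can be sketched as follows.  This is a minimal sketch under stated assumptions, not Lombardi's implementation: the function name march_ray_hybrid and the input formats are hypothetical, the nearest mesh intersection is assumed to be precomputed for the ray, and the mesh surface is treated as fully opaque.

```python
def march_ray_hybrid(samples, mesh_hit=None):
    """March one ray through a volume that contains an opaque mesh.

    samples:  list of (depth, rgb, alpha) volume samples sorted by depth
              (hypothetical input format, assumed for this sketch).
    mesh_hit: (depth, rgb) of the nearest mesh intersection on this ray,
              or None if the ray misses the mesh.
    Returns (accumulated_rgb, accumulated_alpha).
    """
    acc_rgb = [0.0, 0.0, 0.0]
    acc_alpha = 0.0
    for depth, rgb, alpha in samples:
        if mesh_hit is not None and depth >= mesh_hit[0]:
            break  # volume samples behind the mesh surface are hidden
        contrib = min(alpha, 1.0 - acc_alpha)
        acc_rgb = [c + contrib * s for c, s in zip(acc_rgb, rgb)]
        acc_alpha += contrib
        if acc_alpha >= 1.0:
            break  # maximum opacity reached before the mesh
    if mesh_hit is not None and acc_alpha < 1.0:
        # The opaque mesh surface absorbs all remaining opacity.
        remaining = 1.0 - acc_alpha
        acc_rgb = [c + remaining * m for c, m in zip(acc_rgb, mesh_hit[1])]
        acc_alpha = 1.0
    return acc_rgb, acc_alpha
```

Only the nearest intersection per ray contributes in this sketch, so mesh elements farther along the same ray remain hidden behind it with respect to the viewpoint.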
Regarding claim 2, the limitation “rendering a latent texture onto a target viewpoint based at least in part on the pose associated with the object, wherein each of the plurality of 3D proxy geometries include a … geometric approximation of at least a portion of the object and the latent texture of the object mapped to the … geometry approximation” is taught by Lombardi in view of Saragih (As discussed in the claim 1 rejection above, Saragih’s DAM generates a mesh model and uses control parameters including view and object pose as input to a neural network decoder which generates a reconstructed mesh and texture, e.g. section 4, paragraphs 4-6, where the control parameters are a latent facial code and a relative pose of the head to the camera.  That is, the neural texture is reconstructed using a latent representation and the relative camera pose, and the mesh elements approximate the shape of the head, and have the reconstructed neural textures mapped thereon, corresponding to the claimed geometric approximation with latent textures mapped thereon.)
Regarding claim 4, the limitation “wherein each of the plurality of 3D proxy geometries encode surface light field associated with the object in the image content, the surface light field including specular reflections associated with the object” is taught by Lombardi in view of Saragih (Saragih, e.g. section 2, paragraph 3, section 4, teaches that the DAM is capturing the lightfield at the manifold surfaces of the object, including view and light dependent specularity effects, i.e. the DAM encodes a surface lightfield of the object including specular reflections.)
Regarding claim 5, the limitation “wherein the plurality of neural textures are based, at least in part, on the pose, the neural texture being generated by: identifying a category of the object” is implicitly taught by Lombardi (Lombardi teaches, e.g. section 6.2, that certain kinds of scene content are more efficiently represented using surfaces, with human faces being one example.  Lombardi also includes example objects that are not human faces, e.g. figure 1.  Although not explicitly stated by Lombardi, one of ordinary skill in the art would have understood that the hybrid rendering technique could only be enabled for objects for which a surface based representation model was available, i.e. Saragih’s DAM is only appropriate for hybrid rendering when the object in the image is a human face, and could not be used for hybrid rendering of the left side examples in figure 1.  Therefore, the hybrid rendering technique would have to be enabled, implicitly, if not inherently, in response to identifying a category of the object, e.g. a human operating the system could set a system parameter indicating whether a given input dataset is a human face or is an object that is not a human face, as is conventional in the art for specifying processing parameters.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lombardi’s hybrid neural volume rendering system using Saragih’s DAM, to allow a human operator to selectively enable the hybrid rendering using the DAM based on the operator’s determination that the object in the input video is or is not a human face, because one of ordinary skill in the art would have understood that the hybrid rendering technique could only be enabled for objects for which a surface based representation model was available, i.e. Saragih’s DAM is only appropriate for hybrid rendering when the object in the image is a human face, and could not be used for hybrid rendering of the left side examples in figure 1.
The limitation “generating a feature map based on the identified category of the object; providing the feature map to a neural network; and generating the neural texture based on a latent code associated with each instance of the identified category of object and a view associated with the pose” is taught by Lombardi in view of Saragih (Saragih, e.g. section 4, paragraphs 4-6, teaches that the latent facial code zt encodes features of the subject’s facial state, e.g. eye gaze direction, mouth pose, tongue expression, etc., where the features in Saragih’s model are for the identified category of object as discussed above, i.e. a human face.  Further, the neural texture is learned based on latent codes generated by the encoder, i.e. the encoder is trained based on received images having a known relative pose to the head generating latent codes zt encoding different instances/input images, which are used as input to the decoder, as explained in section 4.)
Regarding claim 6, the limitation “wherein at least a portion of the object is a transparent material” is taught by Lombardi (Lombardi, e.g. Figures 2, 6, 7, 10, includes a human head with glasses, where said glasses include a transparent material.  Further, Lombardi, e.g. section 9, paragraph 3 suggests the model can be extended to handle transparent, refractive, and reflective surfaces.)
Regarding claim 7, the limitation “wherein at least a portion of the object is a reflective material” is taught by Lombardi (Lombardi, e.g. Figures 2, 6, 7, 10, includes a human head with glasses, where said glasses are reflective, i.e. glass and other lens construction materials are usually reflective.  Further, as discussed in the claim 4 rejection above, the encoded texture includes specular effects, i.e. reflections.  Finally, Lombardi, e.g. section 9, paragraph 3 suggests the model can be extended to handle transparent, refractive, and reflective surfaces.)
Regarding claim 8, the limitation “wherein the image content includes telepresence image data including at least one user; and the object includes a pair of eyeglasses” is taught by Lombardi (Lombardi, e.g. Figures 2, 6, 7, 10, includes a human head with glasses.  Further, as discussed in the above 112(b) rejection, the broader interpretation is relied on, i.e. image content which could be used for telepresence image data, which includes Lombardi’s image content, i.e. the image content for the user with the glasses could be used as part of a telepresence system, such as a real-time VR application mentioned in section 1, paragraph 6.)
Regarding claims 9 and 15, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 1 above, with Lombardi teaching the use of programmed processors to implement the network, e.g. section 7.4.
Regarding claims 10 and 16, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 2 above.
Regarding claim 11, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 4 above.
Regarding claims 12 and 18, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 5 above.
Regarding claim 13, the limitation “wherein the neural renderer uses a generative model to reconstruct unseen object instances within the identified category, the reconstruction based on less than four captured views of the object” is suggested by Lombardi in view of Saragih (Lombardi, e.g. section 8.6, figure 8, teaches that the latent space interpolation allows for generating novel sequences in real time, i.e. providing a control signal causing the hybrid neural renderer to reconstruct unseen object instances.  Further, Saragih, section 4.1 describes additional conditioning variables, including identity conditioning, suggesting that with enough data, a single image would be sufficient to reconstruct an avatar for a given user, avoiding the need to generate an encoder/decoder for every new human face/user, thereby reducing processing requirements.) 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lombardi’s hybrid neural volume rendering system using Saragih’s DAM, allowing a human operator to selectively enable the hybrid rendering using the DAM based on the operator’s determination that the object in the input video is or is not a human face, to try using enough data to learn a single encoder/decoder which models multiple people through an identity variable in order to reduce the required number of images to one and avoid the processing requirements involved in generating an encoder/decoder for every new human face/user as suggested by Saragih.  In the modified system, while the hybrid rendering mode would still require learning the neural volume encoder/decoder for a given input video sequence, the DAM representing the human would merely require providing said one input image to reconstruct the textures and meshes for other viewpoints and poses.
Regarding claim 14, the limitation “wherein the plurality of 3D proxy geometries are based on the geometry interpolation of shapes that construct the object in the image content” is taught by Lombardi in view of Saragih (Saragih’s reconstructed mesh generated by the decoder is a geometric interpolation of the meshes associated with each input image, where the meshes comprise a set of triangles representing the human face in the input image.  It is also noted that Applicant’s disclosure, e.g. paragraphs 90-92, describes the interpolation of geometry being performed in a latent space using latent codes, which describes Saragih’s system, e.g. section 4.)
Regarding claim 19, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 6 above.
Regarding claim 20, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 7 above.
Regarding claim 21, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 8 above.

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over “Neural Volumes: Learning Dynamic Renderable Volumes from Images” by Stephen Lombardi, et al. (hereinafter Lombardi) in view of “Deep Appearance Models for Face Rendering” by Stephen Lombardi, Jason Saragih, et al. (hereinafter Saragih) as applied to claim 15 above, and further in view of “Optimizing the Latent Space of Generative Networks” by Piotr Bojanowski, et al. (hereinafter Bojanowski).
Regarding claim 22, the limitation “wherein the composite image is generated using a Generative Latent Optimization (GLO) framework and perceptual reconstruction losses” is partially taught by Lombardi or Saragih (Although Lombardi, e.g. section 7.4, and Saragih, e.g. section 4, paragraph 10, both teach perceptual reconstruction losses, i.e. Lombardi measures differences between the rendered and ground truth images, and Saragih measures color and shape differences between the reconstructed and ground truth textures and meshes, neither teaches the use of Generative Latent Optimization, per se.  Rather, both rely on the Adam algorithm.)  However, this limitation is suggested by Bojanowski (Bojanowski, e.g. abstract, sections 1-5, discloses the GLO framework for optimizing the latent space of generative networks, which can be used to improve the quality of the resulting network compared to other known latent space optimization techniques, e.g. section 1.1, section 3, section 5.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lombardi’s hybrid neural volume rendering system using Saragih’s DAM, to use Bojanowski’s GLO framework to optimize the latent space of one or both of the volume encoder/decoder and the human face surface encoder/decoder models because the quality of the resulting networks could be improved, as suggested by Bojanowski’s results.
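For illustration only, the general idea of optimizing a latent space directly, i.e. learning one latent code per training sample jointly with the decoder parameters rather than through an encoder, can be sketched as follows.  This is a toy sketch under stated assumptions, not Bojanowski's implementation: the function name glo_fit is hypothetical, the decoder is assumed to be a two-parameter linear map with scalar targets, the loss is a plain squared error rather than a perceptual loss, and the projection of each latent code onto the unit ball is an assumption made for this sketch.

```python
def glo_fit(targets, steps=200, lr=0.1):
    """Jointly optimize per-sample latent codes and decoder weights.

    targets: list of scalar training targets (toy stand-in for images).
    Returns (latents, weights, loss_history).
    """
    n = len(targets)
    # One learnable 2-D latent code per sample; fixed initialization
    # keeps the sketch deterministic.
    latents = [[0.5, -0.5] for _ in range(n)]
    weights = [0.1, 0.2]  # toy linear decoder parameters

    def decode(z):
        return weights[0] * z[0] + weights[1] * z[1]

    def mean_loss():
        return sum((decode(z) - x) ** 2 for z, x in zip(latents, targets)) / n

    history = [mean_loss()]
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        for z, x in zip(latents, targets):
            err = decode(z) - x
            # Gradients of the squared error w.r.t. decoder and latent code.
            grad_w[0] += 2.0 * err * z[0] / n
            grad_w[1] += 2.0 * err * z[1] / n
            grad_z = [2.0 * err * weights[0], 2.0 * err * weights[1]]
            # Gradient step on this sample's latent code.
            z[0] -= lr * grad_z[0]
            z[1] -= lr * grad_z[1]
            # Project the code back onto the unit ball (sketch assumption).
            norm = max((z[0] ** 2 + z[1] ** 2) ** 0.5, 1.0)
            z[0] /= norm
            z[1] /= norm
        # Gradient step on the shared decoder weights.
        weights[0] -= lr * grad_w[0]
        weights[1] -= lr * grad_w[1]
        history.append(mean_loss())
    return latents, weights, history
```

The point of the sketch is only that the latent codes themselves are free parameters updated by the same reconstruction loss as the decoder, which is the aspect of the framework relied upon in the rejection above.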

Allowable Subject Matter
Claims 3 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Claims 3 and 17 recite limitations requiring that the neural textures are configured to reconstruct a hidden portion of the object based on the stacked formation of the neural textures enabling the neural renderer to generate transparent layers of the object and surfaces behind the transparent layers.  Although, as discussed in the above rejections of claims 6-8, Lombardi’s hybrid neural rendering can render objects having transparent elements, and as discussed in the claim 13 rejection, can reconstruct unseen instances, Lombardi and Saragih do not teach or suggest that the learned surface model include transparent surfaces to reconstruct hidden portions of the object or other surfaces, i.e. Lombardi, section 6.2, teaches that intersection with the surface model is treated as a fully opaque surface.  The nearest cited prior art to this feature is “Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes” by Zhengqin Li, which generates an encoder/decoder for rendering a surface model of a transparent object analogous to Saragih’s DAMs; however, Li’s neural rendering technique is exclusive to wholly transparent objects, e.g. fully glass objects, as in section 3, and does not suggest extending the modeling technique to include transparent surfaces stacked on non-transparent surfaces.  Therefore, because the cited prior art does not teach or otherwise suggest a neural texture based surface modeling technique using stacked neural textures having transparent layers in front of visible reconstructed hidden portions of the object which could be used in Lombardi’s hybrid neural rendering system, the scope of dependent claims 3 and 17, when considered as a whole with the limitations of the independent claims, is not anticipated by or obvious in view of the cited prior art.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT BADER whose telephone number is (571)270-3335. The examiner can normally be reached 10-6 m-f.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached on 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT BADER/           Primary Examiner, Art Unit 2619