DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions

	Applicant’s election without traverse of claims 1-17 in the reply filed on 11/02/2022 is acknowledged.


Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 1 and 11 recite “a facial expression factor.” Given its broadest reasonable interpretation, this claim language is open to more than one reasonable interpretation by a person of ordinary skill in the relevant art. The specification does not provide a clear definition of this phrase, rendering the metes and bounds of the claims unclear.
Claims 2-10 and 12-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, because of their dependency on claims 1 and 11.
 
Claim Rejections - 35 USC § 103

1.        In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1, 2, 4-9, and 11-16 are rejected under 35 U.S.C. 103 as being unpatentable over Bai et al., "Deep Facial Non-Rigid Multi-View Stereo", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 13 June 2020 (hereinafter “Bai”), in view of Wang et al., US 2020/0013212 A1 (hereinafter “Wang”).


5.	As per claim 1, Bai discloses: A computer-implemented method, comprising: 
collecting multiple images of a subject, the images from the subject comprising one or more simultaneous views from different profiles of the subject; (Bai, Abstract, “We present a method for 3D face reconstruction from multi-view images with different expressions.”)
forming a three-dimensional mesh for the subject based on a facial expression factor and a head pose of the subject extracted from the images of the subject; (Bai, Sub-Section 3.1, “Given a set of M facial images {I_i}, i = 1, ..., M, capturing the same person but under different expressions and views, the estimation of 3D facial geometry V_i and 6 DoF rigid head pose p_i for each image can be formulated as a Non-Rigid Multi-View Stereo (NRMVS) optimization by minimizing the appearance-consistency error and landmark fitting error.”, and Sub-Section 3.5: “Given ground truth meshes with the corresponding vertices of the reconstructed meshes, our network, i.e., 2 FPNs Ffpn and F_ fpn, the MLP Fmlp for step size prediction, and the basis network Fbasis, is trained in a supervised manner with standard losses. For each vertex, we compute the point-to-point L2 distance between ground truth and reconstructed meshes (with poses) of all iterations, all views, and all levels after depth alignment and dense alignment separately (i.e., 2 losses per vertex).”)
forming a three-dimensional model for the subject based on the three-dimensional mesh and the texture transformation; (Bai, Sub-Section 3.4, “In order to better recover the details of 3D face shapes, we adopt a multi-level scheme. Specifically, we split the reconstruction process into 3 sequential levels l = 1, 2, 3, each of which solves a NRMVS optimization and outputs the reconstructions for all views…”, and Figure 2, Level 1 Reconstructions)
determining a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model; (Bai, Sub-Section 3.5, “Given ground truth meshes with the corresponding vertices of the reconstructed meshes, our network, i.e., 2 FPNs Ffpn and F_ fpn, the MLP Fmlp for step size prediction, and the basis network Fbasis, is trained in a supervised manner with standard losses….”) and
updating the three-dimensional model according to the loss factor. (Bai, Sub-Section 3.6, “Here we further clarify the relationship between the NRMVS optimization and the training procedure. The NRMVS optimization belongs to the forward pass of our model. It can be analogized to a differentiable module, which takes in old reconstruction parameters and computes the updates to output new reconstructions iteratively. Then, the training losses are computed on the outputs of each iteration, whose gradient will be backwarded through the whole NRMVS optimization to update learnable weights (i.e., the weights of Ffpn and F_ fpn in Fig. 2, Fmlp in Fig. 3, and Fbasis in Fig. 4).”)
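
For illustration only, the per-vertex point-to-point L2 distance quoted from Bai, Sub-Section 3.5, can be pictured with the following minimal sketch. It assumes the two meshes are already aligned and in vertex-to-vertex correspondence. The function name, the averaging over vertices, and the sample coordinates are hypothetical and are not taken from Bai or from the application.

```python
import math

def point_to_point_l2(reconstructed, ground_truth):
    """Mean Euclidean distance between corresponding vertices of two meshes.

    Both arguments are lists of (x, y, z) tuples in the same vertex order.
    Averaging over vertices is an assumption made for this sketch.
    """
    if not reconstructed or len(reconstructed) != len(ground_truth):
        raise ValueError("meshes must be non-empty and have the same vertex count")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reconstructed, ground_truth):
        total += math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    return total / len(reconstructed)

# Hypothetical three-vertex meshes: two vertices are displaced, one matches.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reconstructed = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0), (0.0, 1.0, 0.0)]
loss = point_to_point_l2(reconstructed, ground_truth)
```

A smaller value indicates that the reconstructed mesh lies closer to the ground truth; in a training loop, such a value would be the quantity driving the weight updates.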

6.	Bai does not expressly disclose:
 	forming a texture transformation based on an illumination parameter associated with an illumination configuration for the images from the subject.

7.	Wang discloses:
 forming a texture transformation based on an illumination parameter associated with an illumination configuration for the images from the subject; (Wang, [0055], “Example 4 includes the subject matter of any of Examples 1-3, wherein the texture map generation further comprises removing illumination from the reference facial image based on the illumination parameter of the first 3DMM.”)

8.	Wang is analogous art with respect to Bai because they are from the same field of endeavor, namely image processing.  Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate the process of forming a texture transformation based on an illumination parameter associated with an illumination configuration for the images from the subject, as taught by Wang, into the teaching of Bai.  The motivation for doing so would have been to simplify the generation of a replaced facial image. Therefore, it would have been obvious to combine Wang with Bai.

9.	As per claim 2, Bai in view of Wang discloses: The computer-implemented method of claim 1, further comprising: collecting a binocular image from the subject; (Bai, Abstract, “We present a method for 3D face reconstruction from multi-view images with different expressions. We formulate this problem from the perspective of non-rigid multi-view stereo (NRMVS).”) obtaining a three-dimensional representation of the subject by applying the three-dimensional model to the binocular image from the subject; (Bai, Figure 1, “Figure 1: We present Deep Facial Non-Rigid Multi-View Stereo (DFNRMVS) to recover high-quality 3D models from multiple images of dynamic faces through multi-view optimization. From left to right are input images, initial 3D models, and 3D models after three-level optimization. Our DFNRMVS can gradually improves 3D models.”) and embedding the three-dimensional representation of the subject in a virtual reality environment in real-time.  (Wang, [0013], “The rendering is based on the parameters of the first 3DMM, the parameters of the second 3DMM, and the generated texture map associated with the fitted 3D reference face. A region of interest of the target facial image may then be determined and the 3D reference face is blended onto that region to generate a replaced facial image.”) The proposed combination as well as the motivation for combining the references presented in the rejection of the parent claim apply to this claim and are incorporated herein by reference.

10. 	As per claim 4, Bai in view of Wang discloses: The computer-implemented method of claim 1, wherein forming the three-dimensional mesh comprises identifying a facial expression of the subject in the images, and associating a facial expression factor with the facial expression of the subject.  (Bai, Sub-Section 3.1, “Given a set of M facial images {I_i}, i = 1, ..., M, capturing the same person but under different expressions and views, the estimation of 3D facial geometry V_i and 6 DoF rigid head pose p_i for each image can be formulated as a Non-Rigid Multi-View Stereo (NRMVS) optimization by minimizing the appearance-consistency error and landmark fitting error.”, and Sub-Section 3.5: “Given ground truth meshes with the corresponding vertices of the reconstructed meshes, our network, i.e., 2 FPNs Ffpn and F_ fpn, the MLP Fmlp for step size prediction, and the basis network Fbasis, is trained in a supervised manner with standard losses. For each vertex, we compute the point-to-point L2 distance between ground truth and reconstructed meshes (with poses) of all iterations, all views, and all levels after depth alignment and dense alignment separately (i.e., 2 losses per vertex).”)

11. 	As per claim 5, Bai in view of Wang discloses:  The computer-implemented method of claim 1, wherein forming the three-dimensional mesh comprises identifying a head pose of the subject, the head pose including a rotation of a head of the subject and a translation of the head of the subject.  (Bai, Sub-Section 3.1, “Given a set of M facial images {I_i}, i = 1, ..., M, capturing the same person but under different expressions and views, the estimation of 3D facial geometry V_i and 6 DoF rigid head pose p_i for each image can be formulated as a Non-Rigid Multi-View Stereo (NRMVS) optimization by minimizing the appearance-consistency error and landmark fitting error.”)
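
For illustration only, a 6 DoF rigid head pose can be understood as three rotation angles plus a three-component translation applied to every mesh vertex. The following minimal sketch assumes a roll/pitch/yaw (Euler angle) parameterization composed as R = Rz(yaw) Ry(pitch) Rx(roll). That convention, the helper names, and the sample values are assumptions made for this sketch and are not drawn from Bai.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_from_euler(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll); angles are in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    return mat_mul(rz, mat_mul(ry, rx))

def apply_pose(vertices, rotation, translation):
    """Rotate each (x, y, z) vertex, then translate it: v' = R v + t."""
    posed = []
    for v in vertices:
        rotated = [sum(rotation[r][c] * v[c] for c in range(3)) for r in range(3)]
        posed.append(tuple(rotated[r] + translation[r] for r in range(3)))
    return posed

# Hypothetical pose: a quarter turn about the z axis, then a shift of 5 along z.
rotation = rotation_from_euler(0.0, 0.0, math.pi / 2)
posed = apply_pose([(1.0, 0.0, 0.0)], rotation, (0.0, 0.0, 5.0))
```

In this example the vertex on the x axis is carried onto the y axis by the rotation and then moved along z by the translation.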

12. 	As per claim 6, Bai in view of Wang discloses: The computer-implemented method of claim 1, wherein forming the texture transformation comprises using a bias matrix and a gain matrix including the facial expression factor, the head pose, and the illumination parameter.  (Bai, Sub-Section 3.3. Note: Sub-Section 3.3 applies a texture transformation by using preliminary reconstructions derived from a facial expression factor and a head pose, and by applying UV feature maps and a position map derived from an illumination parameter. Furthermore, Bai uses a neural network, which is well known in the art to use a bias matrix. Sub-Section 3.5 also discloses a training loss that uses a gain matrix.)
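
For illustration only, one common way to read a "gain matrix" and a "bias matrix" is as a per-texel affine correction, out = gain * texture + bias. The following minimal sketch adopts that reading as an assumption; neither the application nor the cited references is quoted here as defining the matrices this way, and the function name and sample values are hypothetical.

```python
def apply_gain_bias(texture, gain, bias):
    """Apply an element-wise affine map to a texture: gain * texel + bias.

    All three arguments are equally sized 2D lists of floats. Treating the
    gain and bias as per-texel multipliers and offsets is an assumption.
    """
    return [
        [g * t + b for t, g, b in zip(t_row, g_row, b_row)]
        for t_row, g_row, b_row in zip(texture, gain, bias)
    ]

# Hypothetical 2x2 single-channel texture with illumination-dependent values.
texture = [[0.5, 0.2], [0.8, 1.0]]
gain = [[2.0, 1.0], [0.5, 1.0]]
bias = [[0.0, 0.1], [0.1, -0.5]]
relit = apply_gain_bias(texture, gain, bias)
```

Under this reading, the gain scales the brightness of each texel and the bias shifts it, so different illumination configurations would correspond to different gain and bias values.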

13. 	As per claim 7, Bai in view of Wang discloses:  The computer-implemented method of claim 1, wherein forming the texture transformation comprises determining an illumination parameter based on an illumination configuration for the images from the subject.  (Wang, [0055], “Example 4 includes the subject matter of any of Examples 1-3, wherein the texture map generation further comprises removing illumination from the reference facial image based on the illumination parameter of the first 3DMM.”)

14. 	As per claim 8, Bai in view of Wang discloses:  The computer-implemented method of claim 1, wherein determining a loss factor comprises projecting a three-dimensional representation of the subject onto a two-dimensional image and comparing a selected point in the two-dimensional image with a corresponding point in the test image. (Bai, Sub-Section 3.5, “Given ground truth meshes with the corresponding vertices of the reconstructed meshes, our network, i.e., 2 FPNs Ffpn and F_ fpn, the MLP Fmlp for step size prediction, and the basis network Fbasis, is trained in a supervised manner with standard losses. For each vertex, we compute the point-to-point L2 distance between ground truth and reconstructed meshes (with poses) of all iterations, all views, and all levels after depth alignment and dense alignment separately (i.e., 2 losses per vertex). For depth alignment, we compute the mean depth difference”)
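
For illustration only, the projection-and-comparison step recited in claim 8 can be pictured as a reprojection error. The following minimal sketch assumes an ideal pinhole camera with a single focal length and no lens distortion. The camera parameters, function names, and sample points are hypothetical and are not taken from the application or from Bai.

```python
import math

def project_point(point_3d, focal_length, principal_point):
    """Project a camera-space (x, y, z) point to pixel coordinates (u, v)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = focal_length * x / z + principal_point[0]
    v = focal_length * y / z + principal_point[1]
    return (u, v)

def reprojection_error(point_3d, observed_2d, focal_length, principal_point):
    """Pixel distance between the projected model point and the observed point."""
    u, v = project_point(point_3d, focal_length, principal_point)
    return math.hypot(u - observed_2d[0], v - observed_2d[1])

# Hypothetical model point and its corresponding point in the test image.
projected = project_point((1.0, 2.0, 4.0), 100.0, (320.0, 240.0))
error = reprojection_error((1.0, 2.0, 4.0), (348.0, 294.0), 100.0, (320.0, 240.0))
```

Summing or averaging such errors over the selected points would give a single scalar that can serve as a loss value.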

15.	As per claim 9,  Bai in view of Wang discloses: The computer-implemented method of claim 1, wherein updating the three-dimensional model comprises evaluating the loss factor for an incremental change to the head pose over an incremental period of time. (Bai, Sub-Section 3.5, “Given ground truth meshes with the corresponding vertices of the reconstructed meshes, our network, i.e., 2 FPNs Ffpn and F_ fpn, the MLP Fmlp for step size prediction, and the basis network Fbasis, is trained in a supervised manner with standard losses. For each vertex, we compute the point-to-point L2 distance between ground truth and reconstructed meshes (with poses) of all iterations, all views, and all levels after depth alignment and dense alignment separately (i.e., 2 losses per vertex).”)

16.	Claim 11, which is similar in scope to claim 1, is thus rejected under the same rationale.

17. 	As per claim 12, Bai in view of Wang discloses: The system of claim 11, further comprising an array of video cameras configured to collect the multiple images of the subject, including one or more simultaneous views from different profiles of the subject.  (Bai, Section 1. Introduction, “With the images captured under well calibrated multi-view systems like camera arrays, faithful 3D geometry can be recovered with algorithms leveraging multi-view geometric constraints [5, 10].”)

18. 	As per claim 13, Bai in view of Wang discloses: The system of claim 11, further comprising an array of illumination sources to adjust the illumination configuration for the images from the subject. ( Wang, [0014], “As will be appreciated, the techniques described herein may allow for improved facial image replacement, particularly when the reference and target faces are in presented different pose and illumination, compared to existing methods that rely on 2-dimensional image warping techniques.”)

19. 	As per claim 14, Bai in view of Wang discloses: The system of claim 11, wherein the one or more processors further execute instructions to synchronize the images of the subject collected from two or more different cameras and to form a stereoscopic view of a facial expression of the subject. (Bai, Section 2. Related work, “ One major drawback of all mentioned methods is that they require either the images are synchronized or the subject is static during data capturing.”, and Figure 1, “We present Deep Facial Non-Rigid Multi-View Stereo (DFNRMVS) to recover high-quality 3D models from multiple images of dynamic faces through multi-view optimization. From left to right are input images, initial 3D models, and 3D models after three-level optimization. Our DFNRMVS can gradually improves 3D models.”.)

20.	Claim 15, which is similar in scope to claim 2, is thus rejected under the same rationale.

21.	Claim 16, which is similar in scope to claim 9, is thus rejected under the same rationale.

22.	Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bai et al., "Deep Facial Non-Rigid Multi-View Stereo", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 13 June 2020, in view of Wang et al., US 2020/0013212 A1, and further in view of Tuzel et al., US 2017/0256033 A1 (hereinafter “Tuzel”).

23. 	As per claim 3, Bai in view of Wang discloses:  The computer-implemented method of claim 1, (See rejection of claim 1 above.)

24.	Bai in view of Wang does not expressly disclose:  forming a texture transformation based on an illumination parameter comprises providing the images from the subject under multiple illumination configurations to a low-resolution multilayered network and to a high-resolution multilayered network; and combining an output from the low-resolution multilayered network with an output of the high-resolution multilayered network.

25.	Tuzel discloses: forming a texture transformation based on an illumination parameter comprises providing the images from the subject under multiple illumination configurations to a low-resolution multilayered network and to a high-resolution multilayered network; and combining an output from the low-resolution multilayered network with an output of the high-resolution multilayered network. (Tuzel, [0029], “The second stream produces the high frequency characteristic facial details, such as eyes and nose, using a non-linear fully connected neural network 201. Hidden layers of this network build a global representation of high-resolution face images that can be inferred from the low-resolution input 101. The multi-layer nonlinear embedding and reconstruction used by the network 201 enables more effective encoding of details of the upsampled image 204, such as characteristic facial features. In addition, variations such as alignment, face pose, and illumination can be effectively modelled. The two streams generated by GN are concatenated 210 to be processed by LN 260.”)

26.	Tuzel is analogous art with respect to Bai in view of Wang because they are from the same field of endeavor, namely image processing.  Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate the process of forming a texture transformation based on an illumination parameter, which comprises providing the images from the subject under multiple illumination configurations to a low-resolution multilayered network and to a high-resolution multilayered network, and combining an output from the low-resolution multilayered network with an output of the high-resolution multilayered network, as taught by Tuzel, into the teaching of Bai in view of Wang.  The motivation for doing so would have been to generate an image with high resolution. Therefore, it would have been obvious to combine Tuzel with Bai in view of Wang.

27. 	As per claim 10, Bai in view of Wang, and further in view of Tuzel, discloses: The computer-implemented method of claim 1, wherein updating the three-dimensional model according to the loss factor comprises embedding a statistical value for the illumination parameter in the texture transformation, the statistical value derived from a multilayered network comprising the images of the subject under multiple illumination configurations. (Tuzel, [0029], “The second stream produces the high frequency characteristic facial details, such as eyes and nose, using a non-linear fully connected neural network 201. Hidden layers of this network build a global representation of high-resolution face images that can be inferred from the low-resolution input 101. The multi-layer nonlinear embedding and reconstruction used by the network 201 enables more effective encoding of details of the upsampled image 204, such as characteristic facial features. In addition, variations such as alignment, face pose, and illumination can be effectively modelled. The two streams generated by GN are concatenated 210 to be processed by LN 260.”) The proposed combination as well as the motivation for combining the references presented in the rejection of claim 3 apply to this claim and are incorporated herein by reference.

28.	Claim 17, which is similar in scope to claim 3, is thus rejected under the same rationale.

Conclusion 

29.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDERRAHIM MEROUAN whose telephone number is (571)270-5254.  The examiner can normally be reached on Monday to Friday 8 AM-5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Kent Chang can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

	
/ABDERRAHIM MEROUAN/
Primary Examiner, Art Unit 2619