Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

DETAILED ACTION
Specification
35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, requires the specification to be written in “full, clear, concise, and exact terms.” The specification is replete with terms which are not clear, concise and exact. The specification should be revised carefully in order to comply with 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112. Examples of some unclear, inexact or verbose terms used in the specification are: “face model generation model” which appears to refer to a neural network, not a face model, “initial model” which also appears to refer to a neural network, not a model, and “model parameter” of the initial network, which appears to refer to neural network weights or coefficients of the initial model/network.
Appropriate correction is required.

Drawings
The drawings are objected to because of unclear terms as in the specification above. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
In Figs. 2 and 4, the unclear, inexact or verbose terms carried over from the specification are: “face model generation model” which appears to refer to a neural network, not a face model, “initial model” which also appears to refer to a neural network, not a model, and “model parameter” of the initial network, which appears to refer to neural network weights or coefficients of the initial model/network.

Claim Objections
Claims 1, 2, 6, 7, 18, and 19 are objected to because of the following informalities:
For claim 1, Examiner believes this claim should be amended in the following manner:
1. A method for generating a three-dimensional (3D) face model, performed by a face model generation [[model]]network running on a computer device, the method comprising:
obtaining a two-dimensional (2D) face image as input to the face model generation [[model]]network;
extracting global features and local features of the 2D face image;
obtaining a 3D face model parameter based on the global features and the local features; and 
outputting a 3D face model corresponding to the 2D face image based on the 3D face model parameter.  

2. The method according to claim 1, 
wherein the face model generation [[model]]network comprises an encoder, the encoder comprising a convolutional neural network (CNN) with a plurality of convolutional layers, and 
wherein extracting the global features and the local features of the 2D face image comprises:
performing feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image;
obtaining a central position of a landmark of the 2D face image; and
extracting, based on the central position, partial features as the local features of the 2D face image from features obtained by at least one target convolutional layer in the plurality of convolutional layers, 
wherein the at least one target convolutional layer corresponds to a target size.  

6. A method for training a face model generation [[model]]network, the face model generation [[model]]network being used for generating a 3D face model based on a 2D face image, the method comprising:
obtaining a plurality of 2D face image samples;
invoking an initial [[model]]network comprising a [[model]]network parameter, and inputting the plurality of 2D face image samples into the initial [[model]]network;
extracting global features and local features of each of the plurality of 2D face image samples by using the initial [[model]]network;
obtaining a 3D face model parameter based on the global features and the local features;
outputting a 3D face model corresponding to the 2D face image sample based on the 3D face model parameter;
projecting the 3D face model, to obtain a 2D face image corresponding to the 3D face model;
obtaining a similarity between the 2D face image and the 2D face image sample; and 
adjusting the [[model]]network parameter of the initial [[model]]network based on the similarity until a target condition is met, to obtain the face model generation [[model]]network.  

7. The method according to claim 6, 
wherein the face model generation [[model]]network comprises an encoder, the encoder comprising a convolutional neural network with a plurality of convolutional layers, and 
wherein extracting the global features and the local features of the 2D face image sample comprises:
performing feature extraction on the 2D face image sample based on a plurality of convolutional layers, to obtain the global features of the 2D face image sample;
obtaining a central position of a landmark of the 2D face image sample; and 
extracting, based on the central position, partial features as the local features of the 2D face image sample from features obtained by at least one target convolutional layer in the plurality of convolutional layers,
wherein the at least one target convolutional layer corresponds to a target size.  

18. A device for generating a 3D face model using a face model generation [[model]]network running on the device, the device comprising a memory for storing computer instructions and a processor in communication with the memory, wherein, when the processor executes the computer instructions, the processor is configured to cause the device to:
obtain a two-dimensional (2D) face image as input to the face model generation [[model]]network;
extract global features and local features of the 2D face image;
obtain a 3D face model parameter based on the global features and the local features; and 
output a 3D face model corresponding to the 2D face image based on the 3D face model parameter.  

19. The device according to claim 18, wherein the face model generation [[model]]network comprises an encoder, the encoder comprising a convolutional neural network with a plurality of convolutional layers, and wherein, when the processor is configured to cause the device to extract the global features and the local features of the 2D face image, the processor is configured to cause the device to:
perform feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image;
obtain a central position of a landmark of the 2D face image; and
extract, based on the central position, partial features as the local features of the 2D face image from features obtained by at least one target convolutional layer in the plurality of convolutional layers, 
wherein the at least one target convolutional layer corresponds to a target size.

Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dou et al. (NPL “End-to-end 3D face reconstruction with deep neural networks”).
As per claim 1, Dou discloses a method for generating a three-dimensional (3D) face model, performed by a face model generation model running on a computer device (Dou, abstract, “ … End-to-End 3D face reconstruction …”), the method comprising:
obtaining a two-dimensional (2D) face image as input to the face model generation model (Dou, abstract, “ … from a single 2D image”; and Fig. 2, where a neural network model is used that corresponds to the face model generation model; and the 2D image is input into the face-model-generation model);
extracting global features and local features of the 2D face image (Dou, Figure 2 and Section 3.1, the “first type of neural layers” computes generic features such as edges and corners, which are regarded as local features. The “third type of neural layers” produces class-specific features, which are regarded as global features);
obtaining a 3D face model parameter based on the global features and the local features (Dou, Figure 2 and Equation (1), the identity parameter vector and the expression parameter vector are the 3D face model parameters and are computed using the outputs of the first type of neural layers (= local features) and the third type of neural layers (= global features)); and 
outputting a 3D face model corresponding to the 2D face image based on the 3D face model parameter (Dou, Figures 1 and 2).

As per claim 18, Dou discloses a device for generating a 3D face model using a face model generation model running on the device, the device comprising a memory for storing computer instructions and a processor in communication with the memory (Dou, abstract and Section 4.3), wherein, when the processor executes the computer instructions, the processor is configured to cause the device to:
obtain a two-dimensional (2D) face image as input to the face model generation model;
extract global features and local features of the 2D face image;
obtain a 3D face model parameter based on the global features and the local features; and 
output a 3D face model corresponding to the 2D face image based on the 3D face model parameter (see claim 1 rejection for detailed analysis).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 3, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dou et al. (NPL “End-to-end 3D face reconstruction with deep neural networks”) in view of Jackson et al. (NPL “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression”).
As per claim 2, claim 1 is incorporated and Dou discloses wherein extracting the global features and the local features of the 2D face image comprises:  performing feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image (Dou, Figure 2 and Section 3.1, where the fourth pooling layer learns class-specific features which are more suitable for predicting the identity parameters).
Dou doesn’t disclose but Jackson discloses wherein the face model generation model comprises an encoder, the encoder comprising a convolutional neural network (CNN) with a plurality of convolutional layers (Jackson, Fig. 4b, “VRN Guided” and Section 3.3, where the network has an encoding/decoding structure), and 
obtaining a central position of a landmark of the 2D face image (Jackson, Section 3.3, subsection “VRN – Guided”, where a Gaussian centered on each of 68 landmarks is determined); and
extracting, based on the central position, partial features as the local features of the 2D face image from features obtained by at least one target convolutional layer in the plurality of convolutional layers (Jackson, Figure 4(b) and Section 3.3, subsection “VRN – Guided”, where a 6-pixel-diameter portion centered on each of 68 landmarks is extracted and used in the regression of a 3D face volume), 
wherein the at least one target convolutional layer corresponds to a target size (Jackson, Figure 4(b), where the 2D projections of the 3D landmarks are stacked with the original image; thus its convolutional layer corresponds to these landmark representations).
Dou and Jackson are analogous art because both address the reconstruction of 3D face models from a single image using neural networks. Dou provides a way of generating a 3D face model by regressing global features and local features separately. Jackson provides a way of generating a 3D face model by incorporating 6-pixel-diameter portions around landmark centers as part of the regression. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of landmark portions surrounding landmark centers taught by Jackson into the invention of Dou such that the system will be able to benefit by performing a simpler facial analysis task first (Jackson, Section 3.3, subsection “VRN – Guided”).
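For illustration only, the landmark representation relied upon above (a Gaussian centered on a landmark position) can be sketched as follows. This is a minimal sketch and not code from Jackson: the function name landmark_heatmap, the 9x9 grid, and the sigma value are hypothetical placeholders chosen for the example.

```python
import math
from typing import List, Tuple


def landmark_heatmap(height: int, width: int,
                     center: Tuple[int, int],
                     sigma: float = 1.0) -> List[List[float]]:
    """Build a 2D map holding an unnormalized Gaussian centered on one landmark.

    center is (row, col). sigma is an assumed spread, not a value from the cited art.
    """
    cy, cx = center
    return [
        [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


# One map per landmark; a guided network would stack such maps with the input image.
heat = landmark_heatmap(9, 9, center=(4, 4))
```

The map peaks at the landmark center and falls off symmetrically, so stacking one such map per landmark with the input image gives the network an explicit cue about where each landmark lies.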

As per claim 3, claim 2 is incorporated and Dou doesn’t disclose but Jackson discloses wherein extracting, based on the central position, the partial features as the local features of the 2D face image comprises: snipping, from a feature map obtained by the at least one target convolutional layer, a feature map of the target size corresponding to the at least one target convolutional layer as the local features of the 2D face image by using the central position as a center (Jackson, Figure 4(b) and Section 3.3, subsection “VRN – Guided”, where a 6-pixel-diameter portion centered on each of 68 landmarks is extracted and used in the regression of a 3D face volume).
See claim 2 rejection for reason to combine.
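For illustration only, the claimed "snipping" of a target-size feature map around a central position can be sketched as follows. This is a minimal sketch under stated assumptions: the feature map is a plain 2D list, the name snip_local_features is hypothetical, and shifting the window inward at the border is an assumed choice that neither the claim nor the cited art specifies.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]


def snip_local_features(feature_map: FeatureMap,
                        center: Tuple[int, int],
                        target_size: int) -> FeatureMap:
    """Return a target_size x target_size window centered on center (row, col).

    Assumption: when the window would cross the border it is shifted inward,
    so the output always has the target size.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if target_size > rows or target_size > cols:
        raise ValueError("target size exceeds feature map dimensions")
    half = target_size // 2
    top = min(max(center[0] - half, 0), rows - target_size)
    left = min(max(center[1] - half, 0), cols - target_size)
    return [row[left:left + target_size]
            for row in feature_map[top:top + target_size]]


# Toy 6x6 feature map whose entries encode their own position (row * 6 + col).
fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
patch = snip_local_features(fmap, center=(2, 3), target_size=3)
```

In this toy example the returned patch covers rows 1 to 3 and columns 2 to 4, with the value at the landmark position (2, 3) in the middle of the patch.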

As per claim 19, claim 18 is incorporated and Dou in view of Jackson discloses wherein the face model generation model comprises an encoder, the encoder comprising a convolutional neural network with a plurality of convolutional layers, and wherein, when the processor is configured to cause the device to extract the global features and the local features of the 2D face image, the processor is configured to cause the device to:
perform feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image;
obtain a central position of a landmark of the 2D face image; and
extract, based on the central position, partial features as the local features of the 2D face image from features obtained by at least one target convolutional layer in the plurality of convolutional layers, 
wherein the at least one target convolutional layer corresponds to a target size (see claim 2 rejection for detailed analysis).

As per claim 20, claim 19 is incorporated and Dou in view of Jackson discloses wherein, when the processor is configured to cause the device to extract, based on the central position, the partial features as the local features of the 2D face image, the processor is configured to cause the device to:
snip, from a feature map obtained by the at least one target convolutional layer, a feature map of the target size corresponding to the at least one target convolutional layer as the local features of the 2D face image by using the central position as a center (see claim 3 rejection for detailed analysis).

Claims 6 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Dou et al. (NPL “End-to-end 3D face reconstruction with deep neural networks”) in view of Yu et al. (NPL “Learning dense facial correspondences in unconstrained images”).
As per claim 6, Dou discloses a method for training a face model generation model, the face model generation model being used for generating a 3D face model based on a 2D face image (Dou, abstract, “ … End-to-End 3D face reconstruction …”), the method comprising:
obtaining a plurality of 2D face image samples (Dou, abstract, “ … from a single 2D image”; and Fig. 2, where a neural network model is used that corresponds to the face model generation model; and the 2D image is input into the face-model-generation model; and Section 1, second page, where available 3D facial databases map to additional 2D face samples);
extracting global features and local features of the 2D face image (Dou, Figure 2 and Section 3.1, the “first type of neural layers” computes generic features such as edges and corners, which are regarded as local features. The “third type of neural layers” produces class-specific features, which are regarded as global features);
obtaining a 3D face model parameter based on the global features and the local features (Dou, Figure 2 and Equation (1), the identity parameter vector and the expression parameter vector are the 3D face model parameters and are computed using the outputs of the first type of neural layers (= local features) and the third type of neural layers (= global features));
outputting a 3D face model corresponding to the 2D face image based on the 3D face model parameter (Dou, Figures 1 and 2).
Dou discloses reconstructing a 3D face shape from a single photo using a neural network, but does not disclose the details of training the neural network. However, Yu discloses invoking an initial model comprising a model parameter, and inputting the plurality of 2D face image samples into the initial model (Yu, Figure 3 and Section 3.3, “Training Procedure”, where the proposed network to be trained maps to the initial network);
extracting global features and local features of each of the plurality of 2D face image samples by using the initial model (Yu, Section 3.3, “Training Procedure”, where, as part of the training procedure, synthetic renderings of faces are used that include changes in identity, expression, pose, lighting, and occlusion; these features are extracted and the loss between the input and target images is determined to update the trained network (mapping to the initial model));
projecting the 3D face model, to obtain a 2D face image corresponding to the 3D face model (Yu, Section 3.2, where perspective projection is used in order to project a 3D face S to a 2D image);
obtaining a similarity between the 2D face image and the 2D face image sample (Yu, Section 3.3 and Section 3.4, subsections “Loss Function” and “Training procedure”, where the loss function is calculated by comparing the target image, which is a projection of the 3D model, with the source image); and 
adjusting the model parameter of the initial model based on the similarity until a target condition is met, to obtain the face model generation model (Yu, Section 3.3 and Section 3.4, subsection “Training Procedure”, where the 3DMM fitting parameters are adjusted while training the network; this maps to adjusting the network parameters).
Dou and Yu are analogous art because both address the reconstruction of 3D face models from a single image using neural networks. Dou provides a way of generating a 3D face model by regressing global features and local features separately. Yu provides a way of training a network for generating a 3D model by projecting an image of the model and comparing it with the input image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the comparison of the projected image with the original image taught by Yu into the invention of Dou such that the system will be able to get accurate face fitting without having to rely on invisible landmarks and the need for defining contour landmarks (Yu, p. 4729, first column).
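For illustration only, the project, compare, and adjust loop recited in claim 6 can be sketched as follows. This is a toy sketch under stated assumptions and is not the training procedure of Yu or of the application: the "network" is reduced to a single hypothetical scale parameter, the 3D model is a three-point template, the projection is orthographic rather than the perspective projection cited from Yu, and a mean squared error (a dissimilarity, where lower means more similar) stands in for the claimed similarity.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]

# Hypothetical 3D template standing in for a face model.
TEMPLATE: List[Point3] = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.2), (-1.0, -1.0, 0.1)]


def build_model(scale: float) -> List[Point3]:
    """Output a 3D model from the single model parameter (a scale)."""
    return [(scale * x, scale * y, scale * z) for x, y, z in TEMPLATE]


def project(model: List[Point3]) -> List[Point2]:
    """Orthographic projection (assumed simplification): drop the depth coordinate."""
    return [(x, y) for x, y, _ in model]


def loss(projected: List[Point2], sample: List[Point2]) -> float:
    """Mean squared distance between projected points and the 2D sample."""
    return sum((px - u) ** 2 + (py - v) ** 2
               for (px, py), (u, v) in zip(projected, sample)) / len(sample)


def train(sample: List[Point2], scale: float = 0.1, lr: float = 0.1,
          tol: float = 1e-10, max_steps: int = 1000) -> float:
    """Adjust the parameter by gradient descent until the target condition is met."""
    for _ in range(max_steps):
        projected = project(build_model(scale))
        if loss(projected, sample) < tol:  # target condition
            break
        # Analytic gradient of the loss with respect to the scale parameter.
        grad = sum(2.0 * ((px - u) * x + (py - v) * y)
                   for (px, py), (u, v), (x, y, _) in zip(projected, sample, TEMPLATE)
                   ) / len(sample)
        scale -= lr * grad
    return scale


# The 2D sample is generated from a known scale so the result can be checked.
sample = project(build_model(2.0))
learned = train(sample)
```

The loop recovers a scale close to the one used to generate the sample, which is the sense in which adjusting the parameter based on the comparison "obtains" the trained model in this toy setting.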

As per claim 11, claim 6 is incorporated and Dou doesn’t disclose but Yu discloses wherein projecting the 3D face model, to obtain the 2D face image corresponding to the 3D face model comprises:
obtaining shooting information of the 2D face image sample based on the global features, the shooting information being used for indicating at least one of a shooting posture, illumination, or a shooting background during shooting of the 2D face image sample (Yu, Figure 4 and Section 3.3, subsection “Loss Function”, where illumination information when shooting the input image is determined); and 
projecting the 3D face model based on the shooting information, to obtain the 2D face image corresponding to the 3D face model (Yu, Figure 4 and Section 3.3, subsection “Loss Function”, where changes in illumination are accounted for, and Section 3.3 and Section 3.4, subsections “Loss Function” and “Training procedure”, where the loss function is calculated by comparing the target image, which is a projection of the 3D model, with the source image).
See claim 6 rejection for reason to combine.

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Dou et al. (NPL “End-to-end 3D face reconstruction with deep neural networks”) in view of Yu et al. (NPL “Learning dense facial correspondences in unconstrained images”) and in further view of Jackson et al. (NPL “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression”).
As per claim 7, claim 6 is incorporated and Dou in view of Yu discloses wherein extracting the global features and the local features of the 2D face image comprises:  performing feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image (Dou, Figure 2 and Section 3.1, where the fourth pooling layer learns class-specific features which are more suitable for predicting the identity parameters).
Dou in view of Yu doesn’t disclose but Jackson discloses wherein the face model generation model comprises an encoder, the encoder comprising a convolutional neural network (CNN) with a plurality of convolutional layers (Jackson, Fig. 4b, “VRN Guided” and Section 3.3, where the network has an encoding/decoding structure), and 
obtaining a central position of a landmark of the 2D face image (Jackson, Section 3.3, subsection “VRN – Guided”, where a Gaussian centered on each of 68 landmarks is determined); and
extracting, based on the central position, partial features as the local features of the 2D face image from features obtained by at least one target convolutional layer in the plurality of convolutional layers (Jackson, Figure 4(b) and Section 3.3, subsection “VRN – Guided”, where a 6-pixel-diameter portion centered on each of 68 landmarks is extracted and used in the regression of a 3D face volume), 
wherein the at least one target convolutional layer corresponds to a target size (Jackson, Figure 4(b), where the 2D projections of the 3D landmarks are stacked with the original image; thus its convolutional layer corresponds to these landmark representations).
Dou in view of Yu and Jackson are analogous art because all of them address the reconstruction of 3D face models from a single image using neural networks. Dou in view of Yu provides a way of training a neural network to generate a 3D face model from a single image by comparing a projected image from the model with the source image. Jackson provides a way of generating a 3D face model by incorporating 6-pixel-diameter portions around landmark centers as part of the regression. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of landmark portions surrounding landmark centers taught by Jackson into the modified invention of Dou in view of Yu such that the system will be able to benefit by performing a simpler facial analysis task first (Jackson, Section 3.3, subsection “VRN – Guided”).

As per claim 8, claim 7 is incorporated and Dou in view of Yu doesn’t disclose but Jackson discloses wherein extracting, based on the central position, the partial features as the local features of the 2D face image comprises: snipping, from a feature map obtained by the at least one target convolutional layer, a feature map of the target size corresponding to the at least one target convolutional layer as the local features of the 2D face image by using the central position as a center (Jackson, Figure 4(b) and Section 3.3, subsection “VRN – Guided”, where a 6-pixel-diameter portion centered on each of 68 landmarks is extracted and used in the regression of a 3D face volume).
See claim 7 rejection for reason to combine.

Allowable Subject Matter
Claims 4-5, 9-10, and 12-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and to overcome all claim objections.
The prior art of record fails to teach or suggest wherein: performing feature extraction on the 2D face image based on the plurality of convolutional layers, to obtain the global features of the 2D face image comprises:
encoding the 2D face image based on the plurality of convolutional layers of the encoder, to obtain global feature vectors of the 2D face image;
extracting, based on the central position, partial features as the local features of the 2D face image comprises:
extracting a part of feature values from a global feature vector obtained by the at least one target convolutional layer, and 
obtaining a first local feature vector of the 2D face image based on the part of feature values; and 
obtaining the 3D face model parameter based on the global features and the local features comprises:
decoding the global feature vector and the first local feature vector based on a first decoder, to obtain the 3D face model parameter, in the context of claims 4 and 9.

The prior art of record fails to teach or suggest wherein obtaining a similarity between the 2D face image and the 2D face image sample comprises:
obtaining a first similarity based on positions of a landmark of the 2D face image and a corresponding landmark of the 2D face image sample;
obtaining a second similarity based on a pixel value of a pixel of the 2D face image and a pixel value of a corresponding pixel of the 2D face image sample;
matching the 2D face image against the 2D face image sample, to obtain a third similarity, the third similarity being used for indicating whether an identity of a face in the 2D face image is the same as an identity of a face in the 2D face image sample; and 
obtaining the similarity between the 2D face image and the 2D face image sample based on the third similarity and at least one of the first similarity or the second similarity in the context of claim 12.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DIANE M WILLS whose telephone number is (571)272-5583. The examiner can normally be reached on Mondays through Fridays from 9am to 6pm Eastern time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at telephone number 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/DIANE M WILLS/
Primary Examiner, Art Unit 2619