Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

Status of Claims
Claims 1-15 are currently pending in this application.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 18, 2021 is hereby acknowledged.  All references have been considered by the examiner. Initialed copies of the PTO-1449 are included in this correspondence.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2014/0035934, same as WO 2012/139276; IDS).

Regarding claim 1, Du teaches a method for generating a virtual avatar (e.g., A method and apparatus for capturing and representing 3D wire-frame, color and shading of facial expressions are provided, wherein the method includes the following steps: storing a plurality of feature data sequences, each of the feature data sequences corresponding to one of the plurality of facial expressions; and retrieving one of the feature data sequences based on user facial feature data; and mapping the retrieved feature data sequence to an avatar face. Du: Abstract L.1-8), comprising: 
determining a template coefficient of a target face in a to-be-processed image (e.g., FIG. 3 illustrates an exemplary logic flow 300, which shows exemplary operations of an avatar-based facial animation implementation. Du: [0045] L.1-3. At a block 302, a camera captures a user's facial expressions and head movements. Du: [0046]. It is obvious that the user’s facial expressions and head movements are to be processed and mapped to an avatar face, so the image of the user is a to-be-processed image. At a block 310, the facial features are fed into the motion capture database, and a most similar facial expression sequence is retrieved from the database. This retrieval provides a sequence that resembles the user's facial expression. Du: [0049]. In turn, a block 312 is performed. At this block, the human face is normalized and remapped to the avatar face. Also, at this block, the facial expression changes are copied to the avatar. Du: [0051]. Then, at a block 314, the avatar is driven. This involve perform the same facial expression changes for the avatar as in the retrieval sequence. Also, in driving the avatar, the head rigid movements will be directly used. Du: [0051]. Facial expression changes are mapped to the avatar face. FIG. 4 illustrates an exemplary logic flow 400, which shows exemplary operations of an avatar-based facial animation system. Du: [0054] L.1-3. The face tracking module analyzes the face area, and calculates the animation parameters according to the facial image. Du: [0055] L.3-5. The animation parameters may include the pitch and yaw of the head, the mouth opening and closing, the eyebrow raising and squeezing. In embodiments, all of these parameters are analyzed through the face tracking module. Du: [0056]. At block 414, parameters are provided. These parameters may be in the form of facial features (also referred to as input face features). In embodiments, sequences of such features may be used to drive an avatar at a block 416. Du: [0062]) based on at least two real face feature templates (e.g., In embodiments, these multiple facial features may be the following nine features: 1. distance between upper and lower lips; 2. distance between two mouth corners; 3. distance between upper lip and nose tip; 4. distance between lower lip and nose tip; 5. distance between nose-wing and nose tip; 6. distance between upper and lower eyelids; 7. distance between eyebrow tip and nose-tip; 8. distance between two eyebrow tips; and 9. distance between eyebrow tip and eyebrow middle. Du: [0031]-[0040]); and 
determining a virtual avatar of the target face according to the template coefficient (e.g., Based on features retrieved from motion capture database 104, mapping module 106 controls the avatar. This may involve normalizing and remapping the human face to the avatar face, copying the facial expression changes to the avatar, and then driving the avatar to perform the same facial expression changes as in the retrieved features. In embodiments, mapping module 106 may include graphics rendering features that allow the avatar to be output by display device 112.  Du: [0022]. In an avatar-based system (e.g., a video chatting system), it is important to capture a user's head gestures, as well as the user's facial expressions. In embodiments, these operations may be performed by a face tracking module. In turn, these gestures and expressions may be expressed as animation parameters. Such animation parameters are transferred to a graphics rendering engine. In this way, the avatar system will be able to reproduce the original user's facial expression on a virtual 3D model. Du: [0052] L.4-12) and at least two virtual face feature templates associated with the at least two real face feature templates (e.g., In these blocks, a head model is projected onto a face area detected within the video frame that was read at block 402. More particularly, embodiments may employ a parameterized 3D head model to help the facial action tracking. The shape (e.g., the wireframe) of the 3D model is fully controlled by a set of parameters. In projecting the 3D model onto the face area of the input image, its parameters are adjusted so that the wireframe changes its shape and matches the user head position and facial expression. Du: [0060]. These parameters may be in the form of facial features (also referred to as input face features). In embodiments, sequences of such features may be used to drive an avatar at a block 416.  Du: [0062] L.1-4).
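For illustration only, the retrieval described in the cited passages of Du (facial distance features compared against stored sequences in a motion capture database, Du: [0031]-[0040] and [0049]) may be pictured with the following Python sketch. The landmark names, the helpers `feature_vector`, `sequence_distance` and `retrieve_sequence`, and the sum-of-squared-differences similarity measure are assumptions made for this sketch; the cited passages do not specify them.

```python
import math

# Hypothetical landmark pairs standing in for three of the nine distances
# listed in Du; the names are illustrative and do not come from Du.
FEATURE_PAIRS = [
    ("upper_lip", "lower_lip"),
    ("mouth_left", "mouth_right"),
    ("upper_lip", "nose_tip"),
]

def feature_vector(landmarks):
    """Return one distance per landmark pair for a single frame."""
    return [math.dist(landmarks[a], landmarks[b]) for a, b in FEATURE_PAIRS]

def sequence_distance(seq_a, seq_b):
    """Sum of squared differences between two equal-length feature sequences."""
    return sum(
        (x - y) ** 2
        for frame_a, frame_b in zip(seq_a, seq_b)
        for x, y in zip(frame_a, frame_b)
    )

def retrieve_sequence(database, query):
    """Return the label of the stored sequence closest to the query (assumed metric)."""
    return min(database, key=lambda label: sequence_distance(database[label], query))
```

Here `database` is assumed to map a label to a list of per-frame feature vectors, and `query` is a list of feature vectors of the same length.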

Regarding claim 2, Du teaches the method according to claim 1, wherein the determining the virtual avatar of the target face according to the template coefficient and the at least two virtual face feature templates associated with the at least two real face feature templates comprises: 
determining a virtual face image of the target face according to the template coefficient (e.g., Based on features retrieved from motion capture database 104, mapping module 106 controls the avatar. This may involve normalizing and remapping the human face to the avatar face, copying the facial expression changes to the avatar, and then driving the avatar to perform the same facial expression changes as in the retrieved features. In embodiments, mapping module 106 may include graphics rendering features that allow the avatar to be output by display device 112. Du: [0022]. In an avatar-based system (e.g., a video chatting system), it is important to capture a user's head gestures, as well as the user's facial expressions. In embodiments, these operations may be performed by a face tracking module. In turn, these gestures and expressions may be expressed as animation parameters. Such animation parameters are transferred to a graphics rendering engine. In this way, the avatar system will be able to reproduce the original user's facial expression on a virtual 3D model. Du: [0052] L.4-12), the at least two virtual face feature templates (see Du: [0060] and [0062] L.1-4, as cited for claim 1 above) and the to-be-processed image (e.g., At a block 302, a camera captures a user's facial expressions and head movements. Du: [0046]. The user’s facial expressions and head movements are to be processed and mapped to an avatar face.); 
filling the virtual face image into a target face area in the to-be-processed image using a face mask of the target face in the to-be-processed image (e.g., As described above, the face tracking module may perform blocks 404-412, which provide an iterative procedure.  Du: [0059]. In these blocks, a head model is projected onto a face area detected within the video frame that was read at block 402. More particularly, embodiments may employ a parameterized 3D head model to help the facial action tracking. The shape (e.g., the wireframe) of the 3D model is fully controlled by a set of parameters. In projecting the 3D model onto the face area of the input image, its parameters are adjusted so that the wireframe changes its shape and matches the user head position and facial expression. Du: [0060]. For instance, FIG. 4 shows that, at block 404, the head model is projected onto the detected face (also referred to as the current face). This yields an un-warped texture of the current face at a block 406. At a block 408, this un-warped texture is compared with the template texture. Based on this calculation, one or more parameters of the 3D head model may be updated at a block 410.  Du: [0061] L.1-7); and 
using an image obtained through the filling as the virtual avatar (e.g., As indicated by a block 412, blocks 404-410 may be repeated if the 3D head model and the current face have not converged within a predetermined amount. Otherwise, operation may proceed to a block 414. Du: [0061] L.7-10. At block 414, parameters are provided. These parameters may be in the form of facial features (also referred to as input face features). In embodiments, sequences of such features may be used to drive an avatar at a block 416. Du: [0062]).
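For illustration only, the filling step recited in claim 2 may be pictured as mask-based compositing. In the Python sketch below, images are assumed to be nested lists of grayscale values, the face mask is assumed to be binary, and `fill_with_mask` is a hypothetical helper; neither the claim nor Du prescribes this representation.

```python
def fill_with_mask(image, virtual_face, mask):
    """Copy virtual_face pixels into image wherever mask is nonzero."""
    if not (len(image) == len(virtual_face) == len(mask)):
        raise ValueError("image, virtual_face and mask must have the same height")
    return [
        [v if m else p for p, v, m in zip(img_row, face_row, mask_row)]
        for img_row, face_row, mask_row in zip(image, virtual_face, mask)
    ]
```

Pixels outside the mask keep their original values, so the result is the to-be-processed image with only the target face area replaced.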

Regarding claim 3, Du teaches the method according to claim 2, wherein the determining the virtual face image of the target face according to the template coefficient, the at least two virtual face feature templates and the to-be-processed image comprises: 
determining a three-dimensional model of a virtual face according to the template coefficient and the at least two virtual face feature templates (e.g., FIG. 4 shows that, at a block 414, the animation parameters are sent to a rendering engine. In turn, the rendering engine drives an avatar 3D model based on the animation parameters at a block 416. Du: [0057]. In these blocks, a head model is projected onto a face area detected within the video frame that was read at block 402. More particularly, embodiments may employ a parameterized 3D head model to help the facial action tracking. The shape (e.g., the wireframe) of the 3D model is fully controlled by a set of parameters. In projecting the 3D model onto the face area of the input image, its parameters are adjusted so that the wireframe changes its shape and matches the user head position and facial expression.  Du: [0060]); 
extracting texture information of the target face from the to-be-processed image (e.g., FIG. 4 shows that, at block 404, the head model is projected onto the detected face (also referred to as the current face). This yields an un-warped texture of the current face at a block 406. Du: [0061] L.1-4); and 
rendering the three-dimensional model of the virtual face according to the texture information of the target face, to obtain the virtual face image (e.g., At a block 408, this un-warped texture is compared with the template texture. Based on this calculation, one or more parameters of the 3D head model may be updated at a block 410. As indicated by a block 412, blocks 404-410 may be repeated if the 3D head model and the current face have not converged within a predetermined amount. Otherwise, operation may proceed to a block 414. Du: [0061] L.4-10).
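For illustration only, the iterative procedure of blocks 404-412 in Du (project the model, compare with the template, update the parameters, repeat until convergence) follows a generic fit-until-converged pattern. The Python sketch below shows that pattern on a toy one-parameter model. The `project` callable, the step size, and the tolerance are hypothetical stand-ins and are not Du's head model or update rule.

```python
def fit_parameter(observed, project, param=0.0, step=0.5, tol=1e-6, max_iters=100):
    """Adjust param until project(param) matches observed within tol.

    Returns the fitted parameter and the number of updates performed.
    """
    for iteration in range(max_iters):
        residual = observed - project(param)  # compare model output with observation
        if abs(residual) < tol:               # converged within the tolerance
            return param, iteration
        param += step * residual              # update the model parameter
    return param, max_iters
```

For example, `fit_parameter(4.0, lambda p: 2.0 * p)` starts from 0.0 and stops once the projected value matches the observation.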

Regarding claims 6-8, the claims are device claims of method claims 1-3 respectively.  The claims are similar in scope to claims 1-3 respectively and they are rejected under similar rationale as claims 1-3 respectively.
Du further teaches that “As described herein, various embodiments may be implemented using hardware elements, software elements, or any combination thereof. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.” (Du: [0088]). “Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.” (Du: [0089]). “embodiments may include storage media or machine-readable articles. These may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like.” (Du: [0091] L.1-13).

Regarding claims 11-13, the claims are computer readable storage medium claims of method claims 1-3 respectively.  The claims are similar in scope to claims 1-3 respectively and they are rejected under similar rationale as claims 1-3 respectively.
Du further teaches that “Some embodiments may be implemented, for example, using a storage medium or article which is machine readable. The storage medium may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software.” (Du: [0090]).

Claims 4-5, 9-10 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Du as applied to claims 1, 6 and 11 above, and further in view of Thomas et al. (“Real-time Simultaneous 3D Head Modeling and Facial Motion Capture with an RGB-D camera,” ARXIV.ORG, Cornell University Library, dated April 22, 2020).

Regarding claim 4, Du teaches the method according to claim 1, wherein the determining the virtual avatar of the target face according to the template coefficient and at least two virtual face feature templates associated with the at least two real face feature templates comprises: 
adjusting a baseline face model according to the template coefficient and the at least two virtual face feature templates (see 4_1 below); and 
using a new face model obtained through the adjusting as the virtual avatar of the target face (see 4_2 below).
While Du does not explicitly teach the following limitations, Thomas teaches:
(4_1). adjusting a baseline face model according to the template coefficient and the at least two virtual face feature templates (e.g., First, we detect facial features using the system called IntraFace [37]. These sparse features are matched to manually defined features in the blendshape mesh B0 with neutral expression. B0 is then scaled so that the euclidean distances between the facial features in B0 match the ones computed from the RGB-D image. The blendshape mesh B0 is aligned to the first input RGB-D image by minimizing the sum of squared distances between the matched facial features. Thomas: sec. 4.1 para. 2);
(4_2). using a new face model obtained through the adjusting as the virtual avatar of the target face (e.g., Second, we perform elastic registration with the facial features as proposed in [38] to quickly and roughly fit B0 to the user’s head. All deformations are then transferred to all other blendshape meshes Bi, i > 0 [39]. We create the Deviation and color images with the first RGB-D image (see Sec. 4.3). In order to improve tracking performances at runtime we automatically define sparse facial features in the Deviation image. We identify these features as the pixels in the Deviation image that represent the 3D points closest to the facial features detected in the first RGB-D image. Note that the quality of the initial fitting has little influence on the result because the Deviation image will compensate the initial fitting error. The most important point is that the facial features match well for accurate facial expression tracking. Thomas: sec. 4.1 para. 3).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Thomas into the teaching of Du so as to grow and refine the deforming 3D model of the head on-the-fly and in real-time.
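For illustration only, the initial fitting quoted from Thomas sec. 4.1 (scaling B0 so that distances between facial features match, then aligning by minimizing the sum of squared distances between matched features) may be pictured with the simplified Python sketch below. The sketch assumes matched 3D feature lists of equal length, uses the ratio of mean pairwise distances as the scale, and handles translation only; rotation is omitted. The helper names are hypothetical and the sketch is not Thomas's implementation.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def scale_and_translate(model_pts, target_pts):
    """Scale model features to the target's size, then align centroids.

    For a fixed scale and no rotation, matching centroids is the translation
    that minimizes the sum of squared distances between matched points.
    """
    s = mean_pairwise_distance(target_pts) / mean_pairwise_distance(model_pts)
    scaled = [tuple(s * c for c in p) for p in model_pts]
    cm, ct = centroid(scaled), centroid(target_pts)
    shift = tuple(t - m for t, m in zip(ct, cm))
    return [tuple(c + d for c, d in zip(p, shift)) for p in scaled]
```

A full rigid alignment would also estimate a rotation, which this sketch leaves out for brevity.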

Regarding claim 5, the combined teaching of Du and Thomas teaches the method according to claim 4, wherein the adjusting the baseline face model according to the template coefficient and the at least two virtual face feature templates comprises: 
matching a face key point in each of the virtual face feature templates with a face key point in the baseline face model to obtain matching point pairs (e.g., At runtime, our built 3D model of the head is rigidly aligned to the current RGB-D image using the Iterative Closest Point (ICP) algorithm. The blendshape coefficients are then estimated and the pair of Deviation and color images is updated with the current RGB-D image. The pipeline of our proposed method is illustrated in Fig. 6. Thomas: sec. 4 para. 1 L.8-14. For each input RGB-D image we estimate the blendshape coefficients using the same approach as in [2]. Note that differently from [2] we used the point-to-point constraints on the 3D facial features (instead of on the 2D facial features). Moreover, we use all points available from the Deviation image for dense point correspondences. Our point-to-plane fitting term on the depth image is

    [Equation image media_image1.png: point-to-plane fitting term]

where (u, v) is a pixel in the Deviation image, v(u,v) is the closest point to T(l∗)Px(u, v) in the depth image and n(u,v) is the normal vector of v(u,v).  Thomas: sec. 4.2.2 para. 1. Our point-to-point fitting term on 3D facial features is 

    [Equation image media_image2.png: point-to-point fitting term]

where lmkj is the location of the jth landmark in the Deviation image and vj is the jth 3D facial landmark in the RGB-D image. Thomas: sec. 4.2.2 para. 2); 
performing a weighted summation on distances of at least two matching point pairs with associated face key point in the baseline face model according to the template coefficient (e.g., For each pixel (u, v), we search for the 3D point in the RGB-D image that is closest to the line Lˆx(u, v) by walking through a projected segment in the depth image. We define the segment S(u, v) = [T(l∗)Px(u, v) − λRNx(u, v); T(l∗)Px(u, v) + λRNx(u, v)], where λ = 5 cm if the list of deviation values is empty (in such a case Dev(u, v) = 0), λ = max(1, 5/s ) cm otherwise (where s is the current size of the list). We then walk through the projected segment π(S(u, v)), where π is the perspective projection operator and identify the point pu,v closest to the line Lˆx(u, v). We compute the distance d(u,v) from pu,v to the corresponding point T(l∗)Vx(u, v) on the blended mesh in the direction RNx(u, v): 

    [Equation image media_image3.png: distance d(u,v)]

where · is the scalar product. d(u,v) is then inserted in place in the sorted list of deviation values that corresponds to pixel (u, v).  Thomas: sec. 4.3 para. 3); and 
translating the face key point in the baseline face model according to a weighted summation result (e.g., We do not update the value of the Deviation image at pixel (u,v) when the corresponding point pu,v is either farther than 1 cm to the line Lˆx(u, v), farther than τ cm to the point Px(u, v) (with τ = 3 if the list is empty and τ = 1 otherwise), or when the difference in angle between the normal vector of pu,v and Nx(u, v) is greater than 45 degrees. Thomas: sec. 4.3 para. 4. At each frame, we apply a bilateral gaussian filter (with a window size of 3 × 3 pixels) to the Deviation image to remove outliers. Thomas: sec. 4.3 para. 5).
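For illustration only, the per-pixel deviation update quoted from Thomas sec. 4.3 (search range λ, distance d(u,v) along the normal direction, sorted insertion, and the three rejection tests) may be pictured with the Python sketch below. The sign convention of the distance, the assumption of unit-length normals, and all helper names are assumptions made for this sketch; the bilateral filtering step is not shown.

```python
import bisect
import math

def dot(a, b):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def search_half_length_cm(deviations):
    """Lambda: 5 cm for an empty list, otherwise max(1, 5/s) cm, s = list size."""
    s = len(deviations)
    return 5.0 if s == 0 else max(1.0, 5.0 / s)

def signed_deviation(p, q, n):
    """Distance from p to q along unit direction n (sign convention assumed)."""
    return dot([pi - qi for pi, qi in zip(p, q)], n)

def accept_update(dist_to_line_cm, dist_to_point_cm, normal_p, normal_model, deviations):
    """Apply the three rejection tests quoted from Thomas sec. 4.3 para. 4."""
    tau = 3.0 if not deviations else 1.0
    cos_angle = dot(normal_p, normal_model)  # assumes unit-length normals
    angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return dist_to_line_cm <= 1.0 and dist_to_point_cm <= tau and angle_deg <= 45.0

def insert_deviation(deviations, d):
    """Insert d in place so the per-pixel list stays sorted."""
    bisect.insort(deviations, d)
```

Keeping each per-pixel list sorted makes it inexpensive to read off an order statistic such as the median, although how the list is consumed is not stated in the quoted passage.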

Regarding claims 9-10, the claims are device claims of method claims 4-5 respectively.  The claims are similar in scope to claims 4-5 respectively and they are rejected under similar rationale as claims 4-5 respectively.

Regarding claims 14-15, the claims are computer readable storage medium claims of method claims 4-5 respectively.  The claims are similar in scope to claims 4-5 respectively and they are rejected under similar rationale as claims 4-5 respectively.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
a).	De la Torre (9,799,096) teaches that “A system and method for real-time image and video face de-identification that removes the identity of the subject while preserving the facial behavior is described. The facial features of the source face are replaced with that of the target face while preserving the facial actions of the source face on the target face. The facial actions of the source face are transferred to the target face using personalized Facial Action Transfer (FAT), and the color and illumination is adapted. Finally, the source image or video containing the target facial features is outputted for display. Alternatively, the system can run in real-time.” (De la Torre: Abstract).
b).	Wei et al. (“A Real Time Face Tracking And Animation System,” Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’04)) teaches that “In this paper, a novel system for real time face tracking and animation is presented. The system is composed of two major components: (1) real time infra-red (IR) based active facial feature tracking, and (2) real time facial expression generation based on a 3D face avatar. Twenty-two feature points, head pose orientation and eye close-open status are effectively extracted through a video input. Based on the detected facial features, a 3D face model is animated by a dynamic inference algorithm and a transformation from facial motion parameters to facial animation parameters. The work can be extended to the fields of real time facial expression analysis and synthesis for applications of human-computer interaction, model-based video conferencing and low bit rate avatar communication. The performance of the developed system is evaluated by its real time implementation for facial expression generation.” (Wei: Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SING-WAI WU whose telephone number is (571)270-5850. The examiner can normally be reached 9:00am - 5:30pm (Central Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SING-WAI WU/Primary Examiner, Art Unit 2611