Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s amendment filed on 8/23/2022 has been entered. Claims 1-15 remain pending in the present application.

Response to Arguments
Applicant argued that the technique used in the present invention to generate the body model of the first person is different from that of Haring (Remarks, pages 8-10). However, the present amendment did not incorporate that difference into claim 1, and as such, the cited text of Haring still “reads” on claim 1. If Applicant believes a feature is novel, that feature should be recited in the claim language. Although claims are interpreted in light of the specification, limitations from the specification are not read into the claims.
Applicant argued that the present invention prepares lip-sync of any photo without the need of recording a person while he or she is speaking. In contrast, Wang’s technique needs at least 20 minutes of recording of audio/video footage of the person (Remarks, pages 10-12). This argument has been fully considered, but is moot because Haring was cited to teach this feature, not Wang. As shown on page 6 of the Office action dated 2/23/2022, par. 5 of Haring discloses “In some embodiment, the avatar's body is remotely controlled by the user that has generated the avatar by sending commands of body animations stored in all devices, to make the 3D avatar, for example, walk, jump, run in circles or simply move or act as much and as how as the user wishes to”.
Applicant’s arguments regarding other references used in the rejection of dependent claims are not persuasive because these references were cited to teach additional limitations recited in their respective claims; they do not have to be identical to the instant invention.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-9, 11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Haring (Pub. No. US 2016/0110922), in view of Wang et al. (“HMM trajectory-guided sample selection for photo-realistic talking head”, 2015), and further in view of Wang et al. (Pub. No. 2010/0082345; “Wang ‘345” hereinafter).

Regarding claim 1, Haring discloses a method for providing visual sequences using one or more images comprising:
receiving one or more images of a first person showing at least one face of the first person (Pars. 37: “At block 200 metadata is received from a remote computer device... In some embodiments, the metadata includes image properties taken from a 2D frontal photo of a face. Such image properties are taken from an image that is inputted by the user of the remote device and which changes the mesh proportions of the 3D model's head and face accordingly”, and 52: “In some cases, the sender customizes the avatar to resemble the user. In some embodiments, the image of the sender is projected on the avatar to customize the avatar to resemble the user”. In particular, the first person is the user of the remote computer device);
using human body information to identify requirement of one or more body parts to generate a body model of the first person (Pars. 5: “In some embodiments, the enhancement or customization of the 3D avatar includes changes in the measurements of the mesh of the 3D model according to an inputted image and texture projection of the same image over the 3D model. For example, a real image of the face or the body of the user may enhance the avatar to resemble the user's skin texture, color, head and face parts sizes and proportions”, and 37: “In one other embodiment, the metadata is included in a voice message that is sent from the remote computer device. The metadata includes information for generating an avatar; for example identification of the avatar and identifications of the properties of the avatar. Such properties may be colors, shape, hair, skin, size, etc.”);
receiving at least one image or photograph of the one or more body parts based on the identified requirement, wherein the one or more body part belongs to one or more second persons or is generated based on image processing techniques (See Fig. 5A. The image of the avatar is based on image processing techniques); 
processing the one or more images of the first person with the at least one image or photograph of the one or more human body parts of using the human body information to generate the body model of the first person, the body model comprises a face of the first person (See par. 37 and Fig. 5A);
receiving a message to be enacted by the first person, wherein the message comprises at least a text or an emotional and movement command (Par. 5: “In some embodiment, the avatar's body is remotely controlled by the user that has generated the avatar by sending commands of body animations stored in all devices, to make the 3D avatar, for example, walk, jump, run in circles or simply move or act as much and as how as the user wishes to”);
processing the message to extract or receive audio data related to a voice of the first person (Par. 6: “The remote devices receive the recorded data and recorded audio, generate the sending user's avatar according to the recorded data, augment this newly generated avatar in the receiving device's current surroundings and play the audio message while moving the avatar's body according to the recorded data”), and facial movement data related to an expression to be carried on the face of the first person (Par. 8: “Additionally the face expression of the avatar may be changed in accordance with the audio recording of the user”);
processing the body model of the first person, the audio data, and the facial movement data, and generating an animation of the body model of the first person enacting the message (See par. 6),
wherein the emotional and movement command is a GUI or multimedia-based instruction to invoke the generation of one or more facial expression and/or one or more body parts movement (Fig. 5A shows a microphone icon (a GUI element) in the lower left corner of a mobile phone. This suggests that the user of the mobile phone can press this GUI element to record and send an audio/voice message/command).

Haring does not disclose using a trained model to: (1) make lips of the first person in the one or more images move in synchronization with the expression to be carried on the face of the first person, and to (2) make human body part movements in the body model of the first person.
In the same field of lip synchronization, Wang teaches limitation (1) (Pg. 9850, 2nd par: “a talking head needs to be not just photo-realistic in a static appearance, but exhibit convincing plastic deformations of the lips synchronized with the corresponding speech, realistic head movements and natural facial expressions. In this paper, we introduce the whole system for constructing personalized photo-realistic talking head from video footage, and focus on the articulator movements rendering (including lips, teeth, and tongue), which is the most eye-catching region on a talking face”, and 3rd par: “The photo-realistic talking head we proposed consists of two, training and synthesis, parts. In training, an audio/visual database is recorded and used to train a statistical Hidden Markov Model (HMM)”. See also the Conclusions section).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Haring by using a trained model to make lips of the first person in the one or more images move in synchronization with the expression to be carried on the face of the first person, as taught by Wang. The motivation would have been to render a photo-realistic video of the first user in sync with given speech signals by searching for the closest real image sample sequence in the library to the HMM predicted trajectory (Wang, Conclusions section).
Also in the same field of body animation synthesis, Wang ‘345 teaches limitation (2) (Par. 50: “In general, animation model training involves adapting the framework of a probabilistic model based text-to-speech synthesis process to model "animation units" for various body parts (e.g., eyes, eyebrows, mouth, nose, ears, face, head, hands, arms, etc.)”, and par. 56: “More specifically, one or more synchronized audio/video inputs are processed using object detection and recognition techniques in combination with probabilistic modeling techniques to learn probabilistic motions (i.e., "animation units") corresponding for each different body part, including, for example, lip sync motions, head motions, hand and/or finger motions, facial expressions, eye blink, etc.”).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Haring by using a trained model to make human body part movements in the body model of the first person, as taught by Wang ‘345. The motivation would have been to make full body animation.

Regarding claim 2, Haring in view of Wang and Wang ‘345 teaches the method of claim 1, wherein the message is received as an input from the first person (Haring, par. 5: “In some embodiment, the avatar's body is remotely controlled by the user that has generated the avatar by sending commands of body animations stored in all devices, to make the 3D avatar, for example, walk, jump, run in circles or simply move or act as much and as how as the user wishes to”).

Regarding claim 3, Haring in view of Wang and Wang ‘345 teaches the method according to claim 2, wherein the message comprises the audio data (Haring, par. 6: “The remote devices receive the recorded data and recorded audio, generate the sending user's avatar according to the recorded data, augment this newly generated avatar in the receiving device's current surroundings and play the audio message while moving the avatar's body according to the recorded data”).

Regarding claim 4, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, wherein the message comprises a body movement data related to movement of the one or more body parts of the first person (Haring, par. 5: “In some embodiment, the avatar's body is remotely controlled by the user that has generated the avatar by sending commands of body animations stored in all devices, to make the 3D avatar, for example, walk, jump, run in circles or simply move or act as much and as how as the user wishes to”), the method further comprises:
processing the body model, the audio data, the body movement data and the facial movement data, and generating an animation of the body model of the first person enacting the message with the movement of the one or more body parts (Haring, par. 6: “The remote devices receive the recorded data and recorded audio, generate the sending user's avatar according to the recorded data, augment this newly generated avatar in the receiving device's current surroundings and play the audio message while moving the avatar's body according to the recorded data”).

Regarding claim 5, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, comprising:
processing the message and the audio data to produce lip sync data; 
processing the body model, the audio data, the facial movement data, the lip sync data and generating an animation of the body model of the first person enacting the message with lip sync (Haring, par. 6 as modified by Wang).

Regarding claim 6, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, wherein the one or more images of the first person comprises faces of more than one person including the face of the first person, the method further comprising:
- receiving messages to be enacted by the first person and other one or more persons shown in the one or more images, in an order, 
- processing the messages to extract or receive the audio data related to voice of the first person and the other one or more persons shown in the one or more images, and the facial movement data related to expressions to be carried on the faces of the first person and the other one or more persons shown in the one or more images, 
- processing the body models of the first person and the other one or more persons shown in the one or more images, the audio data, and the facial movement data, and generating an animation of the body models of the first person and the other one or more persons shown in the one or more images enacting the messages in the respective order as provided (Haring suggests these limitations in pars. 32: “When in a communication session the display unit 1024 displays the avatars of all the users that participate in the session on the live video stream that is captured by the computer device 101 during the voice communication session”, and 33: “The audio unit 1025 is configured for playing audio streams that are received from the other users. The audio may be played as a result of receiving a voice message or during a voice communication session”).

Claim 8 recites limitations similar to those of claim 1, and further recites additional limitations related to a chat environment established between two users. Since Haring also teaches a video chat environment, claim 8 is rejected under the same rationale set forth in the rejection of claim 1.

Regarding claim 9, Haring in view of Wang and Wang ‘345 teaches the method according to claim 8, wherein the chat message from a first computing device is received at a second computing device, and processing the body models, the audio data, and the facial movement data, and generating an animation of the first person enacting the chat message in the chat environment, and displaying the animation on a display of the second computing device in the chat environment (See Fig. 3 of Haring).

Regarding claim 11, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, comprising:
receiving a wearing input related to a body part of the body model onto which a fashion accessory is to be worn; 
processing the wearing input and identifying body part/s of the body model onto which the fashion accessory is to be worn; 
receiving an image/video of the accessory according to the wearing input;
processing the identified body part/s of the body model and the image/video of the accessory and generating a view showing the body model wearing the fashion accessory, 
processing the view, the audio data, and the facial movement data, and generating an animation of the persons enacting the message wearing the fashion accessory (See Fig. 5A of Haring, as well as par. 95: “FIGS. 5A and 5B show an exemplary screen capture of an enhanced voice communication in accordance with some exemplary embodiments of the disclosed subject matter. FIG. 5A shows an avatar 500 that is generated by a computer device A of user A. The avatar 500 is customized to resemble user A. The avatar 500 is customized, for example, with clothing items 501. The avatar 500 is sent from the computer device A of user A to the computer device B of user B at the beginning of the voice session”).

Regarding claim 13, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, comprising: 
receiving an image of a cloth and combining the body model of the first person and the image of the cloth to show the body model of the first person wearing the cloth (Haring, Fig. 5A and par. 95).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Haring in view of Wang and Wang ‘345 as applied to claim 1 above, and further in view of McCulloch (Pub. No. US 2016/0134840).

Regarding claim 7, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, wherein the one or more images of the first person comprises faces of more than one person including the face of the first person, the method comprising:

In the same field of video communications, McCulloch teaches a video communication system wherein avatars representing a plurality of users are rendered and animated in a generated scene (See Fig. 21 and pars. 365-375).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further incorporate the teaching of McCulloch into Haring by generating a scene showing one or more body models of the first person and the other one or more persons shown in the one or more images with faces based on a selection input, and processing the scene, the audio data, and the facial movement data, and generating an animation of the body models of the first person and the other one or more persons shown in the one or more images enacting the message. The motivation would have been to reduce processor power as compared to stitching together live video streams (McCulloch, par. 374).

Claims 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Haring in view of Wang and Wang ‘345 as applied to claims 9 and 1, respectively, above, further in view of Winchester, and still further in view of McCulloch (Pub. No. US 2016/0134840).

Regarding claim 10, Haring in view of Wang and Wang ‘345 teaches the method according to claim 9, comprising:
receiving at least one image representative of more than one first person in the chat environment (Haring, par. 32), 


processing the images of the first person with the images of other human body parts using the human body information to generate a body model of each of the first persons for whom the body parts are required, the body model comprises face of the first persons, 
processing the body model, images of the first persons for whom the body parts were not required and generating an 
receiving a message from at least one of the first persons in the chat environment, wherein the message comprises at least a text or an emotional and movement command, 
processing the message to extract or receive the audio data related to voice of the first person from whom the message is received, and the facial movement data related to expression to be carried on face of the first person from whom the message is received, 
processing the scene, the audio data, and the facial movement data, and generating an animation of the first person enacting the message in the chat environment,
wherein emotional and movement command is a GUI or multimedia based instruction to invoke the generation of facial expression/s and or body part/s movement (For the limitations that are not lined through, see the rejection of claim 1 above).
In the same field of endeavor, Winchester teaches animating facial expressions of two avatars representing two different users who are captured in the same image (Fig. 4 and par. 37).
In light of the above teaching of Winchester, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Haring to receive at least one image representative of more than one user in the chat environment, and generate a body model for each of the users. The motivation would have been to enhance users' experience by providing the users with realistic emotions captured in real time.
Furthermore, McCulloch teaches a video communication system wherein avatars representing a plurality of users are rendered and animated in a generated scene (See Fig. 21 and pars. 365-375).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further incorporate the teaching of McCulloch into Haring by generating a scene showing one or more body models of the first person and the other one or more persons shown in the one or more images with faces based on a selection input, and processing the scene, the audio data, and the facial movement data, and generating an animation of the body models of the first person and the other one or more persons shown in the one or more images enacting the message. The motivation would have been to reduce processor power as compared to stitching together live video streams (McCulloch, par. 374).

Claim 15 recites limitations similar to those of claim 1, with additional limitations related to generating body models for a plurality of users in an image and generating a scene showing the plurality of users. Since Winchester teaches animating facial expressions of two avatars representing two different users who are captured in the same image, and McCulloch teaches generating a scene, claim 15 is rejected using the same rationales set forth in the rejection of claims 1 and 10 above.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Haring in view of Wang and Wang ‘345 as applied to claim 1 above, and further in view of Afifi et al. (“Video Face Replacement System Using a Modified Poisson Blending Technique”, 2014).

Regarding claim 12, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, comprising:



In the same field of video editing, Afifi teaches generating a morphed video showing a user’s face from a first video on a user’s body from a second video (See Abstract and Fig. 2).
In light of the above teaching of Afifi, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Haring by receiving a target image showing a face of another person or animal, processing the body model and the target image to generate a morphed body model showing the face from the target image on the person's body model, processing the morphed body model, the audio data, and the facial movement data, and generating an animation of the morphed body model enacting the message. The motivation would have been to provide a means for face replacement (Afifi, section I. Introduction, 1st paragraph).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Haring in view of Wang and Wang ‘345 as applied to claim 1 above, and further in view of Corazza et al. (Pub. No. US 2016/0027200).

Regarding claim 14, Haring in view of Wang and Wang ‘345 teaches the method according to claim 1, comprising: 

processing the body model, the audio data, and the facial movement data, and generating an animation of the body model of the person enacting the message (Haring, par. 6).
In the same field of animation, Corazza teaches receiving an animation input related to nodes of skeleton of the body model, wherein the skeleton of the body model is thinned down structure of the body model (See pars. 44 and 75).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Haring by receiving an animation input related to nodes of skeleton of the body model, wherein the skeleton of the body model is thinned down structure of the body model, as taught by Corazza. The motivation would have been to drive 3D mesh deformation, or movements of polygons within a 3D mesh, in response to movement by the 3D model (Corazza, par. 44).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHONG X NGUYEN whose telephone number is (571)270-1591. The examiner can normally be reached Mon-Fri 8am - 5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571)272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PHONG X NGUYEN/           Primary Patent Examiner, Art Unit 2613