Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/17/2022 has been entered. Claims 1-14 remain pending in the present application.

Response to Arguments
Applicant’s arguments regarding claim 1 have been considered but are moot because the new ground of rejection no longer relies on Bouguerra as the primary reference, nor does it rely on the Challapali reference.
Applicant’s arguments regarding the dependent claims are not persuasive because the references used in the rejection of these claims were cited to teach the limitations recited in these claims. Arguing that these references are not the same as the invention is not persuasive.

Claim Objections
The examiner suggests the following amendment to claim 1 to clarify the claim language:
“wherein a trained model is utilized to make lips of the person in the one or more ”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (“HMM trajectory-guided sample selection for photo-realistic talking head”, 2015), in view of Bouguerra (Pub. No. US 2012/0069028).

Regarding claim 1, Wang discloses a method for providing visual sequences using one or more images comprising:
receiving one or more images showing at least a face of a person (Page 9853, 2nd paragraph: “The remaining task is to stitch the lips image sequence into a full face background sequence”. In particular, one or more full face background images are received, and a mouth sample sequence from the image ,
receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command (Pg. 9851, last par: “At the synthesis stage, the input can be natural speech, or for any given text, the audio is firstly synthesized by Text-to-Speech (TTS)”),
processing the message to extract or receive audio data (Pg. 9851, last par: “At the synthesis stage, the input can be natural speech, or for any given text, the audio is firstly synthesized by Text-to-Speech (TTS)”), and facial movement data related to an expression to be carried on a face of the person (Pg. 9853, last par: “We use a mouth replacement mask to specify which region of the final video come from the selected lips image sequence and which come from the background video. Figure 4 shows an example of mouth region mask, applied to lips images and background images. In a similar way, we replace the upper face region like the eyes and eye brow by using an eye mask for rendering eye blinking”. In particular, eye movement and lip shapes (such as those illustrated in Fig. 11) constitute facial expressions),
processing the one or more images, the audio data, and the facial movement data, and generating an animation of the person enacting the message (See Fig. 4, also pg. 9854, 1st par: “The final rendered video is photo-, video-realistic, lip sync with speech, also with natural head motion”),

wherein a trained model is utilized to make lips of the person in the one or more images move in synchronization with the expression to be carried on the face of the person (Pg. 9850, 2nd par: “a talking head needs to be not just photo-realistic in a static appearance, but exhibit convincing plastic deformations of the lips synchronized with the corresponding speech, realistic head movements and natural facial expressions. In this paper, we introduce the whole system for constructing personalized photo-realistic talking head from video footage, and focus on the articulator movements rendering (including lips, teeth, and tongue), which is the most eye-catching region on a talking face”, and 3rd par: “The photo-realistic talking head we proposed consists of two, training and synthesis, parts. In training, an audio/visual database is recorded and used to train a statistical Hidden Markov Model (HMM)”).
Wang, however, does not disclose the above strike-through limitation.
In the same field of facial animation, Bouguerra teaches using GUI-based instructions to invoke the generation of one or more facial movements (Par. 74: “Video emoticons menu 506 may be used to select a video emoticon. Chat box 508 provides an alternative means for a user to select a video emoticon”. See also pars. 75-76).
In light of Bouguerra’s teaching, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to extend Wang’s talking head technique to the field of video chat where a user could use GUI-based text or speech to invoke the generation of one or more facial movements of a talking head. The motivation would have been to improve realism in the chat environment.

Regarding claim 2, Wang in view of Bouguerra teaches the method according to claim 1, wherein the message is received as an input from a user (Wang, pg. 9851, last par: “At the synthesis stage, the input can be natural speech, or for any given text, the audio is firstly synthesized by Text-to-Speech (TTS)”).

Regarding claim 3, Wang in view of Bouguerra teaches the method according to claim 2, wherein the message comprises the audio data (Wang, pg. 9851, last par: “At the synthesis stage, the input can be natural speech, or for any given text, the audio is firstly synthesized by Text-to-Speech (TTS)”).

Regarding claim 4, Wang in view of Bouguerra teaches the method according to claim 1, wherein the message comprises a body movement instruction related to movement of one or more body parts of the person (Bouguerra, par. 75).

Regarding claim 5, Wang in view of Bouguerra teaches the method according to claim 1, comprising:
processing the message and the audio data to producing lip sync data; 
processing the one or more images, the audio data, the facial movement data, the lip sync data and generating an animation of the person enacting the message with lip sync (Wang, Abstract).

Regarding claim 6, Wang in view of Bouguerra teaches the method according to claim 1, wherein the images comprise faces of more than one person (Since Bouguerra discloses an emoticon-based animation technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:
receiving messages to be enacted by a person in an order (In the video chat environment of Bouguerra, the first user sends a first message, and the second user replies with a second message. This could be interpreted as receiving the first and second messages in an order), 
processing the messages to extract or receive the audio data related to voice of the persons, and the facial movement data related to expressions to be carried on faces of the persons (Since the talking head technique of Wang is utilized in a video chat environment, a person skilled in the art would infer that each message, whether it be from the first user or the second user, would be processed in the same manner), 
processing the one or more images, the audio data, and the facial movement data, and generating an animation of the person enacting the messages in the respective order as provided (Again, a person skilled in the art would infer that in Wang as modified by Bouguerra, if the first user sends the first message at time t0, the first message would be processed at time t0 to generate a first animation. Then, if the second user replies with the second message at time t1 (t1 > t0), the second message would be processed at time t1 to generate a second animation).

Regarding claim 7, Wang in view of Bouguerra teaches the method according to claim 1, wherein the images comprises faces of more than one person (Since Wang as modified by Bouguerra is directed to a talking head technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:
receiving a selection input to select one or more person with one or more faces from the one or more images received, 
generating a scene image showing the one or more persons with one or more faces based on the selection input, 
processing the scene image, the audio data, and the facial movement data, and generating an animation of more than one person enacting the message (Bouguerra, par. 64: “In one embodiment, when a user of a client device selects an animated video emoticon, the animated video emoticon is applied to the video captured by the first client device before it is transmitted to another client device. However, the user of the client device may additionally or alternatively select to apply an animated video emoticon to a video stream received from the other client device. For example, a first friend may want to see what his video-chat buddy would look like `surprised`, and so the first friend may invoke the `surprised` video emoticon on the video stream depicting his buddy”. In particular, instead of applying the animation to the user’s own images, the user can select a friend that he or she is chatting with, and apply the animation to the friend’s images).

Regarding claim 8, Wang in view of Bouguerra teaches the method according to claim 1, comprising:
receiving a chat request made by a user with at least another user,
establishing a chat environment between the users based on the chat request,
receiving at least one image representative of at least one of the users, wherein the image comprising at least one face,
receiving a message from at least one of the users in the chat environment, wherein the message comprises at least a text or an emotional and movement command (Since Bouguerra teaches animating facial images in a video chat environment, a person skilled in the art would infer that the above steps (i.e. receiving a chat request, establishing a chat environment, receiving images of users, and receiving messages exchanged between the users) are inherent in any video chat application).

Regarding claim 9, Wang in view of Bouguerra teaches the method according to claim 8, wherein the message from a first computing device is received at a second computing device (Fig. 1 of Bouguerra suggests this limitation because Bouguerra teaches a video chat environment. For example, a user of video chat client device 101 can send a message with emoticons to video chat client device 102), and processing the one or more images, the audio data, and the facial movement data, and generating an animation of the person enacting the message in the chat environment (Bouguerra, pars. 64, 69 and 75 and Wang, Abstract).

Claim(s) 10 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bouguerra as applied to respective claims 9 and 1 above, and further in view of Sun et al. (Pub. No. US 2007/0216675).

Regarding claim 10, Wang in view of Bouguerra teaches the method according to claim 9, wherein the images comprise faces of more than one person (Since Wang in view of Bouguerra is directed to a talking head technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:


processing 
In the same field of video chatting/conferencing, Sun teaches receiving images representative of more than one users in a chat environment, and processing the images, and generating a scene image showing the users in the chat environment (See Fig. 2 and pars. 24 and 29. In particular, a user can select an arbitrary background image to replace a current background in an image. The background image could be interpreted as a scene image).
In light of the above teaching of Sun, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Wang by receiving at least one image representative of more than one users in the video chat environment, processing the at least one image, and generating a scene image showing the users in the chat environment, and processing the scene image, the audio data, and facial movement data, and generating an animation of the person enacting the message in the chat environment. The motivation would have been to give the user an option to replace the current background with a scene image that he or she likes.

Regarding claim 11, Wang in view of Bouguerra teaches the method according to claim 1, comprising:




processing 
In the same field of video chat, Sun teaches the above strike-through limitations (See pars. 30-31 and Fig. 3).
In light of the above teaching of Sun, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Wang by receiving a wearing input related to a body part of at least one of the persons in the images onto which a fashion accessory is to be worn, processing the wearing input and identifying body parts of the person onto which the fashion accessory is to be worn, receiving an image or video of the accessory according to the wearing input, processing the identified body parts of the person and the image or video of the accessory and generating a wearing image showing the person wearing the fashion accessory, processing the wearing image, the audio data, and the facial movement data, and generating an animation of the persons enacting the message wearing the fashion accessory. The motivation would have been to give the user an option to incorporate digital effects into existing video streams.

Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bouguerra as applied to claim 1 above, and further in view of Afifi et al. (“Video Face Replacement System Using a Modified Poisson Blending Technique”, 2014).

Regarding claim 12, Wang in view of Bouguerra teaches the method according to claim 1, comprising:


processing 
In the same field of video editing, Afifi teaches generating a morphed video showing a user’s face from a first video on a user’s body from a second video (See Abstract and Fig. 2).
In light of the above teaching of Afifi, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Wang by receiving a target video showing a face of another user, processing the user’s video and the target video to generate a morphed video showing the face from the target video on the user's body from the user’s video, and processing the morphed video, the audio data, and the facial movement data to generate an animation of the user enacting the message. The motivation would have been to provide a means for face replacement (Afifi, section I. Introduction, 1st paragraph).

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bouguerra as applied to claim 1 above, and further in view of Senftner et al. (Pub. No. US 2008/0019576).

Regarding claim 13, Wang in view of Bouguerra teaches the method according to claim 1, comprising:

processing the one or more person images, 
In the same field of video editing, Senftner teaches replacing the body of a first user in a video with the body of a second user (Par. 58: “The initial description of the processes will be made using an example case where the video is personalized by substituting the image of the face of a new actor for the facial portion of the image of one of the video's original actors. Within this specification, the terms face and facial should be interpreted to include the visible portions of the ears, neck, and other adjacent skin areas unless otherwise noted. The same processes can be applied to substituting a larger portion of the new actor for the corresponding portion of the original actor, up to and including full-body substitution”).
In light of the above teaching of Senftner, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Wang by receiving a target image or video showing a body of another user, and processing the one or more person images, the target image or video, the audio data, and the facial movement data to generate an animation of the user enacting the message with the body of said another user from the target image or video. The motivation would have been to provide a personalized video.

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bouguerra as applied to claim 1 above, and further in view of Vronay et al. (Pub. No. US 2006/0251384).

Regarding claim 14, Wang in view of Bouguerra teaches the method according to claim 1, the method is being implemented in a video call environment between at least two callers (Bouguerra: “FIG. 5A illustrates a non-limiting, non-exhaustive example of a video-chat session. User 502 appears in video-chat session 504 on a client device of another user. The other user may optionally be viewing a similar video-chat session on his client device”), the method comprising:
- receiving an image of the caller (Bouguerra: “In one embodiment, video chat client 243 may support video-chat sessions, wherein a video of a user may be captured using video capture device 259 and streamed to another user for display with display 254. Additionally or alternatively, a video of the other user may be captured and streamed to video chat client device 200 for display with display 254”),
- receiving a message to be enacted by the caller 
- processing the message to extract or receive an audio data related to voice of the caller 
- processing the image, the audio data, and the facial movement data, and generating an animation of the caller 
wherein emotional and movement command is a GUI or multimedia based instruction to invoke the generation of facial expressions and/or the body parts movement (Bouguerra, par. 74).
Bouguerra does not disclose that at least one of the users is not using a video camera.
In the same field of video chat/conference, Vronay teaches a video conferencing system wherein at least one of the participants is using previously-stored image data, without using a video camera (Par. 53: “it is noted that previously stored image data can be input into the computer 110 from any of the aforementioned computer-readable media as well, without directly requiring the use of a camera”).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to incorporate the teaching of Vronay into Wang as modified by Bouguerra such that at least one of the users is using previously-stored image data, without using a video camera. The motivation would have been to provide support for devices that lack a video camera.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHONG X NGUYEN whose telephone number is (571)270-1591. The examiner can normally be reached Mon-Fri 8am - 5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571)272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PHONG X NGUYEN/           Primary Patent Examiner, Art Unit 2613