Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s amendment filed on 6/9/2021 has been entered. Claims 1-14 remain pending in the present application.

Response to Arguments
Applicant argued: “Bouguerra is putting some image/video showing emoticon on the face image/video which is not like generating a realistic lisping [sic] on an image uploaded by user which requires trained model to make lips move in sync on the text/voice data and generate expression on face. It is simply putting that effect in overlay,” (Remarks, page 9).
Examiner’s response: A claim is rejected based on limitations explicitly recited in it, not based on limitations described in the specification but not recited in the claim. In other words, limitations from the specification are not read into the claim (see MPEP, section 2111 Claim Interpretation; Broadest Reasonable Interpretation [R-07.2015]). In this case, claim 1 did not recite a “realistic lip sync” or a “trained model”. Therefore, the argument is not persuasive. If the Applicant believes these limitations are novel, they should be incorporated into claim 1. For example, claim 1 could be amended to recite “...wherein a 

Applicant argued: “With respect to Challapali, the application shall like to emphasize that the current patent application clearly explains in whole description and in claim 1, that the image provided by user is not a pre-designed computer graphic image,” (Remarks, page 11).
Examiner’s response: Challapali was cited for teaching the limitation “converting a text message into a spoken message” that was not disclosed by Bouguerra (see page 6 of the previous office action). Therefore, whether the user-provided image in Challapali is of a different type or the same type is not relevant.

Applicant argued: “Wang is also not directly related to our invention. In Wang, they are also creating a 3D head. We use real photo, as in it the hair, the face skin and parts are real so while it talks, it looks like real video chat/message. This is our goal to achieve the realism while using 3D head this can't be achieved,” (Remarks, page 12).
Examiner’s response: As with the Challapali reference, Wang was cited only for teaching the limitation “producing lip sync based on audio data” that was not disclosed by Bouguerra (see page 11 of the previous office action).

Applicant argued: “the Applicant emphasizes that Sun is nowhere related to the current invention. Sun shows how 3D things can appear on a 2d video and provide effect like blurring, etc. to make it feel like merged. Sun uses the face detection algorithms to detect the feature points,” (Remarks, pages 13-14).
Examiner’s response: Again, Sun was cited only for teaching the limitation “receiving images representative of more than one users in a chat environment, and processing the images, and generating a scene image showing the users in the chat environment,” (see page 12 of the previous office action). Arguing that Sun’s invention is different is moot.

Applicant argued: “With respect to Afifi, the citation talks about new blending technique by which it replaces source face on a target face and merge it by blending. This is not related to the current invention, as wherever the technology of the current invention is switching the face of one person or animal to other, it is replacing it along with the neck which is different. As in case of Afifi, the face and hair are of same person but the portion including eyes, nose and lips are of other person. So, it looks like the mixture of source and target while in our case it is exactly the same person which have face, hair and everything of that person,” (Remarks, page 14).
Examiner’s response: Claim 12 did not recite in detail how the morphing is performed. Rather, it only recited: “generate a morphed image showing the face from the target image on the person's body from the person image.” Therefore, arguing that the invention is different because it replaces the face along with the neck, hair, and eyes is moot.

Applicant argued: “With respect to Senftner, which shows the alteration in background and adding other person in same frame or else which shows the functionality of a video editing software which don't produce results like us,” (Remarks, page 15).
Examiner’s response: Senftner was cited only for teaching the limitation "replacing the body of a first user in a video with the body of a second user," (see pages 15-16 of the previous office action). Therefore, it is moot to argue that Senftner “don’t produce results like us”.

Applicant argued: “With respect to Vronay, it discloses a technology which is a video editing software, which detect the face, and specially in a group video call, it can detect the talking face and can zoom it and do other editing activities. It has nothing to do with generation of a video through a photo and voice message, as claimed in claim 1 of the current patent application,” (Remarks, page 15).
Examiner’s response: Vronay was cited only for teaching a video conferencing system wherein at least one of the participants is using previously-stored image data, without using a video camera (see page 17 of the previous office action). Therefore, it is moot to argue that Vronay “has nothing to do with generation of a video through a photo and voice message, as claimed in claim 1 of the current patent application”.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In particular, claim 1 recites “receiving a message to be enacted by the person”. There is insufficient antecedent basis for “the person”, and it is not clear what person is being referred to. The examiner suggests the following amendment to the preceding paragraph: “receiving one or more images showing at least a face of a person” to obviate this lack of antecedent basis.
Claims 7 and 10 recite “the persons enacting the message” (note the plural form “persons”). However, there is only one person enacting the message, as recited in the parent claim 1.
Claim 11 recites “receiving a wearing input related to a body part of at least one of the person”. However, there is only one person recited in parent claim 1.
Claim 12 recites “showing the face from the target image on the person’s body from the person image”. It is not clear which person image is being referred to, because there may be more than one person image, as recited in parent claim 1.
Claim 14 recites “receiving a message to be enacted by the caller”, “audio data related to voice of the caller”, “expression to be carried on face of the caller”, and “generating an animation of the caller”. Since there are at least two callers, it is not clear which caller is being referred to.
Corrections are required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4 and 6-9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra (Pub. No. US 2012/0069028), in view of Challapali (Pub. No. US 2002/0194006).

Regarding claim 1, Bouguerra discloses a method for providing visual sequences using one or more images comprising: 
receiving one or more person images showing at least one face (Par. 49: “In one embodiment, video chat client 243 may support video-chat sessions, wherein a video of a user may be captured using video capture device 259 and streamed to another user for display with display 254. Additionally or alternatively, a video of the other user may be captured and streamed to video chat client device 200 for display with display 254”. A video comprises a plurality of image frames),
receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command (Par. 61: “The video emoticon may be selected from a menu, or a video emoticon may be selected through text input. For example, a ‘smiley’ video emoticon may be selected by typing ":-)" into a chat window associated with the video-chat. Additionally or alternatively, a video emoticon may be selected from a graphical interface”. In particular, the smiley video emoticon is an emotional and movement command because it results in an emotion (happiness) and movement (animation of the lips). It could be inputted in textual (such as “:-)”) or graphical form as a message to be enacted by the user. See also par. 74: “Video emoticons menu 506 may be used to select a video emoticon. Chat box 508 provides an alternative means for a user to select a video emoticon, as discussed above”),
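processing the message to extract or receive facial movement data related to an expression to be carried on the face of the person (Par. 65),
processing the one or more person images and the facial movement data, and generating an animation of the person enacting the message (Pars. 64, 69 and 75),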
wherein an emotional and movement command is a GUI or multimedia-based instruction to invoke the generation of one or more facial and/or one or more body parts movement (Par. 74: “Video emoticons menu 506 may be used to select a video emoticon. Chat box 508 provides an alternative means for a user to select a video emoticon”. See also pars. 75-76).
Bouguerra does not disclose processing the message to extract or receive audio data and processing the audio data as part of the animation generation.
However, it is well known in the art that a text message can be transformed into a spoken message. For example, Challapali discloses in par. 7: “’Text to visual speech’ systems utilize a keyboard or the like to enter text, then convert the text into a spoken message, and broadcast the spoken message along with an animated face image”.
In light of the above teaching of Challapali, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to incorporate into Bouguerra a text-to-speech system such that a text message from the user could be transformed into audio data related to a voice of the user, and the audio data would then be used in the animation of the user’s video. The suggestion would have been to add audio to the animated video.

Regarding claim 2, Bouguerra in view of Challapali teaches the method according to claim 1, wherein the message is received as an input from a user (Bouguerra, par. 61: “The video emoticon may be selected from a menu, or a video emoticon may be selected through text input. For example, a ‘smiley’ video emoticon may be selected by typing ":-)" into a chat window associated with the video-chat. Additionally or alternatively, a video emoticon may be selected from a graphical interface”).

Regarding claim 3, Bouguerra in view of Challapali teaches the method according to claim 2, wherein the message comprises the audio data (As pointed out in the rejection of claim 1, a text message can be converted into audio data using text-to-speech techniques).

Regarding claim 4, Bouguerra in view of Challapali teaches the method according to claim 1, wherein the message comprises a body movement instruction related to movement of one or more body parts of the person (Bouguerra, par. 75).

Regarding claim 6, Bouguerra in view of Challapali teaches the method according to claim 1, wherein the images comprise faces of more than one person (Since Bouguerra discloses an emoticon-based animation technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:
receiving messages to be enacted by the persons in an order (In the video chat environment of Bouguerra, the first user sends a first message, and the second user replies with a second message. This could be interpreted as receiving the first and second messages in an order), 
processing the messages to extract or receive the audio data related to voice of the persons, and the facial movement data related to expressions to be carried on faces of the persons (The real-time emoticon-based animation technique of Bouguerra is utilized in a video chat environment. Therefore, a person skilled in the art would infer that each message, whether it be from the first user or the second user, would be processed in the same manner), 
processing the one or more images, the audio data, and the facial movement data, and generating an animation of the person enacting the messages in the respective order as provided (Again, a person skilled in the art would infer that in Bouguerra, if the first user sends the first message at time t0, the first message would be processed at time t0 to generate a first animation. Then, if the second user replies with the second message at time t1 (t1 > t0), the second message would be processed at time t1 to generate a second animation).

Regarding claim 7, Bouguerra in view of Challapali teaches the method according to claim 1, wherein the images comprise faces of more than one person (Since Bouguerra discloses an emoticon-based animation technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:
receiving a selection input to select one or more person with one or more faces from the one or more images received, 
generating a scene image showing the one or more persons with one or more faces based on the selection input, 
processing the scene image, the audio data, and the facial movement data, and generating an animation of the person enacting the message (Bouguerra, par. 64: “In one embodiment, when a user of a client device selects an animated video emoticon, the animated video emoticon is applied to the video captured by the first client device before it is transmitted to another client device. However, the user of the client device may additionally or alternatively select to apply an animated video emoticon to a video stream received from the other client device. For example, a first friend may want to see what his video-chat buddy would look like `surprised`, and so the first friend may invoke the `surprised` video emoticon on the video stream depicting his buddy”. In particular, instead of applying the animation to the user’s own images, the user can select a friend that he or she is chatting with, and apply the animation to the friend’s images).

Regarding claim 8, Bouguerra in view of Challapali teaches the method according to claim 1, comprising:
receiving a chat request made by a user with at least another user,
establishing a chat environment between the users based on the chat request,
receiving at least one image representative of at least one of the users, wherein the image comprising at least one face,
receiving a message from at least one of the users in the chat environment, wherein the message comprises at least a text or an emotional and movement command (Since Bouguerra teaches animating facial images in a video chat environment, a person skilled in the art would infer that the above steps (i.e. receiving a chat request, establishing a chat environment, receiving images of users, and receiving messages exchanged between the users) are inherent in any video chat application).

Regarding claim 9, Bouguerra in view of Challapali teaches the method according to claim 8, wherein the message from a first computing device is received at a second computing device (Fig. 1 of Bouguerra suggests this limitation because Bouguerra teaches a video chat environment. For example, a user of video chat client device 101 can send a message with emoticons to video chat client device 102), and processing the one or more images, the audio data, and the facial movement data, and generating an animation of the person enacting the message in the chat environment (Bouguerra, pars. 64, 69 and 75).

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra in view of Challapali as applied to claim 1 above, and further in view of Wang et al. (HIGH QUALITY LIP-SYNC ANIMATION FOR 3D PHOTO-REALISTIC TALKING HEAD, 2012).

Regarding claim 5, Bouguerra in view of Challapali teaches the method according to claim 1, comprising:


In the same field of endeavor, Wang teaches producing lip sync based on audio data (See Abstract and section 3).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Bouguerra by processing a message and its audio data to produce lip sync data, and processing the video, the audio data, the facial movement data, and the lip sync data to generate an animation of the user enacting the message with lip sync. The motivation would have been to provide a photo-realistic talking head (Wang, Abstract).

Claim(s) 10 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra in view of Challapali as applied to respective claims 9 and 1 above, and further in view of Sun et al. (Pub. No. US 2007/0216675).

Regarding claim 10, Bouguerra in view of Challapali teaches the method according to claim 9, wherein the images comprise faces of more than one person (Since Bouguerra discloses an emoticon-based animation technique used in a video chat environment, a person skilled in the art would infer that a first video comprises the face of a first user, and a second video comprises the face of a second user. This could be interpreted as “the images (the video frames of the first and second videos) comprise faces of more than one person”), the method comprising:


processing 
In the same field of video chatting/conferencing, Sun teaches receiving images representative of more than one users in a chat environment, and processing the images, and generating a scene image showing the users in the chat environment (See Fig. 2 and pars. 24 and 29. In particular, a user can select an arbitrary background image to replace a current background in an image. The background image could be interpreted as a scene image).
In light of the above teaching of Sun, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Bouguerra by receiving at least one image representative of more than one users in the video chat environment, processing the at least one image, and generating a scene image showing the users in the chat environment, and processing the scene image, the audio data, and facial movement data, and generating an animation of the person enacting the message in the chat environment. The motivation would have been to give the user an option to replace the current background with a scene image that he or she likes.

Regarding claim 11, Bouguerra in view of Challapali teaches the method according to claim 1, comprising:




processing 
In the same field of video chat, Sun teaches the above strike-through limitations (See pars. 30-31 and Fig. 3).
In light of the above teaching of Sun, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Bouguerra by receiving a wearing input related to a body part of at least one of the persons in the images onto which a fashion accessory is to be worn, processing the wearing input and identifying body parts of the person onto which the fashion accessory is to be worn, receiving an image or video of the accessory according to the wearing input, processing the identified body parts of the person and the image or video of the accessory and generating a wearing image showing the person wearing the fashion accessory, processing the wearing image, the audio data, and the facial movement data, and generating an animation of the persons enacting the message wearing the fashion accessory. The motivation would have been to give the user an option to incorporate digital effects into existing video streams.

Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra in view of Challapali as applied to claim 1 above, and further in view of Afifi et al. (“Video Face Replacement System Using a Modified Poisson Blending Technique”, 2014).

Regarding claim 12, Bouguerra in view of Challapali teaches the method according to claim 1, comprising:


processing 
In the same field of video editing, Afifi teaches generating a morphed video showing a user’s face from a first video on a user’s body from a second video (See Abstract and Fig. 2).
In light of the above teaching of Afifi, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Bouguerra by receiving a target video showing a face of another user, processing the user’s video and the target video to generate a morphed video showing the face from the target video on the user's body from the user’s video, and processing the morphed video, the audio data, and the facial movement data to generate an animation of the user enacting the message. The motivation would have been to provide a means for face replacement (Afifi, section I. Introduction, 1st paragraph).

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra in view of Challapali as applied to claim 1 above, and further in view of Senftner et al. (Pub. No. US 2008/0019576).

Regarding claim 13, Bouguerra in view of Challapali teaches the method according to claim 1, comprising:

processing the one or more person images, 
In the same field of video editing, Senftner teaches replacing the body of a first user in a video with the body of a second user (Par. 58: “The initial description of the processes will be made using an example case where the video is personalized by substituting the image of the face of a new actor for the facial portion of the image of one of the video's original actors. Within this specification, the terms face and facial should be interpreted to include the visible portions of the ears, neck, and other adjacent skin areas unless otherwise noted. The same processes can be applied to substituting a larger portion of the new actor for the corresponding portion of the original actor, up to and including full-body substitution”).
In light of the above teaching of Senftner, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to further modify Bouguerra by receiving a target image or video showing a body of another user, and processing the one or more person images, the target image or video, the audio data, and the facial movement data to generate an animation of the user enacting the message with the body of said another user from the target image or video. The motivation would have been to provide a personalized video.

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouguerra in view of Challapali as applied to claim 1 above, and further in view of Vronay et al. (Pub. No. US 2006/0251384).

Regarding claim 14, Bouguerra in view of Challapali teaches the method according to claim 1, the method is being implemented in a video call environment between at least two callers (Bouguerra: “FIG. 5A illustrates a non-limiting, non-exhaustive example of a video-chat session. User 502 appears in video-chat session 504 on a client device of another user. The other user may optionally be viewing a similar video-chat session on his client device”), the method comprising:
- receiving an image of the caller (Bouguerra, par. 49: “In one embodiment, video chat client 243 may support video-chat sessions, wherein a video of a user may be captured using video capture device 259 and streamed to another user for display with display 254. Additionally or alternatively, a video of the other user may be captured and streamed to video chat client device 200 for display with display 254”), 
- receiving a message to be enacted by the caller, wherein the message comprises at least the text or the emotional and movement command (Bouguerra, par. 61), 
- processing the message to extract or receive an audio data related to voice of the caller (Challapali, par. 7), and a facial movement data related to expression to be carried on face of the caller (Bouguerra, par. 65), 
- processing the image, the audio data, and the facial movement data, and generating an animation of the caller enacting the message (Bouguerra, pars. 64, 69 and 75),
wherein emotional and movement command is a GUI or multimedia based instruction to invoke the generation of facial expressions and/or the body parts movement (Bouguerra, par. 74).
Bouguerra does not disclose that at least one of the users is not using a video camera.
In the same field of video chat/conference, Vronay teaches a video conferencing system wherein at least one of the participants is using previously-stored image data, without using a video camera (Par. 53: “it is noted that previously stored image data can be input into the computer 110 from any of the aforementioned computer-readable media as well, without directly requiring the use of a camera”).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to incorporate the teaching of Vronay into Bouguerra such that at least one of the users is using previously-stored image data, without using a video camera. The motivation would have been to provide support for devices that lack a video camera.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHONG X NGUYEN whose telephone number is (571)270-1591.  The examiner can normally be reached on Mon-Fri 8am - 5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached on (571)272-7761.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/PHONG X NGUYEN/           Primary Patent Examiner, Art Unit 2613