DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/29/2020 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted has been considered by the examiner.

Response to Arguments
Applicant's arguments have been fully considered. The double patenting rejection is maintained because no terminal disclaimer has been filed. Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

Claims 1, 4, 8, 11, 15 and 18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, 6, 10, 11 and 14 of U.S. Patent No. 10,586,368 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 4, 8, 11, 15 and 18 are anticipated by claims 1, 5-6, 10-11 and 14 as shown below.

Instant Application, claim 1:
1. A method comprising: accessing a data stream that comprises audio data and video data at a client device, the audio data comprising a speech signal; determining a phone sequence of the audio data based on the speech signal; identifying a user profile that corresponds with the set of facial landmarks from the video data of the data stream, the user profile comprising user profile data that includes a selection of a user avatar; generating a facial model based on the selection of the user avatar; causing display of a presentation of the facial model; and animating the presentation of the facial model based on the phone sequence.

Patent 10,586,368 B2, claim 1:
1. A method comprising: accessing audio data and video data at a client device, the audio data comprising a speech signal; determining locations of a set of facial landmarks based on the video data; identifying a user profile based on the locations of the set of facial landmarks, the user profile comprising a selection of a user avatar; generating a weighted finite state transducer (WFST) based on at least the speech signal of the audio data: performing a breadth-first search upon an output of the WFST; determining a phone sequence based on the breadth-first search; generating a first facial model based on the locations of the set of facial landmarks; generating a second facial model based on the phone sequence; constructing a composite facial model based on the first facial model, the second facial model, and the selection of the user avatar; and causing display of the composite facial model at the client device.

Instant Application, claim 4:
4. The method of claim 1, wherein the data stream comprises the audio data and video data, and wherein the method further comprises: detecting a loss in the audio data in real-time data; parsing the video data to identify a first frame from among the set of video frames in response to the detecting the loss in the audio data; determining locations of a set of facial landmarks within the first frame of the video data; and causing display of the presentation of the facial model based on the locations of the set of facial landmarks.

Patent 10,586,368 B2, claim 5:
5. The method of claim 1, wherein the locations are a first set of locations, the video data comprises a set of video frames, and the method further comprises: detecting a loss in real-time data; parsing the video data to identify a first frame from among the set of video frames in response to the detecting the loss in real-time data; determining a second set of locations of the set of facial landmarks within the first frame of the video data; and altering the composite facial model based on the second set of locations of the set of facial landmarks.

Instant Application: Claims 8, 11, 15 and 18
Patent 10,586,368 B2: Claims 6, 10, 11 and 14



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-10 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Dimtrva (US Publication Number 2006/0290699 A1, hereinafter “Dimtrva”) in view of Li et al. (US Publication Number 2015/0213604 A1, hereinafter “Li”).

(1) regarding claim 8:
As shown in figs. 1-8, Dimtrva disclosed a system (para. [0011], note that the present invention provides a system and method for creating and displaying a realistic audio-visual representation of a speaker who is speaking) comprising:
a memory (160, memory, fig. 1); and 
at least one hardware processor coupled to the memory (para. [0028], note that computer 120 comprises a central processing unit (CPU) 150 and memory 160. Memory 160 comprises operating system software 170 and application programs 180) and comprising instructions that cause the system to perform operations comprising:
accessing a data stream that comprises audio data and video data at a client device (para. [0036], note that the input audio-visual signals in FIG. 4 are represented by source 410. Source 410 provides the audio-visual signals to module 310.), the audio data comprising a speech signal (para. [0038], note that source 410 of audio-visual signals also provides audio-visual signals to module 340. Module 340 obtains the speech portion of the audio signal for the speaker whose face is identified by module 310);
determining a phone sequence of the audio data based on the speech signal (para. [0043], note that a logical unit may be a word, or a phoneme, or a viseme. In one advantageous embodiment of the invention, the logical unit is a phoneme. A phoneme is a unit of sound in spoken language by which utterances are represented);
causing display of a presentation of the facial model (para. [0076], note that content synthesis application processor 190 analyzes the audio-visual signals to obtain a visual display of the speaker's face (step 620).); and 
animating the presentation of the facial model based on the phone sequence (para. [0072], note that facial animation for selected parameters module 370 synthesizes the speaker's face (i.e., creates a computer generated animated version of the speaker's face) using facial animation parameters that correspond to the appropriate classification). 
Dimtrva disclosed most of the subject matter as described above except for specifically teaching the video data comprising a set of facial landmarks, identifying a user profile that corresponds with the set of facial landmarks from the video data of the data stream, the user profile comprising user profile data that includes a selection of a user avatar; and generating a facial model based on the selection of the user avatar.
However, Li teaches the video data comprising a set of facial landmarks (para. [0035], note that to identify and track a head, face, and/or facial region within image(s) provided by imaging input device 104 and to determine one or more facial characteristics of the user (e.g., facial characteristics 206)), identifying a user profile that corresponds with the set of facial landmarks from the video data of data stream (para. [0035], note that face detection module 204 also may be configured to track the detected face through a series of images (e.g., video frames at a given frame rate, such as 24 frames/second) and to determine a head position based on the detected face, as well as changes in facial characteristics of the user (e.g., facial characteristics 206).), the user profile comprising user profile data that includes a selection of a user avatar (para. [0039], note that device 102 further may include an avatar selection module 208 configured to allow selection (e.g., by the user) of an avatar for use during the communication session); and generating a facial model based on the selection of the user avatar (para. [0040], note that device 102 further may include an avatar control module 210 configured to generate an avatar in response to selection input from avatar selection module 208. Avatar control module 210 may include custom, proprietary, known, and/or after-developed avatar generation processing code (or instruction sets) that are generally well-defined and operable to generate an avatar based on the user's face/head position and/or facial characteristics 206 detected by face detection module 208.). 
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach the video data comprising a set of facial landmarks, identifying a user profile that corresponds with the set of facial landmarks from the video data of the data stream, the user profile comprising user profile data that includes a selection of a user avatar; and generating a facial model based on the selection of the user avatar. The suggestion/motivation for doing so would have been to help reduce communications bandwidth use, preserve the individual's anonymity, and/or provide enhanced entertainment value (e.g., amusement) for the individual (para. [0014]). Therefore, it would have been obvious to combine Dimtrva with Li to obtain the invention as specified in claim 8.

(2) regarding claim 9:
Dimtrva disclosed most of the subject matter as described above except for specifically teaching generating the facial model based on the selection of the user avatar and the video data.
However, Li teaches generating the facial model based on the selection of the user avatar and the video data (para. [0040], note that device 102 further may include an avatar control module 210 configured to generate an avatar in response to selection input from avatar selection module 208. Generate an avatar based on the user's face/head position and/or facial characteristics 206 detected by face detection module 208. A single animation may alter the appearance of a still image, or multiple animations may occur in sequence to simulate motion in the image (e.g., head turn, nodding, talking, frowning, smiling, laughing, etc.).).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach generating the facial model based on the selection of the user avatar and the video data. The suggestion/motivation for doing so would have been to help reduce communications bandwidth use, preserve the individual's anonymity, and/or provide enhanced entertainment value (e.g., amusement) for the individual (para. [0014]). Therefore, it would have been obvious to combine Dimtrva with Li to obtain the invention as specified in claim 9.

(3) regarding claim 10:
Dimtrva further disclosed the system of claim 8, wherein the client device is a first client device, and the causing display of the composite facial model includes: causing display of a presentation of the facial model at the second client device (para. [0082], note that this creates an audio-visual representation of the speaker's face that is synchronized with the speaker's speech. The audio-visual representation of the speaker's face is then output to display unit 110 (step 880).). 

(4) regarding claim 14:
Dimtrva further disclosed the system of claim 8, wherein the presentation of the facial model comprises a three-dimensional facial model (para. [0072], note that facial animation for selected parameters module 370 receives additional input from a three dimensional (3D) facial model module 540 and a texture maps module 550. Facial animation for selected parameters module 370 synthesizes the speaker's face (i.e., creates a computer generated animated version of the speaker's face) using facial animation parameters that correspond to the appropriate classification).

The rejection explained above for system claims 8-10 and 14 renders obvious the steps of method claims 1-3 and 7 (para. [0028]) and the non-transitory machine-readable storage medium of claims 15-17 (para. [0030]) because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 8-10 and 14 are equally applicable to claims 1-3, 7 and 15-17.

Claims 5-6, 12-13 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dimtrva and Li, further in view of Corazza et al. (US Publication Number 2013/0235045 A1, hereinafter “Corazza”).

(1) regarding claim 12:
Dimtrva disclosed most of the subject matter as described above except for specifically teaching generating a message that includes the presentation of the facial model, the message including an identifier associated with a second client device; and causing display of the message that includes the presentation of the facial model at the second client device, the message including an ephemeral message.
However, Corazza disclosed generating a message that includes the presentation of the facial model, the message including an identifier associated with a second client device (para. [0054], note that Animated video messages can be sent directly utilizing any method, including (but not limited to) by streaming a rendered character facial animation to another networked computing device, sending tracked human facial expression data that enables a networked computing device to generate, render and display a character facial animation, or by sending character facial animation data to a networked computing device that enables a networked computing device to render and display a character facial animation); and causing display of the message that includes the presentation of the facial model at the second client device, the message including an ephemeral message (para. [0056], note that second networked computing device can receive the captured video, detect changes in human facial expressions or recognize an animation trigger, generate a character facial animation from the tracked changes or other character animation in accordance with the animation trigger, and/or render the character animation. In a number of embodiments, the second networked computing device can change a variety of aspects of the animation including (but not limited) modifying the 3D character selection, modifying backdrops, props, text, audio, camera movement, angle, orientation, zoom, and/or any other characteristic of the animation that was controllable during the initial generation of the animation).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach generating a message that includes the presentation of the facial model, the message including an identifier associated with a second client device; and causing display of the message that includes the presentation of the facial model at the second client device, the message including an ephemeral message. The suggestion/motivation for doing so would have been to enable collaborative creation, transmission, sharing, non-linear exploration, and modification of animated video messages (para. [0006]). Therefore, it would have been obvious to combine Dimtrva and Li with Corazza to obtain the invention as specified in claim 12.

(2) regarding claim 13:
Dimtrva disclosed most of the subject matter as described above except for specifically teaching parsing the video data from the data stream; identifying a set of facial landmarks based on the video data; and identifying the user profile based on the set of facial landmarks.
However, Corazza disclosed parsing the video data from the data stream (para. [0054], note that these animated video messages can be generated by a networked computing device and sent to another networked computing device directly. Animated video messages can be sent directly utilizing any method, including by streaming a rendered character facial animation to another networked computing device, sending tracked human facial expression data that enables a networked computing device to generate, render and display a character facial animation, or by sending character facial animation data to a networked computing device that enables a networked computing device to render and display a character facial animation.); identifying a set of facial landmarks based on the video data; and identifying the user profile based on the set of facial landmarks (para. [0054], note that streaming can be accomplished by sending individual component packets of data from a rendered animated video message in a specific order where the packets of data are buffered by the networked computing device that receives the stream and plays back the animated video message in real time.).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach parsing the video data from the data stream; identifying a set of facial landmarks based on the video data; and identifying the user profile based on the set of facial landmarks. The suggestion/motivation for doing so would have been to enable collaborative creation, transmission, sharing, non-linear exploration, and modification of animated video messages (para. [0006]). Therefore, it would have been obvious to combine Dimtrva and Li with Corazza to obtain the invention as specified in claim 13.

The rejection explained above for system claims 12-13 renders obvious the steps of method claims 5-6 (para. [0028]) and the non-transitory machine-readable storage medium of claims 19-20 (para. [0030]) because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 12-13 are equally applicable to claims 5-6 and 19-20.

Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Dimtrva and Li, further in view of Lin (US Publication Number 2004/0120554 A1, hereinafter “Lin”).

(1) regarding claim 11:
Dimtrva disclosed most of the subject matter as described above except for specifically teaching wherein the data stream comprises the audio data and video data, and wherein the method further comprises: detecting a loss in the audio data in real-time data; parsing the video data to identify a first frame from among the set of video frames in response to the detecting the loss in the audio data; determining locations of a set of facial landmarks within the first frame of the video data; and causing display of the presentation of the facial model based on the locations of the set of facial landmarks.
However, Lin disclosed wherein the data stream comprises the audio data and video data, and wherein the method further comprises: detecting a loss in the audio data in real-time data (para. [0077], note that the synthesis process becomes real-time with a face shape being output for every block of audio data input (assuming 40 ms blocks and a frame rate of approximately 25 fps), albeit with some loss in accuracy and continuity); parsing the video data to identify a first frame from among the set of video frames in response to the detecting the loss in the audio data (para. [0078], note that coefficients were appropriately adjusted in different cases to find a best match between the original and synthesized faces); determining locations of a set of facial landmarks within the first frame of the video data (para. [0079], note that In FIGS. 6A, 6B and 6C, the lip heights of the synthesized faces were compared with the original ones when the system input several seconds of a person's voice. The slopes of the two curves were similar in most cases. At the same time, the curve matched the input sound wave and phonemes accurately.); and causing display of the presentation of the facial model based on the locations of the set of facial landmarks (para. [0052], note that FIG. 4C displays the output points that show the shape of the mouth, nose and chin. These output points are used to model the facial data of a video frame associated with a person speaking and are referred to as face shapes herein after).
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the data stream comprises the audio data and video data, and wherein the method further comprises: detecting a loss in the audio data in real-time data; parsing the video data to identify a first frame from among the set of video frames in response to the detecting the loss in the audio data; determining locations of a set of facial landmarks within the first frame of the video data; and causing display of the presentation of the facial model based on the locations of the set of facial landmarks. The suggestion/motivation for doing so would have been to design a real-time execution of lip synchronization with highly continuous video (para. [0009]). Therefore, it would have been obvious to combine Dimtrva and Li with Lin to obtain the invention as specified in claim 11.

The rejection explained above for system claim 11 renders obvious the steps of method claim 4 (para. [0028]) and the non-transitory machine-readable storage medium of claim 18 (para. [0030]) because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 11 are equally applicable to claims 4 and 18.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Gutierrez-Osuna (Speech-Driven Facial Animation with Realistic Dynamics, NPL, 2005) disclosed an integral system capable of generating animations with realistic dynamics, including the individualized nuances, of three-dimensional (3-D) human faces driven by speech acoustics.

Breton et al. (FaceEngine A 3D Facial Animation Engine for Real Time Applications, NPL, 2001) disclosed a 3D facial animation engine that allows transmission of facial animation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hilina K Demeter, whose telephone number is (571) 270-1676.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HILINA K DEMETER/
Primary Examiner, Art Unit 2674