Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	Claims 1-20 filed on 07/11/2021 are pending and being examined. Claims 1, 11, and 20 are in independent form.

Priority
3.	This application is a continuation (CON) of application 16/509,370, filed on 07/11/2019, now Pat. No. 11,114,086. Application 16/509,370 is a continuation-in-part (CIP) of application 16/251,436, filed on 01/18/2019, now Pat. No. 10,789,453.

Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-2, 5, 11-12, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vats (US Pub 2019/0197755, hereinafter “Vats”). 

Regarding claim 1, Vats discloses a method (the method and the system for rendering a video including a set of face images showing the movement of lips with the voice data of a person; see para.97—para.102) comprising: 
receiving, by a computing device (by the recipient device 103; see 103 of fig.13), an input text (receiving a text input; see para.98) and a target image, the target image including a target face (and a face image of a person; see para.98);
generating, by the computing device and based on the input text, a sequence of sets of acoustic features representing the input text (converting the text input into the voice data; see para.99);
generating, by the computing device and based on the sequence of sets of acoustic features, a sequence of sets of mouth key points (“Lipsing the image according to voice data received as input or the voice generated from the text data and accordingly generating facial expression”, see para.100. It would be readily appreciated that lipsing the image includes generating a sequence of sets of mouth key points from the image; see 310-313 of fig.2 and para.175);
generating, by the computing device and based on the sequence of sets of mouth key points, a sequence of sets of facial key points (It would be readily appreciated that generating the facial expression includes generating a sequence of sets of facial key points from the image; see 310-313 of fig.2 and para.175).

In the foregoing cited embodiment, Vats does not explicitly disclose: generating, by the computing device and based on the sequence of sets of the facial key points and the target image, a sequence of frames, wherein the frames include the target face modified based on at least one set of mouth key points of the sequence of sets of mouth key points; and generating, by the computing device and based on the sequence of frames, an output video. However, in the following alternative embodiments, Vats clearly teaches: generating, by the computing device and based on the sequence of sets of the facial key points and the target image, a sequence of frames, wherein the frames include the target face modified based on at least one set of mouth key points of the sequence of sets of mouth key points; and generating, by the computing device and based on the sequence of frames, an output video (generating a video including a set of facial frames indicating movement of lips/mouth according to the syllable of spoken word of the voice data converted from the text; see 205a, 205b, 205c, and 205d in fig.2 and para.175). It would have been obvious and straightforward for a person skilled in the art before the effective filing date of the claimed invention to combine all the features taught by Vats into an alternative method as recited in claim 1. Suggestion/motivation for doing so would have been to render a video including a sequence of face frames showing movement of lips according to the voice data converted from text data (Vats, see fig.2 and para.175).
As a further rationale, one of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to combine all the above features taught by Vats into an alternative method for rendering a video including a set of face images showing movement of lips with voice data of a person, since doing so would amount to a simple combination of known elements to obtain predictable results.
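
For illustration only, the order of the steps recited in claim 1 can be sketched as a simple data flow. This is a minimal sketch under stated assumptions: every function name, data type, and numeric value below is a hypothetical placeholder chosen for the example, and none of it is taken from the claims, the specification, or Vats. The stubs only show which output feeds which step.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # hypothetical 2D key point (x, y)


@dataclass
class Frame:
    target_image: str                 # identifier of the target image (assumed)
    facial_key_points: List[Point]    # key points used to modify the target face


def text_to_acoustic_features(text: str) -> List[List[float]]:
    # Placeholder: one toy one-element feature set per character of the input text.
    return [[(ord(ch) % 32) / 32.0] for ch in text]


def acoustic_to_mouth_key_points(features: List[List[float]]) -> List[List[Point]]:
    # Placeholder: upper and lower lip points whose gap follows the toy feature.
    return [[(0.0, f[0] / 2), (0.0, -f[0] / 2)] for f in features]


def mouth_to_facial_key_points(mouth_sets: List[List[Point]]) -> List[List[Point]]:
    # Placeholder: prepend two fixed (assumed) eye points to each mouth set.
    fixed = [(-1.0, 2.0), (1.0, 2.0)]
    return [fixed + mouth for mouth in mouth_sets]


def render_frames(facial_sets: List[List[Point]], target_image: str) -> List[Frame]:
    # Placeholder: one frame per set of facial key points, tied to the target image.
    return [Frame(target_image, points) for points in facial_sets]


def frames_to_video(frames: List[Frame], fps: int = 25) -> Dict[str, object]:
    # Placeholder container for the output video; the frame rate is an assumption.
    return {"fps": fps, "frames": frames}


def generate_video(input_text: str, target_image: str) -> Dict[str, object]:
    acoustic = text_to_acoustic_features(input_text)
    mouth = acoustic_to_mouth_key_points(acoustic)
    facial = mouth_to_facial_key_points(mouth)
    frames = render_frames(facial, target_image)
    return frames_to_video(frames)
```

Each stage consumes only the output of the stage before it, plus the target image at the frame-generation step, which mirrors the "based on" language of the claim limitations.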

Regarding claims 2 and 12, Vats discloses, further comprising: synthesizing, by the computing device and based on the sequence of sets of acoustic features, an audio data representing the input text (converting the text input into the voice data; see para.99); and adding, by the computing device, the audio data to the output video (generating a video including a set of facial frames indicating movement of lips according to the syllable of spoken word of the voice data converted from the text; see 205a, 205b, 205c, and 205d in fig.2 and para.175).

Regarding claims 5 and 15, Vats discloses, wherein: the generating the sequence of frames includes: determining, based on a sequence of sets of facial key points, a sequence of sets of two-dimensional (2D) deformations (face frames 310,…313; see fig.2); and applying each set of 2D deformations of the sequence of the sets of 2D deformations to the target input to obtain the sequence of frames (wherein each of the video facial frames indicates opening of lips/mouth according to the syllable of spoken word of the voice data converted from the text; see 205a, 205b, 205c, and 205d in fig.2 and para.175).

Regarding claims 11 and 20, each is an inherent variation of claim 1 and is therefore interpreted and rejected for the reasons set forth above in the rejection of claim 1.

7.	Claims 3-4 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Vats (US Pub 2019/0197755, hereinafter “Vats”) in view of Juvela et al. (“SPEECH WAVEFORM SYNTHESIS FROM MFCC SEQUENCES WITH GENERATIVE ADVERSARIAL NETWORKS”, 2018, hereinafter “Juvela”).

Regarding claims 3 and 13, Vats does not disclose, wherein the acoustic features include Mel-frequency cepstral coefficients. However, the technique for generating acoustic features from Mel-frequency cepstral coefficients by a generative adversarial network (GAN) is widely used in the field of speech recognition. As evidence, Juvela teaches the method for generating acoustic features from Mel-frequency cepstral coefficients by a GAN (see Title, abstract, section 1 para.5). It would have been obvious to persons skilled in the art before the effective filing date of the claimed invention to incorporate the teachings of Juvela into the teachings of Vats by using Mel-frequency cepstral coefficients to generate acoustic features with a GAN, as taught by Juvela. Suggestion/motivation for doing so would have been to reduce loss of high frequency components (Juvela, see section 1 para.5).

Regarding claims 4 and 14, the combination of Vats and Juvela discloses, wherein the sequence of sets of acoustic features is generated by a neural network (Juvela, with GAN; see section 2.4).

8.	Claims 6-10 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vats (US Pub 2019/0197755, hereinafter “Vats”) in view of Olszewski et al. (“High-Fidelity Facial and Speech Animation for VR HMDs”, 2016, hereinafter “Olszewski”).

Regarding claims 6 and 16, Vats does not explicitly disclose, the sequence of sets of mouth key points is generated by a neural network; and at least one set of the sequence of sets of mouth key points is generated based on a pre-determined number of sets preceding the at least one set in the sequence of sets of mouth key points. However, in the same field of endeavor, Olszewski teaches, the sequence of sets of mouth key points is generated by a neural network (see fig.5, and section 5); and at least one set of the sequence of sets of mouth key points is generated based on a pre-determined number of sets (see “previous frame(s)” of fig.5) preceding the at least one set in the sequence of sets of mouth key points (see “target frame” of fig.5). It would have been obvious to persons skilled in the art before the effective filing date of the claimed invention to incorporate the teachings of Olszewski into the teachings of Vats by using a deep neural network to generate sets of mouth key points, as taught by Olszewski. Suggestion/motivation for doing so would have been to provide real-time speech animation (Olszewski, see abstract).

Regarding claims 7 and 17, the combination of Vats and Olszewski discloses the method of claim 6, wherein: the at least one set of the sequence of sets of mouth key points corresponds to at least one set (S) of the sequence of sets of acoustic features (Vats: wherein each of the video facial frames indicates opening of lips/mouth according to the syllable of spoken word of the voice data converted from the text; see 205a, 205b, 205c, and 205d in fig.2 and para.175); and the at least one set of the sequence of sets of mouth key points is generated based on a first pre-determined number of sets of acoustic features preceding the S in the sequence of sets of acoustic features (Vats: lips point in frame 310 before the syllable of spoken word in frame 311; see 205a fig.2) and a second pre-determined number of sets of acoustic features succeeding the S in the sequence of sets of acoustic features (Vats: lips point in frame 310 after the syllable of spoken word in frame 311; see 205c fig.2).
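
For illustration only, the windowing recited in claims 7 and 17 (a first pre-determined number of sets preceding S and a second pre-determined number of sets succeeding S) can be sketched as a slice over the sequence. This is a minimal sketch under stated assumptions: the function name `context_window`, its parameters, and the choice to truncate the window at the ends of the sequence are hypothetical and are not taken from the claims, Vats, or Olszewski.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def context_window(seq: Sequence[T], i: int, n_before: int, n_after: int) -> List[T]:
    """Return the set at index i together with up to n_before preceding sets
    and up to n_after succeeding sets.

    Assumption: the window is truncated at the start and end of the sequence
    rather than padded.
    """
    start = max(0, i - n_before)
    end = min(len(seq), i + n_after + 1)
    return list(seq[start:end])
```

With a six-element sequence `[0, 1, 2, 3, 4, 5]`, index 3, two preceding sets, and one succeeding set, the window covers indices 1 through 4. A generator of mouth key points would then take such a window of acoustic feature sets as its input for the set corresponding to S.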

Regarding claims 8 and 18, the combination of Vats and Olszewski discloses the method of claim 5, wherein: the sequence of sets of facial key points is generated by a neural network (Olszewski: see fig.5 and section 5); and at least one set of the sequence of sets of facial key points is determined based on a pre-determined number of sets (Olszewski: see “previous frame(s)” of fig.5) preceding the at least one set in the sequence of sets of facial key points (Olszewski: see “target frame” of fig.5).

Regarding claims 9 and 19, the combination of Vats and Olszewski discloses the method of claim 5, further comprising: generating, by the computing device and based on the sequence of sets of mouth key points, a sequence of mouth texture images; and inserting, by the computing device, each of the sequence of mouth texture images in a corresponding frame of the sequence of the frames (Olszewski: wherein each output facial texture image includes “mouth key points”; see “output” of fig.5).

Regarding claim 10, the combination of Vats and Olszewski discloses the method of claim 9, wherein each mouth texture image of the sequence of mouth texture images is generated by a neural network based on a first pre-determined number of mouth texture images preceding the mouth region image in the sequence of mouth region images (Olszewski: section 5).

Conclusion
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to RUIPING LI whose telephone number is (571)270-3376. The examiner can normally be reached 8:30 am to 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, EMILY TERRELL, can be reached on (571)270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center, and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RUIPING LI/
Primary Examiner, Ph.D., Art Unit 2666
10/25/2022