DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/11/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Allowable Subject Matter
Claims 1-24 are allowed.
The following is an examiner’s statement of reasons for allowance: no prior art discloses, alone or in combination, the italicized and bolded features.

Claim 1. A computer-implemented method for training a system to generate a video of a person given an input text or an input audio comprising:
given an input video comprising a person speaking and gesturing, using the input video and a joint three-dimensional (3D) model of a human body, face, and hands to generate a set of 3D poses corresponding to the person speaking and gesturing in the input video;
using speech information related to the person speaking in the input video and a neural network model to generate a set of hidden states, which represent a set of 3D poses;
comparing the set of hidden states from the neural network model with the set of 3D poses from the joint 3D model of a human body, face, and hands to train the neural network model, in which the set of 3D poses from the joint 3D model of a human body, face, and hands are treated as ground truth data;
using the input video, the set of 3D poses from the joint 3D model of a human body, face, and hands, and a video generative adversarial network (GAN) to train a generative network of the video GAN to generate a video; and
outputting the trained neural network and the trained generative network.

Claims 2-8, 22 and 23 depend on allowable claim 1 and are therefore allowable for the same reasons as claim 1. 
 
Claim 9. A computer-implemented method for synthesizing a video of a person given an input speech data, the method comprising:
generating a set of speech representations corresponding to the input speech data;
inputting the set of speech representations into the trained neural network to generate an initial set of three-dimensional (3D) poses corresponding to the set of speech representations;
identifying, using the input speech data, a set of words in the input speech data that correspond to a set of word entries in a key pose dictionary, which comprises, for each word entry in the key pose dictionary, one or more poses;
responsive to identifying a word in the set of words from the input speech data that exists in the key pose dictionary that is set for replacement, forming a final set of 3D poses by replacing a set of one or more 3D poses from the initial set of 3D poses that are correlated to occurrence of the word in the initial set of 3D poses with a replacement set of one or more 3D poses obtained from the key pose dictionary that corresponds to the word; and
generating a video of a person that poses in correspondence with the input speech data using the final set of 3D poses as an input into a trained generative network.
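The pose-replacement step recited in claim 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `replace_key_poses`, the flat-tuple `Pose` type, the `(word, start, end)` frame-span alignment, and the nearest-index resampling of the replacement poses are all hypothetical. The sketch also treats every word present in the dictionary as "set for replacement", which the claim does not require.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pose representation: a flat tuple of joint parameters.
Pose = Tuple[float, ...]


def replace_key_poses(
    initial_poses: Sequence[Pose],
    word_spans: Sequence[Tuple[str, int, int]],
    key_pose_dict: Dict[str, List[Pose]],
) -> List[Pose]:
    """Form a final pose sequence by overwriting the frames aligned with
    dictionary words with that word's key poses.

    word_spans holds (word, start_frame, end_frame) with end_frame exclusive;
    this alignment format is an assumption made for the sketch.
    """
    final_poses = list(initial_poses)  # copy so the initial set is untouched
    for word, start, end in word_spans:
        replacement = key_pose_dict.get(word.lower())
        if not replacement:
            continue  # word has no entry in the key pose dictionary
        start = max(0, start)
        end = min(len(final_poses), end)
        span = end - start
        if span <= 0:
            continue
        for i in range(start, end):
            # Assumed resampling: stretch or compress the replacement poses
            # to the span length by nearest-index selection.
            j = (i - start) * len(replacement) // span
            final_poses[i] = replacement[j]
    return final_poses


initial = [(0.0,)] * 6
spans = [("hello", 1, 4), ("world", 4, 6)]
key_poses = {"hello": [(1.0,), (2.0,)]}
final = replace_key_poses(initial, spans, key_poses)
```

In this usage, only the frames aligned with "hello" are overwritten; "world" has no dictionary entry, so its frames keep the poses produced by the network.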


Claim 16. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more processors, causes steps for synthesizing a video of a person given an input speech data to be implemented, the steps comprising:
generating a set of speech representations corresponding to the input speech data;
inputting the set of speech representations into the trained neural network to generate an initial set of three-dimensional (3D) poses corresponding to the set of speech representations;
identifying, using the input speech data, a set of words in the input speech data that correspond to a set of word entries in a key pose dictionary, which comprises, for each word entry in the key pose dictionary, one or more poses;
responsive to identifying a word in the set of words from the input speech data that exists in the key pose dictionary that is set for replacement, forming a final set of 3D poses by replacing a set of one or more 3D poses from the initial set of 3D poses that are correlated to occurrence of the word in the initial set of 3D poses with a replacement set of one or more 3D poses obtained from the key pose dictionary that corresponds to the word; and
generating a video of a person that poses in correspondence with the input speech data using the final set of 3D poses as an input into a trained generative network.

Claims 17-18, 20 and 21 depend on allowable claim 16 and are therefore allowable for the same reasons as claim 16.
Relevant prior art:

US 20200294201 A1: A method of removing noise from a depth image includes presenting real-world depth images in real-time to a first generative adversarial neural network (GAN), the first GAN being trained by synthetic images generated from computer assisted design (CAD) information of at least one object to be recognized in the real-world depth image. The first GAN subtracts the background in the real-world depth image and segments the foreground in the real-world depth image to produce a cleaned real-world depth image. Using the cleaned image, an object of interest in the real-world depth image can be identified via the first GAN trained with synthetic images and the cleaned real-world depth image. In an embodiment, the cleaned real-world depth image from the first GAN is provided to a second GAN that provides additional noise cancellation and recovery of features removed by the first GAN.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN MUSHAMBO whose telephone number is (571)270-3390. The examiner can normally be reached Monday-Friday (8:00AM-5:00PM).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN MUSHAMBO/
Primary Examiner, Art Unit 2674
3/11/2022