DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: no prior art discloses, alone or in combination, the italicized and bolded features.
Claim 1. A computer-implemented method comprising: given an input text, identifying phonemes in the input text; generating speech audio from the input text using a text-to-speech (TTS) system; obtaining phoneme timestamps that identify time occurrences in the generated speech audio of phonemes of the input text; using a phoneme-pose dictionary that correlates phonemes to key pose sequences, the phoneme timestamps, and interpolation to generate a sequence of poses corresponding to the phonemes in the input text; using the sequences of poses and a trained generative neural network model to generate a video; and combining the generated video with the generated speech audio to obtain a final output video.

Claims 2-6 depend on allowable claim 1 and are therefore allowable for the same reasons as claim 1.

Claim 7. A computer-implemented method for generating a text-to-video system, the method comprising: building a phoneme-pose dictionary that correlates phonemes to key pose sequences by performing steps comprising: for each input video from a set of training input videos: extracting key pose sequences from the input video; given audio from the input video and a corresponding transcript for the input video, identifying phonemes and their time positions in the audio; and aligning the identified phonemes with their corresponding key pose sequences using the time positions; and adding each unique phoneme and its corresponding key pose sequence to the phoneme-pose dictionary; training a generative neural network model that generates photorealistic video using image sequences from training videos and their corresponding poses as inputs into a generative adversarial network (GAN) that comprises the generative neural network model; and forming a text-to-video system comprising: a text-to-speech (TTS) system that generates speech audio from an input text; the phoneme-pose dictionary that is used to generate a sequence of poses corresponding to the phonemes in the input text; and the trained generative neural network model to generate a video corresponding to the sequence of poses, which is combined with the generated speech audio to obtain a final output video.

Claims 8-14 depend on allowable claim 7 and are therefore allowable for the same reasons as claim 7.

Claim 15. A text-to-video system comprising: a phoneme-pose dictionary that correlates phonemes to key pose sequences; a trained generative neural network model for generating a video; one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: given an input text, identifying phonemes in the input text; generating speech audio from the input text using a text-to-speech (TTS) system; obtaining phoneme timestamps that identify time occurrence in the generated speech audio of phonemes of the input text; using the phoneme-pose dictionary that correlates phonemes to key pose sequences, the phoneme timestamps, and interpolation to generate a sequence of poses corresponding to the phonemes in the input text; using the sequences of poses and the trained generative neural network model to generate a video; and combining the generated video with the generated speech audio to obtain a final output video.

Claims 16-20 depend on allowable claim 15 and are therefore allowable for the same reasons as claim 15.

Relevant prior art:

US 20120203557 A1: The method involves receiving a voice signal from a source over a network. A destination associated with the received signal is determined. A signal processing algorithm is determined from multiple signal processing algorithms based on a determined address. A voice signal is processed according to the determined algorithm. The processed signal is sent to the associated address. An originator of the voice signal is determined if the determined destination is a human recipient. The address for voice transmission is selected.

US 20060200344 A1: A method of reducing noise in an audio signal, comprising the steps of: using a furrow filter to select spectral components that are narrow in frequency but relatively broad in time; using a bar filter to select spectral components that are broad in frequency but relatively narrow in time; analyzing the relative energy distribution between the output of the furrow and bar filters to determine the optimal proportion of spectral components for the output signal; and reconstructing the audio signal to generate the output signal. A second pair of time-frequency filters may be used to further improve intelligibility of the output signal. The temporal relationship between the furrow filter output and the bar filter output may be monitored so that the fricative components are allowed primarily at boundaries between intervals with no voiced signal present and intervals with voice components. A noise reduction system for an audio signal.

US 20130132085 A1: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
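
As a minimal sketch of the pose-generation step recited in claims 1 and 15 (phoneme-pose dictionary, phoneme timestamps, and interpolation), the following Python shows one way such a step could be realized. The function name `poses_for_phonemes`, the `(phoneme, start, end)` tuple layout, the even spacing of key poses inside each phoneme interval, the choice of linear interpolation, and the 25 fps default are all assumptions made for illustration; none of them is taken from the application.

```python
from bisect import bisect_right


def poses_for_phonemes(timed_phonemes, pose_dict, fps=25.0):
    """Return one pose (list of floats) per video frame.

    timed_phonemes: list of (phoneme, start_sec, end_sec) tuples (assumed layout).
    pose_dict: maps each phoneme to a non-empty list of key poses (assumed layout).
    """
    if not timed_phonemes:
        return []

    # Place each phoneme's key poses evenly across its time interval.
    keyframes = []  # (time_in_seconds, pose)
    for phoneme, start, end in timed_phonemes:
        key_poses = pose_dict[phoneme]
        n = len(key_poses)
        for i, pose in enumerate(key_poses):
            t = start + (end - start) * (i + 0.5) / n
            keyframes.append((t, pose))
    keyframes.sort(key=lambda kf: kf[0])
    times = [t for t, _ in keyframes]

    # Sample the keyframe track at the video frame rate.
    total = max(end for _, _, end in timed_phonemes)
    n_frames = int(round(total * fps))
    frames = []
    for f in range(n_frames):
        t = f / fps
        j = bisect_right(times, t)  # index of first keyframe strictly after t
        if j == 0:
            frames.append(list(keyframes[0][1]))   # before first keyframe: hold
        elif j == len(keyframes):
            frames.append(list(keyframes[-1][1]))  # after last keyframe: hold
        else:
            (t0, p0), (t1, p1) = keyframes[j - 1], keyframes[j]
            w = (t - t0) / (t1 - t0)               # t1 > t0 by construction
            frames.append([a + w * (b - a) for a, b in zip(p0, p1)])
    return frames
```

In this sketch the resulting list of per-frame poses would play the role of the sequence of poses supplied to the trained generative neural network model; the timestamps themselves are assumed to come from the TTS stage.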
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN MUSHAMBO, whose telephone number is (571) 270-3390. The examiner can normally be reached Monday-Friday (8:00 AM-5:00 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN MUSHAMBO/
Primary Examiner, Art Unit 2674
8/27/2022