DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/16/2022 has been entered.

Response to Amendment
3.	Applicant’s amendments filed on 03/16/2022 have been entered. Claims 1-3, 7-8, and 15 have been amended. Claims 1-20 are pending in this application, with claims 1, 8 and 15 being independent.

Response to Arguments
4.	Applicant's arguments, filed 03/16/2022, with respect to the 103 rejection have been fully considered and are persuasive.
Applicant argues that the prior art should not be found to teach or suggest the claimed (1) "generating a time-frequency representation of [a] chunk of [an] audio signal and flattening the time-frequency representation into a single dimensional vector"; (2) "audio feature vector," or "generating ... predicted 3D facial landmarks reflecting a corresponding portion of the head saying a portion of the speech in the sliding window" "by applying the audio feature vector [extracted from a chunk of [an] audio signal in a sliding window] ... to a neural network"; (3) "an audio feature vector" "extract[ed] ... from a chunk of the audio signal in a sliding window," "initial 3D facial landmarks extracted from the single image [of a head to animate with [the] audio signal of [the] speech]," or "generating, by applying the audio feature vector and initial 3D facial landmarks extracted from the single image to a neural network, predicted 3D facial landmarks reflecting a corresponding portion of the head saying a portion of the speech in the sliding window," as recited in the amended independent claims 1, 8 and 15.
In reply, the Examiner agrees.

EXAMINER’S STATEMENT OF REASONS FOR ALLOWANCE
All current claims 1-20 are in condition for allowance.
The following is an examiner’s statement of reasons for allowance: 
With regard to amended independent claims 1, 8 and 15, the prior art of record teaches:
Moulton et al. (US-2002/0097380-A1), teaches one or more computer storage media storing computer-useable instructions that, when used by one or more computing devices (¶0012; ¶0088) , cause the one or more computing devices to perform operations comprising: accessing an image of a head to animate with an audio signal of speech (Fig. 1 and ¶0063; ¶0023; ¶0026; ¶0034); generating a plurality of animation frames, by, for each of the animation frames (¶0006; ¶0023): generating a set of predicted 3D facial landmarks reflecting a corresponding portion of the head saying a portion of the speech in the sliding window (¶0011; ¶0014; ¶0017; ¶0037); and generating the animation frame by transforming the image to fit the predicted 3D facial landmarks (¶0007; ¶0011; ¶0022); and compiling the plurality of animation frames with the audio signal into an animation of the head saying the speech (¶0023; ¶0064; ¶0065); accessing a representation of a head to animate with an audio signal of speech (Fig. 1 and ¶0063; ¶0023; ¶0026; ¶0034); generating, from the representation, a set of initial 3D facial landmarks of the head (¶0022; ¶0037); extracting, from each window of the audio signal, a portion of the audio signal in the window (¶0064; ¶0065); generating, the set of initial 3D facial 
Savchenkov et al. (US-2020/0234690-A1), teaches accessing a single image of a head to animate (Fig. 1 and ¶0033; Fig. 9 and ¶0085-0087); generating, by a neural network evaluating initial 3D facial landmarks extracted from the single image (¶0008-0009; Fig. 4 and ¶0040; Fig. 5 and ¶0043; ¶0082); an audio feature encoding an audio chunk from a sliding window (¶0058; ¶0060; ¶0062); generating the animation frame by warping the single image (¶0048; ¶0052); accessing a single representation of a head to animate (Fig. 1 and ¶0033; Fig. 9 and ¶0085-0087); generating, from the single representation, a set of initial facial landmarks representing a rest pose of the head (Fig. 4 and ¶0040); an audio feature encoding a portion of the audio signal in the window (¶0058; ¶0060; ¶0062); generating, the audio feature for each window (¶0058; ¶0060; ¶0062); generate, from a single representation of a head to animate (Fig. 1 and ¶0033; Fig. 9 and ¶0085-0087) and audio encodings of successive windows of an audio signal of speech (¶0058; ¶0060; ¶0062). 
Zhou et al. (VisemeNet: Audio-Driven Animator-Centric Speech Animation), teaches generating the set of predicted 3D facial landmarks is based on one of a plurality of speaking 
Deller et al. (US-10,127,908-B1), teaches receiving a supplemental training video capturing a supplemental set of facial dynamics (col 8 lines 18-50; col 9 lines 46-51; col 28 lines 1-3; col 38 lines 48-62).
Karras et al. (Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion), teaches capture partially overlapping audio chunks from the audio signal (page 94:4, section 3.2 Audio processing, 4th paragraph).
Found references:
Strietzel et al. (US-2012/0249762-A1), teaches creating dynamic interactive advertisements using individualized three-dimensional (3D) human models. The interactive advertisements may be automatically generated based on a user profile associated with the viewer or may be created from an advertisement template by the viewer […] one or more individualized 3D human head models are automatically selected for inclusion in the interactive advertisement based on the profile or preferences of the user without requiring a user selection of the 3D head model (Abstract).  The warping routine can be performed using landmark points and/or other reference points or features (¶0117).
Cao et al. (US-2019/0130628-A1), teaches a joint automatic audio visual driven facial animation system that in some example embodiments includes a full scale state of the art Large Vocabulary Continuous Speech Recognition (LVCSR) with a strong language model for speech recognition and obtained phoneme alignment from the word lattice (Abstract). Tracking facial landmarks and generating a 3D facial model based on audio and video data (¶0002, ¶0020-0024).

Stenger et al. (US-2014/0210831-A1), teaches a method of animating a computer generation of a head, the head having a mouth which moves in accordance with speech to be output by the head, said method comprising: providing an input related to the speech which is to be output by the movement of the mouth […] outputting said sequence of image vectors as video such that the mouth of said head moves to mime the speech associated with the input text with the selected expression (Abstract). 
Huang et al. (US-2007/0009180-A1), teaches producing a synthesized facial model synchronized with voice. According to one embodiment, synchronizing colorful human or human-like facial images with voice is carried out as follows: determining feature points in a plurality of image templates about a face […] coloring each of the templates with reference to the chromaticity data, and processing the image templates to obtain a synthesized image (Abstract). The training module 102 is configured to determine the Mel-frequency Cepstrum Coefficient (MFCC) vector from the voice data, and subtract an average voice feature vector therefrom to obtain a voice feature vector (¶0037).
Lee et al. (US-2018/0178372-A1), teaches a home robot device (Abstract). The analysis algorithm may include instructions that extract a voice feature vector from the voice signal. The extracted voice feature vector may be compared to an emotion table previously stored in the memory, so that the emotion of the user and the user preference may be determined (¶0047).

When considering Claim 1 as a whole, however, the combination of prior art does not teach the limitation of "extracting an audio feature vector from a chunk of the audio signal in a sliding window; generating, by applying the audio feature vector and initial 3D facial landmarks extracted from the single image to a neural network, predicted 3D facial landmarks reflecting a corresponding portion of the head saying a portion of the speech in the sliding window;" as recited by amended independent claim 1 (emphasis added) and as described in the specification at figures 2-3 and at least at paragraphs 40 and 46.
Therefore, in the context of claim 1 as a whole, the prior art does not teach the claimed subject matter. Thus, the subject matter of claim 1 is allowable.
The remaining dependent claims depend directly or indirectly from allowable independent claim 1, and are therefore also allowable. 
When considering Claim 8 as a whole, however, the combination of prior art does not teach the limitation of "extracting, from a chunk of audio in each window of the audio signal, an audio feature vector encoding a portion of the audio signal in the window, by generating a time-frequency representation of the chunk of the audio signal and flattening the time-frequency representation into a single dimensional vector." as recited by amended independent claim 8 (emphasis added) and as described in the specification at figures 2-3 and at least at paragraphs 40 and 46.
Therefore, in the context of claim 8 as a whole, the prior art does not teach the claimed subject matter. Thus, the subject matter of claim 8 is allowable.
The remaining dependent claims depend directly or indirectly from allowable independent claim 8, and are therefore also allowable. 
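
For illustration only, the feature extraction recited in the claim 8 limitation quoted above (a chunk of audio per sliding window, a time-frequency representation of that chunk, and flattening into a single dimensional vector) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the specification or from any reference of record: the function names, the window and hop sizes, and the use of a naive discrete Fourier transform magnitude as the time-frequency representation are all hypothetical choices made for the example.

```python
import cmath
import math


def sliding_chunks(signal, window, hop):
    """Yield chunks of `window` samples, advancing `hop` samples each time.

    With hop < window, successive chunks partially overlap.
    """
    for start in range(0, len(signal) - window + 1, hop):
        yield signal[start:start + window]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def time_frequency(chunk, frame_len, frame_hop):
    """Spectrogram-like 2D list: one row of bin magnitudes per short frame."""
    return [magnitude_spectrum(f) for f in sliding_chunks(chunk, frame_len, frame_hop)]


def flatten(matrix):
    """Flatten a 2D time-frequency representation into a one-dimensional vector."""
    return [value for row in matrix for value in row]


def audio_feature_vectors(signal, window, hop, frame_len, frame_hop):
    """Return one flattened feature vector per sliding window of the signal."""
    return [
        flatten(time_frequency(chunk, frame_len, frame_hop))
        for chunk in sliding_chunks(signal, window, hop)
    ]


# Hypothetical input: 64 samples of a sinusoid with 4 cycles per 16 samples.
signal = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
vectors = audio_feature_vectors(signal, window=32, hop=16, frame_len=16, frame_hop=8)
```

In this sketch each window of 32 samples yields three frames of nine frequency bins, so each feature vector has 27 entries. Any real system would likely use a different transform, frame size, and scaling.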

When considering Claim 15 as a whole, however, the combination of prior art does not teach the limitation of "... generate, by applying to a neural network (i) initial 3D facial landmarks extracted from a single representation of a head to animate and (ii) audio encodings extracted from successive windows of an audio signal of speech, a plurality of sets of predicted 3D facial landmarks for the head; and an animation compiler configured to use the one or more hardware processors to generate, based on the plurality of sets of predicted 3D facial landmarks, an animation of the head saying the speech while the audio signal plays." as recited by amended independent claim 15 (emphasis added) and as described in the specification at figures 2-3 and at least at paragraphs 40, 45 and 46.
Therefore, in the context of claim 15 as a whole, the prior art does not teach the claimed subject matter. Thus, the subject matter of claim 15 is allowable.
The remaining dependent claims depend directly or indirectly from allowable independent claim 15, and are therefore also allowable. 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
5.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL LE whose telephone number is (571)272-5330. The examiner can normally be reached 9am-5pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL LE/Primary Examiner, Art Unit 2619