Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 2-7 and 14-19 are objected to because of the following informalities: each of these claims recites “generating the sequence of facial poses” which does not have antecedent basis in the claims, although it is clear what was intended.  The step in claims 1 and 13 to which this was intended to refer recites “deriving” the facial poses; accordingly, claims 2-7 and 14-19 should be amended to read --wherein deriving the sequence of facial poses further comprises--.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-15, and 20-26 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (U.S. Patent Application Publication No. 2004/0120554), referred to herein as Lin, in view of Chun et al. (U.S. Patent Application Publication No. 2018/0268806), referred to herein as Chun, and further in view of Navaratnam (U.S. Patent Application Publication No. 2017/0011745), referred to herein as Navaratnam.
Regarding claim 1, Lin teaches a method for matching mouth shape and movement in digital video to alternative audio, the method comprising: deriving a sequence of facial poses including mouth shapes for an actor from a source digital video, wherein each pose in the sequence of facial poses corresponds to a middle position of each sample of the alternative audio (pp 40, lines 1-5; pp 52, lines 1-11; pp 54, lines 1-7); generating an animated face image based on the sequence of facial poses and the source digital video (pp 41, lines 1-5; pp 42, lines 1-14); transferring tracked expressions from at least one of the animated face image or a target video rendered therefrom to the source video and generating a rough output video that includes transfers of the tracked expressions (pp 55, lines 1-5; pp 56, lines 1-4; pp 66; pp 67, the last 9 lines); and generating a finished video at least in part by refining the rough video using a parametric model trained on mouth shapes in the animated face image or the target video (pp 61, lines 1-8; pp 63, lines 1-5; pp 67, the last 9 lines; pp 68, the last 8 lines).
As shown above, Lin teaches refining the video through unsupervised learning of representations/encodings by training the parametric model on mouth shapes; but Lin does not explicitly teach an autoencoder.  Chun teaches a method for synthesizing text to speech input comprising sequences of audio samples, wherein the synthesis uses an autoencoder (pp 25, lines 1-8; pp 26; pp 27, lines 1-6; pp 85, lines 1-6).  It would have been obvious to one of ordinary skill in the art to employ an autoencoder because as known in the art, and taught by Chun, this reduces the computational complexity and power consumption of the audio synthesis, while improving the quality of the audio synthesis such that it more closely approximates the input speech (see, for example, Chun, pp 17 and pp 85, the last 3 lines).

Regarding claim 2, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, wherein generating the sequence of facial poses comprises sampling a sequence of audio samples taken from a recording of spoken dialog (Lin, pp 41, lines 1-5; pp 66, lines 1-6; Chun, pp 31, the last 5 lines; pp 85, lines 1-4; Navaratnam, pp 97, the last 8 lines).
Regarding claim 3, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 2, wherein generating the sequence of facial poses comprises converting text to speech using a text-to-speech synthesizer (Chun, pp 25, lines 1-4; pp 31, the last 5 lines; Navaratnam, pp 45, the last 5 lines; pp 97).
Regarding claim 8, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, wherein transferring tracked expressions from the target video to the source video further comprises synthesizing the mouth region and rendering each frame of the rough output video (Lin, pp 41; pp 42, lines 1-15; pp 66; Navaratnam, pp 95, the last 16 lines).
Regarding claim 9, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, wherein refining the rough video using a parametric auto encoder trained on mouth shapes in the target video further comprises generating a training set for the autoencoder by random alteration of frames in the target set (Lin, pp 56, lines 1-22; pp 63, lines 1-16; Chun, pp 41, lines 1-11; pps 42 and 89).
Regarding claim 10, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, wherein refining the rough video using a parametric auto encoder trained on mouth shapes in the target video further comprises cropping corresponding areas of the rough output video and the target video around the actor's mouth (Lin, pp 41; pp 52, lines 1-11; Navaratnam, pp 49, the last 5 lines; pp 103, the last 13 lines).
Regarding claim 11, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, wherein refining the rough video using a parametric auto encoder trained on mouth shapes in the target video further comprises aligning and inserting processed images from the target video into the rough output video (Lin, pp 66; pp 67, the last 9 lines; Chun, pp 58).
Regarding claim 12, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 1, further comprising rendering a target video from the animated face mesh, wherein transferring the tracked expressions includes transferring the expressions from the target video to the source video (Lin, pp 55, lines 1-5; pp 56, lines 1-4; pp 66; pp 67, the last 9 lines; Navaratnam, pp 95, the last 19 lines; pp 146, the last 6 lines; pp 147).
Regarding claim 13, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the processor and memory, which are taught by Lin, fig 1 and pp 28, lines 1-7); thus they are rejected on similar grounds.
Regarding claims 14, 15, and 20-24, the limitations of these claims substantially correspond to the limitations of claims 2, 3, and 8-12, respectively; thus they are rejected on similar grounds as their corresponding claims.
Regarding claim 25, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the computer readable medium, which is taught by Lin, fig 1 and pp 29, lines 1-11); thus they are rejected on similar grounds.
Regarding claim 26, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the apparatus, which is taught by Lin, fig 1 and pp 28, lines 1-7); thus they are rejected on similar grounds.

Claims 4, 5, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lin, in view of Chun, further in view of Navaratnam, and further in view of Xu et al. (U.S. Patent Application Publication No. 2012/0130717), referred to herein as Xu.
Regarding claim 4, Lin in view of Chun, further in view of Navaratnam teaches the method of claim 3, wherein generating the sequence of facial poses further comprises extracting keypoints for the mouth shapes from an image of the actor, normalizing the keypoints, and applying an analysis to normalized keypoints to derive the mouth shapes (Lin, pp 52, lines 1-11; pp 54, lines 1-7; Navaratnam, pp 95, the last 16 lines).
Lin in view of Chun, further in view of Navaratnam does not teach applying a principal component analysis (PCA).  Xu teaches a method for deriving a sequence of facial poses and motions from a source video, generating an animated face based on the sequence, transferring tracked expressions from a target to a source video, and 
Regarding claim 5, Lin in view of Chun, further in view of Navaratnam, and further in view of Xu teaches the method of claim 4, wherein generating the sequence of facial poses further comprises deriving a mel-frequency cepstral coefficient (MFCC) for each of the samples and mapping each MFCC coefficient to one of the mouth shapes using a recurrent neural network (Lin, pps 47 and 48; pp 61, lines 1-8; pp 63, lines 1-10; Chun, pp 28; pp 36, lines 1-9; pps 37 and 38; Xu, pp 53).
Regarding claims 16 and 17, the limitations of these claims substantially correspond to the limitations of claims 4 and 5, respectively; thus they are rejected on similar grounds as their corresponding claims.

Allowable Subject Matter
Claims 6, 7, 18, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 6, the prior art teaches the method of claims 1 and 2, and teaches linear and non-linear features, among other claimed features.  In the context of claims 1, 2, and 6 as a whole, however, the prior art does not teach the method, wherein generating the sequence of facial poses comprises sampling a sequence of audio samples taken from a recording of spoken dialog, and wherein generating the sequence of facial poses further comprises separating linear features from non-linear features, generating a time-varying sequence of speech features by processing the linear features with a deep neural network for formant analysis, and generating a facial pose at the middle position by processing the non-linear features with a deep neural network for facial articulation.
Regarding claim 7, this claim comprises allowable subject matter insomuch as it depends from claim 6, which comprises allowable subject matter.
Regarding claims 18 and 19, these claims substantially correspond to claims 6 and 7, and comprise allowable subject matter for similar reasons.

Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Haisma (U.S. Patent No. 6,697,120); Post-synchronizing an information stream including the replacement of lip objects.
Brand (U.S. Patent No. 6,735,566); Generating realistic facial animation from speech.
Cao (U.S. Patent Application Publication No. 2014/0185924); Face alignment by explicit shape regression.
Marcheret (U.S. Patent No. 9,697,833); Audio-visual speech recognition with scattering operators.
Francisco (U.S. Patent Application Publication No. 2017/0213076); Facial capture analysis and training system.
Liao (U.S. Patent Application Publication No. 2017/0337682); Method and system for image registration using an intelligent artificial agent.
Risser (U.S. Patent Application Publication No. 2018/0068463); Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures.
Scholar (U.S. Patent Application Publication No. 2019/0057714); Systems and methods for machine-generated avatars.
Parshionikar (U.S. Patent Application Publication No. 2019/0265802); Gesture based user interfaces, apparatuses and control systems.
Shukla (U.S. Patent Application Publication No. 2019/0279642); System and method for speech understanding via integrated audio and visual based speech recognition.
Miller (U.S. Patent Application Publication No. 2019/0325633); Avatar facial expression representation in multidimensional space.
Perry (U.S. Patent Application Publication No. 2020/0097767); System and method for image de-identification.
Sleevi (U.S. Patent Application Publication No. 2020/0106708); Load balancing multimedia conferencing system, device, and methods.
Heller (U.S. Patent Application Publication No. 2020/0160581); Automatic viseme detection for generating animatable puppet.
Baker (U.S. Patent Application Publication No. 2020/0311572); Learning coach for machine learning system.
Comer (U.S. Patent Application Publication No. 2021/0012549); Animating virtual avatar facial movements.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH whose telephone number is (571)270-5364.  The examiner can normally be reached on Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


DAVID T. WELCH
Primary Examiner
Art Unit 2613



/DAVID T WELCH/Primary Examiner, Art Unit 2613