Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (U.S. Patent No. 11,113,859), referred to herein as Xiao, in view of Ravikumar et al. (“Reading Between the Dots: Combining 3D Markers and FACS Classification for High-Quality Blendshape Facial Animation”; Graphics Interface Conference; June 2016), referred to herein as Ravikumar, and further in view of Pereira et al. (U.S. Patent No. 10,360,905), referred to herein as Pereira.
Regarding claim 1, Xiao teaches a method for generating animation from audio, the method comprising: receiving input audio data (col 8, lines 30-35; col 10, lines 56-59); generating metadata for the input audio data (col 8, lines 38-52); generating a set of predictions from the generated metadata for the input audio, wherein the set of predictions affects blendshape weights (col 8, lines 59-66; col 10, line 65 through col 11, line 9); generating a set of event predictions from the generated metadata for the input audio data, wherein the set of event predictions comprises at least one of event detection and voice activity detection (col 7, lines 11-17; col 8, lines 30-35; col 9, lines 2-9; col 11, lines 2-9); generating a final prediction from the set of predictions and the set of event predictions, wherein the final prediction comprises a set of final blendshape weights, and generating an output based on the generated final prediction (col 11, lines 16-23 and 46-53).  Although Xiao teaches that the predictions affect blendshape weights, Xiao does not explicitly teach generating blendshape weight predictions, wherein the blendshape weight predictions comprise blendshape weights.
Ravikumar teaches a method for facial animation utilizing, in part, facial action coding system information (abstract, lines 8-15; page 144, section 2, first para, lines 1-5), comprising generating blendshape weight predictions, wherein the blendshape weight predictions comprise blendshape weights (page 146, figure 2 caption, last 2 lines; page 147, section 7.1, first para, lines 1-6; section 7.2, first para, lines 1-12; page 149, section 7.2.4, para beginning “The output”, lines 1-6 and the last 2 lines). It would have been obvious to one of ordinary skill in the art to utilize such blendshape weight prediction because, as known in the art and taught by Ravikumar, this improves the flexibility and quality of the facial animation process, and enables higher quality animation of more complex facial feature movements (see, for example, Ravikumar, page 144, section 2, para beginning “We propose”, lines 1-12; para beginning “Traditional solving”, lines 1-7; para beginning “Increasing the number”, the last 7 lines). The term “embedding” has a variety of interpretations, some of which may be applicable in Xiao and Ravikumar; however, Xiao in view of Ravikumar does not explicitly teach an embedding, wherein the embedding identifies features from the input audio data.


Pereira teaches a method comprising receiving input audio data comprising mel-frequency cepstral coefficient features, and generating an embedding for the input audio data, wherein the embedding identifies features from the input audio data (col 10, lines 30-38 and 42-46; col 14, lines 39-50; col 15, lines 9-16 and 52-60). It would have been obvious to one of ordinary skill in the art to generate an embedding identifying audio features because, as taught by Pereira, this helps process the input audio data such that relevant audio can be enhanced while interfering audio can be reduced, thereby improving the accuracy and quality of the audio (see, for example, Pereira, col 6, lines 51-62 and col 15, line 67 through col 16, line 5).
Regarding claim 2, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein the input audio data comprises mel-frequency cepstral coefficient (MFCC) features (Pereira, col 10, lines 30-38; col 15, lines 52-56; Xiao, col 8, lines 38-52; col 9, lines 2-9).
Regarding claim 3, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 2, wherein generating the embedding comprises utilizing at least one of a recurrent neural network and a convolutional neural network to generate the embedding based on the MFCC features (Xiao, col 9, lines 2-9; Ravikumar, page 150, section 9, the last 13 lines; Pereira, col 10, lines 30-38).
Regarding claim 5, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein generating the set of event predictions comprises determining a level of voice activity in the input audio data (Xiao, col 8, lines 46-58; Pereira, col 13, lines 17-28).

Regarding claim 6, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein generating the set of event predictions comprises determining whether an audio event has occurred, wherein the audio event comprises at least one of laughing, crying, screaming, and/or shouting (Xiao, col 8, lines 48-52; Pereira, col 10, lines 30-38; col 14, lines 39-50; col 15, lines 52-60).
Regarding claim 7, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 6, wherein generating the final prediction comprises: determining whether a laughter event has occurred; and generating blendshape weights to cause an avatar to perform a laughing motion (Xiao, col 11, lines 16-23 and 46-53; Pereira, col 10, lines 30-38; col 15, lines 52-60; Ravikumar, page 149, section 9, para beginning “The strength”).
Regarding claim 8, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein generating the final prediction comprises: determining whether a level of voice activity exceeds a threshold; and when the level of voice activity does not exceed a threshold, generating blendshape weights that close the mouth (Xiao, col 7, lines 11-17; col 8, lines 48-58; col 9, lines 21-27; Pereira, col 13, lines 17-28; Ravikumar, page 149, section 9, para beginning “The strength”).
Regarding claim 9, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein generating the output comprises rendering an avatar model based on the final blendshape weights of the final prediction (Xiao, col 11, lines 16-23 and 46-53; Ravikumar, page 149, section 9, para beginning “The strength”).


Regarding claim 10, Xiao in view of Ravikumar, further in view of Pereira teaches the method of claim 1, wherein the final prediction further comprises animation curves for animating an avatar model (Xiao, col 7, line 61 through col 8, line 4; Ravikumar, page 147, section 7.2, first para; page 149, section 9, para beginning “The strength” and para beginning “For our purposes”).
Regarding claim 11, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the medium, instructions, and processor, which are disclosed by Xiao, col 14, lines 13-16); thus claim 11 is rejected on similar grounds.
Regarding claims 12, 13, and 15-20, the limitations of these claims substantially correspond to the limitations of claims 2, 3, and 5-10, respectively; thus they are rejected on similar grounds as their corresponding claims.

Allowable Subject Matter
Claims 4 and 14 remain objected to as being dependent upon a rejected base claim, but allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
Applicant’s arguments with respect to the 103 rejections have been fully considered, but are moot in view of the new grounds of rejection presented above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH whose telephone number is (571)270-5364. The examiner can normally be reached Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DAVID T. WELCH
Primary Examiner
Art Unit 2613



/DAVID T WELCH/Primary Examiner, Art Unit 2613