Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-11, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (U.S. Patent No. 11,113,859), referred to herein as Xiao, in view of Ravikumar et al. (“Reading Between the Dots: Combining 3D Markers and FACS Classification for High-Quality Blendshape Facial Animation”; Graphics Interface Conference; June 2016), referred to herein as Ravikumar, and further in view of el Kaliouby et al. (U.S. Patent Application Publication No. 2019/0133510), referred to herein as Kaliouby.
Regarding claim 1, Xiao teaches a method for generating animation from audio, the method comprising: receiving input audio data (col 8, lines 30-35; col 10, lines 56-59); generating metadata for the input audio data (col 8, lines 38-52); generating a set of predictions from the generated metadata for the input audio, wherein the set of predictions affects blendshape weights (col 8, lines 59-66; col 10, line 65 through col 11, line 9); generating a set of event predictions from the generated metadata for the input audio data, wherein the set of event predictions comprises at least one of event detection and voice activity detection (col 7, lines 11-17; col 8, lines 30-35; col 9, lines 
Although Xiao teaches that the predictions affect blendshape weights, Xiao does not explicitly teach generating blendshape weight predictions, wherein the blendshape weight predictions comprise blendshape weights.  Ravikumar teaches a method for facial animation utilizing, in part, facial action coding system information (abstract, lines 8-15; page 144, section 2, first para, lines 1-5), comprising generating blendshape weight predictions, wherein the blendshape weight predictions comprise blendshape weights (page 146, figure 2 caption, last 2 lines; page 147, section 7.1, first para, lines 1-6; section 7.2, first para, lines 1-12; page 149, section 7.2.4, para beginning “The output”, lines 1-6 and the last 2 lines).  It would have been obvious to one of ordinary skill in the art to utilize such blendshape weight prediction because, as known in the art and taught by Ravikumar, this improves the flexibility and quality of the facial animation process and enables higher quality animation of more complex facial feature movements (see, for example, Ravikumar, page 144, section 2, para beginning “We propose”, lines 1-12; para beginning “Traditional solving”, lines 1-7; para beginning “Increasing the number”, the last 7 lines).
The term “embedding” has a variety of interpretations, some of which may be applicable in Xiao and Ravikumar; however, Xiao in view of Ravikumar does not explicitly teach an embedding.  Kaliouby teaches a method for generating animation comprising receiving and processing input audio data, generating predictions for the input data and generating an output based on the predictions (para 92, lines 1-13; para 94, lines 1-7), and further comprising generating an embedding for the input audio data 
Regarding claim 5, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 1, wherein generating the set of event predictions comprises determining a level of voice activity in the input audio data (Xiao, col 8, lines 46-58).
Regarding claim 6, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 1, wherein generating the set of event predictions comprises determining whether an audio event has occurred, wherein the audio event comprises at least one of laughing, crying, screaming, and/or shouting (Xiao, col 8, lines 48-52; Kaliouby, para 81, lines 1-2 and 12-18; para 92, lines 1-9).
Regarding claim 7, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 6, wherein generating the final prediction comprises: determining whether a laughter event has occurred; and generating blendshape weights to cause an avatar to perform a laughing motion (Xiao, col 11, lines 16-23 and 46-53; Kaliouby, para 81, lines 1-2 and 12-18; para 92, lines 1-9).
Regarding claim 8, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 1, wherein generating the final prediction comprises: determining whether a level of voice activity exceeds a threshold; and when the level of voice activity does not exceed a threshold, generating blendshape weights that close the mouth (Xiao, col 7, lines 11-17; col 8, lines 48-58; col 9, lines 21-27).
Regarding claim 9, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 1, wherein generating the output comprises rendering an avatar model based on the final blendshape weights of the final prediction (Xiao, col 11, lines 16-23 and 46-53).
Regarding claim 10, Xiao in view of Ravikumar, further in view of Kaliouby teaches the method of claim 1, wherein the final prediction further comprises animation curves for animating an avatar model (Xiao, col 7, line 61 through col 8, line 4; Kaliouby, para 92, lines 10-13).
Regarding claim 11, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the medium, instructions, and processor, which are disclosed by Xiao, col 14, lines 13-16); thus they are rejected on similar grounds.
Regarding claims 15-20, the limitations of these claims substantially correspond to the limitations of claims 5-10, respectively; thus they are rejected on similar grounds as their corresponding claims.
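For illustration only, the limitation addressed above for claim 8 (generating blendshape weights that close the mouth when the level of voice activity does not exceed a threshold) can be sketched as follows. This is a minimal sketch and is not taken from Xiao, Ravikumar, or Kaliouby; the function name, the blendshape names (`jawOpen`, `mouthFunnel`, `browUp`), and the 0.5 threshold are hypothetical assumptions chosen for the example.

```python
from typing import Dict

# Hypothetical names of mouth-opening blendshapes; real avatar rigs differ.
MOUTH_OPENING_SHAPES = ("jawOpen", "mouthFunnel")


def gate_by_voice_activity(
    weights: Dict[str, float],
    voice_activity: float,
    threshold: float = 0.5,  # assumed value, for illustration only
) -> Dict[str, float]:
    """Return final blendshape weights, closing the mouth when voice activity is low."""
    final = dict(weights)  # copy so the predicted weights are not mutated
    if voice_activity <= threshold:
        # Voice activity does not exceed the threshold: zero the mouth-opening shapes.
        for name in MOUTH_OPENING_SHAPES:
            if name in final:
                final[name] = 0.0
    return final


predicted = {"jawOpen": 0.7, "mouthFunnel": 0.2, "browUp": 0.4}
silent = gate_by_voice_activity(predicted, voice_activity=0.1)
speaking = gate_by_voice_activity(predicted, voice_activity=0.9)
```

In this sketch the non-mouth weights pass through unchanged, so only the mouth-related portion of the final prediction depends on the voice activity level.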

Claims 2, 3, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao, in view of Ravikumar, further in view of Kaliouby, and further in view of Sak et al. (U.S. Patent Application Publication No. 2019/0057683), referred to herein as Sak.
Regarding claim 2, Xiao in view of Ravikumar, further in view of Kaliouby teaches that the input audio data comprises logarithmic mel features, and teaches identifying associated frequency bands and determining or modifying coefficients for the mel features (Xiao, col 8, lines 38-52; col 9, lines 2-9); thus one of ordinary skill in the art would infer Xiao’s teachings regarding the audio data’s mel-frequency cepstral 
Regarding claim 3, Xiao in view of Ravikumar, further in view of Kaliouby, and further in view of Sak teaches the method of claim 2, wherein generating the embedding comprises utilizing at least one of a recurrent neural network and a convolutional neural network to generate the embedding based on the MFCC features (Xiao, col 9, lines 2-9; Kaliouby, para 109, lines 1-10; Sak, para 69, lines 1-6).
Regarding claims 12 and 13, the limitations of these claims substantially correspond to the limitations of claims 2 and 3, respectively; thus they are rejected on similar grounds as their corresponding claims.
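By way of background only, mel-frequency cepstral coefficients are conventionally obtained by applying a discrete cosine transform to logarithmic mel filterbank energies. The following minimal sketch illustrates that relationship for a single audio frame. It is not taken from Xiao or Sak; the function name and the example values are hypothetical, and normalization conventions vary between implementations.

```python
import math
from typing import List


def mfcc_from_log_mel(log_mel: List[float], num_coeffs: int) -> List[float]:
    """Apply an unnormalized DCT-II to one frame of log mel energies."""
    n = len(log_mel)
    return [
        sum(
            x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(log_mel)
        )
        for k in range(num_coeffs)
    ]


# A flat log mel frame has all of its energy in coefficient 0.
frame = [1.0, 1.0, 1.0, 1.0]
coeffs = mfcc_from_log_mel(frame, num_coeffs=3)
```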

Allowable Subject Matter
Claims 4 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 4, the prior art teaches the method of claim 1, and LSTMs are also known in the art, as previously shown.  In the context of claims 1 and 4 as a whole, however, the prior art does not teach generating animation from audio by receiving input audio data, generating an embedding for the input audio data, generating a set of blendshape weight predictions from the generated embedding for the input audio data, wherein the set of blendshape weight predictions comprises blendshape weights, generating a set of event predictions from the generated embedding for the input audio data, wherein the set of event predictions comprises at least one of event detection and voice activity detection, generating a final prediction from the set of blendshape weight predictions and the set of event predictions, wherein the final prediction comprises a set of final blendshape weights; and generating an output based on the generated final prediction, wherein generating the sets of blendshape weight and event predictions comprises utilizing a multi-branch decoder, wherein the multi-branch decoder comprises a first Long Short Term Memory network (LSTM) that generates the set of blendshape weight predictions and a second LSTM that generates the set of event predictions based on the generated embedding.
Regarding claim 14, this claim substantially corresponds to claim 4 and similarly comprises allowable subject matter.

Response to Arguments
Applicant’s arguments with respect to the 103 rejections have been fully considered, but are moot in view of the new grounds of rejection presented above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH whose telephone number is (571)270-5364. The examiner can normally be reached Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DAVID T. WELCH
Primary Examiner
Art Unit 2613



/DAVID T WELCH/Primary Examiner, Art Unit 2613