DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Arguments
3.	Applicant's arguments filed 26 October 2022 have been fully considered but they are not persuasive.
	Applicant argues Chen (WO-2020/010530) does not teach using two modalities to animate a face, and that only a speech or text mode is used, but not both.
	Examiner replies that, in paragraph [0047], Chen teaches three modalities which can be used for animating a face: a plain text modality, a voice modality which is then converted to text, and a Speech Synthesis Markup Language (SSML) modality, all of which can be obtained from a message.  Furthermore, the SSML can be based on a voice message through the SSML technique.  Finally, claim 1 recites identifying first modality data and second modality data related to the avatar, and receiving information based at least in part on the first modality data and the second modality data useful for animating the image of the face of the avatar.  Since both text and voice can be applied for animating a face, even if voice has to be converted to text, the specific limitations set forth in claim 1 are taught by Chen.  While the aligning step recited in claims 8 and 15 is not fully taught by Chen, additional prior art is cited below, which is necessitated by Applicant’s amendments to the claims.  Also, while some further citations and/or explanations are provided in the prior art rejections set forth below, inasmuch as the further citations and/or explanations can be considered new grounds of rejection, such new grounds of rejection are necessitated by Applicant’s amendments to the claims.  Accordingly, the present Office Action is made final.

Claim Rejections - 35 USC § 102
4.	The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

5.	Claims 1-7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al. (WIPO Publication 2020/010530), referred to herein as Chen.
	Regarding claim 1:  Chen teaches an apparatus comprising: at least one processor configured with instructions to (para 142, lines 1-3 of Chen): identify an image of a face of a computerized avatar (para 44 of Chen); identify first modality data related to the avatar, and identify second modality data related to the avatar (fig 5(510,512,514) and para 47 of Chen – identify text modality and voice modality, either directly from message 510 or as a Speech Synthesis Markup Language file 514 obtained from message 510; either use plain text 512 or convert voice-to-text when voice modality is identified); receive information based at least in part on the first modality data and the second modality data useful for animating the image of the face of the avatar (fig 5(520) and paras 47-48 of Chen – Sequential Motion Parsing 520 performed on received information, either plain text or SSML file, which can be based on both textual and voice information (text and voice modalities) from the message 510), and animate the face of the avatar in accordance with the information (paras 48 and 74 of Chen).
	Regarding claim 2:  Chen teaches the apparatus of Claim 1 (as rejected above), wherein the information comprises facial action units (FAU), each FAU pertaining to a respective portion of the image of the face (para 48; and para 99, lines 1-7 of Chen).
	Regarding claim 3:  Chen teaches the apparatus of Claim 1 (as rejected above), wherein the first modality data comprises text (fig 5 and para 47 of Chen – first modality from message is plain text).
	Regarding claim 4:  Chen teaches the apparatus of Claim 1 (as rejected above), wherein the second modality data comprises speech (fig 5 and para 47 of Chen – second modality from message is voice, which is then converted to text).
	Regarding claim 5:  Chen teaches the apparatus of Claim 3 (as rejected above), wherein the second modality data comprises speech (fig 5 and para 47 of Chen).
	Regarding claim 6:  Chen teaches the apparatus of Claim 3 (as rejected above), wherein the instructions are executable to derive, using at least one machine learning (ML) engine, emotion action information from the first and second modality data (paras 51 and 53; para 99, lines 1-7 of Chen).
	Regarding claim 7:  Chen teaches the apparatus of Claim 6 (as rejected above), wherein the information is based at least in part on time-aligned word level emotion probabilities produced from the emotion action information (fig 7; para 53, lines 3-7; paras 77 and 80; para 88, the last 7 lines; and para 109, the last 9 lines of Chen).

Claim Rejections - 35 USC § 103
6.	Claims 8-10 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Roche et al. (U.S. Patent No. 10,521,946), referred to herein as Roche.
	Regarding claim 8:  Chen teaches a method, comprising: generating an image of a first face to be animated to speak words (paras 48 and 74 of Chen) in accordance with both first text and first speech (fig 5(510,512,514) and para 47 of Chen – animated according to text and voice, either directly from message 510 or as a Speech Synthesis Markup Language file 514 obtained from message 510, and using plain text 512 and using voice by converting voice-to-text); and animating the image of the first face to speak first words in accordance with the first text and the first speech (fig 5 and paras 47-48 of Chen).
	Chen does not disclose aligning in time the first text and the first speech.
	Roche discloses aligning in time the first text and the first speech (fig 1(104); column 6, lines 33-39; column 9, lines 27-49; column 11, lines 33-58; and column 13, lines 28-39 of Roche – text and speech data are analyzed and time coded for use in animating avatars; multi-modal disambiguation service processes the different types of data, including text and speech data, and does so according to the time code).
	Chen and Roche are analogous art because they are from the same field of endeavor, namely animation of avatar and avatar speech.  Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to align in time the first text and the first speech, as taught by Roche.  The motivation for doing so would have been to allow for a more comprehensive and accurate animation of the talking avatar.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Chen according to the relied-upon teachings of Roche to obtain the invention as specified in claim 8.
	Regarding claim 9:  Chen in view of Roche teaches the method of Claim 8 (as rejected above), comprising: training a machine learning (ML) model using a training set of animated faces speaking known words (para 47; para 84; para 87; para 99; and para 107 of Chen); inputting the first text and first speech to the ML model; animating the image of the first face according with output of the ML model (fig 5; para 48; para 99; and para 108 of Chen – text and speech inputted into ML model and used to determine avatar animation); detecting emotion and sentiment from the first text (para 52; and para 53, lines 1-7 of Chen); aligning the first text with speech representing the first text to render aligned text/speech, and inputting the emotion, sentiment, and aligned text/speech to the ML model (fig 7; paras 77 and 80; para 88, the last 7 lines; para 99, lines 1-7; and para 109, the last 9 lines of Chen).
	Regarding claim 10:  Chen in view of Roche teaches the method of Claim 9 (as rejected above), comprising: inputting a target emotion to the ML model (para 53; and para 109 of Chen).
	Regarding claim 15:  Chen teaches an assembly comprising: at least one display configured to present an animated computer avatar (para 26, lines 1-2 and the last 3 lines of Chen); at least one processor configured with instructions to execute a machine learning (ML) model, the instructions being executable to (para 142, lines 1-3 of Chen): receive text indicating speech to be spoken by the avatar (para 47, lines 1-6 of Chen); receive speech (fig 5(510,512,514) and para 47, lines 4-10 of Chen – using voice by converting voice-to-text); process the text and speech using the ML model to generate facial action units (FAU) (fig 5(510,512,514) and paras 47-48 of Chen – facial actions according to text and voice, either directly from message 510 or as a Speech Synthesis Markup Language file 514 obtained from message 510, and using plain text 512 and using voice by converting voice-to-text), and animate the computer avatar in accordance with the FAU (paras 48 and 74; para 99, lines 1-7 of Chen).
	Chen does not disclose aligning the text and the speech in time.
	Roche discloses aligning the text and the speech in time (fig 1(104); column 6, lines 33-39; column 9, lines 27-49; column 11, lines 33-58; and column 13, lines 28-39 of Roche – text and speech data are analyzed and time coded for use in animating avatars; multi-modal disambiguation service processes the different types of data, including text and speech data, and does so according to the time code).
	Chen and Roche are analogous art because they are from the same field of endeavor, namely animation of avatar and avatar speech.  Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to align the text and the speech in time, as taught by Roche.  The motivation for doing so would have been to allow for a more comprehensive and accurate animation of the talking avatar.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Chen according to the relied-upon teachings of Roche to obtain the invention as specified in claim 15.
	Regarding claim 16:  Chen in view of Roche discloses the assembly of Claim 15 (as rejected above), wherein the instructions are executable to: detect emotion and sentiment from the text (para 52; and para 53, lines 1-7 of Chen); align the text with speech representing the text to render aligned text/speech, and inputting the emotion, sentiment, and aligned text/speech to the ML model (fig 7; paras 77 and 80; para 88, the last 7 lines; para 99, lines 1-7; and para 109, the last 9 lines of Chen).
	Regarding claim 17:  Chen in view of Roche discloses the assembly of Claim 16 (as rejected above), wherein the instructions are executable to: input a target emotion to the ML model (para 53; and para 109 of Chen).

7.	Claims 11-14 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Roche, and in further view of Sachs et al. (U.S. Patent Application Publication No. 2019/0122411), referred to herein as Sachs.
	Regarding claim 11:  Chen in view of Roche teaches the method of Claim 9 (as rejected above), comprising: receiving first probabilities representing facial action (para 88; para 109; and para 127 of Chen).  Chen in view of Roche does not explicitly teach receiving probabilities from the ML model.
	Sachs teaches a method comprising training a machine learning model to animate an avatar with spoken words in accordance with text information using a Facial Action Coding System (para 73, lines 1-4; para 77, lines 1-13; para 80, lines 1-14; and para 156 of Sachs), and further comprising receiving probabilities from the ML model (para 167 of Sachs).  It would have been obvious to one of ordinary skill in the art to utilize probabilities from the ML model because as taught by Sachs, this provides a flexible and efficient way to analyze the FACS information more accurately and with smaller datasets (see, for example, para 157, lines 1-10; para 159; and para 167, the last 5 lines of Sachs).
	Regarding claim 12:  Chen in view of Roche, and in further view of Sachs, teaches the method of Claim 11 (as rejected above), comprising: receiving second probabilities from the ML model representing emotion (para 53, lines 3-7; para 88; para 109; and para 127 of Chen; para 167 of Sachs).  Chen and Sachs are combined for the reasons set forth above with respect to claim 11.
	Regarding claim 13:  Chen in view of Roche, and in further view of Sachs, teaches the method of Claim 12 (as rejected above), comprising: using one or both of the first and second probabilities to establish facial action units (FAU) (para 48; para 84; para 109; and para 127 of Chen; para 167 of Sachs).  Chen and Sachs are combined for the reasons set forth above with respect to claim 11.
	Regarding claim 14:  Chen in view of Roche, and in further view of Sachs, teaches the method of Claim 13 (as rejected above), comprising: animating the image of the first face in accordance with the FAU (para 48; para 74; para 108; and para 110 of Chen).
	Regarding claims 18-20:  The limitations of these claims substantially correspond to the limitations of claims 11-13, respectively; thus they are rejected on similar grounds as their corresponding claims.

Conclusion
8.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James A Thompson whose telephone number is (571)272-7441. The examiner can normally be reached M-F 8am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES A THOMPSON/Primary Examiner, Art Unit 2616