DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 11/4/2020.
The Amendment filed on 11/4/2020 has been entered.  
Claims 1-10 have been amended by Applicant.
Claims 1-10 remain pending in the application, of which claim 1 is independent.
Applicant’s amendments and arguments have been considered but are either unpersuasive or moot in view of the new grounds of rejection necessitated by the amendments to the claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Response to Arguments
Regarding the objection to Claim 7, the claim has been amended. The objection is withdrawn.
Regarding the rejection under 35 U.S.C. 112(b), claim 1 has been amended. The rejection is withdrawn.
Regarding the claim interpretation under 35 U.S.C. 112(f), the claims have been amended to remove the nonce terms. The claims are no longer interpreted under 35 U.S.C. 112(f).
Regarding the claim rejection under 35 U.S.C. 102(a)(2), Applicant’s arguments, pages 8-10, with respect to the rejection of claims 1-4, 8 and 9 have been fully considered but are moot in view of the new ground of rejection under 35 U.S.C. 103 over MEULEN (US 10,360,716 B1) in view of SUN (US 2018/0336891 A1). Please see the rejection below for more details.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over MEULEN (US 10,360,716 B1), and further in view of SUN (US 2018/0336891 A1).

REGARDING CLAIM 1, MEULEN discloses a system for synchronizing a speech and a motion of a character, the system comprising: 
a speech engine configured to generate reproduction time information of a speech (MEULEN Col 6:26-34 – “The audio processor 216 may process results of the ASML module 212 to create an audio/speech sequence associated with time. For example, the audio processor 216 may select phonemes and/or sample sounds or other sounds from the audio library 224 based on the content of the ASML text/document. In some embodiments, the audio processor 216 may be implemented by one or more text-to-speech (TTS) algorithms.”; Col 12:29-34 – “At 904, the audio processor 216 may determine timing for the audio sequences. The timing may be based on a determined rate of speech, among other possible factors. The timing for the audio sequences may provide a framework for timing of other events, such as special sounds and the animation, as discussed below.”) from an utterance sentence that is input (MEULEN Col 6:61-7:3 – “This information may result in receipt of a message 304, which may include text with emoticons, abbreviations, special font, and/or other characteristics which may be used to modify speech and/or animation representing the content of the message 304.”; Fig. 4 – “Receive message 402.”); 

a character motion engine configured to generate motion information of a motion of a character corresponding to the utterance sentence (MEULEN Fig. 4 – “Select/generate animation including special animation features 406”; Fig. 9 – “Determine audio and animation segments 902”; Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts. Further, the techniques and systems may be used to create the VSML text 114 without human input. Thus, a message or other textual data may be analyzed and used to create VSML, which may then be used to create visual outputs.”) and execution time information of the motion (MEULEN Col 6:20-25 – “The graphics processor 214 may process results of the VSML module 210 to create an animation sequence associated with time. For example, the graphics processor 214 may select visemes and/or animation snippets or other animations from the animation library based on the content of the VSML text/document.”; Col 12:53-58 – “At 910, the graphics processor 214 may determine the timing for animation sequences. The timing may be determined based at least in part on the timing of the audio sequences determined at the operation 904, along with any additional special audio sounds and/or adjustment for silent animations.”) from the utterance sentence that is input (MEULEN Col 6:61-7:3 – “This information may result in receipt of a message 304, which may include text with emoticons, abbreviations, special font, and/or other characteristics which may be used to modify speech and/or animation representing the content of the message 304.”; Fig. 4 – “Receive message 402.”), wherein the reproduction time information of the speech and the execution time information of the motion are independently generated from the utterance sentence (MEULEN Fig. 9 – “Determine timing for audio sequences 904” and “Determine Timing for Animation Sequence 910”; Col 12:29-34 – “At 904, the audio processor 216 may determine timing for the audio sequences. The timing may be based on a determined rate of speech, among other possible factors. The timing for the audio sequences may provide a framework for timing of other events, such as special sounds and the animation, as discussed below.”; Col 12:53-58 – “At 910, the graphics processor 214 may determine the timing for animation sequences. The timing may be determined based at least in part on the timing of the audio sequences determined at the operation 904, along with any additional special audio sounds and/or adjustment for silent animations.”; As shown in Fig. 9, the time information of the speech and the time information of the motion are determined independently, in separate operations 904 and 910.);

a controller configured to generate modified execution time information of the motion (MEULEN Fig. 9 – “Insert special animation 912”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910. As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”. In this situation, the operation 906 may insert a laughter sound at the end of the audio track, which may be synchronized with the laughter animation inserted at the operation 912. Synchronization is described below.”) and modified reproduction time information of the speech (MEULEN Fig. 9 – “Revise mix 920?”; Col 13:9-21 – “At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again.”) which is modified in synchronization with the modified execution time information of the motion (MEULEN Fig. 9 – “Synchronize audio/animation 918”; Col 13:9-21 – “At 918, the synchronization module 222 may check synchronization of the audio and animation to ensure that the segments (e.g., a specific phoneme and corresponding viseme) are synchronized and occur at a same time during output of the audio and animation. At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing.”) based on the utterance sentence (MEULEN Col 12:59-13:2 – “As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”), the reproduction time information of the speech (MEULEN Col 12:59-13:2 – “As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”), and the execution time information of the motion (MEULEN Col 12:59-13:2 – “As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”; Col 13:9-21 – “At 918, the synchronization module 222 may check synchronization of the audio and animation to ensure that the segments (e.g., a specific phoneme and corresponding viseme) are synchronized and occur at a same time during output of the audio and animation. At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing.”); 

a video player configured to generate an image to play the motion of the character (MEULEN Figs. 4 and 9 – “Output audio/animation”; Col 8:65-9:6 – “At 410, the resulting animation and audio may be output, by way of an avatar that speaks text in the message 402. The avatar may be a personal assistant that is animated to appear to read communications such as emails and text messages, responds to user commands, and/or performs other services for a user or in response to user requests. The avatar may be output by a mobile telephone, a tablet computer, a television, a set top box, and/or any other electronic device with outputs devices of a speaker and/or display screen or projector.”; Col 13:9-21 – “When the synchronization is complete and correct, then the computing device(s) may output or send the audio/animation to an end user device for output.”) according to the motion information of the character (MEULEN Fig. 4 – “Select/generate animation including special animation features 406”; Fig. 9 – “Determine audio and animation segments 902”) and the modified execution time information of the motion that are provided by the controller (MEULEN Fig. 9 – “Insert special animation 912” and “Revise mix 920”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910.”); and 

an audio player configured to generate a speech (MEULEN Col 8:65-9:6 – “At 410, the resulting animation and audio may be output, by way of an avatar that speaks text in the message 402. The avatar may be a personal assistant that is animated to appear to read communications such as emails and text messages, responds to user commands, and/or performs other services for a user or in response to user requests. The avatar may be output by a mobile telephone, a tablet computer, a television, a set top box, and/or any other electronic device with outputs devices of a speaker and/or display screen or projector.”) according to the modified reproduction time information of the speech that is provided by the controller (MEULEN Fig. 9 – “Adjust timing for special silent animations 908” and “Revise mix? 920”; Col 13:9-21 – “At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again.”) and reproduce the generated speech (MEULEN Figs. 4 and 9 – “Output audio/animation”; Col 13:9-21 – “When the synchronization is complete and correct, then the computing device(s) may output or send the audio/animation to an end user device for output.”),
[wherein the controller is further configured to, in modifying the reproduction time information of the speech, change a start time of the speech substantially later than a start time of the motion of the character in case matching the execution time of the motion and the reproduction of the speech causes distortion to the speech].

MEULEN does not explicitly teach the [square-bracketed] limitations.

SUN discloses the [square-bracketed] limitations. SUN discloses a method/system for synchronizing visual information and auditory information, comprising: [wherein the controller is further configured to, in modifying the reproduction time information of the speech, change a start time of the speech substantially later than a start time of the motion of the character (SUN Figs. 7 and 13; Par 101 – “Further, as shown in FIG. 7, it is possible to reduce the time lag from the robot motion by accelerating and slowing down the speech, or by inserting a pause. In the second embodiment, the “image” in FIG. 6 and FIG. 7 can be replaced with the “robot motion”.”; Par 105 – “FIG. 13 shows an example of performing speech editing, in addition to motion command editing by the process shown in FIG. 12. In the example in FIG. 13, a pause is inserted to slow down “kono kan 1 wo” in the speech, and at the same time, the part of “irete kudasai.” is spoken fast to limit the whole time within a predetermined time range.”; Please note that the speech start time for “KONO KAN2 ni” in Fig. 7 and the speech start time for “KONO KAN1 wo” in Fig. 13 have been delayed by inserting a pause in order to be synchronized with the motion/image data.) in case matching the execution time of the motion and the reproduction of the speech causes distortion to the speech] (SUN Fig. 8B – “Speech/image naturalness evaluation module 8065 -> Speech synthesis unit (second language) 1061 -> Time-lag evaluation module between second language speech and gesture 8062 -> Greater than threshold? 1063 -> YES -> Editing module 8064”; Par 102 – “The speech/motion naturalness evaluation module 8065 evaluates the naturalness for each of a plurality of methods (motion command editing, text editing, speech editing, and the like) that eliminate the “time lag”.”; Please note that the process is a loop; thus, when the time-lag evaluation is not satisfied (i.e., the time lag is greater than the threshold), the process continues with the modifications, which include inserting a pause to reduce the lag.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of MEULEN to include changing a start time of the speech, as taught by SUN.
One of ordinary skill would have been motivated to include changing a start time of the speech in order to reduce the time lag between the speech and the motion (SUN Par 74).

REGARDING CLAIM 2, MEULEN in view of SUN discloses the system of claim 1, wherein utterance type information (MEULEN Fig. 5 – “502 Next, you need to repeat step 2. => Type: format”) is further input to the speech engine (MEULEN Fig. 5 – “502 .. =>Lower pitch for step 2”) and the character motion engine (MEULEN Fig. 5 – “502 .. =>exaggerated visemes for step 2”), the utterance type information includes at least one of emphasis information indicating a part to be emphasized in the utterance sentence and an extent of the emphasis, stress information of a syllable, and length information of a syllable (MEULEN Col 9:14-23 – “In a first example 502, a message may include text “Next, you will need to repeat<bold> step 2</bold>” (possibly with other font indicators). The message analyzer 208 may determine the occurrence of a font “bold.” The VSML module 210 may create a visual action indicator of <exaggerated visemes for “step 2”>, which may be inserted into an output directed to the graphics processor 214. The ASML module 212 may create an audio indicator of <lower pitch for “step 2”>, which may be inserted into an output directed to the audio processor 216.”; Col 8:26-45 – “Other aspects of the message, such as special or unusual text formatting (e.g., emphasis, all capital letters, etc.), punctuation, spacing, and/or other aspects may also be identified and associated with audio and/or visual features to be implemented by the avatar.”), 

the speech engine generates the reproduction time information of the speech (MEULEN Col 6:26-34 – “The audio processor 216 may process results of the ASML module 212 to create an audio/speech sequence associated with time. For example, the audio processor 216 may select phonemes and/or sample sounds or other sounds from the audio library 224 based on the content of the ASML text/document. In some embodiments, the audio processor 216 may be implemented by one or more text-to-speech (TTS) algorithms.”; Col 12:29-34 – “At 904, the audio processor 216 may determine timing for the audio sequences. The timing may be based on a determined rate of speech, among other possible factors. The timing for the audio sequences may provide a framework for timing of other events, such as special sounds and the animation, as discussed below.”) from the utterance sentence (MEULEN) using the utterance type information (MEULEN Col 9:14-23 – “The ASML module 212 may create an audio indicator of <lower pitch for “step 2”>, which may be inserted into an output directed to the audio processor 216.”), and 

the character motion engine generates the motion information of the character corresponding to the utterance sentence (MEULEN Fig. 4 – “Select/generate animation including special animation features 406”; Fig. 9 – “Determine audio and animation segments 902”; Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts. Further, the techniques and systems may be used to create the VSML text 114 without human input. Thus, a message or other textual data may be analyzed and used to create VSML, which may then be used to create visual outputs.”) and the execution time information of the motion (MEULEN Col 6:20-25 – “The graphics processor 214 may process results of the VSML module 210 to create an animation sequence associated with time. For example, the graphics processor 214 may select visemes and/or animation snippets or other animations from the animation library based on the content of the VSML text/document.”; Col 12:53-58 – “At 910, the graphics processor 214 may determine the timing for animation sequences. The timing may be determined based at least in part on the timing of the audio sequences determined at the operation 904, along with any additional special audio sounds and/or adjustment for silent animations.”) from the utterance sentence using the utterance type information (MEULEN Col 9:14-23 – “The VSML module 210 may create a visual action indicator of <exaggerated visemes for “step 2”>, which may be inserted into an output directed to the graphics processor 214.”).

REGARDING CLAIM 3, MEULEN in view of SUN discloses the system of claim 1, wherein the character motion engine generates a plurality of pieces of character motion information (MEULEN Fig. 6 – “V(W)[L1], V(O)[L2], V(W)[LN]”; Col 10:30-39 – “The graphics processor 214 may select visemes V(W)[L1] 608, V(O)[L2] 610, and V(W)[LN] 612 to create an animation sequence for this word spoken while laughing.”) corresponding to one of a syntactic word, a space between syntactic words, or a word included in the utterance sentence (MEULEN Fig. 6 – “Input: <laugh visual> wow </laugh visual> 606”; Col 10:8-29 – “For example, when laughing and talking, there may be multiple possible movements of laughter associated with each viseme. These may be selected and stitched or sequenced together to create a smooth animation of laughter and of the words spoken. As shown here, a VSML text 606 may include the following example text with visual indicators: “<laugh visual>wow</laugh visual>”.”) and execution time information of each motion (MEULEN Col 10:30-39 – “The graphics processor 214 may select visemes V(W)[L1] 608, V(O)[L2] 610, and V(W)[LN] 612 to create an animation sequence for this word spoken while laughing. The animation may be created to start and end at a point where a consecutive string of the selected visemes (e.g., L1, L2 LN) result in a smooth and continuous animation. As shown in FIG. 6, the number of visemes (1)-(N) indicates a number of sequenced movements for an action, such as laughter, which may require multiple frames of animation to create an action that represents laughter, for example”).

REGARDING CLAIM 4, MEULEN in view of SUN discloses the system of claim 1, wherein the speech engine generates and transmits a speech corresponding to the utterance sentence (MEULEN Fig. 4 – S404->S406; Col 8:46-56 – “For example, the graphics processor 214 may receive phonemes for speech generated by a text-to-speech processor (e.g., the audio processor 216) …”; Fig. 8 – “Generate audio/speech based on audio features and phonemes 810”; Col 11:63-12:2 – “At 810, the audio processor 216 may create audio/speech based on audio features and the phonemes identified at the operation 804 and 806. The audio processor 216 may combine sounds to create the speech using TTS algorithms. The speech may be modified to include a speed, a pitch, a volume, and/or other attributes of speech, which may be included in the audio features.”), and the audio player modifies the speech, which is generated by the speech engine (MEULEN Fig. 8 – “Mix/Synchronize audio and animation 814”; Col 12:9-13 – “At 814, the graphics mixer 218 may mix and/or synchronize the speech and animations to align the avatar's animated movements with sounds from the speech. The speech may be delayed (e.g., include a pause), to allow for insertion of an animation, such as a laughing or smiling animation.”; Fig. 9 – “Revise mix 920?” -> “Mix audio 914”; Col 13:9-21 – “At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again.”; Col 13:3-8 – “At 914, the audio mixer 220 may mix the audio while at 916, the graphics mixer 218 may mix the animation. The mixing may create smooth transitions between segments, create fades, and/or make other adjustments to audio and/or animations to create a continuous and smooth presentation of audio and animation.”), according to the modified reproduction time information of the speech that is provided by the controller (MEULEN Fig. 9 – “Revise mix 920?”; Col 13:9-21 – “At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again.”) and reproduces the modified speech (MEULEN Fig. 9; Col 13:9-21 – “When the synchronization is complete and correct, then the computing device(s) may output or send the audio/animation to an end user device for output.”).

REGARDING CLAIM 8, MEULEN in view of SUN discloses the system of claim 1, further comprising a synthesizer configured to generate a character animation by synthesizing the image output using the video player with the speech output by the audio player (MEULEN Fig. 1; Col 13:9-21 – “At 918, the synchronization module 222 may check synchronization of the audio and animation to ensure that the segments (e.g., a specific phoneme and corresponding viseme) are synchronized and occur at a same time during output of the audio and animation. At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again. When the synchronization is complete and correct, then the computing device(s) may output or send the audio/animation to an end user device for output.”; Col 3:58-64 – “As shown in FIG. 1, the audio track 116 is represented with respect to a timeline 126, along with the animation sequence 122. An abbreviated illustrative animation sequence 128(1)-(N) shows the avatar 120 speaking and acting out visual indicators included in the VSML text 114. The animation sequence 128(N) shows part of a smile action representing the visual action indicator 116.”).

CLAIM 9 is similar to Claim 4; thus, it is rejected under the same rationale.

Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over MEULEN (US 10,360,716 B1) in view of SUN (US 2018/0336891 A1), and further in view of WANG (US 2010/0082345 A1).

REGARDING CLAIM 5, MEULEN in view of SUN discloses the system of claim 1, wherein the character motion engine generates and transmits operation information of a character [skeleton] body parts (MEULEN Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts.”) for executing the motion of the character (MEULEN Fig. 9 – “Insert special animation 912 -> Mix animation 916”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910. As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”. In this situation, the operation 906 may insert a laughter sound at the end of the audio track, which may be synchronized with the laughter animation inserted at the operation 912. Synchronization is described below.”) according to the generated motion information of the character (MEULEN Fig. 4 – “Select/generate animation including special animation features 406”; Fig. 9 – “Determine audio and animation segments 902”; Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts. Further, the techniques and systems may be used to create the VSML text 114 without human input. Thus, a message or other textual data may be analyzed and used to create VSML, which may then be used to create visual outputs.”) and the modified execution time information of the motion (MEULEN Fig. 9 – “Insert special animation 912”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910. As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”), and the video player modifies the operation information of the character [skeleton] body parts (MEULEN Fig. 9 – “Mix animation 916”; Col 13:3-8 – “At 914, the audio mixer 220 may mix the audio while at 916, the graphics mixer 218 may mix the animation. The mixing may create smooth transitions between segments, create fades, and/or make other adjustments to audio and/or animations to create a continuous and smooth presentation of audio and animation.”; Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts.”), which is generated by the character motion engine according to the motion information of the character (MEULEN Fig. 4 – “Select/generate animation including special animation features 406”; Fig. 9 – “Determine audio and animation segments 902”; Col 3:65-4:9 – “The visual actions are not limited to speech movements, but may also include body movements, such as jumping, clapping hands, and/or other movements of appendages, limbs, bodies, or body parts. Further, the techniques and systems may be used to create the VSML text 114 without human input. Thus, a message or other textual data may be analyzed and used to create VSML, which may then be used to create visual outputs.”) and the modified execution time information of the motion that are provided by the controller (MEULEN Fig. 9 – “Insert special animation 912”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910. As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”), to generate an image in which the motion of the character is executed (MEULEN Fig. 9 – “Output audio/animation 922”; Col 13:9-21 – “When the synchronization is complete and correct, then the computing device(s) may output or send the audio/animation to an end user device for output.”).
MEULEN in view of SUN does not explicitly teach the [square-bracketed] limitations.

WANG discloses the [square-bracketed] limitations. WANG discloses a method/system for speech and text driven HMM-based body animation synthesis, comprising: a motion engine that generates and transmits operation information of a character [skeleton] (WANG Par 23 – “The Animation Synthesizer then uses the resulting probabilistic model for selecting, or more specifically, for “predicting,” the appropriate animation trajectories for one or more different body parts (e.g., mouth (i.e., lip sync and other mouth motions), nose, eyes, eyebrows, ears, face, head, fingers, hands, arms, legs, feet, torso, spine, skeletal elements of a body, etc.) based on an arbitrary text and/or speech input based on an evaluation of the context, punctuation, and any emotional characteristics associated with that input.”); and a video player that modifies the operation information of the character [skeleton] (WANG Par 73 – “Each controller controls the manner and displacement of vertices on the associated meshes. As such, virtual lip muscles can be used to control how wide the mouth of the avatar or robot opens. Similarly, body gestures or motions can be achieved by controlling the spatial positions of bones on the skeleton.”; Par 10 – “In general, an “Animation Synthesizer” as described herein, provides various techniques for enabling automatic speech and text driven animation synthesis. The Animation Synthesizer uses a trainable probabilistic model (also referred to herein as an “animation model”) for selecting animation motions based on an arbitrary text and/or speech input. These animation motions are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc., that are synchronized with a speech output corresponding to the arbitrary text and/or speech input.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of MEULEN in view of SUN to include a character skeleton, as taught by WANG.
One of ordinary skill would have been motivated to include a character skeleton, in order to generate an avatar animation in a more natural way (WANG Par 13).

CLAIM 10 is similar to Claim 5; thus, it is rejected under the same rationale.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over MEULEN (US 10,360,716 B1) in view of SUN (US 2018/0336891 A1), and further in view of LEISTIKOW (US 2015/0120308 A1).

REGARDING CLAIM 6, MEULEN in view of SUN discloses the system of claim 1, wherein the modification of the reproduction time information of the speech by the controller (MEULEN Fig. 9 – “Revise mix 920?”; Col 13:9-21 – “At 918, the synchronization module 222 may check synchronization of the audio and animation to ensure that the segments (e.g., a specific phoneme and corresponding viseme) are synchronized and occur at a same time during output of the audio and animation. At 920, the synchronization module 222 may revise the timing of the audio and/or animation, which may result in additional mixing. If additional mixing is needed based on modifications to the audio and/or animation, the processes 900 may return to perform the operations 914 and/or 916 again.”) includes [modifying a pronunciation time of a syllable] or modifying an interval between [syllables] (MEULEN Fig. 9 – “Revise mix 920?” -> “Mix audio 914” -> “Synchronize audio/animation 918”; Col 13:3-8 – “At 914, the audio mixer 220 may mix the audio while at 916, the graphics mixer 218 may mix the animation. The mixing may create smooth transitions between segments, create fades, and/or make other adjustments to audio and/or animations to create a continuous and smooth presentation of audio and animation.”; Fig. 8 – “Mix/synchronize audio and animation 814”; Col 12:9-13 – “At 814, the graphics mixer 218 may mix and/or synchronize the speech and animations to align the avatar's animated movements with sounds from the speech. The speech may be delayed (e.g., include a pause), to allow for insertion of an animation, such as a laughing or smiling animation.”; Col 6:43-50 – “The audio mixer 220 may mix the audio generated and/or selected by the audio processor 216. For example the audio mixer 220 may modify timing, smooth transitions, and/or otherwise modify the audio to create audio/speech that is synchronized with animation from the animation mixer 218. The audio mixer 220 may receive input from the synchronization module 222 to synchronize the audio with the animation.”).
MEULEN does not explicitly teach the [square-bracketed] limitations. MEULEN teaches modifying the “timing” of the speech signal in order to synchronize it with the animation. As a result, the components of the speech (the durations of syllables, vowels, words, phonemes, etc.) will be modified. Although MEULEN implicitly teaches modifying the duration of syllables, the Examiner provides LEISTIKOW for clarity of the rejection.

LEISTIKOW discloses the [square-bracketed] limitations. LEISTIKOW discloses a method/system for synchronizing speech with other data comprising:
[modifying a pronunciation time of a syllable] or modifying an interval between [syllables] (LEISTIKOW Par 90 – “Although the alignment and note-to-segment stretching processes synchronize the onsets of the voice with the notes of the melody, the musical structure of the backing track can be further emphasized by stretching the syllables to fill the length of the notes.  To achieve this without losing intelligibility, we use dynamic time stretching to stretch the vowel sounds in the speech, while leaving the consonants as they are.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of MEULEN in view of SUN to include modifying the length of a syllable, as taught by LEISTIKOW.
One of ordinary skill would have been motivated to include modifying the length of a syllable, in order to synchronize voice data with other data without losing intelligibility (LEISTIKOW Par 90).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over MEULEN (US 10,360,716 B1) in view of SUN (US 2018/0336891 A1), and further in view of LARKIN (US 2014/0267303 A1).

REGARDING CLAIM 7, MEULEN in view of SUN discloses the system of claim 1, wherein the execution time information of the motion generated by the character motion engine (MEULEN Fig. 9 – “Determining timing for animation sequence 910”; Col 12:53-59 – “At 910, the graphics processor 214 may determine the timing for animation sequences. The timing may be determined based at least in part on the timing of the audio sequences determined at the operation 904, along with any additional special audio sounds and/or adjustment for silent animations.”) includes [a minimum execution time and a maximum] execution time of the motion (MEULEN Col 10:30-39 – “The graphics processor 214 may select visemes V(W)[L1] 608, V(O)[L2] 610, and V(W)[LN] 612 to create an animation sequence for this word spoken while laughing. The animation may be created to start and end at a point where a consecutive string of the selected visemes (e.g., L1, L2 LN) result in a smooth and continuous animation. As shown in FIG. 6, the number of visemes (1)-(N) indicates a number of sequenced movements for an action, such as laughter, which may require multiple frames of animation to create an action that represents laughter, for example.”), and the modification of the execution time information of the motion by the controller includes determining an execution time of the motion (MEULEN Fig. 9 – “Insert special animation 912”; Col 12:59-13:2 – “At 912, the graphics processor 214 may insert special animations, which may or may not modify the timing determined at the operation 910. As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”. In this situation, the operation 906 may insert a laughter sound at the end of the audio track, which may be synchronized with the laughter animation inserted at the …”) according to the reproduction time information of the speech (MEULEN Col 12:59-13:2 – “As an example, the timing of the animation may be initially set to the timing of the audio sequences, but may then be extended for an additional animation of laughter that is inserted at the end of an animation sequence for the reading of a message that ends with “LOL”.”) [within a range of a minimum execution time to a maximum execution time of the motion].

MEULEN does not explicitly teach the [square-bracketed] limitations. MEULEN teaches that a certain action (e.g., laughter) requires a certain number of frames (N) to represent. Thus, MEULEN implicitly teaches that the minimum execution time corresponds to “2 frames (e.g., multiple frames of animation) / sampling frequency (e.g., number of frames per second)”, and the maximum execution time corresponds to “N frames (e.g., N sequenced movements for an action) / sampling frequency”. Also, because the modification of the animation is based on picking frames within the N sequenced movements, the resulting modification is also within that range. However, the Examiner provides LARKIN for clarity of the rejection.

LARKIN discloses the [square-bracketed] limitations. LARKIN discloses a method/system for generating animations, wherein the execution time information of the motion generated by the character motion engine includes [a minimum execution time and a maximum] execution time of the motion (LARKIN Par 48 – “If the user elected to adjust the image count instead of adjusting the display rate, then in step 311, the computing device may determine whether the animation exceeded a maximum duration (was too long), or fell below a minimum duration (was too short).”), and the modification of the execution time information of the motion by the controller includes determining an execution time of the motion according to a user-desired duration (Note that MEULEN already teaches the modification according to the reproduction time information of the speech.) (“… the user-specified animation duration.”) [within a range of a minimum execution time to a maximum execution time of the motion] (LARKIN Par 48 – “If the user elected to adjust the image count instead of adjusting the display rate, then in step 311, the computing device may determine whether the animation exceeded a maximum duration (was too long), or fell below a minimum duration (was too short). If the animation was too long, then in step 312, the computing device may select one or more images in the path to skip in the animation. To select images to skip, the computing device may first determine the playback duration of the animation as identified in the path (e.g., by multiplying the display image rate by the number of images in the path), and determine the amount of time by which that duration exceeds the user's required duration limit ([Excess Time]=[Playback Duration]-[Duration Maximum]). Then, the computing device may determine how many excess images need to be trimmed from the animation to fit within the duration maximum.”; Par 49 – “If, in step 311, the animation was too short, then in step 313, the computing device may select one or more images form the path to display more than once during the animation. To do so, the computing device may first determine how much additional time is needed for the animation. This may be done by identifying the [Playback Duration] as discussed above, and determining the difference between that duration and the [Duration Minimum]. This difference may be the amount of [Added Time] needed for the animation, and the computing device can determine the number of [Duplicate Images] as follows: [Duplicate Images]=[Added Time]/[Display Rate].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of MEULEN in view of SUN to include a minimum duration and a maximum duration for generating animation, as taught by LARKIN.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C. KIM whose telephone number is (571)272-3327.  The examiner can normally be reached on Monday to Friday 9:00 AM thru 5:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/JONATHAN C KIM/Primary Examiner, Art Unit 2659