Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Compact Prosecution 
The Examiner believes the primary reference is very strong.  For the purposes of compact prosecution, the Examiner encourages Applicant’s representative to have an interview with the Examiner to discuss the potential difference between Applicant’s disclosure and Taylor et al. 

Response to Amendment 
This is in response to applicant’s amendment/response filed on 4/7/2021, which has been entered and made of record.  Claims 1, 3, 8, 10, and 15 have been amended.  Claims 2 and 9 have been cancelled.  Claims 1, 3-8, and 10-15 are pending in the application. 

Applicant’s arguments with respect to claims 1, 3-8, and 10-15 filed 4/7/2021 have been considered but they are unpersuasive.
	(1) Applicant states (Remarks 8):

[Image omitted: media_image1.png]

	The Examiner disagrees. 
	First, Taylor Fig. 4 shows a matrix of subsequences, and that matrix is two-dimensional.
[Image omitted: media_image2.png]
	Second and alternatively, the claim recites a “two-dimensional . . . matrix sequence.”  It does not require a two-dimensional matrix representation.  A two-dimensional matrix may even be represented by a single sequence of numbers; for example, two-dimensional arrays in programming are often stored in memory this way.
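	By way of illustration only, the following Python sketch stores a two-dimensional matrix as a single sequence of numbers in row-major order and reads an element back by index arithmetic.  It is not taken from Taylor or from Applicant's disclosure; the names `flatten_row_major` and `element_at` are hypothetical and chosen solely for this example.

```python
from typing import List, Sequence


def flatten_row_major(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Store a 2-D matrix as one flat sequence, one row after another."""
    return [value for row in matrix for value in row]


def element_at(flat: Sequence[float], n_cols: int, row: int, col: int) -> float:
    """Read element (row, col) back out of the flat row-major sequence."""
    return flat[row * n_cols + col]


# A 2 x 3 matrix and its one-dimensional representation.
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
flat = flatten_row_major(matrix)
```

	The flat sequence carries the same information as the matrix as long as the number of columns is known, which is why the dimensionality of a matrix does not depend on how it is laid out in storage.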

	(2) Applicant states (Remarks 9):

[Image omitted: media_image3.png]

	The Examiner disagrees. 
	The claim does not recite or require that a convolutional layer not be one-dimensional, and it is unclear what a one-dimensional convolutional layer is. 
	The Examiner already explained under (1) that Taylor teaches the two-dimensional feature matrix sequence.
	
Claim Rejections - 35 USC § 103 
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-8, and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over Taylor et al. (Taylor) (“A Deep Learning Approach for Generalized Speech Animation”). 
Regarding Claim 1, Taylor discloses A method for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: 
acquiring a to-be-played speech (
[Image omitted: media_image4.png]

The “target speech” in figure 1 may correspond to the “to-be-played speech.”);   
sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment (
“Our approach is a continuous deep learning sliding window predictor, inspired by [Kim et al. 2015]. The sliding window approach means our predictor is able to represent a complex non-linear regression between the input phonetic description and output video representation of continuous speech that naturally includes context and coarticulation effects. Our results demonstrate the improvement of using a neural network deep learning approach over the decision tree approach in [Kim et al. 2015]. The use of overlapping sliding windows more directly focuses the learning on capturing localized context and coarticulation effects and is better suited to predicting speech animation than conventional sequence learning approaches, such as . . .”
[Image omitted: media_image5.png]
[Image omitted: media_image2.png]
); 
generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech by (
Fig. 1.
[Image omitted: media_image6.png]
):
generating, based on the at least one speech segment, a two-dimensional feature matrix sequence (
[Image omitted: media_image7.png]
Taylor 5.2.
[Image omitted: media_image2.png]
Taylor Fig. 4.); and 
inputting the two-dimensional feature matrix sequence into a pre-established convolutional neural network to obtain the mouth shape control parameter sequence (

[Image omitted: media_image8.png]

Taylor 5 DEEP LEARNING SLIDING WINDOW REGRESSION.  
Taylor states “Since the mapping from phonetic subsequences to animation subsequences can be very complex, we instantiate h using a deep neural network. Our learning objective is minimizing square loss between the ground truth fixed-length subsequence and its corresponding prediction outputs among training data.”  Taylor 5 DEEP LEARNING SLIDING WINDOW REGRESSION.
Taylor states “One can equivalently view our sliding window predictor as a variant of a convolutional deep learning architecture.”  Taylor 5.1 Deep Learning Details & Discussions.), 
wherein the pre-established convolutional neural network is used to characterize corresponding relationships between two-dimensional feature matrices and mouth shape control parameters (Taylor Fig. 4. 
[Image omitted: media_image8.png]
Taylor Figs. 1, 6.); and 
controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence (
Fig. 1, showing the mouth shape of an animated character. 
“Other retargeting approaches are possible, and by design, independent of our speech animation prediction approach. Mesh deformation transfer [Sumner and Popović 2004] may be used to automate retargeting of reference shapes for rig-space deformation for example.  Deformation transfer could also be used per-frame to transfer prediction animation to an un-rigged character mesh.”  Taylor 93:7 left col. lines 31-36. 

[Image omitted: media_image9.png]

The mouth shape is preset according to the character model as shown in Fig. 6. 
The “retargeting approaches” retarget mouth shapes to 3D models as indicated by Fig. 6.  
The recited “Mesh deformation transfer” and “un-rigged character mesh” indicate a three-dimensional virtual portrait.  Taylor 93:7 left col. lines 31-36.).
Taylor teaches or suggests a three-dimensional virtual portrait as the Examiner has explained.  However, there is no explicit disclosure.  The Examiner takes Official Notice that it would have been well known in the art that facial expressions, including shapes of a mouth, may be retargeted to a 3D model.  The benefit of combining this well-known knowledge would have been that more expressive and/or sophisticated models may be animated, which would have made animations more interesting and attractive.  Because Applicant does not traverse the Examiner’s assertion of official notice, or Applicant’s traverse is not adequate, the common knowledge or well-known in the art statement is taken to be admitted prior art. 
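
For reference, the step of sliding a preset time window at a preset step length, as discussed in the Claim 1 analysis above, can be sketched as follows.  This is an illustrative sketch only and is neither Taylor's nor Applicant's implementation; the function name `slide_window`, the use of sample counts as units, and the choice to drop a trailing partial window are assumptions made for the example.

```python
from typing import List, Sequence


def slide_window(samples: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Return the segments obtained by sliding a fixed-size window over `samples`.

    `window` is the preset window length and `step` is the preset step length,
    both measured in samples (an assumption of this sketch).  A trailing
    partial window is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    segments = []
    for start in range(0, len(samples) - window + 1, step):
        segments.append(list(samples[start:start + window]))
    return segments


# Ten samples, window of 4, step of 2: consecutive segments overlap by 2 samples.
speech = [float(i) for i in range(10)]
segments = slide_window(speech, window=4, step=2)
```

When the step length is shorter than the window length, consecutive segments overlap, which corresponds to the overlapping sliding windows discussed in the quoted passage.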

Regarding Claim 3, Taylor discloses The method according to claim 1, 
wherein the generating, based on the at least one speech segment, the two-dimensional feature matrix sequence comprises: 
generating, for a speech segment of the at least one speech segment, at least one two-dimensional feature matrix for the speech segment (
Taylor 5.2 Feature Representation. 
[Image omitted: media_image7.png]
[Image omitted: media_image2.png]
Taylor Fig. 4.); and 
splicing, based on an order of the at least one speech segment in the to-be-played speech, the generated at least one two-dimensional feature matrix into the two-dimensional feature matrix sequence (
[Image omitted: media_image2.png]
Taylor Fig. 4).  

Regarding Claim 4, Taylor discloses The method according to claim 3, 
wherein the generating, for the speech segment of the at least one speech segment, the two-dimensional feature matrix for the speech segment comprises: 
dividing the speech segment into a preset number of speech sub-segments, wherein two adjacent speech sub-segments partially overlap (
[Image omitted: media_image2.png]
Taylor Fig. 4.
[Image omitted: media_image10.png]
); 
extracting, for a speech sub-segment in the preset number of speech sub-segments, a feature of the speech sub-segment to obtain a speech feature vector for the speech sub-segment (Taylor 5.2 Feature Representation. 
[Image omitted: media_image7.png]
); and 
generating, based on obtained preset number of speech feature vectors, the two-dimensional feature matrix for the speech segment (
[Image omitted: media_image11.png]
).  
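
The three steps addressed in the Claim 4 analysis above (dividing a segment into a preset number of partially overlapping sub-segments, extracting one feature vector per sub-segment, and assembling the vectors into a two-dimensional matrix) can be sketched as follows.  This is an illustrative sketch only and is neither Taylor's nor Applicant's implementation; the names `split_overlapping`, `toy_features`, and `feature_matrix` are hypothetical, and the mean/energy feature is a placeholder assumption standing in for whatever feature extraction is actually used.

```python
from typing import List, Sequence


def split_overlapping(segment: Sequence[float], count: int, overlap: int) -> List[List[float]]:
    """Split `segment` into `count` equal-length sub-segments.

    Adjacent sub-segments share `overlap` samples.
    """
    # Solve count * length - (count - 1) * overlap == len(segment) for length.
    total = len(segment) + (count - 1) * overlap
    if count <= 0 or overlap < 0 or total % count != 0:
        raise ValueError("segment length does not fit the requested split")
    length = total // count
    if overlap >= length:
        raise ValueError("overlap must be shorter than a sub-segment")
    hop = length - overlap
    return [list(segment[i * hop:i * hop + length]) for i in range(count)]


def toy_features(sub: Sequence[float]) -> List[float]:
    """Placeholder feature vector (mean and mean energy); an assumption of this sketch."""
    mean = sum(sub) / len(sub)
    energy = sum(x * x for x in sub) / len(sub)
    return [mean, energy]


def feature_matrix(segment: Sequence[float], count: int, overlap: int) -> List[List[float]]:
    """Stack one feature vector per sub-segment into a 2-D matrix (rows = sub-segments)."""
    return [toy_features(sub) for sub in split_overlapping(segment, count, overlap)]


segment = [float(i) for i in range(10)]
matrix = feature_matrix(segment, count=4, overlap=2)  # 4 rows, 2 columns
```

Each row of the resulting matrix corresponds to one sub-segment and each column to one feature, so the matrix has as many rows as the preset number of sub-segments.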

Regarding Claim 5, Taylor discloses The method according to claim 1, 
wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises:  
generating, for a speech segment of the at least one speech segment, a phoneme sequence of the speech segment, and encoding the phoneme sequence to obtain phoneme information (
[Image omitted: media_image7.png]
[Image omitted: media_image12.png]
); 
inputting a phoneme information sequence composed of at least one piece of phoneme information into a pre-established mouth shape key point predicting model to obtain a mouth shape key point information sequence composed of at least one piece of mouth shape key point information (Taylor Fig. 4. 
[Image omitted: media_image8.png]
Taylor Fig. 6.), 
wherein the pre-established mouth shape key point predicting model is used to characterize a corresponding relationship between the phoneme information sequence and the mouth shape key point information sequence (Id.); and 
generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence (Fig. 6.).  

Regarding Claim 6, Taylor discloses The method according to claim 5, 
wherein the generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence comprises: 
obtaining, for mouth shape key point information in the mouth shape key point information sequence, at least one mouth shape control parameter corresponding to the mouth shape key point information based on a pre-established corresponding relationship between sample mouth shape key point information and a sample mouth shape control parameter (

[Image omitted: media_image13.png]
“Retargeting approaches that are of particular interest are those that can be pre-computed once . . .”); and 
generating the mouth shape control parameter sequence based on the obtained at least one mouth shape control parameter (
[Image omitted: media_image14.png]
).  

Regarding Claim 7, Taylor discloses The method according to claim 5, 
wherein the pre-established mouth shape key point predicting model is a recurrent neural network, and a loop body of the recurrent neural network is a long short-term memory (
[Image omitted: media_image8.png]
Taylor Fig. 4. Taylor states “Since the mapping from phonetic subsequences to animation subsequences can be very complex, we instantiate h using a deep neural network. Our learning objective is minimizing square loss between the ground truth fixed-length subsequence and its corresponding prediction outputs among training data.”  Taylor 5 DEEP LEARNING SLIDING WINDOW REGRESSION.  Taylor states “One can equivalently view our sliding window predictor as a variant of a convolutional deep learning architecture.”  Taylor 5.1 Deep Learning Details & Discussions.

[Image omitted: media_image15.png]
[Image omitted: media_image16.png]

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine Taylor’s embodiments.  The suggestion/motivation would have been to take a wider context of the speech into consideration by maintaining more context information using an LSTM.  It would have been a simple substitution of one known element for another that produces predictable results (KSR).).  

Regarding Claim 8, Taylor discloses An apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: 
at least one processor; and a memory storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: 
acquiring a to-be-played speech (See Claim 1 rejection for detailed analysis.); 
sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment (See Claim 1 rejection for detailed analysis.); 
generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech (See Claim 1 rejection for detailed analysis.) by:
generating, based on the at least one speech segment, a two-dimensional feature matrix sequence, and inputting the two-dimensional feature matrix sequence into a pre-established convolutional neural network to obtain the mouth shape control parameter sequence, wherein the pre-established convolutional neural network is used to characterize corresponding relationships between two-dimensional feature matrices and mouth shape control parameters (See Claim 1 rejection for detailed analysis.); and 
controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence (See Claim 1 rejection for detailed analysis.).  

Regarding Claim 10, Taylor discloses The apparatus according to claim 8, 
wherein the generating, based on the at least one speech segment, the two-dimensional feature matrix sequence comprises: 
generating, for a speech segment of the at least one speech segment, at least one two-dimensional feature matrix for the speech segment (See Claim 3 rejection for detailed analysis.); and 
splicing, based on an order of the at least one speech segment in the to-be-played speech, the generated at least one two-dimensional feature matrix into the two-dimensional feature matrix sequence (See Claim 3 rejection for detailed analysis.).  

Regarding Claim 11, Taylor discloses The apparatus according to claim 10, 
wherein the generating, for the speech segment of the at least one speech segment, the two-dimensional feature matrix for the speech segment comprises: 
dividing the speech segment into a preset number of speech sub-segments, wherein two adjacent speech sub-segments partially overlap (See Claim 4 rejection for detailed analysis.); 
extracting, for a speech sub-segment in the preset number of speech sub-segments, a feature of the speech sub-segment to obtain a speech feature vector for the speech sub-segment (See Claim 4 rejection for detailed analysis.); and  
generating, based on obtained preset number of speech feature vectors, the two-dimensional feature matrix for the speech segment (See Claim 4 rejection for detailed analysis.).  

Regarding Claim 12, Taylor discloses The apparatus according to claim 8, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises: 
generating, for a speech segment of the at least one speech segment, a phoneme sequence of the speech segment, and encoding the phoneme sequence to obtain phoneme information (See Claim 5 rejection for detailed analysis.); 
inputting a phoneme information sequence composed of at least one piece of phoneme information into a pre-established mouth shape key point predicting model to obtain a mouth shape key point information sequence composed of at least one piece of mouth shape key point information (See Claim 5 rejection for detailed analysis.), 
wherein the pre-established mouth shape key point predicting model is used to characterize a corresponding relationship between the phoneme information sequence and the mouth shape key point information sequence (See Claim 5 rejection for detailed analysis.); and 
generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence (See Claim 5 rejection for detailed analysis.).  

Regarding Claim 13, Taylor discloses The apparatus according to claim 12, wherein the generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence comprises: 
obtaining, for mouth shape key point information in the mouth shape key point information sequence, at least one mouth shape control parameter corresponding to the mouth shape key point information based on a pre-established corresponding relationship between sample mouth shape key point information and a sample mouth shape control parameter (See Claim 6 rejection for detailed analysis.); and 
generating the mouth shape control parameter sequence based on the obtained at least one mouth shape control parameter (See Claim 6 rejection for detailed analysis.).  

Regarding Claim 14, Taylor discloses The apparatus according to claim 12, wherein the pre-established mouth shape key point predicting model is a recurrent neural network, and a loop body of the recurrent neural network is a long short-term memory (See Claim 7 rejection for detailed analysis.).  

Regarding Claim 15, Taylor discloses A non-transitory computer readable medium, storing a computer program thereon, wherein the computer program, when executed by a processor, causes the processor to perform operations, the operations comprising: 
acquiring a to-be-played speech (See Claim 1 rejection for detailed analysis.); 
sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment (See Claim 1 rejection for detailed analysis.); 
generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech (See Claim 1 rejection for detailed analysis.) by: generating, based on the at least one speech segment, a two-dimensional feature matrix sequence, and inputting the two-dimensional feature matrix sequence into a pre-established convolutional neural network to obtain the mouth shape control parameter sequence, wherein the pre-established convolutional neural network is used to characterize corresponding relationships between two-dimensional feature matrices and mouth shape control parameters (See Claim 1 rejection for detailed analysis.); and
controlling, in response to playing the to-be-played speech, a preset mouth shape of a three-dimensional virtual portrait to change based on the mouth shape control parameter sequence (See Claim 1 rejection for detailed analysis.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kim, Taehwan, et al. "A decision tree framework for spatiotemporal sequence prediction." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015.  Kim et al. discloses many of the technical details borrowed by Taylor et al. 
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509.  The examiner can normally be reached on M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached on (571) 272-7794.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 






/ZHENGXI LIU/Primary Examiner, Art Unit 2611