DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/13/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
	
Response to Amendments and Arguments
Applicant's arguments filed 08/16/2022 have been fully considered but they are not persuasive.

Regarding the rejection over Kang (US PG Pub. 2022/0044463) in view of Zhou (“Visemenet: Audio-driven animator-centric speech animation”, ACM, published in 2018), applicant amended the independent claims by adding several limitations.

Applicant argued (Remarks, pages 6-7) that Kang in view of Zhou fails to teach the following limitation added to the independent claims:

“PPGs of target phonetic features, a target phonetic feature being a phonetic feature having complete semantics which is obtained by slicing according to the semantics of the speech”

In response, the examiner notes that Kang is a patent application publication of the USPTO corresponding to a published reference (CN 110503942) that was submitted by the applicant (in an IDS filed on 07/14/2021) and cited by other patent offices (the European Patent Office, Japan Patent Office, and Korean Patent Office) to reject similar claims of corresponding patent applications filed in those offices.

Kang discloses an apparatus / a method for speech-driven animation that maps phonemes to mouth shapes ([0027], a Speech2Face system) using various neural networks (Abstract, [0028], [0035], [0045-0046], [0061]). In particular, Kang discloses analyzing speech frames to obtain linguistic information such as phonemes ([0008], [0045-0046], [0055], [0069-0071]).
In light of the specification, the claimed “a phonetic feature” refers to a phoneme or a syllable (Spec. [0003]: at present, a phoneme, a syllable, etc. are mainly used as phonetic features). The claimed “a phonetic feature having complete semantics” simply means a phoneme without truncation (Spec. [0060-0061], “the dynamic slicing and having the complete semantics, an information discontinuity due to that a syllable is truncated manually may be eliminated”). Therefore, the argued limitation, “a phonetic feature having complete semantics which is obtained by slicing according to the semantics of the speech,” means segmenting a speech signal to obtain a complete phoneme (that is, without truncating a phoneme segment). Kang discloses the feature defined by this limitation ([0045-0046], [0051], [0054-0055], identifying the speech frames that pertain to a phoneme).
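
For illustration only, the interpretation above (slicing at phoneme boundaries rather than at fixed positions) can be sketched as follows. This sketch is not drawn from Kang, Zhou, or the specification; the function names and the per-frame label input are hypothetical, and the labels are assumed to be the most probable phoneme of each frame.

```python
from itertools import groupby


def slice_by_phoneme(frame_labels):
    """Group consecutive frames that share a phoneme label.

    Returns (label, start, end) tuples with `end` exclusive, so no
    segment boundary ever falls inside a run of the same phoneme.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def slice_fixed(frame_labels, window):
    """Fixed-length slicing, shown for contrast: boundaries ignore phonemes."""
    total = len(frame_labels)
    return [(i, min(i + window, total)) for i in range(0, total, window)]


# Hypothetical per-frame phoneme labels for a short utterance.
labels = ["h", "h", "e", "e", "e", "l", "o", "o"]
phoneme_segments = slice_by_phoneme(labels)
fixed_segments = slice_fixed(labels, 3)
```

With these hypothetical labels, the fixed windows of three frames split the run of "e" frames across two segments, whereas the phoneme-aligned segments keep each phoneme whole.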

	The examiner further points out that the secondary reference, Zhou, also discloses segmenting a speech signal according to a phoneme group (Zhou, Fig. 1, Abstract, Introduction). A phoneme or phoneme group meets the claimed “a phonetic feature having complete semantics”. In addition, both Kang and Zhou disclose a system that maps phonemes from speech segments to mouth shapes using neural network-based techniques. Kang further discloses extracting a PPG using a neural network (Kang, [0038], [0055]).

	Kang in view of Zhou either explicitly or implicitly discloses the limitations added to the independent claims. MPEP § 2144.01 states: “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.” In re Preda, 401 F.2d 825, 826, 159 USPQ 342, 344 (CCPA 1968).

	Applicant further argued (Remarks, page 7) that the dependent claims are allowable by virtue of their dependency. For the same reasons explained above for the independent claims, the argument is not persuasive.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 3, 9, and 15 are rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.

Applicant canceled claims 2, 8 and 14. As a result, claims 3, 9 and 15 each depend from a canceled claim. For purposes of examination, the examiner assumes that claims 3, 9 and 15 depend from their corresponding independent claims 1, 7 and 13, respectively.

Claim Rejections - 35 USC § 103
Claims 1, 3-4, 6-7, 9-10, 12-13, 15-16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kang et al. (US PG Pub. 2022/0044463, referred to as Kang) in view of Zhou et al. ("Visemenet: Audio-driven animator-centric speech animation", published in 2018, referred to as Zhou).

Regarding claims 1, 7, and 13, Kang teaches a method for predicting a mouth-shape feature, comprising:
recognizing a phonetic posterior gram (PPG) of a phonetic feature (Fig. 4 and Paragraphs 0052-0053); and
performing a prediction on the PPG by using a neural network model, to predict a mouth-shape feature of the phonetic feature (Paragraphs 0062-0063, which mention a pre-trained neural network being used to obtain an expression parameter for the mouth shape),

wherein the PPG training sample comprises: PPGs of target phonetic features, the target phonetic features being obtained based on dynamic slicing and having complete semantics (Zhou, Fig. 2 shows the divided/sliced phoneme groups used to obtain the probability distribution that is used in Fig. 3 and Section 5: “Training”); and the mouth-shape feature training sample comprises: mouth-shape features corresponding to the PPGs of the target phonetic features (Zhou, Fig. 2 shows the mouth shapes corresponding to the phoneme groups being used; see also Fig. 3 and Section 5: “Training”). Zhou is considered analogous to the claimed invention because it is also directed to animated face modeling using phoneme features. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Kang to incorporate phonemes and corresponding mouth shapes for neural network training, as taught by Zhou, to improve performance of the system (Page 10, col 1, line 5-7).

	Even though Kang does mention a pre-trained neural network being used to obtain a mouth-shape parameter (Fig. 4), it fails to specifically mention the training details of that pre-trained neural network. Therefore, it fails to teach the claimed limitation of: “the neural network model being obtained by training with training samples and an input thereof including a PPG and an output thereof including a mouth-shape feature, and the training samples including a PPG training sample and a mouth-shape feature training sample.”
Zhou does teach the claimed limitation of the neural network model being obtained by training with training samples and an input thereof including a PPG and an output thereof including a mouth-shape feature, and the training samples including a PPG training sample and a mouth-shape feature training sample (Fig. 3 presents an overview of the training model for the LSTM neural network, which includes a phoneme probability group vector/graph and associated facial landmarks to obtain the facial result shown in Fig. 5; see Page 5, Col 2, Paragraphs 2-3; see also Section 5: “Training” for details on the training model). Here, the phoneme probability group vector/graph (shown in Fig. 3) corresponds to the PPG recited in the claim, as both present the probability of the phoneme being used. Furthermore, the facial landmark used in the training stage corresponds to the mouth-shape feature training sample recited in the claim. Zhou is considered analogous to the claimed invention because it is also directed to animated face modeling using phoneme features. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Kang to incorporate the neural network training model taught by Zhou to improve performance of the system (Page 10, col 1, line 5-7).
As seen in the claim set, claims 1, 7, and 13 are similar in scope. However, claim 1 is a method claim, while claims 7 and 13 are a device claim and a computer-readable medium claim, respectively. The steps of the method of claim 1 correspond to the elements recited in claims 7 and 13. Furthermore, Kang also discloses the processor and memory (Fig. 12 and Paragraph 0010, which includes storing the program code) and the computer-readable medium (Paragraphs 0011 and 0132) recited in claims 7 and 13. Therefore, claims 7 and 13 are rejected under the same rationale as applied to claim 1.

Regarding claims 3, 9, and 15, Kang in view of Zhou teaches the method according to claim 2, the electronic device according to claim 8, and the medium according to claim 14; wherein a frequency of a target phonetic feature matches a frequency of a mouth-shape feature corresponding to a PPG of the target phonetic feature (Zhou, Fig. 2 shows the phoneme group list being synchronously matched with a specific mouth shape (landmark); see also Page 7, col 1, paragraph 4, lines 1-3 and Fig. 3). Here, frequency matching of the two elements is inherent because each phoneme group has a specific mouth shape (landmark) assigned to it. Zhou is considered analogous to the claimed invention because it is also directed to animated face modeling using phoneme features. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Kang to incorporate phonemes and corresponding mouth shapes for neural network training, as taught by Zhou, to improve performance of the system (Page 10, col 1, line 5-7).
Regarding claims 4, 10, and 16, Kang in view of Zhou teaches the method according to claim 1, the electronic device according to claim 7, and the medium according to claim 13; wherein the neural network model is a recurrent neural network (RNN) model having an autoregressive mechanism, and a process of training the RNN model includes (Zhou shows an LSTM model being trained, which is a type of RNN model; see Fig. 3 and Page 7, col 1, paragraph 4):
performing the training by using a mouth-shape feature training sample of a frame preceding a current frame as an input, by using a PPG training sample of the current frame as a condition constraint, and a mouth-shape feature training sample of the current frame as a target (Zhou, see Fig. 3 and Page 7, col 1, paragraph 4 through col 2, Paragraph 7). Here, it can be seen that the loss functions incorporated for training include a classification loss, a regression loss, a smoothness loss, and a joint loss. The regression and smoothness losses are used for landmark displacement, which corresponds to the mouth-shape feature training recited in the claim.
Zhou is considered analogous to the claimed invention because it is also directed to animated face modeling using phoneme features. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Kang to incorporate phonemes and corresponding mouth shapes for neural network training, as taught by Zhou, to improve performance of the system (Page 10, col 1, line 5-7).
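
For illustration only, the training arrangement recited in claims 4, 10, and 16 (previous-frame mouth-shape feature as input, current-frame PPG as condition, current-frame mouth-shape feature as target) can be sketched as a data-preparation step. This is a hypothetical sketch of the claim language, not the training procedure of Zhou, Kang, or the specification; the function name, the dictionary keys, and the `initial_mouth` value supplied for the first frame are assumptions.

```python
def build_autoregressive_samples(ppg_frames, mouth_frames, initial_mouth):
    """Pair each frame's PPG with the previous and current mouth-shape features.

    For frame t the sample is:
      input     = mouth-shape feature of frame t-1 (initial_mouth when t == 0)
      condition = PPG of frame t
      target    = mouth-shape feature of frame t
    """
    if len(ppg_frames) != len(mouth_frames):
        raise ValueError("PPG and mouth-shape sequences must have equal length")

    samples = []
    previous = initial_mouth
    for ppg, mouth in zip(ppg_frames, mouth_frames):
        samples.append({"input": previous, "condition": ppg, "target": mouth})
        previous = mouth  # ground-truth feature feeds the next step
    return samples


# Hypothetical three-frame sequences: two-phoneme PPGs, one-dimensional mouth features.
ppg_frames = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
mouth_frames = [[0.1], [0.4], [0.3]]
samples = build_autoregressive_samples(ppg_frames, mouth_frames, [0.0])
```

In this sketch the ground-truth feature of the preceding frame, rather than a model prediction, is carried forward during training; whether the claimed model or Zhou's model does so is not established by the record and is assumed here only to keep the example concrete.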
	Regarding claims 6, 12, and 18, Kang in view of Zhou teaches the method according to claim 1, the electronic device according to claim 7, and the medium according to claim 13; further comprising: performing predictions on PPGs of pieces of real speech data using the neural network model, to obtain mouth-shape features of the pieces of real speech data (Kang, Fig. 4 and Paragraphs 0054-0055 show the PPG being obtained; Paragraphs 0062-0063 mention a pre-trained neural network being used to obtain an expression parameter for the mouth shape). Kang, however, fails to specifically mention forming a mouth-shape library for use in synthesizing the mouth shape of a virtual image.
Zhou does teach the claimed limitation of constructing a mouth-shape feature index library based on the mouth-shape features of the pieces of real speech data, the mouth-shape feature index library being used for synthesizing a mouth shape of a virtual image (Fig. 2 shows the 20 identified visual groups along with the relevant mouth-shape outputs and visemes; see also Page 3, Section 3: “Algorithm Design”, Paragraphs 1-5, for the viseme and phoneme prediction model that outputs the virtual mouth shape shown in Fig. 2). Zhou is considered analogous to the claimed invention because it is also directed to animated face modeling using phoneme features. Therefore, it would have been obvious to one skilled in the art before the effective filing date of the claimed invention to have modified Kang to incorporate the mouth-shape output and feature library taught by Zhou. Furthermore, one of ordinary skill in the art would have recognized that the result of the combination was predictable, since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Allowable Subject Matter
Claims 5, 11 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:  
Applicant amended claims 5, 11 and 17 by including more specific limitations based on the disclosure (Spec. [0071]). When all limitations (including the limitations of the base claim) are considered as a whole, the claimed invention defined by each of these dependent claims sufficiently distinguishes over the prior art of record.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/
Primary Examiner, Art Unit 2659