DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-9 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-9 of U.S. Patent No. 11,217,224. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following:

Pending U.S. Application No. 17/533,459
Claims 1, 8 and 9: receiving an articulatory feature of a speaker regarding a first language; receiving an input text of a second language; and generating output speech data for the input text of the second language that simulates the speaker's speech by inputting the input text of the second language and the articulatory feature of the speaker regarding the first language to a single artificial neural network multilingual text-to-speech synthesis model, wherein the single artificial neural network multilingual text-to-speech synthesis model is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on a first learning data of the first language and a second learning data of the second language.

U.S. Patent No. 11,217,224
Claims 1, 6 and 9: receiving first learning data including a learning text of a first language and learning speech data of the first language corresponding to the learning text of the first language; receiving second learning data including a learning text of a second language and learning speech data of the second language corresponding to the learning text of the second language; generating a single artificial neural network text-to-speech synthesis model by learning similarity information between phonemes of the first language and phonemes of the second language based on the first learning data and the second learning data; receiving an articulatory feature of a speaker regarding the first language; receiving an input text of the second language; and generating output speech data for the input text of the second language that simulates the speaker's speech by inputting the input text of the second language and the articulatory feature of the speaker regarding the first language to the single artificial neural network text-to-speech synthesis model.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-6 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (US 2012/0253781) in view of Fructuoso et al. (US 2015/0186359).

Claims 1 and 8-9,
Qian teaches a method for multilingual text-to-speech synthesis, comprising: receiving an articulatory feature of a speaker regarding a first language ([Figs. 1 & 3] [0027] speech synthesis using frame mapping-based cross-lingual voice transformation; the voice characteristics of the first language (L1) as spoken by the target speaker); 
receiving an input text of a second language ([Figs. 1 & 3] [0027] input text in foreign language that the target speaker desires to annunciate); and generating output speech data for the input text of the second language that simulates the speaker's speech by inputting the input text of the second language and the articulatory feature of the speaker regarding the first language to a multilingual text-to-speech synthesis model ([Figs. 1 & 3] [0027] by using the HMM-based speech synthesis, the speech synthesis engine generates synthesized speech in the foreign language that resembles the speech of the target speaker in the native language, but which has the voice characteristics (e.g., pronunciation and/or tone quality) of the foreign language).
The difference between the prior art and the claimed invention is that Qian does not explicitly teach wherein the single artificial neural network multilingual text-to-speech synthesis model is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on a first learning data of the first language and a second learning data of the second language.
Fructuoso teaches wherein the single artificial neural network multilingual text-to-speech synthesis model is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on a first learning data of the first language and a second learning data of the second language ([0031] to identify a sequence of phonetic units, such as phonemes, in a phonetic representation of the text; mapping of sounds to corresponding linguistic feature values (phoneme identifiers) across multiple languages can facilitate providing prosody for multiple languages with a single model (a single trained neural network)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Qian with the teachings of Fructuoso by modifying the frame mapping approach for cross-lingual voice transformation as taught by Qian to include a single artificial neural network multilingual text-to-speech synthesis model that is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on a first learning data of the first language and a second learning data of the second language, as taught by Fructuoso, because training a prosody model with speech from multiple languages can improve the quality of prosody information provided for languages for which only relatively small amounts of training data are available (Fructuoso [0010]).

Claim 2,
Fructuoso further teaches the method of claim 1, wherein the first learning data of the first language includes a learning text of the first language and learning speech data of the first language corresponding to the learning text of the first language, and the second learning data of the second language includes a learning text of the second language and learning speech data of the second language corresponding to the learning text of the second language ([0078-0080] a first set of data is obtained for a first language. The first set of data includes (i) first speech data for utterances in the first language, (ii) data indicating a transcription for the first speech data, and (iii) data identifying the first language; a second set of data is obtained for a second language that is different from the first language. The second set of data includes (i) second speech data for utterances in the second language, (ii) data indicating a transcription for the second speech data, and (iii) data identifying the second language; the first set of data for the first language and the second set of data for the second language are provided to a system configured to train a neural network).
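For illustration only, the two per-language data sets described in the cited passage can be sketched as follows. This is a minimal sketch, not taken from Fructuoso or the application: the names `TrainingExample` and `pool_training_sets`, the language codes, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingExample:
    """One item of learning data (hypothetical layout)."""
    speech: List[float]   # placeholder for speech data, e.g. waveform samples
    transcription: str    # text corresponding to the speech data
    language: str         # identifier of the language of the utterance


def pool_training_sets(first: List[TrainingExample],
                       second: List[TrainingExample]) -> List[TrainingExample]:
    """Combine two per-language sets so one model can be trained on both."""
    languages = {ex.language for ex in first} | {ex.language for ex in second}
    if len(languages) < 2:
        raise ValueError("expected data from at least two different languages")
    return list(first) + list(second)


# Toy data; the codes "en" and "ko" are assumptions for the example.
first_set = [TrainingExample([0.0, 0.1], "hello", "en")]
second_set = [TrainingExample([0.2, 0.3], "annyeong", "ko")]
pooled = pool_training_sets(first_set, second_set)
```

The pooled list stands in for the data "provided to a system configured to train a neural network"; the training step itself is not sketched here.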

Claim 4,
Qian further teaches the method of claim 1, further comprising: receiving a prosody feature of the speaker in the first language; and generating output speech data for the input text of the second language that simulates the speaker's speech and prosody by inputting the input text of the second language, the articulatory feature of the speaker regarding the first language, and the prosody feature to the single artificial neural network multilingual text-to-speech synthesis model ([0022] speech transformation engine performs pitch extraction on the source speech waveforms to extract the fundamental frequencies of the source speech waveforms).

Claim 5,
Qian further teaches the method of claim 4, wherein the prosody feature includes at least one of information on utterance speed, information on accentuation, information on voice pitch, or information on pause duration ([0022] pitch).

Claim 6,
Qian further teaches the method of claim 1, wherein receiving the articulatory feature includes: receiving an input speech of the first language; and extracting a feature vector from the input speech of the first language to generate the articulatory feature of the speaker regarding the first language ([0064] the extracted features include fundamental frequencies 230, LSPs 232, and gains 234 (extracted feature vectors)).
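As a rough illustration of extracting a feature vector from input speech, the sketch below computes two of the three feature types named in the cited passage (fundamental frequency and gain) for a single frame. It is an assumption-laden example, not Qian's method: the autocorrelation pitch estimate, the RMS definition of gain, the search range of 50 to 400 Hz, and all function names are hypothetical, and LSP extraction is omitted.

```python
import math
from typing import List, Tuple


def frame_gain(frame: List[float]) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_f0(frame: List[float], sample_rate: int,
             f0_min: float = 50.0, f0_max: float = 400.0) -> float:
    """Estimate the fundamental frequency in Hz from the autocorrelation peak.

    Returns 0.0 when no lag in the search range has positive correlation
    (for example, a silent frame).
    """
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def extract_feature_vector(frame: List[float], sample_rate: int) -> Tuple[float, float]:
    """Return (fundamental frequency in Hz, gain) for one frame."""
    return frame_f0(frame, sample_rate), frame_gain(frame)


# A 50 ms frame of a 100 Hz sine sampled at 8 kHz, used as toy input speech.
sine_frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
f0, gain = extract_feature_vector(sine_frame, 8000)
```

A practical system would process many overlapping frames and use a more robust pitch tracker; a single frame is used here only to keep the example short.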

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (US 2012/0253781) in view of Fructuoso et al. (US 2015/0186359) and further in view of Newell et al. (US 2019/0191224).

Claim 3,
Qian and Fructuoso teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that neither Qian nor Fructuoso explicitly teaches receiving an emotion feature of the speaker in the first language; and generating output speech data for the input text of the second language that simulates the speaker's speech and emotion by inputting the input text of the second language, the articulatory feature of the speaker regarding the first language, and the emotion feature to the single artificial neural network multilingual text-to-speech synthesis model.
Newell teaches receiving an emotion feature of the speaker in the first language; and generating output speech data for the input text of the second language that simulates the speaker's speech and emotion by inputting the input text of the second language, the articulatory feature of the speaker regarding the first language, and the emotion feature to the single artificial neural network multilingual text-to-speech synthesis model ([0041] the spoken words can be assigned emotional values and converted to text; the text is then translated to a different language and natural language processing can be used to convert the translated text using the emotional values to a sound layer audio component to output synthesized speech).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Qian and Fructuoso with the teachings of Newell by modifying the frame mapping approach for cross-lingual voice transformation as taught by Qian to include receiving an emotion feature of the speaker in the first language; and generating output speech data for the input text of the second language that simulates the speaker's speech and emotion by inputting the input text of the second language, the articulatory feature of the speaker regarding the first language, and the emotion feature to the single artificial neural network multilingual text-to-speech synthesis model, as taught by Newell, for the benefit of translating one language to another using emotion characteristics (Newell [0041]).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (US 2012/0253781) in view of Fructuoso et al. (US 2015/0186359) and further in view of Dasgupta (US 2018/0330732).

Claim 7,
Qian in view of Fructuoso teaches all the limitations in claim 6. The difference between the prior art and the claimed invention is that neither Qian nor Fructuoso teaches wherein receiving the input text of the second language includes: converting the input speech of the first language into an input text of the first language; and converting the input text of the first language into an input text of the second language.
Dasgupta teaches wherein receiving the input text of the second language includes: converting the input speech of the first language into an input text of the first language; and converting the input text of the first language into an input text of the second language ([0035] language translation module translates the content from a first language to a second language (a preferred language) during content translation (during the text-to-speech or speech-to-text translation)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Qian and Fructuoso with the teachings of Dasgupta by modifying the frame mapping approach for cross-lingual voice transformation as taught by Qian to include wherein receiving the input text of the second language includes: converting the input speech of the first language into an input text of the first language; and converting the input text of the first language into an input text of the second language, as taught by Dasgupta, for the benefit of enabling illiterate users to create the content or access communication services using speech to text and text to speech (Dasgupta [0025]).
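For illustration only, the two conversions recited in claim 7 can be sketched as a composition of a recognition step and a translation step. The function name, the callable parameters, and the lookup-table stand-ins are hypothetical; neither the application nor Dasgupta is the source of this code, and a real system would use actual speech recognition and machine translation components.

```python
from typing import Callable, Dict


def receive_second_language_text(input_speech: str,
                                 recognize: Callable[[str], str],
                                 translate: Callable[[str], str]) -> str:
    """Convert first-language speech to second-language text in two steps."""
    first_language_text = recognize(input_speech)   # speech -> text (first language)
    return translate(first_language_text)           # text (first) -> text (second)


# Toy stand-ins: utterances are represented by string keys, not audio.
TOY_RECOGNIZER: Dict[str, str] = {"utterance-001": "hello"}
TOY_TRANSLATOR: Dict[str, str] = {"hello": "hola"}

second_language_text = receive_second_language_text(
    "utterance-001",
    recognize=TOY_RECOGNIZER.__getitem__,
    translate=TOY_TRANSLATOR.__getitem__,
)
```

Passing the two steps in as callables keeps the sketch independent of any particular recognizer or translator.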

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chun et al. (US 2012/0278081) teaches a text-to-speech method for use in a plurality of languages, including: inputting text in a selected language; dividing the inputted text into a sequence of acoustic units; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein the model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio in the selected language. A parameter of a predetermined type of each probability distribution in the selected language is expressed as a weighted sum of language independent parameters of the same type. The weighting used is language dependent, such that converting the sequence of acoustic units to a sequence of speech vectors includes retrieving the language dependent weights for the selected language.
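The weighted-sum parameterization summarized above can be illustrated with a small numeric sketch. It assumes the parameter in question is a mean vector and that the language-dependent weights are stored per language; the function name, language codes, and numbers are hypothetical and are not taken from Chun.

```python
from typing import Dict, List


def language_specific_mean(weights: List[float],
                           independent_means: List[List[float]]) -> List[float]:
    """Weighted sum of language-independent mean vectors."""
    if len(weights) != len(independent_means):
        raise ValueError("one weight is required per language-independent mean")
    dim = len(independent_means[0])
    return [sum(w * m[d] for w, m in zip(weights, independent_means))
            for d in range(dim)]


# Two language-independent mean vectors shared by all languages (toy values).
INDEPENDENT_MEANS: List[List[float]] = [[1.0, 2.0], [3.0, 4.0]]

# Language-dependent weights, retrieved for the selected language (toy values).
LANGUAGE_WEIGHTS: Dict[str, List[float]] = {
    "en": [0.75, 0.25],
    "ko": [0.25, 0.75],
}

mean_en = language_specific_mean(LANGUAGE_WEIGHTS["en"], INDEPENDENT_MEANS)
mean_ko = language_specific_mean(LANGUAGE_WEIGHTS["ko"], INDEPENDENT_MEANS)
```

Only the weights change between languages in this sketch; the underlying mean vectors are shared.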
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Examiner
Art Unit 2657



/SHREYANS A PATEL/Examiner, Art Unit 2656