DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments have been fully considered, but they are not persuasive. Applicant essentially argues that the prior art of record fails to explicitly disclose that the first model data is generated “at runtime” (REMARKS, page 9). However, there is no clear indication that the claimed invention generates the first model data “at runtime”. The text in the “first data” and the text used to synthesize speech may be the same text entered at different times (during training and during use). For this reason, the examiner maintains the previous grounds of rejection.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 21 and 31 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 21 and 32 of copending Application No. 16877863.  Although the claims at issue are not identical, they are not patentably distinct from each other because they are obvious variants of the same invention.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 21 and 31 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, and 13 of U.S. Patent No. 10692484.  Although the claims at issue are not identical, they are not patentably distinct from each other because they are obvious variants of the same invention.

Claims of the copending application:

21. (New) A computer-implemented method, comprising: receiving metadata corresponding to a desired synthetic voice, the metadata including audio data and input data representing text; processing the input data to determine phonetic data; processing the audio data to determine acoustic feature data; and using the phonetic data and the acoustic feature data to configure a trained speech model configured to output audio data corresponding to the desired synthetic voice, wherein the trained speech model is configured using: a first task, and a second task corresponding to increasing a metric of perceived quality of audio output data.

Other independent claims are similar to the above claim.

Claims of the instant application:

21. (New) A computer-implemented method comprising: receiving first data representing text to be used to create synthesized speech; receiving first metadata representing a first vocal attribute of speech; generating, using the first metadata and a first trained model, first model data representing the first vocal attribute; and using a second trained model, the first data, and the first model data to generate first audio output data corresponding to synthesized speech of the text, the synthesized speech corresponding to the first vocal attribute.

Other independent claims are similar to the above claim.

Claims of the patent:

1. A computer-implemented method for generating speech from text, the method comprising: training, using multi-task learning, to create a speech model, wherein: the speech model includes: a sample model configured to input text data and output audio samples, a conditioning model configured to input text metadata corresponding to the input text data and to condition the sample model, and an output model configured to input the output audio samples and output audio output data, and the training includes: using a first section of the conditioning model, configuring at least one hidden layer of the sample model in accordance with a first task, wherein the first task includes minimizing a difference between the audio output data and corresponding training data; using a second section of the conditioning model, configuring the at least one hidden layer of the sample model in accordance with a second task, wherein the second task includes maximizing a metric of perceived quality of the audio output data; including the first section of the conditioning model in the speech model, and discarding the second section of the conditioning model; and generating, using first text data and the speech model, first audio output data corresponding to the input text data.

Independent claims 5 and 13 are similar to claim 1 above.

Claims of the instant application:

21. (New) A computer-implemented method comprising: receiving first data representing text to be used to create synthesized speech; receiving first metadata representing a first vocal attribute of speech; generating, using the first metadata and a first trained model, first model data representing the first vocal attribute; and using a second trained model, the first data, and the first model data to generate first audio output data corresponding to synthesized speech of the text, the synthesized speech corresponding to the first vocal attribute.

Independent claim 31 is similar to claim 21 above.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 21-22, 24-25, 28-29, 31-32, 34-35, and 38-39 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Edrenkin (USPG 2017/0092258, hereinafter referred to as Edrenkin).

Regarding claims 21 and 31, Edrenkin discloses a computer-implemented method and system, comprising: at least one processor (figure 1 and/or paragraphs 40-47, memory for storing instructions and processor); and at least one memory comprising instructions that, when executed by the at least one processor (figure 1 and/or paragraphs 40-47, memory for storing instructions and processor), cause the system to perform: receiving first data representing text to be used to create synthesized speech (figure 2, steps 202 and/or 210); receiving first metadata representing a first vocal attribute of speech (figure 2, step 206, extracting “features” from acoustic data and/or textual data); generating, using the first metadata and a first trained model, first model data representing the first vocal attribute (figure 2, step 206, “generating a set of training data of speech attributes” from extracted features); using a second trained model, the first data, and the first model data to generate first audio output data corresponding to synthesized speech of the text, the synthesized speech corresponding to the first vocal attribute (figure 2, step 214, generating synthesized speech using the text data, the speech attribute (first model), and the acoustic space (second model)); and causing output of the first audio output (figure 2, step 216).

Regarding claims 22, 24-25, 28-29, 32, 34-35, and 38-39, Edrenkin further discloses the computer-implemented method of claim 21, wherein the first vocal attribute comprises a style of the speech (paragraphs 12, 26, 59, speaking style); wherein the first vocal attribute comprises an accent of the speech (paragraphs 12, 26, 59, accent); wherein the first metadata represents a linguistic context feature (paragraphs 52-53, phonetic and linguistic features); further comprising receiving second metadata representing a second vocal attribute of speech, wherein: generating the first model data further uses the second metadata; and the first model data further represents the second vocal attribute (figure 2, step 206, “generating a set of training data of speech attributes” from a plurality of extracted features); further comprising: receiving a request to change from the first vocal attribute to a second vocal attribute (paragraph 115, “the text 410 and the speech attribute 420 are received separately (e.g., at different times, or from different applications, or from different users, or in different files, etc.), via the input module 113”); receiving second metadata representing the second vocal attribute of speech (paragraph 115, “the speech attribute 420 are received separately (e.g., at different times, or from different applications, or from different users, or in different files, etc.), via the input module 113”); generating, using the second metadata and the first trained model, second model data representing the second vocal attribute (the system in figure 2 can be trained by different users); receiving second data representing second text to be used to create synthesized speech (the process in figure 2 can be trained by different users, i.e., the same process applies to different users; see claim 1); and using the second trained model, the second data, and the second model data to generate second audio output data corresponding to second synthesized speech of the second text, the second synthesized speech corresponding to the second vocal attribute (the process in figure 2 can be trained by different users, i.e., the same process applies to different users; see claim 1).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 23 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Edrenkin in view of Koul et al. (USPG 2016/0064033, hereinafter referred to as Koul).

Regarding claims 23 and 33, Edrenkin fails to explicitly disclose, but Koul teaches, wherein the style of the speech corresponds to a newscaster (paragraphs 32-35, abstract section, and/or process in figure 6).
Since Edrenkin and Koul are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of synthesizing speech using the voice characteristics of a particular person. One of ordinary skill in the art would have recognized that the results of the combination were predictable, since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Claims 26 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Edrenkin in view of Legat (USPG 2014/0222415, hereinafter referred to as Legat).

Regarding claims 26 and 36, Edrenkin fails to explicitly disclose, but Legat teaches, wherein the first metadata represents grapheme-to-phoneme data (paragraphs 72-73 and 80-86, the TTS system uses a grapheme-to-phoneme (G2P) conversion process before creating synthesized speech).
Since Edrenkin and Legat are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of a grapheme-to-phoneme conversion process for TTS. One of ordinary skill in the art would have recognized that the results of the combination were predictable, since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).


Claims 27 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Edrenkin in view of Cosatto (USPN 7209882, hereinafter referred to as Cosatto).

Regarding claims 27 and 37, Edrenkin fails to explicitly disclose, but Cosatto teaches, wherein the first metadata represents duration data (figure 1, text and linguistic processing 102 generates prosody information, such as phoneme duration, which is fed into the speech synthesis system).
Since Edrenkin and Cosatto are analogous art from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of processing text into prosodic information, such as phoneme duration, for a speech synthesis system. One of ordinary skill in the art would have recognized that the results of the combination were predictable, since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Allowable Subject Matter
Claims 30 and 40 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Desimone (USPG 2006/0074677) teaches a text-to-speech system that is considered pertinent to the claimed invention.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUYEN X VO whose telephone number is (571)272-7631. The examiner can normally be reached M-F, 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUYEN X VO/Primary Examiner, Art Unit 2656