DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on August 12, 2022 has been entered.

Response to Arguments
Applicants argue that the cited prior art fails to teach the concept of using a trained neural network audio synthesis model that receives the phonemes, the phoneme durations, and the fundamental frequency profiles for the phonemes as input to generate a signal representing synthesized human speech of the written text.  Applicants' arguments have been considered, but are moot in view of the new grounds of rejection.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6, 8-9, 13, 14-15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sharman in view of Bellegarda et al. (USPN 6,366,884), hereinafter referenced as Bellegarda, and in further view of Fructuoso et al. (PGPUB 2016/0343366), hereinafter referenced as Fructuoso.

Regarding claims 1, 8 and 14, Sharman discloses a computer-implemented method, medium and system, hereinafter referenced as a method, for using a text-to-speech (TTS) system to synthesize human speech from text, comprising: 
one or more processors (column 3, lines 43-55); and 
a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors (column 3, lines 43-55), causes steps to be performed comprising:
using a trained grapheme-to-phoneme model to convert written text to phonemes corresponding to the written text (column 1, lines 8-11, column 4, lines 1-3), but does not specifically teach 
inputting the phonemes into either: 
(1) a trained phoneme duration and fundamental frequency model or 
(2) a trained phoneme duration model and a trained fundamental frequency model, to output for a phoneme:
a phoneme duration; 
a fundamental frequency profile; and 
using a trained audio synthesis model that receives the phonemes, the phoneme durations, the fundamental frequency profiles for the phonemes, and for each phoneme, a probability whether the phoneme is voiced as an input to generate a signal representing synthesized human speech of the written text.
Bellegarda discloses a method comprising:
inputting the phonemes into either: 
(1) a trained phoneme duration and fundamental frequency model (trained; abstract with column 5, lines 5-27 with column 2, line 23 - column 3, line 43) or 
(2) a trained phoneme duration model and a trained fundamental frequency model, to output for a phoneme:
a phoneme duration (phoneme duration; abstract with column 6, lines 35-56); 
a fundamental frequency profile (column 2, line 23 - column 3, line 8); and 
using a trained audio synthesis model that receives the phonemes, the phoneme durations, the fundamental frequency profiles for the phonemes, and for each phoneme (column 2, line 23 - column 3, line 43), to improve synthesized speech.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to assist with the modeling of phoneme duration in speech synthesis.
Fructuoso discloses a method of determining a probability whether the phoneme is voiced as an input to generate a signal representing synthesized human speech of the written text (p. 0049-0053), to allow for improved handling and synthesis of combinations of unseen linguistic features.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to effectively map acoustic frames to linguistic model clusters during synthesis.
Regarding claims 2, 9 and 15, Sharman discloses a method wherein the step of using a trained grapheme-to-phoneme model to convert written text to phonemes corresponding to the written text comprises: 
using, for one or more words in the written text, a phoneme dictionary look-up to convert the one or more words to phonemes (dictionary look-up; column 4, lines 35-47).  
Regarding claims 6, 13 and 19, Sharman discloses a method wherein the phonemes represent phonemes with stresses, when applicable to the phoneme (stressed; column 10, lines 21-33).  

Claims 3-4, 10-11 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Sharman in view of Bellegarda and Fructuoso and in further view of Ben Ezra et al. (PGPUB 2014/0236597), hereinafter referenced as Ben Ezra.

Regarding claims 3, 10 and 16, Sharman in view of Bellegarda and Fructuoso discloses a method as described above, but does not specifically teach a method further comprising utilizing one or more computational efficiencies to help the text-to-speech system produce the signal representing synthesized human speech of the written text in real-time or faster than real-time.  
Ben Ezra discloses a method further comprising utilizing one or more computational efficiencies to help the text-to-speech system produce the signal representing synthesized human speech of the written text in real-time or faster than real-time (p. 0018-0029), to provide a more customized output.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to provide a dynamic environment.
Regarding claims 4, 11 and 17, Sharman discloses a method wherein one of the one or more computation efficiencies comprises the trained audio synthesis model using multiple threads (wavelets) and overlapping computation on those threads to produce the signal representing synthesized human speech (overlap; column 7, lines 51-66).  

Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sharman in view of Bellegarda and Fructuoso and in further view of Kosek et al. (PGPUB 2006/0200344), hereinafter referenced as Kosek.

Regarding claims 5, 12 and 18, Sharman in view of Bellegarda and Fructuoso discloses a method as described above, but does not specifically teach wherein the fundamental frequency profile for a phoneme is a set of fundamental frequencies values equally spaced in a time domain across the phoneme duration for the phoneme.  
Kosek discloses a method wherein the fundamental frequency profile for a phoneme is a set of fundamental frequencies values equally spaced in a time domain across the phoneme duration for the phoneme (p. 0026 and 0124), to assist with identifying the data.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to enhance the signal and provide a desirable output.

Allowable Subject Matter
Claims 7, 13 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON whose telephone number is (571)272-7619. The examiner can normally be reached Mon - Fri 6:30a-2:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657