DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant's arguments with respect to the 35 U.S.C. 112(b) rejection of claims 2, 7-8, 12 and 17-18 have been considered and are found persuasive in view of the amendments to and/or cancellation of those claims; the rejection has been withdrawn.
Applicant's arguments with respect to the 35 U.S.C. 103 rejection of claims 1 and 11 have been considered but are moot because of the new grounds of rejection necessitated by the amendments. See the detailed rejection below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-6, 11 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Honeycutt (US 2012/0265533) in view of Li et al. (CN 107301859) in view of Kaszczuk et al. (US 9,484,014).

Claims 1 and 11,
Honeycutt teaches a text-to-speech conversion system comprising: a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory; a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme; and to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker ([0018-0022] TTS system 200 for outputting speech having voice characteristics based on a speaker profile; receives communications (e.g., e-mail, text message) and identifies metadata; metadata (e.g., e-mail address, contact card information) is used by metadata module 204 to generate a speaker profile; the raw text and the speaker profile are input to TTS engine 210; TTS engine 210 uses the speaker profile to select voice data from voice database 208; the voice data is used by TTS engine 210 to convert the raw text to speech having voice characteristics that best match the speaker profile; TTS engine 210 includes a synthesizer that incorporates a model of the human vocal tract or other human voice characteristics to create a synthetic speech output according to the speaker profile; TTS engine 210 performs text-to-phoneme or grapheme-to-phoneme conversion where phonetic transcriptions are assigned to each word and the text is divided; phonetic transcriptions and prosody information together make up a symbolic linguistic representation of the raw text; the synthesizer converts the symbolic linguistic representation into sound; the synthesizer can include the computation of a target prosody (e.g., pitch contour, phoneme durations), which is applied to the output speech; the target prosody can be determined based on the voice data that is selected based on a speaker profile).
The difference between the prior art and the claimed invention is that Honeycutt does not teach enhance acoustic features for each phoneme, wherein the enhanced acoustic features comprise at least one of spectral enhancement or focal enhancement; and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
Li teaches enhance acoustic features for each phoneme, wherein the enhanced acoustic features comprise at least one of spectral enhancement or focal enhancement ([pg. 6] for a given N source voice characteristic parameter vector (xk), the formula (1) to dynamically search for N target voice characteristic parameter vector (Yk) so that the distance function value C(Yk) is minimal; in the unit selection process considers two factors: (1) to enhance the minimum spectral distance between the phoneme information of the matching degree between the source speech characteristic parameter vector and the target voice characteristic parameter vector; (2) to ensure selecting the target voice characteristic parameter vector has a frame continuity to the phoneme information more complete).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt with teachings of Li by modifying voice assignment for text-to-speech output as taught by Honeycutt to include enhance acoustic features for each phoneme, wherein the enhanced acoustic features comprise at least one of spectral enhancement or focal enhancement as taught by Li for the benefit of increasing conversion voice in individual speaker characteristics and improving the quality of voice (Li [pg. 2]).
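For illustration only, the unit selection described in the cited passage of Li (choosing N target vectors so that a total distance is minimal, weighing both spectral distance to the source and frame continuity) can be sketched as a dynamic-programming search. Li's formula (1) is not reproduced in this action, so the squared-distance costs, the `continuity_weight` parameter, and the function names below are assumptions made for the sketch and are not taken from Li or from the application.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_units(source, targets, continuity_weight=1.0):
    """Return one target index per source frame, minimizing the total of
    (source-to-target distance) + continuity_weight * (distance between
    consecutive chosen targets). Both cost terms are assumed for this sketch."""
    m = len(targets)
    # cost[k][j]: best total cost of a path ending at target j for frame k
    cost = [[sq_dist(source[0], t) for t in targets]]
    back = []  # back[k-1][j]: best predecessor of target j at frame k
    for k in range(1, len(source)):
        row, ptr = [], []
        for j in range(m):
            def step(i, j=j):
                return cost[-1][i] + continuity_weight * sq_dist(targets[i], targets[j])
            best_i = min(range(m), key=step)
            row.append(step(best_i) + sq_dist(source[k], targets[j]))
            ptr.append(best_i)
        cost.append(row)
        back.append(ptr)
    # Trace the cheapest path backwards from the last frame
    j = min(range(m), key=lambda i: cost[-1][i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

With `continuity_weight=0.0` the search reduces to picking the spectrally nearest target for each frame; a large weight favors staying on neighboring or identical targets across frames.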
The difference between the prior art and the claimed invention is that neither Honeycutt nor Li teaches a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
Kaszczuk teaches a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language ([col. 8 lines 4-49] encoder/decoder for encoding and decoding speech data, such as digitized audio data, feature vectors, etc.; the speech synthesis engine 218 may include specialized databases or models to account for such user preferences; TTS device 202 may also be configured to perform TTS processing in multiple languages; for each language, the TTS module 214 may include specially configured data, instructions and/or components to synthesize speech in the desired language(s)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt and Li with teachings of Kaszczuk by modifying voice assignment for text-to-speech output as taught by Honeycutt to include a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language as taught by Kaszczuk for the benefit of enabling the TTS module to improve speech recognition beyond the capabilities provided in the training corpus (Kaszczuk [col. 8 lines 47-49]).

Claims 5 and 15,
Kaszczuk further teaches wherein the text converter is further adapted to detect a language of the input text to be converted ([col. 8 line 44] desired language).

Claims 6 and 16,
Kaszczuk further teaches the system of claim 5, wherein the language of the input text to be converted is detected using an n-gram approach ([col. 6 line 17] Hidden Markov Models (HMM)).
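For reference, an n-gram approach to language detection, as recited in claims 6 and 16, is commonly understood as comparing character n-gram statistics of the input text against per-language profiles. The following is a minimal illustrative sketch under that assumption; it is not taken from the application or from Kaszczuk, and the function names, the trigram size, and the sample sentences are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def build_profiles(samples, n=3):
    """Map each language label to the n-gram counts of its sample text."""
    return {lang: char_ngrams(text, n) for lang, text in samples.items()}


def detect_language(text, profiles, n=3):
    """Return the label whose profile shares the most n-gram mass with the text."""
    query = char_ngrams(text, n)

    def overlap(profile):
        total = sum(profile.values())
        # Counter returns 0 for n-grams the profile has never seen
        return sum(count * profile[gram] / total for gram, count in query.items())

    return max(profiles, key=lambda lang: overlap(profiles[lang]))


# Hypothetical toy training text; a real profile would use a much larger corpus.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the cat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego el gato",
}
PROFILES = build_profiles(SAMPLES)
```

Calling `detect_language("the dog and the fox", PROFILES)` scores the input against each profile and returns the best-matching label.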

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Honeycutt (US 2012/0265533) in view of Li et al. (CN 107301859) in view of Kaszczuk et al. (US 9,484,014) and further in view of Summerfield (US 2016/0372116).

Claims 2 and 12,
Honeycutt, Li and Kaszczuk teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that none of Honeycutt, Li, or Kaszczuk teaches wherein the plurality of phonemes stored in memory comprise phonemes of the International Phonetic Alphabet and silence and breath.
Summerfield teaches wherein the plurality of phonemes stored in memory comprise phonemes of the International Phonetic Alphabet and silence and breath ([0031] a phonetic look-up dictionary to determine the speech units (triphones, diphones, senones or phonemes); recognizing silence and breath noises).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt, Li and Kaszczuk with teachings of Summerfield by modifying voice assignment for text-to-speech output as taught by Honeycutt to include wherein the plurality of phonemes stored in memory comprise phonemes of the International Phonetic Alphabet and silence and breath as taught by Summerfield for the benefit of recognizing the individual and their speech (Summerfield [0002]).

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Honeycutt (US 2012/0265533) in view of Li et al. (CN 107301859) in view of Kaszczuk et al. (US 9,484,014) and further in view of Arik et al. (US 2018/0036880).

Claims 3 and 13,
Honeycutt, Li and Kaszczuk teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that none of Honeycutt, Li, or Kaszczuk teaches wherein the machine-learning model comprises a neural network model.
Arik teaches wherein the machine-learning model comprises a neural network model ([0048] TTS system using deep neural network).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt, Li and Kaszczuk with teachings of Arik by modifying voice assignment for text-to-speech output as taught by Honeycutt to include wherein the machine-learning model comprises a neural network model as taught by Arik for the benefit of building average voice models, with i-vectors representing the speakers as additional inputs and separate output layers for each target speaker (Arik [0048]).

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Honeycutt (US 2012/0265533) in view of Li et al. (CN 107301859) in view of Kaszczuk et al. (US 9,484,014) and further in view of Akagi et al. (JP 2006243178).

Claims 4 and 14,
Honeycutt, Li and Kaszczuk teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that none of Honeycutt, Li, or Kaszczuk teaches wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame.
Akagi teaches wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame ([pg. 5] the spectral envelope is achieved by changing the formant frequency of the spectral envelope, i.e., the position of the peaks and valleys; the purpose of the deformation of the spectrum envelope is to break the phoneme, and the positional relationship between the peaks and valleys of the spectrum envelope is important for the perception of the phoneme).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt, Li and Kaszczuk with teachings of Akagi by modifying voice assignment for text-to-speech output as taught by Honeycutt to include wherein spectral enhancement comprises increasing a peak of a spectral envelope or decreasing a trough of the spectral envelope and focal enhancement comprises emphasizing the difference between a first frame and a second frame as taught by Akagi for the benefit of perceiving the content of conversational speech without causing surrounding people to feel noisy (Akagi [Tech-Problem]).
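For illustration only, the two operations recited in claims 4 and 14 can be sketched numerically. The sketch assumes each frame is a list of log-magnitude spectral envelope values, one per frequency bin; the `gain` and `alpha` parameters, the use of the frame mean as the dividing line between peaks and troughs, and the function names are assumptions for the sketch and are not taken from the application or from Akagi.

```python
def spectral_enhance(envelope, gain=1.5):
    """Scale each bin's deviation from the envelope mean by `gain`.
    Bins above the mean (peaks) are raised; bins below it (troughs) are lowered."""
    mean = sum(envelope) / len(envelope)
    return [mean + gain * (value - mean) for value in envelope]


def focal_enhance(first_frame, second_frame, alpha=0.5):
    """Emphasize the frame-to-frame difference by moving the second frame
    a further alpha * (second - first) away from the first frame."""
    return [b + alpha * (b - a) for a, b in zip(first_frame, second_frame)]
```

For example, `spectral_enhance([10.0, 20.0, 10.0, 0.0])` leaves the bins at the mean unchanged, raises the 20.0 peak to 25.0, and lowers the 0.0 trough to -5.0; `focal_enhance([1.0, 2.0], [2.0, 2.0])` changes only the bin that differs between the two frames.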

Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Honeycutt (US 2012/0265533) in view of Li et al. (CN 107301859) in view of Kaszczuk et al. (US 9,484,014) and further in view of Vanreusel et al. (US 2017/0309272).

Claims 9 and 19,
Honeycutt, Li and Kaszczuk teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that none of Honeycutt, Li, or Kaszczuk teaches wherein the generated acoustic features include accent acoustic features and the generated speech signal further simulate a voice of the identified speaker in a language and in an accent.
Vanreusel teaches wherein the generated acoustic features include accent acoustic features and the generated speech signal further simulate a voice of the identified speaker in a language and in an accent ([0031] the phonetic inventory includes a mapping of words (e.g., regional nouns), user accent classifications, and phonetic transcriptions).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Honeycutt, Li and Kaszczuk with teachings of Vanreusel by modifying voice assignment for text-to-speech output as taught by Honeycutt to include wherein the generated acoustic features include accent acoustic features and the generated speech signal further simulate a voice of the identified speaker in a language and in an accent as taught by Vanreusel for the benefit of improving synthesis of words using personalized and culturally correct phonetic transcription (Vanreusel [0017]).

Claims 10 and 20,
Vanreusel further teaches the system of claim 9, wherein the accent corresponds to a native accent of the identified speaker ([0027] user accent classification).

Allowable Subject Matter
Claims 21 and 23 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Examiner
Art Unit 2657



/SHREYANS A PATEL/Examiner, Art Unit 2656