EXAMINER'S STATEMENT OF REASONS FOR ALLOWANCE

The following is an examiner’s statement of reasons for allowance:
Independent claims 5 and 13 are allowable because the prior art of record does not disclose or reasonably suggest a computer-implemented method and system comprising receiving first input data representing a vocal characteristic and a request to output content corresponding to the vocal characteristic, performing natural language understanding (NLU) processing using the first input data to determine an intent to perform speech synthesis corresponding to the vocal characteristic represented in a portion of the first input data, based at least in part on determining the intent to perform speech synthesis corresponding to the vocal characteristic represented in the portion of the first input data, processing the first input data to determine vocal characteristic data representing at least the vocal characteristic, determining, using a trained model and the vocal characteristic data, a model output data, receiving second input data corresponding to a speech synthesis task, determining, using an encoder and the second input data, encoded data, and determining, using a decoder, the model output data, and the encoded data, synthesized speech data corresponding to the vocal characteristic.
Independent claim 1 presents similar limitations to independent claims 5 and 13, but is narrower in scope, being directed to using natural language processing to determine an intent to perform speech synthesis using a description of a speaking style indicated in input text data.  Applicants’ non-elected claims 1 to 4 are being rejoined pursuant to MPEP §821.04.

The Specification, ¶[0018], describes embodiments where a user may provide a description of a voice by input data that requests, “Generate speech that sounds like a 40-year-old news anchor” or “Generate speech that sounds like this: ‘Hasta la vista, baby’”, where the phrase ‘Hasta la vista, baby’ is spoken in an accent of Arnold Schwarzenegger.  Similarly, Applicants’ Specification, ¶[0048] - ¶[0049], describes embodiments of a natural language understanding component that determines an intent of input data that is a vocal description including a phrase of “sounds like a professor”, “distinguished”, “pirate”, “received pronunciation accent”, “sound like Elmo from Sesame Street”, or “childlike”.  These embodiments, then, use natural language processing of first input data to determine an intent to perform speech synthesis according to a natural language description of a desired speaking style for speech synthesis.  However, the prior art of record does not clearly disclose or reasonably suggest performing speech synthesis in this manner using natural language descriptions of a desired speaking style.
Mainly, the prior art of record matches a style for speech synthesis to a speaking style of a user or enables a user to select a speech style for speech synthesis from a list.  Sakai (U.S. Patent Publication 2002/0055843) only discloses selecting a voice from a list as a request to output content, but does not process this with natural language understanding to determine an intent to perform speech synthesis with a vocal characteristic using first input data.  Ye et al. (U.S. Patent Publication 2020/0012675) teaches analyzing a voice request using natural language processing for a more generic application of requesting a media resource, but is not directed to a problem of requesting a speech style for speech synthesis using natural language input.  Moreover, Sakai and Ye et al. do not disclose or teach an encoder and a decoder using a trained model as is known to produce speech synthesis with neural networks.  Even if it is known in the prior art to perform speech synthesis with an encoder, a decoder, and a trained model using a neural network as taught by Chae (U.S. Patent Publication 2020/0005764), there is no reasonable combination to address all of the limitations of the claims that include performing natural language processing using first input data to determine an intent to perform speech synthesis corresponding to a vocal characteristic of the first input data.  Boss et al. (U.S. Patent 5,933,805) discloses selection of a voice font for speech synthesis using a voice of, e.g., Arnold Schwarzenegger, but does not select this voice font using natural language input.  
Any comments considered necessary by Applicants must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/MARTIN LERNER/
Primary Examiner, Art Unit 2657
February 14, 2022