DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
In response to the Office Action mailed September 16, 2021, applicant filed an amendment on December 13, 2021, in which applicant amended the claims and requested reconsideration.

Response to Arguments
Applicants argue that the cited prior art fails to teach the claims as amended. It is noted that, although part of dependent claim 6 has been incorporated into the independent claims, the claims have been further amended to recite that the generating is dependent on a cost, whereas the generating was previously performed in response to the cost. Therefore, Applicants' arguments have been considered and are persuasive, but are moot in view of the new grounds of rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1-2, 4-5, 9, 15-16 and 18-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pollet (PGPUB 2016/0093289) in view of Kim et al. (PGPUB 2015/0149178), hereinafter as Kim.

Regarding claims 1, 15 and 20, Pollet discloses a speech generation method and apparatus, hereinafter referenced as a method comprising: 
obtaining, by a processor, a linguistic feature (linguistic) and a prosodic feature from an input text (prosodic; p. 0094-0095); 
determining, by the processor, a first candidate speech element through a cost calculation (cost; p. 0017, 0036, 0047-0051) and a Viterbi search based on the linguistic feature and the prosodic feature (Viterbi; p. 0048, 0089, 0099); 
generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element (candidates; p. 0046-0051, 0089-0099, 0102-0103); and 
outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search (p. 0033, 0043, 0052, 0069-0080), but does not specifically teach wherein the generating of the second candidate speech element being performed dependent on a cost of the first candidate speech element being greater than a preset threshold.

Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to assist with generating additional speech.
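For illustration only, the limitations addressed above (a cost calculation, a Viterbi search over candidate speech elements, and generation of a second candidate when the cost of the first candidate exceeds a preset threshold) can be sketched as follows. This is a minimal sketch and is not taken from Pollet, Kim, or the application: speech elements are modeled as single numbers, and the cost functions, the `generate` callback, and all names are hypothetical assumptions.

```python
def viterbi_select(candidates, target_cost, concat_cost):
    """Pick one candidate per position so that the summed cost is minimal.

    candidates[i] is the list of candidate elements for position i.
    Returns (index of the chosen candidate per position, total cost).
    """
    # best[i][j]: minimal cumulative cost of any path ending at candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    back = []  # back[i-1][j]: index of the best predecessor of candidates[i][j]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + concat_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(i, u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, total


def synthesize(candidates, targets, threshold, generate):
    """Concatenate selected elements, regenerating those that cost too much."""
    # Hypothetical toy costs: distance to the target and to the previous element.
    target_cost = lambda i, u: abs(u - targets[i])
    concat_cost = lambda p, u: 0.1 * abs(u - p)
    path, _ = viterbi_select(candidates, target_cost, concat_cost)
    output = []
    for i, j in enumerate(path):
        first = candidates[i][j]  # first candidate, chosen by the search
        if target_cost(i, first) > threshold:
            # Assumption: the per-element target cost is the gated cost.
            output.append(generate(targets[i], first))  # second candidate
        else:
            output.append(first)
    return output
```

In this sketch the generator is invoked only for positions whose selected element costs more than the threshold; every other position keeps the element found by the search.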
Regarding claims 2 and 16, Pollet discloses a speech generation method wherein the generating of the second candidate speech element comprises: 
extracting, through the speech element generator, a content feature from a fourth candidate speech element that is different from the second candidate speech element from among the candidate speech elements (candidates; p. 0046-0051, 0089-0099, 0102-0103); and 
generating, through the speech element generator, the second candidate speech element based on the linguistic feature, the style feature, and the content feature (p. 0094-0099).  In addition, Kim discloses a method comprising:
extracting, through the speech element generator, a style feature from a third candidate speech element (style) having a smallest cost (below target cost) from among candidate speech elements (candidate speech units; p. 0031-0033). 
Regarding claims 4 and 18, Pollet discloses a speech generation method further comprising: 
generating, through the speech element generator, candidate speech elements and storing the generated candidate speech elements in a memory, in response to a cost of a candidate speech element in a first candidate speech sequence being greater than a threshold (cost; p. 0017, 0036, 0047-0051). 
Regarding claims 5 and 19, Pollet discloses a speech generation method wherein the generating of the second candidate speech element comprises: 
generating, through the speech element generator, the second candidate speech element for each phonetic transcription unit corresponding to the input text (phonetic transcription; p. 0175). 
Regarding claim 9, Pollet discloses a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speech generation method of claim 1 (p. 0030). 
Regarding claim 21, Pollet discloses an apparatus wherein the memory is further configured to store speech elements corresponding to linguistic features and prosodic features, and the first candidate speech element is selected from the speech elements (p. 0094-0095). 

Claims 3, 10-14 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pollet in view of Kim and in further view of Haider et al. (PGPUB 2020/0134415), hereinafter referenced as Haider.

Regarding claims 3 and 17, Pollet in view of Kim discloses a speech generation method as described above, but does not specifically teach wherein the generating of the second candidate speech element comprises: 
generating, through the speech element generator, the second candidate speech element based on the linguistic feature and the prosodic feature, wherein the speech element generator comprises a generative adversarial network (GAN). 

Haider discloses generating, through the speech element generator, the second candidate speech element based on the linguistic feature and the prosodic feature, wherein the speech element generator comprises a generative adversarial network (GAN; p. 0056-0059), to generate realistic data.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to improve the output data. 
Regarding claim 10, it is interpreted and rejected for similar reasons as set forth in claims 1, 3 and 17.  In addition, Pollet discloses a method of training a speech element generator, comprising: 
obtaining, by a processor, a linguistic feature and a prosodic feature from a training text (p. 0020-0022).  Furthermore, Haider discloses a method comprising:
calculating, by the processor, a loss value corresponding to the second candidate speech element based on the first candidate speech element and the second candidate speech element (p. 0035-0039, 0043-0047, 0053); and 
updating, by the processor, a parameter of the speech element generator based on the loss value (update; p. 0043-0047, 0053-0059). 
Regarding claim 11, it is interpreted and rejected for similar reasons as set forth in claims 1, 3 and 17.  In addition, Haider discloses a speech generation method wherein the calculating of the loss value comprises: 
calculating the loss value using the cost function, in response to the cost function being differentiable (p. 0043-0047). 
Regarding claim 12, it is interpreted and rejected for similar reasons as set forth in the claims above.  In addition, Kim discloses a speech generation method wherein the updating of the parameter of the speech element generator comprises: 
updating a parameter of a style extractor configured to extract a style feature from the second candidate speech element having a smallest cost from among candidate speech elements (below target cost; p. 0031-0033).
Furthermore, Haider discloses a method comprising updating a parameter of a content extractor configured to extract a content feature from a third candidate speech element different from the second candidate speech element from among the candidate speech elements (update; p. 0043-0047, 0053-005).  
Regarding claim 13, it is interpreted and rejected for similar reasons as set forth in claims 1, 3 and 17.  In addition, Haider discloses a speech generation method wherein the updating of the parameter of the speech element generator comprises: 
updating parameters of a generator and a discriminator in the speech element generator (update; p. 0043-0047, 0053-0059). 
Regarding claim 14, Pollet discloses a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speech generation method of claim 10 (p. 0030). 

Claims 7-8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pollet in view of Kim and in further view of Fujita et al. (PGPUB 2008/0243511), hereinafter referenced as Fujita.
Regarding claim 7, it is interpreted and rejected for similar reasons as set forth above; however, Pollet in view of Kim does not specifically teach wherein the cost is based on a weighted average of a sum of respective costs of a speech element for each of phonetic transcription units in the input text and a sum of costs associated with concatenations of the speech elements.
Fujita discloses a speech generation method wherein the cost is based on a weighted average of a sum of respective costs (sum of weights of individual costs) of a speech element for each of phonetic transcription units in the input text (phonetic transcription of speech) and a sum of costs associated with concatenations of the speech elements (concatenation; p. 0064-0070), to obtain candidate data.
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method as described above, to provide a tailored approach.
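For illustration only, the cost recited in claim 7 (a weighted average of the sum of per-unit costs and the sum of concatenation costs) can be sketched as follows. This is a minimal sketch under stated assumptions, not Fujita's formulation: the function name, the single weight `w_unit`, and the example values are hypothetical.

```python
def sequence_cost(unit_costs, concat_costs, w_unit=0.5):
    """Weighted average of the summed unit costs and summed concatenation costs.

    unit_costs:   one cost per phonetic transcription unit in the input text
    concat_costs: one cost per join between adjacent speech elements
    w_unit:       weight on the unit-cost sum; (1 - w_unit) weights the joins
    """
    if not 0.0 <= w_unit <= 1.0:
        raise ValueError("w_unit must lie in [0, 1]")
    return w_unit * sum(unit_costs) + (1.0 - w_unit) * sum(concat_costs)


# Three units and the two joins between them, weighted 1:3 toward the joins.
example = sequence_cost([1.0, 2.0, 3.0], [0.5, 1.5], w_unit=0.25)
```

A single weight is used here because the two weights of a weighted average sum to one; a formulation with independent, unnormalized weights would be a weighted sum instead.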
Regarding claim 8, Pollet discloses a speech generation method wherein the outputting of the output speech comprises outputting the output speech in response to the cost meeting a threshold (threshold; p. 0082, 0144). 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  This information has been detailed in the PTO 892 attached (Notice of References Cited).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on 571.272.5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAKIEDA R JACKSON/Primary Examiner, Art Unit 2657