DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed July 27, 2022.  Claims 1, 6, and 11 have been amended.  Claims 2, 3, 7, 8, 12, and 13 are cancelled.  Claims 1, 4-6, 9-11, and 14-15 remain pending.

Priority
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action. 37 CFR 41.154(b) and 41.202(e). Failure to provide a certified translation may result in no benefit being accorded for the non-English application.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on May is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 4-6, 9-11 and 14-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “a first-to-be-evaluated speech synthesis model” in line 2 and additionally recites “a first-to-be-evaluated speech synthesis model”  in line 4.  It is unclear if the recitation at line 4 refers to a new and different first-to-be-evaluated speech synthesis model or refers to the first-to-be-evaluated speech synthesis model of line 2.  Claims 4-5 are similarly rejected by dependency.
Claim 6 recites the limitation “a first-to-be-evaluated speech synthesis model” in line 8 and additionally recites “a first-to-be-evaluated speech synthesis model”  in line 10.  It is unclear if the recitation at line 10 refers to a new and different first-to-be-evaluated speech synthesis model or refers to the first-to-be-evaluated speech synthesis model of line 8.  Claims 9-10 are similarly rejected by dependency.
Claim 11 recites the limitation “a first-to-be-evaluated speech synthesis model” in line 4 and additionally recites “a first-to-be-evaluated speech synthesis model”  in line 6.  It is unclear if the recitation at line 6 refers to a new and different first-to-be-evaluated speech synthesis model or refers to the first-to-be-evaluated speech synthesis model of line 4.  Claims 14-15 are similarly rejected by dependency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4-6, 9-11 and 14-15 are directed to non-statutory subject matter because the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea.
The independent claims 1, 6 and 11 are directed to the abstract idea of:
“A model evaluation method, applied to evaluate a text-to-speech system comprising a first-to-be-evaluated speech synthesis model and a second to-be-evaluated speech synthesis model, the method comprising: obtaining M first audio signals synthesized by using a first to-be-evaluated speech synthesis model, and obtaining N second audio signals generated through recording; performing voiceprint extraction on each of the M first audio signals to obtain M first voiceprint features, and performing voiceprint extraction on each of the N second audio signals to obtain N second voiceprint features; clustering the M first voiceprint features to obtain K first central features, and clustering the N second voiceprint features to obtain J second central features; counting cosine distances between the K first central features and the J second central features to obtain a first distance; and evaluating the first to-be-evaluated speech synthesis model based on the first distance; wherein M, N, K and J are positive integers greater than 1, M is greater than K, and N is greater than J; wherein the step of counting the cosine distances between the K first central features and the J second central features to obtain the first distance comprises: for every first central feature, calculating the cosine distance between the first central feature and each of the second central features to obtain J cosine distances corresponding to the first central feature, and calculating a sum of the J cosine distances corresponding to the first central feature to obtain a cosine distance sum corresponding to the first central feature; and calculating a sum of the cosine distance sums corresponding to the K first central features to obtain the first distance; wherein the step of evaluating the first to-be-evaluated speech synthesis model based on the first distance comprises: in the case where the first distance is smaller than a first preset threshold, determining that the evaluation of the first to-be-evaluated speech synthesis model is successful; and in the case where the first distance is greater than or equal to the first preset threshold, determining that the evaluation of the first to-be-evaluated speech synthesis model is not successful.”
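
For reference only, the counting and evaluating steps recited above may be summarized in the following notation. The symbols are assumptions introduced here for illustration and do not appear in the claims: a_k denotes the k-th first central feature, b_j the j-th second central feature, d_cos the cosine distance, D_1 the first distance, and T_1 the first preset threshold.

```latex
% Cosine distance sum for the k-th first central feature
S_k = \sum_{j=1}^{J} d_{\cos}(a_k, b_j), \qquad k = 1, \dots, K

% First distance: sum of the K cosine distance sums
D_1 = \sum_{k=1}^{K} S_k = \sum_{k=1}^{K} \sum_{j=1}^{J} d_{\cos}(a_k, b_j)

% Evaluation outcome
\text{successful if } D_1 < T_1, \qquad \text{not successful if } D_1 \ge T_1
```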

The limitations of “obtaining …”, “performing …”, “clustering …”, “counting …”, “evaluating …”, and “calculating …”, under the broadest reasonable interpretation, as drafted, cover a human performing mental processes and utilizing relevant concepts. An ordinary person skilled in the art would be able to obtain audio signals from the speech synthesis model as well as from a recorded medium. Obtaining a synthesized audio signal, under the broadest reasonable interpretation, is nothing more than a human listening to text-to-speech audio and storing it mentally. Similarly, obtaining audio signals generated through recording is nothing more than listening to pre-recorded audio and committing it to memory. The limitation of performing voiceprint extraction is nothing more than performing a mental analysis of the audio signal, or of an image of the audio signal such as a spectrogram, and noting down acoustic characteristics such as tone change, linguistic preference, loudness, emotions, etc. The limitation of clustering the voiceprint features is simply grouping the extracted features into particular classifications, which a human would be able to perform either mentally or by using pen and paper. Furthermore, the limitation of obtaining the central features from the clustering is nothing more than taking an average for each grouping, which is a basic mathematical concept and can be performed by a person skilled in the art. The limitation of counting the cosine distances relies on cosine distance, a well-known mathematical concept in the art used to obtain a similarity/confidence value between two features/vectors. Thus, this limitation can also be performed by a human utilizing relevant mathematical concepts. The limitation of evaluating the speech synthesis model is nothing more than checking the cosine distance and examining whether the similarity score is above a certain threshold.
The steps for counting the cosine distances and obtaining the first distance are directed towards the abstract idea of a cosine distance calculation process. Thus, the claim is aimed towards nothing more than the computation of a mathematical concept, which a human would be able to perform using pen and paper. The steps for evaluating the speech synthesis model are directed towards the abstract idea of evaluating the speech synthesis model based on the cosine distance found and deciding whether the speech synthesis is successful or not. The evaluation process recited in the claim is simply comparing the cosine distance found with a threshold value to decide whether the speech synthesis was successful or not. Comparing a value with a threshold is something a human would be able to perform mentally in deciding whether the speech synthesis was successful or not.
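
For illustration only, the counting and evaluating steps discussed above can be sketched in a few lines of Python. This is a minimal sketch and not the applicant's implementation. It assumes that cosine distance means one minus cosine similarity, which the claims do not specify, and the function names, sample vectors, and threshold values are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def first_distance(first_centers, second_centers):
    """Sum, over the K first central features, of each feature's
    cosine distance sum to the J second central features."""
    total = 0.0
    for c1 in first_centers:
        # J cosine distances for this first central feature, summed.
        total += sum(cosine_distance(c1, c2) for c2 in second_centers)
    return total


def evaluation_successful(distance, threshold):
    """Successful only when the distance is strictly below the threshold."""
    return distance < threshold


# Hypothetical central features (K = 2, J = 2), chosen for easy hand checking.
first_centers = [[1.0, 0.0], [0.0, 1.0]]
second_centers = [[1.0, 0.0], [0.0, 1.0]]

d = first_distance(first_centers, second_centers)
print(d)                              # 2.0
print(evaluation_successful(d, 2.5))  # True
print(evaluation_successful(d, 2.0))  # False
```

With these sample vectors each first central feature contributes a cosine distance sum of 1.0 (0.0 to the identical vector plus 1.0 to the orthogonal one), so the first distance is 2.0. A distance equal to the threshold is treated as not successful, matching the "greater than or equal to" branch recited in the claim.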
The judicial exception is not integrated into a practical application. In particular, claims 6 and 11 recite additional elements – “text-to-speech system”, “processor”, “memory”, “non-transitory computer-readable storage medium”. All of these elements are recited at a very high level of generality and do not add meaningful limits to the abstract ideas being performed. The claims recite that the model evaluation is applied to the generic/general purpose computer TTS system, but the claims do not recite or include any meaningful limits to the abstract ideas that are performed to provide any elements or limits in “applying” the evaluation to the generic/general purpose computer TTS system. The additional element, “processor”, is used to perform the abstract idea, which can be performed by a human. Furthermore, a processor is considered a generic element that can be found in most computer devices. As for the additional element, “memory”, it adds no meaningful limits to the claim and is only used for storing data and instructions. Similar to a processor, the element of memory is considered generic and is known to be found in most computer devices. Lastly, the additional element, “non-transitory computer-readable storage medium” (CRM), is considered a well-known generic component conventionally used in most generic computer devices. Also, it is known that a CRM or computer implementation of an abstract idea is not a factor that weighs in favor of patentability under subject matter eligibility. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The dependent claims 4, 9 and 14 are directed towards the abstract idea of repeating the same limitations as claim 1, which cover: “obtaining …”, “performing …”, “clustering …”, “counting …”, and “evaluating …”. All of these limitations perform an identical function but for a different set of audio signals. Therefore, all of them would be identified as abstract ideas for the same reasoning as presented for claims 1, 6, and 11 above.
The dependent claims 5, 10, and 15 are directed towards the abstract idea of specifying that the cosine distance between two first central features is greater than a threshold and the cosine distance between two second central features is greater than a certain threshold. This limitation only sets a rule for the cosine distance calculation and adds no limits that would prevent it from being performed by a human using relevant mathematical computations on paper.
Dependent claims 4-5, 9-10, and 14-15 do not integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Response to Arguments

Applicant's arguments filed July 27, 2022 with respect to the rejections under 35 USC § 101 have been fully considered but they are not persuasive. 

Applicant argues a human cannot practically perform voiceprint extraction, clustering, counting/calculating cosine distances, since the human mind is not equipped to extract voiceprints and analyze the features with an extracted voiceprint as applied to evaluation of a TTS system.  The Examiner respectfully disagrees.  As indicated in the rejection above, voiceprint extraction is nothing more than performing a mental analysis on the audio signal or an audio signal image such as a spectrogram and noting down acoustic characteristics such as tone change, linguistic preference, loudness, emotions.  A human can mentally extract the acoustic characteristics that are heard or observed, and using pen and paper analyze the characteristics and organize the features using numerical values to obtain central features, where the central features can be used in the cosine distance processing.  
Applicant argues the claim is not directed to the abstract idea of mathematical calculation.  The Examiner notes, as indicated in the rejection above, the claims, given the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic/general purpose components, and fall within the “Mental Processes” grouping of abstract ideas.  While certain limitations, after analysis, have been indicated to be mathematical calculations, the limitations are calculations that can be performed by a human mental process or by a human using pen and paper.
Applicant argues the claims recite a practical application of the alleged abstract idea.  The Examiner respectfully disagrees.  As indicated in the rejection above, the claims do not recite or include any meaningful limits to the abstract ideas that are performed to provide any elements or limits in “applying” the evaluation to the generic/general purpose computer TTS system.  Specifically, although the preamble recites the method is applied to evaluate a text-to-speech system comprising a first-to-be-evaluated speech synthesis model and a second to-be-evaluated speech synthesis model, the claims do not recite any features for applying the abstract idea.  The limitation of “applying” does not include any meaningful limits to provide an application of the abstract idea or any application that would specifically provide an indication of an improvement to the system.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659