DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the office action of 7/9/2021, the applicant submitted an amendment, filed 10/5/2021, amending independent claims 1, 9, and 17 and dependent claims 5, 6, 13, and 14, and cancelling claims 4, 12, and 20, while traversing the prior art rejections. Applicant’s arguments have been fully considered but were not found persuasive. As an alternative, the examiner proposed an examiner’s amendment that places the case in condition for allowance by incorporating dependent claim 3, which was indicated as allowable in the first action, into the independent claims. Therefore, claims 1-2, 5-10, and 13-18 are allowable over the prior art of record for the reasons for allowance provided below.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with the attorney of record, Mr. Yalei Sun, on 12/10/2021.
Amend claims 1, 9, 17; Cancel claims 3, 11, 19.

As Per Claim 1:
(Currently Amended) A speech synthesis method performed at a server having one or more processors and memory storing a plurality of programs to be executed by the one or more processors, the method comprising:
obtaining, by the server, to-be-converted text information;
processing, by the server, the to-be-converted text information to obtain a corresponding text feature sequence;
obtaining, by the server, an initialized speech sample point and some text feature sample points in the text feature sequence, and matching the initialized speech sample point with the text feature sample points to form an initialized vector matrix;
inputting, by the server, the initialized vector matrix into a target statistical parameter model, to obtain a prediction speech sample point sequence corresponding to the text feature sequence, wherein the target statistical parameter model is generated by:
obtaining model training data, the model training data comprising a training text feature sequence and a corresponding training original speech sample sequence, 
generating an original vector matrix formed by matching a text feature sample point in the training text feature sample sequence with a speech sample point in the training original speech sample sequence; 
inputting the original vector matrix into a statistical parameter model for training, 
performing non-linear mapping calculation on the original vector matrix in a hidden layer of the statistical parameter model, 
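outputting a corresponding training prediction speech sample point from the statistical parameter model, and 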
determining the target statistical parameter model by updating the statistical parameter model according to the training prediction speech sample point and a corresponding training original speech sample point by using a smallest difference principle; 
the inputting further including:

inputting, by the server, the initialized vector matrix into the target statistical parameter model, to obtain a first prediction speech sample point;
using, by the server, the first prediction speech sample point as a current prediction sample point, obtaining a target text feature sample point corresponding to the current prediction speech sample point from the text feature sequence, matching the target text feature sample point with the current prediction speech sample point to form a vector pair, and adding the vector pair to the initialized vector matrix to obtain an updated vector matrix; and
repeatedly performing the step of obtaining the target text feature sample point and matching the target text feature sample point with the current prediction speech sample point to form a vector pair, until all text feature sample points in the text feature sequence have corresponding prediction speech sample points, the prediction speech sample points forming a prediction speech sample point sequence; and
outputting, by the server, synthesized speech corresponding to the to-be-converted text information according to the prediction speech sample point sequence.

As Per Claim 3:
3 	 (Canceled) 

As Per Claim 9:

obtaining, by the server, to-be-converted text information;
processing, by the server, the to-be-converted text information to obtain a corresponding text feature sequence;
obtaining, by the server, an initialized speech sample point and some text feature sample points in the text feature sequence, and matching the initialized speech sample point with the text feature sample points to form an initialized vector matrix;
inputting, by the server, the initialized vector matrix into a target statistical parameter model, to obtain a prediction speech sample point sequence corresponding to the text feature sequence, wherein the target statistical parameter model is generated by:
obtaining model training data, the model training data comprising a training text feature sequence and a corresponding training original speech sample sequence, 
generating an original vector matrix formed by matching a text feature sample point in the training text feature sample sequence with a speech sample point in the training original speech sample sequence; 
inputting the original vector matrix into a statistical parameter model for training, 
performing non-linear mapping calculation on the original vector matrix in a hidden layer of the statistical parameter model, 
outputting a corresponding training prediction speech sample point from the statistical parameter model, and 
determining the target statistical parameter model by updating the statistical parameter model according to the training prediction speech sample point and a corresponding training original speech sample point by using a smallest difference principle; 
the inputting further including:
inputting, by the server, the initialized vector matrix into the target statistical parameter model, to obtain a first prediction speech sample point;
using, by the server, the first prediction speech sample point as a current prediction sample point, obtaining a target text feature sample point corresponding to the current prediction speech sample point from the text feature sequence, matching the target text feature sample point with the current prediction speech sample point to form a vector pair, and adding the vector pair to the initialized vector matrix to obtain an updated vector matrix; and
repeatedly performing the step of obtaining a target text feature sample point and matching the target text feature sample point with the current prediction speech sample point to form a vector pair, until all text feature sample points in the text feature sequence have corresponding prediction speech sample points, the prediction speech sample points forming a prediction speech sample point sequence; and
outputting, by the server, synthesized speech corresponding to the to-be-converted text information according to the prediction speech sample point sequence.
As Per Claim 11:

11	 (Canceled) 

As Per Claim 17:
17. (Currently Amended) A non-volatile computer readable storage medium, storing a plurality of computer-readable instructions that, when executed by one or more processors of a server, cause the server to perform the following steps:
obtaining, by the server, to-be-converted text information;
processing, by the server, the to-be-converted text information to obtain a corresponding text feature sequence;
obtaining, by the server, an initialized speech sample point and some text feature sample points in the text feature sequence, and matching the initialized speech sample point with the text feature sample points to form an initialized vector matrix;
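inputting, by the server, the initialized vector matrix into a target statistical parameter model, to obtain a prediction speech sample point sequence corresponding to the text feature sequence, wherein the target statistical parameter model is generated by: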
obtaining model training data, the model training data comprising a training text feature sequence and a corresponding training original speech sample sequence, 
generating an original vector matrix formed by matching a text feature sample point in the training text feature sample sequence with a speech sample point in the training original speech sample sequence; 
inputting the original vector matrix into a statistical parameter model for training, 
performing non-linear mapping calculation on the original vector matrix in a hidden layer of the statistical parameter model, 
outputting a corresponding training prediction speech sample point from the statistical parameter model, and 
determining the target statistical parameter model by updating the statistical parameter model according to the training prediction speech sample point and a corresponding training original speech sample point by using a smallest difference principle; 
the inputting further including:
inputting, by the server, the initialized vector matrix into the target statistical parameter model, to obtain a first prediction speech sample point;
using, by the server, the first prediction speech sample point as a current prediction sample point, obtaining a target text feature sample point corresponding to the current prediction speech sample point from the text feature sequence, matching the target text feature sample point with the current prediction speech sample point to form a vector pair, and adding the vector pair to the initialized vector matrix to obtain an updated vector matrix; and
repeatedly performing the step of obtaining the target text feature sample point and matching the target text feature sample point with the current prediction speech sample point to form a vector pair, until all text feature sample points in the text feature sequence have corresponding prediction speech sample points, the prediction speech sample points forming a prediction speech sample point sequence; and
outputting, by the server, synthesized speech corresponding to the to-be-converted text information according to the prediction speech sample point sequence.

As Per Claim 19:
19	 (Canceled).

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance: Independent claims 1, 9, and 17 recite a method, a system, and a non-volatile computer readable storage medium, respectively, that employ neural networks or machine learning to perform speech synthesis. Initially, “text” that is “to be converted” to speech is input. A corresponding “text feature sequence” is then generated, e.g., by “statement” and/or “word segmentation”, i.e., “divid[ing] a paragraph into corresponding statements” and further performing “word segmentation” on the “statements” (specification ¶ 0133, last 6 lines); and/or by “part of speech tagging”, in which each word is “tagged” as one of “a noun, verb, adjective” (specification ¶ 0135); and/or by grouping “a word segment as a prosodic word”, “phrase, or an intonational phrase”, and/or marking a “pause”, “heteronym”, “erhua”, and/or “neutral tone” (specification ¶ 0137).
It then determines an “original speech sample sequence” “corresponding” to the “text feature sequence”. At this point the “original speech sample sequence” and the 
For further improvement, the “prediction speech sample point” is treated as a “first” or “current” “prediction speech sample point”, for which a corresponding “target text feature sample point” is “obtain[ed]” from the “text feature sequence”. The “current” “prediction speech sample point” and the “target text feature sample point” are “matched” to form a new “vector pair”, which is added to the “vector matrix” to “update” it. This constitutes one iteration; after it is done, another iteration begins, until all “text feature sample points” in the “text feature sequence” “have corresponding prediction speech sample points”.
Finally, “synthesized speech” corresponding “to the to-be-converted text information” is output according to the final “prediction speech sample point sequence”.
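For illustration only, the iterative generation described in the two preceding paragraphs can be sketched in Python. This is a hypothetical sketch, not the applicant’s implementation: `predict_next` is an assumed stand-in for the trained “target statistical parameter model”, sample points are plain floats, each “vector pair” is a `(text feature sample point, speech sample point)` tuple, and the initialized vector matrix is assumed to pair the initialized speech sample point with only the first text feature sample point (the claims say “some” text feature sample points).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (text feature sample point, speech sample point).
VectorPair = Tuple[float, float]


def synthesize_samples(
    text_features: Sequence[float],
    predict_next: Callable[[List[VectorPair]], float],
    initial_sample: float = 0.0,
) -> List[float]:
    """Predict one speech sample point per text feature sample point, autoregressively."""
    if not text_features:
        return []
    # Initialized vector matrix (assumption: only the first text feature
    # sample point is matched with the initialized speech sample point).
    matrix: List[VectorPair] = [(text_features[0], initial_sample)]
    # First prediction speech sample point.
    current = predict_next(matrix)
    predictions = [current]
    # Match each remaining (target) text feature sample point with the current
    # prediction, add the vector pair to the matrix, and predict again, until
    # every text feature sample point has a corresponding prediction.
    for target_feature in text_features[1:]:
        matrix.append((target_feature, current))
        current = predict_next(matrix)
        predictions.append(current)
    return predictions


def toy_model(matrix: List[VectorPair]) -> float:
    """Hypothetical stand-in model: looks only at the most recent vector pair."""
    feature, previous_sample = matrix[-1]
    return 0.5 * previous_sample + feature


if __name__ == "__main__":
    print(synthesize_samples([1.0, 2.0, 3.0], toy_model))  # [1.0, 2.5, 4.25]
```

Each prediction is fed back as the speech half of the next vector pair, which is the feature the examiner identifies as absent from the prior art of record; the toy model is arbitrary and only makes the feedback visible.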
nd column lines 5+: “a distortion measure” (a difference) “for minimization” (i.e., a smallest difference principle) between “O[Pj(k)]”, the “outputs of the prosodic information synthesizer” (prediction speech sample point or input text feature sample point sequence), and “T[Pj(k)]”, the “corresponding desired target values” (original speech sample point). Here, therefore, the pair “O[Pj(k)]” and “T[Pj(k)]” is analogous to the claim’s “vector pair” of the “vector matrix”; these are labeled as “nodes” (page 231, 2nd column, line 1), to which a “sigmoid activation function” (a non-linear mapping calculation) is applied “to generate the eight prosodic parameters” (i.e., to output a corresponding prediction speech sample point). Finally, according to page 232, first column, lines 5-2 above § IV, the “well trained RNN” is “used as a prosody synthesizer” (synthesizing speech) “for generating proper prosodic parameters” (according to the prediction speech sample points thus obtained) “for given input text”.
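The “smallest difference principle” and the sigmoid hidden-layer mapping discussed above can likewise be illustrated with a small, hypothetical Python sketch: a network with one sigmoid hidden layer is trained by plain gradient descent to minimize the squared difference between its output (standing in for a training prediction speech sample point) and a target (standing in for the corresponding training original speech sample point). The squared-error criterion, layer sizes, learning rate, and `TOY_DATA` values are assumptions; this reproduces neither Chen et al.’s RNN nor the claimed model.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy data: ((text feature, previous speech sample), original speech sample).
Sample = Tuple[Tuple[float, float], float]
TOY_DATA: List[Sample] = [
    ((0.0, 0.0), 0.1),
    ((0.5, 0.1), 0.3),
    ((1.0, 0.3), 0.6),
    ((0.2, 0.6), 0.4),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyModel:
    """One sigmoid hidden layer followed by a single linear output unit."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        # Non-linear (sigmoid) mapping in the hidden layer, then a linear output.
        hidden = [
            sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_step(self, x: Sequence[float], target: float, lr: float) -> float:
        """One gradient-descent step on 0.5 * (output - target) ** 2."""
        hidden, output = self.forward(x)
        error = output - target
        for j, h in enumerate(hidden):
            # Gradient w.r.t. the hidden pre-activation (uses w2[j] before its update).
            grad_pre = error * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * error * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
            self.b1[j] -= lr * grad_pre
        self.b2 -= lr * error
        return 0.5 * error * error


def mean_loss(model: TinyModel, data: List[Sample]) -> float:
    return sum(0.5 * (model.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def train(model: TinyModel, data: List[Sample], epochs: int = 2000, lr: float = 0.1) -> None:
    for _ in range(epochs):
        for x, y in data:
            model.train_step(x, y, lr)


if __name__ == "__main__":
    model = TinyModel(n_inputs=2, n_hidden=4, seed=0)
    before = mean_loss(model, TOY_DATA)
    train(model, TOY_DATA)
    print(f"mean loss before: {before:.4f}, after: {mean_loss(model, TOY_DATA):.4f}")
```

Squared error is used here only as one common choice of difference measure; neither the claims nor the passage of Chen et al. quoted above is read as requiring this particular measure.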
Chen et al., however, does not teach using the obtained “prediction speech sample point” to generate a new “target text feature sample point”, forming a new “vector pair” and thus an updated “vector matrix”, and repeating all of the above steps. In plain language, the pair “O[Pj(k)]” and “T[Pj(k)]” is not updated, and to the updated pair the “sigmoid 
A further search did not uncover any prior art teaching this feature. Therefore, claims 1, 9, and 17 are allowable. Claims 2, 5-8 (dependent on claim 1), 10, 13-16 (dependent on claim 9), and 18 (dependent on claim 17) further narrow the scope of their allowable parent claims and are thus allowable under similar rationale.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farzad Kazeminezhad/
Art Unit 2657
December 14th 2021.