DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter

Claims 1-2, 4-7, 9-12 and 14-15 are allowed.
The following is an examiner’s statement of reasons for allowance: Applicant’s correspondence dated 5/5/2022, filed in response to the Office Action mailed 3/7/2022, was considered. Applicant claims a speech synthesis method/device/non-transitory computer readable medium performing the method, comprising: inputting text information into an encoder of an acoustic model, to output a text feature of a current time step; splicing the text feature of the current time step with a spectral feature of a previous time step to obtain a spliced feature of the current time step, and inputting the spliced feature of the current time step into an decoder of the acoustic model to obtain a spectral feature of the current time step; and inputting the spectral feature of the current time step into a neural network vocoder, to output speech, wherein the splicing the text feature of the current time step with a spectral feature of a previous time step to obtain a spliced feature of the current time step, and inputting the spliced feature of the current time step into an decoder of the acoustic model to obtain a spectral feature of the current time step comprises: inputting the spliced feature of the previous time step into at least one gated recurrent unit and a fully connected layer in the decoder, to output a first spectral feature of the previous time step; inputting the first spectral feature of the previous time step into another fully connected layer, to obtain a second spectral feature of the previous time step; splicing the text feature of the current time step with the second spectral feature of the previous time step, to obtain the spliced feature of the current time step; and inputting the spliced feature of the current time step into the decoder of the acoustic model, to obtain a first spectral feature of the current time step.
The prior art of Lee et al., Mandal et al., or Arik et al., alone or in combination, fails to teach or disclose the claimed combination of features, especially, “wherein the splicing the text feature of the current time step with a spectral feature of a previous time step to obtain a spliced feature of the current time step, and inputting the spliced feature of the current time step into an decoder of the acoustic model to obtain a spectral feature of the current time step comprises: inputting the spliced feature of the previous time step into at least one gated recurrent unit and a fully connected layer in the decoder, to output a first spectral feature of the previous time step; inputting the first spectral feature of the previous time step into another fully connected layer, to obtain a second spectral feature of the previous time step; splicing the text feature of the current time step with the second spectral feature of the previous time step, to obtain the spliced feature of the current time step; and inputting the spliced feature of the current time step into the decoder of the acoustic model, to obtain a first spectral feature of the current time step.” Therefore, claims 1-2, 4-7, 9-12 and 14-15 are deemed allowable over the cited prior art of record.
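For illustration only, and not as part of the claims or the record, the recited per-time-step data flow may be sketched as follows. The sketch assumes that "splicing" means vector concatenation, that the first time step uses an all-zero previous spectral feature, and that weights are random and untrained. The encoder and the neural network vocoder are omitted: text features are passed in directly and the first spectral features are simply returned. All names (GRUCell, decode, fc_out, fc_second) and all dimensions are hypothetical and do not appear in the application.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_matrix(rows, cols, rng):
    """Random, untrained weights; for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class GRUCell:
    """Minimal gated recurrent unit without bias terms."""
    def __init__(self, in_dim, hid_dim, rng):
        self.wz = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # update gate
        self.wr = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # reset gate
        self.wh = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(v) for v in matvec(self.wz, x + h)]
        r = [sigmoid(v) for v in matvec(self.wr, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.wh, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def decode(text_features, spec_dim=3, hid_dim=5, seed=0):
    """Return one first spectral feature per time step."""
    rng = random.Random(seed)
    text_dim = len(text_features[0])
    gru = GRUCell(text_dim + spec_dim, hid_dim, rng)
    fc_out = rand_matrix(spec_dim, hid_dim, rng)      # fully connected layer in the decoder
    fc_second = rand_matrix(spec_dim, spec_dim, rng)  # the "another fully connected layer"
    h = [0.0] * hid_dim
    second_prev = [0.0] * spec_dim  # assumption: zero frame before the first step
    frames = []
    for text_t in text_features:
        spliced = text_t + second_prev  # splice current text feature with previous second spectral feature
        h = gru.step(spliced, h)
        first = matvec(fc_out, h)       # first spectral feature of the current step
        frames.append(first)
        second_prev = matvec(fc_second, first)  # second spectral feature, used at the next step
    return frames

frames = decode([[0.1] * 4, [0.2] * 4, [0.3] * 4])
```

In this sketch the second spectral feature is never emitted; it exists only to be spliced with the text feature of the next time step, which is the feedback path the quoted limitation recites.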
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Response to Arguments

Applicant’s arguments, see page 7 of the response, filed 5/5/2022, with respect to claims 1-2, 4-7, 9-12 and 14-15, have been fully considered and are persuasive.  The 35 U.S.C. 103 rejection of claims 1-2, 4-7, 9-12 and 14-15 has been withdrawn. 

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Donovan et al., (US 6,266,637 B1) teach a method and system for generating speech that includes the steps of providing input to be acoustically produced, comparing the input to training data or application specific splice files to identify one of words and word sequences corresponding to the input for constructing a phone sequence, using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence, and concatenating segments and modifying characteristics of the segments to be substantially equal to requested characteristics. Application specific data is advantageously used to make pertinent information available to synthesize both the phone sequence and the output speech.
Huang et al., (US 5,913,193 A) teach a concatenative speech synthesis system and method which produces more natural-sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which most closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances, thereby producing more natural-sounding speech.
Bakis et al., (US 7,761,296 B1) teach a system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses. The distance of a synthesized hypothesis to the original speech signal is then computed as the sum over all phonemes in the hypothesis of the Euclidean distance between the means of the feature vectors of the frames aligning to that phoneme for the original and the synthesized signals. The text of the hypothesis which is closest under the above metric to the original waveform is chosen as the final system output.
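For illustration only, the distance metric summarized above for Bakis et al. may be sketched as follows. The sketch assumes that each phoneme-level alignment is given as a list of (start, end) frame index pairs, one pair per phoneme in the hypothesis, and that the original waveform has been aligned separately to the text of each hypothesis. The function names and the data layout are hypothetical and are not taken from the reference.

```python
import math

def segment_mean(frames, start, end):
    """Mean feature vector of frames[start:end], the frames aligned to one phoneme."""
    seg = frames[start:end]
    return [sum(f[d] for f in seg) / len(seg) for d in range(len(seg[0]))]

def hypothesis_distance(orig_frames, orig_segs, synth_frames, synth_segs):
    """Sum over phonemes of the Euclidean distance between per-phoneme means."""
    total = 0.0
    for (o_start, o_end), (s_start, s_end) in zip(orig_segs, synth_segs):
        total += math.dist(
            segment_mean(orig_frames, o_start, o_end),
            segment_mean(synth_frames, s_start, s_end),
        )
    return total

def rescore(orig_frames, hypotheses):
    """Return the text of the hypothesis closest to the original waveform.

    Each hypothesis is a tuple (text, orig_segs, synth_frames, synth_segs).
    """
    best = min(
        hypotheses,
        key=lambda h: hypothesis_distance(orig_frames, h[1], h[2], h[3]),
    )
    return best[0]

# Toy data: two phonemes, two-dimensional feature vectors.
orig = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0]]
orig_segs = [(0, 2), (2, 3)]
hyp_a = ("A", orig_segs, [[1.0, 0.0], [4.0, 4.0]], [(0, 1), (1, 2)])
hyp_b = ("B", orig_segs, [[1.0, 3.0], [4.0, 0.0]], [(0, 1), (1, 2)])
chosen = rescore(orig, [hyp_a, hyp_b])
```

With this toy data the per-phoneme means of hypothesis A coincide with those of the original, so its distance is zero and it is selected over hypothesis B.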

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN whose telephone number is (571)272-7601. The examiner can normally be reached 7-5 Monday thru Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VIJAY B CHAWAN/Primary Examiner, Art Unit 2658