DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 21-40 are pending in this application.
Claims 1-20 are canceled.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory 
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  
Claims 21-40 are rejected on the ground of nonstatutory double patenting over claims 1 and 3-9 of U.S. Patent No. 10,573,293. Although the claims at issue are not identical, they are not patentably distinct from each other because omitting inherent and/or unnecessary limitations/steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the omission of an element, e.g., “provide the sequence of characters as input to the sequence-to-sequence recurrent neural network; an attention-based decoder recurrent neural network; and wherein r is an integer greater than one, wherein each of the second and subsequent decoder inputs in the sequence is one or more of the r frames of the spectrogram that were generated by processing the preceding decoder input in the sequence”, and its function is an obvious expedient if the remaining elements perform the same function as before. In re Karlson, 136 USPQ 184 (CCPA 1963). Also note Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Omission of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.
Instant Application No. 16/696,101
U.S. Patent No. 10,573,293
21. A computer-implemented method for generating, from a sequence of characters in a particular natural language, a spectrogram of a verbal utterance of the sequence of characters in the particular natural language using a text-to-speech conversion system, the method comprising:
processing, using an encoder neural network of the text-to-speech conversion system, the sequence of characters to generate a respective encoded representation of each of the characters in the sequence;
receiving a sequence of decoder inputs;
for each decoder input in the sequence of decoder inputs, processing, using a decoder neural network of the text-to-speech conversion system, the decoder input and the encoded representations to generate multiple frames of the spectrogram; and
generating a waveform from the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
23. The method of claim 22, wherein the encoder CBHG neural network comprises a bank of 1-D convolutional filters, followed by a highway network, and followed by a bidirectional recurrent neural network.
24. The method of claim 23, wherein the bidirectional recurrent neural network is a gated recurrent unit neural network.
25. The method of claim 23, wherein the encoder CBHG includes a residual connection between the transformed embeddings and outputs of the bank of 1-D convolutional filters.
26. The method of claim 23, wherein the bank of 1-D convolutional filters includes a max pooling along time layer with stride one.
27. The method of claim 21, wherein a first decoder input in the sequence is a predetermined initial frame.
28. The method of claim 21, wherein the spectrogram is a compressed spectrogram.
29. The method of claim 28, wherein the compressed spectrogram is a mel-scale spectrogram.

a sequence-to-sequence recurrent neural network configured to:
receive a sequence of characters in a particular natural language, and
process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and
a subsystem configured to:
receive the sequence of characters in the particular natural language, and
provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language, wherein the sequence-to-sequence recurrent neural network comprises:
an encoder neural network configured to:
receive the sequence of characters, and
process the sequence of characters to generate a respective encoded 
an attention-based decoder recurrent neural network configured to:
receive a sequence of decoder inputs; and
for each decoder input in the sequence:
process the decoder input and the encoded representations to generate r frames of the spectrogram, wherein r is an integer greater than one, wherein each of the second and subsequent decoder inputs in the sequence is one or more of the r frames of the spectrogram that were generated by processing the preceding decoder input in the sequence.
3. The system of claim 2, wherein the encoder CBHG neural network comprises a bank of 1-D convolutional filters, followed by a highway network, and followed by a bidirectional recurrent neural network.
4. The system of claim 3, wherein the bidirectional recurrent neural network is a gated recurrent unit neural network.
5. The system of claim 3, wherein the encoder CBHG includes a residual connection between the transformed embeddings and outputs of the bank of 1-D convolutional filters.
6. The system of claim 3, wherein the bank of 1-D convolutional filters includes a max pooling along time layer with stride one.
wherein a first decoder input in the sequence is a predetermined initial frame.
8. The system of claim 1, wherein the spectrogram is a compressed spectrogram.
9. The system of claim 8, wherein the compressed spectrogram is a mel-scale spectrogram.
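
For illustration only, the decoder limitation recited in claim 1 of the reference patent (each decoder step generates r frames, with r greater than one, and a later decoder input is taken from the frames generated at the preceding step) can be sketched as the loop below. This is a minimal sketch and not the applicant's implementation: `toy_encoder` and `toy_decoder_step` are hypothetical arithmetic stand-ins for the claimed neural networks, the all-zero initial frame is an assumed choice of "predetermined initial frame", and feeding back only the last of the r frames is one assumed reading of "one or more of the r frames".

```python
from typing import List

Frame = List[float]


def toy_encoder(chars: str) -> List[List[float]]:
    """Hypothetical stand-in for the encoder network: one vector per character."""
    return [[ord(c) / 128.0] for c in chars]


def toy_decoder_step(decoder_input: Frame,
                     encoded: List[List[float]],
                     r: int) -> List[Frame]:
    """Hypothetical stand-in for one decoder step: returns r frames."""
    # A crude "context" value derived from all encoded representations.
    context = sum(vec[0] for vec in encoded) / len(encoded)
    return [[0.5 * x + context + 0.1 * i for x in decoder_input]
            for i in range(r)]


def generate_spectrogram(chars: str, num_steps: int,
                         r: int = 2, frame_dim: int = 4) -> List[Frame]:
    """Run the decoder loop, emitting r frames per decoder input."""
    if r <= 1:
        raise ValueError("r must be an integer greater than one")
    if not chars:
        raise ValueError("chars must be non-empty")
    encoded = toy_encoder(chars)
    # Assumed predetermined initial frame: all zeros.
    decoder_input: Frame = [0.0] * frame_dim
    spectrogram: List[Frame] = []
    for _ in range(num_steps):
        frames = toy_decoder_step(decoder_input, encoded, r)
        spectrogram.extend(frames)
        # Assumed feedback rule: the last of the r frames becomes the next input.
        decoder_input = frames[-1]
    return spectrogram
```

Under these assumptions, `generate_spectrogram("hi", num_steps=3, r=2)` performs three decoder steps and returns six frames, which is the sense in which each decoder input yields multiple frames of the spectrogram.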


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 21, 27, 31, 35, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Sotelo et al. (“CHAR2WAV: END-TO-END SPEECH SYNTHESIS”, Mar-10-2017, hereinafter Sotelo) in view of Pollet et al. (US Pub. 2009/0048841, hereinafter Pollet).
Regarding claim 21, Sotelo discloses a computer-implemented method for generating, from a sequence of characters in a particular natural language, a spectrogram of a verbal utterance of the sequence of characters in the particular natural language using a text-to-speech conversion system, the method comprising: 

receiving a sequence of decoder inputs; for each decoder input in the sequence of decoder inputs, processing, using a decoder neural network of the text-to-speech conversion system, the decoder input and the encoded representations to generate multiple frames of the [spectrogram] (pp. 2, section 3 and Fig. 1, An attention-based recurrent sequence generator is a RNN that receives and processes text input in order to generate vocoder feature frames; pp. 3, Fig. 2 shows samples generated by our model and their corresponding alignments to the text); and 
generating a waveform from the [spectrogram] of the verbal utterance of the sequence of characters in the particular natural language (pp. 2, section 3 and Fig. 1, mapping from a sequence of vocoder features to corresponding audio samples to generate an audio waveform).
Sotelo does not explicitly teach, however, Pollet does explicitly teach:
[spectrogram] (Figs. 8 and 10, [0100]-[0102] an example of hybrid output speech synthesis. The top pane displays the spectrogram and the bottom pane displays the output speech signal).

Regarding claim 27, Sotelo in view of Pollet discloses the method of claim 21, and Sotelo further discloses:
wherein a first decoder input in the sequence is a predetermined initial frame (Fig. 1, pp. 2, section 3.1, decoder input is a frame).
Regarding claim 31, Sotelo in view of Pollet discloses the method of claim 21, and Sotelo further discloses:
generating speech using the waveform; and providing the generated speech for playback (pp. 2, Fig. 1, generate audio waveform).  
Regarding claims 35 and 40, these claims are the system claims corresponding to method claims 21 and 31. Therefore, claims 35 and 40 are rejected using the same rationale as applied to claims 21 and 31 above.

Claims 28-30, 32-34, and 37-39 are rejected under 35 U.S.C. 103 as being unpatentable over Sotelo (“CHAR2WAV: END-TO-END SPEECH SYNTHESIS”, Mar-10-2017) in view of Pollet (US Pub. 2009/0048841) and further in view of Griffin et al. (“Signal Estimation from Modified Short-Time Fourier Transform”, hereinafter Griffin).
Regarding claim 28, Sotelo in view of Pollet discloses the method of claim 21.
Sotelo in view of Pollet does not explicitly teach, however, Griffin does explicitly teach:
wherein the spectrogram is a compressed spectrogram ([abstract] synthesizing waveform from the predicted spectrogram by estimating a signal from its modified STFT magnitude and the discrete Fourier transform (DFT) computation; pp. 238 -239, Section IV. Time-Scale Modification of Speech and Fig. 3, examples of compressed spectrogram to generate speech output).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech synthesis system and method taught by Sotelo in view of Pollet to use a compressed spectrogram as taught by Griffin, because the resulting signal estimate is clean, high-quality speech, and the estimates produced by LSEE-MSTFTM and OA-MSTFTM were indistinguishable in listening tests (Griffin, pp. 239, left column, 2nd paragraph).
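
For illustration only, the kind of iterative estimation of a signal from a short-time Fourier transform magnitude that Griffin is cited for can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of Griffin's algorithm as published: the periodic Hann window, the hop size, the zero-phase starting point, the naive DFT, and every function name below are choices made here for the example. The signal length is assumed to be covered exactly by the analysis frames.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window (assumed; any analysis window could be substituted).
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    # Inverse DFT, keeping only the real part of each output sample.
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft(x, win, hop):
    n = len(win)
    return [dft([x[s + t] * win[t] for t in range(n)])
            for s in range(0, len(x) - n + 1, hop)]


def least_squares_istft(spec, win, hop, length):
    # Windowed overlap-add, normalized by the summed squared window.
    n = len(win)
    num = [0.0] * length
    den = [0.0] * length
    for m, frame_spec in enumerate(spec):
        y = idft_real(frame_spec)
        for t in range(n):
            num[m * hop + t] += win[t] * y[t]
            den[m * hop + t] += win[t] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def magnitude_error(x, target_mag, win, hop):
    # Squared distance between the STFT magnitude of x and the target magnitude.
    return sum((abs(c) - g) ** 2
               for frame, mags in zip(stft(x, win, hop), target_mag)
               for c, g in zip(frame, mags))


def estimate_signal(target_mag, win, hop, length, iterations):
    # Start from the target magnitude with zero phase (an assumed initialization).
    x = least_squares_istft([[complex(g) for g in mags] for mags in target_mag],
                            win, hop, length)
    for _ in range(iterations):
        current = stft(x, win, hop)
        # Keep the current phase, but force the magnitude to the target.
        constrained = [[g * (c / abs(c)) if abs(c) > 1e-12 else complex(g)
                        for c, g in zip(frame, mags)]
                       for frame, mags in zip(current, target_mag)]
        x = least_squares_istft(constrained, win, hop, length)
    return x
```

In this sketch the magnitude error should not increase from one iteration to the next, so running more iterations trades computation for a closer match to the target magnitude.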
Regarding claim 29, Sotelo in view of Pollet and further in view of Griffin discloses the method of claim 28.
Sotelo does not explicitly teach, however, Pollet does explicitly teach:
wherein the compressed spectrogram is a mel-scale spectrogram ([0044] Spectrum information represented in a specific form such as MEL-LSP's, MFCC's, MEL-CEPs, harmonic components, etc.).
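
For illustration only, a mel-scale frequency axis can be sketched as below. The conversion formula used here, 2595 * log10(1 + f/700), is one widely used convention and is an assumption of this example; neither the claims nor the cited references are stated to rely on it, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # One common mel-scale convention (assumed, not taken from the cited art).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel under the same convention.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(num_bands, f_min_hz, f_max_hz):
    # Center frequencies (in Hz) that are equally spaced on the mel axis.
    low, high = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (high - low) / (num_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(num_bands)]
```

Because the bands are equally spaced in mel rather than in hertz, their spacing in hertz widens toward higher frequencies, which is why a mel-scale spectrogram represents the spectrum with fewer bands than a linear-frequency spectrogram.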

Regarding claim 30, Sotelo in view of Pollet and further in view of Griffin discloses the method of claim 28, and Sotelo further discloses:
processing the compressed spectrogram to generate a waveform synthesizer input; and processing, using a waveform synthesizer of the text-to-speech conversion system, the waveform synthesizer input to generate the waveform of the verbal utterance of the input sequence of characters in the particular natural language (pp. 2, section 3 and Fig. 1, an attention-based recurrent sequence generator is a RNN that receives and processes text input in order to generate an audio waveform; mapping from a sequence of vocoder features to corresponding audio samples to generate an audio waveform; pp. 3, Fig. 2 shows samples generated by our model and their corresponding alignments to the text).
Regarding claim 32, Sotelo in view of Pollet and further in view of Griffin discloses the method of claim 30, and Sotelo further discloses:
wherein the waveform synthesizer is a trainable spectrogram to waveform inverter (pp. 3, sections 4 and 5, training detail).  
Regarding claim 33, Sotelo in view of Pollet and further in view of Griffin discloses the method of claim 30, and Sotelo further discloses:
 wherein the waveform synthesizer is a vocoder (pp. 2, section 3 and Fig. 1, An attention-based recurrent sequence generator is a RNN that receives and processes text input in order to generate vocoder feature frames).  
Regarding claim 34, Sotelo in view of Pollet and further in view of Griffin discloses the method of claim 30.
Sotelo does not explicitly teach, however, Pollet does explicitly teach:

Regarding claims 37-39, these claims are the system claims corresponding to method claims 28-30. Therefore, claims 37-39 are rejected using the same rationale as applied to claims 28-30 above.

Allowable Subject Matter
Claims 22-26 and 36 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if rewritten or amended to overcome the nonstatutory double patenting rejection(s) set forth in this Office action.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933.  The examiner can normally be reached on 9 AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659