Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 8/27/2019.  These drawings are accepted.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 4 recites the limitation "the first electronic device" of claim 1. There is insufficient antecedent basis for this limitation in the claim. There are two instances of “a first electronic device”, one in the preamble of claim 1 and the other in the preamble of claim 4. It is unclear which instance the limitation “the first electronic device” refers to.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 3-7, 9-12, 17, 18, 21-22 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bengio et al (US Publication No.: 20190311708).
	Claim 1, Bengio et al discloses
	obtain text (Fig. 1, label 104);
generate a plurality of segments of a spectrogram using a first neural network (Fig. 1, label 106 shows the first neural network. Paragraphs 18-21 disclose “The attention-based decoder recurrent neural network 118 (herein referred to as “the decoder neural network 118”) is configured to receive a sequence of decoder inputs. … the decoder neural network 118 is configured to process the decoder input and the encoded representations generated by the encoder CBHG neural network 116 to generate multiple frames of the spectrogram of the sequence of characters.”), each spectrogram segment of the plurality of spectrogram segments representing a portion of the obtained text (Paragraph 21 discloses “For each decoder input in the sequence, the decoder neural network 118 is configured to process the decoder input and the encoded representations generated by the encoder CBHG neural network 116 to generate multiple frames of the spectrogram of the sequence of characters.”);

provide the plurality of speech segments as a speech output (Fig. 1, label 102,120, paragraph 29 discloses generating speech from the waveform and playing back speech.).
	Claim 3, Bengio et al discloses the text is a textual representation of a desired speech output (Fig. 1, label 104 and Fig. 3, label 302 are a textual representation of the desired speech output, which is label 120 of Fig. 1 and label 308 of Fig. 3).
	Claim 4, Bengio et al discloses determine, using the first neural network, an order for the plurality of spectrogram segments. (Paragraph 21 discloses generating r frames of the spectrogram, wherein r is greater than 1. Paragraph 23 discloses “By generating r frames at each time step, the decoder neural network 118 divides the total number of decoder steps by r, thus reducing model size, training time and inference time. Additionally, this technique substantially increases convergence speed, i.e., because it results in a much faster (and more stable) alignment between frames and encoded 
	Claim 5, Bengio et al discloses the order of the plurality of spectrogram segments is determined by processing the text using the first neural network prior to generating the plurality of spectrogram segments. (Paragraphs 18 and 19 disclose that processing of the text, or sequence of characters, is performed by labels 102 and 114. The processed text or sequence of characters is used to generate the plurality of frames of the spectrogram outputted by label 118, which indicates the order of the multiple frames of the spectrogram is determined prior to generation of the multiple frames.)
	Claim 6, Bengio et al discloses each spectrogram segment of the plurality of spectrogram segments is generated in the determined order. (Paragraph 23 discloses “By generating r frames at each time step, the decoder neural network 118 divides the total number of decoder steps by r, thus reducing model size, training time and inference time. Additionally, this technique substantially increases convergence speed, i.e., because it results in a much faster (and more stable) alignment between frames and encoded representations as learned by the attention mechanism. This is because neighboring speech frames are correlated and each character usually corresponds to multiple frames. Emitting multiple frames at a time step allows the decoder neural network 118 to leverage this quality to quickly learn how to, i.e., be trained to, efficiently attend to the encoded representations during training.”)
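For illustration only, the arithmetic in the quoted passage of paragraph 23 (emitting r frames per decoder time step divides the number of decoder steps by r, with frames emitted in temporal order) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Bengio et al: the function names, the example frame counts, and the rounding up of a final partial group are all hypothetical.

```python
import math
from typing import List


def decoder_steps(total_frames: int, r: int) -> int:
    """Decoder time steps needed when r frames are emitted per step.

    Rounding up for a final partial group is an assumption of this sketch.
    """
    if total_frames < 0 or r < 1:
        raise ValueError("total_frames must be >= 0 and r must be >= 1")
    return math.ceil(total_frames / r)


def frames_per_step(total_frames: int, r: int) -> List[List[int]]:
    """Group frame indices into per-step chunks, preserving temporal order."""
    return [
        list(range(start, min(start + r, total_frames)))
        for start in range(0, total_frames, r)
    ]


if __name__ == "__main__":
    # Hypothetical utterance of 800 spectrogram frames.
    print(decoder_steps(800, 1))
    print(decoder_steps(800, 5))
    print(frames_per_step(7, 3))
```

With r = 1 the hypothetical 800-frame utterance needs 800 decoder steps; with r = 5 it needs 160, and the frame groups come out in their original order.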

Claim 10, Bengio et al discloses the plurality of spectrogram segments and the plurality of speech segments are generated at least partially in parallel. (Paragraph 23 discloses “… neighboring speech frames are correlated and each character usually corresponds to multiple frames. Emitting multiple frames at a time step allows the decoder neural network 118 to leverage this quality to quickly learn how to, i.e., be trained to, efficiently attend to the encoded representations during training.” Such 
	Claim 11, Bengio et al discloses 
	generating the plurality of spectrogram segments using the first neural network (Fig. 1, label 106) further comprises generating a first spectrogram segment of the plurality of spectrogram segments (Paragraph 21 discloses generation of r frames of the spectrogram performed by Fig. 1, label 106.);
	generating, based on the plurality of spectrogram segments, the plurality of speech segments using the second neural network (Fig. 1, label 108,110) further comprises generating, based on the first spectrogram segment, a first speech segment of the plurality of speech segments (Paragraphs 21, 24, 25, 28, and 56 disclose that speech segments, or sounds of the waveform or verbal utterance of the characters, are outputted at label 120. Paragraph 23 discloses “… neighboring speech frames are correlated and each character usually corresponds to multiple frames”, wherein the multiple frames are r frames of the spectrogram. Such indicates generation of the speech frames or sounds of the waveform or verbal utterance of the characters is performed based on the frames of the spectrogram, which include the first spectrogram segment.); and
	wherein generation of the first spectrogram segment and generation of the first speech segment at least partially overlap (Paragraph 23 discloses “… neighboring speech frames are correlated and each character usually corresponds to multiple frames”, wherein the multiple frames are r frames of the spectrogram. Such indicates partial overlap in generation of the first spectrogram segment and the first speech segment.).
	Claim 12, Bengio et al discloses generating, based on the plurality of spectrogram segments, a plurality of speech segments using the second neural network (Fig. 1, label 108,110 generates the 
Claim 17, Bengio et al discloses providing the plurality of speech segments as a speech output further comprises providing one or more speech segments of the plurality of speech segments concurrently as one or more speech segments are generated (Fig. 1, label 120 provides the speech generated by Fig. 1, label 150,120. Paragraph 29 discloses label 102 generates speech from the waveform, wherein the speech is provided for playback via a user device. Such indicates concurrent output or playback as the speech is generated. The disclosure of paragraphs 21, 23, 24, 25, 28, and 56 indicates a plurality of speech frames or speech segments are generated, as explained in claim 1.).
Claim 18, Bengio et al discloses the plurality of speech segments are generated in a predetermined order based on the text. (Paragraph 23 discloses “By generating r frames at each time step, the decoder neural network 118 divides the total number of decoder steps by r, thus reducing model size, training time and inference time. Additionally, this technique substantially increases convergence speed, i.e., because it results in a much faster (and more stable) alignment between frames and encoded representations as learned by the attention mechanism. This is because neighboring speech frames are correlated and each character usually corresponds to multiple frames.” Paragraph 28 discloses label 110 generates a waveform of the verbal utterance of the input sequence of characters in the particular natural language. The highlighted portion and paragraph 28 indicate the speech segments or speech frames are generated in a predetermined order based on the characters or text.)
Claim 21, Bengio et al discloses
one or more processors (paragraph 58,59);
a memory (paragraph 64); and
one or more programs (paragraph 58), wherein the one or more programs are stored in the memory (paragraph 64 discloses a computer readable medium storing computer program instructions. Paragraph 58 discloses the embodiments can be implemented with one or more programs.) and configured to be executed by the one or more processors (Paragraph 58 discloses execution of the programs on a data processing apparatus. Paragraph 59 discloses the data processing apparatus as one or more processors.), the one or more programs including instructions for: 
	obtaining text (Fig. 1, label 104);
generating a plurality of segments of a spectrogram using a first neural network (Fig. 1, label 106 shows the first neural network. Paragraph 18-21 discloses “The attention-based decoder recurrent neural network 118 (herein referred to as “the decoder neural network 118”) is configured to receive a sequence of decoder inputs. … 
generating, based on the plurality of spectrogram segments, a plurality of speech segments using a second neural network (Fig. 1, label 108,110 generates the audio waveform. Paragraph 21,23,24,25,28 discloses generating a waveform of the verbal utterance of the input sequence of characters in the particular natural language based on the compressed spectrogram. The compressed spectrogram is generated based on the r frames of the spectrogram. Such indicates the waveform includes a plurality of speech segments since the prior art discloses “a waveform of verbal utterance of the input sequence of characters”. Paragraph 56 further discloses “The system then generates speech using the waveform i.e., generates the sounds that are represented by the waveform (step 408).” Such indicates label 120 includes a plurality of speech segments or sounds that are represented by the waveform or verbal utterance of the input sequence of characters in the particular natural language.); and
providing the plurality of speech segments as a speech output (Fig. 1, label 102,120, paragraph 29 discloses generating speech from the waveform and playing back speech.).
Claim 22, Bengio et al discloses

	obtain text (Fig. 1, label 104);
generate a plurality of segments of a spectrogram using a first neural network (Fig. 1, label 106 shows the first neural network. Paragraphs 18-21 disclose “The attention-based decoder recurrent neural network 118 (herein referred to as “the decoder neural network 118”) is configured to receive a sequence of decoder inputs. … the decoder neural network 118 is configured to process the decoder input and the encoded representations generated by the encoder CBHG neural network 116 to generate multiple frames of the spectrogram of the sequence of characters.”), each spectrogram segment of the plurality of spectrogram segments representing a portion of the obtained text (Paragraph 21 discloses “For each decoder input in the sequence, the decoder neural network 118 is configured to process the decoder input and the encoded representations generated by the encoder CBHG neural network 116 to generate multiple frames of the spectrogram of the sequence of characters.”);
generate, based on the plurality of spectrogram segments, a plurality of speech segments using a second neural network (Fig. 1, label 108,110 generates the audio waveform. Paragraph 21,23,24,25,28 discloses generating a waveform of the verbal utterance of the input sequence of characters in the particular natural language based on the compressed spectrogram. The compressed spectrogram is generated based on the r frames of the spectrogram. Such indicates the waveform includes a plurality of speech segments since the prior art discloses “a waveform of verbal utterance of the input sequence of characters”. Paragraph 56 further discloses “The system then generates speech using the waveform i.e., generates the sounds that are represented by the 
provide the plurality of speech segments as a speech output (Fig. 1, label 102,120, paragraph 29 discloses generating speech from the waveform and playing back speech.).
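For illustration only, the four steps mapped above (obtain text, generate spectrogram segments with a first network, generate speech segments from them with a second network, provide the speech output) can be sketched as a two-stage pipeline. This is a minimal sketch under stated assumptions: the two stage functions are hypothetical stand-ins that produce toy numbers, not neural networks, and nothing here reproduces the implementation of Bengio et al or of the application.

```python
from typing import Iterable, Iterator, List


def spectrogram_segments(text: str) -> Iterator[List[float]]:
    """Stand-in for the first network: one toy 'segment' per word of text."""
    for word in text.split():
        yield [float(ord(ch)) for ch in word]


def speech_segments(segments: Iterable[List[float]]) -> Iterator[List[float]]:
    """Stand-in for the second network: maps each segment to toy samples."""
    for segment in segments:
        yield [value / 255.0 for value in segment]


def provide_speech_output(text: str) -> List[float]:
    """Chain both stages and concatenate the speech segments in order."""
    output: List[float] = []
    for segment in speech_segments(spectrogram_segments(text)):
        output.extend(segment)
    return output


if __name__ == "__main__":
    samples = provide_speech_output("hello world")
    print(len(samples))
```

Because both stages are generators, each toy speech segment is produced as soon as its spectrogram segment is available; that streaming behavior is a property of this sketch only.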

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bengio et al (US Publication No.: 20190311708) in view of Yu et al (US Publication No.: 20200342849).
Claim 2, Bengio et al discloses the embodiments described in the specification can be implemented in a computing system (paragraph 66). Paragraph 67 discloses the computing system can include clients and servers, wherein clients and servers are remote from each other and typically interact through a communication network. 
Yu et al discloses a client and server text to speech system (Fig. 2, label 210,222 shows the clients and servers. Fig. 4 shows the text to speech system.), wherein the text is received from an external source (Fig. 4, label 410, Fig. 2, label 210,222. Paragraph 26 discloses the user device may receive information from and/or transmit information to the platform 222, which performs text to speech as per paragraph 27.). It would be obvious to one skilled in the art to modify Bengio et al’s computing system by receiving text from an external source as disclosed by Yu et al so as to provide service between clients and servers, hence improving a user’s experience by providing speech for desired text.
 
Claims 19,20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bengio et al (US Publication No.: 20190311708) in view of Wang et al (Publication Title: “Training Deep Neural Networks with 8-bit floating point numbers”).
Claim 19, Bengio et al discloses the first neural network generates the plurality of spectrogram segments (Fig. 1, label 106 shows the first neural network. Paragraphs 18-21 disclose “The attention-based decoder recurrent neural network 118 (herein referred to as “the decoder neural network 118”) is configured to receive a sequence of decoder inputs. … the decoder neural network 118 is configured to process the decoder input and the encoded representations generated by the encoder CBHG neural network 116 to generate multiple frames of the spectrogram of the sequence of characters.”), but fails to disclose the first neural network is trained using 16-bit calculations.

Claim 20, Bengio et al discloses the second neural network generates the plurality of speech segments (Fig. 1, label 108,110 generates the audio waveform. Paragraph 21,24,25,28 discloses generating a waveform of the verbal utterance of the input sequence of characters in the particular natural language based on the compressed spectrogram. The compressed spectrogram is generated based on the r frames of the spectrogram. Such indicates the waveform includes a plurality of speech segments since the prior art discloses “a waveform of verbal utterance of the input sequence of characters”. Paragraph 56 further discloses “The system then generates speech using the waveform i.e., generates the sounds that are represented by the waveform (step 408).”), but fails to disclose the second neural network is trained using 16-bit calculations.
Wang et al discloses training a deep neural network using 16-bit floating point training hardware (Page 1, Section I discloses “state of the art training platforms have started to offer 16-bit floating point training hardware [8,5] with >= 4x performance over equivalent 32-bit systems.”). It would be obvious to one skilled in the art before the effective filing date of the application to modify the second neural network as disclosed .

Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bengio et al (US Publication No.: 20190311708) in view of TechTarget Contributor (Publication Title: AI accelerator (https://searchenterpriseai.techtarget.com/definition/AI-accelerator)). 
Claim 8, Bengio et al discloses “computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both …” can be used to implement the logic flows and processes of the disclosure (paragraphs 62,63), but fails to disclose the second processor is a neural network accelerator. 
TechTarget Contributor discloses an AI accelerator as a microchip to enable faster processing of AI tasks (page 1). It would be obvious to one skilled in the art before the effective filing date of the application to implement the processes and logic flows disclosed in Bengio’s specification with a special processor such as an AI accelerator or neural network accelerator as disclosed by TechTarget Contributor so as to enable faster processing of AI tasks.

Claim 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bengio et al (US Publication No.: 20190311708) in view of Ping et al (Publication Title: “Deep Voice 3: Scaling Text to Speech with Convolutional Sequence Learning”).

a second data associated with the second neural network (Fig. 1, label 106. Fig. 2, paragraph 30,26 discloses label 108 includes a CBHG neural network shown in Fig. 2. Fig. 2, label 200 includes label 208 paragraph 32 discloses a bank of 1-D convolutional filters 204 and training of the filters is performed using batch normalization method. Such indicates a second data or data is generated for the second neural network with a structure shown in Fig. 2 and per paragraph 30,26.) is stored in a cache of the second processor of the at least two processors (paragraph 62 discloses processes and logic 
Bengio et al fails to disclose the first data of the first neural network is a first weight of the first neural network and the second data of the second neural network is a second weight of the second neural network.
Ping et al discloses a text to speech system (Figs. 1 and 6), in which the encoder and decoder of Figs. 1 and 6 are the first neural network (Page 5, section 3.4 Encoder discloses “The key vectors hk are used by each attention block to compute attention weights ….”) and the converter with Griffin-Lim, WORLD, or WaveNet synthesis of Figs. 1 and 6 is the second neural network. Appendix A, detailed model architecture of Deep Voice 3, discloses “Weight normalization is applied to all convolution filters and fully connected layer weight matrices in the model.” The convolution blocks of the converter in Fig. 6 are the convolution filters. It would be obvious to one skilled in the art before the effective filing date of the application to modify Bengio et al’s text to speech system by incorporating a first weight of the first neural network and a second weight of the second neural network as disclosed by Ping et al so to improve the .

Allowable Subject Matter
Claims 13-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The publications titled “Myanmar text to speech system based on tacotron-2” and “Tacotron A fully end to end text to speech synthesis” disclose text to speech systems pertinent to the applicant’s disclosure.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044.  The examiner can normally be reached on 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

/LINDA WONG/
Primary Examiner, Art Unit 2656