Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action.  37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
Information Disclosure Statement
The information disclosure statement filed 08 December 2020 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because the documents provided ("ISR for PCT/CN2019?12327" and "Written opinions of ISA for PCT/CN/2019/127327") have not been translated into English. It has been placed in the application file, but the information referred to therein has not been considered as to the merits. Applicant is advised that the date of any re-submission of any item of information contained in this information disclosure statement or the submission of any missing element(s) will be the date of submission for purposes of determining compliance with the requirements based on the time of filing the statement, including all certification requirements for statements under 37 CFR 1.97(e). See MPEP § 609.05(a).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-5, 6-7, 9-10, 11-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Biadsy (Document ID: WO-2021118604-A1) in view of Sheng (Non-Patent Literature: High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram).
Although Sheng lists some of the same inventors as the instant application, at least one author listed in Sheng, Evgeniy N. Pavlovskiy, is not named as an inventor in the instant application. Thus, Sheng is understood to have a different set of inventors than the instant application.
Regarding claims 1, 6 and 11, Biadsy teaches a computer-implemented speech synthesis method (Fig 1, Paragraph 0031, and Abstract), comprising steps of: 
obtaining a to-be-synthesized text (Fig 1 and Page 3, lines 26-31), and extracting one or more to-be-processed Mel spectrum features of the to-be-synthesized text through a preset speech feature extraction algorithm (Paragraph 0016 shows reference audio features of the text; also, Page 3, lines 26-31, mentions that the audio features include a mel-frequency spectrogram);
converting the target Mel spectrum features into a target speech corresponding to the to-be-synthesized text (Fig 1, reference character 106 provides the speech output; in Figs 1 and 2, reference character 118 is the predicted mel spectrogram, as described at Page 11, lines 25-26; Fig 3 then shows the predicted mel spectrogram 118 being used to obtain the audio output sequence; see also Page 18, lines 11-15).
	However, Biadsy fails to specifically teach:
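inputting the to-be-processed Mel spectrum features into a preset ResUnet network model to obtain one or more first intermediate features; performing an average pooling and a first down sampling on the to-be-processed Mel spectrum features to obtain one or more second intermediate features; and taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution and a first up sampling so as to obtain one or more target Mel spectrum features corresponding to the to-be-processed Mel spectrum features.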
Sheng does teach inputting the to-be-processed Mel spectrum features into a preset ResUnet network model (Page 3, Column 2, lines 27-30, and Figure 2 describe the proposed model, in which the mel spectrogram is input into a modified local enhancer comprising a ResUnet) to obtain one or more first intermediate features (Figs 3(a)-(b) show the ResUnet architecture, which takes the mel spectrogram as input and produces an output that can be considered the first intermediate features);
performing an average pooling and a first down sampling on the to-be-processed Mel spectrum features to obtain one or more second intermediate features (Fig 3(c) shows the average pooling and down sampling being performed on the Mel spectrum);
taking the second intermediate features and the first intermediate features output by the ResUnet network model as an input to perform a deconvolution (Fig 3(c) shows the output of the ResUnet architecture shown in Figs 3(a)-(b) and the downsampled Mel spectrum being processed by a transposed convolution, a term commonly used in the art for deconvolution) and a first up sampling so as to obtain one or more target Mel spectrum features corresponding to the to-be-processed Mel spectrum features (Fig 3(c) shows upsampling being performed to obtain the predicted Mel spectrum).
Sheng is considered analogous art to the claimed invention because it is directed to the same field of speech synthesis and text-to-speech systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the text-to-speech system using Mel spectrum features, as taught by Biadsy, with a ResUnet architecture having downsampling and upsampling, as taught by Sheng (Fig 3). The use of a ResUnet with downsampling and upsampling can help improve speech quality (Page 1, lines 39-41).
Claims 1, 6, and 11 are of similar scope. Claim 1 is a method claim, while claims 6 and 11 are apparatus and computer-readable storage medium claims, respectively. Claims 6 and 11 each correspond to claim 1, with the function of each claimed element corresponding to a step of the claimed method. Furthermore, Paragraphs 0067-0068 of Biadsy mention the memory, processor, computer program, and computer-readable medium recited in claims 6 and 11.
Regarding claims 2, 7, and 12, Biadsy in view of Sheng teaches the method of claim 1, the apparatus of claim 6, and the storage medium of claim 11, wherein the step of inputting the to-be-processed Mel spectrum features into the preset ResUnet network model to obtain the first intermediate features comprises (Page 3, Column 2, lines 27-30, and Figure 2 describe the proposed model, in which the mel spectrogram is input into a modified local enhancer comprising a ResUnet; see also Fig 3):
performing a second down sampling (Fig 3(b) shows the UNet-Up ResBlock performing downsampling, for example, taking an input of size 2048 and outputting size 1024), a residual connection processing (Figs 3(a)-(b) embody residual connections), and a second up sampling on the to-be-processed Mel spectrum features through the ResUnet network model (Fig 3(b) shows upsampling using the UNet ConvBlock and Residual Unit components of the ResUnet architecture, for example, taking an input of size 3 and outputting size 64) to obtain the first intermediate features (Fig 3(b), the ResUNet output of size 64 at the last UNet-Up ResBlock).
Sheng is considered analogous art to the claimed invention because it is directed to the same field of speech synthesis and text-to-speech systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the text-to-speech system using Mel spectrum features, as taught by Biadsy, with a ResUnet architecture having downsampling and upsampling, as taught by Sheng (Fig 3). The use of a ResUnet with downsampling and upsampling can help improve speech quality (Page 1, lines 39-41).
Regarding claims 4, 9, and 14, Biadsy in view of Sheng teaches the method of claim 1, the apparatus of claim 6, and the storage medium of claim 11, wherein the step of performing the average pooling and the first down sampling on the to-be-processed Mel spectrum features to obtain the second intermediate features (Fig 3(c) shows the average pooling and down sampling being performed on the Mel spectrum) comprises:
performing at least one average pooling on the to-be-processed Mel spectrum features (Fig 3(c)); and 
performing the first down sampling on a processing result of each average pooling after the average pooling to obtain the second intermediate features (Fig 3(c) shows downsampling being performed after the average pooling).
Sheng is considered analogous art to the claimed invention because it is directed to the same field of speech synthesis and text-to-speech systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the text-to-speech system using Mel spectrum features, as taught by Biadsy, with a ResUnet architecture having downsampling and upsampling, as taught by Sheng (Fig 3). The use of a ResUnet with downsampling and upsampling can help improve speech quality (Page 1, lines 39-41).
Regarding claims 5, 10, and 15, Biadsy in view of Sheng teaches the method of claim 4, the apparatus of claim 9, and the storage medium of claim 14, wherein the step of taking the second intermediate features and the first intermediate features output by the ResUnet network model as the input to perform the deconvolution (Fig 3(c) shows the output of the ResUnet architecture shown in Figs 3(a)-(b) and the downsampled Mel spectrum being processed by a transposed convolution, a term commonly used in the art for deconvolution) and the first up sampling (Fig 3(c) shows upsampling being performed to obtain the predicted Mel spectrum) comprises:
performing the deconvolution on the first intermediate features and the second intermediate features (Fig 3(c));
performing at least one first up sampling on a processing result of the deconvolution (Fig 3(c) shows upsampling being performed after the transposed convolution, a term commonly used in the art for deconvolution);
and adding results of the first up sampling and the first down sampling, and performing the deconvolution on the results to obtain the Mel spectrum features (Fig 3(c) shows that the results from the downsampling block are added into the upsampling block and a transposed convolution is performed to obtain an output; depending on the input Mel spectrum size, the process can be reduced to a single iteration of the first upsampling and downsampling).
Sheng is considered analogous art to the claimed invention because it is directed to the same field of speech synthesis and text-to-speech systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the text-to-speech system using Mel spectrum features, as taught by Biadsy, with a ResUnet architecture having downsampling and upsampling, as taught by Sheng (Fig 3). The use of a ResUnet with downsampling and upsampling can help improve speech quality (Page 1, lines 39-41).
Allowable Subject Matter
Claims 3, 8, and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Kuleshov (Document ID: Audio Super Resolution using Neural Networks) teaches a residual network with down sampling and up sampling.
Kaneko (Document ID: CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks) teaches down sampling, up sampling, and residual connections, but fails to teach a ResUNet-style model for feature processing.
Ernst (Document ID: Speech Dereverberation Using Fully Convolutional Networks) performs down sampling and up sampling on a speech spectrogram image using a U-Net, but does not mention residual connections.
Jin (Document ID: WO-2019139430-A1) teaches text-to-speech synthesis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEEL P. KARELIA whose telephone number is (571)272-4377. The examiner can normally be reached Monday-Friday 6:30 am - 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit 

/NEEL PIYUSHKUMAR KARELIA/Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659