DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/20/2021 has been entered.
Response to Amendment
In response to the Office action of 4/28/2021, the applicant filed a request for continued examination on 7/20/2021, amending claims 1-11 and 13-14 and traversing the prior art rejections. Applicant’s arguments have been fully considered; however, because the latest amendments did not overcome the prior art, the examiner identified allowable subject matter in the alternative and proposed it to the applicant’s representative. Claims 1-14, as amended by the examiner’s amendment below, are therefore allowable over the prior art for the reasons for allowance provided below.
EXAMINER’S AMENDMENT
UTILIZING DIPHONES OR TRIPHONES AND MACHINE LEARNING” so as to be more descriptive of the invention.
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with the attorney on file Mr. Lyle Kimms on 11/2/2021.
Amend claims 1, 9, 13-14, and the Abstract:

As Per Claim 1:

1. (Currently Amended)  A voice synthesis method comprising:
	sequentially acquiring voice units comprising at least one of a diphone or a triphone in accordance with synthesis information for synthesizing voices, each voice unit specifying a frequency spectrum for each of unit temporal periods;
	generating a statistical spectral envelope of each unit temporal period using a statistical model built by machine learning in advance, in accordance with the synthesis information, the statistical model being trained to estimate a spectral envelope;
	modifying a frequency spectral envelope, including a frequency spectrum thereof, of each unit temporal period of each of the sequentially acquired voice units in 
	concatenating the sequentially acquired voice units before the modifying or the modified acquired voice units after the modifying.

As Per Claim 9:

9. (Currently Amended)  A voice synthesis apparatus comprising:
	a memory storing instructions; and
	one or more processors that implement the instructions to:
sequentially acquire voice units comprising at least one of a diphone or a triphone in accordance with synthesis information for synthesizing voices, each voice unit specifying a frequency spectrum for each of unit temporal periods;
generate a statistical spectral envelope of each unit temporal period using a statistical model that is built by machine learning in advance, in accordance with the synthesis information, the statistical model being trained to estimate a spectral envelope;
modify a frequency spectral envelope, including a frequency spectrum thereof, of each unit temporal period of each of the sequentially acquired voice units in accordance with the generated statistical spectral envelope of the respective unit temporal period to synthesize a voice signal having modified frequency spectra; and
concatenate the sequentially acquired voice units before the modifying or the modified acquired voice units after the modifying.

As Per Claim 13:

13. (Currently Amended)  The voice synthesis method according to claim 1, wherein the[[ the]] estimated spectral envelope is of a voice feature corresponding to one of a voice uttered more forcefully, a voice uttered more gently, a voice uttered more vigorously, or a voice uttered less clearly than another voice feature of the voice units.

As Per Claim 14:

14. (Currently Amended)  A non-transitory computer-readable storage medium storing a program executable by a computer to execute a voice synthesis method comprising:
	sequentially acquiring voice units comprising at least one of a diphone or a triphone in accordance with synthesis information for synthesizing voices, each voice unit specifying a frequency spectrum for each of unit temporal periods;
	generating a statistical spectral envelope of each unit temporal period using a statistical model that is built by machine learning in advance, in accordance with the synthesis information, the statistical model being trained to estimate a spectral envelope;
	modifying a frequency spectral envelope, including a frequency spectrum thereof, of each unit temporal period of each of the sequentially acquired voice units in accordance with the generated statistical spectral envelope of the respective unit temporal period to synthesize a voice signal having modified frequency spectra; and
	concatenating the sequentially acquired voice units before the modifying or the modified acquired voice units after the modifying.

As Per Abstract:

comprising at least one of diphone or a triphone in accordance with synthesis information for synthesizing voices; generating statistical spectral envelopes using a statistical model built by machine learning in accordance with the synthesis information for synthesizing the voices; and concatenating the sequentially acquired voice units and modifying a frequency spectral envelope of each voice unit in accordance with the generated statistical spectral envelope, thereby synthesizing a voice signal based on the concatenated voice units having the modified frequency spectra.

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance: Independent claims 1, 9 and 14 are directed to a speech synthesizer that begins by “sequentially” “acquiring voice units” comprising “diphones” and/or “triphones”, along with “synthesis information” (“information” such as “pitch” “and” “one or more phonemes for” “each” “musical tone” (specification ¶ 0023)). It then “generates a statistical spectral envelope” by “machine learning”, which represents a “temporal change of a spectral envelope” (spec. ¶ 0024 lines 3-4), where a “spectral envelope” “express[es]” “an outline of the corresponding frequency spectrum” (Fig. 2 and spec. ¶ 0021). This is followed by “modifying a frequency spectral envelope of each of the” “voice units” (“diphones” and/or “triphones”), where a “modifier” “adjust[s] the pitch of [each] voice unit” (spec. ¶ 0029 lines 6+), and/or “the [modified] frequency spectra are obtained” “such that the envelope” “of [unmodified] spectra” “approach the statistical spectral envelope” (spec. ¶ 0043) and/or by “interpolation between” “original before
Prior art of record Kemmochi et al. (US 2006/0173676) does teach in ¶ 0022 “successively” (sequentially) “obtaining” (acquiring) “phonetic entity data specifying a phonetic entity; obtaining a spectral envelope” (synthesis information) “of a voice segment corresponding to the phonetic entity specified by the phonetic entity data out of a plurality of voice segments” (voice units) “corresponding to different phonetic entities”; ¶ 0046 lines 1+: “data generation portion” “provides means for generating” (generating) “conversion spectrum data Dt” “representing conversion spectrum SPt”, where according to ¶ 0049 lines 3-4 “SPt” is associated with “spectral envelope EVt” (the spectral envelope generated), which is also represented by “Dev indicat[ing] a spectral envelope” “of a frequency spectrum of voice segment” “[of] source voice”.  As unit 22 shown in Fig. 1 indicates, this “spectral envelope” “Dev” is modified into “Dnew” (a modified frequency spectral envelope), and according to ¶ 0043 lines 3+: “Dnew indicat[es] output voice’s frequency spectrum” “Spnew” “based on frequency spectrum” “SPt”. In sum, it shows how a voice unit’s associated spectral envelope is modified by altering the “frequency spectrum” associated with said voice unit and
Kemmochi et al., however, do not teach using machine learning to achieve these goals.
Agiomyrgiannakis et al. (US 2016/0140951) do teach that an “SPSS system” (statistical model of speech synthesis) “TTS” “may use” “machine learning” (¶ 0028). Although this reference suggests utilizing “triphones” (¶ 0056, 94, 99, 104), nowhere does it hint at using these units in its “SPSS” model.
Bandino et al. (US 2007/0083367) in ¶ 0019 do teach: “voice, and speech units such as diphones” “are sent to a text to speech (TTS)”. However, they are also silent on utilizing any machine learning or neural network methods in their “TTS”.
Bellegarda (US Patent 7,643,990) does teach in Col. 1 lines 47+: “To make synthetic speech more natural” it uses “polyphone synthesis” which “includes several examples of each diphone”. This reference is also silent on utilizing any machine learning or neural network methods in its “synthesis”.
Further search did not produce any reference teaching this combination of features, and these claims are therefore allowable. Dependent claims 2-8 and 10-13 further limit the scope of their allowed parent claims and are thus allowable under similar rationale.
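For illustration only, the modifying and concatenating steps recited in the independent claims can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `interpolate_envelope`, `modify_frame` and `synthesize`, the blend weight `alpha`, and the list-of-magnitudes data layout are all hypothetical, and the statistical spectral envelopes are taken as given inputs rather than produced by a trained model. Each frame's spectrum is rescaled so that its envelope moves toward the statistical envelope by linear interpolation.

```python
from typing import List, Tuple

Frame = List[float]  # magnitude per frequency bin for one unit temporal period


def interpolate_envelope(unit_env: Frame, stat_env: Frame, alpha: float) -> Frame:
    """Blend a unit's envelope toward the statistical envelope.

    alpha is a hypothetical weight: 0 keeps the unit's envelope,
    1 replaces it with the statistical envelope.
    """
    if len(unit_env) != len(stat_env):
        raise ValueError("envelopes must have the same number of bins")
    return [(1.0 - alpha) * u + alpha * s for u, s in zip(unit_env, stat_env)]


def modify_frame(spectrum: Frame, unit_env: Frame, stat_env: Frame, alpha: float) -> Frame:
    """Rescale each bin so the frame's envelope approaches the blended target."""
    target = interpolate_envelope(unit_env, stat_env, alpha)
    eps = 1e-12  # guards against division by a zero-valued envelope bin
    return [x * t / max(u, eps) for x, u, t in zip(spectrum, unit_env, target)]


def synthesize(
    units: List[List[Tuple[Frame, Frame]]],
    stat_envs: List[Frame],
    alpha: float = 0.5,
) -> List[Frame]:
    """Concatenate voice units, then modify every frame.

    units: each voice unit is a list of (spectrum, envelope) frames.
    stat_envs: one statistical envelope per frame, in concatenation order
    (assumed to come from a model trained in advance).
    """
    frames = [frame for unit in units for frame in unit]  # concatenation
    if len(frames) != len(stat_envs):
        raise ValueError("need one statistical envelope per frame")
    return [
        modify_frame(spec, env, stat, alpha)
        for (spec, env), stat in zip(frames, stat_envs)
    ]
```

With `alpha` set to 0 the spectra pass through unchanged, which makes the sketch easy to sanity-check. Here concatenation precedes modification; the claims equally cover modifying each unit first and concatenating afterwards.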

Conclusion 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, 





/Farzad Kazeminezhad/
Art Unit 2657
November 5th 2021.