DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the Amendments and Arguments filed on 05 October 2021. Claims 1-15 are pending and have been examined. 
All previous objections and rejections directed to the Applicant’s disclosure and claims not discussed in this Office Action have been withdrawn by the Examiner.


Response to Amendments and Arguments
The applicant’s remarks and arguments have been carefully considered but are moot in view of the new grounds of rejection. The amendments that necessitate new references are “selecting a voice expression to be imparted from among a plurality of voice expressions” (Henton) and “the extracted series of amplitude spectrum envelope contours having been extracted frame by frame from spectra of expressive samples of the selected voice expression” (Nakano et al.).
The examiner also notes that Henton reads on the second limitation of the amended claim.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
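A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.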


Claims 1-3 and 8-14 are rejected under 35 U.S.C. 103 as being unpatentable over US 20040260544, hereinafter referred to as Kikumoto, in view of US 9159329, hereinafter referred to as Agiomyrgiannakis et al., further in view of US 5860064, hereinafter referred to as Henton, and further in view of US 20130151256, hereinafter referred to as Nakano et al.

Regarding claim 1 (Currently Amended), Kikumoto discloses a voice synthesis method comprising: 


altering a series of synthesis spectra in a partial period, among a total period, of a synthesized voice (Kikumoto, fig. 4 (formant control), fig. 6, and para [0073] together show altering a series of synthesis spectra. In addition, Kikumoto, para [0058]-[0060] and fig. 6, teaches that the modulating formant is modified every few milliseconds and that the period is several milliseconds, implying that a large number of formant series exist for each partial syllable or phoneme of the voice input, i.e., in a partial period of a synthesized voice.) based on an extracted series of amplitude spectrum envelope contours of the selected voice expression (Kikumoto, fig. 4 and paras [0062] and [0066]-[0067], together with figs. 10a-10c, show formant control based on the formant curve, which corresponds to the claimed spectrum envelope contours of a voice expression.), to obtain a series of altered spectra to which the voice expression has been imparted (Kikumoto, fig. 4; see the output of the envelope detector and interpolator.); and

synthesizing a series of voice samples of the synthesized voice to which the selected voice expression has been imparted, based on the obtained series of altered spectra (Kikumoto – fig. 4 with the last sentence of para [0073].).
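For illustration only, the following Python sketch shows one hypothetical reading of the altering step as mapped above: within the frames of the partial period, each synthesized log-amplitude spectrum keeps its fine structure (the spectrum minus its own smooth contour) while its contour is replaced by the corresponding contour of the selected voice expression. The function name, the log-domain representation, and the frame-by-frame substitution are assumptions of this sketch; it is not taken from Kikumoto or from the application.

```python
def impart_expression(synth_log_spectra, synth_contours, expr_contours, start):
    """Hypothetical sketch (not from Kikumoto or the application).

    synth_log_spectra: per-frame log-amplitude spectra of the synthesized voice
    synth_contours:    per-frame smooth contours of those spectra
    expr_contours:     extracted contour series of the selected voice expression
    start:             first frame of the partial period within the total period
    """
    # Copy so frames outside the partial period are returned unchanged.
    altered = [list(frame) for frame in synth_log_spectra]
    for i, expr_contour in enumerate(expr_contours):
        t = start + i
        if t >= len(altered):
            break  # the expression would run past the end of the synthesized voice
        # Keep the fine structure (spectrum minus its own contour) and
        # substitute the expression's contour for the original contour.
        altered[t] = [s - c_syn + c_exp
                      for s, c_syn, c_exp in zip(synth_log_spectra[t],
                                                 synth_contours[t], expr_contour)]
    return altered
```

Working in the log domain makes the substitution a simple per-bin add and subtract; frames before `start` and after the end of the expression are left untouched, which is how the sketch models a "partial period."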

Kikumoto, though, does not disclose wherein each frame of the extracted series of amplitude spectrum envelope contours of the selected voice expression: expresses a corresponding one of the series of respective amplitude spectrum envelopes of the voice expression more roughly frequency-wise; and includes less information on lyrics or a singer's individuality compared with the series of the amplitude spectrum envelopes.

Agiomyrgiannakis et al. is cited to disclose wherein each frame of the extracted series of amplitude spectrum envelope contours of the selected voice expression: 

expresses a corresponding one of the series of respective amplitude spectrum envelopes of the voice expression more roughly frequency-wise (“As will be discussed in more detail later, natural spectral envelopes 502A may correspond to a number of respective synthesized spectral envelopes 502B, perhaps generated by TTS synthesis system 400, such as those shown in FIG. 5B. As noted, HMM synthesis and/or [...] Thus, as reflected in FIG. 5B, synthesized spectral envelopes 504B, 506B, and 508B (corresponding, respectively, to natural spectral envelopes 504A, 506A, and 508A), are generally smoothed compared with the natural spectral envelopes. As a general matter, the smoothing of the spectral envelopes may desirably reduce error in the generation of synthesized spectral envelopes,” Agiomyrgiannakis et al., col. 10, lines 4-19.); and

includes less information on lyrics or a singer's individuality compared with the series of the amplitude spectrum envelopes (“however, it also causes the degradation of the naturalness of synthetic speech because it removes details of the natural spectral envelopes,” Agiomyrgiannakis et al., col. 10, lines 19-21.). Agiomyrgiannakis et al. benefits Kikumoto by determining a scale factor that, when applied to a synthesized reference spectral envelope, minimizes a statistical divergence between a natural reference spectral envelope and the synthesized reference spectral envelope (Agiomyrgiannakis et al., col. 1, lines 25-30). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Agiomyrgiannakis et al. to improve the vocoder method of Kikumoto.

Neither Kikumoto nor Agiomyrgiannakis et al., though, discloses selecting a voice expression to be imparted from among a plurality of voice expressions.

Henton is cited to disclose selecting a voice expression to be imparted from among a plurality of voice expressions (“Referring now to FIG. 5, application of vocal emotion synthetic speech parameters according to the preferred embodiment of the present invention will now be explained. After a portion of text has been selected 501, and a particular vocal emotion has been chosen 503, the appropriate speech synthesizer values are obtained via look-up table 505, and thereby applied 507 by embedding the appropriate speech synthesizer commands in the selected text,” Henton, col. 3, lines 47-55.). Henton benefits Kikumoto by allowing a user to select the emotion (i.e., voice expression) to be imparted to the synthesized voice (Henton, col. 3, lines 47-55). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Henton to extend the voice synthesis applications of Kikumoto.

Neither Kikumoto nor Agiomyrgiannakis et al. nor Henton, though, discloses the extracted series of amplitude spectrum envelope contours having been extracted frame by frame from spectra of expressive samples of the selected voice expression.

Nakano et al. is cited to disclose the extracted series of amplitude spectrum envelope contours having been extracted frame by frame from spectra of expressive samples of the selected voice expression (“A spectral transform surface is generated in expression (2) using estimated W_k(f,t) and p_m(f,t). Following that, upper and lower limits are defined for each frame to reduce the unnaturalness of singing synthesis and alleviate the influence caused when the user's singing is outside the timbre change tube. Abrupt changes are reduced by smoothing the time-frequency surface, thereby maintaining the spectral continuity. Finally, a synthesized audio signal for synthesized [...],” Nakano et al.). Nakano et al. benefits Kikumoto by generating a synthesized singing voice mimicking pitch, dynamics, and voice timbre changes of an input singing voice (Nakano et al., Abstract). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Nakano et al. to extend the voice synthesis applications of Kikumoto.
As to claim 10, device claim 10 and method claim 1 are related as method and device of using same, with each claimed element’s function corresponding to a method step. Accordingly, claim 10 is similarly rejected under the same rationale as applied above with respect to method claim 1. In addition, Kikumoto, fig. 1 and para [0037]-[0038], teaches a processor, memory, a CRM, and instructions.
As to claim 11, CRM claim 11 and method claim 1 are related as method and CRM of using same, with each claimed element’s function corresponding to a method step. Accordingly, claim 11 is similarly rejected under the same rationale as applied above with respect to method claim 1. In addition, Kikumoto, fig. 1 and para [0037]-[0038], teaches a processor and memory.

Regarding claim 2 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the altering includes altering the extracted series of amplitude spectrum envelope contours of the selected voice expression through morphing performed based on the series of amplitude spectrum envelope contours of the selected voice expression (Kikumoto, figs. 10a-10c show that the formants of both the input voice and the control formant are morphed.).
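As a purely illustrative aid, "morphing" of contour series can be sketched, under the assumption of frame-by-frame linear interpolation with a single weight, as follows. The function name and the linear rule are hypothetical; neither Kikumoto nor the application is asserted to use this exact formula.

```python
def morph_contours(contours_a, contours_b, weight):
    """Hypothetical frame-by-frame linear morph between two contour series.

    weight = 0 returns contours_a unchanged; weight = 1 returns contours_b.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(contours_a) != len(contours_b):
        raise ValueError("contour series must have the same number of frames")
    # Blend each frequency bin of each frame independently.
    return [[(1.0 - weight) * a + weight * b for a, b in zip(fa, fb)]
            for fa, fb in zip(contours_a, contours_b)]
```

For example, `morph_contours([[0.0, 2.0]], [[4.0, 6.0]], 0.5)` yields the midpoint frame `[[2.0, 4.0]]`.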

Regarding claim 3 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the altering includes altering the series of synthesis spectra based on the extracted series of amplitude spectrum envelope contours of the selected voice expression and the series of amplitude spectrum envelopes of the selected voice expression (Kikumoto, fig. 4, figs. 6-7, and para [0050]-[0051].).

Regarding claim 8 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the altering includes:

shifting a series of pitches of the selected voice expression based on a pitch difference between a pitch in the partial period of the synthesized voice, and a representative value of the pitches of the selected voice expression; and 

altering the series of synthesis spectra based on the shifted series of pitches and the extracted series of amplitude spectrum envelope contours of the selected voice expression (Kikumoto, fig. 7 and para [0051]).  
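The pitch-shifting limitation of claim 8 can be illustrated with a short sketch. It assumes, for illustration only, that pitches are expressed in semitones (e.g., MIDI note numbers) and that the "representative value" is the mean; the function name is hypothetical and this is not asserted to be Kikumoto's procedure.

```python
import statistics

def shift_expression_pitches(expr_pitches, synth_pitch):
    """Hypothetical sketch: shift the expression's pitch series (semitones,
    an assumption) by the difference between the synthesized voice's pitch in
    the partial period and a representative value (assumed: the mean)."""
    representative = statistics.mean(expr_pitches)
    difference = synth_pitch - representative
    # A constant offset in semitones preserves the melodic shape of the series.
    return [p + difference for p in expr_pitches]
```

For example, an expression pitch series of 60, 62, and 64 semitones has a mean of 62; against a synthesized pitch of 67 it is shifted by 5 semitones to 65, 67, and 69. A median could serve equally well as the representative value.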

Regarding claim 9 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the altering further includes altering the series of synthesis spectra based on [...] selected voice expression (Kikumoto, figs. 10a-10c show that the formants of both the input voice and the control formant are morphed.).

Regarding claim 12 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the series of respective amplitude spectrum envelopes of the selected voice expression relate to perception of lyrics and a singer's individuality (“In addition, there are cases in which it is desired to change the center frequency or bandwidth of the specific band of the formant characteristics and produce a special effect. For example, there are cases in which it is desired to intentionally move the resonant frequency of the formant in order to match the singing pitch. This is called a singing formant. In this case, since it is not possible to obtain the desired output by simply expanding and contracting the formant on a logarithmic frequency axis, it is necessary to expand and contract the formant non-uniformly on the logarithmic frequency axis,” Kikumoto, para [0064]. The singing formant relates to a singer’s individuality.).  

Regarding claim 13 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, wherein the extracted series of amplitude spectrum envelope contours of the selected voice expression relate to brightness of a voice (Kikumoto, para [0064]. Intelligibility is interpreted as brightness/clarity of a voice.).  

Regarding claim 14 (Previously Presented), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, further comprising extracting the series of amplitude spectrum envelope contours of the voice expression (Kikumoto, fig. 4 and paras [0062] and [0066]-[0067], together with figs. 10a-10c, show formant control based on the formant curve, which corresponds to the claimed spectrum envelope contours of a voice expression.).


Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over US 20040260544, hereinafter referred to as Kikumoto, in view of US 9159329, hereinafter referred to as Agiomyrgiannakis et al., further in view of US 5860064, hereinafter referred to as Henton, further in view of US 20130151256, hereinafter referred to as Nakano et al., and further in view of US 20010021904, hereinafter referred to as Plumpe.

Regarding claim 4 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, but not wherein the altering includes: 

positioning the extracted series of amplitude spectrum envelope contours of the voice expression so that a feature point of the synthesized voice on a time axis aligns with an expression reference time that is set for the selected voice expression; and

altering the series of synthesis spectra based on the positioned series of amplitude spectrum envelope contours of the selected voice expression.

Plumpe is cited to disclose wherein the altering includes:

positioning the series of amplitude spectrum envelope contours of the voice expression so that a feature point of the synthesized voice on a time axis aligns with an expression reference time that is set for the voice expression (Plumpe, para [0042]); and

altering the series of synthesis spectra based on the positioned series of amplitude spectrum envelope contours (“Since as discussed previously, formants vary from person to person and even across repetitions of the same utterance for a single speaker, the formants output by formant synthesizer 104 and the actual formant values associated with the speech signal will likely be somewhat different. For instance, the time interval within which the formant frequency appears may be slightly shifted in the synthesized formants output by formant synthesizer 104 relative to the actual timing associated with the formant frequencies. Further, the formant frequencies output from formant synthesizer 104 may be slightly different than the actual formant frequencies. In order to modify the synthesized formants provided by formant synthesizer 104 to accommodate for these differences, time warp component 108 and frequency warp component 110 are provided,” Plumpe, para [0042]. Here, the feature point(s) of the [...].). Plumpe benefits Kikumoto by aligning the vowels of the actual utterance to those of the synthesized utterance (Plumpe, para [0042]). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Plumpe to improve the synthesis sound quality of Kikumoto.
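The positioning limitation of claim 4 can be illustrated, under simplifying assumptions, as a rigid shift of the expression's contour series on the synthesized voice's frame grid so that the expression reference time lands on the feature point (e.g., a vowel start frame). This sketch does not reproduce Plumpe's time warp component 108, which may warp non-uniformly; the function name, integer frame indices, and dictionary output are hypothetical choices.

```python
def position_contours(expr_contours, expr_reference_frame,
                      synth_feature_frame, total_frames):
    """Hypothetical sketch: place each expression contour frame on the
    synthesized voice's frame grid so that expr_reference_frame lands on
    synth_feature_frame. Frames falling outside [0, total_frames) are dropped.
    Returns a dict mapping synthesized frame index -> contour frame."""
    offset = synth_feature_frame - expr_reference_frame
    placed = {}
    for i, contour in enumerate(expr_contours):
        t = i + offset
        if 0 <= t < total_frames:
            placed[t] = contour
    return placed
```

For example, a three-frame series whose reference time is its second frame, aligned to a vowel start at frame 5 of a 7-frame voice, occupies frames 4, 5, and 6.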

Regarding claim 5 (original), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, Nakano et al., and Plumpe, discloses the voice synthesis method according to claim 4, wherein the feature point of the synthesized voice is a vowel start time of the synthesized voice (“Since as discussed previously, formants vary from person to person and even across repetitions of the same utterance for a single speaker, the formants output by formant synthesizer 104 and the actual formant values associated with the speech signal will likely be somewhat different. For instance, the time interval within which the formant frequency appears may be slightly shifted in the synthesized formants output by formant synthesizer 104 relative to the actual timing associated with the formant frequencies. Further, the formant frequencies output from formant synthesizer 104 may be slightly different than the actual formant frequencies. In order to modify the synthesized formants provided by formant synthesizer 104 to accommodate for these differences, time warp component 108 and frequency warp component 110 are provided,” Plumpe, para [0042]. The examiner notes that a formant is each of several prominent bands of frequency that determine the phonetic quality of a vowel. The time warp aligns the start and end times of the synthesized formant (i.e., vowel) frequencies with those of the actual formant (i.e., vowel) frequencies.).

Regarding claim 6 (original), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, Nakano et al., and Plumpe, discloses the voice synthesis method according to claim 4, wherein the feature point of the synthesized voice is a vowel end time of the synthesized voice or a pronunciation end time of the synthesized voice (“Since as discussed previously, formants vary from person to person and even across repetitions of the same utterance for a single speaker, the formants output by formant synthesizer 104 and the actual formant values associated with the speech signal will likely be somewhat different. For instance, the time interval within which the formant frequency appears may be slightly shifted in the synthesized formants output by formant synthesizer 104 relative to the actual timing associated with the formant frequencies. Further, the formant frequencies output from formant synthesizer 104 may be slightly different than the actual formant frequencies. In order to modify the synthesized formants provided by formant synthesizer 104 to accommodate for these differences, time warp component 108 and frequency warp component 110 are provided,” Plumpe, para [0042]. The examiner notes that a formant is each of several prominent bands of frequency that determine the phonetic quality of a vowel. The time warp aligns the start and end times of the synthesized formant (i.e., vowel) frequencies with those of the actual formant (i.e., vowel) frequencies.).


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over US 20040260544, hereinafter referred to as Kikumoto, in view of US 9159329, hereinafter referred to as Agiomyrgiannakis et al., further in view of US 5860064, hereinafter referred to as Henton, further in view of US 20130151256, hereinafter referred to as Nakano et al., and further in view of US 20030221542, hereinafter referred to as Kenmochi et al.

Regarding claim 7 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, but not wherein the altering includes: 

expanding or contracting the extracted series of amplitude spectrum envelope contours of the selected voice expression on a time axis to match a time length of the partial period of the synthesized voice; and 

altering the series of synthesis spectra based on the expanded or contracted extracted series of amplitude spectrum envelope contours of the selected voice expression.

Kenmochi et al. is cited to disclose wherein the altering includes:

expanding or contracting the extracted series of amplitude spectrum envelope contours of the selected voice expression on a time axis to match a time length of the partial period of the synthesized voice (Kenmochi, para [0080]-[0081] and figs. 14-16.); and

altering the series of synthesis spectra based on the expanded or contracted extracted series of amplitude spectrum envelope contours of the selected voice expression [...]. Kenmochi et al. benefits Kikumoto by providing a technique that avoids synthesizing a voice with an artificial sound (Kenmochi et al., para [0016]). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Kenmochi et al. to improve the synthesis sound quality of Kikumoto.
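Expanding or contracting a contour series on the time axis to match the length of the partial period can be sketched, assuming linear interpolation between neighboring frames, as below. The interpolation rule and function name are assumptions of this illustration and are not asserted to be the technique of Kenmochi et al., para [0080]-[0081].

```python
def stretch_contours(contours, target_frames):
    """Hypothetical sketch: linearly resample a contour series along time so it
    has target_frames frames (expansion if longer, contraction if shorter)."""
    n = len(contours)
    if n == 0 or target_frames <= 0:
        raise ValueError("need a non-empty series and a positive target length")
    if n == 1 or target_frames == 1:
        return [list(contours[0]) for _ in range(target_frames)]
    stretched = []
    for k in range(target_frames):
        pos = k * (n - 1) / (target_frames - 1)  # fractional source frame index
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0
        # Blend the two neighboring source frames bin by bin.
        stretched.append([(1.0 - frac) * a + frac * b
                          for a, b in zip(contours[i0], contours[i1])])
    return stretched
```

Mapping the first and last target frames onto the first and last source frames keeps the endpoints of the expression fixed, so only the interior timing is expanded or compressed.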


Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over US 20040260544, hereinafter referred to as Kikumoto, in view of US 9159329, hereinafter referred to as Agiomyrgiannakis et al., further in view of US 5860064, hereinafter referred to as Henton, further in view of US 20130151256, hereinafter referred to as Nakano et al., and further in view of US 9947341, hereinafter referred to as Marsh et al.

Regarding claim 15 (Currently Amended), Kikumoto, as modified by Agiomyrgiannakis et al., Henton, and Nakano et al., discloses the voice synthesis method according to claim 1, but not wherein each of the extracted series of amplitude spectrum envelope contours of the selected voice expression is a group of cepstrum coefficients having a lower order than the corresponding amplitude spectrum envelope of the selected voice expression.

Marsh et al. is cited to disclose wherein each of the extracted series of amplitude spectrum envelope contours of the selected voice expression is a group of cepstrum coefficients having a lower order than the corresponding amplitude spectrum envelope of the selected voice expression (“Various techniques to calculate the magnitude spectral envelope of a signal are described in Caetano et al., Improved Estimation of the Amplitude Envelope Of Time-Domain Signals Using True Envelope Cepstral [...]. In some embodiments, the magnitude spectral envelope may be determined by calculating a cepstrum using a Fourier transformation, low-pass filtering the cepstrum, and transforming the cepstrum back into a spectrum by using another Fourier transformation. Low-pass filtering can be implemented by calculating the cepstrum and discarding a number of the highest Fourier coefficients of the cepstrum. For example, the upper 40%, 60% or 80% [of] coefficients may be set to zero. In an example embodiment, a Fast Fourier Transformation size of 2048 is chosen, and only the lowest 40 Fourier coefficients are kept, with all higher coefficients set to zero,” Marsh et al., col. 8, lines 49-65. Thus, an envelope contour is derived that comprises a lower order of cepstrum coefficients than the original envelope. The applicant may also refer to section 3.4 of the attached Caetano et al.). Marsh et al. benefits Kikumoto by applying cepstral techniques for pitch and formant adjustment, implemented as modifications to a Phase Vocoder and performed continuously on a series of successive signal segments (or windows), to provide voice transformation in real-time communication systems (Marsh et al., col. 4, lines 27-31). Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Kikumoto with those of Marsh et al. to improve the vocoder system of Kikumoto.
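The cepstral low-pass filtering quoted from Marsh et al. (FFT size 2048, lowest 40 coefficients kept) can be sketched as follows. This is an illustration under stated assumptions, not Marsh et al.'s code: the function name, the 1e-12 floor before the logarithm, and the symmetric lifter that keeps the mirrored half of the real cepstrum are choices of this sketch. The returned low-order coefficients illustrate a "group of cepstrum coefficients having a lower order" than the full envelope.

```python
import numpy as np

def cepstral_envelope(frame, n_keep=40):
    """Hypothetical sketch: return (low-order cepstrum coefficients,
    smoothed log-magnitude envelope) for one frame by low-pass liftering
    the real cepstrum."""
    n = len(frame)
    if not 1 <= n_keep <= n // 2 + 1:
        raise ValueError("n_keep must be between 1 and n // 2 + 1")
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)  # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n)  # real, even-symmetric cepstrum
    # Zero every coefficient above n_keep - 1, keeping the mirrored half so the
    # liftered cepstrum stays even and transforms back to a real spectrum.
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0
    lifter[n - n_keep + 1:] = 1.0
    envelope = np.fft.rfft(cepstrum * lifter).real
    return cepstrum[:n_keep], envelope
```

With a 2048-sample frame, the envelope has 1025 bins but is fully determined by 40 cepstral coefficients, which is the sense in which the contour is "rougher" frequency-wise than the unsmoothed log spectrum.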


Conclusion
The prior art made of record and not relied upon is considered pertinent to the applicant’s disclosure and is listed in form 892.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656