Status of Claims
Claims 1-15 are pending.
This communication is in response to the communication filed 8/12/2019.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 8/12/2019, which was before the mailing of a first Office action on the merits.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


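Claims 12 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.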
Claim 12 recites the limitation "said frame" in line 8 and again in line 9.  There is insufficient antecedent basis for this limitation in the claim.  It is suggested to amend the limitation to recite “said created time frame of audio samples.”
Claim 15 recites the limitation "The computer program product" in line 1.  There is insufficient antecedent basis for this limitation in the claim.  It is suggested to amend the limitation to recite “A computer program product.”
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim is directed towards “a computer-readable medium…” which encompasses signals per se.  For example, the computer-readable medium is described as:
The invention also describes a computer program product comprising program code instructions recorded on a computer-readable medium in order to carry out the steps of the method when said program operates on a computer (page 4, lines 34–36). 
Amending claim 15 to recite “a non-transitory computer-readable medium.... ” is one way to overcome this rejection. 

The independent claim 15 recites: 
“The computer program product comprising program code instructions recorded on a computer-readable medium in order to carry out the steps of the method according to claim 1 when said program operates on a computer,”  where claim 1 recites:
 “A method for modifying a sound signal, said method comprising:
a step of obtaining (310) time frames of the sound signal, in the frequency domain; 
for at least one time frame, applying a first transformation (320a) of the sound signal in the frequency domain, comprising: 
a step of extracting (330) a spectral envelope of the sound signal for said at least one time frame; 
a step of calculating (340) frequencies of formants of said spectral envelope; 
a step of modifying (350) the spectral envelope of the sound signal, said modification comprising application (351) of an increasing continuous transformation function of frequencies of the spectral envelope, parameterized by at least two frequencies of formants of the spectral envelope.”

This judicial exception is not integrated into a practical application.  In particular, independent claim 15 recites an additional element of a “computer program product.”  For example, in the as-filed specification, the computer program is described as:
Said computer program can for example be stored and/or run on the workstation of the call center operator 210, or on the server 220 (page 11, lines 32-33),
and
This system is given solely as an example, and other architectures can be set up. For example, the user 240 can use a landline telephone. The call center agent can also use a telephone, connected to the server 220. The invention can thus be applied to all system architectures allowing a connection between a user and a call center agent, comprising at least a server or a workstation (page 7, lines 6-10).
The computer is listed as a computing device and is mainly used as an application thereof.  Accordingly, these additional elements do not integrate the abstract idea into a practical application.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than the use of a general-purpose computer.
Any additional limitation in claim 1 is directed towards insignificant extra-solution activity, and is not patent eligible.
With respect to claim 3, the claims relate to classifying a time frame.  This relates to how a human can determine a difference between voiced and unvoiced syllables/phones to recognize parts of speech.  No additional limitations are present.  
With respect to claim 6, the claims relate to applying the increasing continuous transformation function of the frequencies of the spectral envelope based on calculating a first set of frequencies where each element corresponds to an initial frequency of a respective formant, and based on calculating a second set of frequencies where each element corresponds to a modified/transformed frequency for a respective formant.  This relates to how a human could map the frequency transformation for each formant from its initial frequency to a respective modified frequency.  No additional limitations are present. 
With respect to claim 12, the claims relate to modifying the sound signal that includes a voice in real time, where the obtaining (310) time frames of the sound signal in the frequency domain comprises: receiving audio samples; creating a time frame of audio samples, when a sufficient number of samples is available to form said frame; applying a frequency transformation to the audio samples of said frame.
With respect to claim 13, the claims relate to an application of a smiling timbre to a voice, wherein said at least two frequencies of formants are frequencies of formants affected by the smiling timbre of a voice.  This relates to how a human could speak when smiling (which affects at least two formants of the human’s speech).  No additional limitations are present.  
With respect to claim 14, the claims relate to applying an increasing continuous transformation function of the frequencies of the spectral envelope that has been determined during a training phase, by comparing spectral envelopes of phenomena stated by users, neutrally or while smiling.  This relates to how a human could listen to speech produced by others while they are either smiling or not, and then mimic such speech after having listened to it.  No additional limitations are present.  
These above-listed claims further do not remedy the judicial exception being integrated into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tian (US 20090171657 A1) in view of Coorman (US 20120265534 A1). 

Regarding independent claim 1, Tian discloses a method for modifying a sound signal (see Tian [0059], which notes FIG. 4 illustrates a flow diagram of a method of applying a warping function to sounds of a source speaker to convert the sounds to approximate speech of a target speaker), said method comprising: 
for at least one time frame, applying a first transformation of the sound signal in the frequency domain (see Tian [0061], which notes in block 404, the method 400 may include performing feature extraction to generate a feature vector based on the segments/frames of the source voice input, where the DSP 106 may generate a feature vector/formant frequency based on the source input in the manner discussed above in Tian; and see Tian [0026], which notes in block 204, the method 200 may include modeling/transforming the segments/frames of the equivalent acoustic events of the digitized source and target voice input in the frequency domain), comprising: 
a step of extracting a spectral envelope of the sound signal for said at least one time frame (see Tian [0029], which notes in block 252, the method 250 may include obtaining a spectral envelope to model the vocal tract contribution, where the DSP 106 may obtain a spectral envelope of the vocal tract contribution of the segment to model the vocal tract contribution using linear prediction, such as, but not limited to, a line spectral frequency (LSF) representation. Using the well-known linear prediction approach, the DSP 106 may use previous speech samples to form a prediction for a new sample); 
a step of calculating frequencies of formants of said spectral envelope (see Tian [0048], which notes in block 212, the method 200 may include aligning formants of the spectral envelopes from the selected mean mixture pair to establish the mixture specific warping function, where the microprocessor 108 can align the formants of the paired spectral envelopes to establish the mixture specific warping function Wl(ω), which is described below with reference to FIG. 3; and see Tian [0052], which notes the microprocessor 108 can identify spectral peaks denoted as SP1, SP2, . . . , SPm from the source spectral envelope of the mean μlx of the source speaker, and spectral peaks denoted as TP1, TP2, . . . , TPn from the target spectral envelope of the mean μly from the target speaker. The microprocessor 108 may align the spectral peaks of the target and source spectral envelopes to generate a lattice 300, where each node in the lattice 300 denotes one possible aligned formant pair); 
a step of modifying the spectral envelope of the sound signal, said modification comprising application of an increasing continuous transformation function of frequencies (see Tian FIG. 3, which shows a continuously fitted curve of increasing frequency for frequency-transforming formants of source speech to respective formants of target speech, where at least two source formants are so-transformed, such as SP1→TP1, and SP2→TP2) of the spectral envelope, parameterized by at least two frequencies of formants of the spectral envelope (see Tian [0055], which notes by finding the best path, the microprocessor 108 identifies the best (i.e., lowest cost) aligned formant pairs from the set of possible aligned formant pairs. Then, the microprocessor 108 calculates the mixture specific warping function for a particular mixture mean pair based on fitting a smooth/continuous curve through the aligned formant pairs along the best path in the lattice 300. The microprocessor 108 may then obtain the warping function based on a weighted combination of the mixture specific warping functions for each of the mixture mean pairs).  
Tian fails to specifically teach a step of obtaining time frames of the sound signal in the frequency domain.
However, Coorman does teach a step of obtaining time frames of the sound signal in the frequency domain (see Coorman [0011], which notes the fundamental frequency (F0 in FIG. 2) is determined by a "pitch detection" algorithm, the speech signals are windowed and split into equidistant segments (called frames), the distance between successive frames is constant and equal to the window hop size, and for each frame, the spectral envelope is obtained and a MFCC speech description vector ('real cepstrum' in FIG. 2) is derived through (frame-synchronous) cepstral analysis (FIG. 2), where the MFCC representation is a low-dimensional projection of the Mel-frequency scaled log-spectral envelope).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian with the rapidly varying component of the spectral envelope as taught by Coorman in order to create an enhanced spectral envelope final representation (see Coorman Abstract, which notes in one solution the improvement is made by manipulating an extremum, i.e. a peak or a valley, in the rapidly varying component of the spectral envelope representation. The rapidly varying component of the spectral envelope representation is manipulated to sharpen and/or accentuate extrema after which it is merged back with the slowly varying component or the spectral envelope input representation to create an enhanced spectral envelope final representation).
The combination of Tian and Coorman includes predictable results, such as enhancing/sharpening a spectral envelope.


Regarding claim 15, Tian further discloses the computer program product comprising program code instructions recorded on a computer-readable medium in order to carry out the steps of the method according to claim 1 when said program operates on a computer (see Tian [0077], which notes the methods and features recited herein may further be implemented through any number of computer readable media that are able to store computer readable instructions. Examples of computer readable mediums that may be used include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, DVD or other optical disk storage, magnetic cassettes, magnetic tape, magnetic storage and the like).

As per claim 6, Tian in view of Coorman teaches all of the limitations of claim 1 above.  
Tian further teaches a method wherein the application of an increasing continuous transformation function of the frequencies of the spectral envelope comprises: 
calculation, for a set of initial frequencies determined from formants of the spectral envelope, modified frequencies (see Tian [0052], which notes the microprocessor 108 can identify spectral peaks denoted as SP1, SP2, . . . , SPm from the source spectral envelope of the mean μlx of the source speaker, and spectral peaks denoted as TP1, TP2, . . . , TPn from the target spectral envelope of the mean μly from the target speaker. The microprocessor 108 may align the spectral peaks of the target and source spectral envelopes to generate a lattice 300, where each node in the lattice 300 denotes one possible aligned formant pair);  
a linear interpolation between the initial frequencies of the set of initial frequencies determined from formants of the spectral envelope and the modified frequencies (see Tian FIG. 3, which shows a continuously fitted curve of increasing frequency for frequency-transforming formants of source speech to respective formants of target speech, where at least two source formants are so-transformed, such as SP1→TP1, and SP2→TP2).
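
For illustration only, the linear-interpolation limitation discussed above can be sketched as a piecewise-linear, increasing frequency map passing through pairs of initial and modified formant frequencies. This is a minimal sketch under stated assumptions and is not taken from Tian or from the application: the function name make_warp, the fixed end points at 0 Hz and at the Nyquist frequency, and the example formant values are all hypothetical.

```python
from bisect import bisect_right


def make_warp(initial_hz, modified_hz, nyquist_hz):
    """Build a piecewise-linear, strictly increasing frequency map.

    Assumption (not from the cited art): the map fixes 0 Hz and the
    Nyquist frequency, and sends each initial formant frequency to its
    modified counterpart.
    """
    xs = [0.0] + list(initial_hz) + [nyquist_hz]
    ys = [0.0] + list(modified_hz) + [nyquist_hz]
    # Both anchor lists must be strictly increasing; otherwise the map
    # would not be an increasing function of frequency.
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("anchor frequencies must be strictly increasing")

    def warp(f):
        if not 0.0 <= f <= nyquist_hz:
            raise ValueError("frequency outside [0, Nyquist]")
        # Locate the segment [xs[i], xs[i + 1]] that contains f.
        i = min(bisect_right(xs, f) - 1, len(xs) - 2)
        t = (f - xs[i]) / (xs[i + 1] - xs[i])
        # Linear interpolation between the two surrounding anchors.
        return ys[i] + t * (ys[i + 1] - ys[i])

    return warp


# Hypothetical example: two formants moved from 500/1500 Hz to 550/1650 Hz.
warp = make_warp([500.0, 1500.0], [550.0, 1650.0], 8000.0)
```

Because every segment has a positive slope, the resulting map is continuous and increasing, and it is parameterized by exactly the two formant pairs supplied.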

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Tian (US 20090171657 A1) in view of Coorman (US 20120265534 A1) and in further view of Chong-White (US 7065485 B1). 
 
As per claim 12, Tian in view of Coorman teaches all of the limitations of claim 1 above.  
Tian in view of Coorman fails to specifically teach the method according to claim 1, said method being suitable for modifying the sound signal in real time, and wherein: the sound signal comprises a voice; the step of obtaining time frames of the sound signal in the frequency domain comprises: receiving audio samples; creating a time frame of audio samples, when a sufficient number of samples is available to form said frame; applying a frequency transformation to the audio samples of said frame.
However, Chong-White does teach a method suitable for modifying the sound signal in real time (see Chong-White, col. 3, lines 56 – 65, which notes common methods used to modify the time duration of speech/sound signal such as overlap-add (OLA) techniques. OLA is a time-domain technique that modifies the time-scale of a signal without altering its perceived frequency attributes. OLA constructs a modified signal that has a short-time Fourier Transform (STFT) maximally close to that of the original signal. These techniques are popular due to their low complexity, allowing for real-time implementation), and wherein: 
the sound signal comprises a voice (see Chong-White, col. 7, lines 48 – 58, which notes that once the TSMSs (time-scaled modification syllables) have been identified, an appropriate time-scaling factor is dynamically determined by the time scale determinator 106 for each 10 ms segment of the frame, where a segment is a portion of speech that is processed by a variable-rate scale modification process. The strategy adopted is to emphasize the formant transitions through time expansion. This effect is then strengthened by compressing the following vowel segment. Hence, the first portion of the TSMS containing the formant transitions is expanded by αtr. The second portion containing the steady-state vowel is compressed by αss. Fricatives are lengthened by αfric); 
the step of obtaining time frames of the sound signal in the frequency domain comprises: 
receiving audio samples (see Chong-White, col. 7, line 58 – col. 8, line 8, which notes the scaling factors that are defined as follows: α<1 corresponds to lengthening the time duration of the current segment/frame, α>1 corresponds to compression, and α=1 corresponds to no time-scale modification at all.  Time scaling is inversely related to the scaling factor. Typically, αtr=1/αss; however, for increased effect, αtr<1/αss. Significant changes in time duration, e.g. α>3, may introduce distortions, especially in the case of stop bursts. The factors used in the current implementation are: αtr=0.5, αss=1.8 and αfric=0.8. In low energy regions of the speech, residual delays may be reduced by scaling the corresponding speech regions by the factor αsil=min(1.5, 1+d/(L·Fs)), where d is the current delay in samples, L is the frame duration and Fs is the sampling rate; see FIG. 9, which depicts an input/received signal corresponding to the word "fin"; and see FIG. 10, which depicts the self-determined scaling factors during the time duration corresponding to FIG. 9); 
creating a time frame of audio samples, when a sufficient number of samples is available to form said frame (see Chong-White, col. 8, lines 4 – 8, which notes in low energy regions of the speech, residual delays may be reduced by scaling the corresponding speech regions by the factor αsil=min(1.5, 1+d/(L·Fs)), where d is the current delay in samples, L is the frame duration and Fs is the sampling rate; and see Chong-White, col. 3, line 61 – col. 4, line 1, which notes the OLA technique constructs a modified signal that has a short-time Fourier Transform (STFT) maximally close to that of the original signal. These techniques are popular due to their low complexity, allowing for real-time implementation. OLA techniques average overlapping frames of a signal at points of highest correlation to obtain a time-scaled signal, which maintains the local pitch and spectral properties of the original signal); 
applying a frequency transformation to the audio samples of said frame (see Chong-White, col. 3, line 61 – col. 4, line 10, which notes the OLA technique constructs a modified signal that has a short-time Fourier Transform (STFT) maximally close to that of the original signal. These techniques are popular due to their low complexity, allowing for real-time implementation. OLA techniques average overlapping frames of a signal at points of highest correlation to obtain a time-scaled signal, which maintains the local pitch and spectral properties of the original signal. To reduce discontinuities at waveform boundaries and improve synchronization, the waveform similarity overlap-add (WSOLA) technique was developed. WSOLA overcomes distortions of OLA by selecting the segment for overlap-addition, within a given tolerance of the target position, such that the synthesized waveform has maximal similarity to the original signal across segment boundaries. The synthesis equation for WSOLA with regularly spaced synthesis instants kL and a symmetric unity gain window, v(n), is shown as Equation (1)).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the time-scaling OLA techniques as taught by Chong-White in order to achieve real-time processing speeds (see Chong-White, col. 3, lines 56 – 65, which notes common methods used to modify the time duration of speech/sound signal such as overlap-add (OLA) techniques. OLA is a time-domain technique that modifies the time-scale of a signal without altering its perceived frequency attributes. OLA constructs a modified signal that has a short-time Fourier Transform (STFT) maximally close to that of the original signal. These techniques are popular due to their low complexity, allowing for real-time implementation).
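
For illustration only, the three sub-steps recited in claim 12 (receiving audio samples, creating a frame once enough samples are available, and applying a frequency transformation to that frame) can be sketched as follows. This is a simplified sketch under stated assumptions and does not reproduce the OLA/WSOLA processing of Chong-White or the implementation of the application: the class name FrameBuffer, the non-overlapping frames, the absence of an analysis window, and the direct (non-fast) DFT are all hypothetical simplifications.

```python
import cmath


def dft(frame):
    """Direct discrete Fourier transform of one frame (O(n^2), for clarity)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


class FrameBuffer:
    """Accumulates incoming samples and emits a spectrum per full frame.

    Assumption: frames do not overlap and no window is applied.
    """

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self._pending = []

    def push(self, samples):
        """Receive samples; return the spectra of all frames completed."""
        self._pending.extend(samples)
        spectra = []
        # A frame is created only when enough samples are available.
        while len(self._pending) >= self.frame_size:
            frame = self._pending[:self.frame_size]
            del self._pending[:self.frame_size]
            spectra.append(dft(frame))
        return spectra


buf = FrameBuffer(4)
first = buf.push([1.0, 1.0, 1.0])   # only 3 samples: no frame yet
second = buf.push([1.0, 0.0])       # 4th sample completes one frame
```

In this sketch the first call returns no spectrum because the frame is incomplete, and the second call returns one spectrum while retaining the leftover sample for the next frame.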


Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Tian (US 20090171657 A1) in view of Coorman (PCT US 2004/260544 A1) and in further view of Brown (US 20110119061 A1). 

As per claim 2, Tian in view of Coorman teaches all of the limitations of claim 1 above.  
Tian in view of Coorman fails to specifically teach wherein the step of modifying the spectral envelope of the sound signal also comprises the application of a filter to the spectral envelope, said filter being parameterized by the frequency of a third formant of the spectral envelope of the sound signal.
However, Brown does teach wherein the step of modifying the spectral envelope of the sound signal also comprises the application of a filter to the spectral envelope (see Brown [0015], which notes it is known to filter an audio signal with a kind of equalization filter known as a peaking filter, to emphasize frequency components of the signal in a frequency range critical to intelligibility of speech, relative to frequency components of the signal outside this frequency range), said filter being parameterized by the frequency of a third formant of the spectral envelope of the sound signal (see Brown [0015], which notes it is known to use a peaking filter to emphasize frequency components of an audio signal in a range centered on the 3rd formant of speech (F3) relative to frequency components outside such range, where F3 can vary from approximately 2300 Hz to 3000 Hz in normal human speech; and see Brown [0052], which notes peaking filter 7 is preferably a biquadratic filter having a center frequency centered on the 3rd formant of speech, F3 (i.e., in the range from about 2300 Hz to about 3000 Hz) and a varying gain, where peaking filter 7's response (including the gain it applies at the center frequency) is determined dynamically by the "Cpow" control value).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the emphasizing of formants critical to speech intelligibility as taught by Brown in order to emphasize frequencies of speech that are in the voiced frequency range (see Brown [0015], which notes it is known to filter an audio signal with a kind of equalization filter known as a peaking filter, to emphasize frequency components of the signal in a frequency range critical to intelligibility of speech, relative to frequency components of the signal outside this frequency range).
The combination of Tian and Coorman with Brown includes predictable results, such as the emphasis of speech frequencies that are in the voiced frequency range.
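
For illustration only, a biquadratic peaking filter centered near the third formant can be sketched as follows. This is a sketch under stated assumptions and is not Brown's filter 7: the coefficient formulas are the widely used peaking-EQ design from the Audio EQ Cookbook (swapped in here as a stand-in), the dynamic control by the "Cpow" value is not modeled, and the function names, the 2650 Hz center frequency (the midpoint of the 2300 Hz to 3000 Hz range noted above), the 6 dB gain, the Q of 1, and the 16 kHz sampling rate are hypothetical.

```python
import cmath
import math


def peaking_biquad(center_hz, gain_db, q, sample_rate_hz):
    """Return normalized (b, a) coefficients of a peaking biquad filter."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    # Normalize so that the leading denominator coefficient equals 1.
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_at(b, a, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


# Hypothetical settings: +6 dB boost centered at 2650 Hz, 16 kHz sampling.
b, a = peaking_biquad(2650.0, 6.0, 1.0, 16000.0)
peak_gain = magnitude_at(b, a, 2650.0, 16000.0)
dc_gain = magnitude_at(b, a, 0.0, 16000.0)
```

With these settings the response applies the full boost at the center frequency and leaves components far from it, such as 0 Hz, at unity gain, which is the emphasis behavior the rejection relies on.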

As per claim 3, Tian in view of Coorman teaches all of the limitations of claim 1 above.  
Tian in view of Coorman fails to specifically teach a method comprising a step for classifying a time frame, according to a set of time frame classes comprising at least one class of voiced frames and one class of non-voiced frames.
However, Brown does teach a method comprising a step for classifying a time frame, according to a set of time frame classes comprising at least one class of voiced frames and one class of non-voiced frames (see Brown [0053], which notes in response to a "Cpow" value indicative of high dialog content in the speech channel, element 320 asserts to filter 7 a control value set that causes filter 7 to apply relatively high gain to frequency components of the speech channel likely to be indicative of dialog (e.g., frequency components in a range centered on the 3rd formant of speech) and lower gain to frequency components outside this range; and in response to a "Cpow" value indicative of low dialog content in the speech channel, element 320 asserts to filter 7 a different control value set that causes filter 7 to apply relatively low gain to all frequency components of the speech channel).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the emphasizing of formants critical to speech intelligibility as taught by Brown in order to emphasize frequencies of speech that are in the voiced frequency range (see Brown [0015], which notes it is known to filter an audio signal with a kind of equalization filter known as a peaking filter, to emphasize frequency components of the signal in a frequency range critical to intelligibility of speech, relative to frequency components of the signal outside this frequency range).
The combination of Tian and Coorman with Brown includes predictable results, such as the emphasis of speech frequencies that are in the voiced frequency range.

As per claim 4, Tian in view of Coorman teaches all of the limitations of claim 3 above.  
Tian in view of Coorman teaches a method wherein for each voiced frame, the application of said first transformation of the sound signal in the frequency domain (see Tian [0062], which notes in block 406, the method 400 may include calculating a mixture weight (i.e., conditional probability) based on the source voice input/voiced frames to generate a warping function).
Tian in view of Coorman fails to specifically teach a method wherein for each non-voiced frame, the application of a second transformation of the sound signal in the frequency domain, said second transformation comprising a step for application of a filter to increase the energy of the sound signal centered on a predefined frequency.
However, Brown does teach a method comprising: 
for each voiced frame, the application of said first transformation of the sound signal in the frequency domain (see Brown [0052], which notes peaking filter 7 is preferably a biquadratic filter having a center frequency centered on the 3rd formant of speech, F3 (i.e., in the range from about 2300 Hz to about 3000 Hz) and a varying gain, where peaking filter 7's response (including the gain it applies at the center frequency) is determined dynamically by the "Cpow" control value; see Brown [0024], which notes the analysis module of the inventive system may dynamically determine the control value (Cpow) which is a measure of power of the speech channel of the input signal relative to power of a non-speech channel of the input signal, where the peaking filter's response is dynamically controlled by this control value, so that the gain it applies at the center frequency is increased in response to an increase in the control value Cpow; and see Brown [0083] with reference to FIG. 5, which notes the filtering subsystem 204 includes active peaking filter 207, which is steered by filter control value Cpow from module 202, and is coupled and configured to filter the center channel of the input signal (which is assumed to be a speech channel) to improve clarity and/or intelligibility of dialog relative to other content determined by the center channel); 
for each non-voiced frame, the application of a second transformation of the sound signal in the frequency domain, said second transformation comprising a step for application of a filter to increase the energy of the sound signal centered on a predefined frequency (see Brown [0025], which notes the ratio of speech channel power to non-speech channel power is used to determine how much ducking (attenuation) should be applied to each non-speech channel. For example, in the FIG. 4 embodiment, the gain applied by ducking amplifiers 8 and 9 may be reduced in response to an increase in a gain control value (output from element 318) that is indicative of relative power of the speech channel and non-speech channels determined in the analysis module, so that the ducking amplifiers more greatly attenuate the non-speech channels relative to the speech channel when the speech channel power increases relative to the combined power of non-speech channels; see Brown [0083] with reference to FIG. 5, which notes the filtering subsystem 204 also includes active ducking amplifiers 208, 209, 210, and 211, which are steered by filter control values Cpow, Lpow, Rpow, and Lpow from module 202, and are coupled and configured to apply attenuation (ducking) to the other channels of the input signal (which are assumed to be non-speech channels). The outputs of peaking filter 207 and ducking circuitry 208-211 determine a filtered, multichannel output signal).  
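Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the emphasizing of formants critical to speech intelligibility as taught by Brown in order to emphasize frequencies of speech that are in the voiced frequency range (see Brown [0015], which notes it is known to filter an audio signal with a kind of equalization filter known as a peaking filter, to emphasize frequency components of the signal in a frequency range critical to intelligibility of speech, relative to frequency components of the signal outside this frequency range).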
The combination of Tian and Coorman with Brown includes predictable results, such as the emphasis of speech frequencies that are in the voiced frequency range.

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Tian (US 20090171657 A1) in view of Coorman (PCT US 2004/260544 A1) and in further view of non-patent literature Haddad (Kevin El Haddad, Stéphane Dupont, Jérôme Urbain, Thierry Dutoit, An HMM-based Speech-smile Synthesis System: An Approach for Amusement Synthesis, IEEE International Conference on Acoustics, Speech and Signal Processing, May 2015). 

As per claim 13, Tian in view of Coorman teaches all of the limitations of claim 1 above. 
Tian in view of Coorman fails to specifically teach said method being suitable for the application of a smiling timbre to a voice, wherein said at least two frequencies of formants are frequencies of formants affected by the smiling timbre of a voice.
However, Haddad does teach said method being suitable for the application of a smiling timbre to a voice, wherein said at least two frequencies of formants are frequencies of formants affected by the smiling timbre of a voice (see Haddad FIG. 2, page 3, which shows the standard frequency shifting of each formant pair of mean F1 frequency values and mean F2 frequency values for a respective vowel [a], [e], [i], [o] and [u] in smile-speech and neutral speech styles).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the frequency shifting as taught by Haddad in order to generate different styles/expressivities of speech, such as smiling or neutral (see Haddad, page 2, left column, first full paragraph, which notes after the training phase comes an adaptation step in which a speaker’s source voice is adapted to a target voice using the Constrained Maximum Likelihood Linear Regression (CMLLR) algorithm [8]. This algorithm uses an affine linear function to transform the means and covariances of an initial model (created a speaker A’s voice) so as to maximize the likelihood of the target voice (coming from speaker B). This phase allows us to obtain speech-smile, spread lips and neutral speaker B speech acoustic models from limited amounts of data of those speech styles).
The combination of Tian and Coorman with Haddad includes predictable results, such as the generation of expressive speech.

As per claim 14, Tian in view of Coorman teaches all of the limitations of claim 1 above. 
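Tian in view of Coorman fails to specifically teach the method according to claim 13, characterized in that said increasing continuous transformation function of the frequencies of the spectral envelope has been determined during a training phase, by comparing spectral envelopes of phenomena stated by users, neutrally or while smiling.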
However, Haddad does teach the method according to claim 13, characterized in that said increasing continuous transformation function of the frequencies of the spectral envelope has been determined during a training phase, by comparing spectral envelopes of phenomena stated by users, neutrally or while smiling (see Haddad FIG. 3, page 3, which notes initial-to-modified frequency transformations from neutral to smiling for five vowel sounds made in a training phase; see Haddad first paragraph, page 3, which notes Here, we rather focus on a comparison between the acoustic effects spreading the lips have (spread-lips style), as well as the ones speaking in a “happy way” (speech-smile style). In order to do that, French vowels [a], [e], [i], [o] and [u] were first extracted from each of the three different target voice/ multiple users recordings, where the same amount of each vowel instances was extracted from each speech style for comparison by choosing common sentences in all styles; and see Haddad, page 5, right column, last paragraph, which notes the authors’ near-future perspective is to be able to control the degree of smile/amusement in speech by using interpolation techniques to carry/continuously transform speech from one style of speech to another).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Tian in view of Coorman with the frequency shifting as taught by Haddad in order to generate different styles/expressivities of speech, such as smiling or neutral (see Haddad, page 2, left column, first full paragraph, which notes after the training phase comes an adaptation step in which a speaker’s source voice is adapted to a target voice using the Constrained Maximum Likelihood Linear Regression (CMLLR) algorithm [8]. This algorithm uses an affine linear function to transform the means and covariances of an initial model (created a speaker A’s voice) so as to maximize the likelihood of the target voice (coming from speaker B). This phase allows us to obtain speech-smile, spread lips and neutral speaker B speech acoustic models from limited amounts of data of those speech styles).
The combination of Tian and Coorman with Haddad includes predictable results, such as the generation of expressive speech.
Allowable Subject Matter
Claims 5 and 7-11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  
As per claim 5, the prior art does not specifically teach wherein the second transformation of the sound signal comprises:
the step of extracting a spectral envelope of the sound signal for said at least one time frame; 
applying an increasing continuous transformation function of the frequencies of the spectral envelope parameterized identically to an increasing continuous transformation function of the frequencies of the spectral envelope for an immediately preceding time frame.  

As per claim 8, the prior art does not specifically teach the method according to claim 7, wherein the set of frequencies determined from formants of the spectral envelope comprises:
a first initial frequency calculated from half of the frequency of a first formant of the spectral envelope of the sound signal; 
a second initial frequency calculated from the frequency of a second formant of the spectral envelope of the sound signal; 
a third initial frequency calculated from the frequency of a third formant of the spectral envelope of the sound signal; 
a fourth initial frequency calculated from the frequency of a fourth formant of the spectral envelope of the sound signal; 
a fifth initial frequency calculated from the frequency of a fifth formant of the spectral envelope of the sound signal.
As per claim 9, the prior art does not specifically teach the method according to claim 8, wherein:
a first modified frequency is calculated as being equal to the first initial frequency; 
a second modified frequency is calculated by multiplying the second initial frequency by the multiplier coefficient; 
a third modified frequency is calculated by multiplying the third initial frequency by the multiplier coefficient; 
a fourth modified frequency is calculated by multiplying the fourth initial frequency by the multiplier coefficient; 
a fifth modified frequency is calculated as being equal to the fifth initial frequency. 
As per claim 10, the prior art does not specifically teach the method according to claim 8, wherein each initial frequency is calculated from the frequency of a formant of a current time frame.  
As per claim 11, the prior art does not specifically teach the method according to claim 8, wherein each initial frequency is calculated from the average of the frequencies of formants of equal rank, for a number greater than or equal to two successive time frames.  
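For illustration only, the frequency mapping recited in claims 8 and 9 can be sketched as follows. The function names, the example formant values, and the multiplier coefficient are hypothetical. The sketch assumes that "calculated from" means taking the formant frequency directly (halved for the first formant), and that the increasing continuous transformation function is a piecewise-linear interpolation through the initial/modified frequency pairs; the claims do not specify either point.

```python
from bisect import bisect_right


def initial_frequencies(formants):
    """Claim 8 sketch: five initial frequencies from formants F1..F5 (Hz).

    Assumption: the first initial frequency is F1 / 2 and the others equal
    the corresponding formant frequency.
    """
    f1, f2, f3, f4, f5 = formants
    return [f1 / 2.0, f2, f3, f4, f5]


def modified_frequencies(initial, coefficient):
    """Claim 9 sketch: the first and fifth frequencies are unchanged,
    the second to fourth are multiplied by the multiplier coefficient."""
    a, b, c, d, e = initial
    modified = [a, b * coefficient, c * coefficient, d * coefficient, e]
    # An increasing transformation needs strictly increasing target values.
    if any(hi <= lo for lo, hi in zip(modified, modified[1:])):
        raise ValueError("coefficient does not yield an increasing mapping")
    return modified


def warp(freq, initial, modified):
    """Map one frequency through the piecewise-linear function (assumed form)
    that passes through (0, 0) and each (initial, modified) pair."""
    xs = [0.0] + list(initial)
    ys = [0.0] + list(modified)
    if freq >= xs[-1]:
        # Above the fifth frequency the mapping is the identity,
        # because the fifth modified frequency equals the fifth initial one.
        return freq
    i = bisect_right(xs, freq) - 1
    t = (freq - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical formant values (Hz) and multiplier coefficient.
init = initial_frequencies([500.0, 1500.0, 2500.0, 3500.0, 4500.0])
mod = modified_frequencies(init, 1.25)
warped_second_formant = warp(1500.0, init, mod)
```

With these hypothetical values, the second to fourth initial frequencies are shifted upward while the first and fifth remain fixed, so the mapping stays continuous and increasing over the whole band.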
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Qian (US 20120253781 A1) is cited to disclose frame mapping-based cross-lingual voice transformation including formant-based frequency warping for vocal tract length normalization (VTLN) and speech trajectory tiling.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R HENNINGS whose telephone number is (571) 272-9676. The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARK HENNINGS/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659