DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 9/20/2022 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1, 17, 19, 21, 23-26, 28-30 and 32-35 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 17, 19-21, 24-26, 28-30 and 33-35 are rejected under 35 U.S.C. 103 as being unpatentable over Haupt U.S. PAP 2013/019738 A1, in view of Marsh U.S. Patent No. 9,947,341 B1.


Regarding claim 1, Haupt teaches an electronic device comprising at least one memory and one or several processors (CPU, RAM and ROM in a personal computer, see par. [0028]) configured for:
obtaining at least one base audio signal (voice analysis of spoken fragments of the source speaker, see par. [0030]); 
and generating at least one output audio signal from said at least one base audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases (voice analysis of spoken fragments of the source speaker, voice analysis of the target singer, the re-estimation of parameters of the source speaker to match the target speaker, and the re-synthesis of the source model to make a singing voice, see par. [0030]; A singing voice model may be added to change characteristics of speech into singing. This includes, but is not limited to, phoneme segments of the spoken voice to match the singer's voice, see par. [0037]), 
wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal (a singing voice sample can be synthesized from these variables and subjected to a post voice synthesis analysis 80 by means of a correction unit 82 added to reduce any artifacts from the source-filter analysis. With the timing information 84 of the triphones uttered in the singer database 28, the resultant speech sample after voice synthesis 86 is then placed in a signal timed in such a manner that the sung voice and the newly formed sample occur at the exact same point in the song. The resulting track 88 will be singing in a speaker's voice in the manner of a target singer. Thus the invention achieves the effect of modifying a speaker's voice to sound as if singing in the same manner as a singer, see par. [0042]).
Although Haupt teaches using Fourier transforms (see par. [0038]), it does not teach iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases.
In the same field of endeavor Marsh teaches real-time voice masking using cepstral analysis based on pitch and formant parameters to synthesize a modified signal, see abstract. 
FIG. 10 illustrates an example envelope estimation subroutine using true envelope estimation. In step 1010, the input signal may optionally be subsampled, or decimated, on the frequency scale to reduce the computational complexity and memory requirements of envelope estimation. For example, the magnitude spectral envelope may be calculated at a frequency resolution that is lower than the signal by a factor of 2, 4, 8 or 16. This may allow significant performance benefits during subsequent steps, for example by making the Fast Fourier Transformation size executed within the iteration step smaller. In one embodiment, the magnitude spectral envelope is calculated at a frequency resolution that is lower than the frequency resolution of the signal by a factor of 2. In step 1020, variables are initialized to prepare for a first iteration. The iteration counter n is initialized with 1 to reflect that this is the first iteration. The spectrum of the signal, as subsampled in step 1010, is referred to as X(k), and A_0 is initialized as the natural logarithm of that spectrum, A_0 = log(X(k)). The algorithm then proceeds to step 1030, which may be considered the first step of the iteration. In step 1040, a cepstrum C_n is then calculated from A_n(k) by performing a Fourier transformation on A_n(k). In step 1050, smoothing, such as, for example, low-pass filtering, is applied to the cepstrum C_n calculated in step 1040. In step 1060, the cepstrum C_n is transformed back into the frequency domain by using Fourier transformation. In step 1070, a termination criterion is applied to decide whether to perform another iteration. For example, step 1070 may lead to termination when a set number of rounds has been performed, and/or upon observing that log(X(k)) and C_n have converged sufficiently close.
In one embodiment, the iterative smoothing may be stopped once 16 rounds have been performed or upon the maximum difference between log(X(k)) and C_n being below 0.23 for all frequency. Advantageously, this allows for the execution time of the smoothing algorithm to be assigned an upper limit, for example to support real-time operation, while still performing as many iterations as feasible within that limit, see col. 10 line 65 to col. 12 line 3.
It would have been obvious to one of ordinary skill in the art to combine the Haupt invention with the teachings of Marsh in order to reduce the computational complexity and memory requirements of envelope estimation, see col. 11 lines 1-4.
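For illustration only, the iterative smoothing loop described in the Marsh passage relied upon above (steps 1010 to 1070) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the function name `true_envelope` and the parameters `lifter_order` and `decimate` are hypothetical, the input is assumed to be a full, symmetric log-magnitude spectrum, the update applied at step 1030 (raising A_n to the current envelope wherever the envelope is higher) is borrowed from the commonly published true envelope method because the cited passage does not spell it out, and the convergence test is read as one-sided (how far log(X(k)) still rises above the smoothed envelope).

```python
import numpy as np

def true_envelope(log_mag, lifter_order, max_rounds=16, tol=0.23, decimate=1):
    """Iteratively smooth a log-magnitude spectrum (full, symmetric FFT layout).

    Returns the smoothed envelope and the number of rounds performed.
    """
    # Step 1010 (optional): subsample on the frequency scale.
    # Assumes len(log_mag) is divisible by `decimate` so symmetry is kept.
    target = np.asarray(log_mag, dtype=float)[::decimate]  # log(X(k))
    # Step 1020: A_0 = log(X(k)); the round counter starts at 1.
    a = target.copy()
    env = a
    for n in range(1, max_rounds + 1):
        # Step 1040: cepstrum of A_n via a Fourier transformation.
        cep = np.fft.ifft(a).real
        # Step 1050: low-pass smoothing, keeping only low quefrencies
        # (both ends of the array, because the cepstrum is symmetric).
        lifter = np.zeros_like(cep)
        lifter[:lifter_order + 1] = 1.0
        if lifter_order > 0:
            lifter[-lifter_order:] = 1.0
        # Step 1060: transform the smoothed cepstrum back to frequency.
        env = np.fft.fft(cep * lifter).real
        # Step 1070: stop once log(X(k)) exceeds the envelope by less
        # than `tol` everywhere (assumed one-sided reading), or when
        # the round limit is reached.
        if np.max(target - env) < tol:
            break
        # Assumed step 1030 update for the next round: A_{n+1} = max(A_n, env).
        a = np.maximum(a, env)
    return env, n
```

The defaults `max_rounds=16` and `tol=0.23` mirror the embodiment quoted above; capping the number of rounds is what bounds the execution time of the loop.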
Regarding claim 17, Haupt teaches the electronic device according to claim 1, wherein said at least one base audio signal comprises a speech content (voice fragments, see par. [0030]).
Regarding claim 19, Haupt teaches the electronic device according to claim 1, wherein said reference style is a style of at least one reference audio signal (target singer track database, see par. [0029]).
Regarding claim 20, Haupt teaches the electronic device according to claim 19, wherein said at least one reference audio signal comprises a speech content (vocal tract of the singer, see par. [0029]).
Regarding claim 21, Haupt teaches the electronic device according to claim 19, wherein said at least one reference audio signal comprises an audio content other than a speech content (instrumental recording, see par. [0049]).
Regarding claim 24, Marsh teaches the electronic device according to claim 19, wherein obtaining said at least one reference style feature comprises at least one of:
subband filtering of said at least one reference audio signal (to adjust the formant frequency without substantially affecting the pitch, the filter's transmission function can be linearly rescaled on the frequency axis, see col. 4 lines 7-15);
obtaining an envelope of said at least one filtered reference audio signal (Multiplying the modified spectral envelope with the excitation spectrum then yields the spectrum of the formant-adjusted signal, see col. 6 lines 17-29);
and modulating said obtained envelope (rescaling may be applied to the excitation spectrum to accomplish pitch adjustment, see col. 6 lines 17-29).
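For illustration only, the Marsh passages cited for claim 24 (linearly rescaling the filter's transmission function on the frequency axis, then multiplying the modified spectral envelope with the excitation spectrum) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the function names `rescale_on_frequency_axis` and `adjust_formants` are hypothetical, the spectra are assumed to be one-sided linear magnitude spectra with a strictly positive envelope, and linear interpolation is assumed as the resampling method.

```python
import numpy as np

def rescale_on_frequency_axis(spectrum, factor):
    """Linearly stretch (factor > 1) or compress (factor < 1) a spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    bins = np.arange(len(spectrum))
    # Output bin k reads the input at position k / factor; positions past
    # the last bin are clamped to the last value by np.interp.
    return np.interp(bins / factor, bins, spectrum)

def adjust_formants(magnitude, envelope, factor):
    """Move formants by `factor` while leaving the excitation untouched."""
    magnitude = np.asarray(magnitude, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    # Source-filter split: excitation = spectrum / envelope.
    excitation = magnitude / envelope
    # Rescale only the envelope (the filter's transmission function).
    shifted_envelope = rescale_on_frequency_axis(envelope, factor)
    # Multiply the modified envelope with the unchanged excitation.
    return shifted_envelope * excitation
```

Applying `rescale_on_frequency_axis` to the excitation instead of the envelope would correspond to the pitch adjustment mentioned in the same passage.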
Regarding claim 25, Marsh teaches the electronic device according to claim 1, wherein obtaining said at least one base style feature comprises at least one of:
subband filtering of said at least one base audio signal (to adjust the formant frequency without substantially affecting the pitch, the filter's transmission function can be linearly rescaled on the frequency axis, see col. 4 lines 7-15);
obtaining an envelope of said at least one filtered base audio signal (Multiplying the modified spectral envelope with the excitation spectrum then yields the spectrum of the formant-adjusted signal, see col. 6 lines 17-29);
and modulating said obtained envelope (rescaling may be applied to the excitation spectrum to accomplish pitch adjustment, see col. 6 lines 17-29).
Regarding claim 26, Haupt teaches a method comprising:
obtaining at least one base audio signal (voice analysis of spoken fragments of the source speaker, see par. [0030]); 
and generating at least one output audio signal from said at least one base audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases (voice analysis of spoken fragments of the source speaker, voice analysis of the target singer, the re-estimation of parameters of the source speaker to match the target speaker, and the re-synthesis of the source model to make a singing voice, see par. [0030]; A singing voice model may be added to change characteristics of speech into singing. This includes, but is not limited to, phoneme segments of the spoken voice to match the singer's voice, see par. [0037]), 
wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal (a singing voice sample can be synthesized from these variables and subjected to a post voice synthesis analysis 80 by means of a correction unit 82 added to reduce any artifacts from the source-filter analysis. With the timing information 84 of the triphones uttered in the singer database 28, the resultant speech sample after voice synthesis 86 is then placed in a signal timed in such a manner that the sung voice and the newly formed sample occur at the exact same point in the song. The resulting track 88 will be singing in a speaker's voice in the manner of a target singer. Thus the invention achieves the effect of modifying a speaker's voice to sound as if singing in the same manner as a singer, see par. [0042]).
Although Haupt teaches using Fourier transforms (see par. [0038]), it does not teach iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases.
In the same field of endeavor Marsh teaches real-time voice masking using cepstral analysis based on pitch and formant parameters to synthesize a modified signal, see abstract. 
FIG. 10 illustrates an example envelope estimation subroutine using true envelope estimation. In step 1010, the input signal may optionally be subsampled, or decimated, on the frequency scale to reduce the computational complexity and memory requirements of envelope estimation. For example, the magnitude spectral envelope may be calculated at a frequency resolution that is lower than the signal by a factor of 2, 4, 8 or 16. This may allow significant performance benefits during subsequent steps, for example by making the Fast Fourier Transformation size executed within the iteration step smaller. In one embodiment, the magnitude spectral envelope is calculated at a frequency resolution that is lower than the frequency resolution of the signal by a factor of 2. In step 1020, variables are initialized to prepare for a first iteration. The iteration counter n is initialized with 1 to reflect that this is the first iteration. The spectrum of the signal, as subsampled in step 1010, is referred to as X(k), and A_0 is initialized as the natural logarithm of that spectrum, A_0 = log(X(k)). The algorithm then proceeds to step 1030, which may be considered the first step of the iteration. In step 1040, a cepstrum C_n is then calculated from A_n(k) by performing a Fourier transformation on A_n(k). In step 1050, smoothing, such as, for example, low-pass filtering, is applied to the cepstrum C_n calculated in step 1040. In step 1060, the cepstrum C_n is transformed back into the frequency domain by using Fourier transformation. In step 1070, a termination criterion is applied to decide whether to perform another iteration. For example, step 1070 may lead to termination when a set number of rounds has been performed, and/or upon observing that log(X(k)) and C_n have converged sufficiently close.
In one embodiment, the iterative smoothing may be stopped once 16 rounds have been performed or upon the maximum difference between log(X(k)) and C_n being below 0.23 for all frequency. Advantageously, this allows for the execution time of the smoothing algorithm to be assigned an upper limit, for example to support real-time operation, while still performing as many iterations as feasible within that limit, see col. 10 line 65 to col. 12 line 3.
It would have been obvious to one of ordinary skill in the art to combine the Haupt invention with the teachings of Marsh in order to reduce the computational complexity and memory requirements of envelope estimation, see col. 11 lines 1-4.

Regarding claim 28, Haupt teaches the method according to claim 26, wherein said reference style is a style of at least one reference audio signal (target singer track database, see par. [0029]).
Regarding claim 29, Haupt teaches the method according to claim 28, wherein said at least one reference audio signal comprises a speech content (voice fragments, see par. [0030]).
Regarding claim 30, Haupt teaches the method according to claim 28, wherein said at least one reference audio signal comprises an audio content other than a speech content (instrumental recording, see par. [0049]).
Regarding claim 33, Marsh teaches the method according to claim 28, wherein obtaining said at least one reference style feature comprises at least one of:
subband filtering of said at least one reference audio signal (to adjust the formant frequency without substantially affecting the pitch, the filter's transmission function can be linearly rescaled on the frequency axis, see col. 4 lines 7-15);
obtaining an envelope of said at least one filtered reference audio signal (Multiplying the modified spectral envelope with the excitation spectrum then yields the spectrum of the formant-adjusted signal, see col. 6 lines 17-29);
and modulating said obtained envelope (rescaling may be applied to the excitation spectrum to accomplish pitch adjustment, see col. 6 lines 17-29).
Regarding claim 34, Marsh teaches the method according to claim 26, wherein obtaining said at least one base style feature comprises at least one of:
subband filtering of said at least one base audio signal (to adjust the formant frequency without substantially affecting the pitch, the filter's transmission function can be linearly rescaled on the frequency axis, see col. 4 lines 7-15);
obtaining an envelope of said at least one filtered base audio signal (Multiplying the modified spectral envelope with the excitation spectrum then yields the spectrum of the formant-adjusted signal, see col. 6 lines 17-29);
and modulating said obtained envelope (rescaling may be applied to the excitation spectrum to accomplish pitch adjustment, see col. 6 lines 17-29).

 
Regarding claim 35, Haupt teaches a non-transitory computer readable storage medium (a computer readable medium containing instruction to carry the method of the invention to perform the sequencing of the steps needed for the PC to perform the intended processing, see par. [0010]), comprising program code instructions executable by a processor, for:
obtaining at least one base audio signal (voice analysis of spoken fragments of the source speaker, see par. [0030]); 
and generating at least one output audio signal from said at least one base audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases (voice analysis of spoken fragments of the source speaker, voice analysis of the target singer, the re-estimation of parameters of the source speaker to match the target speaker, and the re-synthesis of the source model to make a singing voice, see par. [0030]; A singing voice model may be added to change characteristics of speech into singing. This includes, but is not limited to, phoneme segments of the spoken voice to match the singer's voice, see par. [0037]), 
wherein said same temporal portion of said at least one base audio signal is iteratively modified until said distance reaches a value and wherein said at least one base audio signal comprises an audio content other than a speech content, the audio content being iteratively modified according to the reference style to be included in the at least one output audio signal (a singing voice sample can be synthesized from these variables and subjected to a post voice synthesis analysis 80 by means of a correction unit 82 added to reduce any artifacts from the source-filter analysis. With the timing information 84 of the triphones uttered in the singer database 28, the resultant speech sample after voice synthesis 86 is then placed in a signal timed in such a manner that the sung voice and the newly formed sample occur at the exact same point in the song. The resulting track 88 will be singing in a speaker's voice in the manner of a target singer. Thus the invention achieves the effect of modifying a speaker's voice to sound as if singing in the same manner as a singer, see par. [0042]).
Although Haupt teaches using Fourier transforms (see par. [0038]), it does not teach iteratively modifying a same temporal portion of said at least one base audio signal to gradually transform said same temporal portion of said at least one base audio signal into a corresponding temporal portion of said at least one output audio signal such that a distance between at least one base style feature representative of a base style of said at least one base audio signal and at least one reference style feature representative of a reference style decreases.
In the same field of endeavor Marsh teaches real-time voice masking using cepstral analysis based on pitch and formant parameters to synthesize a modified signal, see abstract. 
FIG. 10 illustrates an example envelope estimation subroutine using true envelope estimation. In step 1010, the input signal may optionally be subsampled, or decimated, on the frequency scale to reduce the computational complexity and memory requirements of envelope estimation. For example, the magnitude spectral envelope may be calculated at a frequency resolution that is lower than the signal by a factor of 2, 4, 8 or 16. This may allow significant performance benefits during subsequent steps, for example by making the Fast Fourier Transformation size executed within the iteration step smaller. In one embodiment, the magnitude spectral envelope is calculated at a frequency resolution that is lower than the frequency resolution of the signal by a factor of 2. In step 1020, variables are initialized to prepare for a first iteration. The iteration counter n is initialized with 1 to reflect that this is the first iteration. The spectrum of the signal, as subsampled in step 1010, is referred to as X(k), and A_0 is initialized as the natural logarithm of that spectrum, A_0 = log(X(k)). The algorithm then proceeds to step 1030, which may be considered the first step of the iteration. In step 1040, a cepstrum C_n is then calculated from A_n(k) by performing a Fourier transformation on A_n(k). In step 1050, smoothing, such as, for example, low-pass filtering, is applied to the cepstrum C_n calculated in step 1040. In step 1060, the cepstrum C_n is transformed back into the frequency domain by using Fourier transformation. In step 1070, a termination criterion is applied to decide whether to perform another iteration. For example, step 1070 may lead to termination when a set number of rounds has been performed, and/or upon observing that log(X(k)) and C_n have converged sufficiently close.
In one embodiment, the iterative smoothing may be stopped once 16 rounds have been performed or upon the maximum difference between log(X(k)) and C_n being below 0.23 for all frequency. Advantageously, this allows for the execution time of the smoothing algorithm to be assigned an upper limit, for example to support real-time operation, while still performing as many iterations as feasible within that limit, see col. 10 line 65 to col. 12 line 3.
It would have been obvious to one of ordinary skill in the art to combine the Haupt invention with the teachings of Marsh in order to reduce the computational complexity and memory requirements of envelope estimation, see col. 11 lines 1-4.


Claims 23 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Haupt U.S. PAP 2013/019738 A1, in view of Marsh U.S. Patent No. 9,947,341 B1, further in view of Theverapperuma U.S. PAP 2018/0033449 A1.



Regarding claim 23, Haupt in view of Marsh does not teach the electronic device according to claim 19, wherein at least one of said at least one reference style feature and said at least one base style feature is obtained by processing at least one of said at least one reference audio signal and said at least one base audio signal in at least one neural network.
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the invention of Haupt in view of Marsh with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].
Regarding claim 32, Haupt in view of Marsh does not teach the method according to claim 28, wherein at least one of said at least one reference style feature and said at least one base style feature is obtained by processing at least one of said at least one reference audio signal and said at least one base audio signal in at least one neural network.
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the invention of Haupt in view of Marsh with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The pertinent prior art is listed on form PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9 AM-6 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ORTIZ-SANCHEZ/ Primary Examiner, Art Unit 2656