DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments

Applicant’s arguments with respect to claim(s) 1, 17, 19-26, 28-35 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant cancelled claim 36, so the rejection under 35 U.S.C. 101 is moot.
Applicant amended the claims to include “wherein said at least one input audio signal comprises an audio content other than a speech content, the audio content being modified to be included in the at least one output audio signal”. A new search was made and art was found in Basu, which teaches a "Concatenative Synthesizer" that applies concatenative synthesis to create a musical output from a database of musical notes and an input musical score, see abstract. The Concatenative Synthesizer begins operation by receiving one or more music texture databases 315 selected via a user control module 335. As noted above, these music texture databases each represent different musical genres, performers, performances, instrument recordings, etc. that are to be emulated in constructing the musical output. Given a sound sample A' 310 (audio content other than speech content), and possibly a corresponding musical score A 305, see par. [0060]. A candidate assembly module 350 uses concatenative synthesis to combine the sequence of notes from the music texture database 315 corresponding to the optimal path. Finally, the candidate assembly module 350 then outputs either an audio music output sound B' 355, or a new music score B.sub.new 360, or both, see par. [0065]. See par. [0060-0065].


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 17, 19-20, 22, 24-26, 28-29, 31, 33-35 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tamura U.S. PAP 2007/0168189 A1, in view of Basu U.S. PAP 2007/0289432 A1.

Regarding claim 1 Tamura teaches an electronic device comprising at least one memory and one or several processors configured for (speech processing apparatus, see par. [0011]):

generating at least one output audio signal from said at least one base audio signal, said at least one output audio signal having style features obtained by modifying said at least one base audio signal so that a distance between at least one base style feature representative of a style of said at least one base signal and at least one reference style feature decreases (a voice conversion  making means to make speech conversion functions for converting the one of a plurality of source speaker speech units to target speaker speech units, see par. [0011]). 
However Tamura does not teach wherein said at least one input audio signal comprises an audio content other than a speech content, the audio content being modified to be included in the at least one output audio signal.
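In a similar field of endeavor Basu teaches a "Concatenative Synthesizer" that applies concatenative synthesis to create a musical output from a database of musical notes and an input musical score, see abstract. The Concatenative Synthesizer begins operation by receiving one or more music texture databases 315 selected via a user control module 335. As noted above, these music texture databases each represent different musical genres, performers, performances, instrument recordings, etc. that are to be emulated in constructing the musical output. Given a sound sample A' 310 (audio content other than speech content), and possibly a corresponding musical score A 305, see par. [0060]. A candidate assembly module 350 uses concatenative synthesis to combine the sequence of notes from the music texture database 315 corresponding to the optimal path. Finally, the candidate assembly module 350 then outputs either an audio music output sound B' 355, or a new music score B.sub.new 360, or both, see par. [0065]. See par. [0060-0065].
It would have been obvious to one of ordinary skill in the art to combine the Tamura invention with the teachings of Basu for the benefit of better fitting musical notes to an input, see par. [0011].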
Regarding claim 17 Tamura teaches the electronic device according to claim 1, wherein said at least one input audio signal comprises a speech content (input speech data, see par. [0052]). 
Regarding claim 19 Tamura teaches the electronic device according to claim 1, wherein said at least one reference style feature is representative of a style of at least one reference audio signal (attribute-information generating means generates attribute information corresponding to the extracted conversion-target-speaker speech units, see par. [0056]). 
Regarding claim 20 Tamura teaches the electronic device according to claim 19, wherein said at least one reference audio signal comprises a speech content (attribute-information generating means generates attribute information corresponding to the extracted conversion-target-speaker speech units, see par. [0056]).
Regarding claim 22 Tamura teaches the electronic device according to claim 1, wherein modifying said at least one base audio signal takes into account a distance between at least one input content feature representative of a content of said at least one input signal and at least one base content feature representative of a content of said at least one base audio signal (the voice conversion rule includes a translation distance between a spectrum parameter of the conversion-source speaker and the conversion-target-speaker, see par. [0107]).
Regarding claim 24 Tamura teaches the electronic device according to claim 1, wherein obtaining said at least one reference style feature comprises at least one of: subband filtering of said at least one reference audio signal; obtaining an envelope of said at least one subband filtered signal; modulating said obtained envelope (pitch-cycle waveforms are generated by inverse Fourier transformation, the pitch-cycle waveforms may be regenerated by filtering with appropriate voice-source information, for mel-cepstrum coefficients, the waveforms can be generated with voice-source information and a spectrum envelope parameter, see par. [0171]).
Regarding claim 25 Tamura teaches the electronic device according to claim 1, wherein obtaining said at least one base style feature comprises at least one of: subband filtering of said at least one base signal; obtaining an envelope of said at least one subband filtered base signal; modulating said obtained envelope (pitch-cycle waveforms are generated by inverse Fourier transformation, the pitch-cycle waveforms may be regenerated by filtering with appropriate voice-source information, for mel-cepstrum coefficients, the waveforms can be generated with voice-source information and a spectrum envelope parameter, see par. [0171]).
Regarding claim 26 Tamura teaches a method, said method comprising:

generating at least one output audio signal from said at least one base signal, said at least one output audio signal having style features obtained by modifying said at least one base audio signal so that a distance between at least one base style feature representative of a style of said at least one base audio signal and at least one reference style feature decreases (a voice conversion  making means to make speech conversion functions for converting the one of a plurality of source speaker speech units to target speaker speech units, see par. [0011]). 
However Tamura does not teach wherein said at least one input audio signal comprises an audio content other than a speech content, the audio content being modified to be included in the at least one output audio signal.
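In a similar field of endeavor Basu teaches a "Concatenative Synthesizer" that applies concatenative synthesis to create a musical output from a database of musical notes and an input musical score, see abstract. The Concatenative Synthesizer begins operation by receiving one or more music texture databases 315 selected via a user control module 335. As noted above, these music texture databases each represent different musical genres, performers, performances, instrument recordings, etc. that are to be emulated in constructing the musical output. Given a sound sample A' 310 (audio content other than speech content), and possibly a corresponding musical score A 305, see par. [0060]. A candidate assembly module 350 uses concatenative synthesis to combine the sequence of notes from the music texture database 315 corresponding to the optimal path. Finally, the candidate assembly module 350 then outputs either an audio music output sound B' 355, or a new music score B.sub.new 360, or both, see par. [0065]. See par. [0060-0065].
It would have been obvious to one of ordinary skill in the art to combine the Tamura invention with the teachings of Basu for the benefit of better fitting musical notes to an input, see par. [0011].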

Regarding claim 28 Tamura teaches the method according to claim 26, wherein said at least one reference style feature is representative of a style of at least one reference audio signal (attribute-information generating means generates attribute information corresponding to the extracted conversion-target-speaker speech units, see par. [0056]). 
Regarding claim 29 Tamura teaches the method according to claim 28, wherein said at least one reference audio signal comprises a speech content (attribute-information generating means generates attribute information corresponding to the extracted conversion-target-speaker speech units, see par. [0056]). 
Regarding claim 31 Tamura teaches the method according to claim 26, wherein modifying said at least one base audio signal takes into account a distance between at least one input content feature representative of a content of said at least one input signal and at least one base content feature representative of a content of said at least one base audio signal (the voice conversion rule includes a translation distance between a spectrum parameter of the conversion-source speaker and the conversion-target-speaker, see par. [0107]).
Regarding claim 33 Tamura teaches the method according to claim 26, wherein obtaining said at least one reference style feature comprises at least one of: subband filtering of said at least one reference audio signal; obtaining an envelope of said at least one subband filtered signal; modulating said obtained envelope (pitch-cycle waveforms are generated by inverse Fourier transformation, the pitch-cycle waveforms may be regenerated by filtering with appropriate voice-source information, for mel-cepstrum coefficients, the waveforms can be generated with voice-source information and a spectrum envelope parameter, see par. [0171]).
Regarding claim 34 Tamura teaches the method according to claim 26, wherein obtaining said at least one base style feature comprises at least one of: subband filtering of said at least one base audio signal; obtaining an envelope of said at least one subband filtered base signal; modulating said obtained envelope (pitch-cycle waveforms are generated by inverse Fourier transformation, the pitch-cycle waveforms may be regenerated by filtering with appropriate voice-source information, for mel-cepstrum coefficients, the waveforms can be generated with voice-source information and a spectrum envelope parameter, see par. [0171]).
Regarding claim 35 Tamura teaches a non-transitory computer readable storage medium, comprising program code instructions executable by a processor, for (program processing speech, see claim 13), said method comprising:
obtaining at least one base audio signal being a copy of at least one input audio signal (conversion-source-speaker speech storing means configured to store information on a plurality of speech units of a conversion-source speaker, see par. [0011,0052]); 
generating at least one output audio signal from said at least one base signal, said at least one output audio signal having style features obtained by modifying said at least one base audio signal so that a distance between at least one base style feature representative of a style of said at least one base audio signal and at least one reference style feature decreases (a voice conversion  making means to make speech conversion functions for converting the one of a plurality of source speaker speech units to target speaker speech units, see par. [0011]). 
However Tamura does not teach wherein said at least one input audio signal comprises an audio content other than a speech content, the audio content being modified to be included in the at least one output audio signal.
In a similar field of endeavor Basu teaches a "Concatenative Synthesizer" that applies concatenative synthesis to create a musical output from a database of musical notes and an input musical score, see abstract. The Concatenative Synthesizer begins operation by receiving one or more music texture databases 315 selected via a user control module 335. As noted above, these music texture databases each represent different musical genres, performers, performances, instrument recordings, etc. that are to be emulated in constructing the musical output. Given a sound sample A' 310 (audio content other than speech content), and possibly a corresponding musical score A 305, see par. [0060]. A candidate assembly module 350 uses concatenative synthesis to combine the sequence of notes from the music texture database 315 corresponding to the optimal path. Finally, the candidate assembly module 350 then outputs either an audio music output sound B' 355, or a new music score B.sub.new 360, or both, see par. [0065]. See par. [0060-0065].
It would have been obvious to one of ordinary skill in the art to combine the Tamura invention with the teachings of Basu for the benefit of better fitting musical notes to an input, see par. [0011].



Claims 21, 23, 30, 32 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tamura U.S. PAP 2007/0168189 A1, in view of Basu U.S. PAP 2007/0289432 A1, further in view of Theverapperuma U.S. PAP 2018/0033449 A1.

Regarding claim 21 Tamura in view of Basu does not teach the electronic device according to claim 19, wherein said at least one reference audio signal comprises an audio content other than a speech content.
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the Tamura in view of Basu invention with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].
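Regarding claim 23 Tamura in view of Basu does not teach the electronic device according to claim 1, wherein at least one of said reference style feature, said at least one input content feature, said at least one base style feature and said at least one base content feature is obtained by processing at least one of said at least one input audio signal, said at least one reference audio signal and/or said at least one base audio signal in at least one neural network.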
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the Tamura in view of Basu invention with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].

Regarding claim 30 Tamura in view of Basu does not teach the method according to claim 28, wherein said at least one reference audio signal comprises an audio content other than a speech content. 
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the Tamura in view of Basu invention with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].
Regarding claim 32 Tamura in view of Basu does not teach the method according to claim 26, wherein at least one of said at least one reference style feature, said at least one input content feature, said at least one base style feature and said at least one base content feature is obtained by processing at least one of said at least one input audio signal, said at least one reference audio signal and/or said at least one base audio signal in at least one neural network. 
In a similar field of endeavor Theverapperuma teaches a system and method of speech enhancement using a deep neural network-based combined signal, see par. [0001]. As the user is using the headset or directly using the electronic device to transmit his speech, environmental noise may also be present (e.g., noise sources in FIG. 1), see par. [0014]. The noise suppressor 160 receives the acoustic signal from the microphone 120, the noise reference signal (non-speech) from the speech suppressor 150, and the speech reference signal from the neural network 140 and generates an enhanced speech signal, see par. [0044].
It would have been obvious to one of ordinary skill in the art to combine the Tamura in view of Basu invention with the teachings of Theverapperuma for the benefit of suppressing noise (non-speech) and enhancing the output signal, see par. [0044].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art available on form 892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez, whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to 






/MICHAEL ORTIZ-SANCHEZ/
Primary Examiner, Art Unit 2656