DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 8, 14, 17 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because they are directed to software programs, which do not fall within the statutory categories of invention.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: in claim 7, “a step of converting acoustic data…”; in claim 15, “training unit configured to…”; in claim 17, “step for training a discriminator”; in claim 18, “a training unit configured to…”; and in claim 10, “a step of training…”.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 15 and 18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the enablement requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. Claims 15 and 18 are directed to a single means of training a voice quality converter.
A single means claim, i.e., where a means recitation does not appear in combination with another recited element of means, is subject to an enablement rejection under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph. In re Hyatt, 708 F.2d 712, 714-715, 218 USPQ 195, 197 (Fed. Cir. 1983) (A single means claim which covered every conceivable means for achieving the stated purpose was held nonenabling for the scope of the claim because the specification disclosed at most only those means known to the inventor.). When claims depend on a recited property, a fact situation comparable to Hyatt is possible, where the claim covers every conceivable structure (means) for achieving the stated property (result) while the specification discloses at most only those known to the inventor.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-5, 7, 8 and 16-20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nakashika U.S. PAP 2019/0051314 A1.


Regarding claim 1 Nakashika teaches a signal processing apparatus (voice conversion device, see abstract) comprising: 
a voice quality conversion unit configured to convert acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on a basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract), the acoustic data being different from parallel data or clean data (non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 2 Nakashika teaches the signal processing apparatus according to claim 1, wherein the training data includes acoustic data of a sound of the input sound source or acoustic data of a sound of the target sound source (the parameter learning unit, the parameters are determined by performing learning by sequentially inputting the speech information and the speaker information corresponding to the speech information into the probabilistic model, see par. [0010]).
Regarding claim 3 Nakashika teaches the signal processing apparatus according to claim 1, wherein the voice quality converter parameter is obtained by training using the training data and a discriminator parameter for discriminating a sound source of input acoustic data obtained by training using the training data (The speech signal for learning may either be a speech signal based on speech data recorded in advance, or a speech signal obtained by directly converting a speech vocalized by a speaker through a microphone. The corresponding speaker information is not particularly limited as long as it can discriminate whether one speech signal for learning and another speech signal for learning are speech signals caused by the same speaker or by different speakers, see par. [0023]).
Regarding claim 4 Nakashika teaches the signal processing apparatus according to claim 3, wherein the training data of a sound of another sound source different from the input sound source and the target sound source is used for training the discriminator parameter (speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]).
Regarding claim 5 Nakashika teaches the signal processing apparatus according to claim 3, wherein the training data of a sound of the target sound source is used for training the discriminator parameter (converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]), and only the training data of a sound of the input sound source is used as the training data for training the voice quality converter parameter (the speaker information setting section 123 is adapted to set a target speaker (which is a voice conversion destination), and output target speaker information. Here, the target speaker to be set by the speaker information setting section 123 is selected from speakers whose speaker information is acquired by the parameter estimating section 114 of the parameter learning unit 11 by performing learning processing in advance, see par. [0038]).
Regarding claim 7 Nakashika teaches a signal processing method (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]), by a signal processing apparatus, comprising: 
converting acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on a basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech; a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract), the acoustic data being different from parallel data or clean data (non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 8 Nakashika teaches a program that causes a computer to execute process (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]) comprising:
a step of converting acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on a basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract), the acoustic data being different from parallel data or clean data (non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 16 Nakashika teaches a training method, by a training apparatus, comprising: training a discriminator parameter for discriminating a sound source of input acoustic data using each acoustic data for each of a plurality of sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 17 Nakashika teaches a program that causes a computer to execute processing including: a step of training a discriminator parameter for discriminating a sound source of input acoustic data using each acoustic data for each of a plurality of sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 18 Nakashika teaches a training apparatus comprising: a training unit configured to train a voice quality converter parameter for converting acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 19 Nakashika teaches a training method, by a training apparatus, comprising: training a voice quality converter parameter for converting acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 20 Nakashika teaches a program that causes a computer to execute processing comprising: a step of training a voice quality converter parameter for converting acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech;  a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 6 and 9-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Nakashika U.S. PAP 2019/0051314 A1 in view of Sako U.S. PAP 2015/0356980 A1.

Regarding claim 6 Nakashika does not teach the signal processing apparatus according to claim 1, wherein the training data is acoustic data obtained by performing sound source separation.
In a similar field of endeavor Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see par. [0101].
Regarding claim 9 Nakashika teaches a signal processing apparatus (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]) comprising: 
a voice quality conversion unit configured to perform voice quality conversion on the acoustic data of the target sound (a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract); 
and a synthesizing unit configured to synthesize acoustic data obtained by the voice quality conversion and acoustic data of the non-target sound (the converted speech signal generated by the post-processing section 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However Nakashika does not teach a sound source separation unit configured to separate predetermined acoustic data into acoustic data of a target sound and acoustic data of a non-target sound by sound source separation.
In a similar field of endeavor Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see par. [0101].
Regarding claim 10 Sako teaches the signal processing apparatus according to claim 9, wherein the predetermined acoustic data is acoustic data of a mixed sound including the target sound (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]).
Regarding claim 11 Sako teaches the signal processing apparatus according to claim 9, wherein the predetermined acoustic data is clean data of the target sound (estimating a first voice signal heard by a specific user himself/herself, see par. [0010]).
Regarding claim 12 Nakashika teaches the signal processing apparatus according to claim 9, wherein the voice quality conversion unit performs the voice quality conversion on a basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more of sound sources as training data, the acoustic data being different from parallel data or clean data (non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Regarding claim 13 Nakashika teaches a signal processing method, by a signal processing apparatus (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]), comprising: 
performing voice quality conversion on the acoustic data of the target sound (a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract);
and synthesizing acoustic data obtained by the voice quality conversion and acoustic data of the non-target sound (the converted speech signal generated by the post-processing section 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However Nakashika does not teach separating predetermined acoustic data into acoustic data of a target sound and acoustic data of a non-target sound by sound source separation.
In a similar field of endeavor Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see par. [0101].
Regarding claim 14 Nakashika teaches a program that causes a computer to execute processing comprising the steps of:
performing voice quality conversion on the acoustic data of the target sound (a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract);
and synthesizing acoustic data obtained by the voice quality conversion and acoustic data of the non-target sound (the converted speech signal generated by the post-processing section 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However Nakashika does not teach separating predetermined acoustic data into acoustic data of a target sound and acoustic data of a non-target sound by sound source separation.
In a similar field of endeavor Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [00100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a specific user's voice signals in the input, see Sako, par. [0101].
Regarding claim 15, Nakashika teaches a training apparatus comprising: a training unit configured to train a discriminator parameter for discriminating a sound source of input acoustic data using each acoustic data for each of a plurality of sound sources as training data, the acoustic data being different from parallel data or clean data (a parameter learning unit in which a probabilistic model that uses speech information, speaker information, and phonological information as variables to thereby express relationships among binding energies between any two of the speech information, the speaker information and the phonological information by parameters is prepared, wherein the speech information is obtained based on a speech; a voice conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract; non-parallel voice conversion can perform learning using free utterance, and therefore is superior in terms of convenience and usefulness, see par. [0004]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The pertinent prior art is listed on form 892.
Chen ’822 teaches a speech separator which discriminates between multiple speakers, see abstract.
Duong ’499 teaches a method for modifying the audio style of an object using features, without relying on a large amount of data, see abstract.
Fukuda ’197 teaches methods and a system for separating target speech from other speech, see abstract.
Tamura ’189 teaches voice conversion using target and source speaker attribute information, see abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez, whose telephone number is (571) 270-3711. The examiner can normally be reached Monday-Friday, 9 AM-6 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL ORTIZ-SANCHEZ/
Primary Examiner, Art Unit 2656