DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see remarks, filed 8/24/2022, with respect to claims 8, 14, 17, and 20 have been fully considered and are persuasive. The 35 U.S.C. 101 rejection of claims 8, 14, 17, and 20 has been withdrawn.
Applicant's arguments filed 8/24/2022 have been fully considered but they are not persuasive.
Applicant argues that Nakashika does not expressly or inherently describe at least, for example, the features of "receive a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter, the discriminator parameter discriminates a sound source of the first acoustic data, the first training data is based on second acoustic data of mixed sound, the mixed sound includes the sound of the input sound source and a sound of a target sound source ... convert the first acoustic data of the input sound source to third acoustic data of voice quality of the target sound source, wherein the conversion of the first acoustic data to the third acoustic data is based on the voice quality converter parameter," as recited in amended independent claim 1. Applicant further argues that Nakashika, in its entirety, does not describe receiving a voice quality converter parameter that is based on the learning data and a discriminator parameter, where the learning data is based on acoustic data of mixed sound including sounds of different sound sources. The examiner disagrees. Nakashika teaches converting the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]. Additionally, Nakashika teaches that the present invention also includes a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079].

Applicant argues that Nakashika further does not describe that the voice conversion device performs voice conversion of the acoustic data of an input sound source to acoustic data of voice quality of a target sound source based on the voice quality converter parameter.  The examiner disagrees. Nakashika teaches a voice quality conversion processing unit that performs voice conversion processing of the speech information obtained on the basis of the speech of an input speaker, based both on the parameters determined by the parameter learning unit and on the speaker information of a target speaker, see abstract and title.
Applicant further argues that Nakashika does not expressly or inherently describe the above-quoted features of amended independent claim 1. The examiner disagrees. Nakashika teaches that the voice conversion device 1 includes a parameter learning unit 11 and a voice conversion processing unit 12. The parameter learning unit 11 is adapted to determine parameters for voice conversion by performing learning based on the speech signal for learning and the corresponding speaker information. After the parameters are determined by performing the aforesaid learning, the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker (referred to as “target speaker information” hereinafter), and outputs the voice of the target speaker as the converted speech signal, see par. [0024]. Nakashika further teaches that the present invention also includes a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned, as long as learning for obtaining various kinds of information described in the aforesaid embodiment can be performed. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079].
Regarding the 35 U.S.C. 112 rejections of claims 15 and 18, Applicant’s amendment has overcome the single means claim rejections.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: in claim 7, a step of converting acoustic data…; in claim 10, a step of training…; in claim 15, training unit configured to…; in claim 17, step for training a discriminator; and in claim 18, a training unit configured to….
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4, 5, 7, 8, and 15-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nakashika U.S. PAP 2019/0051314 A1.


Regarding claim 1, Nakashika teaches a signal processing apparatus (voice conversion device, see abstract), comprising:
a central processing unit (CPU) (the voice conversion device 1 includes a Central Processing Unit, see par. [0049]) configured to:
receive first acoustic data of a sound of an input sound source (the speech signal acquisition section 111 is adapted to acquire the speech signal for learning from an external device, see par. [0026]);
receive a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]), 
the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, see par. [0028]), 
the first training data is based on second acoustic data of mixed sound (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]), 
the mixed sound includes the sound of the input sound source and a sound of a target sound source (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also includes a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the target sound source is different from the input sound source (convert a speech signal for conversion caused by an arbitrary speaker into a voice of a target speaker, see par. [0022]), and the second acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]); 
and convert the first acoustic data of the input sound source to third acoustic data of voice quality of the target sound source, wherein the conversion of the first acoustic data to the third acoustic data is based on the voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).
Regarding claim 2, Nakashika teaches the signal processing apparatus according to claim 1, wherein the first training data includes one of the first acoustic data of the sound of the input sound source or the third acoustic data of the sound of the target sound source (in the parameter learning unit, the parameters are determined by performing learning by sequentially inputting the speech information and the speaker information corresponding to the speech information into the probabilistic model, see par. [0010]).
Regarding claim 4, Nakashika teaches the signal processing apparatus according to claim 1, wherein the discrimination parameter is trained based on second training data of a sound of a sound source different from the input sound source and the target sound (speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning; the parameter learning unit 11 learns speaking voices of 10 speakers, see par. [0028]).
Regarding claim 5, Nakashika teaches the signal processing apparatus according to claim 1, wherein the discriminator parameter is trained based on second training data of the sound of the target sound source (converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]), and the voice quality converter parameter is trained based on the first training data of the sound of the input sound source (the speaker information setting section 123 is adapted to set a target speaker (which is a voice conversion destination), and output target speaker information. Here, the target speaker to be set by the speaker information setting section 123 is selected from speakers whose speaker information is acquired by the parameter estimating section 114 of the parameter learning unit 11 by performing learning processing in advance, see par. [0038]).
Regarding claim 7, Nakashika teaches a signal processing method (the invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]), comprising:
receiving first acoustic data of a sound of an input sound source (the speech signal acquisition section 111 is adapted to acquire the speech signal for learning from an external device, see par. [0026]);
receiving a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]), 
the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, see par. [0028]), 
the first training data is based on second acoustic data of mixed sound (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]), 
the mixed sound includes the sound of the input sound source and a sound of a target sound source (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also includes a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the target sound source is different from the input sound source (convert a speech signal for conversion caused by an arbitrary speaker into a voice of a target speaker, see par. [0022]), and the second acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]); 
and converting the first acoustic data of the input sound source to third acoustic data of voice quality of the target sound source, wherein the conversion of the first acoustic data to the third acoustic data is based on the voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).
Regarding claim 8, Nakashika teaches a non-transitory computer-readable medium having stored thereon computer-executable instructions, which when executed by a computer, cause the computer to execute operations, the operations (The program may either be acquired through a record medium, or through the network. Alternatively, the program may be used in a state where the program is incorporated into the ROM, see par. [0050]) comprising:
receiving first acoustic data of a sound of an input sound source (the speech signal acquisition section 111 is adapted to acquire the speech signal for learning from an external device, see par. [0026]);
receiving a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]), 
the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, see par. [0028]), 
the first training data is based on second acoustic data of mixed sound (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]), 
the mixed sound includes the sound of the input sound source and a sound of a target sound source (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also includes a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the target sound source is different from the input sound source (convert a speech signal for conversion caused by an arbitrary speaker into a voice of a target speaker, see par. [0022]), and the second acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]); 
and converting the first acoustic data of the input sound source to third acoustic data of voice quality of the target sound source, wherein the conversion of the first acoustic data to the third acoustic data is based on the voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).
Regarding claim 15, Nakashika teaches a training apparatus comprising:
a central processing
receive training data of first acoustic data of each of a plurality of sound sources, wherein the plurality of sound sources includes a target sound source and an input sound source, the training data is based on second acoustic data of mixed sound, the mixed sound includes a sound of the input sound source and a sound of the target sound source, and the target sound source is different from the input sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);
train a discriminator parameter based on the received training data, wherein the discriminator parameter is for discrimination of the input sound source (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]); 
generate a voice quality converter parameter based on the training data and the discriminator parameter, wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]);
and output the generated voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).

Regarding claim 16, Nakashika teaches a training method, comprising:
receiving training data of first acoustic data of each of a plurality of sound sources, wherein the plurality of sound sources includes a target sound source and an input sound source, the training data is based on second acoustic data of mixed sound, the mixed sound includes a sound of the input sound source and a sound of the target sound source, and the target sound source is different from the input sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);
training a discriminator parameter based on the received training data, wherein the discriminator parameter is for discrimination of the input sound source (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]); 
generating a voice quality converter parameter based on the training data and the discriminator parameter, wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]);
and outputting the generated voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).
Regarding claim 17, Nakashika teaches a non-transitory computer-readable medium having stored thereon computer-executable instructions, which when executed by a computer, cause the computer to execute operations (The program may either be acquired through a record medium, or through the network. Alternatively, the program may be used in a state where the program is incorporated into the ROM, see par. [0050]), the operations comprising:
receiving training data of first acoustic data of each of a plurality of sound sources, wherein the plurality of sound sources includes a target sound source and an input sound source, the training data is based on second acoustic data of mixed sound, the mixed sound includes a sound of the input sound source and a sound of the target sound source, and the target sound source is different from the input sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);
training a discriminator parameter based on the received training data, wherein the discriminator parameter is for discrimination of the input sound source (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]); 
generating a voice quality converter parameter based on the training data and the discriminator parameter, wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]);
and outputting the generated voice quality converter parameter (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]).

Regarding claim 18, Nakashika teaches a training apparatus comprising:
a central processing
receive first training data of an input sound source and a discriminator parameter, wherein the first training data is based on a mixed sound including a sound of the input sound source and a sound of a target sound source and the input sound source is different from the target sound source parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]);
and train a voice quality converter parameter for conversion of first acoustic data of the sound of the input sound source to second acoustic data of voice quality of the target sound source (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]),
wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the voice quality converter parameter is trained based on the received first training data of the input sound source and the discriminator parameter, and the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]).


Regarding claim 19, Nakashika teaches a training method, comprising: receiving first training data of an input sound source and a discriminator parameter, wherein the first training data is based on a mixed sound including a sound of the input sound source and a sound of a target sound source and the input sound source is different from the target sound source parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]);
and training a voice quality converter parameter for conversion of first acoustic data of the sound of the input sound source to second acoustic data of voice quality of the target sound source (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]),
wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the voice quality converter parameter is trained based on the received first training data of the input sound source and the discriminator parameter, and the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]).
Regarding claim 20 Nakashika teaches a non-transitory computer-readable medium having stored thereon computer-executable instructions, which when executed by a computer, cause the computer to execute operations (The program may either be acquired through a record medium, or through the network. Alternatively, the program may be used in a state where the program is incorporated into the ROM, see par. [0050]), the operations comprising
receiving first training data of an input sound source and a discriminator parameter, wherein the first training data is based on a mixed sound including a sound of the input sound source and a sound of a target sound source and the input sound source is different from the target sound source parameter (The corresponding speaker information acquisition section 113 is adapted to acquire the corresponding speaker information associated with the acquisition of the speech signal for learning by the speech signal acquisition section 111. The corresponding speaker information is not particularly limited as long as it can discriminate the speaker of one speech signal for learning from the speaker of another speech signal for learning, see par. [0028]);
and training a voice quality converter parameter for conversion of first acoustic data of the sound of the input sound source to second acoustic data of voice quality of the target sound source (a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]),
wherein the first acoustic data is different from parallel data and clean data (the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]; any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]),
the voice quality converter parameter is trained based on the received first training data of the input sound source and the discriminator parameter, and the discriminator parameter discriminates a sound source of the first acoustic data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 9, 11, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Nakashika (U.S. PAP 2019/0051314 A1) in view of Sako (U.S. PAP 2015/0356980 A1).

Regarding claim 6, Nakashika does not teach the signal processing apparatus according to claim 1, wherein the first training data is acoustic data and the first training data is further based on execution of sound source separation on the mixed sound.
In a similar field of endeavor, Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see Sako, par. [0101].
Regarding claim 9, Nakashika teaches a signal processing apparatus (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]) comprising:
a central processing apparatus configured to:
receive specific acoustic data of a mixed sound, wherein the mixed sound includes a target sound of a target sound source and a non-target sound of a non-target sound source, and the target sound source is different from the non-target sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);

 receive a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter, the discriminator parameter discriminates the target sound source of the first acoustic data, the first training data is based on the specific acoustic data of the mixed sound, and the second acoustic data is different from parallel data and clean data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]; the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also include a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]; the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]);
execute voice quality conversion on the first acoustic data of the target sound to obtain third acoustic data, wherein the conversion of the first acoustic data is based on the voice quality converter parameter, and the first acoustic data is different from parallel data and clean data (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]); 
and synthesize the third acoustic data (… 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However, Nakashika does not teach execute sound source separation to separate the specific
In a similar field of endeavor, Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see Sako, par. [0101].
Regarding claim 11, Sako teaches the signal processing apparatus according to claim 9, wherein the specific acoustic data is clean data of the target sound (estimating a first voice signal heard by a specific user himself/herself, see par. [0010]).
Regarding claim 13, Nakashika teaches a signal processing method (invention relates to a voice conversion device, a voice conversion method and a program that make it possible to perform voice conversion for an arbitrary speaker, see par. [0001]), comprising:
receiving specific acoustic data of a mixed sound, wherein the mixed sound includes a target sound of a target sound source and a non-target sound of a non-target sound source, and the target sound source is different from the non-target sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);
receiving a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter, the discriminator parameter discriminates the target sound source of the first acoustic data, the first training data is based on the specific acoustic data of the mixed sound, and the second acoustic data is different from parallel data and clean data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]; the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also include a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]; the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]);
executing voice quality conversion on the first acoustic data of the target sound to obtain third acoustic data, wherein the conversion of the first acoustic data is based on the voice quality converter parameter, and the first acoustic data is different from parallel data and clean data (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]); 
and synthesizing the third acoustic data (… 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However, Nakashika does not teach executing sound source separation to separate the specific
In a similar field of endeavor, Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see Sako, par. [0101].
Regarding claim 14, Nakashika teaches a non-transitory computer-readable medium having stored thereon computer-executable instructions, which when executed by a computer, cause the computer to execute operations (The program may either be acquired through a record medium, or through the network. Alternatively, the program may be used in a state where the program is incorporated into the ROM, see par. [0050]), the operations comprising: receiving specific acoustic data of a mixed sound, wherein the mixed sound includes a target sound of a target sound source and a non-target sound of a non-target sound source, and the target sound source is different from the non-target sound source (voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]);
receiving a voice quality converter parameter, wherein the voice quality converter parameter is based on first training data and a discriminator parameter, the discriminator parameter discriminates the target sound source of the first acoustic data, the first training data is based on the specific acoustic data of the mixed sound, and the second acoustic data is different from parallel data and clean data (the corresponding speaker information acquisition section 113 acquires information for distinguishing the speaker, among the speakers, whose speech signal for learning is being inputted into the speech signal acquisition section 111, a plurality of speech signals for learning respectively correspond to different speakers, see par. [0028]; the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]; the present invention also include a configuration in which, as the speech signal for learning (i.e., the input signal), a speech signal of various sounds other than the speaking voice of human may be learned. For example, any kinds of sounds, such as siren wailing, animal call and the like, may be learned, see par. [0079]; the non-parallel voice conversion is used. Compared to the parallel voice conversion which needs parallel data, the non-parallel voice conversion does not need parallel data, see par. [0007]);
executing voice quality conversion on the first acoustic data of the target sound to obtain third acoustic data, wherein the conversion of the first acoustic data is based on the voice quality converter parameter, and the first acoustic data is different from parallel data and clean data (the voice conversion processing unit 12 converts the voice of the speech signal for conversion into the voice of the target speaker based on the determined parameters and the information of the target speaker, see par. [0024]); 
and synthesizing the third acoustic data (… 125 is outputted to the outside by the speech signal output section 126, see par. [0072]).
However, Nakashika does not teach executing sound source separation to separate the specific
In a similar field of endeavor, Sako teaches an estimation unit 14B which can separate voice signals of a specific user from voice signals of other users, noise, and environmental sound among input voice signals into a sound source, and can perform an estimation process using an estimation filter, see par. [0098]. The sound source separation processor 141 performs a sound source separation process on the recorded content, in other words, voice signals read out from the voice signal DB 17, see par. [0100]. The specific user's voice determination processor 143 determines (identifies or recognizes) voice signals of the user specified by the user specifying unit 11 from the respective voice signals separated into sound sources by the sound source separation processor 141. For example, the voice determination processor 143 may perform speaker recognition on respective voice signals, and may determine voice signals of the specific user, see par. [0101]. The estimation processor 145 performs a process to estimate a voice signal (Ousual in FIG. 8) that is directly heard by the specific user himself/herself usually, on the basis of voice signals (Orec in FIG. 8) determined to be the voice signals of the specific user. Specifically, the process is performed by using the voice signal estimation filter corresponding to the specific user detected by the filter detecting unit 12 as explained above, see par. [0102]. The combiner 147 performs a process to combine the voice signals of the specific user subjected to the estimation process by the estimation processor 145, with other voice signals separated into a sound source, see par. [0103].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Nakashika invention with the teachings of Sako for the benefit of identifying a user's specific voice signals in the input, see Sako, par. [0101].

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez, whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656