Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 4 and 6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Objections
Claims 1-4 and 6-8 are objected to because the examiner seeks clarification of the phrase “voice conversion/voice identity conversion device.” The claim limitation does not make clear whether voice conversion or voice identification is being claimed.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 8 is rejected under 35 U.S.C. 101 because it claims “A program that causes … a parameter learning step.” However, the claim does not define the computer program code … as functional descriptive material encoded on a non-transitory memory, disk, or computer-readable medium, and the claim is therefore non-statutory for that reason (i.e., “When functional descriptive material is recorded on some non-transitory computer-readable medium it becomes structurally and functionally interrelated to the medium and will be statutory in most cases since use of technology permits the function of the descriptive material to be realized”). Moreover, “A program that causes … a parameter learning step” is neither a process (“action”), nor a machine, nor a manufacture, nor a composition of matter (i.e., it is not embodied in a non-transitory medium), and it is therefore non-statutory.

One of ordinary skill in the art would conclude that a program executes a parameter learning step … in a transmitting node.
Examiner suggests amending the claim to include “a non-transitory computer readable medium that contains a driver apparatus…”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Toru Nakashika (IEEE Transactions, 2015; referred to as Toru in this correspondence) in view of Mouchtaris (May 2004).
Claim 1, Toru discloses a voice conversion / voice identity conversion device that converts a voice of a source speaker into a voice of a target speaker (Abstract, lines 1-14: voice conversion (VC) converts (matches) features of emphasis of a source speaker to those of the target speaker), the device comprising:
a parameter learning unit (neural network (NN); Page 581, Col. 2, lines 17-25) that determines a parameter for voice conversion / voice identity conversion from acoustic information (feature vector) based on a voice for learning (the source speaker’s voice) and speaker information corresponding to the acoustic information (Page 580, Col. 2, lines 9-12: the feature vector of a source speaker is converted to that of the target speaker; i.e., the neural network directly trains a conditional probability by which the feature vector of a source speaker is converted to that of a target speaker);
(The feature vector is understood to read on the determined parameter, since it is converted by the neural network.)
a parameter storage unit that stores a parameter determined by the parameter learning unit (Page 584, Col. 1, lines 22-24: the acoustic features are taken from the speech database, meaning that the acoustic feature vectors are stored in the database);
(NB: the acoustic feature vectors are understood to be parameters of the system.)
and a voice conversion / voice identity conversion processing unit that performs voice conversion / voice identity conversion processing of the acoustic information based on the voice of the source speaker based on the parameter stored in the parameter storage unit and the speaker information of the target speaker (Page 582, Col. 2, lines 11-14: “…encoding the source speaker’s acoustic features to the linguistic information and decoding it into the target speaker’s acoustic features…”, meaning that the acoustic information (acoustic features) of both the source and target speakers is considered during the voice conversion),
wherein the parameter learning unit (neural network) uses the acoustic information based on the voice, the speaker information corresponding to the acoustic information (the voice signals of both the source and target speakers), and phonological information representing a phoneme in the voice as variables (Page 582, Col. 2, lines 27-31: the source and target speakers’ acoustic features are encoded to and decoded from the linguistic information; that is, to match indexes of the source speaker’s latent features to those of the target speaker, the acoustic features of both speakers are considered based on the linguistic information), so that a probability model representing a relationship in connection energy among the acoustic information (Page 582, Col. 2, lines 5-8: given the parameters (acoustic features, linguistic information, and voice signals of both the source and target speakers), the conditional probability of h(t) is expressed by the relationship shown in equation (16)),
the speaker information, and the phonological information by the parameter is obtained, and a plurality of speaker clusters having specific matrices (Page 582, Col. 2, lines 2-5: the computed reconstructed values read on the matrices) are defined as the probability model (Page 584, Col. 2, lines 40-50: the mel-cepstral distortion (MCD) reads on the probability model because it shows how close the converted (source) vector is to the target vector), and the voice conversion / voice identity conversion processing unit is configured to obtain speaker information of the target speaker from the parameter and obtain acoustic information of the target speaker from the obtained speaker information (Page 581, Col. 1, lines 5-9: the system captures time information and latent (deep) relationships between source and target speaker features in a network; the latent (deep) information is understood to comprise phoneme-level, speaker-specific information and to capture linguistic or phonological information).
Toru does not disclose specific adaptive matrices.
Mouchtaris discloses a similar technology, in which a voice conversion method modifies the speech characteristics of a particular speaker by estimating its parameters (Abstract, lines 1-3), and in which user-specific values or matrices are adapted (Page 2, Col. 2, lines 1-4: “spectral conversion is preceded by adaptation of the derived parameters to the non-parallel corpus”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Toru the teaching of Mouchtaris of adapting to new values. The motivation would have been to make the system convert voices more quickly and effectively, because the system can then transform the voice of any person into that of another person.
 
Claim 2, Toru in view of Mouchtaris discloses the device further comprising an adaptive unit that adapts the parameter stored in the parameter storage unit to the voice of the source speaker to obtain a parameter after the adaptation (Toru: Page 582, Col. 2, lines 6-8: the parameters in the forward inference read on the parameters before adaptation, while the parameters in the backward inference read on the parameters after adaptation), wherein the parameter storage unit stores the parameter after the adaptation by the adaptive unit (Toru: Page 584, Col. 1, lines 24-25: the parameters estimated by backward inference (the parameters after adaptation) are stored in the database),
and the voice conversion / voice identity conversion processing unit performs voice conversion / voice identity conversion processing of the acoustic information based on the voice of the source speaker based on the parameter after the adaptation and the speaker information of the target speaker (Toru: Page 582, Col. 2, lines 11-14: “…encoding the source speaker’s acoustic features to the linguistic information and decoding it into the target speaker’s acoustic features…”, meaning that the acoustic information (acoustic features) of both the source and target speakers is considered during the voice conversion).
Claim 3, Toru in view of Mouchtaris discloses the device wherein the parameter learning unit and the adaptive unit are configured by a common arithmetic processing part (Toru: Equations (6), (7), and (8) read on the arithmetic processing part), and the common arithmetic processing part is configured to perform a process of determining the parameter based on the voice for learning and a process of obtaining the parameter after the adaptation based on the voice of the source speaker (Toru: Page 582, Col. 2, lines 1-5: the parameters relating to the model are also calculated from Eqs. (6), (7), and (8), so Equations (6), (7), and (8) read on the arithmetic processing part).

Claim 4, objected to; would be allowable if rewritten in independent form.
Claim 5, cancelled.
Claim 6, objected to; would be allowable if rewritten in independent form.
Claim 7, Toru discloses a voice conversion / voice identity conversion method for converting a quality of a voice of a source speaker to a voice of a target speaker (Abstract, lines 1-14: voice conversion (VC) converts (matches) features of emphasis of a source speaker to those of the target speaker), comprising:
a parameter learning step including using acoustic information based on the voice, speaker information corresponding to the acoustic information, and phonological information representing a phoneme of the voice as variables (Page 582, Col. 2, lines 27-31: the source and target speakers’ acoustic features are encoded to and decoded from the linguistic information; that is, to match indexes of the source speaker’s latent features to those of the target speaker, the acoustic features of both speakers are considered based on the linguistic information) to prepare a probability model representing a relationship in connection energy among the acoustic information (Page 582, Col. 2, lines 5-8: given the parameters (acoustic features, linguistic information, and voice signals of both the source and target speakers), the conditional probability of h(t) is expressed by the relationship shown in equation (16)), the speaker information, and the phonological information by a parameter;
defining a plurality of speaker clusters having specific matrices (Page 582, Col. 2, lines 2-5: the computed reconstructed values read on the matrices) as the probability model (Page 584, Col. 2, lines 40-50: the mel-cepstral distortion (MCD) reads on the probability model because it shows how close the converted (source) vector is to the target vector); estimating a weight to the plurality of speaker clusters for respective speakers; and determining the parameter of the voice for learning (Page 585, Col. 1, lines 15-20: the criteria for male-to-female, male-to-male, and female-to-female conversion read on the clusters of speakers);
and a voice conversion / voice identity conversion processing step of performing, based on a parameter obtained in the parameter learning step or a parameter after adaptation obtained by adapting the parameter to a voice of the source speaker and the speaker information of the target speaker (Page 582, Col. 2, lines 6-8: the parameters in the forward inference read on the parameters before adaptation, while the parameters in the backward inference read on the parameters after adaptation), voice conversion / voice identity conversion processing of the acoustic information based on the voice of the source speaker (Page 584, Col. 1, lines 24-25: the parameters estimated by backward inference (the parameters after adaptation) are stored in the database),
wherein the voice conversion / voice identity conversion processing in the voice conversion / voice identity conversion processing step includes obtaining speaker information of the target speaker from the parameter, and obtaining acoustic information of the target speaker from the obtained speaker information (Page 581, Col. 1, lines 5-9: the system captures time information and latent (deep) relationships between source and target speaker features in a network; the latent (deep) information is understood to comprise phoneme-level, speaker-specific information and to capture linguistic or phonological information).
Toru does not disclose specific adaptive matrices.
Mouchtaris discloses a similar technology, in which a voice conversion method modifies the speech characteristics of a particular speaker by estimating its parameters (Abstract, lines 1-3), and in which user-specific values or matrices are adapted (Page 2, Col. 2, lines 1-4: “spectral conversion is preceded by adaptation of the derived parameters to the non-parallel corpus”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Toru the teaching of Mouchtaris of adapting to new values. The motivation would have been to make the system convert voices more quickly and effectively, because the system can then transform the voice of any person into that of another person.

Claim 8, Toru discloses a program that causes a computer to execute a parameter learning step including using acoustic information based on the voice, speaker information corresponding to the acoustic information, and phonological information representing a phoneme of the voice as variables (Page 582, Col. 2, lines 27-31: the source and target speakers’ acoustic features are encoded to and decoded from the linguistic information; that is, to match indexes of the source speaker’s latent features to those of the target speaker, the acoustic features of both speakers are considered based on the linguistic information) to prepare a probability model representing a relationship in connection energy among the acoustic information (Page 582, Col. 2, lines 5-8: given the parameters (acoustic features, linguistic information, and voice signals of both the source and target speakers), the conditional probability of h(t) is expressed by the relationship shown in equation (16)),
the speaker information, and the phonological information by a parameter; defining a plurality of speaker clusters having specific matrices (Page 582, Col. 2, lines 2-5: the computed reconstructed values read on the matrices) as the probability model (Page 584, Col. 2, lines 40-50: the mel-cepstral distortion (MCD) reads on the probability model because it shows how close the converted (source) vector is to the target vector);
estimating a weight to the plurality of speaker clusters for respective speakers; and determining and storing the parameter of the voice for learning (Page 585, Col. 1, lines 15-20: the criteria for male-to-female, male-to-male, and female-to-female conversion read on the clusters of speakers);
and a voice conversion / voice identity conversion processing step of performing, based on a parameter obtained in the parameter learning step or a parameter after adaptation obtained by adapting the parameter to a voice of the source speaker and the speaker information of the target speaker (Page 582, Col. 2, lines 6-8: the parameters in the forward inference read on the parameters before adaptation, while the parameters in the backward inference read on the parameters after adaptation), voice conversion / voice identity conversion processing of the acoustic information based on the voice of the source speaker, wherein the voice conversion / voice identity conversion processing in the voice conversion / voice identity conversion processing step is configured to obtain speaker information of the target speaker from the parameter, and to obtain acoustic information of the target speaker from the obtained speaker information (Page 581, Col. 1, lines 5-9: the system captures time information and latent (deep) relationships between source and target speaker features in a network; the latent (deep) information is understood to comprise phoneme-level, speaker-specific information and to capture linguistic or phonological information).
Toru does not disclose specific adaptive matrices.
Mouchtaris discloses a similar technology, in which a voice conversion method modifies the speech characteristics of a particular speaker by estimating its parameters (Abstract, lines 1-3), and in which user-specific values or matrices are adapted (Page 2, Col. 2, lines 1-4: “spectral conversion is preceded by adaptation of the derived parameters to the non-parallel corpus”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Toru the teaching of Mouchtaris of adapting to new values. The motivation would have been to make the system convert voices more quickly and effectively, because the system can then transform the voice of any person into that of another person.


Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Agiomyrgiannakis (20150127349) discloses that an HMM-based ASR subsystem for an input language may be trained using extensive input-language standard-voice recordings in an input language. This can amount to the application of high-quality, proven training techniques, for example. Referring to the HMM of the ASR subsystem as an "auxiliary HMM" and to the speaker source of the input-language standard-voice recordings as an "auxiliary speaker," this training process may be said to train the auxiliary HMM in the voice of the auxiliary speaker.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438.  The examiner can normally be reached on Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
06/09/2021