Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Objections
Claims 7 and 8 are objected to because the examiner seeks clarification of the phrase “voice conversion/voice identity conversion device.” It is not clear from the claim limitation whether voice conversion or voice identification is being claimed.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 7 and 8 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. See below for an explanation of why the combination of Toru (“Voice Conversion Using RNN Pre-Trained by Recurrent Temporal Restricted Boltzmann Machines”) and Kim et al. (US2005/0182626) reads on the limitation “generating a plurality of speaker clusters having specific adaptive matrices, which define a probability model, each of the plurality of speaker clusters indicating a relationship between speakers based on the acoustic information, the speaker information, and the phonological information by the determined parameter.”
Kim discloses generating a plurality of speaker clusters having specific adaptive matrices (“variation matrix”), which define a probability model (Section 0052, lines 1-4: “Clustering the speakers into M-numbered model variation groups from the speakers on the basis of a likelihood of the model variation”), each of the plurality of speaker clusters (Section 0053, lines 1-2: “Plurality of speaker groups”) indicating a relationship between speakers based on the acoustic information (feature vector), the speaker information, and the phonological information (Section 0069, lines 2-3: speech data) by the determined parameter (Section 0066, lines 6-12: Speaker Cluster 1 has d1, indicating a relationship between speakers 1, 2 and 3 (speaker information)).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Toru Nakashika (IEEE Transactions, published 2015; referred to as Toru in this correspondence) in view of Kim et al. (US2005/0182626).
Claim 1, Toru discloses a voice conversion device that converts a voice of a source speaker into voice of a target speaker (Abstract, lines 1-14: Voice Conversion (VC), which converts (matches) features of a source speaker to those of the target speaker), the voice conversion device comprising:
a parameter storage device that stores a parameter for voice conversion; and
a processor (the database is inherently operated by a processor) coupled to the parameter storage device (Page 584, Col. 1, lines 22-24: acoustic features are retrieved from the speech database, which means they are stored in the database), the processor being programmed to:
determine the parameter for voice conversion from acoustic information based on a voice for learning (Col. 2 lines 5-8: “After the parameters (3 types) are estimated”), speaker information corresponding to the acoustic information (acoustic feature vector), and phonological information representing a phoneme in the voice as variables (Page 582, Col. 2 lines 15-22: variables x(t) and y(t) read on the acoustic feature vectors obtained from phonemes)
(Feature vector of a source speaker is converted to that of the target speaker; Page 580, Col. 2 lines 9-12: in the neural network, the conditional probability is trained directly, where the feature vector of a source speaker is converted to that of a target speaker);
obtain the speaker information of the target speaker from the parameter, and obtain the acoustic information of the target speaker from the obtained speaker information (Page 582, Col. 2 lines 15-21: the parameter variables x(t) and y(t) (acoustic features, i.e., acoustic information) are captured from the speech of the source speaker and the target speaker);
perform voice conversion processing of the acoustic information based on the voice of the source speaker based on the stored parameter and the speaker information of the target speaker (Page 582, Col. 2 lines 11-14: “...encoding the source speaker’s acoustic features to the linguistic information and decoding it into the target speaker’s acoustic features...”, which means the acoustic information (acoustic features) of both the source and target speakers is taken into consideration during the voice conversion).
Toru discloses, at Page 581, Col. 1, lines 11-13, an RTRBM, which is a non-linear probabilistic model used to capture temporal dependencies in time-series data (speech data). However, Toru does not disclose generating a plurality of speaker clusters having specific adaptive matrices, which define a probability model, each of the plurality of speaker clusters indicating a relationship between speakers based on the acoustic information, the speaker information, and the phonological information by the determined parameter.
Kim discloses generating a plurality of speaker clusters having specific adaptive matrices (“variation matrix”), which define a probability model (Section 0052, lines 1-4: “Clustering the speakers into M-numbered model variation groups from the speakers on the basis of a likelihood of the model variation”), each of the plurality of speaker clusters (Section 0053, lines 1-2: “Plurality of speaker groups”) indicating a relationship between speakers based on the acoustic information (feature vector), the speaker information, and the phonological information (Section 0069, lines 2-3: speech data) by the determined parameter (Section 0066, lines 6-12: Speaker Cluster 1 has d1, indicating a relationship between speakers 1, 2 and 3 (speaker information)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Kim's speaker-clustering teaching into Toru's voice conversion. The motivation is that clustering makes it easier for speakers to be identified at a later date.
Claim 2, Toru in view of Kim discloses the voice conversion device, wherein the processor is programmed to adapt the stored parameter to the voice of the source speaker to obtain a parameter after the adaptation (Toru: Page 580, Col. 1, Introduction, lines 14-16: “GMM-adaptation technique” reads on the parameters of a source speaker being stored as a model);
the parameter storage device stores the parameter after the adaptation and perform the voice conversion processing of the acoustic information based on the voice of the source speaker based on the parameter after the adaptation and the speaker information of the target speaker (Toru: Page 582, Col. 2 lines 10-14: by encoding the source speaker’s acoustic features (parameters) to the linguistic information and decoding it into the target speaker’s acoustic features, the acoustic features (parameters) are adapted to the target speaker; see also “...linguistic information from the specific speaker’s speech, we adopt speaker-dependent recurrent temporal restricted machines”).
Claim 3, Toru in view of Kim discloses the voice conversion device, wherein the processor is programmed to determine the parameter based on the voice for learning (Toru: Page 581, Col. 2 lines 28-30: “Parameter estimation” indicates the input data (voice)) and obtain the parameter after the adaptation based on the voice of the source speaker (Toru: Page 582, Col. 1, lines 16-20: “Three types of parameters are estimated”).
Claim 4, Toru in view of Kim discloses the voice conversion device, wherein the processor is programmed to perform learning so that the plurality of clusters are located at positions farthest from each other (Toru: Page 581, Col. 1, lines 17-20: “we train them [speakers] using training data”) and set a position of a weight to each speaker cluster among the plurality of learned speaker clusters (Kim: Section 0018, “the preselected weight” for each speaker cluster).
Claim 5: Canceled.
Claim 6 is objected to but contains allowable subject matter. See item 2 for details.
Claim 7, Toru discloses a voice conversion method for converting a quality of a voice of a source speaker to a voice of a target speaker (Abstract, lines 1-14: Voice Conversion (VC), which converts (matches) features of a source speaker to those of the target speaker), the method comprising:
a parameter learning step including generating a probability model indicating a relationship between speakers based on acoustic information (Page 582, Col. 2 lines 6-10: the “conditional probability of h(t) given v(t) and h(t-1)” is generated for a source speaker and a target speaker based on the model parameters (W, b, and c); see also Page 582, Col. 2, lines 23-25) and based on the voice speaker information corresponding to the acoustic information, and phonological information representing a phoneme of the voice (Page 582, Col. 2 lines 15-22: the acoustic feature vectors (voice speaker information) for both source and target speakers);
and determining a parameter of the voice for learning (parameters are estimated using conditional probability; Page 582, Col. 2 lines 5-9); and a voice conversion processing step of performing voice conversion / voice identity conversion processing of the acoustic information based on the voice of the source speaker based on the parameter obtained in the parameter learning step (Col. 2 lines 5-8: “After the parameters (3 types) are estimated”) or a parameter after adaptation obtained by adapting the parameter to the voice of the source speaker and the speaker information of the target speaker (Page 582, Col. 2 lines 11-14: “...encoding the source speaker’s acoustic features to the linguistic information and decoding it into the target speaker’s acoustic features...”, which means the acoustic information (acoustic features) of both the source and target speakers is taken into consideration during the voice conversion)
(Feature vector of a source speaker is converted to that of the target speaker; Page 580, Col. 2 lines 9-12: in the neural network, the conditional probability is trained directly, where the feature vector of a source speaker is converted to that of a target speaker),
the speaker information of the target speaker being obtained from the parameter, and the acoustic information of the target speaker being obtained from the obtained speaker information (Page 582, Col. 2 lines 15-21: the parameter variables x(t) and y(t) (acoustic features, i.e., acoustic information) are captured from the speech of the source speaker and the target speaker).
Toru discloses, at Page 581, Col. 1, lines 11-13, an RTRBM, which is a non-linear probabilistic model used to capture temporal dependencies in time-series data (speech data). However, Toru does not disclose defining a plurality of speaker clusters having specific adaptive matrices as the probability model, each of the plurality of speaker clusters indicating a relationship between speakers; and estimating a weight to the plurality of speaker clusters for respective speakers.
Kim discloses defining a plurality of speaker clusters having specific adaptive matrices (“variation matrix”) as the probability model (Section 0052, lines 1-4: “Clustering the speakers into M-numbered model variation groups from the speakers on the basis of a likelihood of the model variation”), each of the plurality of speaker clusters indicating a relationship between speakers; and estimating a weight to the plurality of speaker clusters for respective speakers (Section 0066, lines 6-12: Speaker Cluster 1 has d1, indicating a relationship between speakers 1, 2 and 3 (speaker information)).
Regarding the weight, Kim, in Section 0018, teaches using a preselected weight in clustering speakers based on the variation matrix.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Kim's speaker-clustering teaching into Toru's voice conversion. The motivation is that clustering makes it easier for speakers to be identified at a later date.

Claim 8, Toru discloses a non-transitory computer readable storage medium (database; Page 584, Col. 2, lines 21-22) including a program having computer-executable instructions (Page 584, Col. 2, lines 33-34: the learning algorithm reads on the executable instructions) that causes a computer to execute:
a parameter learning step including generating a probability model between speakers based on acoustic information (Page 582, Col. 2 lines 6-10: the “conditional probability of h(t) given v(t) and h(t-1)” is generated for a source speaker and a target speaker based on the model parameters (W, b, and c); see also Page 582, Col. 2, lines 23-25) based on a voice, speaker information corresponding to the acoustic information, and phonological information representing a phoneme of the voice (Page 582, Col. 2 lines 15-22: the acoustic feature vectors (voice speaker information) for both source and target speakers);
and determining the parameter of the voice for learning (parameters are estimated using conditional probability; Page 582, Col. 2 lines 5-9);
and a voice conversion processing step of performing voice conversion/voice identity conversion processing of the acoustic information based on the voice of the source speaker based on the parameter obtained in the parameter learning step (Col. 2 lines 5-8: “After the parameters (3 types) are estimated”) or a parameter after adaptation obtained by adapting the parameter to the voice of a source speaker (Page 582, Col. 2 lines 11-14: “...encoding the source speaker’s acoustic features to the linguistic information and decoding it into the target speaker’s acoustic features...”, which means the acoustic information (acoustic features) of both the source and target speakers is taken into consideration during the voice conversion)
(Feature vector of a source speaker is converted to that of the target speaker; Page 580, Col. 2 lines 9-12: in the neural network, the conditional probability is trained directly, where the feature vector of a source speaker is converted to that of a target speaker)
and the speaker information of a target information of the target speaker being obtained from the parameter and the acoustic information of the target speaker being obtained from the obtained speaker information (Page 582, Col. 2 lines 15-21: the parameter variables x(t) and y(t) (acoustic features, i.e., acoustic information) are captured from the speech of the source speaker and the target speaker).
Toru discloses, at Page 581, Col. 1, lines 11-13, an RTRBM, which is a non-linear probabilistic model used to capture temporal dependencies in time-series data (speech data). However, Toru does not disclose defining a plurality of speaker clusters having specific adaptive matrices as the probability model, each of the plurality of speaker clusters indicating a relationship between speakers; and estimating a weight to the plurality of speaker clusters for respective speakers.
Kim discloses defining a plurality of speaker clusters having specific adaptive matrices (“variation matrix”) as the probability model (Section 0052, lines 1-4: “Clustering the speakers into M-numbered model variation groups from the speakers on the basis of a likelihood of the model variation”), each of the plurality of speaker clusters indicating a relationship between speakers; and estimating a weight to the plurality of speaker clusters for respective speakers (Section 0066, lines 6-12: Speaker Cluster 1 has d1, indicating a relationship between speakers 1, 2 and 3 (speaker information)).
Regarding the weight, Kim, in Section 0018, teaches using a preselected weight in clustering speakers based on the variation matrix.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Kim's speaker-clustering teaching into Toru's voice conversion. The motivation is that clustering makes it easier for speakers to be identified at a later date.
Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Zizzamia, Frank (WO 2015/002630 A2) discloses an unsupervised statistical analytics approach to detecting fraud that utilizes cluster analysis to identify specific clusters of claims or transactions for additional investigation, or utilizes association rules as tripwires to identify outliers. The clusters or sets of rules define a "normal" profile for the claims or transactions and are used to filter out normal claims, leaving "not normal" claims for potential investigation. To generate clusters or association rules, data relating to a sample set of claims or transactions may be obtained, and a set of variables may be used to discover patterns in the data that indicate a normal profile. New claims may be filtered, and not normal claims analyzed further.
Tadayon et al. (US 20140079297) discloses detecting or classifying a feature of an object, in which the distance of the data to one or more clusters representing various contexts is determined. Within the clusters, the image/data is further explored (e.g., by other classifiers or feature/object detectors) selected based on the set of predicted/suggested concepts/objects.
Agiomyriannakis et al. (US 20160140951) discloses that, for each of a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/Primary Examiner, Art Unit 2675 11/13/2021