DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-5 and 14-26 are pending in this application.
Claims 6-13 are canceled.
Response to Arguments
Regarding Rejection under 35 U.S.C. 103
Applicant’s arguments with respect to rejections have been fully considered but they are not persuasive.
Regarding claim 1, Applicant argues that the rejection under 35 U.S.C. 103 is improper because nowhere does Roblek in view of Lei disclose or teach the limitation, “[E]ach first vector feature corresponds to a respective vector feature extractor configured to each character”, as recited in amended claim 1. Lei discloses that the i-vector is extracted and the subspace is trained (REMARKS, pages 10-11).
However, Examiner respectfully disagrees because Roblek in view of Lei still discloses the limitations of newly amended claim 1. Lei discloses that the t-th speech frame from the i-th speech segment is determined by Gaussians that are adapted to each speech segment, and that the i-vector, ω(i), is extracted from sufficient statistics. Thus, each i-vector, ω(i), corresponds to each character/subword/segment/frame.

[Image: media_image1.png]

(Lei, pp. 1695, Right Column and pp. 1696, Left Column, section 2).
Therefore, the rejection under 35 USC 103 is maintained at this time.
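For illustration only, the sufficient-statistics step cited from Lei can be sketched as follows. The sketch assumes a diagonal-covariance GMM, and the function names (gmm_posteriors, baum_welch_stats) are hypothetical; it is not taken from Roblek, Lei, or the application. It computes each frame's posterior over the Gaussian components and accumulates the zeroth- and first-order Baum-Welch statistics for one speech segment.

```python
import numpy as np

def gmm_posteriors(frames, weights, means, variances):
    """Posterior of each Gaussian component for every frame.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    Diagonal covariances are an assumption of this sketch.
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                        # (T, K)
    log_joint = np.log(weights) + log_gauss
    # Subtract the row maximum for numerical stability before exponentiating.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def baum_welch_stats(frames, post):
    """Zeroth-order (K,) and first-order (K, D) statistics of one segment."""
    zeroth = post.sum(axis=0)
    first = post.T @ frames
    return zeroth, first
```

Under these assumptions, the zeroth-order statistics sum to the number of frames in the segment, and the first-order statistics summed over components equal the sum of the frames.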

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 14-26 are rejected under 35 U.S.C. 103 as being unpatentable over Roblek et al. (US Pub. 2015/0279374, hereinafter Roblek) in view of Lei et al. (IEEE, “A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK”, 2014, hereinafter Lei).
Regarding claim 1, Roblek discloses a method for registering a voiceprint, comprising: 
performing a frame alignment operation on each first character contained in a registration character string inputted by a user in voice to extract first acoustic features of each first character constituting the registration character string, wherein each frame corresponds to a first character (Based on page 13, lines 16-21 of Applicant’s specification, “each frame”, as recited in claim 1, may not correspond to each character. For further prosecution, the Examiner interprets “character” as ‘word’ or ‘syllable’. Fig. 2, [0029]-[0031] prompting the particular user to speak the predefined enrollment phrase “DONUT” and storing training acoustic data/frame representing the subwords “DO” and “NUT” in the enrollment phrase “DONUT.”);
Roblek does not explicitly teach, however, Lei does explicitly teach:
calculating a first posterior probability of the first acoustic features of each first character in a global Gaussian Mixture Model (GMM) model to perform a Baum-Welch (BW) statistic (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … the DNN replaces the GMM to compute the posterior of the frames with respect to each of the classes in the model”); 
extracting first vector features of each first character through vector feature extractor configured for each character, wherein the first vector feature of each first
stitching expected values of the first posterior probabilities of the first characters sequentially, to obtain a registration voiceprint model of the user (pp. 1696-1697, section 5 and Figs. 1 and 2, “The estimation of the observation probability distribution and the realignment can be optimized alternatively and iteratively… pre-trained hidden markov model (HMM) ASR system with GMM states is needed to generate alignments for the subsequent DNN training”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of speaker verification using dynamically generated phrases as taught by Roblek with applying the collection of sufficient statistics and performing Baum-Welch statistics algorithm to train DNN as taught by Lei to provide a significant increase in accuracy from the successful application of the i-vector extraction paradigm (Lei, section 1).
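For illustration only, the extracting and stitching limitations discussed above can be sketched as follows. The sketch assumes the conventional i-vector formulation, in which the vector feature is the posterior mean (expected value) of a latent factor with a standard-normal prior, and it assumes one total-variability matrix per character. The names extract_ivector and stitch_voiceprint and the data layout are hypothetical and are not taken from Roblek, Lei, or the application.

```python
import numpy as np

def extract_ivector(zeroth, first, means, variances, tv_matrix):
    """Posterior mean of the latent factor given Baum-Welch statistics.

    zeroth: (K,); first, means, variances: (K, D); tv_matrix: (K, D, R).
    Diagonal covariances and a standard-normal prior are assumptions.
    """
    num_components, _, rank = tv_matrix.shape
    centered = first - zeroth[:, None] * means       # F_k - N_k * mu_k
    precision = np.eye(rank)                         # prior precision I
    projection = np.zeros(rank)
    for k in range(num_components):
        scaled = tv_matrix[k] / variances[k][:, None]    # Sigma_k^{-1} T_k
        precision += zeroth[k] * tv_matrix[k].T @ scaled
        projection += scaled.T @ centered[k]
    return np.linalg.solve(precision, projection)

def stitch_voiceprint(characters, stats, extractors, means, variances):
    """Concatenate per-character vectors in the order the characters occur.

    stats[i] holds the (zeroth, first) statistics of characters[i];
    extractors maps each character to its own total-variability matrix.
    """
    parts = [
        extract_ivector(zeroth, first, means, variances, extractors[ch])
        for ch, (zeroth, first) in zip(characters, stats)
    ]
    return np.concatenate(parts)
```

Under these assumptions, a string of n characters with rank-R extractors yields a voiceprint of length n × R, with each block produced by the extractor associated with the corresponding character.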
Regarding claim 2, Roblek in view of Lei discloses the method according to claim 1, and Roblek further discloses:

Regarding claim 3, Roblek in view of Lei discloses the method according to claim 1, and Roblek further discloses:
performing the frame alignment operation on a training character string inputted by the user in voice to extract second acoustic features of each second character constituting the training character string ([0041] obtaining the verification phrase based on obtaining multiple candidate phrases from a candidate phrases database).
Roblek does not explicitly teach, however, Lei does explicitly teach:
training the global GMM model according to the second acoustic features of each second character constituting the training character string; and calculating a second posterior probability of the second acoustic features of each second character in the global GMM model to perform a BW statistic, and training a vector feature extractor configured for each character using a joint factor analysis method (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … the DNN replaces the GMM to compute the posterior of the frames with respect to each of the classes in the model”).
Regarding claim 4, Roblek discloses a method for authenticating a voiceprint, comprising: 
performing a frame alignment operation on each third character contained in an authentication character string inputted by a user in voice, to extract third acoustic features of each third character constituting the authentication character string, wherein each frame corresponds to a third character (For further prosecution, the Examiner interprets “character” as ‘word’ or ‘syllable’. Fig. 2, [0029]-[0031] prompting the particular user to speak the predefined enrollment phrase “DONUT” and storing training acoustic data/frame representing the subwords “DO” and “NUT” in the enrollment phrase “DONUT.”); and 
matching a pre-stored registration voiceprint model of the user with the authentication voiceprint model, to determine whether the user is legal according to a matching result ([0054]-[0058] generating a match score for each compared subword based on the one or more comparisons of the obtained acoustic data and the stored training acoustic data).
Roblek does not explicitly teach, however, Lei does explicitly teach:
calculating a third posterior probability of the third acoustic features of each third character in a global Gaussian Mixture Model (GMM) model, to perform a Baum-Welch (BW) statistic (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for 
extracting second vector features of each third character through vector feature extractor configured for each character, wherein the second vector feature of each third character is an expected value of the third posterior probability, wherein each first vector feature corresponds to a respective vector feature extractor configured for each character (pp. 1695-1696, section 2, “Given a speech segment, the following sufficient statistics can be computed using the posterior probabilities of the classes…These sufficient statistics are all that is needed to train the subspace T and extract the i-vector ω(i)”); and 
stitching the expected values of the third posterior probabilities of the third characters sequentially, to obtain an authentication voiceprint model of the user (pp. 1696-1697, section 5 and Figs. 1 and 2, “The estimation of the observation probability distribution and the realignment can be optimized alternatively and iteratively… pre-trained hidden markov model (HMM) ASR system with GMM states is needed to generate alignments for the subsequent DNN training”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Random voiceprint certification system as taught by Roblek with applying the collection of sufficient statistics and performing Baum-Welch statistics algorithm to train DNN as taught by Lei to provide a significant increase in accuracy from the successful application of the i-vector extraction paradigm (Lei, section 1).
Regarding claim 5, Roblek in view of Lei discloses the method according to claim 4, and Roblek further discloses:
wherein matching the pre-stored registration voiceprint model of the user with the authentication voiceprint model, to determine whether the user is legal according to the matching result comprises: 
when a matching degree between the registration voiceprint model and the authentication voiceprint model is greater than or equal to a predetermined threshold, determining that the user is legal ([0058] the speaker classifier may make a classification that the speaker is the particular user based on determining the obtained acoustic data matches the stored training acoustic data because a final score from the subword comparer is 90% or greater); and 
when the matching degree between the registration voiceprint model and the authentication voiceprint model is less than the predetermined threshold, determining that the user is illegal ([0058][0059] when a score from the subword comparer is less than 90%, the speaker classifier makes the classification that the speaker is not the particular user).
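For illustration only, the threshold decision recited in claim 5 can be sketched as follows. The use of cosine similarity as the matching degree is an assumption of this sketch, the default threshold of 0.9 merely mirrors the 90% score cited from Roblek [0058], and the names matching_degree and is_legal_user are hypothetical.

```python
import numpy as np

def matching_degree(registered, candidate):
    """Cosine similarity between two voiceprint vectors (assumed metric)."""
    registered = np.asarray(registered, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    denom = np.linalg.norm(registered) * np.linalg.norm(candidate)
    # Treat an all-zero vector as a non-match rather than dividing by zero.
    return float(registered @ candidate / denom) if denom > 0 else 0.0

def is_legal_user(registered, candidate, threshold=0.9):
    """Accept when the matching degree is greater than or equal to the threshold."""
    return matching_degree(registered, candidate) >= threshold
```

In this sketch, a matching degree equal to the threshold is accepted, consistent with the “greater than or equal to” language of the claim, and any lower value is rejected.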
Regarding claim 14, Roblek discloses an apparatus for authenticating a voiceprint, comprising: 
one or more processors; a memory; one or more programs, stored in the memory, wherein when the one or more programs are executed by the one or more processors, a method for authenticating a voiceprint is executed, the method comprises ([0073][0076] speaker verification system comprises processors and memory):

matching a pre-stored registration voiceprint model of the user with the authentication voiceprint model, to determine whether the user is legal according to a matching result ([0054]-[0058] generating a match score for each compared subword based on the one or more comparisons of the obtained acoustic data and the stored training acoustic data).
Roblek does not explicitly teach, however, Lei does explicitly teach:
calculating a third posterior probability of the third acoustic features of each third character in a global Gaussian Mixture Model (GMM) model to perform a Baum-Welch (BW) statistic (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … the DNN replaces the GMM to compute the posterior of the frames with respect to each of the classes in the model”); 
 
stitching the expected values of the third posterior probabilities of the third characters sequentially, to obtain an authentication voiceprint model of the user (pp. 1696-1697, section 5 and Figs. 1 and 2, “The estimation of the observation probability distribution and the realignment can be optimized alternatively and iteratively… pre-trained hidden markov model (HMM) ASR system with GMM states is needed to generate alignments for the subsequent DNN training”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Random voiceprint certification system as taught by Roblek with applying the collection of sufficient statistics and performing Baum-Welch statistics algorithm to train DNN as taught by Lei to provide a significant increase in accuracy from the successful application of the i-vector extraction paradigm (Lei, section 1).
Regarding claim 15, Roblek in view of Lei discloses the method according to claim 1, and Roblek further discloses:

Regarding claim 16, Roblek in view of Lei discloses the method according to claim 1, and Lei further discloses:
wherein the first posterior probability conforms to a Gaussian distribution and the expectations of the first posterior probability are the first vector features (pp. 1695, section 1, “using the output posteriors as frame alignments for speaker modelling and i-vector extraction, substituting for the role of the UBM in the standard framework”; pp. 1696, section 3, “The likelihood ratio of the test data given the UBM and speaker-specific GMM is used for speaker recognition”).
Regarding claim 17, Roblek in view of Lei discloses the method according to claim 3, and Roblek further discloses:
wherein the registration character string comprises: one or more textual character strings, and/or one or more numeral character strings (Fig. 2, [0029][0032] enrollment character string, e.g., DONUT).
Regarding claim 18, Roblek in view of Lei discloses the method according to claim 4, and Roblek further discloses:
wherein the registration voiceprint model is obtained by act of: performing a frame alignment operation on a registration character string inputted by a user in voice to extract first acoustic features of each first character constituting the registration 
Roblek does not explicitly teach, however, Lei does explicitly teach:
calculating a first posterior probability of the first acoustic features of each first character in a global GMM model to perform a BW statistic (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … the DNN replaces the GMM to compute the posterior of the frames with respect to each of the classes in the model”); 
extracting first vector features of each first character through a preset vector feature extractor configured for multi-character (pp. 1695-1696, section 2, “Given a speech segment, the following sufficient statistics can be computed using the posterior probabilities of the classes…These sufficient statistics are all that is needed to train the subspace T and extract the i-vector ω(i)”); and 
stitching the first vector features of each first character sequentially, to obtain a registration voiceprint model of the user (pp. 1696-1697, section 5 and Figs. 1 and 2, “The estimation of the observation probability distribution and the realignment can be optimized alternatively and iteratively… pre-trained hidden markov model (HMM) ASR system with GMM states is needed to generate alignments for the subsequent DNN training”).
Regarding claim 19, Roblek in view of Lei discloses the method according to claim 18, and Roblek further discloses:
wherein the registration character string comprises: one or more textual character strings, and/or one or more numeral character strings (Fig. 2, [0029][0032] enrollment character string, e.g., DONUT).
Regarding claim 20, Roblek in view of Lei discloses the method according to claim 18, and Roblek further discloses:
wherein the registration voiceprint model is obtained further by acts of: performing the frame alignment operation on a training character string inputted by the user in voice to extract second acoustic features of each second character constituting the training character string ([0041] obtaining the verification phrase based on obtaining multiple candidate phrases from a candidate phrases database).
Roblek does not explicitly teach, however, Lei does explicitly teach:
training the global GMM model according to the second acoustic features of each second character constituting the training character string; and calculating a second posterior probability of the second acoustic features of each second character in the global GMM model to perform a BW statistic, and training a vector feature extractor configured for each character using a joint factor analysis method (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … 
Regarding claim 21, Roblek in view of Lei discloses the method according to claim 18, and Roblek further discloses:
wherein first acoustic feature is a Mel Frequency Cepstral Coefficient (MFCC), a Perceptual Linear Predictive (PLP), or a Linear Prediction Cepstrum Coefficient (LPCC) ([0025] Acoustic data for each of the subwords is MFCC coefficients).
Regarding claim 22, Roblek in view of Lei discloses the apparatus according to claim 14, and Roblek further discloses:
wherein matching the pre-stored registration voiceprint model of the user with the authentication voiceprint model, to determine whether the user is legal according to the matching result comprises: 
when a matching degree between the registration voiceprint model and the authentication voiceprint model is greater than or equal to a predetermined threshold, determining that the user is legal ([0058] the speaker classifier may make a classification that the speaker is the particular user based on determining the obtained acoustic data matches the stored training acoustic data because a final score from the subword comparer is 90% or greater); and 
when the matching degree between the registration voiceprint model and the authentication voiceprint model is less than the predetermined threshold, determining that the user is illegal ([0058][0059] when a score from the subword comparer is less than 90%, the speaker classifier makes the classification that the speaker is not the particular user).
Regarding claim 23, Roblek in view of Lei discloses the apparatus according to claim 14, and Roblek further discloses:
wherein the registration voiceprint model is obtained by act of: 
performing a frame alignment operation on a registration character string inputted by a user in voice to extract first acoustic features of each first character constituting the registration character string ([0041] obtaining the verification phrase based on obtaining multiple candidate phrases from a candidate phrases database).
Roblek does not explicitly teach, however, Lei does explicitly teach:
calculating a first posterior probability of the first acoustic features of each first character in a global GMM model to perform a BW statistic (pp. 1695, section 1, Left Column, “The collection of sufficient statistics is a process where a sequence of feature vectors (e.g., mel-frequency cepstral coefficients (MFCC)) are represented by the Baum-Welch statistics obtained with respect to a GMM”; Right Column, “use a DNN trained for speech recognition to guide speaker modeling, specifically, by using the output posteriors as frame alignments for speaker modelling and i-vector extraction … the DNN replaces the GMM to compute the posterior of the frames with respect to each of the classes in the model”); 
extracting first vector features of each first character through a preset vector feature extractor configured for multi-character (pp. 1695-1696, section 2, “Given a speech segment, the following sufficient statistics can be computed using the posterior probabilities of the classes…These sufficient statistics are all that is needed to train the subspace T and extract the i-vector ω(i)”); and 

Regarding claim 24, Roblek in view of Lei discloses the apparatus according to claim 23, and Roblek further discloses:
wherein the registration character string comprises: one or more textual character strings, and/or one or more numeral character strings (Fig. 2, [0029][0032] enrollment character string, e.g., DONUT).
Regarding claim 25, Roblek in view of Lei discloses the apparatus according to claim 23, and Roblek further discloses:
wherein the registration voiceprint model is obtained further by acts of: 
performing the frame alignment operation on a training character string inputted by the user in voice to extract second acoustic features of each second character constituting the training character string ([0041] obtaining the verification phrase based on obtaining multiple candidate phrases from a candidate phrases database).
Roblek does not explicitly teach, however, Lei does explicitly teach:
training the global GMM model according to the second acoustic features of each second character constituting the training character string; and calculating a second posterior probability of the second acoustic features of each second character in the global GMM model to perform a BW statistic, and training a vector feature extractor 
Regarding claim 26, Roblek in view of Lei discloses the apparatus according to claim 23, and Roblek further discloses:
wherein first acoustic feature is a Mel Frequency Cepstral Coefficient (MFCC), a Perceptual Linear Predictive (PLP), or a Linear Prediction Cepstrum Coefficient (LPCC) ([0025] Acoustic data for each of the subwords is MFCC coefficients).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN, whose telephone number is (571) 272-5933. The examiner can normally be reached 9 AM-3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197.

Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659