Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 9/12/2018.  These drawings are accepted.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 11-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because of the following:
Claim 11 recites “A computer readable storage medium, which stores a system of identity verification, wherein the system of identity verification is executed by at least one processor to implement the following steps: …” Page 3 discloses “A computer readable medium is further provided, which stores a system of identity verification. The system of identity verification may be executed by at least one processor …” Page 18 discloses “A computer readable storage medium is further provided, which stores a system of identity verification. The system of identity verification is executed by a processor to implement the steps of the above mentioned method of identity verification.” Such paragraphs fail to disclose “a computer readable storage medium” as a non-transitory computer readable storage medium, wherein the
Claims 12-15 further limit claim 11 but do not recite language directing the claims away from the judicial exception. Such claims are likewise directed towards non-statutory subject matter. Hence, such claims are ineligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-7, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Aronowitz (US Publication No.: 20130006635) in view of Laurila et al (US Patent No.: 6691090).
Claim 1, Aronowitz discloses 

s2,    extracting preset types of acoustic features in all the voice frames by using standard spectral representation (paragraph 44 discloses using mel-frequency cepstrum to output an acoustic feature vector.), and generating multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.);
s3,    pairwise coupling all the observed feature units with pre-stored observed feature units respectively to obtain multiple groups of coupled observed feature units (Fig. 4, label 309, Fig. 1, label 132, Paragraph 49 discloses a segmentation component that identifies frames which belong to the same speaker. Fig. 1, label 133, Fig. 4, label 310, paragraph 50 discloses that the extended acoustic feature vector is used in a clustering component 133 to cluster segments as belonging to the same speaker.);
s4,    inputting the multiple groups of coupled observed feature units into a preset type of identity verification model generated by pre-training (Paragraph 51-52,63 discloses “a labeling component 134 may be provided to label segments according to an identified speaker. A speaker identification component 135 carries out speaker identification using the pre-trained models 111-113 to identify the speaker of segments or clusters.” Fig. 4, label 403 receives the extended acoustic feature vectors clustered and segmented.), and obtaining an output identity verification result to carry out the 
Aronowitz discloses extracting feature vectors using a standard spectral representation, but fails to disclose the standard spectral representation as a predetermined filter.
Laurila et al discloses performing Mel-frequency cepstrum using a predetermined filter such as a Mel-filter to output acoustic features (Fig. 4, label mel-scale bandpass filtering and integration.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of a standard spectral representation as disclosed by Aronowitz with another well-known element of Mel-frequency cepstrum using a predetermined filter as disclosed by Laurila et al so as to yield predictable results of outputting feature vectors.
Aronowitz discloses performing windowing (Fig. 8, label 850), but fails to disclose carrying out a framing processing on the current voice data according to preset framing parameters to obtain multiple voice frames.
Laurila et al discloses frame blocking of the input speech signal (Figs. 1 and 4, label frame blocking, speech signal) prior to windowing (Figs. 1 and 4, label windowing) and MFCC features generation (Figs. 1 and 4, label mel-scaling band-pass filtering and integration, Col. 3, lines 1-42 discloses FFT, MFCC features.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of feature vector generation as disclosed by Aronowitz with another well-known element of generating feature vectors by performing framing and windowing of the input speech signal as disclosed by Laurila et al so as to yield predictable results of outputting feature
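For illustration only, the frame blocking and windowing discussed above can be sketched as follows. This is a minimal Python sketch of the general technique and is not code from Aronowitz, Laurila et al, or the present application. The function names, the 16 kHz sampling rate, the 400-sample frame length, the 160-sample frame shift, and the choice of a Hamming window are all assumptions made for the example.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Frame blocking: split the samples into overlapping fixed-length frames."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def hamming_window(frame):
    """Windowing: taper one frame with a Hamming window (needs len(frame) >= 2)."""
    n = len(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


# Hypothetical preset framing parameters: 25 ms frames, 10 ms shift at 16 kHz.
SAMPLE_RATE = 16000
FRAME_LEN = 400
HOP_LEN = 160

# Synthetic 100 ms, 440 Hz tone standing in for the current voice data.
signal = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(1600)]

frames = frame_signal(signal, FRAME_LEN, HOP_LEN)
windowed = [hamming_window(f) for f in frames]
```

In this sketch, trailing samples that do not fill a complete frame are dropped; padding the last frame instead would be an equally reasonable assumption.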
Claim 2, Laurila et al discloses
wherein the predetermined filter is a Mel filter (Fig. 4, label Mel-scaling band-pass filtering), and the step of extracting the preset types of acoustic features in the voice by using the predetermined filter (label Mel-scaling band-pass filtering) comprises:
carrying out windowing processing on the voice (Fig. 4, label windowing);
carrying out a Fourier transformation on each window to obtain a corresponding frequency spectrum (Fig. 4, label FFT);
inputting the frequency spectrums into the Mel filter so as to output Mel frequency spectrums (Fig. 4, label Mel-scaling bandpass filtering and integration);
carrying out cepstrum analysis on the Mel frequency spectrums to obtain Mel Frequency Cepstrum Coefficients (MFCCs), wherein the Mel Frequency Cepstrum Coefficients serve as the acoustic features of the voice frames (Fig. 4, label Mel-scaling bandpass filtering and integration, Logarithm of Filter bank energies, DCT, Fig. 5, filtered MFCC time series (channels) feature vector at time Tn, Col. 10, lines 24-62, Col. 5, lines 4-43).
It would be obvious to one skilled in the art before the effective filing date of the application to modify Aronowitz’s generation of acoustic feature vectors by performing the functionalities as disclosed by Laurila et al so as to generate acoustic features, hence enabling a user device to verify the identity of a speaker and improving security of the user device.
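For illustration only, the sequence of windowing, Fourier transformation, Mel filtering, and cepstrum analysis recited above can be sketched as follows. This is a generic textbook-style Python sketch under stated assumptions, not the implementation of Laurila et al, Aronowitz, or the present application. All function names, the 8 kHz sampling rate, the 256-sample frame, the 20 triangular filters, the 12 output coefficients, and the common 2595 * log10(1 + f/700) Mel formula are assumptions. A naive discrete Fourier transform is used in place of an FFT to keep the example dependency-free.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hamming(frame):
    """Windowing step (assumed Hamming window)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def power_spectrum(frame):
    """Fourier transformation step: naive DFT power for bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Mel filter step: triangular filters spaced evenly on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * j / (num_filters + 1)) for j in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge of the triangle
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Cepstrum analysis step: log filter-bank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # A small floor avoids log(0) for filters that capture no energy.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters) for j, e in enumerate(log_e))
        for c in range(num_coeffs)
    ]


# Hypothetical input: one 256-sample frame of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(256)]
coeffs = mfcc(hamming(frame), SAMPLE_RATE)
```

Here `coeffs` holds the MFCCs for a single voice frame; applying `mfcc` to every frame of an utterance would yield one such vector per frame.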
Claim 6, Aronowitz discloses 

s2,    extracting preset types of acoustic features in all the voice frames by using standard spectral representation (paragraph 44 discloses using mel-frequency cepstrum to output an acoustic feature vector.), and generating multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.);
s3,    pairwise coupling all the observed feature units with pre-stored observed feature units respectively to obtain multiple groups of coupled observed feature units (Fig. 4, label 309, Fig. 1, label 132, Paragraph 49 discloses a segmentation component that identifies frames which belong to the same speaker. Fig. 1, label 133, Fig. 4, label 310, paragraph 50 discloses that the extended acoustic feature vector is used in a clustering component 133 to cluster segments as belonging to the same speaker.);
s4,    inputting the multiple groups of coupled observed feature units into a preset type of identity verification model generated by pre-training (Paragraph 51-52,63 discloses “a labeling component 134 may be provided to label segments according to an identified speaker. A speaker identification component 135 carries out speaker identification using the pre-trained models 111-113 to identify the speaker of segments or clusters.” Fig. 4, label 403 receives the extended acoustic feature vectors clustered and segmented.), and obtaining an output identity verification result to carry out the 
Aronowitz discloses extracting feature vectors using a standard spectral representation, but fails to disclose the standard spectral representation as a predetermined filter.
Laurila et al discloses performing Mel-frequency cepstrum using a predetermined filter such as a Mel-filter to output acoustic features (Fig. 4, label mel-scale bandpass filtering and integration.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of a standard spectral representation as disclosed by Aronowitz with another well-known element of Mel-frequency cepstrum using a predetermined filter as disclosed by Laurila et al so as to yield predictable results of outputting feature vectors.
Aronowitz discloses performing windowing (Fig. 8, label 850), but fails to disclose carrying out a framing processing on the current voice data according to preset framing parameters to obtain multiple voice frames.
Laurila et al discloses frame blocking of the input speech signal (Figs. 1 and 4, label frame blocking, speech signal) prior to windowing (Figs. 1 and 4, label windowing) and MFCC features generation (Figs. 1 and 4, label mel-scaling band-pass filtering and integration, Col. 3, lines 1-42 discloses FFT, MFCC features.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of feature vector generation as disclosed by Aronowitz with another well-known element of generating feature vectors by performing framing and windowing of the input speech signal as disclosed by Laurila et al so as to yield predictable results of outputting feature
Claim 7, Laurila et al discloses wherein the predetermined filter is a Mel filter (Fig. 4, label Mel-scaling band-pass filtering), and the step of extracting the preset types of acoustic features in the voice by using the predetermined filter (label Mel-scaling band-pass filtering) comprises:
carrying out windowing processing on the voice (Fig. 4, label windowing);
carrying out a Fourier transformation on each window to obtain a corresponding frequency spectrum (Fig. 4, label FFT);
inputting the frequency spectrums into the Mel filter so as to output Mel frequency spectrums (Fig. 4, label Mel-scaling bandpass filtering and integration);
carrying out cepstrum analysis on the Mel frequency spectrums to obtain Mel Frequency Cepstrum Coefficients (MFCCs), wherein the Mel Frequency Cepstrum Coefficients serve as the acoustic features of the voice frames (Fig. 4, label Mel-scaling bandpass filtering and integration, Logarithm of Filter bank energies, DCT, Fig. 5, filtered MFCC time series (channels) feature vector at time Tn, Col. 10, lines 24-62, Col. 5, lines 4-43).
It would be obvious to one skilled in the art before the effective filing date of the application to modify Aronowitz’s generation of acoustic feature vectors by performing the functionalities as disclosed by Laurila et al so as to generate acoustic features, hence enabling a user device to verify the identity of a speaker and improving security of the user device.
Claim 11, Aronowitz discloses 

s2,    extracting preset types of acoustic features in all the voice frames by using standard spectral representation (paragraph 44 discloses using mel-frequency cepstrum to output an acoustic feature vector.), and generating multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.);
s3,    pairwise coupling all the observed feature units with pre-stored observed feature units respectively to obtain multiple groups of coupled observed feature units (Fig. 4, label 309, Fig. 1, label 132, Paragraph 49 discloses a segmentation component that identifies frames which belong to the same speaker. Fig. 1, label 133, Fig. 4, label 310, paragraph 50 discloses that the extended acoustic feature vector is used in a clustering component 133 to cluster segments as belonging to the same speaker.);
s4,    inputting the multiple groups of coupled observed feature units into a preset type of identity verification model generated by pre-training (Paragraph 51-52,63 discloses “a labeling component 134 may be provided to label segments according to an identified speaker. A speaker identification component 135 carries out speaker identification using the pre-trained models 111-113 to identify the speaker of segments or clusters.” Fig. 4, label 403 receives the extended acoustic feature vectors clustered and segmented.), and obtaining an output identity verification result to carry out the 
Aronowitz discloses extracting feature vectors using a standard spectral representation, but fails to disclose the standard spectral representation as a predetermined filter.
Laurila et al discloses performing Mel-frequency cepstrum using a predetermined filter such as a Mel-filter to output acoustic features (Fig. 4, label mel-scale bandpass filtering and integration.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of a standard spectral representation as disclosed by Aronowitz with another well-known element of Mel-frequency cepstrum using a predetermined filter as disclosed by Laurila et al so as to yield predictable results of outputting feature vectors.
Aronowitz discloses performing windowing (Fig. 8, label 850), but fails to disclose carrying out a framing processing on the current voice data according to preset framing parameters to obtain multiple voice frames.
Laurila et al discloses frame blocking of the input speech signal (Figs. 1 and 4, label frame blocking, speech signal) prior to windowing (Figs. 1 and 4, label windowing) and MFCC features generation (Figs. 1 and 4, label mel-scaling band-pass filtering and integration, Col. 3, lines 1-42 discloses FFT, MFCC features.). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well-known element of feature vector generation as disclosed by Aronowitz with another well-known element of generating feature vectors by performing framing and windowing of the input speech signal as disclosed by Laurila et al so as to yield predictable results of outputting feature
Claim 12, Laurila et al discloses
wherein the predetermined filter is a Mel filter (Fig. 4, label Mel-scaling band-pass filtering), and the step of extracting the preset types of acoustic features in the voice by using the predetermined filter (label Mel-scaling band-pass filtering) comprises:
carrying out windowing processing on the voice (Fig. 4, label windowing);
carrying out a Fourier transformation on each window to obtain a corresponding frequency spectrum (Fig. 4, label FFT);
inputting the frequency spectrums into the Mel filter so as to output Mel frequency spectrums (Fig. 4, label Mel-scaling bandpass filtering and integration);
carrying out cepstrum analysis on the Mel frequency spectrums to obtain Mel Frequency Cepstrum Coefficients (MFCCs), wherein the Mel Frequency Cepstrum Coefficients serve as the acoustic features of the voice frames (Fig. 4, label Mel-scaling bandpass filtering and integration, Logarithm of Filter bank energies, DCT, Fig. 5, filtered MFCC time series (channels) feature vector at time Tn, Col. 10, lines 24-62, Col. 5, lines 4-43).
It would be obvious to one skilled in the art before the effective filing date of the application to modify Aronowitz’s generation of acoustic feature vectors by performing the functionalities as disclosed by Laurila et al so as to generate acoustic features, hence enabling a user device to verify the identity of a speaker and improving security of the user device.

Claims 3, 8, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Aronowitz (US Publication No.: 20130006635) in view of Laurila et al (US Patent No.: 6691090), further in view of Ghaemmaghami et al (US Publication No.: 20190304470).
Claim 3, Laurila et al discloses wherein the step of generating the multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.) comprises: forming a voice frame set by all the voice frames in each recording datum in the current voice data (Fig. 4, label frame blocking outputs voice frame set of the speech signal. Label windowing windows the frame set.), but fails to disclose splicing L dimensions of MFCCs of each voice frame in the voice frame set according to the sequence of framing moments of the corresponding voice frames, and  generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set.
Ghaemmaghami et al discloses 
splicing L dimensions of MFCCs of each voice frame in the voice frame set according to the sequence of framing moments of the corresponding voice frames (paragraph 103 discloses 20 filter banks that yields 20 MFCC features for each frame (paragraph 105). Fig. 5b, label FFT, DCT, Delta, append), and 
generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set (Paragraph 105 discloses 20 MFCC features for each frame, given T number of frames, which provides observed feature units corresponding to (20,T) dimension matrix, wherein T is the number of speech frames.).
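For illustration only, splicing per-frame MFCCs in framing order into an (L, N)-dimension matrix can be sketched as follows. This is a minimal Python sketch of the general idea and is not code from Ghaemmaghami et al, Laurila et al, or the present application. The function name, the (framing moment, MFCC vector) input layout, and the sample values (L = 3 coefficients, N = 4 frames) are hypothetical and chosen only to keep the example small.

```python
def splice_mfccs(frame_mfccs):
    """Arrange N per-frame MFCC vectors (each of length L) as an (L, N) matrix.

    The input must already be ordered by framing moment. Column n of the
    result holds the L coefficients of frame n.
    """
    if not frame_mfccs:
        return []
    num_coeffs = len(frame_mfccs[0])
    if any(len(vec) != num_coeffs for vec in frame_mfccs):
        raise ValueError("every frame must have the same number of MFCCs")
    return [[vec[i] for vec in frame_mfccs] for i in range(num_coeffs)]


# Hypothetical frames tagged with their framing moment in ms, listed out of order.
timed_frames = [
    (20, [0.3, 0.4, 0.5]),
    (0, [1.0, 2.0, 3.0]),
    (10, [4.0, 5.0, 6.0]),
    (30, [7.0, 8.0, 9.0]),
]

# Order the frames by framing moment, then splice them column by column.
ordered = [vec for _, vec in sorted(timed_frames, key=lambda item: item[0])]
matrix = splice_mfccs(ordered)

num_rows = len(matrix)      # L, the number of MFCC dimensions per frame
num_cols = len(matrix[0])   # N, the total number of frames in the set
```

With 20 MFCCs per frame and T frames, the same function would produce a (20, T) matrix, matching the dimensions discussed above.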

Claim 8, Laurila et al discloses wherein the step of generating the multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.) comprises: forming a voice frame set by all the voice frames in each recording datum in the current voice data (Fig. 4, label frame blocking outputs voice frame set of the speech signal. Label windowing windows the frame set.), but fails to disclose splicing L dimensions of MFCCs of each voice frame in the voice frame set according to the sequence of framing moments of the corresponding voice frames, and generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set.
Ghaemmaghami et al discloses 

generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set (Paragraph 105 discloses 20 MFCC features for each frame, given T number of frames, which provides observed feature units corresponding to (20,T) dimension matrix, wherein T is the number of speech frames.).
Laurila et al discloses frame blocking and windowing of speech (Fig. 4, label frame block and windowing) and generating a new set of MFCCs per each speech frame (Col. 6, lines 6-67), and Ghaemmaghami et al discloses generating L dimensions of MFCCs of each voice frame and observed feature units corresponding to the matrix of the MFCCs of each voice frame (paragraph 105, Fig. 5b). Hence, it would be obvious to one skilled in the art before the effective filing date of the application to simply substitute one well-known element of generating observed feature units from a number of MFCCs per voice frame as disclosed by Laurila et al with another well-known element of generating 20 dimensions of MFCCs of each voice frame and generating observed feature units corresponding to a matrix of such MFCCs as disclosed by Ghaemmaghami et al so as to yield predictable results of generating MFCCs for each voice frame, hence improving the user device’s ability to process speech input.
Claim 13, Laurila et al discloses wherein the step of generating the multiple observed feature units corresponding to the current voice data according to the extracted acoustic features (Paragraph 44 discloses outputting the observed features using Mel-Frequency cepstrum.) comprises: forming a voice frame set by all the voice frames in each recording datum in the current voice data (Fig. 4, label frame blocking outputs voice frame set of the speech signal. Label windowing windows the frame set.), but fails to disclose splicing L dimensions of MFCCs of each voice frame in the voice frame set according to the sequence of framing moments of the corresponding voice frames, and generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set.
Ghaemmaghami et al discloses 
splicing L dimensions of MFCCs of each voice frame in the voice frame set according to the sequence of framing moments of the corresponding voice frames (paragraph 103 discloses 20 filter banks that yields 20 MFCC features for each frame (paragraph 105). Fig. 5b, label FFT, DCT, Delta, append), and 
generating the observed feature units of a corresponding (L, N)-dimension matrix, wherein N is a total number of the frames in the voice frame set (Paragraph 105 discloses 20 MFCC features for each frame, given T number of frames, which provides observed feature units corresponding to (20,T) dimension matrix, wherein T is the number of speech frames.).
Laurila et al discloses frame blocking and windowing of speech (Fig. 4, label frame block and windowing) and generating a new set of MFCCs per each speech frame (Col. 6, lines 6-67) and Ghaemmaghami et al discloses generating L dimensions of MFCCs of each voice frame and observed feature units corresponding to the matrix of the MFCCs of each voice frame (paragraph 105,Fig. 5b), hence it would be obvious to one skilled in the art before the effective filing date of the application to simply substitute one well known element of generating observed feature units from a number .

Allowable Subject Matter
Claims 4-5,9-10,14-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044.  The examiner can normally be reached on 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 

/LINDA WONG/
Primary Examiner, Art Unit 2656