DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 27 October 2020 in reference to application 17/081,394. Claims 1-33 are pending and have been examined.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
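A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).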
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-33 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-31 of U.S. Patent No. 10,832,683. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of U.S. Patent No. 10,832,683 anticipate the claims of the instant application, as laid out in the chart below.
Instant Application
US Patent 10,832,683
Claim 1: A method for efficient universal background model (UBM) training for speaker recognition, comprising: 
Claim 1: A method for efficient universal background model (UBM) training for speaker recognition, comprising: 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
updating any of the associated components of the GMM based on the generated optimized training sequence computation.
updating any of the associated components of the GMM based on the generated optimized training sequence computation which includes a first computation… 
Claim 2: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
Claim 1: the first computation including: 
generating a feature matrix based on the feature vector; 
generating a feature matrix based on the feature vector; 
generating a GMM mean matrix based on the plurality of mean vectors associated with the plurality of GMM components; and 
generating a GMM mean matrix based on the plurality of mean vectors associated with the plurality of GMM components; and 
generating a delta matrix based on the feature matrix and the GMM mean matrix.
generating a delta matrix based on the feature matrix and the GMM mean matrix.
Claim 3: The method of claim 2, wherein updating the optimized training sequence computation further comprises: updating a mean vector, weight vector or a covariance matrix based on a computation of the delta matrix, an inverse covariance matrix, and a transposed delta matrix.
Claim 2: The method of claim 1, wherein updating the optimized training sequence computation further comprises: updating a mean vector, weight vector or a covariance matrix based on a computation of the delta matrix, an inverse covariance matrix, and a transposed delta matrix.
Claim 4: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
Claim 3: The method of claim 1, wherein updating the optimized training sequence computation further comprises: 
generating a first multi-dimensional array comprising a plurality of duplicated matrices, where each matrix includes a plurality of GMM mean vectors; 
a second computation, the second computation including generating a first multi-dimensional array comprising a plurality of duplicated matrices, where each matrix includes a plurality of GMM mean vectors; 
generating a multi-dimensional feature matrix comprising a plurality of feature matrices, where each feature matrix corresponds to a feature vector of a single audio frame; and 
generating a multi-dimensional feature matrix comprising a plurality of feature matrices, where each feature matrix corresponds to a feature vector of a single audio frame; and 
generating a multi-dimensional delta array based on the first multi-dimensional array and the multi-dimensional feature matrix.
generating a multi-dimensional delta array based on the first multi-dimensional array and the multi-dimensional feature matrix.
Claim 5: The method of claim 4, wherein updating the optimized training sequence computation further comprises: updating a mean vector, weight vector or covariance matrix, based on a computation of the multi-dimensional delta array, an inverse covariance matrix, and a transposed delta array.
Claim 4: The method of claim 3, wherein updating the optimized training sequence computation further comprises: updating a mean vector, weight vector or covariance matrix, based on a computation of the multi-dimensional delta array, an inverse covariance matrix, and a transposed delta array.
Claim 6: The method of claim 1, wherein updating the optimized training sequence computation comprises: detecting diagonal matrices, and only performing computations that involve diagonal elements.
Claim 5: The method of claim 1, wherein updating the optimized training sequence computation comprises: detecting diagonal matrices, and only performing computations that involve diagonal elements.
Claim 7: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
Claim 6: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
detecting computations in an intermediate result that generate an off-diagonal element of a matrix which is diagonalized; and 
detecting computations in an intermediate result that generate an off-diagonal element of a matrix which is diagonalized; and 
eliminating the computation of the intermediate result.
eliminating the computation of the intermediate result.
Claim 8: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
Claim 7: The method of claim 1, wherein updating the optimized training sequence computation comprises: 
detecting a recurring computation; 
detecting a recurring computation; 
precomputing the recurring computation; and 
precomputing the recurring computation; and 
storing the precomputed result in a cache.
storing the precomputed result in a cache.
Claim 9: The method of claim 1, wherein the at least one identifying feature is a mel frequency cepstrum coefficient (MFCC).
Claim 8: The method of claim 1, wherein the at least one identifying feature is a mel frequency cepstrum coefficient (MFCC).
Claim 10: The method of claim 9, further comprising: 
Claim 9: The method of claim 8, further comprising: 
generating a plurality of identifying features from a consecutive audio frame; 
generating a plurality of identifying features from a consecutive audio frame; 
storing the generated identifying features in a second feature vector data structure; and 
storing the generated identifying features in a second feature vector data structure; and 
generating delta coefficients of the MFCCs based on the feature vector and the second feature vector; and 
generating delta coefficients of the MFCCs based on the feature vector and the second feature vector; and 
wherein the optimized training sequence computation is further performed based on the generated delta coefficients.
wherein the optimized training sequence computation is further performed based on the generated delta coefficients.
Claim 11: The method of claim 10, further comprising: 
Claim 10: The method of claim 9, further comprising: 
generating delta-delta coefficients of the delta coefficients; and 
generating delta-delta coefficients of the delta coefficients; and 
wherein generating the optimized training sequence computation is further performed based on the generated delta-delta coefficients.
wherein generating the optimized training sequence computation is further performed based on the generated delta-delta coefficients.
Claim 12: The method of claim 10, wherein the consecutive audio frame partially overlaps with the first audio frame.
Claim 11: The method of claim 9, wherein the consecutive audio frame partially overlaps with the first audio frame.
Claim 13: The method of claim 1, wherein updating the optimized training sequence computation further includes: detecting one or more computations to be executed on at least one of: a general purpose graphics processor unit (GPGPU) and a multi-core CPU.
Claim 12: The method of claim 1, wherein updating the optimized training sequence computation further includes: detecting one or more computations to be executed on at least one of: a general purpose graphics processor unit (GPGPU) and a multi-core CPU.
Claim 14: The method of claim 1, wherein the audio sample is received from a speaker database including a plurality of audio samples, where each audio sample comprises a sample of a human speaker.
Claim 13: The method of claim 1, wherein the audio sample is received from a speaker database including a plurality of audio samples, where each audio sample comprises a sample of a human speaker.
Claim 15: The method of claim 1, wherein generating the at least one identifying feature further comprises: 
Claim 14: The method of claim 1, wherein generating the at least one identifying feature further comprises: 
providing each audio frame to a neural network, the neural network operative for extracting features from the audio frame; and 
providing each audio frame to a neural network, the neural network operative for extracting features from the audio frame; and 
generating an output vector of features.
generating an output vector of features.
Claim 16: The method of claim 1, further comprising: storing the at least one identifying feature in a feature vector data structure.
Claim 15: The method of claim 1, further comprising: storing the at least one identifying feature in a feature vector data structure.
Claim 17: A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to perform a process, the process comprising: 
Claim 16: A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to perform a process, the process comprising: 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
updating any of the associated components of the GMM based on the generated optimized training sequence computation.
updating any of the associated components of the GMM based on the generated optimized training sequence computation, … plurality of GMM components; and generating a delta matrix based on the feature matrix and the GMM mean matrix.
Claim 18: A system for efficient universal background model (UBM) training for speaker recognition, comprising: 
Claim 17: A system for efficient universal background model (UBM) training for speaker recognition, comprising: 
a processing circuitry; and 
a processing circuitry; and 
a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: 
a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: 
receive an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
receive an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; 
extract at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
extract at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; 
generate an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
generate an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and 
update any of the associated components of the GMM based on the generated optimized training sequence computation
update any of the associated components of the GMM based on the generated optimized training sequence computation
Claim 19: The system of claim 18, wherein the system is further configured to: 
Claim 17: which includes a first computation, wherein the first computation includes 
generate a feature matrix based on the feature vector; 
generating a feature matrix based on the feature vector; 
generate a GMM mean matrix based on the plurality of mean vectors associated with the plurality of GMM components; and
generating a GMM mean matrix based on the plurality of mean vectors associated with the plurality of GMM components; and 
generate a delta matrix based on the feature matrix and the GMM mean matrix.
generating a delta matrix based on the feature matrix and the GMM mean matrix.
Claim 20: The system of claim 19, wherein the system is further configured to: update a mean vector, weight vector or a covariance matrix based on a computation of the delta matrix, an inverse covariance matrix, and a transposed delta matrix.
Claim 18: The system of claim 17, wherein the system is further configured to: update a mean vector, weight vector or a covariance matrix based on a computation of the delta matrix, an inverse covariance matrix, and a transposed delta matrix.
Claim 21: The system of claim 18, wherein the system is further configured to: 
Claim 19: The system of claim 17, wherein the system is further configured to: 

generate a first multi-dimensional array comprising a plurality of duplicated matrices, where each matrix includes a plurality of GMM mean vectors; 
generate a multi-dimensional feature matrix comprising a plurality of feature matrices, where each feature matrix corresponds to a feature vector of a single audio frame; and 
generate a multi-dimensional feature matrix comprising a plurality of feature matrices, where each feature matrix corresponds to a feature vector of a single audio frame; and 
generate a multi-dimensional delta array based on the first multi-dimensional array and the multi-dimensional feature matrix.
generate a multi-dimensional delta array based on the first multi-dimensional array and the multi-dimensional feature matrix.
Claim 22: The system of claim 21, wherein the system is further configured to: update a mean vector, weight vector or covariance matrix, based on a computation of the multi-dimensional delta array, an inverse covariance matrix, and a transposed delta array.
Claim 20: The system of claim 19, wherein the system is further configured to: update a mean vector, weight vector or covariance matrix, based on a computation of the multi-dimensional delta array, an inverse covariance matrix, and a transposed delta array.
Claim 23: The system of claim 18, wherein the system is further configured to: detect diagonal matrices, and only performing computations that involve diagonal elements
Claim 21: The system of claim 17, wherein the system is further configured to: detect diagonal matrices, and only performing computations that involve diagonal elements.
Claim 24: The system of claim 18, wherein the system is further configured to: 
Claim 22: The system of claim 17, wherein the system is further configured to: 
detect computations in an intermediate result that generates an off-diagonal element of a matrix which is diagonalized; and 
detect computations in an intermediate result that generates an off-diagonal element of a matrix which is diagonalized; and 
eliminate the computation of the intermediate result
eliminate the computation of the intermediate result.
Claim 25: The system of claim 18, wherein the system is further configured to: 
Claim 23: The system of claim 17, wherein the system is further configured to:
detect a recurring computation; precompute the recurring computation; and 
detect a recurring computation; precompute the recurring computation; and 
store the precomputed result in a cache.
store the precomputed result in a cache.
Claim 26: The system of claim 18, wherein the at least one identifying feature is a mel frequency cepstrum coefficient (MFCC).
Claim 24: The system of claim 17, wherein the at least one identifying feature is a mel frequency cepstrum coefficient (MFCC).
Claim 27: The system of claim 26, wherein the system is further configured to: 
Claim 25: The system of claim 24, wherein the system is further configured to: 
generate a plurality of identifying features from a consecutive audio frame; 
generate a plurality of identifying features from a consecutive audio frame; 
store the generated identifying features in a second feature vector data structure; and 
store the generated identifying features in a second feature vector data structure; and 
generate delta coefficients of the MFCCs based on the feature vector and the second feature vector; and 
generate delta coefficients of the MFCCs based on the feature vector and the second feature vector; and 
wherein the optimized training sequence computation is further performed based on the generated delta coefficients.
wherein the optimized training sequence computation is further performed based on the generated delta coefficients.
Claim 28: The system of claim 27, wherein the system is further configured to:
Claim 26: The system of claim 25, wherein the system is further configured to: 
generate delta-delta coefficients of the delta coefficients; and 
generate delta-delta coefficients of the delta coefficients; and 
wherein generating the optimized training sequence computation is further performed based on the generated delta-delta coefficients.
wherein generating the optimized training sequence computation is further performed based on the generated delta-delta coefficients.
Claim 29: The system of claim 27, wherein the consecutive audio frame partially overlaps with the first audio frame.
Claim 27: The system of claim 25, wherein the consecutive audio frame partially overlaps with the first audio frame.
Claim 30: The system of claim 18, wherein the system is further configured to: 
Claim 28: The system of claim 17, wherein the system is further configured to: 
detect one or more computations to be executed on at least one of: 
detect one or more computations to be executed on at least one of: 
a general purpose graphics processor unit (GPGPU) and a multi-core CPU.
a general purpose graphics processor unit (GPGPU) and a multi-core CPU.
Claim 31: The system of claim 18, wherein the audio sample is received from a speaker database including a plurality of audio samples, where each audio sample comprises a sample of a human speaker.
Claim 29: The system of claim 17, wherein the audio sample is received from a speaker database including a plurality of audio samples, where each audio sample comprises a sample of a human speaker.
Claim 32: The system of claim 18, wherein the system is further configured to: 
Claim 30: The system of claim 17, wherein the system is further configured to: 
provide each audio frame to a neural network, the neural network operative for extracting features from the audio frame; and 
provide each audio frame to a neural network, the neural network operative for extracting features from the audio frame; and 
generate an output vector of features.
generate an output vector of features.
Claim 33: The system of claim 18, wherein the system is further configured to: store the at least one identifying feature in a feature vector data structure.
Claim 31: The system of claim 17, wherein the system is further configured to: store the at least one identifying feature in a feature vector data structure.
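
For orientation only, the "feature matrix", "GMM mean matrix", and "delta matrix" limitations charted above (instant claims 2-3 and 19-20) can be read as the following arithmetic. This is a minimal sketch under stated assumptions: the claims do not say how the feature matrix is built, so repeating the feature vector once per GMM component is one plausible reading, and the function names `delta_matrix` and `quadratic_terms` are hypothetical. It is not the applicant's or the patentee's implementation.

```python
# Illustrative sketch only. Names, shapes, and the row-per-component layout
# are assumptions and are not taken from the claims or the cited patent.

def delta_matrix(feature_vector, mean_vectors):
    """Return one row per GMM component: feature_vector minus that component's mean."""
    # Assumed "feature matrix": the feature vector repeated once per component.
    feature_matrix = [list(feature_vector) for _ in mean_vectors]
    # Assumed "GMM mean matrix": the component mean vectors stacked as rows.
    mean_matrix = [list(mean) for mean in mean_vectors]
    # "Delta matrix": element-wise difference of the two matrices.
    return [
        [x - mu for x, mu in zip(feature_row, mean_row)]
        for feature_row, mean_row in zip(feature_matrix, mean_matrix)
    ]


def quadratic_terms(delta, inverse_covariances):
    """Per-component quadratic form: delta row * inverse covariance * delta row transposed."""
    terms = []
    for row, inv_cov in zip(delta, inverse_covariances):
        dim = len(row)
        # Row vector times the inverse covariance matrix.
        partial = [sum(row[i] * inv_cov[i][j] for i in range(dim)) for j in range(dim)]
        # Dot product with the transposed delta row.
        terms.append(sum(p * d for p, d in zip(partial, row)))
    return terms


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    delta = delta_matrix([1.0, 2.0], [[0.0, 0.0], [1.0, 1.0]])
    print(delta)
    print(quadratic_terms(delta, [identity, identity]))
```

The quadratic form combines the delta matrix, an inverse covariance matrix, and the transposed delta, which is the combination recited in instant claims 3 and 20.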


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 9, 14, 16-18, 26, 31, and 33 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mitsufuji et al. (US PAP 2015/0058015).

Consider claim 1, Mitsufuji teaches a method for efficient universal background model (UBM) training for speaker recognition (abstract, 0094), comprising: 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold (0059-63, 69, receiving audio and dividing into frames, of size N); 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature (0073-80, 0084-89, converting framed speech of speaker #Z to frequency domain and then to Cepstrum); 

generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM) (0094-0107, training the universal background model based on initial GMM values (0105) and speaker cepstrum envelopes), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector (0103, covariance matrix, mean vector, and weight vector); and 
updating any of the associated components of the GMM based on the generated optimized training sequence computation (0103-105, learning parameters of the GMM Universal Background Model).

Consider claim 9, Mitsufuji teaches the method of claim 1, wherein the at least one identifying feature is a mel frequency cepstrum coefficient (MFCC) (0073-80, 0084-89, converting framed speech of speaker #Z to frequency domain and then to Cepstrum, which may be mel cepstrum at 0089).

Consider claim 14, Mitsufuji teaches the method of claim 1, wherein the audio sample is received from a speaker database including a plurality of audio samples, where each audio sample comprises a sample of a human speaker (0036, supplied with voices of speakers 1 to Z).
Consider claim 16, Mitsufuji teaches the method of claim 1, further comprising: storing the at least one identifying feature in a feature vector data structure (i.e. 0088, cepstrum C(j,i)).

Consider claim 17, Mitsufuji teaches a non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to perform a process (abstract, 0094, 0241-42, programs stored in memory), the process comprising: 
receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold (0059-63, 69, receiving audio and dividing into frames, of size N); 
extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature (0073-80, 0084-89, converting framed speech of speaker #Z to frequency domain and then to Cepstrum); 
generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM) (0094-0107, training the universal background model based on initial GMM values (0105) and speaker cepstrum envelopes), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector (0103, covariance matrix, mean vector, and weight vector); and 
updating any of the associated components of the GMM based on the generated optimized training sequence computation (0103-105, learning parameters of the GMM Universal Background Model).


Consider claim 18, Mitsufuji teaches a system for efficient universal background model (UBM) training for speaker recognition (abstract, 0094), comprising: 
a processing circuitry (0244); and 
a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system (0241-43, 45, memory storing program) to: 
receive an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold (0059-63, 69, receiving audio and dividing into frames, of size N); 
extract at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature (0073-80, 0084-89, converting framed speech of speaker #Z to frequency domain and then to Cepstrum); 
generate an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM) (0094-0107, training the universal background model based on initial GMM values (0105) and speaker cepstrum envelopes), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector (0103, covariance matrix, mean vector, and weight vector); and 
update any of the associated components of the GMM based on the generated optimized training sequence computation (0103-105, learning parameters of the GMM Universal Background Model).


Claim 26 contains similar limitations as claim 9 and is therefore rejected for the same reasons.

Claim 31 contains similar limitations as claim 14 and is therefore rejected for the same reasons.

Claim 33 contains similar limitations as claim 16 and is therefore rejected for the same reasons.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 7, 23, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Mitsufuji in view of Martinez-Gonzalez et al. (US PAP 2019/0005962).

Consider claim 6, Mitsufuji teaches the method of claim 1, but does not specifically teach wherein updating the optimized training sequence computation comprises: detecting diagonal matrices, and only performing computations that involve diagonal elements.
In the same field of speaker recognition using universal background models, Martinez-Gonzalez teaches wherein updating the optimized training sequence computation comprises: detecting diagonal matrices, and only performing computations that involve diagonal elements (0086-87, diagonal matrix used to generate background model, and off diagonal components are not stored).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use diagonal matrices as taught by Martinez-Gonzalez in the system of Mitsufuji in order to reduce storage requirements needed to store the GMM components (Martinez-Gonzalez 0087).

Consider claim 7, Mitsufuji teaches the method of claim 1, but does not specifically teach wherein updating the optimized training sequence computation comprises: detecting computations in an intermediate result that generate an off-diagonal element of a matrix which is diagonalized; and 
eliminating the computation of the intermediate result.
In the same field of speaker recognition using universal background models, Martinez-Gonzalez teaches wherein updating the optimized training sequence computation comprises: detecting computations in an intermediate result that generate an off-diagonal element of a matrix which is diagonalized; and 
eliminating the computation of the intermediate result (0086-87, diagonal matrix used to generate background model, and off diagonal components are not stored).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use diagonal matrices as taught by Martinez-Gonzalez in the system of Mitsufuji in order to reduce storage requirements needed to store the GMM components (Martinez-Gonzalez 0087).

Claim 23 contains similar limitations as claim 6 and is therefore rejected for the same reasons.

Claim 24 contains similar limitations as claim 7 and is therefore rejected for the same reasons.

Claims 8 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Mitsufuji in view of Sieklucki et al. (US Patent 10,198,319).

Consider claim 8, Mitsufuji teaches the method of claim 1, but does not specifically teach wherein updating the optimized training sequence computation comprises: 
detecting a recurring computation; 
precomputing the recurring computation; and 
storing the precomputed result in a cache.
In the same field of data storage systems, Sieklucki teaches wherein updating the optimized training sequence computation comprises: 
detecting a recurring computation (col 12 lines 20-45, repeated computations); 
precomputing the recurring computation (col 12 lines 20-45, precomputing repeated computations); and 
storing the precomputed result in a cache (col 12 lines 20-45, storing precomputed repeated computations).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to precompute recurring computations as taught by Sieklucki in the system of Mitsufuji in order to improve performance of the system (Sieklucki col 2 lines 10-15).
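
For illustration only, the detect, precompute, and cache sequence recited above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of Sieklucki or of the application: the class LogNormCache is hypothetical, and the recurring computation is assumed to be the normalization term of a diagonal-covariance Gaussian, which depends only on the variances and therefore repeats for every audio frame.

```python
import math


class LogNormCache:
    """Hypothetical cache for a recurring per-component computation."""

    def __init__(self):
        self._cache = {}
        self.misses = 0  # number of times the value actually had to be computed

    def log_norm(self, var_diag):
        # A repeated key signals a recurring computation.
        key = tuple(var_diag)
        if key not in self._cache:
            self.misses += 1
            # Compute once, then store the result for later requests.
            self._cache[key] = -0.5 * (
                len(key) * math.log(2.0 * math.pi)
                + sum(math.log(v) for v in key)
            )
        return self._cache[key]
```

In this sketch a second request with the same variances is answered from the cache, so the logarithms are evaluated only once per component rather than once per frame.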

Claim 25 contains similar limitations as claim 8 and is therefore rejected for the same reasons.

Claims 10-12 and 27-29 are rejected under 35 U.S.C. 103 as being unpatentable over Mitsufuji in view of Aronowitz (US PAP 2017/0069313).

Consider claim 10, Mitsufuji teaches the method of claim 9, further comprising: 
generating a plurality of identifying features from a consecutive audio frame (0073-80, 0084-89, converting framed speech of speaker #Z to frequency domain and then to Cepstrum, cepstrums are computed for each frame l); 

Mitsufuji does not specifically teach
generating delta coefficients of the MFCCs based on the feature vector and the second feature vector; and 
wherein the optimized training sequence computation is further performed based on the generated delta coefficients.
In the same field of GMM based background models, Aronowitz teaches 
generating delta coefficients of the MFCCs based on the feature vector and the second feature vector (0047, computing delta coefficients of the MFCCs); and 
wherein the optimized training sequence computation is further performed based on the generated delta coefficients (0047-48, training may be based on the delta coefficients).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use delta features as taught by Aronowitz in the system of Mitsufuji in order to account for changes in the signal (delta) over time within the background model (Aronowitz 0047-48).
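
For illustration only, delta coefficients computed from a feature vector and a second feature vector can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the computation disclosed by Aronowitz or the application: the functions delta and deltas are hypothetical, and a simple first-order difference between consecutive frames is assumed in place of any particular regression formula.

```python
def delta(first, second):
    """Element-wise difference between two consecutive feature vectors."""
    if len(first) != len(second):
        raise ValueError("feature vectors must have the same length")
    return [b - a for a, b in zip(first, second)]


def deltas(frames):
    """Delta vectors for each adjacent pair in a sequence of feature vectors."""
    return [delta(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
```

Applying deltas to its own output yields second-order differences, which under the same assumptions correspond to delta-delta coefficients.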

Consider claim 11, Aronowitz teaches the method of claim 10, further comprising: 
generating delta-delta coefficients of the delta coefficients (0047, double delta coefficients generated); and 


Consider claim 12, Mitsufuji teaches the method of claim 10, wherein the consecutive audio frame partially overlaps with the first audio frame (0071, 50% overlap for example).

Claim 27 contains similar limitations as claim 10 and is therefore rejected for the same reasons.

Claim 28 contains similar limitations as claim 11 and is therefore rejected for the same reasons.

Claim 29 contains similar limitations as claim 12 and is therefore rejected for the same reasons.

Claims 13 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Mitsufuji in view of Wang et al. (US PAP 2014/0214417).

Consider claim 13, Mitsufuji teaches the method of claim 1, but does not specifically teach wherein updating the optimized training sequence computation further includes: detecting one or more computations to be executed on at least one of: a general purpose graphics processor unit (GPGPU) and a multi-core CPU.
In the same field of speaker recognition, Wang teaches wherein updating the optimized training sequence computation further includes: detecting one or more computations to be executed on at least one of: a general purpose graphics processor unit (GPGPU) and a multi-core CPU (0107, multi-core processor may be used for computations).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a multi-core processor as taught by Wang in the system of Mitsufuji in order to use well known processor configurations to perform computations.

Claim 30 contains similar limitations as claim 13 and is therefore rejected for the same reasons.

Claims 15 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Mitsufuji in view of Zhong et al. (US PAP 2019/0130172).

Consider claim 15, Mitsufuji teaches the method of claim 1, but does not specifically teach wherein generating the at least one identifying feature further comprises: 
providing each audio frame to a neural network, the neural network operative for extracting features from the audio frame; and 
generating an output vector of features.
Zhong teaches 
providing each audio frame to a neural network, the neural network operative for extracting features from the audio frame (0064, inputting voice data into a neural network); and 
generating an output vector of features (0064, generating voiceprint features with neural network).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use a neural network to extract features as taught by Zhong in the system of Mitsufuji in order to make use of a widely applied method of extracting voice features (Zhong 0064).

Claim 32 contains similar limitations as claim 15 and is therefore rejected for the same reasons.

Allowable Subject Matter
Claims 2-5 and 19-22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter:  

Consider claim 2, Mitsufuji teaches the method of claim 1, wherein updating the optimized training sequence computation comprises: 

generating a GMM mean matrix based on the plurality of mean vectors associated with the plurality of GMM components (0101-03 generating means based on features).
However, the prior art of record does not specifically teach “generating a delta matrix based on the feature matrix and the GMM mean matrix” when combined with each and every other limitation of the claim and base claim.   Note that Aronowitz (US PAP 2017/0069313) teaches using delta features, but does not teach that they are generated “based on the feature matrix and the GMM mean matrix” as required by the claim.  Therefore claim 2 contains allowable subject matter.

Claim 3 depends on and further limits claim 2 and therefore contains allowable subject matter as well.

Consider claim 4, Mitsufuji teaches the method of claim 1, wherein updating the optimized training sequence computation comprises: 
generating a first multi-dimensional array comprising a plurality of duplicated matrices, where each matrix includes a plurality of GMM mean vectors (0101-03 generating means based on features); 
generating a multi-dimensional feature matrix comprising a plurality of feature matrices, where each feature matrix corresponds to a feature vector of a single audio frame (0092-93, cepstrums of speakers). 


Claim 5 depends on and further limits claim 4 and therefore contains allowable subject matter as well.

Claim 19 contains similar limitations as claim 2 and therefore contains allowable subject matter as well.

Claim 20 depends on and further limits claim 19 and therefore contains allowable subject matter as well.

Claim 21 contains similar limitations as claim 4 and therefore contains allowable subject matter as well.

Claim 22 depends on and further limits claim 21 and therefore contains allowable subject matter as well.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655

/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655