DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending in this application.

Response to Arguments
Regarding Rejection under 35 U.S.C. 103
Applicant’s amendment and arguments with respect to the rejections have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

Terminal Disclaimer
Examiner acknowledges that a terminal disclaimer with respect to prior Patent No. 10,726,848 was filed on 06/21/2022.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 8, 9, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kajarekar et al. (US Pub. 2013/0144414) in view of Bocklet et al. (US Pub. 2016/0365096, PCT filing date 2014-03-28) and further in view of Dobry et al. (US Pub. 2011/0282661).
Regarding claim 1, Kajarekar discloses a method of blind diarization of audio data having a first-pass blind diarization process and a second-pass blind diarization process, the method comprising:
identifying [non-speech segments] in audio data and segmenting the audio data into a plurality of utterances [that are separated by the identified non-speech segments];
extracting acoustic features from the audio data ([0035][0036] extracting a plurality of feature vectors of each of the speech segments); 
generating a Gaussian Mixture Model (GMM) using the extracted acoustic features ([0035][0036][0040] generating a GMM for each cluster based upon the feature vectors of each of the segments in the corresponding cluster); 
determining distances between each utterance of the plurality of utterances using the utterance models ([0038] “compares each segment in the audio stream with every other segment in the audio stream (e.g., by comparing statistical models)”); 
clustering the utterances of the plurality of utterances using the determined distances ([0038] “generates clusters such that each cluster identifies segments of the audio stream that the system has determined includes the same hypothetical speaker”); and 
constructing a plurality of speaker models from the clustered utterances ([0040] “generate a speaker model for each of the clusters”).
Kajarekar does not explicitly teach:
[for each utterance of the plurality of utterances, constructing an utterance model from the GMM for the utterance].
However, Bocklet explicitly teaches the following, including the bracketed limitation:
extracting acoustic features from the audio data (Figs. 1, 3, and 4, [0034]-[0037] extracting audio features which include MFCC); 
generating a Gaussian Mixture Model (GMM) using the extracted acoustic features (Figs. 1, 3, and 4, [0021][0038][0039][0097] generating a target/cohort speaker GMM); 
[for each utterance of the plurality of utterances, constructing an utterance model from the GMM for the utterance] (Figs. 1, 3, and 4, [0097] creating a plurality of enrollment supervectors by extracting an enrollment supervector from each target/cohort speaker GMM);
determining distances between each utterance of the plurality of utterances using the utterance models (Figs. 1, 3, and 4, [0097] “calculating, from each cohort supervector to each enrollment supervector, a city block distance metric representing a similarity between the cohort supervector and the enrollment supervector”);
clustering the utterances of the plurality of utterances using the determined distances (Figs. 1, 3, and 4, [0097] “selecting, from the plurality of cohort supervectors, a proper subset of cohort supervectors based on the calculated distance metrics”); and 
constructing a plurality of speaker models from the clustered utterances (Fig. 4, [0097] “training a Support Vector Machine (SVM) to authenticate the target speaker, the training initiated by providing the plurality of enrollment supervectors and the selected proper subset of cohort supervectors to the SVM”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system and method of building speaker models as taught by Kajarekar with the method of constructing a speaker GMM as taught by Bocklet, in order to reduce the computational complexity and memory consumption of the system and to make the system suitable for use on devices with memory and processor constraints, such as application-specific integrated circuits (Bocklet, [0015]).
Kajarekar in view of Bocklet does not explicitly teach the bracketed limitations below; however, Dobry teaches:
identifying [non-speech segments] in audio data and segmenting the audio data into a plurality of utterances [that are separated by the identified non-speech segments] (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system and method of building speaker models as taught by Kajarekar in view of Bocklet with the method of identifying non-speech parts as taught by Dobry, in order to improve the audio analysis techniques that need to be applied to the audio to extract the information (Dobry, [0003]).
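For illustration only, the distance, clustering, and model-construction steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the implementation of Kajarekar, Bocklet, or the application: each utterance is assumed to be already summarized by a fixed-length vector (standing in for an utterance model), the city block distance named in Bocklet [0097] is used as the metric, and the single-linkage threshold grouping, the per-cluster mean vector, and all function names are hypothetical choices made for the example.

```python
def city_block(u, v):
    """City block (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cluster_utterances(vectors, threshold):
    """Group utterance indices whose vectors are closer than `threshold`.

    Hypothetical single-linkage grouping implemented with union-find.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if city_block(vectors[i], vectors[j]) < threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def speaker_models(vectors, clusters):
    """One mean vector per cluster, standing in for a speaker model."""
    models = []
    for members in clusters:
        dim = len(vectors[members[0]])
        models.append(
            [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]
        )
    return models


# Four made-up utterance vectors: two near the origin, two near (5, 5).
utterance_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
clusters = cluster_utterances(utterance_vectors, threshold=1.0)
models = speaker_models(utterance_vectors, clusters)
```

With the made-up vectors above, utterances 0 and 1 fall into one cluster and utterances 2 and 3 into another, so two speaker models result.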
Regarding claim 8, Kajarekar in view of Bocklet and further in view of Dobry discloses the method of claim 1, and Dobry further discloses:
segmenting the audio data using a voice-activity-detector (VAD) (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
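For illustration only, segmentation by voice activity detection can be sketched as below. This is a minimal sketch, not the detector of Dobry or of the application: it assumes the audio has already been reduced to one energy value per frame, and the energy threshold, the `min_gap` parameter, and the function name are hypothetical.

```python
def segment_utterances(frame_energies, threshold, min_gap=1):
    """Return (start, end) frame-index pairs, end exclusive, for speech runs.

    A frame counts as speech when its energy is >= threshold. An utterance is
    closed once at least `min_gap` consecutive non-speech frames are seen, so
    shorter pauses stay inside the utterance.
    """
    utterances = []
    start = None   # index of the first frame of the open utterance, if any
    silence = 0    # consecutive non-speech frames seen inside the open utterance
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap:
                utterances.append((start, i - silence + 1))
                start = None
                silence = 0
    if start is not None:
        # Close the final utterance, dropping any trailing non-speech frames.
        utterances.append((start, len(frame_energies) - silence))
    return utterances


# Made-up frame energies: speech in frames 1-2 and in frame 5.
energies = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7, 0.0]
utterances = segment_utterances(energies, threshold=0.5)
```

Here the non-speech frames at indices 0, 3, 4, and 6 are dropped, leaving the utterances `(1, 3)` and `(5, 6)`.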
Regarding claims 9 and 16, claims 9 and 16 are the corresponding system claims to method claims 1 and 8. Therefore, claims 9 and 16 are rejected using the same rationale as applied to claims 1 and 8 above.
Regarding claim 17, claim 17 is the corresponding medium claim to method claim 1. Therefore, claim 17 is rejected using the same rationale as applied to claim 1 above.
Claims 2-7, 10-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kajarekar et al. (US Pub. 2013/0144414) in view of Bocklet et al. (US Pub. 2016/0365096, PCT filing date 2014-03-28), further in view of Dobry et al. (US Pub. 2011/0282661) and further in view of Abuzeina et al. (US Pub. 2014/0067394).
Regarding claim 2, Kajarekar in view of Bocklet and further in view of Dobry discloses the method of claim 1, and Dobry further discloses the following, except for the bracketed limitation:
for each segment, decoding the segment using a [decoder], wherein the decoder outputs words and non-speech symbols; and for each segment, analyzing the words and the non-speech symbols from the [decoder] for the segment, wherein the non-speech symbols are discarded and the segment is refined resulting in subsegments comprising the words (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
Kajarekar in view of Bocklet and further in view of Dobry does not explicitly teach the following; however, Abuzeina explicitly teaches it, including the bracketed limitation:
constructing a first hidden Markov model (HMM) of the plurality of speaker models; decoding a sequence of identified speaker models that best corresponds to the utterances of the audio data; and [decoder] ([0033]-[0039] “acoustic model 18 builds the HMMs for all the triphones and the probability distribution of the observations for each state in each HMM”; [0035] “The decoder 14 uses the speech features 24 presented by the front end 12 to search for the most probable matching words … and then sentences that correspond to observation speech features 24. The recognition process of the decoder 14 starts by finding the likelihood of a given sequence of speech features based on the phonemes' HMMs”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system and method of building speaker models as taught by Kajarekar in view of Bocklet and further in view of Dobry with the system and method for decoding speech using HMMs as taught by Abuzeina, in order to improve recognition accuracy, as it characterizes the HMM of each phoneme (Abuzeina, [0036][0041]).
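For illustration only, the refinement recited in claim 2 (discarding non-speech symbols from the decoder output and keeping subsegments made up of words) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the decoder output is assumed to be a list of `(symbol, start, end)` tuples, and the symbol names `<sil>` and `<noise>`, the timestamps, and the function name are hypothetical.

```python
# Hypothetical non-speech symbols; a real decoder defines its own inventory.
NON_SPEECH = {"<sil>", "<noise>"}


def refine_segment(tokens):
    """Split decoder output into subsegments of consecutive words.

    `tokens` is a list of (symbol, start_seconds, end_seconds) tuples.
    Non-speech symbols are discarded and act as subsegment boundaries.
    """
    subsegments = []
    current = []
    for symbol, start, end in tokens:
        if symbol in NON_SPEECH:
            if current:
                subsegments.append(current)
                current = []
        else:
            current.append((symbol, start, end))
    if current:
        subsegments.append(current)
    return subsegments


# Made-up decoder output for one segment.
decoded = [
    ("<sil>", 0.0, 0.3),
    ("hello", 0.3, 0.7),
    ("world", 0.7, 1.1),
    ("<noise>", 1.1, 1.5),
    ("thanks", 1.5, 2.0),
]
refined = refine_segment(decoded)
```

In this example the segment is refined into two subsegments, one containing "hello" and "world" and one containing "thanks".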
Regarding claim 3, Kajarekar in view of Bocklet, further in view of Dobry and further in view of Abuzeina discloses the method of claim 2, and Abuzeina further discloses:
wherein the decoder comprises a large-vocabulary continuous speech recognition (LVCSR) decoder (Abuzeina, [0037]-[0043] “In a natural language speech recognition system, the language model 22 is a statistically based model using unigram, bigrams, and trigrams of the language for the text to be recognized”).
Regarding claim 4, Kajarekar in view of Bocklet, further in view of Dobry and further in view of Abuzeina discloses the method of claim 2, and Kajarekar further discloses:
constructing a second plurality of speaker models using the subsegments by feeding the resulting subsegments into a clustering algorithm; and constructing a [hidden Markov model (HMM)] of the second plurality of speaker models ([0040]-[0043] “apply the Viterbi algorithm at 310 to the Speaker Models to refine the segmentation boundaries … Clusters that are “similar” based upon the features of the corresponding segments may be grouped accordingly. In addition, the speaker models of these clusters may also be associated with one another, and/or a composite representation may be generated from the speaker models”).
Kajarekar in view of Dobry does not explicitly teach the bracketed limitation; however, Abuzeina explicitly teaches:
constructing a [hidden Markov model (HMM)] ([0033]-[0039] “The speaker classification results can be further fed back and used for updating the models generated by speaker classification training component 140”).
Regarding claim 5, Kajarekar in view of Bocklet, further in view of Dobry and further in view of Abuzeina discloses the method of claim 4, and Kajarekar further discloses:
wherein constructing the second plurality of speaker models using the subsegments comprises feeding the subsegments into a clustering algorithm ([0043] “continue to apply Viterbi and optimize CLR or other suitable criterion at 310 and 312, respectively, until the system determines that the clusters are different enough that they cannot include the same speaker”).
Regarding claim 6, Kajarekar in view of Bocklet, further in view of Dobry and further in view of Abuzeina discloses the method of claim 4, and Kajarekar further discloses:
decoding a best path corresponding to the words in the second HMM by applying a Viterbi algorithm that performs word-level segmentation ([0041] “apply the Viterbi algorithm at 310 to the Speaker Models to refine the segmentation boundaries using all of the feature vectors obtained for the audio stream”).
Regarding claim 7, Kajarekar in view of Bocklet, further in view of Dobry and further in view of Abuzeina discloses the method of claim 6, and Abuzeina further discloses:
wherein decoding the best path corresponding to the words in the second HMM comprises decoding the best path by applying a Viterbi algorithm that performs word-level segmentation (Abuzeina, [0033]-[0039] “The decoder 14 uses the known Viterbi algorithm to find the highest scoring state sequence”).
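For illustration only, Viterbi decoding of the highest-scoring state sequence in an HMM can be sketched as below. This is the textbook algorithm in log space, not the decoder of Abuzeina or Kajarekar: the two states "A" and "B" stand in for two speaker models, and the observation symbols and all probabilities are made up for the example.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for `observations`.

    All tables hold natural-log probabilities:
    log_start[s], log_trans[prev][s], log_emit[s][observation].
    """
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def log_table(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {k2: math.log(v) for k2, v in row.items()} for k, row in table.items()}


# Made-up two-speaker model: speakers tend to keep the floor (0.9), and each
# speaker favors one of two coarse observation symbols.
states = ["A", "B"]
log_start = {"A": math.log(0.5), "B": math.log(0.5)}
log_trans = log_table({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}})
log_emit = log_table({"A": {"lo": 0.8, "hi": 0.2}, "B": {"lo": 0.2, "hi": 0.8}})

example_path = viterbi(["lo", "lo", "hi", "hi"], states, log_start, log_trans, log_emit)
```

For this made-up model the decoded path assigns the first two observations to state "A" and the last two to state "B", placing a single speaker change at the midpoint.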
Regarding claims 10-15, claims 10-15 are the corresponding system claims to method claims 2-7. Therefore, claims 10-15 are rejected using the same rationale as applied to claims 2-7 above.
Regarding claims 18-20, claims 18-20 are the corresponding medium claims to method claims 2-4. Therefore, claims 18-20 are rejected using the same rationale as applied to claims 2-4 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571) 272-5933. The examiner can normally be reached 9 AM-3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/ Primary Examiner, Art Unit 2659