DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending in this application.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of 
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  
Claims 1-20 are rejected on the ground of nonstatutory double patenting over claims 1-20 of U.S. Patent No. 10726848. Although the claims at issue are not identical, they are not patentably distinct from each other because adding inherent and/or unnecessary limitations/steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the insertion of an element, e.g. “creating an acoustic signature for the first speaker using some or all of the tagged clustered segments”, and its function is an obvious expedient if the remaining elements perform the same function as before. In re Karlson, 136 USPQ 184 (CCPA 1963). Also note Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Insertion of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.
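Claim comparison: claims 1-8 of Instant Application No. 16/934,455 are listed first, followed by the compared claims of U.S. Patent No. 10726848.
Instant Application No. 16/934,455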
1. A method of blind diarization of audio data having a first-pass blind diarization process and a second-pass blind diarization process, the method comprising:

representing each utterance as an utterance model representative of a plurality of feature vectors of each utterance;
clustering the utterance models; and
constructing a plurality of speaker models from the clustered utterance models.
2. The method of claim 1, further comprising:
constructing a first hidden Markov model (HMM) of the plurality of speaker models;
decoding a sequence of identified speaker models that best corresponds to the utterances of the audio data;
for each segment, decoding the segment using a decoder, wherein the decoder outputs words and non-speech symbols; and
for each segment, analyzing the sequence of output words and non-speech symbols from the decoder for the segment, wherein non-speech parts are discarded and the segment is refined resulting in sub-segments comprising words.
3. The method of claim 2, wherein the decoder comprises a large-vocabulary continuous speech recognition (LVCSR) decoder.
4. The method of claim 2, further comprising:
constructing a second plurality of speaker models using the subsegments; by feeding the resulting sub-segments into a clustering algorithm; and

5. The method of claim 4, wherein constructing the second plurality of speaker models using the subsegments comprises feeding the sub-segments into a clustering algorithm.
6. The method of claim 4, further comprising decoding a best path corresponding to the sequence of output words in the HMM by applying a Viterbi algorithm that performs word-level segmentation.
7. The method of claim 6, wherein decoding the best path corresponding to the sequence of output words in the HMM comprises decoding the best path by applying a Viterbi algorithm that performs word-level segmentation.
8. The method of claim 1, further comprising segmenting the audio data using a voice-activity-detector (VAD).
U.S. Patent No. 10726848


receiving audio data at a communication interface of a computing system on a frame by frame basis, wherein at least 
selecting a linguistic model to create a speech to text transcription according to the metadata;
creating a speech to text transcription of the audio data;
segmenting the audio data according to identified word sequences;
clustering segments of the audio data according to the identified word sequences;
applying an acoustical matching technique and a text analysis technique to each of the clustered segments to identify which clustered segments include audio that was likely spoken by the first speaker and which clustered segments include audio that was likely spoken by the second speaker; and
tagging the clustered segments to indicate whether they include audio spoken by the first speaker or the second speaker; and
creating an acoustic signature for the first speaker using some or all of the tagged clustered segments, wherein creating the acoustical signature for the first speaker comprises:
classifying some or all of the tagged clustered segments to identify a set of common speaker Gaussian mixture models (GMMs);
constructing a first super-GMM for the set of common speaker GMMs; and
constructing a second super-GMM for a set of generic speaker GMMs, wherein 
2. The method according to claim 1, further comprising using the metadata to select the linguistic model to create the speech to text transcription.
3. The method according to claim 1, further comprising filtering out non-speech frames by evaluating envelope energy level of respective frames and comparing the envelope energy to a threshold energy above which a frame includes an utterance.
4. The method according to claim 1, further comprising assigning start time frames to the word sequences, wherein the clustering is further based on respective time lengths of the word sequences.
5. The method according to claim 1, wherein the segmenting further comprises using voice activity detection to segment the audio data into utterances having a statistical likelihood of emanating from a single speaker.
6. The method according to claim 1, wherein the metadata comprises an identification number for the first speaker and the second speaker.
7. The method according to claim 1, wherein the metadata comprises context data selected from the group consisting of including a topic, time, date, and location of audio data origin.      




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was 
Claims 1, 8, 9, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kajarekar et al. (US Pub. 20130144414) in view of Dobry et al. (US Pub. 20110282661).
Regarding claim 1, Kajarekar discloses a method of blind diarization of audio data having a first-pass blind diarization process and a second-pass blind diarization process, the method comprising: 
identifying [non-speech segments] in audio data and segmenting the audio data into a plurality of utterances [that are separated by the identified non-speech segments] (Fig. 3, [0035] identifying speaker segmentation of an audio stream), 
representing each utterance as an utterance model representative of a plurality of feature vectors of each utterance (Fig. 3, [0035][0036] “segments are labeled with X, Y, or Z to identify those segments that are acoustically similar…generate a statistical model for each of the segments based upon the extracted feature vectors”); 
clustering the utterance models (Fig. 3, steps 304 and 306, [0036]-[0040] performing clustering models); and 
constructing a plurality of speaker models from the clustered utterance models (Fig. 3, step 308, [0036]-[0040] generating speaker models for each of the clusters).

Kajarekar does not explicitly teach, but Dobry does teach, the bracketed limitations:
identifying [non-speech segments] in audio data and segmenting the audio data into a plurality of utterances [that are separated by the identified non-speech segments] (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system and method of building speaker models as taught by Kajarekar with the method of identifying non-speech parts as taught by Dobry, in order to improve the audio analysis techniques that are applied to the audio to extract information (Dobry, [0003]).
Regarding claim 8, Kajarekar in view of Dobry discloses the method of claim 1, and Dobry further discloses:
segmenting the audio data using a voice-activity-detector (VAD) (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
Regarding claims 9 and 16, these are the corresponding system claims to method claims 1 and 8. Therefore, claims 9 and 16 are rejected using the same rationale as applied to claims 1 and 8 above.
Regarding claim 17, this is the corresponding medium claim to method claim 1. Therefore, claim 17 is rejected using the same rationale as applied to claim 1 above.
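For illustration only, the sequence of steps recited in claim 1 (segmenting at non-speech, modeling each utterance, clustering the utterance models, and constructing speaker models) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Kajarekar, Dobry, or the instant specification: the energy threshold, the use of a mean feature vector as the "utterance model", the greedy distance-based clustering, and all function names are hypothetical choices made for the example.

```python
from statistics import fmean

def segment_utterances(frames, energy_threshold):
    """Split feature-vector frames into utterances separated by non-speech.

    Assumption: a frame is non-speech when its energy (sum of squares)
    falls below energy_threshold, a hypothetical tuning parameter.
    """
    utterances, current = [], []
    for frame in frames:
        energy = sum(x * x for x in frame)
        if energy >= energy_threshold:
            current.append(frame)
        elif current:
            utterances.append(current)
            current = []
    if current:
        utterances.append(current)
    return utterances

def utterance_model(vectors):
    """Assumed model: the per-dimension mean of a list of vectors."""
    return [fmean(dim) for dim in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_models(models, max_distance):
    """Greedy clustering: join the nearest cluster within max_distance,
    otherwise start a new cluster."""
    clusters = []  # each cluster is a list of utterance models
    for model in models:
        best = None
        for cluster in clusters:
            d = squared_distance(model, utterance_model(cluster))
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, cluster)
        if best is None:
            clusters.append([model])
        else:
            best[1].append(model)
    return clusters

def speaker_models(clusters):
    """One speaker model (centroid) per cluster of utterance models."""
    return [utterance_model(cluster) for cluster in clusters]

# Toy input: two low-energy frames act as the non-speech separators.
frames = [[1.0, 1.0], [1.2, 0.8], [0.0, 0.0],
          [5.0, 5.0], [5.2, 4.8], [0.0, 0.1], [1.1, 0.9]]
utterances = segment_utterances(frames, energy_threshold=0.5)
models = [utterance_model(u) for u in utterances]
speakers = speaker_models(cluster_models(models, max_distance=1.0))
print(len(utterances), "utterances,", len(speakers), "speaker models")
```

A practical system would typically use richer utterance models (for example Gaussian-based models over cepstral features) and a more robust clustering criterion; the sketch only shows how the four recited steps feed one another.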

Claims 2-7, 10-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kajarekar et al. (US Pub. 20130144414) in view of Dobry et al. (US Pub. 20110282661) and further in view of Abuzeina et al. (US Pub. 20140067394).
Regarding claim 2, Kajarekar in view of Dobry discloses the method of claim 1, and Dobry further discloses the following, except for the bracketed limitation:
for each segment, decoding the segment using a [decoder], wherein the decoder outputs words and non-speech symbols; and for each segment, analyzing the sequence of output words and non-speech symbols from the [decoder] for the segment, wherein non-speech parts are discarded and the segment is refined resulting in sub-segments comprising words (Dobry, [0039] “voice activity is detected, and inadequate parts of the audio, such as silent or noisy parts are eliminated, in order to leave only speech parts”).
Kajarekar in view of Dobry does not explicitly teach, but Abuzeina does teach, the following, including the bracketed limitation:
constructing a first hidden Markov model (HMM) of the plurality of speaker models; decoding a sequence of identified speaker models that best corresponds to the utterances of the audio data; and [decoder] ([0033]-[0039] “acoustic model 18 builds the HMMs for all the triphones and the probability distribution of the observations for each state in each HMM”; [0035] “The decoder 14 uses the speech features 24 presented by the front end 12 to search for the most probable matching words … and then sentences that correspond to observation speech features 24. The recognition process of the decoder 14 starts by finding the likelihood of a given sequence of speech features based on the phonemes' HMMs”).

Regarding claim 3, Kajarekar in view of Dobry and further in view of Abuzeina discloses the method of claim 2, and Abuzeina further discloses:
wherein the decoder comprises a large-vocabulary continuous speech recognition (LVCSR) decoder (Abuzeina, [0037]-[0043] “In a natural language speech recognition system, the language model 22 is a statistically based model using unigram, bigrams, and trigrams of the language for the text to be recognized”).
Regarding claim 4, Kajarekar in view of Dobry and further in view of Abuzeina discloses the method of claim 2, and Kajarekar further discloses:
constructing a second plurality of speaker models using the subsegments; by feeding the resulting sub-segments into a clustering algorithm; and constructing a [hidden Markov model (HMM)] of the second plurality of speaker models (Kajarekar, [0040]-[0043] “apply the Viterbi algorithm at 310 to the Speaker Models to refine the segmentation boundaries … Clusters that are “similar” based upon the features of the corresponding segments may be grouped accordingly. In addition, the speaker models of these clusters may also be associated with one another, and/or a composite representation may be generated from the speaker models”).
Kajarekar in view of Dobry does not explicitly teach, however Abuzeina does explicitly teach:

Regarding claim 5, Kajarekar in view of Dobry and further in view of Abuzeina discloses the method of claim 4, and Kajarekar further discloses:
wherein constructing the second plurality of speaker models using the subsegments comprises feeding the sub-segments into a clustering algorithm ([0043] “continue to apply Viterbi and optimize CLR or other suitable criterion at 310 and 312, respectively, until the system determines that the clusters are different enough that they cannot include the same speaker”).
Regarding claim 6, Kajarekar in view of Dobry and further in view of Abuzeina discloses the method of claim 4, and Kajarekar further discloses:
decoding a best path corresponding to the sequence of output words in the HMM by applying a Viterbi algorithm that performs word-level segmentation ([0041] “apply the Viterbi algorithm at 310 to the Speaker Models to refine the segmentation boundaries using all of the feature vectors obtained for the audio stream”).
Regarding claim 7, Kajarekar in view of Dobry and further in view of Abuzeina discloses the method of claim 6, and Abuzeina further discloses:
wherein decoding the best path corresponding to the sequence of output words in the HMM comprises decoding the best path by applying a Viterbi algorithm that performs word-level segmentation (Abuzeina, [0035] “The decoder 14 uses the known Viterbi algorithm to find the highest scoring state sequence”).
Regarding claims 10-15, these are the corresponding system claims to method claims 2-7. Therefore, claims 10-15 are rejected using the same rationale as applied to claims 2-7 above.
Regarding claims 18-20, these are the corresponding medium claims to method claims 2-4. Therefore, claims 18-20 are rejected using the same rationale as applied to claims 2-4 above.
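For illustration only, the Viterbi decoding referred to in the rejections of claims 4, 6, and 7 (finding the highest scoring state sequence through an HMM) can be sketched as below. This is a generic textbook-style sketch, not the implementation of Kajarekar or Abuzeina: the two speaker states, the discrete "lo"/"hi" observation symbols, and all probability values are hypothetical and chosen only to make the example small.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely HMM state sequence for the observations.

    All probabilities are passed as natural-log values.
    """
    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (log_start[s] + log_emit[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + log_emit[s][obs], prev)
        best.append(column)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical two-speaker HMM over quantized observations.
states = ["spk1", "spk2"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {
    "spk1": {"spk1": math.log(0.9), "spk2": math.log(0.1)},
    "spk2": {"spk1": math.log(0.1), "spk2": math.log(0.9)},
}
log_emit = {
    "spk1": {"lo": math.log(0.8), "hi": math.log(0.2)},
    "spk2": {"lo": math.log(0.2), "hi": math.log(0.8)},
}
print(viterbi(["lo", "lo", "hi", "hi", "hi"], states,
              log_start, log_trans, log_emit))
# -> ['spk1', 'spk1', 'spk2', 'spk2', 'spk2']
```

In a diarization setting the states would correspond to speaker models and the observations to feature vectors scored by those models; working in the log domain avoids numeric underflow on long audio.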
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN, whose telephone number is (571) 272-5933. The examiner can normally be reached from 9 AM to 3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 

Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659