Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 3/31/2020.  These drawings are accepted.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea in the form of a mental process without significantly more. The claim recites “removing … non-speech portions from a call audio …; dividing … the pre-processed audio to a plurality of audio segments …; and clustering … the plurality of segments into at least two groups corresponding to the at least two speakers.” Such recitation can be performed in the human mind when listening to an audio call with multiple speakers and noise. This judicial exception is not integrated into a practical application because the claimed language is merely directed towards the judicial exception and does not recite additional elements, or a combination of elements, that apply, rely on or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claimed language is merely directed towards the judicial exception.
	For these reasons claim 1 is ineligible.
	Claims 2-3 recite language that further limits the independent claim, but fail to recite additional elements that integrate the judicial exception into a practical application imposing a meaningful limit on the judicial exception, or that amount to significantly more than the judicial exception.
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea in the form of a mental process without significantly more. The claim recites “removing … non-speech portions from a call audio …; dividing … the pre-processed audio to a plurality of audio segments …; and clustering … the plurality of segments into at least two groups corresponding to the at least two speakers.” Such recitation can be performed in the human mind when listening to an audio call with multiple speakers and noise. This judicial exception is not integrated into a practical application because the claimed language is merely directed towards the judicial exception and does not recite additional elements, or a combination of elements, that apply, rely on or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception. Although the claim recites additional language such as “a processor; and a memory …;”, such language merely recites a generic computer that performs the judicial exception and does not integrate the judicial exception into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claimed language is merely directed towards the judicial exception.
	For these reasons claim 6 is ineligible.
	Claims 7-8 recite language that further limits the independent claim, but fail to recite additional elements that integrate the judicial exception into a practical application imposing a meaningful limit on the judicial exception, or that amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-8 and 10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gorodetski et al (US Publication No.: 20180218738).
Claim 1, Gorodetski et al discloses 
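removing, at a call analytics server (CAS) (Fig. 1, label 104, Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), non-speech portions from a call audio to produce a pre-processed audio (paragraph 38 discloses the blind diarization process then filters out non-speech frames.), the call audio comprising speech from at least two speakers (paragraph 38,42-44 discloses segmentation of the audio data into frames to separate speakers in the audio.); 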
	dividing, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the pre-processed audio to a plurality of audio segments (Paragraph 38 discloses separating the audio data into frames for diarization.), each segment of the plurality of segments corresponding to speech from a single speaker of the at least two speakers (Paragraph 38,40 discloses segmentation attributes to separate speakers and segment the audio data 102 into utterances or short segments of audio data with a likelihood of emanating from a single speaker.), which indicates each segment of the plurality of segments correspond to speech from a single speaker (Paragraph 42,43,44 discloses segmentation indicates segments of speaker 1 and segments of speaker 2 and determining which of the two speakers is the agent.); and 
	clustering, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the plurality of segments into at least two groups corresponding to the at least two speakers (Paragraph 42-44 discloses segmentation indicates segments of speaker 1 and segments of speaker 2 and determining which of the two speakers is the agent.).
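For illustration only, and not as a characterization of the claimed method or of the Gorodetski et al disclosure, the removing and dividing steps mapped above can be pictured with the following minimal Python sketch. The energy threshold, the frame and segment lengths, and the function names remove_non_speech and divide_into_segments are assumptions introduced here; neither the claims nor the reference specifies them, and fixed-length segments do not by themselves guarantee a single speaker per segment.

```python
from typing import List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def remove_non_speech(samples: List[float],
                      frame_len: int = 4,
                      threshold: float = 0.01) -> List[float]:
    """Keep only frames whose energy reaches the (assumed) threshold.

    A trailing partial frame shorter than frame_len is dropped.
    """
    kept: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) >= threshold:
            kept.extend(frame)
    return kept


def divide_into_segments(samples: List[float],
                         segment_len: int = 8) -> List[List[float]]:
    """Split the pre-processed audio into fixed-length segments.

    Fixed-length splitting is a simplification; it is assumed here only
    to keep the sketch short.
    """
    return [samples[i:i + segment_len]
            for i in range(0, len(samples), segment_len)]


# Toy signal: silence, speech-like samples, silence, speech-like samples.
call_audio = [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]
pre_processed = remove_non_speech(call_audio)
segments = divide_into_segments(pre_processed)
```

In this toy input the two silent frames are discarded, and the remaining samples are split into one full segment and one shorter trailing segment.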
Claim 2, Gorodetski et al discloses receiving, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the call audio from a call audio source (Fig. 1, label 102, Fig. 3, label 320. Paragraph 32,34 discloses the audio data may be an audio recording or a conversation with unknown number of speakers.)
Claim 3, Gorodetski et al discloses the removing comprises removing portions comprising at least one of beeps, rings, silence, noise or music. (paragraph 5,38 discloses diarization filters out non-speech frames that include noises or music, for example.)
Claim 5, Gorodetski et al discloses the clustering comprises:
deriving the MFCC values for each audio segment of the plurality of audio segments (Fig. 1, label blind diarization, Fig. 4 shows the process of blind diarization. Paragraph 69 discloses “acoustic features are extracted at 406 for the entire conversation … wherein T is the total number of frames …” Paragraph 70 discloses the acoustic features are MFCC for each frame.); 
calculating numerical array with MFCC values for each audio segment (Paragraph 71 discloses “MFCC features extracted from each frame are given as a vector of real values of some fixed dimension d.”); and
perform a clustering technique to yield the at least two groups of audio segments (Fig. 4, label 414,416,418, Paragraph 81 discloses clustering is used to identify those utterances having similar acoustic features into N clusters, wherein each of the identified N clusters can be used as a model of each speaker constructed at 416. Paragraph 82 discloses label 416 associates each cluster to a speaker.)
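As a hedged illustration of the three clustering sub-steps mapped above (per-frame feature values, one numerical array per segment, and a clustering technique yielding at least two groups), a minimal Python sketch follows. It does not compute MFCC values: the per-frame vectors are hard-coded placeholders standing in for them. Averaging frame vectors into one array per segment, the choice of k-means, the farthest-point initialization, and the names mean_vector and kmeans are all assumptions made here for illustration, not the technique of the claims or of Gorodetski et al.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(frames: List[Vector]) -> Vector:
    """Average per-frame feature vectors into one array per segment."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def kmeans(vectors: List[Vector], k: int = 2, iterations: int = 20) -> List[int]:
    """Plain k-means with a deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector farthest from all centroids chosen so far.
        far = max(vectors,
                  key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: recompute each centroid from its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Placeholder per-frame feature vectors (standing in for MFCC values),
# grouped by segment.
segment_frames = [
    [[1.0, 1.0], [1.2, 0.8]],
    [[5.0, 5.0], [5.2, 4.8]],
    [[0.9, 1.1]],
    [[5.1, 5.1]],
]
segment_arrays = [mean_vector(frames) for frames in segment_frames]
speaker_labels = kmeans(segment_arrays, k=2)
```

Each entry of speaker_labels is the group index assigned to the corresponding segment; which index corresponds to which speaker is arbitrary.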
Claim 6, Gorodetski et al discloses 
a processor (Fig. 3 shows the computing system that implements any methods 100,200,400,500,700,800 (paragraph 26-27). Fig. 3, label 306.); and 

removing, at a call analytics server (CAS) (Fig. 1, label 104, Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), non-speech portions from a call audio to produce a pre-processed audio (paragraph 38 discloses the blind diarization process then filters out non-speech frames.), the call audio comprising speech from at least two speakers (paragraph 38,42-44 discloses segmentation of the audio data into frames to separate speakers in the audio.); 
	dividing, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the pre-processed audio to a plurality of audio segments (Paragraph 38 discloses separating the audio data into frames for diarization.), each segment of the plurality of segments corresponding to speech from a single speaker of the at least two speakers (Paragraph 38,40 discloses segmentation attributes to separate speakers and segment the audio data 102 into utterances or short segments of audio data with a likelihood of emanating from a single speaker.), which indicates each segment of the plurality of segments correspond to speech from a single speaker (Paragraph 42,43,44 discloses segmentation indicates segments of speaker 1 and segments of speaker 2 and determining which of the two speakers is the agent.); and 
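	clustering, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the plurality of segments into at least two groups corresponding to the at least two speakers (Paragraph 42-44 discloses segmentation indicates segments of speaker 1 and segments of speaker 2 and determining which of the two speakers is the agent.).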
Claim 7, Gorodetski et al discloses receiving, at the CAS (Paragraph 66 discloses the SST server can also perform diarization process. Such process can also be performed at a centralized server.), the call audio from a call audio source (Fig. 1, label 102, Fig. 3, label 320. Paragraph 32,34 discloses the audio data may be an audio recording or a conversation with unknown number of speakers.)
Claim 8, Gorodetski et al discloses the removing comprises removing portions comprising at least one of beeps, rings, silence, noise or music. (paragraph 5,38 discloses diarization filters out non-speech frames that include noises or music, for example.)
Claim 10, Gorodetski et al discloses the clustering comprises:
deriving the MFCC values for each audio segment of the plurality of audio segments (Fig. 1, label blind diarization, Fig. 4 shows the process of blind diarization. Paragraph 69 discloses “acoustic features are extracted at 406 for the entire conversation … wherein T is the total number of frames …” Paragraph 70 discloses the acoustic features are MFCC for each frame.); 
calculating numerical array with MFCC values for each audio segment (Paragraph 71 discloses “MFCC features extracted from each frame are given as a vector of real values of some fixed dimension d.”); and
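perform a clustering technique to yield the at least two groups of audio segments (Fig. 4, label 414,416,418, Paragraph 81 discloses clustering is used to identify those utterances having similar acoustic features into N clusters, wherein each of the identified N clusters can be used as a model of each speaker constructed at 416. Paragraph 82 discloses label 416 associates each cluster to a speaker.)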

Allowable Subject Matter
Claims 4,9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Khoury et al (Title: Speaker Diarization: Towards a More Robust and Portable System) discloses segmentation of speech into regions using Kullback-Leibler.
Madikeri et al (Title: KL-HMM based speaker diarization system for meetings) discloses the use of Kullback Leibler Hidden Markov Model applied for unsupervised diarization of speech.
Vijayasenan et al (Title: KL Realignment for Speaker Diarization with Multiple Feature Streams) discloses the use of Kullback Leibler divergence based realignment with application to speaker diarization.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044.  The examiner can normally be reached on 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LINDA WONG/Primary Examiner, Art Unit 2656