DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 


Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 21-30 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of U.S. Patent 10,446,156. Although the claims at issue are not identical, they are not patentably distinct from each other because the extra processing steps of 

16/703099
21. A non-transitory computer-readable medium having instructions stored thereon for facilitating diarization of audio files from a customer service interaction, wherein the instructions, when executed by a processing system, direct the processing system to: receive a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server; perform a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript; automatedly apply at least one heuristic to the textual speaker clusters to select textual speaker 

apply the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers; 
save the at least one linguistic model to a linguistic database server and associating it with the labeled speaker; 

receive a new textual transcript from the transcription server and a new audio file associated with the new textual transcript from the audio database server; receive the at least one linguistic model from the linguistic database server;
 receive at least one acoustic voiceprint associated with a specific speaker from a voiceprint database server; apply the received 
comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer, comparing each audio speaker segment to the at least one acoustic voiceprint, 
and labeling each audio speaker segment as belonging to a known speaker or belonging to an unknown speaker; when one of the audio speaker segments is labeled as belonging to a known speaker, 
select and transcribing the labeled audio speaker segments with the transcription server;
compare the selected transcribed labeled audio speaker segments to the textual speaker 



22. The non-transitory computer-readable medium of claim 21, wherein the identified group of speakers are customer service agents and the audio data is audio data of a customer service interaction between at least one customer service agent and at least one customer. 

23. The non-transitory computer-readable medium of claim 21, wherein the specific speaker is a specific customer service agent. 



25. The non-transitory computer-readable medium of claim 21, wherein the analysis of the selected textual speaker clusters includes determining word use frequencies for words in the selected textual speaker clusters with the processor, 
determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected textual speaker clusters to the word use frequencies for words in the non-selected textual speaker clusters to identify a plurality of discriminating words for use in the at least one linguistic model. 
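For illustration only, the word use frequency comparison recited in claim 25 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or the reference patent: the function names, the whitespace tokenization, the use of relative frequencies, and the `threshold` value are all hypothetical choices made for the example.

```python
from collections import Counter


def relative_frequencies(clusters):
    """Return each word's share of all words across the given clusters."""
    # Assumption: lowercase whitespace tokenization stands in for real text processing.
    counts = Counter(word for text in clusters for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def discriminating_words(selected, non_selected, threshold=0.05):
    """Pick words markedly more frequent in the selected clusters.

    `threshold` is a hypothetical cutoff on the frequency difference.
    """
    sel = relative_frequencies(selected)
    other = relative_frequencies(non_selected)
    return sorted(w for w, f in sel.items() if f - other.get(w, 0.0) > threshold)


# Toy data (invented for the example): clusters selected as likely agent speech
# versus the remaining, non-selected clusters.
agent_clusters = ["thank you for calling how may I help you", "thank you for holding"]
customer_clusters = ["my bill is wrong", "I want to cancel my account"]
words = discriminating_words(agent_clusters, customer_clusters)
print(words)
```

In this sketch, words that occur mostly in the selected clusters survive the cutoff, while words that are equally or more common in the non-selected clusters are dropped.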



 comparing the plurality of scripts to the selected textual speaker clusters, 
comparing the plurality of scripts of non-selected textual speaker clusters, 
determining a correlation score between each of the textual speaker clusters and the plurality of scripts, identifying the group with the greatest correlation score for use in the at least one linguistic model. 

27. The non-transitory computer-readable medium of claim 26, further directing the processing system to:
 calculate a difference between the word use frequencies for each word in the selected textual speaker clusters and the non-selected textual speaker clusters; and compare the difference to a predetermined selection 

28. The non-transitory computer-readable medium of claim 21, wherein the textual speaker clusters are associated in groups of at least two, wherein the group of at least two includes a textual speaker cluster originating from the identified group of speakers and at least one textual speaker cluster originating from an other speaker, and wherein the non-selected textual speaker clusters are assumed to have originated from an other speaker. 

29. The non-transitory computer-readable medium of claim 21, wherein the at least one acoustic voiceprint is a set of acoustic voiceprints for each specific customer service agent saved in the acoustic voiceprint database server. 


receive the set of acoustic voiceprints from the acoustic voiceprint database server; apply the received at least one linguistic model from the linguistic database server to the new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file and new textual transcript; compare each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer; compare each audio speaker segment to the set of acoustic voiceprints; determine which audio speaker segments match one of the acoustic voiceprints; and label those audio speaker segments as belonging to the known speaker. 



1. A method of diarization, the method comprising: 




receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server; performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript; automatedly applying at least one heuristic to the textual speaker clusters with a processor to select 
applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers; 
saving the at least one linguistic model to a linguistic database server and associating it with the labeled speaker; with the processor, 

receiving a new textual transcript from the transcription server and a new audio file associated with the new textual transcript from the audio database server; receiving the at least one linguistic model from the linguistic database server; 
receiving at least one acoustic voiceprint associated with a specific speaker from a voiceprint database server; applying the 
comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer, comparing each audio speaker segment to the at least one acoustic voiceprint, 
and labeling each audio speaker segment as belonging to a known speaker or belonging to an unknown speaker; when one of the audio speaker segments is labeled as belonging to a known speaker, 
selecting and transcribing the labeled audio speaker segments with the transcription server; 
comparing the selected transcribed labeled audio speaker segments to the textual speaker 
 

2. The method of claim 1, wherein the identified group of speakers are customer service agents and the audio data is audio data of a customer service interaction between at least one customer service agent and at least one customer. 
 

3. The method of claim 1, wherein the specific speaker is a specific customer service agent. 
 


 

5. The method of claim 1, wherein the analysis of the selected textual speaker clusters includes determining word use frequencies for words in the selected textual speaker clusters with the processor, 

determining word use frequencies for words in the non-selected textual speaker clusters with the processor, and comparing the word use frequencies for words in the selected textual speaker clusters to the word use frequencies for words in the non-selected textual speaker clusters with the processor to identify a plurality of discriminating words for use in the at least one linguistic model. 
 


comparing the plurality of scripts to the selected textual speaker clusters, 
comparing the plurality of scripts of non-selected textual speaker clusters, 
determining a correlation score between each of the textual speaker clusters and the plurality of scripts, identifying the group with the greatest correlation score for use in the at least one linguistic model. 
 
7. The method of claim 6, further comprising: 


calculating a difference between the word use frequencies for each word in the selected textual speaker clusters and the non-selected textual speaker clusters; and comparing the difference to a predetermined selection 
 
8. The method of claim 1, wherein the textual speaker clusters are associated in groups of at least two, wherein the group of at least two includes a textual speaker cluster originating from the identified group of speakers and at least one textual speaker cluster originating from an other speaker, and wherein the non-selected textual speaker clusters are assumed to have originated from an other speaker. 
 

9. The method of claim 1, wherein the at least one acoustic voiceprint is a set of acoustic voiceprints for each specific customer service agent saved in the acoustic voiceprint database server. 
 



receiving the set of acoustic voiceprints from the acoustic voiceprint database server; applying the received at least one linguistic model from the linguistic database server to the new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file and new textual transcript; comparing each new textual speaker cluster to the at least one linguistic model, and labeling each textual speaker cluster as belonging to a customer service agent or belonging to a customer; comparing each audio speaker segment to the set of acoustic voiceprints; determining which audio speaker segments match one of the acoustic voiceprints; and labeling those audio speaker segments as belonging to the known speaker. 
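For illustration only, the final comparing and labeling steps of the claim text above (comparing each audio speaker segment to a set of acoustic voiceprints and labeling matching segments as belonging to the known speaker) can be sketched as follows. This is a minimal sketch under stated assumptions: representing segments and voiceprints as numeric feature vectors, scoring them with cosine similarity, the `threshold` value, and every name in the code are hypothetical and do not come from the claims or the specification.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(segments, voiceprints, threshold=0.8):
    """Label each segment with its best-matching known speaker, else "unknown".

    `threshold` is a hypothetical minimum similarity for a match.
    """
    labels = {}
    for seg_id, vec in segments.items():
        best_name, best_score = None, threshold
        for name, voiceprint in voiceprints.items():
            score = cosine_similarity(vec, voiceprint)
            if score >= best_score:
                best_name, best_score = name, score
        labels[seg_id] = best_name if best_name is not None else "unknown"
    return labels


# Toy vectors invented for the example.
voiceprints = {"agent_17": [0.9, 0.1, 0.2]}
segments = {"seg_a": [0.88, 0.12, 0.21], "seg_b": [0.1, 0.9, 0.3]}
labels = label_segments(segments, voiceprints)
print(labels)
```

A segment whose similarity to every stored voiceprint falls below the cutoff is left as an unknown speaker, which mirrors the known/unknown labeling recited in the claims.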



Claims 31-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 of U.S. Patent 10,950,242. Although the claims at issue are not identical, they are not patentably distinct from each other because the extra processing steps of the '242 patent are not necessary to realize the functionality of the claims in the instant invention. See table below.

16/702998
31. A method of diarization, the method comprising: 
receiving a set of audio data from an audio database server at a speech-to-text (SST) server,




wherein each audio data is an audio recording of a conversation between two or more speakers, wherein at least one of the two or more speakers is a customer service agent, further wherein at least one of the two or more speakers is a customer;


receiving a sub-set of the textual transcripts and a sub-set of audio data for diarization, wherein the sub-set of audio data is the audio data associated with each of the sub-set of the textual transcripts; 



performing a blind diarization on the sub-set of textual transcripts to segment and cluster the sub-set of textual transcripts into a plurality of textual speaker clusters,







applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers; 

analyzing the selected textual speaker clusters to extract a subset of the textual speaker clusters known to be spoken by the identified group of speakers; 

creating at least one linguistic model based on the extracted subset of textual speaker clusters; and applying the at least one linguistic model to a new audio file transcript from an audio source to perform diarization of the new audio file by blind diarizing the new audio file, comparing each new textual speaker cluster to the at least one linguistic 

32. The method of claim 31, the method further comprising receiving metadata 

33. The method of claim 32, wherein the SST server uses the metadata associated with each of the audio data to determine at least one technique for creating each of the textual transcripts. 


34. The method of claim 31, the method further comprising determining a confidence score for each of the textual transcripts. 
35. The method of claim 34, wherein the subset of textual transcripts received are textual transcripts that have a high confidence score. 

36. The method of claim 31, the method further comprising

 performing a blind diarization on the sub-set of audio data to segment and cluster the sub-set of audio data into a plurality of audio speaker clusters, 

wherein the number of audio speaker clusters is at least equal to a number of speakers in each audio data, wherein the number of speakers in each of the sub-set of audio data equals the number of speakers in each related textual transcript. 






37. The method of claim 31, the method further comprising: applying the linguistic model to the textual speaker clusters to label a 

38. The method of claim 31, the method further comprising saving the at least one linguistic model to a linguistic database server and associating it with the identified group of speakers. 




40. The method of claim 31, wherein the identified group of speakers are customer service agents.

1. A method of diarization and labeling of audio data, the method comprising: 
receiving a set of textual transcripts from a transcription server (examiner notes this is an S-to-T server) and a set of audio files associated with the set of textual transcripts from an audio database server, wherein each textual transcript is a transcription of the associated audio file, 
wherein each audio file is an audio recording of a conversation between two or more speakers, wherein at least one of the two or more speakers is a customer service agent, further wherein at least one of the two or more speakers is a customer;




analyzing the selected textual speaker clusters to extract a subset of the selected textual speaker clusters known to be spoken by the identified group of speakers; creating at least one linguistic model based on the extracted subset of textual speaker clusters;


analyzing the blind diarized textual transcripts by applying the at least one linguistic model to the blind diarized textual transcripts to determine the textual speaker clusters that were spoken by the identified group of speakers;




applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers; 

clustering the segmented textual transcripts of each textual transcript into a plurality of textual speaker clusters, 


labeling the determined textual speaker clusters as having been spoken by the identified group of speakers; determining word use frequencies for words in the labeled textual speaker clusters; determining word use frequencies for words in the non-labeled textual speaker clusters; comparing the word use frequencies for words in the labeled 
 

2. The method of claim 1, the method further comprising: transcribing audio files using the transcription server to create the textual 
 

3. The method of claim 1, wherein each of the textual transcripts includes a confidence score, further wherein the set textual transcripts received for diarization and inclusion in the at least one linguistic model are textual transcripts that have a high confidence score. 
 






 

7. The method of claim 1, 


(from claim 1 -- analyzing the selected textual speaker clusters to extract a subset of the selected textual speaker clusters known to be spoken by the identified group of speakers;)





wherein the number of new textual speaker clusters is at least equal to a number of speakers in the new audio file transcript; applying the at least one linguistic model to the new textual speaker clusters to select new textual speaker clusters as having been spoken by the identified group of speakers associated with the at least one linguistic model; and labeling the selected new textual speaker clusters as belonging to the identified group of speakers. 


(from claim 1) -- determining word use frequencies for words in the labeled textual 


(from claim 1) -- saving the at least one linguistic model to a linguistic database server and associating it with the identified group of speakers;





4. The method of claim 1, wherein the identified group of speakers are customer service agents.






Allowable Subject Matter

Claims 21-40 are allowable over the prior art of record.

The following is a statement of reasons for the indication of allowable subject matter:
As to the prior art of record, the prior art references in combination did not fairly suggest or teach, and Arrowood in particular fails to teach or suggest, selecting a subset of a subset of audio files, where the initial subset is the subset of audio files for a known customer service agent which include at least one other speaker who is a customer, and the subsequent subset is selected based on maximizing, in each individual audio file, the acoustical difference between the known customer service agent and the customer in that audio file. Support for this limitation is found in the specification at .


Conclusion

Please see related art listed on the PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
11/22/2021