DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted on 04/10/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
3.	Claims 2, 12, and 19 are objected to because of the following informalities: typographical errors. Claims 2, 12, and 19 recite “removing the portion of the audio data form the audio data.” This should be changed to “removing the portion of the audio data from the audio data.” Appropriate correction is required.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin (US 2020/0242134 A1) in view of Jheeta (US 9,672,827 B1).


	With respect to Claim 1, Salhin discloses
	extracting one or more candidate phrases from the recognition result using n-gram counts (Salhin [0050] compare one or more of the textual portions and temporal portions of the data records in the group with further textual portions and further temporal portions of further data records in a further group, [0125] FIG. 1 shows a system 100 according to an embodiment of the invention. A plurality of original data records 102 are passed to a text matching computer module 104, which is arranged to analyse textual descriptor portions of each data record and associate data records having textual descriptors which match (i.e. which are similar above a threshold similarity, [0142] The data record grouping module 212 may detect similarities between different descriptions between data records, by pairing up the data records, and calculating the similarity between each pair of textual descriptions 234 of the paired data records. For example, descriptions like “office supplies”, “office stationary”, and “telephone bill” would be paired as ([“office supplies”, “office stationary”], [“office supplies”, “telephone bill”], and [“office stationary”, “telephone bill”]) ); 
	for each candidate phrase, making a plurality of pairs of same phrases with different time stamps (Salhin [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records, [0051] if the textual portions and temporal portions of the data records in the group are determined to match the further textual portions and further temporal portions of the further data records above an inter-group matching threshold, link the group and further group together as associated group, [0145] This step may be performed by a textual similarity metric reconciliation module 238. Thus, in examples where a reconciled similarity metric is determined for each pair of data records 234, the data record pair matching module 232 is arranged to identify the textual similarity metric of the data records 210 of each pair of data records 234 by, following obtaining the textual similarity metric for each pair. Paragraphs [0041] and [0042] describe the separation of the timestamp portions between the textual portions in matching, which implies that these textual portions have different time stamps); 
 	for each candidate phrase, clustering the plurality of pairs of the same phrases by using a difference in time stamps for each pair of the same phrases (Salhin [0039] analysing the timestamp portions of the textually matched data records to determine a time separation between pairs of the textually matched data records which are temporally consecutive; [0040] determining if the textually matched data records comprise timestamp portions separated by regular time intervals; [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records); and 
	Salhin fails to explicitly teach
 	A computer-implemented method for detecting a portion of audio data to be removed, the method comprising: 
 	obtaining a recognition result of audio data, the recognition result including recognized text data and time stamps; 
 	determining a portion of the audio data to be removed using results of the clustering.  
	However, Jheeta teaches 
 	A computer-implemented method for detecting a portion of audio data to be removed, the method comprising: 
 	obtaining a recognition result of audio data, the recognition result including recognized text data and time stamps (Jheeta col. 3 lines 58-63 Upon receiving a text transcript 140 representing captured audio 130 from the speech recognition service 110, each client device 105 timestamps the text transcript with the time at which the captured audio associated with the text transcript was captured, and sends the timestamped text transcript to the communication backend 120); 
	determining a portion of the audio data to be removed using results of the clustering (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text. Examiner notes that Salhin categorizes the textually matched data records as recurrent temporally-repeating textually matched data records or as non-recurrent temporally-repeating textually matched data records. The results of the categorizing therefore indicate whether the textual portion is recurrent or non-recurrent, and Jheeta removes the text if the text is duplicated).
Salhin and Jheeta are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta, for the benefit of converting the captured audio data to text and removing the duplicated text (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text).
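For illustration only, the sequence of steps recited in Claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Salhin, or Jheeta: the function names, the `min_count` and `tolerance` parameters, the neighbor-gap grouping used as the clustering step, and the rule that flags the largest cluster are all hypothetical choices made so that the example is runnable.

```python
from collections import defaultdict
from itertools import combinations


def extract_candidates(words, n=2, min_count=3):
    """Return n-gram phrases seen at least min_count times, with their start times.

    words: list of (token, start_time_in_seconds) tuples (assumed recognizer output).
    """
    occurrences = defaultdict(list)
    for i in range(len(words) - n + 1):
        phrase = " ".join(tok for tok, _ in words[i:i + n])
        occurrences[phrase].append(words[i][1])
    return {p: ts for p, ts in occurrences.items() if len(ts) >= min_count}


def make_pairs(times):
    """Pair up occurrences of the same phrase that have different time stamps."""
    return [(a, b) for a, b in combinations(sorted(times), 2) if a != b]


def cluster_pairs(pairs, tolerance=1.0):
    """Group pairs whose time differences are within `tolerance` of the previous pair's."""
    ordered = sorted(pairs, key=lambda p: p[1] - p[0])
    clusters = []
    for pair in ordered:
        diff = pair[1] - pair[0]
        if clusters:
            last = clusters[-1][-1]
            if diff - (last[1] - last[0]) <= tolerance:
                clusters[-1].append(pair)
                continue
        clusters.append([pair])
    return clusters


def portions_to_remove(words, n=2, min_count=3, tolerance=1.0):
    """Hypothetical decision rule: flag the start times in each phrase's largest cluster."""
    flagged = {}
    for phrase, times in extract_candidates(words, n, min_count).items():
        clusters = cluster_pairs(make_pairs(times), tolerance)
        if not clusters:
            continue
        largest = max(clusters, key=len)
        flagged[phrase] = sorted({t for pair in largest for t in pair})
    return flagged
```

In this sketch a phrase that recurs at a steady spacing produces many pairs with nearly equal time differences, so those pairs fall into one large cluster; how the results of the clustering are actually used to select the portion to be removed is defined by the claims and specification, not by this example.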

	With respect to Claim 9, Salhin discloses  
 	a memory tangibly storing the program instructions (Salhin [0129] The computer 1000 may comprise one or more processors 1006 arranged to operably execute computer software/computer program code thereon, where the computer software/computer program code is stored in a computer-readable medium accessible to the one or more processors 1006. The computer-readable medium may be one or more memory devices, where the memory may also store data for use by the software/program code (e.g. memory 1004 or a separate memory store external to the computer 1000)); 
 	a processor in communications with the memory (Salhin [0129] The computer 1000 may comprise one or more processors 1006 arranged to operably execute computer software/computer program code thereon, where the computer software/computer program code is stored in a computer-readable medium accessible to the one or more processors 1006. The computer-readable medium may be one or more memory devices, where the memory may also store data for use by the software/program code (e.g. memory 1004 or a separate memory store external to the computer 1000)), wherein the processor is configured to:  
 	extract one or more candidate phrases from the recognition result using n-gram counts (Salhin [0050] compare one or more of the textual portions and temporal portions of the data records in the group with further textual portions and further temporal portions of further data records in a further group, [0125] FIG. 1 shows a system 100 according to an embodiment of the invention. A plurality of original data records 102 are passed to a text matching computer module 104, which is arranged to analyse textual descriptor portions of each data record and associate data records having textual descriptors which match (i.e. which are similar above a threshold similarity, [0142] The data record grouping module 212 may detect similarities between different descriptions between data records, by pairing up the data records, and calculating the similarity between each pair of textual descriptions 234 of the paired data records. For example, descriptions like “office supplies”, “office stationary”, and “telephone bill” would be paired as ([“office supplies”, “office stationary”], [“office supplies”, “telephone bill”], and [“office stationary”, “telephone bill”]) );
	make, for each candidate phrase, a plurality of pairs of same phrases with different time stamps (Salhin [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records, [0051] if the textual portions and temporal portions of the data records in the group are determined to match the further textual portions and further temporal portions of the further data records above an inter-group matching threshold, link the group and further group together as associated group, [0145] This step may be performed by a textual similarity metric reconciliation module 238. Thus, in examples where a reconciled similarity metric is determined for each pair of data records 234, the data record pair matching module 232 is arranged to identify the textual similarity metric of the data records 210 of each pair of data records 234 by, following obtaining the textual similarity metric for each pair. Paragraphs [0041] and [0042] describe the separation of the timestamp portions between the textual portions in matching, which implies that these textual portions have different time stamps); 
 	cluster, for each candidate phrase, the plurality of pairs of the same phrases by using differences in time stamps (Salhin [0039] analysing the timestamp portions of the textually matched data records to determine a time separation between pairs of the textually matched data records which are temporally consecutive; [0040] determining if the textually matched data records comprise timestamp portions separated by regular time intervals; [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records); and 
	Salhin fails to explicitly teach
 	A computer system for detecting a portion of audio data to be removed, by executing program instructions, the computer system comprising:
 	obtain a recognition result of audio data, the recognition result including recognized text data and time stamps; 
	determine a portion of the audio data to be removed using results of the clustering.  
	However, Jheeta teaches 
 	A computer system for detecting a portion of audio data to be removed, by executing program instructions, the computer system comprising:
 	obtain a recognition result of audio data, the recognition result including recognized text data and time stamps (Jheeta col. 3 lines 58-63 Upon receiving a text transcript 140 representing captured audio 130 from the speech recognition service 110, each client device 105 timestamps the text transcript with the time at which the captured audio associated with the text transcript was captured, and sends the timestamped text transcript to the communication backend 120);  
	determine a portion of the audio data to be removed using results of the clustering (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text. Examiner notes that Salhin categorizes the textually matched data records as recurrent temporally-repeating textually matched data records or as non-recurrent temporally-repeating textually matched data records. The results of the categorizing therefore indicate whether the textual portion is recurrent or non-recurrent, and Jheeta removes the text if the text is duplicated).
Salhin and Jheeta are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta, for the benefit of converting the captured audio data to text and removing the duplicated text (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text).
 
	With respect to Claim 16, Salhin discloses
	extracting one or more candidate phrases from the recognition result using n-gram counts (Salhin [0050] compare one or more of the textual portions and temporal portions of the data records in the group with further textual portions and further temporal portions of further data records in a further group, [0125] FIG. 1 shows a system 100 according to an embodiment of the invention. A plurality of original data records 102 are passed to a text matching computer module 104, which is arranged to analyse textual descriptor portions of each data record and associate data records having textual descriptors which match (i.e. which are similar above a threshold similarity, [0142] The data record grouping module 212 may detect similarities between different descriptions between data records, by pairing up the data records, and calculating the similarity between each pair of textual descriptions 234 of the paired data records. For example, descriptions like “office supplies”, “office stationary”, and “telephone bill” would be paired as ([“office supplies”, “office stationary”], [“office supplies”, “telephone bill”], and [“office stationary”, “telephone bill”]) ); 
	for each candidate phrase, making a plurality of pairs of same phrases with different time stamps (Salhin [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records, [0051] if the textual portions and temporal portions of the data records in the group are determined to match the further textual portions and further temporal portions of the further data records above an inter-group matching threshold, link the group and further group together as associated group, [0145] This step may be performed by a textual similarity metric reconciliation module 238. Thus, in examples where a reconciled similarity metric is determined for each pair of data records 234, the data record pair matching module 232 is arranged to identify the textual similarity metric of the data records 210 of each pair of data records 234 by, following obtaining the textual similarity metric for each pair. Paragraphs [0041] and [0042] describe the separation of the timestamp portions between the textual portions in matching, which implies that these textual portions have different time stamps); 
 	for each candidate phrase, clustering the plurality of pairs of the same phrases by using a difference in time stamps for each pair of the same phrases (Salhin [0039] analysing the timestamp portions of the textually matched data records to determine a time separation between pairs of the textually matched data records which are temporally consecutive; [0040] determining if the textually matched data records comprise timestamp portions separated by regular time intervals; [0041] if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and [0042] if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records); and 
	Salhin fails to explicitly teach
 	A computer program product for detecting a portion of audio data to be removed, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a computer-implemented method comprising: 
 	obtaining a recognition result of audio data, the recognition result including recognized text data and time stamps;  
 	determining a portion of the audio data to be removed using results of the clustering.  
	However, Jheeta teaches 
	A computer program product for detecting a portion of audio data to be removed, the computer program product comprising a computer readable storage medium having program instructions embodied therewith (Salhin [0129] The computer 1000 may comprise one or more processors 1006 arranged to operably execute computer software/computer program code thereon, where the computer software/computer program code is stored in a computer-readable medium accessible to the one or more processors 1006. The computer-readable medium may be one or more memory devices, where the memory may also store data for use by the software/program code (e.g. memory 1004 or a separate memory store external to the computer 1000)), the program instructions executable by a computer to cause the computer to perform a computer-implemented method comprising: 
 	obtaining a recognition result of audio data, the recognition result including recognized text data and time stamps (Jheeta col. 3 lines 58-63 Upon receiving a text transcript 140 representing captured audio 130 from the speech recognition service 110, each client device 105 timestamps the text transcript with the time at which the captured audio associated with the text transcript was captured, and sends the timestamped text transcript to the communication backend 120); 
	determining a portion of the audio data to be removed using results of the clustering (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text. Examiner notes that Salhin categorizes the textually matched data records as recurrent temporally-repeating textually matched data records or as non-recurrent temporally-repeating textually matched data records. The results of the categorizing therefore indicate whether the textual portion is recurrent or non-recurrent, and Jheeta removes the text if the text is duplicated).
Salhin and Jheeta are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta, for the benefit of converting the captured audio data to text and removing the duplicated text (Jheeta col. 10 lines 28-36 The aggregated text can be ordered based on the timestamps associated with the synchronized text, and can be organized by speaker associated with each text transcript. In one embodiment, the aggregation module removes duplicate text, text determined to not be relevant, or text that does not satisfy one or more other parameters used to determine whether to include the text in the aggregated text).

6.	Claims 2, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin (US 2020/0242134 A1) in view of Jheeta (US 9,672,827 B1) and Faizakof et al. (US 2021/0193169 A1).

With respect to Claim 2, Salhin and Jheeta teach all the limitations of Claim 1 upon which Claim 2 depends. Salhin and Jheeta fail to explicitly teach 
further comprising preparing a training data for training a model by removing the portion of the audio data form the audio data.  
However, Faizakof et al. teach 
further comprising preparing a training data for training a model by removing the portion of the audio data form the audio data (Faizakof et al. [0078] a construction process of a training dataset of the present disclosure further comprises a step of removing from the sequences and sub-sequences audio segments comprising speech signal associated with an agent in call center interactions, [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).
Salhin, Jheeta, and Faizakof et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta for the benefit of converting the captured audio data to text and removing the duplicated text, and further using the teaching of removing speech segments as taught by Faizakof et al. for the benefit of constructing the training data (Faizakof et al. [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).
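For illustration only, the limitation of preparing training data by removing a portion of the audio data can be sketched as a filter over time-stamped segments. This is a minimal sketch under stated assumptions and is not drawn from the application or from Faizakof et al.: the `(start, end, payload)` segment layout, the `removal_spans` input, and the overlap test are hypothetical.

```python
def remove_portions(segments, removal_spans):
    """Keep only the segments that do not overlap any span marked for removal.

    segments: list of (start, end, payload) tuples, times in seconds (assumed layout).
    removal_spans: list of (start, end) tuples identifying portions to be removed.
    """
    def overlaps(segment):
        seg_start, seg_end = segment[0], segment[1]
        # Half-open comparison: touching boundaries do not count as overlap.
        return any(seg_start < r_end and r_start < seg_end
                   for r_start, r_end in removal_spans)

    return [segment for segment in segments if not overlaps(segment)]
```

The segments that remain after filtering would then form the training data in this sketch; any model training step is outside its scope.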

With respect to Claim 12, Salhin and Jheeta teach all the limitations of Claim 9 upon which Claim 12 depends. Salhin and Jheeta fail to explicitly teach
 	wherein the processor is configured to: prepare a training data for training a model by removing the portion of the audio data form the audio data.  
	However, Faizakof et al. teach 
wherein the processor (Jheeta col. 4 line 32, at least one processor) is configured to: 
prepare a training data for training a model by removing the portion of the audio data form the audio data (Faizakof et al. [0078] a construction process of a training dataset of the present disclosure further comprises a step of removing from the sequences and sub-sequences audio segments comprising speech signal associated with an agent in call center interactions, [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).
Salhin, Jheeta, and Faizakof et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta for the benefit of converting the captured audio data to text and removing the duplicated text, and further using the teaching of removing speech segments as taught by Faizakof et al. for the benefit of constructing the training data (Faizakof et al. [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).

With respect to Claim 19, Salhin and Jheeta teach all the limitations of Claim 16 upon which Claim 19 depends. Salhin and Jheeta fail to explicitly teach 
 	wherein the computer-implemented method further comprises preparing a training data for training a model by removing the portion of the audio data form the audio data.  
 	However, Faizakof et al. teach 
wherein the computer-implemented method further comprises preparing a training data for training a model by removing the portion of the audio data form the audio data (Faizakof et al. [0078] a construction process of a training dataset of the present disclosure further comprises a step of removing from the sequences and sub-sequences audio segments comprising speech signal associated with an agent in call center interactions, [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).
Salhin, Jheeta, and Faizakof et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of categorizing the textually matched data based on the separation of the timestamp portions as taught by Salhin, using the teaching of speech recognition as taught by Jheeta for the benefit of converting the captured audio data to text and removing the duplicated text, and further using the teaching of removing speech segments as taught by Faizakof et al. for the benefit of constructing the training data (Faizakof et al. [0086] At step 404, the annotated segments are arranged in temporal sequences associated with individual interactions by a speaker. In some embodiments, sections or portions of sequences comprising between 5 and 15 adjacent segments are extracted from the sequences. In some embodiments, segments representing a speech signal from an agent-side of the interactions are removed prior to further processing, [0090] In some embodiments, at step 412, a machine learning model is trained on the training dataset).

Allowable Subject Matter
7.	Claims 3, 10, 13, 17, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Claims 4-8, 11, 14, 15, and 18 are objected to as being dependent upon an objected-to claim by virtue of their dependency.

Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Chopra et al. (US 2021/0326379 A1). In this reference, Chopra et al. disclose a method for deleting duplicated audio content.
b.	Kim et al. (US 2021/0097990 A1). In this reference, Kim et al. disclose a method for deleting a duplicated spoken portion.
c.	Engelke et al. (US 2019/0312973 A1). In this reference, Engelke et al. disclose a method for assigning time stamps to matching words.

9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THUYKHANH LE/Primary Examiner, Art Unit 2655