Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1.	This action is responsive to remarks filed 2/14/2021.	
Response to Amendment
2.	Claims 1, 11, 19 have been amended; claims 22-25 are newly added.
Response to Arguments
3.	Applicant’s arguments have been fully considered but are moot in view of the new grounds of rejection, which are responsive to the amendments.
Allowable Subject Matter
4.	Claims 22, 24-25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
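6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.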


7.	Claims 1, 8, 10-11, 17-19, 23 are rejected under 35 U.S.C. 103 as being unpatentable over Milstein (2016/0112561) in view of Wittenstein et al (2011/0087491) in further view of Beach et al (10,395,640).

Regarding claim 1, Milstein teaches A system configured to assist in transcription of a repeated phrase (abstract), comprising: 
a frontend server (fig 1) configured to transmit an audio recording comprising speech of first and second people (fig 1; 6 call center; hundreds or thousands of calls, common utterances from different callers; 22; 28), and
a backend server (fig 1) configured to: 
generate a transcription of the audio recording utilizing an automatic speech recognition (ASR) system (fig 1-2; 30 transcription; automated speech recognition); 
cluster segments of the audio recording into clusters of similar utterances (abstract; fig 4 group of similar questionable utterances; 30-31); 

select first and second segments of the audio recording that comprise similar utterances spoken by the first and second people (6; 23; 30-33 pool of similar questionable utterances; selected; 41-42), respectively;
provide the certain transcriber with the first and second segments of the audio recording and with transcriptions of the first and second segments (6; 23; 30-33 once selected similar questionable utterances transmitted to transcriber; 41-42);
receive from the certain transcriber: an indication indicating whether the first and second segments comprise repetitions of a phrase, and a correct transcription of said phrase (6; 23-24; 30-33 newly assigned transcribed value; 41-42);
and update the transcription of the audio recording based on the indication and the correct transcription (6; 8; 23-24; 33 transcribed values confirmed or assigned by the human transcriber can be incorporated into the initial transcribed message).
Milstein teaches that the questionable utterances in the sample can be transmitted to the same human transcriber or, alternatively, to different human transcribers (0032),

	but does not specifically teach:
	calculate, for each transcriber from among transcribers, expected accuracies of transcriptions of the first and second segments were they transcribed by the transcriber; and
	select a certain transcriber, from among the transcribers, whose expected accuracies reach a predetermined threshold.

	Wittenstein teaches multiple transcribers, each with a particular skill and accuracy.  Wittenstein further allows a user to specify certain criteria, such as the accuracy of transcription (expected accuracy), and uses said criteria in selecting the appropriate transcriber (abstract: pool of human transcribers; 29: tracks transcribers' speed, accuracy, updates transcriber performance; 30; 32; 33: transcriber profiles; 71; 74).

Beach teaches calculating expected accuracies of recognition, to help determine which method will present the best result before selection (col 1 l. 21-24 generate text in speech recognition; col 9 l. 20 Word Error Rate; col 15 l. 5-28: ability to predict the achievable recognition accuracy based on accuracy statistics…threshold accuracy).
It would have been obvious to one of ordinary skill in the art before the effective filing date to determine expected accuracies before a transcriber is chosen, to ensure the best transcriber is selected.  When combined with Wittenstein, it would have been obvious to apply the technique of Beach, which predicts accuracy for each of multiple methods including human transcription, to the pool of transcribers of Wittenstein, thus allowing accuracy prediction for each transcriber.
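
For illustration only, the selection limitation discussed above (a transcriber whose expected accuracies for both segments reach a predetermined threshold) can be sketched as follows. This is a minimal sketch and is not taken from Milstein, Wittenstein, or Beach: the function name select_transcriber, the data layout, and the tie-breaking rule (prefer the transcriber with the highest worst-case expected accuracy) are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def select_transcriber(
    expected: Dict[str, Tuple[float, float]], threshold: float
) -> Optional[str]:
    """Pick a transcriber whose expected accuracies reach the threshold.

    `expected` maps a (hypothetical) transcriber name to the expected
    accuracies, in [0, 1], for the first and second segments.
    Returns None when no transcriber qualifies.
    """
    # Keep only transcribers whose expected accuracy reaches the
    # threshold on BOTH segments.
    eligible = {
        name: accs
        for name, accs in expected.items()
        if all(acc >= threshold for acc in accs)
    }
    if not eligible:
        return None
    # Assumed tie-break: prefer the best worst-case expected accuracy.
    return max(eligible, key=lambda name: min(eligible[name]))


if __name__ == "__main__":
    pool = {"a": (0.95, 0.80), "b": (0.92, 0.91), "c": (0.99, 0.93)}
    print(select_transcriber(pool, 0.90))
```

How the per-segment expected accuracies are estimated (for example, from past error statistics) is outside this sketch and is supplied by the caller.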


	Regarding claim 8, Milstein teaches The system of claim 1, wherein the backend server is further configured to cluster the segments based on similarity of paths corresponding to the segments in a lattice constructed by the ASR system (35-36; 48-52 – grouping similar utterances based on information obtained during recognition).

	Regarding claim 10, Milstein teaches The system of claim 1, wherein the backend server is further configured to update the transcription of the audio recording responsive (34-35; 50; 57).

	Regarding claim 11, Milstein, Wittenstein, and Beach teach A method for assisting in transcription of a repeated phrase, comprising: 
receiving an audio recording comprising speech of first and second people; 
generating a transcription of the audio recording utilizing an automatic speech recognition (ASR) system; 
clustering segments of the audio recording into clusters of similar utterances; 
selecting first and second segments of the audio recording that comprise similar utterances spoken by the first and second people, respectively;
calculating, for each transcriber from among transcribers, expected accuracies of transcriptions of the first and second segments were they transcribed by the transcriber;
selecting a certain transcriber, from among the transcribers, whose expected accuracies reach a predetermined threshold;
providing the certain transcriber with the first and second segments of the audio recording and with transcriptions of the first and second segments;
receiving from the certain transcriber: an indication indicating whether the first and second segments comprise repetitions of a phrase, and a correct transcription of said phrase; and 
updating the transcription of the audio recording based on the indication and the correct transcription.
Claim 11 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Claim 17 recites limitations similar to claim 8 and is rejected for similar rationale and reasoning.
Claim 18 recites limitations similar to claim 10 and is rejected for similar rationale and reasoning.

Regarding claim 19, Milstein, Wittenstein, and Beach teach A non-transitory computer-readable medium having instructions stored thereon that, in response to execution by a system including a processor and memory, causes the system to perform operations comprising: 
receiving an audio recording comprising speech of first and second people; 
generating a transcription of the audio recording utilizing an automatic speech recognition (ASR) system; 
clustering segments of the audio recording into clusters of similar utterances; 
selecting first and second segments of the audio recording that comprise similar utterances spoken by the first and second people, respectively;
calculating, for each transcriber from among transcribers, expected accuracies of transcriptions of the first and second segments were they transcribed by the transcriber;
selecting a certain transcriber, from among the transcribers, whose expected accuracies reach a predetermined threshold;
providing the certain transcriber with the first and second segments of the audio recording and with transcriptions of the first and second segments;
receiving from the certain transcriber: an indication indicating whether the first and second segments comprise repetitions of a phrase, and a correct transcription of said phrase; and 
updating the transcription of the audio recording based on the indication and the correct transcription.
Claim 19 recites limitations similar to claim 1 and is rejected for similar rationale and reasoning.

Regarding claim 23, Wittenstein and Beach teach The system of claim 1, wherein the expected accuracies of the transcriptions of the first and second segments were they transcribed by the transcriber are indicative of expected word error rates (WER) in said transcriptions (Wittenstein 71 comparing transcriptions…transcriber accuracy; 74 error rate; Beach col 9 l. 20 Word Error Rate).
Rejected for similar rationale and reasoning as claim 1.
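
For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and a reference transcription, divided by the number of reference words. The sketch below shows that conventional calculation only; it is not drawn from Wittenstein or Beach, and the function name word_error_rate and the whitespace tokenization are assumptions of this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumed tokenization: split on whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing
            insertion = curr[j - 1] + 1  # extra hypothesis word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

An expected accuracy "indicative of" WER could then be expressed, for example, as one minus the WER, although that mapping is likewise an assumption rather than something stated in the cited references.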



8.	Claims 2-3, 12-13, 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Milstein in view of Wittenstein in further view of Beach in further view of Ju et al (2006/0004570).

Regarding claim 2, Milstein (33; 57-58) does not specifically teach, but Ju teaches, The system of claim 1, wherein the backend server is further configured to utilize the indication to update a phonetic model utilized by the ASR system to reflect one or more pronunciations of the phrase (3: speech recognition system uses…acoustic model; tuning and adjustment of the models is necessary to ensure speech recognition system functions effectively; acoustic model tuning…acoustic parameters to increase accuracy; 0008; 28).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ju for an improved system in which the updated transcriptions improve future ASR.

Regarding claim 3, Milstein does not specifically teach, but Ju teaches, The system of claim 1, wherein the backend server is further configured to update a language model utilized by the ASR system to include the correct transcription of the phrase (3: language model; language model tuning; 8; 28).  
Rejected for similar rationale and reasoning as claim 2.

Claim 12 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.
Claim 13 recites limitations similar to claim 3 and is rejected for similar rationale and reasoning.

Claim 20 recites limitations similar to claim 2 and is rejected for similar rationale and reasoning.
Claim 21 recites limitations similar to claim 3 and is rejected for similar rationale and reasoning.


9.	Claims 7, 9, 16 are rejected under 35 U.S.C. 103 as being unpatentable over Milstein in view of Wittenstein in further view of Beach in further view of Fingscheidt et al (2007/0112568).

Regarding claim 7, Milstein does not specifically teach, but Fingscheidt teaches, The system of claim 1, wherein the backend server is further configured to cluster the segments utilizing dynamic time warping (DTW) of acoustic feature representations of the segments (14 DTW). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate DTW, with a reasonable expectation of success, as a well-known technique for comparing segments.
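
As background on the technique named above, the textbook form of dynamic time warping aligns two sequences of feature vectors of possibly different lengths and returns the minimum cumulative frame-to-frame distance. The sketch below is that textbook form only and is not taken from Fingscheidt; the function name dtw_distance, the Euclidean local distance, and the unconstrained step pattern are assumptions of this example.

```python
import math
from typing import Sequence


def dtw_distance(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> float:
    """Minimum cumulative Euclidean distance over all monotonic alignments."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


if __name__ == "__main__":
    slow = [[0.0], [1.0], [1.0], [2.0]]
    fast = [[0.0], [1.0], [2.0]]
    print(dtw_distance(fast, slow))
```

A pairwise distance of this kind could feed any standard clustering routine; which clustering method would be used is not specified here.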


Regarding claim 9, Milstein does not specifically teach, but Fingscheidt teaches, The system of claim 1, wherein the backend server is further configured to represent 
one or more feature values indicative of acoustic properties of the segment, and at least some feature values indicative of phonetic transcription properties calculated by the ASR system (abstract; 12-13: feature vectors; 50: phonetic transcription); 
the backend server is further configured to utilize a distance function that operates on pairs of vectors of feature values (13; 63; 72).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Fingscheidt for an improved system, as Fingscheidt further demonstrates the components of audio/speech segments and how they are used in recognition, transcription, and clustering.

Claim 16 recites limitations similar to claim 7 and is rejected for similar rationale and reasoning.


Conclusion
10.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
/SHAUN ROBERTS/
Primary Examiner, Art Unit 2657