DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
In response to the amendment filed 3/21/2022, claims 1-13 and 15-27 are pending; claim 14 has been cancelled.

	
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 11-13, 15, 19-20 and 23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cromack et al. (US 2009/0306981 A1).
Re claims 1, 12, 23:
1. Cromack teaches a method (Cromack, Abstract) comprising: 
obtaining a hearing user's (HU's) voice signal originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the HU's voice signal includes speech, the HU's voice signal being a first voice signal (Cromack, fig. 1, “Conversation Contributors 1-N 100”; [0029] – [0031], “Conversation Attendees”, “Conferees” are hearing users; [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition”); 
obtaining a first text string that is a transcription of the first voice signal, the first text string generated by a first automatic speech recognition (ASR) engine using the first voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for both speaker-dependent and speaker-independent models”; figs. 10 – 16; [0088], “When automatic note field transcription has been invoked, the system displays a new timestamp 6310 and speaker name 6320 in addition to the transcribed note 6330 corresponding to the spoken conversation at each transition in speakers. When multiple speakers are simultaneously speaking, the transcription service separates the independent threads of conversation on separate lines with corresponding starting time stamps and speaker names”); 
obtaining a second text string that is a transcription of the first voice signal, the second text string generated based on input obtained from a device associated with a call assistant (CA) (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string); 
generating an output text string from the first text string and the second text string, the output text string includes one or more first words from the first text string and one or more second words from the second text string (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]); and
providing the output text string as a transcription of the hearing user's voice signal for presentation during the communication session substantially concurrently with the presentation of the HU's voice signal by the second device (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]).
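For illustration only, the limitation of an output text string that includes words from both the first text string and the second text string can be pictured with the minimal sketch below. The sketch is not taken from the application or from Cromack. The function name `fuse_transcripts` and the rule of preferring the CA-derived words wherever the two strings disagree are assumptions made solely for this example.

```python
from difflib import SequenceMatcher


def fuse_transcripts(asr_text: str, ca_text: str) -> str:
    """Hypothetical fusion rule (an assumption, not the claimed method):
    keep ASR words where both strings agree or only the ASR has words,
    and take the CA-derived words where the strings differ."""
    asr_words = asr_text.split()
    ca_words = ca_text.split()
    matcher = SequenceMatcher(a=asr_words, b=ca_words, autojunk=False)
    output = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "delete"):
            # Words present in the ASR string (first text string).
            output.extend(asr_words[i1:i2])
        else:
            # "replace" or "insert": prefer the CA-derived string.
            output.extend(ca_words[j1:j2])
    return " ".join(output)


if __name__ == "__main__":
    print(fuse_transcripts("meet me at the station", "meet me at the stadium"))
```

Under this assumed rule, the output carries the agreed words from the first string and the differing word from the second string.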

12. Cromack teaches a method (Cromack, Abstract) comprising: 
obtaining a hearing user's (HU's) voice signal originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the HU's voice signal includes speech (Cromack, fig. 1, “Conversation Contributors 1-N 100”; [0029] – [0031], “Conversation Attendees”, “Conferees” are hearing users; [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition”); 
obtaining a first text string that is a transcription of the HU's voice signal, the first text string generated using an automatic speech recognition (ASR) system using the HU's voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for both speaker-dependent and speaker-independent models”; figs. 10 – 16; [0088], “When automatic note field transcription has been invoked, the system displays a new timestamp 6310 and speaker name 6320 in addition to the transcribed note 6330 corresponding to the spoken conversation at each transition in speakers. When multiple speakers are simultaneously speaking, the transcription service separates the independent threads of conversation on separate lines with corresponding starting time stamps and speaker names”); 
obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the HU's voice signal by a call assistant and the second text string generated by the ASR system using the second voice signal (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string); 
generating an output text string from the first text string and the second text string wherein the output text string includes one or more first words from the first text string and one or more second words from the second text string (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]); and
using the output text string as a transcription of the speech (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”; [0084], “the user may subsequently request that the service transcribe a note from the audio recording by clicking the Transcribe button 6130 located next to the CogI label title 6210 (see Icon description below)”; conversation transcriptions and the audio segment can be outputted at the same time).

23. Cromack teaches a method (Cromack, Abstract) comprising: 
obtaining a hearing user's (HU's) voice signal originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the HU's voice signal includes speech (Cromack, fig. 1, “Conversation Contributors 1-N 100”; [0029] – [0031], “Conversation Attendees”, “Conferees” are hearing users; [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition”); 
obtaining a first text string that is a transcription of the HU's voice signal, the first text string generated by a first automatic speech recognition (ASR) engine using the HU's voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for both speaker-dependent and speaker-independent models”; figs. 10 – 16; [0088], “When automatic note field transcription has been invoked, the system displays a new timestamp 6310 and speaker name 6320 in addition to the transcribed note 6330 corresponding to the spoken conversation at each transition in speakers. When multiple speakers are simultaneously speaking, the transcription service separates the independent threads of conversation on separate lines with corresponding starting time stamps and speaker names”); 
obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the HU's voice signal by a call assistant and the second text string generated by a second ASR engine using the second voice signal (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string);
generating an output text string from the first text string and the second text string, the output text string includes one or more first words from the first text string and one or more second words from the second text string (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]); and
providing the output text string, without providing the first text string and the second text string, as a transcription of the speech to the second device for presentation during the communication session (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”; [0084], “the user may subsequently request that the service transcribe a note from the audio recording by clicking the Transcribe button 6130 located next to the CogI label title 6210 (see Icon description below)”; conversation transcriptions and the audio segment can be outputted at the same time).

Re claim 2:
2. The method of claim 1, wherein the first ASR engine is not trained for transcribing a particular voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for … speaker-independent models”; figs. 10 – 16; [0088], “When automatic note field transcription has been invoked, the system displays a new timestamp 6310 and speaker name 6320 in addition to the transcribed note 6330 corresponding to the spoken conversation at each transition in speakers. When multiple speakers are simultaneously speaking, the transcription service separates the independent threads of conversation on separate lines with corresponding starting time stamps and speaker names”; The ASR in Cromack can be trained to be a speaker-independent model or “not trained to a specific user’s voice signal”). 
Re claim 3:
3. The method of claim 1, wherein generating the output text string further includes: aligning the first text string and the second text string; and comparing the aligned first and second text strings (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; a user may edit or insert a text string at a selected position within an ASR transcription, i.e., align the text string with the transcription (see fig. 15, 15100); [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”).
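For illustration only, the "aligning" and "comparing" steps recited in claim 3 can be pictured with the minimal word-level sketch below. It is not drawn from the application or from Cromack. The function name `align_and_compare` and the use of Python's standard `difflib.SequenceMatcher` as the alignment technique are assumptions made for the example.

```python
from difflib import SequenceMatcher


def align_and_compare(first_text: str, second_text: str):
    """Align two transcripts word by word and report, for each aligned
    span, whether the strings agree ("equal") or differ ("replace",
    "insert", "delete"). Hypothetical helper for illustration only."""
    first_words = first_text.split()
    second_words = second_text.split()
    matcher = SequenceMatcher(a=first_words, b=second_words, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        spans.append((tag, first_words[i1:i2], second_words[j1:j2]))
    return spans


if __name__ == "__main__":
    for span in align_and_compare("please call me at noon",
                                  "please call me at nine"):
        print(span)
```

Each returned tuple pairs a span of the first string with the corresponding span of the second string, so that differing spans can be identified after alignment.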

Re claim 4:
4. The method of claim 1, wherein the step of providing includes providing the output text string, without providing the first text string and the second text to the second device (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”; [0084], “the user may subsequently request that the service transcribe a note from the audio recording by clicking the Transcribe button 6130 located next to the CogI label title 6210 (see Icon description below)”; conversation transcriptions and the audio segment can be outputted at the same time).

Re claim 5:
5. The method of claim 1, further comprising correcting at least one word in one or more of: the output text string, the first text string, and the second text string based on input obtained from a device associated with the CA (Cromack, [0041]; [0086]). 

Re claim 11:
11. At least one non-transitory computer-readable media configured to store one or more instructions that in response to being executed by at least one computing system cause performance of the method of claim 1 (Cromack, [0066]). 

Re claim 13:
13. The method of claim 12, wherein the ASR system includes first and second ASR engines, the first ASR engine used to generate the first text string and the second ASR engine used to generate the second text string, the second ASR engine trained to the voice of the call assistant (Cromack, [0038], “training its speech recognizers, for both speaker-dependent … models, on the actual speech of each speaking participant”; [0080], “the quality level of computer automated speech recognition is dramatically improved when the recognizer has been trained to service a single speaker”; the recognizer can be trained for a single speaker; [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string). 

Re claim 15:
15. The method of claim 12, further comprising correcting at least one word in one or more of: the output text string, the first text string, and the second text string based on input obtained from a device associated with the call assistant (Cromack, [0041]). 

Re claim 19:
19. The method of claim 1 wherein the step of obtaining a second text string includes receiving a second voice signal that is a revoicing by the CA of the HU's voice signal, using a second ASR engine to transcribe the second voice signal to the second text string, the second ASR engine trained to the voice of the CA (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string). 

Re claim 20:
20. At least one non-transitory computer-readable media configured to store one or more instructions that in response to being executed by at least one computing system cause performance of the method of claim 12 (Cromack, [0066]). 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6-9, 16-18, 21-22 and 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over Cromack et al. (US 2009/0306981 A1) in view of Kahn (US 2006/0167686 A1; hereinafter Kahn’686).
Re claims 26, 27:
26. Cromack teaches a method (Cromack, Abstract) comprising: 
obtaining a hearing user's (HU's) voice signal originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the HU's voice signal includes speech (Cromack, fig. 1, “Conversation Contributors 1-N 100”; [0029] – [0031], “Conversation Attendees”, “Conferees” are hearing users; [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition”); 
obtaining a first text string that is a transcription of the HU's voice signal, the first text string generated by a first automatic speech recognition (ASR) engine using the HU's voice signal wherein the first ASR engine is not trained to a specific user's voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for … speaker-independent models … an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer”; figs. 10 – 16; [0088]; The ASR in Cromack can be trained to be a speaker-independent model or “not trained to a specific user’s voice signal”); 
obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the HU's voice signal by a call assistant and the second text string generated by a second ASR engine using the second voice signal, the second ASR engine trained to the voice of the call assistant (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string) and the second text string generated by a second ASR engine using the second voice signal wherein the second ASR engine is trained for the captioning (Cromack, [0038], “training its speech recognizers, for both speaker-dependent … models, on the actual speech of each speaking participant”; [0080], “the quality level of computer automated speech recognition is dramatically improved when the recognizer has been trained to service a single speaker”; the recognizer can be trained for a single speaker); 
generating an output text string including at least portions of the first text string, the second text string (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]); and
providing the output text string as a transcription of the speech to the second device for presentation during the communication session substantially concurrently with the presentation of the HU's voice signal by the second device (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”; [0084], “the user may subsequently request that the service transcribe a note from the audio recording by clicking the Transcribe button 6130 located next to the CogI label title 6210 (see Icon description below)”; conversation transcriptions and the audio segment can be outputted at the same time).

27. Cromack teaches a system (Cromack, Abstract) comprising: 
one or more processors; and 
at least one non-transitory computer-readable media coupled to the one or more processors, the at least one non-transitory computer-readable media configured to store one or more instructions that in response to being executed by the one or more processors cause the system to perform operations (Cromack, [0065] – [0066], “host software”), the operations comprising: 
obtain a hearing user's (HU's) voice signal originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the HU's voice signal includes speech (Cromack, fig. 1, “Conversation Contributors 1-N 100”; [0029] – [0031], “Conversation Attendees”, “Conferees” are hearing users; [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition”); 
obtain a first text string that is a transcription of the HU's voice signal, the first text string generated using automatic speech recognition (ASR) technology using the HU's voice signal (Cromack, [0038], “Automatic Speech-Recognizer (ASR) … automatic speech -recognizers are quite accurate at speaker-independent speech-recognition ... The CES alleviates these problems without requiring explicit speech -recognizer training for each speaker ... for both speaker-dependent and speaker-independent models”; figs. 10 – 16; [0088], “When automatic note field transcription has been invoked, the system displays a new timestamp 6310 and speaker name 6320 in addition to the transcribed note 6330 corresponding to the spoken conversation at each transition in speakers. When multiple speakers are simultaneously speaking, the transcription service separates the independent threads of conversation on separate lines with corresponding starting time stamps and speaker names”); 
obtain a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the HU's voice signal and the second text string generated by the ASR technology using the second voice signal (Cromack, [0039], “Voice-writer-is a human trained to use an automatic speech-recognition system (which has conversely been trained to recognize the individual voice-writer) who listens to untrained speakers' utterances and enunciates those utterances into the speech-recognizer for transcription”; Voice-writer – a call assistant; "Voice-writer" generates a second text string);
generate an output text string from the first text string, the second text string, wherein the output text string includes at least portions of each of the first text string, the second text string and the third text string (Cromack, [0041], “is a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices”; see fig. 11 in Cromack, 11100, “Edit Transcript”; figs. 4,6,8; figs. 10 – 16; [0082]; [0084]); and
provide the output text string as a transcription of the speech (Cromack, figs. 4,6, 8; figs. 10 – 16 show the audio playback controls and the transcription; [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”; [0084], “the user may subsequently request that the service transcribe a note from the audio recording by clicking the Transcribe button 6130 located next to the CogI label title 6210 (see Icon description below)”; conversation transcriptions and the audio segment can be outputted at the same time).

Regarding claim 26, Cromack does not explicitly disclose obtaining a third text string that is a transcription of the HU's voice signal, the third text string generated by a third ASR engine, or generating an output text string from the first text string, the second text string, and the third text string. Regarding claim 27, Cromack does not explicitly disclose obtaining a third text string that is a transcription of the first audio data, the third text string generated by the automatic speech recognition technology, or generating an output text string from the first text string, the second text string, and the third text string.

Kahn’686 teaches a system and method for creating a final text from an audio file. Kahn’686 further teaches (Claim 26) obtaining a third text string that is a transcription of the HU's voice signal, the third text string generated by a third ASR engine; generating an output text string from the first text string, the second text string, and the third text string; and (Claim 27) obtain a third text string that is a transcription of the first audio data, the third text string generated by the automatic speech recognition technology; generate an output text string from the first text string, the second text string, and the third text string (Kahn’686, fig. 2, 213 - “Second Speech Engine 213” – A third ASR engine; 212 – “First Speech Engine 211” – First ASR engine; [0102], “speech editor 225 may be viewed as a front-end tool by which a correctionist corrects verbatim text to be submitted for speech training or corrects final text”; [0122], “the phrases are arranged irrespective of the utterances, even to the point of overlapping utterance placeholder characters”; [0100], “The computer will generate transcription in real time and a .dra session file that aligns audio and text”). Therefore, in view of Kahn’686, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method described in Cromack by providing a second speech engine as taught by Kahn’686, so that the transcribed file from the second speech engine provides a comparison text against which the transcribed file "A" from the first speech engine may be compared and the differences highlighted. The speech editor may track the individual differences and matches between the two transcribed texts and display both of these files, complete with highlighted differences and unhighlighted matches, to the correctionist (Kahn’686).
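For illustration only, generating an output text string from three text strings can be pictured as a word-by-word majority vote, as in the minimal sketch below. The sketch is not taken from the application, Cromack, or Kahn’686. It assumes the three strings have already been aligned so that they contain the same number of words, and the function name `majority_vote` and the fallback to the second (revoiced) string when all three disagree are hypothetical choices made for the example.

```python
from collections import Counter


def majority_vote(first: str, second: str, third: str) -> str:
    """Combine three pre-aligned, equal-length transcripts word by word.
    Assumption for illustration: a word chosen by at least two strings
    wins; if all three differ, fall back to the second string."""
    columns = zip(first.split(), second.split(), third.split())
    output = []
    for words in columns:
        word, count = Counter(words).most_common(1)[0]
        output.append(word if count >= 2 else words[1])
    return " ".join(output)


if __name__ == "__main__":
    print(majority_vote("call me at nine",
                        "call me at noon",
                        "tall me at noon"))
```

In this example the output draws the word "call" from the first and second strings and the word "noon" from the second and third strings, so each input contributes to the result.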

Re claims 6 - 7:
Cromack does not explicitly disclose a third text string generated by the first ASR engine. Kahn’686 teaches a system and method for creating a final text from an audio file (Kahn’686, Abstract). Kahn’686 further teaches: 6. The method of claim 1, wherein the input obtained from the device is based on a third text string generated by the first ASR engine using the HU's voice signal. 7. The method of claim 6 wherein the first text string and the third text string are both estimates generated by the first ASR engine for substantially the same portion of the HU's voice signal (Kahn’686, fig. 2, 212, 211 – “First Speech Engine 211”; 252, 211 – “First Speech Engine 211”; [0221] – [0222], “[0221] With a training file saved at either step 234, 242, or 248, the process 200 may proceed to the step 250 to encounter the correction session 251. The correction session 251 involves automatically correcting a text file. The lesson learned may be input into a speech engine by updating the user speech files. At step 252, the first speech engine 211 may be selected for automatic correction”; the voice signal is processed twice by the first speech engine). Therefore, in view of Kahn’686, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method described in Cromack by providing the speech engine as taught by Kahn’686, in order to provide a correction session that automatically corrects a text file, the lesson learned from which may be input into a speech engine by updating the user speech files (Kahn’686, [0220] – [0223]).

Re claim 9:
9. The method of claim 6 wherein the first ASR engine persistently and automatically generates corrections to the first text string based on context within the first text string (Cromack, [0038], “Speech-recognizers using a functional syntactical model, in which words are recognized in the context of a sentence”). 

Re claims 8, 21 – 22, 24:
Cromack does not explicitly disclose a third text string or a third ASR engine. Kahn’686 teaches the limitations in claims 8, 21 – 22 and 24; specifically, 8. The method of claim 1, further comprising obtaining a third text string that is a transcription of the HU's voice signal, the third text string generated by a third ASR engine, wherein the output text string is generated from the first text string, the second text string, and the third text string (Kahn’686, fig. 2, 213 - “Second Speech Engine 213” – A third ASR engine; 212 – “First Speech Engine 211” – First ASR engine). 21. The method of claim 8 wherein the first text string and the third text string are both estimates for different segments of the HU's voice signal (Kahn’686, fig. 2). 22. The method of claim 21 wherein the different segments of the HU's voice signal partially overlap (Kahn’686, [0122], “the phrases are arranged irrespective of the utterances, even to the point of overlapping utterance placeholder characters”). 24. The method of claim 21 wherein the step of providing the output text string includes providing the output text string substantially concurrently with the presentation of the HU's voice signal by the second device (Kahn’686, [0100], “The computer will generate transcription in real time and a .dra session file that aligns audio and text”).

Therefore, in view of Kahn’686, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method described in Cromack, by providing a second speech engine as taught by Kahn’686, so that the transcribed file from the second speech engine provides a comparison text from which the transcribed file "A" from the first speech engine may be compared and the differences highlighted.  The speech editor may track the individual differences and matches between the two transcribed texts and display both of these files, complete with highlighted differences and unhighlighted matches to the correctionist (Kahn’686, [0125]).

Re claim 25:
25. The method of claim 21 wherein the step of generating an output text string includes generating the output text string based on an alignment between the first text string and the second text string (Cromack, figs. 4, 6, 8; figs. 10 – 16 show the audio playback controls and the transcription; a user may edit or insert a text string at a position in (i.e., aligned with) an ASR transcription (see fig. 15, 15100); [0081] – [0082]; [0081], “The subscriber can select the segment of audio to be replayed from any bookmarked … selected portions of the conversation along with related data), key word, or topic in this call or any previous conversation recorded in the subscriber's portfolio”).

Re claims 16 – 18:
Cromack does not explicitly disclose a third text string or a third ASR engine.    Kahn’686 teaches 16. The method of claim 15, wherein the input obtained from the device is based on a third text string generated by the automatic speech recognition technology using the HU's voice signal.  17. The method of claim 16, wherein the first text string and the third text string are both hypothesis generated by the automatic speech recognition technology for the substantially same portion of the HU's voice signal. 18. The method of claim 12, further comprising obtaining a third text string that is a transcription of the HU's voice signal or the second voice signal, the third text string generated by the automatic speech recognition system, wherein the output text string is generated from the first text string, the second text string, and the third text string (Kahn’686, fig. 2, 213 - “Second Speech Engine 213” – A third ASR engine; 212 – “First Speech Engine 211” – First ASR engine; [0122], “the phrases are arranged irrespective of the utterances, even to the point of overlapping utterance placeholder characters”; [0100], “The computer will generate transcription in real time and a .dra session file that aligns audio and text”).  Therefore, in view of Kahn’686, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method described in Cromack, by providing a second speech engine as taught by Kahn’686, so that the transcribed file from the second speech engine provides a comparison text from which the transcribed file "A" from the first speech engine may be compared and the differences highlighted.  The speech editor may track the individual differences and matches between the two transcribed texts and display both of these files, complete with highlighted differences and unhighlighted matches to the correctionist (Kahn’686, [0125]).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Cromack et al. (US 2009/0306981 A1) in view of Engelke et al. (US 2001/0005825 A1).
Re claim 10:
Cromack does not explicitly disclose a delay between the second and first text strings.   Engelke teaches the limitation: 10. The method of claim 1 further including the steps of tracking a delay between generation of the second text string and the first text string, the step of generating an output text string including selecting one or more words from the first text string when the delay is less than a threshold value and selecting one or more words from the second text string when the delay is greater than a threshold value (Engelke, [0006], “The delay before transmission of transcribed text may be adjusted, for example, dynamically based on error rates, perceptual rules, or call assistant or user preference”; [0031]; [0033]; Engelke sets a time delay for a call assistant to make changes to the machine transcription).   Therefore, in view of Engelke, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method described in Cromack, by providing the time delay as taught by Engelke, in order to allow a call assistant to make corrections to the machine transcriptions by either editing the text or revoicing the pronunciation.
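For illustration only, the selection step recited in claim 10 can be read as a simple threshold comparison. The sketch below is not taken from Cromack, Engelke, or the instant application; the names `Segment`, `select_words`, and `build_output`, the per-segment granularity, and the handling of a delay exactly equal to the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical container for one portion of the HU's voice signal."""
    first_words: List[str]   # words from the first text string (first ASR engine)
    second_words: List[str]  # words from the second text string
    delay_s: float           # tracked delay between generation of the second and first text strings


def select_words(segment: Segment, threshold_s: float) -> List[str]:
    """Pick the words for one segment according to the claim 10 comparison."""
    if segment.delay_s < threshold_s:
        return segment.first_words
    if segment.delay_s > threshold_s:
        return segment.second_words
    # The claim language is silent on equality; falling back to the first
    # text string here is an assumption of this sketch.
    return segment.first_words


def build_output(segments: List[Segment], threshold_s: float) -> str:
    """Concatenate the selected words of every segment into an output text string."""
    words: List[str] = []
    for segment in segments:
        words.extend(select_words(segment, threshold_s))
    return " ".join(words)
```

Under these assumptions, a segment whose second text string arrived 0.5 seconds after the first would contribute first-string words for a 2-second threshold, while a segment delayed by 3 seconds would contribute second-string words.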

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 – 13, 15 - 27 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 73 of U.S. Patent No. 10917519 (‘519). Although the claims at issue are not identical, they are not patentably distinct from each other because the subject matter claimed in the instant application is fully disclosed in the more specific claims of ‘519.  Claims 1 – 73 in ‘519 recite generating a first text string from a first ASR (i.e., ‘519, claims 1, 13), a second text string from a second ASR (i.e., ‘519, claims 1, 13) and a third text string from a third ASR (i.e., ‘519, claims 43 – 45, 50).

Claims 1 – 13, 15 - 27 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 72 of U.S. Patent No. 10878721 (‘721). Although the claims at issue are not identical, they are not patentably distinct from each other because the subject matter claimed in the instant application is fully disclosed in the more specific claims of ‘721.  Claims 1 – 72 in ‘721 recite generating a first text string from a first ASR (i.e., ‘721, claims 20, 22, 30, 34, 64, 70), a second text string from a second ASR (i.e., ‘721, claims 20, 22, 30, 34, 64, 70) and a third text string from a third ASR (i.e., ‘721, claims 20, 22, 30, 34, 64, 70).

Claims 1 – 13, 15 - 27 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 61 of U.S. Patent No. 10389876 (‘876). Although the claims at issue are not identical, they are not patentably distinct from each other because the subject matter claimed in the instant application is fully disclosed in the more specific claims of ‘876. Claims 1 – 61 in ‘876 recite generating a first text string from a first ASR (i.e., ‘876, claims 1, 3, 7), a second text string from a second ASR (i.e., ‘876, claims 1, 3, 7) and a third text string from a third ASR (i.e., ‘876, claims 20, 22, 30, 34, 64, 70).

Claims 1 – 13, 15 - 27 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 19 of copending Application No. 16564393 (reference application) (‘393). Although the claims at issue are not identical, they are not patentably distinct from each other because the subject matter claimed in the instant application is fully disclosed in the more specific claims of ‘393.

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Response to Arguments
Applicant's arguments filed 3/21/2022 have been fully considered but they are not persuasive. 
Applicant argues: 
Nothing in Cromack teaches or suggests that first and second transcriptions of an HU voice signal are generated and then an output text string is generated by combining portions of the first and second transcriptions as required by claim 1.  CA generated transcription is provided by the system and is also used to train the ASR engine.  There is no time when first and second transcriptions are simultaneously generated and then an output transcription including text portions from the first and second transcriptions is generated. 
The Office respectfully disagrees.  Cromack describes a scopist (or Call Assistant (CA)) - a human trained to correct the output of automatic speech-recognizers by listening to the original transcribed utterances, viewing the transcriptions, spotting any errors, and correcting them via speech input through a microphone and speech-recognizer, by hand with a keyboard and mouse, or using other computer input devices (See Cromack, [0041]).  Cromack further teaches the limitation: generating an output text string from the first text string and the second text string, the output text string includes one or more first words from the first text string and one or more second words from the second text string (For example, see fig. 11 in Cromack, 11100, “Edit Transcript”).   A scopist (CA) can correct an error via speech input (second text string), which replaces the corresponding Automated Speech Recognition (ASR) generated text (first text string).


Second, it is unclear which limitation in the claims recites that the first and second transcriptions are generated simultaneously, concurrently or in parallel.  In contrast, claims 12, 19, 23, 26 and 27 require obtaining a second text string that is a transcription of a second voice signal, the second voice signal including a revoicing of the HU's voice signal by a call assistant and the second text string generated by the ASR system using the second voice signal.  The limitation suggests that the first and second transcriptions are not generated simultaneously, because a CA has to listen to the HU voice signal or review the first transcription for potential ASR transcription errors before manually generating a second text string.

Applicant argues: 
while the two ASR engines simultaneously generate voice signal text strings, portions of those two text strings are never combined to generate an output text string. In addition, while a CA may error correct one of the ASR engine text strings, the CA generated corrected text string is never combined with either one or both of the two ASR engine text strings to generate an output text string.  
The Office respectfully disagrees.  In para. [0054], Kahn states “Preferably, the first speech recognition program 138 and the second speech recognition program 140 would transcribe the same audio file to produce two transcription files that are more likely to have differences from one another”. In para. [0125], Kahn states “The speech editor 225 of FIG. 2 becomes a powerful tool when the correctionist opens up the transcribed file from the second speech engine 213. One reason for this is that the transcribed file from the second speech engine 213 provides a comparison text from which the transcribed file "A" from the first speech engine 211 may be compared and the differences highlighted. In other words, the speech editor 225 may track the individual differences and matches between the two transcribed texts and display both of these files, complete with highlighted differences and unhighlighted matches to the correctionist.”   The transcriptions from a first speech engine, a second speech engine and a human (i.e., correctionist) are combined to generate a final transcript version.
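For illustration only, the comparison of two transcribed texts quoted above from Kahn, [0125], in which matches are left unhighlighted and differences are highlighted, can be approximated with a word-level diff. This is a sketch using Python's standard `difflib` module, not Kahn's actual implementation; the function name `compare_transcripts` and the whitespace tokenization are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_transcripts(text_a: str, text_b: str) -> List[Tuple[bool, List[str], List[str]]]:
    """Return (is_match, words_from_a, words_from_b) for each aligned run of words.

    is_match is True for runs where both transcripts agree (unhighlighted)
    and False for runs where they differ (highlighted for a correctionist).
    """
    # Assumption: simple whitespace tokenization is good enough for a sketch.
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    runs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        runs.append((tag == "equal", words_a[i1:i2], words_b[j1:j2]))
    return runs
```

Each non-matching run marks a position where a human reviewer would choose between the two engines' words or supply a correction, which is one way the texts could be merged into a single final transcript.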

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACK YIP whose telephone number is (571)270-5048. The examiner can normally be reached Monday thru Friday; 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XUAN THAI can be reached on (571) 272-7147. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JACK YIP/Primary Examiner, Art Unit 3715