DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
In response to the amendment filed 6/24/2022, claims 1 – 21 are pending.

	
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 120 as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA  35 U.S.C. 112, except for the best mode requirement.  See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994)
The disclosures of the prior-filed applications, Application Nos. 15/982239, 15/729069, 15/171720, 14/953631, 14/632257, and 61/946072, fail to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for one or more claims of this application.
The cited prior-filed applications fail to provide adequate support or enablement for the limitation:
obtaining a first hypothesis transcription generated by the automated speech recognition system, the first hypothesis transcription including one or more first words determined by the automated speech recognition system to be a transcription of at least a first portion of the voice signal;
obtaining a second hypothesis transcription generated by the automated speech recognition system, the second hypothesis transcription including a plurality of second words determined by the automated speech recognition system to be a transcription of at least a second portion of the voice signal that includes the first portion of the voice signal; 
determining one or more consistent words that are included in both the one or more first words of the first hypothesis transcription and the plurality of second words of the second hypothesis transcription; and 
in response to determining the one or more consistent words, transmitting the one or more consistent words to the second device for presentation of the one or more consistent words by the second device, the presentation of the one or more consistent words configured to occur before the final transcription of the voice signal is provided to the second device.
The cited prior-filed applications fail to disclose the phrases “obtaining a first hypothesis transcription”, “obtaining a second hypothesis transcription” and “determining one or more consistent words in both … the first hypothesis transcription … second hypothesis transcription”.
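For orientation only, the "consistent words" limitation quoted above can be illustrated with a minimal sketch. It assumes one possible reading, in which a word is "consistent" when both hypothesis transcriptions contain the same word at the same position. The function name and the sample word lists are hypothetical; this is not taken from the application or the cited art, and it is not a claim construction.

```python
def consistent_words(first_hypothesis, second_hypothesis):
    """Return words that appear at the same position in both hypotheses.

    Assumption: "consistent" means same word at the same index. Other
    readings (e.g., alignment by time stamp) are equally possible.
    """
    agreed = []
    # zip stops at the shorter hypothesis, so only the overlap is compared.
    for first_word, second_word in zip(first_hypothesis, second_hypothesis):
        if first_word == second_word:
            agreed.append(first_word)
    return agreed


# Hypothetical example: the second hypothesis covers a longer portion of
# the voice signal that includes the portion covered by the first.
first = ["to", "be", "or", "knot"]
second = ["to", "be", "or", "not", "to", "be"]
agreed = consistent_words(first, second)
```

Under this reading, only the agreed words would be candidates for early presentation, while the disputed word would be withheld until a later hypothesis resolves it.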

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1 – 21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Applicant’s originally filed application only mentions the phrases “obtaining a first hypothesis transcription”, “obtaining a second hypothesis transcription” and “determining one or more consistent words in both … the first hypothesis transcription … second hypothesis transcription” in the Abstract and the claims. However, these phrases are never described in Applicant’s original specification and figures.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1 – 21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
Step 1: Is the claimed invention a statutory category of invention?
Claims 1 and 14 are directed to a method to transcribe communications, and claim 7 is directed to a system (Step 1, Yes).

Step 2A, Prong 1: Does the claim recite an abstract idea?
The limitation of steps: 
… obtaining a first hypothesis transcription generated by the automated speech recognition system, the first hypothesis transcription including one or more first words determined by the automated speech recognition system to be a transcription of at least a first portion of the voice signal; 
obtaining a second hypothesis transcription generated by the automated speech recognition system, the second hypothesis transcription including a plurality of second words determined by the automated speech recognition system to be a transcription of at least a second portion of the voice signal that includes the first portion of the voice signal; 
determining one or more consistent words that are included in both the one or more first words of the first hypothesis transcription and the plurality of second words of the second hypothesis transcription; and in response to determining the one or more consistent words, transmitting the one or more consistent words to the second device for presentation of the one or more consistent words by the second device, the presentation of the one or more consistent words configured to occur before the final transcription of the voice signal is provided to the second device, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. This type of mental process can be practically performed in the human mind; for instance, a human can mentally compare two transcriptions and determine the consistent words in the two transcriptions. A human transcriptionist may mentally compare the candidate transcriptions and choose the suitable transcriptions to create a final transcription. The mere nominal recitation of at least one processor performing these steps does not take the claim limitation outside of the mental processes grouping. Thus, the claim recites a mental process (Step 2A, Prong 1: yes).

Step 2A, Prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application? 
Per the 2019 Revised Patent Subject Matter Eligibility Guidance, if a claim as a whole integrates the recited judicial exception into a practical application of that exception, a claim is not "directed to" a judicial exception. Alternatively, a claim that does not integrate a recited judicial exception into a practical application is directed to the exception. Evaluating whether a claim integrates an abstract idea into a practical application is performed by a) identifying whether there are any additional elements recited in the claim beyond the abstract idea, and b) evaluating those additional elements individually and in combination to determine whether they integrate the abstract idea into a practical application, using one or more of the considerations laid out by the Supreme Court and the Federal Circuit. Exemplary considerations indicating whether an additional element (or combination of elements) has or has not been integrated into a practical application are set forth in the 2019 PEG.
With respect to the instant claims, claims 1, 7, 14, 19 recite the additional elements of: a first device, a second device, an automated speech recognition system, at least one processor, at least one memory device communicatively coupled to the at least one processor and a call assistant device.  It is particularly noted that the use of a computing device "as a tool" to perform an abstract method and steps that only amount to extra solution activity are indicated in the 2019 PEG as examples that an additional element has not been integrated into a practical application.  Even in combination, the recited additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits, such as an improvement to a computing system, on practicing the abstract idea (STEP 2A, Prong 2: NO). 

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
Claims 1, 7, 14 and 19 recite the additional elements of: a first device, a second device, an automated speech recognition system, at least one processor, at least one memory device communicatively coupled to the at least one processor and a call assistant device, as set forth above for Step 2A, Prong 2. Applicant’s specification only describes these features in a highly generic manner by stating that HU's device 14, in at least some embodiments, includes a communication device (e.g., a telephone) including a keyboard for dialing phone numbers and a handset including a speaker and a microphone for communication with other devices. In other embodiments device 14 may include a computer, a smart phone, a smart tablet, etc., that can facilitate audio communications with other devices. Devices 12 and 14 may use any of several different communication protocols including analog or digital protocols, a VOIP protocol or others (Applicant’s published application, [0098]; [0102]; [0432]). There is no indication in the specification that Applicant has achieved an advancement or improvement in ASR technology. Dependent claims 2 – 6, 8 – 13, 15 – 18 and 21 inherit the deficiencies of their respective parent claims through their dependencies and do not recite additional limitations sufficient to direct the claims to more than the claimed abstract idea, and are thus rejected for the same reasons.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Wetjen et al. (US 2015/0106091 A1) in view of Bigham et al. (US 2013/0317818 A1).
Re claims 1, 7, 14:
1. Wetjen teaches a method to transcribe communications (Wetjen, Abstract), the method comprising: 
obtaining a voice signal originating at a first device during a communication session between the first device and a second device (Wetjen, fig. 1, 110, 120), the communication session configured for verbal communication (Wetjen, fig. 1, 110, 120, Abstract); 
providing the voice signal to an automated speech recognition system configured to transcribe the voice signal (Wetjen, fig. 1; Abstract); 
before a final transcription of the voice signal is determined by the automated speech recognition system (Wetjen, fig. 6, 650 – Transcription Out), the method including: 
obtaining a first hypothesis transcription generated by the automated speech recognition system, the first hypothesis transcription including one or more first words determined by the automated speech recognition system to be a transcription of at least a first portion of the voice signal (Wetjen, fig. 6, 610, [0108], “each service is processing the same utterance”); 
obtaining a second hypothesis transcription generated by the automated speech recognition system, the second hypothesis transcription including a plurality of second words determined by the automated speech recognition system to be a transcription of at least a second portion of the voice signal that includes the first portion of the voice signal (Wetjen, fig. 6, 615; [0108], “each service is processing the same utterance”); 
determining one or more consistent words that are included in both the one or more first words of the first hypothesis transcription and the plurality of second words of the second hypothesis transcription (Wetjen, [0108], “If there are mismatched words resulting from the services 610, 615, 620, the mismatched words are provided to element 640 where the highest confidence words, or phrases are selected”); and 
in response to determining the one or more consistent words, transmitting the one or more consistent words to the second device for presentation of the one or more consistent words by the second device (Wetjen, fig. 6; [0108] – [0109]). 
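The selection step quoted above from Wetjen [0108] (mismatched words from services 610, 615, 620 are resolved by choosing the highest-confidence word) can be pictured with the following minimal sketch. It assumes the outputs of the services are already aligned word by word. The function name, data layout, and most confidence values are hypothetical and are not taken from Wetjen.

```python
def select_words(service_outputs):
    """Combine position-aligned outputs from several transcription services.

    service_outputs: one list per service, each a list of
    (word, confidence) pairs in the same positional order.
    """
    selected = []
    for candidates in zip(*service_outputs):
        distinct_words = {word for word, _ in candidates}
        if len(distinct_words) == 1:
            # All services agree, so the word is kept as-is.
            selected.append(candidates[0][0])
        else:
            # Services disagree: keep the highest-confidence candidate.
            best_word, _ = max(candidates, key=lambda pair: pair[1])
            selected.append(best_word)
    return selected


# Hypothetical outputs for the end of one utterance.
service_610 = [("the", 0.79), ("question", 0.80)]
service_615 = [("the", 0.81), ("equestrian", 0.45)]
service_620 = [("the", 0.75), ("question", 0.70)]
combined = select_words([service_610, service_615, service_620])
```

In this sketch the agreed word passes through unchanged, and the mismatch between "question" and "equestrian" is resolved in favor of the candidate with the highest confidence value.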

Wetjen does not explicitly disclose the presentation of the one or more consistent words configured to occur before the final transcription of the voice signal is provided to the second device. Bigham et al. (US 2013/0317818 A1) teaches methods and systems for captioning speech in real-time (Bigham, Abstract). Bigham teaches, in response to determining the one or more consistent words, transmitting the one or more consistent words to the second device for presentation of the one or more consistent words by the second device, the presentation of the one or more consistent words configured to occur before the final transcription of the voice signal is provided to the second device (Bigham, fig. 2; fig. 5; [0056]; fig. 2 shows a user interface for editing a transcript before a final transcript can be generated; Abstract, “transcriptions received from each worker are aligned and combined to create a resulting caption”; [0074], “The resulting captions can be used to determine the rate of speech, as well as each worker's performance, by comparing each individual worker's captions to the crowd's result”). Therefore, in view of Bigham, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method/system described in Wetjen by providing the comparison between two transcriptions before generating a final transcription, as taught by Bigham, in order to provide a web interface that visually presents relevant information, such as the confidence of each spelling and possible word and arrangement alternatives. These cues both reduce the attention that must be paid to the editing process and encourage users to focus their efforts on specific problems in the caption. For example, conflicted words or spellings are highlighted and, when selected, alternatives are displayed and can be agreed with or new answers can be added. These updates are then forwarded back to the combiner (Bigham, [0056]).

7. A system comprising: at least one processor; and at least one memory device communicatively coupled to the at least one processor and configured to store one or more instructions that when executed by the at least one processor cause the system to perform operations comprising: obtaining a voice signal originating at a first device during a communication session between the first device and a second device; providing the voice signal to an automated speech recognition system configured to transcribe the voice signal; obtaining a plurality of hypothesis transcriptions generated by the automated speech recognition system, each of the plurality of hypothesis transcriptions including one or more words determined by the automated speech recognition system to be a transcription of a portion of the voice signal; determining one or more consistent words that are included in two or more of the plurality of hypothesis transcriptions; and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device, the presentation of the one or more consistent words configured to occur before a final transcription of the voice signal is provided to the second device (See claim 1 rejection above). 

14. A method to transcribe communications, the method comprising: obtaining voice signal originating at a first device during a communication session between the first device and a second device; providing the voice signal to an automated speech recognition system configured to transcribe the voice signal; obtaining a plurality of hypothesis transcriptions generated by the automated speech recognition system, each of the plurality of hypothesis transcriptions including one or more words determined by the automated speech recognition system to be a transcription of a portion of the voice signal; determining one or more consistent words that are included in two or more of the plurality of hypothesis transcriptions; and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device, the presentation of the one or more consistent words configured to occur before a final transcription of the voice signal is provided to the second device  (See claim 1 rejection above). 

Re claims 2, 8, 15:
2. The method of claim 1, wherein the first hypothesis transcription is not provided to the second device (Bigham, fig. 5 shows a combined transcript without showing the transcription of any particular captionist). 

8. The system of claim 7, wherein the plurality of hypothesis transcriptions are not provided to the second device (See claim 2 rejection above). 

15. The method of claim 14, wherein the plurality of hypothesis transcriptions are not provided to the second device (See claim 2 rejection above). 

Re claims 3 - 5:
3. The method of claim 1, wherein the voice signal is a portion of total voice signal originating at the first device during the communication session (Wetjen, [0043]). 

4. The method of claim 1, further comprising: obtaining a third hypothesis transcription generated by the automated speech recognition system, the third hypothesis transcription including a plurality of third words determined by the automated speech recognition system to be a transcription of at least a third portion of the voice signal that includes the first portion of the voice signal (Wetjen, fig. 6, 620). 

5. The method of claim 1, further comprising: obtaining the final transcription of the voice signal, the final transcription including a plurality of third words that the automated speech recognition system outputs together as the finalized transcription of the voice signal; determining when a portion of the final transcription that corresponds to the one or more consistent words includes a corrected word that is different from any of the one or more consistent words; and in response to determining that the consistent words include the corrected word, transmitting an indication of the corrected word to the second device such that the second device changes the presentation of the one or more consistent words to include the corrected word (Wetjen, fig. 6 and Bigham, fig. 2; fig. 5; [0056]; fig. 2 shows a user interface for editing a transcript before a final transcript can be generated; Abstract, “transcriptions received from each worker are aligned and combined to create a resulting caption”; [0074], “The resulting captions can be used to determine the rate of speech, as well as each worker's performance, by comparing each individual worker's captions to the crowd's result”). 

Re claim 6:
6. At least one storage device configured to store one or more instructions that when executed by at least one processor cause or direct a system to perform the method of claim 1 (Wetjen, [0021]). 

Re claim 9:
9. The system of claim 7, wherein the automated speech recognition system is included in the system (Wetjen, [0028]). 

Re claim 10:
10. The system of claim 7, wherein the automated speech recognition system includes a plurality of differently tuned automated speech recognition engines and wherein each of the hypothesis transcriptions is generated by a different one of the ASR engines (Wetjen, fig. 6; [0108]). 

Re claim 11:
11. The system of claim 7, wherein at least one of the hypothesis transcriptions for at least one word is based on other words corresponding to the voice signal that are temporally proximate the at least one word (Wetjen, [0107], “the probabilities for every word in every utterance in the language would be exactly equal to the measured frequency of occurrence within the language. In the case of our example, "To be, or not to be, that is the question" is a direct quote from William Shakespeare's 'Hamlet' or a paraphrase or reference to it. Thus, given the utterance "To be, or not to be, that is the ___ ", the word 'question', should have the highest probability of occurring of any word in the language, and should therefore be chosen from the set of partial homophones”; the word “question” is selected based on the position of the blank). 

Re claim 12:
12. The system of claim 7, wherein the operations further comprise: obtaining a subsequent transcription of the voice signal; determining when a portion of the subsequent transcription that corresponds to the one or more consistent words includes a corrected word that is different from any of the one or more consistent words; and in response to determining the corrected word, provide an indication of the corrected word to the second device such that the second device changes the presentation of the one or more consistent words to include the corrected word, the presentation of the corrected word configured to occur before the final transcription of the voice signal is provided to the second device (Wetjen, fig. 6 and Bigham, fig. 2; fig. 5; [0056]; fig. 2 shows a user interface for editing a transcript before a final transcript can be generated; Abstract, “transcriptions received from each worker are aligned and combined to create a resulting caption”; [0074], “The resulting captions can be used to determine the rate of speech, as well as each worker's performance, by comparing each individual worker's captions to the crowd's result”). 

Re claim 13:
13. The system of claim 12, wherein the operations further comprise: obtain the final transcription of the voice signal, the final transcription including a plurality of words that the automated speech recognition system outputs together as the final transcription of the voice signal; determine when a portion of the final transcription that corresponds to the one or more consistent words includes a final word that is different from any of the one or more consistent words as corrected via the corrected word; and in response to determining the final word, provide an indication of the final word to the second device such that the second device changes the presentation of the words to include the final word (Wetjen, fig. 6 and Bigham, fig. 2; fig. 5; [0056]; fig. 2 shows a user interface for editing a transcript before a final transcript can be generated; Abstract, “transcriptions received from each worker are aligned and combined to create a resulting caption”; [0074], “The resulting captions can be used to determine the rate of speech, as well as each worker's performance, by comparing each individual worker's captions to the crowd's result”). 

Re claim 16:
16. The method of claim 14, wherein the plurality of hypothesis transcriptions are obtained sequentially over time and a first portion of the voice signal associated with a first one of the plurality of hypothesis transcriptions includes all of the voice signal associated with all of the plurality of hypothesis transcriptions obtained previous to obtaining the first one of the plurality of hypothesis transcriptions (Wetjen, [0030], “Indexing audio to text: Indexing audio to text is the process of linking segments of recorded audio to text based elements so that the audio can be accessed by means of text-based search processes”; [0074], “Each of the individual participant's utterances from the mixing audio server, containing the metadata about the participant and the time the utterance started, are placed into two first in first out (FIFO) queues 265, 266. The ASR will then pull the audio from the first queue, transcribing the utterances, and places the result along with any metadata that was with the audio on another FIFO queue where it will be sent to any participant who is subscribed to the real-time feed; it is also stored into the database 260 for on-demand retrieval”). 
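The queueing arrangement quoted above from Wetjen [0074] can be sketched as follows. This is a minimal illustration assuming a single audio queue and a single result queue; the `transcribe` function is a hypothetical stand-in for an ASR engine, and the dictionary field names are assumptions rather than anything recited in Wetjen.

```python
from collections import deque


def transcribe(audio):
    # Hypothetical placeholder for an ASR call. The "audio" here is
    # already text, so the stub simply upper-cases it.
    return audio.upper()


def drain(audio_queue, result_queue):
    """Pull utterances in arrival order and queue their transcriptions.

    The metadata that travelled with the audio (participant, start time)
    is carried over to the result unchanged.
    """
    while audio_queue:
        utterance = audio_queue.popleft()  # first in, first out
        result_queue.append({
            "participant": utterance["participant"],
            "start_time": utterance["start_time"],
            "text": transcribe(utterance["audio"]),
        })


# Hypothetical utterances with metadata, in arrival order.
audio_queue = deque([
    {"participant": "A", "start_time": 0.0, "audio": "hello"},
    {"participant": "B", "start_time": 1.5, "audio": "hi there"},
])
result_queue = deque()
drain(audio_queue, result_queue)
```

Because both queues are first in, first out, results leave the pipeline in the same order in which the utterances arrived.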

Re claim 17:
17. The method of claim 14, further comprising: determining a corrected word in a subsequent transcription of the voice signal that is different from any of the consistent words; and in response to determining the corrected word, providing an indication of the corrected word to the second device, the second device using the corrected word to replace one or more of the consistent words in the presentation of the consistent words (Bigham, fig. 2; fig. 5; [0056]; fig. 2 shows a user interface for editing a transcript before a final transcript can be generated; Abstract, “transcriptions received from each worker are aligned and combined to create a resulting caption”; [0074], “The resulting captions can be used to determine the rate of speech, as well as each worker's performance, by comparing each individual worker's captions to the crowd's result”; [0050], “the worker interface 14 may be configured to change the color of correct words and/or incorrect words”). 

Re claims 18 - 21:
18. The method of claim 17 wherein the automated speech recognition system generates each of the hypothesis transcriptions as well as the subsequent transcription (Wetjen, fig. 6; [0028]; [0108]). 

19. The method of claim 17 wherein the automated speech recognition system generates each of the plurality of hypothesis transcriptions and wherein the subsequent transcription is received via a call assistant interface device  (Wetjen, fig. 6; [0028]; [0108]). 

20. The method of claim 14 wherein the automated speech recognition system includes a plurality of differently tuned ASR engines and wherein the two or more of the plurality of hypothesis transcriptions are generated via two or more of the differently tuned ASR engines, respectively  (Wetjen, fig. 6; [0028]; [0108]). 

21. The method of claim 1 wherein the first and second portions of the voice signal are identical (Wetjen, [0108]).

Response to Arguments
Applicant's arguments filed 6/24/2022 have been fully considered but they are not persuasive.
Applicant argues:
The description, moreover, "need not be in ipsis verbis [i.e., "in the same words"] to be sufficient." (See MPEP 2163) Additional support is found at least in paragraphs [0328] (hypothesis words (e.g., initially identified words prior to contextual correction based on subsequent words)), [0455] (hypothesis for words uttered), and [0467] (text hypothesis).
The cited paragraphs are listed below:
Para. [0328]: 
Restarting an ASR engine at various points within an HU voice signal has the additional benefit of making all hypothesis words (e.g., initially identified words prior to contextual correction based on subsequent words) firm in at least some embodiments. Doing so allows a CA correcting the text to make corrections or any other manipulations deemed appropriate for an AU immediately without having to wait for automated contextual corrections and avoids a case where a CA error correction may be replaced subsequently by an ASR engine correction.
Para. [0455]: 
In other cases where a CA initiates or completes a word correction, the ASR engine may be programmed to disable generating additional estimates or hypothesis for any words uttered by the HU prior to the CA corrected word or within a text segment or phrase that includes the corrected word. Thus, for instance, in some cases, where 30 text words appear on a CA's display screen, if the CA corrects the fifth most recently presented word, the fifth most recently corrected word and the 25 preceding words would be rendered firm and unchangeable via the ASR engine. Here, in some cases the CA would still be free to change any word presented on her display screen at any time. In other cases, once a CA corrects a word, that word and any preceding text words may be firm as to both the CA and the ASR engine.
Para. [0467]: 
In still other cases it is contemplated that only final ASR engine text may be sent on to an AU for consideration. In this case, for instance, ASR generated text may be transmitted to an AU device in blocks where context afforded by surrounding words has already been used to refine text hypothesis. For instance, words may be sent in five word text blocks where the block sent always includes the 6th through 10th most recently transcribed words so that the most recent through fifth most recent words can be used contextually to generate final text hypothesis for the 6th through 10th most recent words. Here, CA text corrections would still be made at a relay and transmitted to the AU device for in line corrections of the ASR engine final text.
Claim 1 requires “obtaining a first hypothesis transcription generated by the automated speech recognition system … including one or more first words … a transcription of at least a first portion of the voice signal … the second hypothesis transcription including … a transcription of at least a second portion of the voice signal that includes the first portion of the voice signal … ”. Claim 4 requires “obtaining a third hypothesis transcription … including … a transcription of at least a third portion of the voice signal that includes the first portion of the voice signal”. The Office respectfully submits that the cited paragraphs fail to mention three overlapping transcriptions from three hypothesis transcriptions. Para. [0328] suggests a CA correcting the text to make corrections or any other manipulations deemed appropriate for an AU immediately without having to wait for automated contextual corrections. A CA (i.e., a human) correcting the text is not the same as an ASR-generated hypothesis. Para. [0455] discusses how the system works between a CA and an ASR engine; however, it does not explicitly disclose three overlapping hypothesis transcriptions by an ASR engine. Para. [0467] states “words may be sent in five word text blocks where the block sent always includes the 6th through 10th most recently transcribed words so that the most recent through fifth most recent words can be used contextually to generate final text hypothesis for the 6th through 10th most recent words”. There is no teaching in para. [0467] that shows a first portion of the voice signal that has been transcribed three times to generate three hypothesis transcriptions.
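The block scheme quoted from para. [0467] can be pictured with the following minimal sketch, which selects the 6th through 10th most recently transcribed words and holds back the five most recent words as context. The function name, parameters, and sample words are hypothetical. The sketch shows only the windowing and does not model how the context words would refine a hypothesis.

```python
def final_block(transcribed_words, block_size=5, context_size=5):
    """Return the block that sits just behind the most recent context words.

    With the defaults, this is the 6th through 10th most recently
    transcribed words; the five newest words are withheld as context.
    An empty list is returned until enough words have accumulated.
    """
    if len(transcribed_words) < block_size + context_size:
        return []
    end = len(transcribed_words) - context_size
    return transcribed_words[end - block_size:end]


# Hypothetical stream of twelve transcribed words, oldest first.
words = [f"w{i}" for i in range(1, 13)]
block = final_block(words)
```

Here w12 through w8 are the five most recent words, so the block sent is w3 through w7. A single window of this kind produces one hypothesis per word position, consistent with the observation above that the paragraph does not describe transcribing the same portion three times.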
 
Applicant argues:
Each of the independent claims requires an automated speech recognition system. The Office Action has provided no evidence that an automated speech recognition system is a "generic computer component," or that the claimed process is routine or conventional. Applicant therefore respectfully submits that the Office Action has not presented a prima facie case of unpatentability under 35 U.S.C. Section 101, and respectfully requests that the rejection be withdrawn.  The Office Action further fails to provide prima facie case of unpatentability under 35 U.S.C. 101 because it fails to address the dependent claims. 
The Examiner respectfully disagrees. The Examiner has met his examination burden by providing a detailed subject-matter eligibility analysis applying Steps 1, 2A (i.e., Prong One and Prong Two) and 2B in the last Office action mailed 12/24/2021, and has provided a further detailed explanation above, which reasonably describes how a human transcriptionist may mentally compare the candidate transcriptions and choose the suitable transcriptions to create a final transcription. As such, the argument is not persuasive.

Applicant argues:
Wetjen discloses a system in which "each uttered word" is identified and compared. The words are correlated, for example, based on time stamps to "ensure each service is processing the same utterance." The process of Wetjun, therefore, is clearly not the process of claim 1. Claim 1 compares a first transcription that includes one or more first words to a second transcription that includes a plurality of words. Wetjun does not suggest comparing one or more first word against a plurality of second words in a second transcript. To the contrary, Wetjun assures a one to one correspondence, that is, that only a single word or "utterance" is processed at a time.
The Examiner respectfully disagrees.  Wetjun explicitly states: 
“Given the utterance, "To be or not to be, that is the question." Perhaps a resulting transcription is, "To be or not to be, that is the equestrian." If the words and confidence values returned from the transcription service are as follows: To (0.9), be (0.87), or (0.99), not (0.95), to (0.9), be (0.85), that (0.89), is (0.88), the (0.79), equestrian (0.45), then the word "equestrian" is selected as a possible error based on its confidence score being lower than a target threshold, (0.5 for example).” ([0085]).
“In the case of our example, "To be, or not to be, that is the question" is a direct quote from William Shakespeare's 'Hamlet' or a paraphrase or reference to it. Thus, given the utterance "To be, or not to be, that is the ___ ", the word 'question', should have the highest probability of occurring of any word in the language” ([0107]).
“phrase may be selected from one of the services, along with one or more words from different services to arrive at a more accurate combination of words and phrases for a given time interval.” ([0108]). 
“Method 600 combines the results from multiple speech recognition services to produce a more accurate result. Given an audio stream consisting of multiple user correlated channels, the same audio is processed by multiple speech recognition services. Results from each of the speech recognition services are then compared with each other to find the best matching sets of words and phrases among all of the results.” ([0109]).  
“Results from each of the speech recognition services are then compared with each other to find the best matching sets of words and phrases among all of the results.” ([0048]).

[Figure: media_image1.png]

Wetjun teaches that first and second hypothesis transcriptions can be generated by multiple speech recognition services. The first and second hypothesis transcriptions can be, for example, “question” or “equestrian” (see the example in para. [0085]). Method 600 combines the results from multiple speech recognition services to produce a more accurate result, generating a sentence such as “To be, or not to be, that is the question” (para. [0107]). Fig. 6, step 640 (“select highest confidence words and phrases”) and para. [0048] (“find the best matching sets of words and phrases”) show that multiple words can be generated by an automatic speech recognition server.
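For illustration only, the confidence-based selection described in the quoted passages can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the cited reference: the function names `flag_low_confidence` and `select_highest_confidence` are hypothetical, the word scores and the 0.5 threshold are taken from the example quoted from para. [0085], and the 0.9 score assigned to “question” from a second service is an assumed value that appears nowhere in the reference.

```python
def flag_low_confidence(scored_words, threshold=0.5):
    """Return the words whose confidence falls below the threshold."""
    return [word for word, score in scored_words if score < threshold]


def select_highest_confidence(candidates):
    """Return the candidate word with the highest confidence score."""
    best_word, _ = max(candidates, key=lambda pair: pair[1])
    return best_word


# Words and confidence values as quoted from para. [0085].
service_a = [
    ("To", 0.9), ("be", 0.87), ("or", 0.99), ("not", 0.95), ("to", 0.9),
    ("be", 0.85), ("that", 0.89), ("is", 0.88), ("the", 0.79),
    ("equestrian", 0.45),
]

flagged = flag_low_confidence(service_a)

# Hypothetical: a second service returns "question" for the same interval.
# The 0.9 score is an assumption for illustration, not a value from the reference.
chosen = select_highest_confidence([("equestrian", 0.45), ("question", 0.9)])
```

With the quoted values, only “equestrian” falls below the 0.5 threshold, and under the assumed second-service score the selection step returns “question”.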

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACK YIP whose telephone number is (571)270-5048. The examiner can normally be reached Monday thru Friday; 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XUAN THAI can be reached on (571) 272-7147. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JACK YIP/Primary Examiner, Art Unit 3715