DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 and 22-23 are pending in this application.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  
Claims 1-20 are rejected on the ground of nonstatutory double patenting over claims 1-20 of co-pending Application No. 17/127,166. Although the claims at issue are not identical, they are not patentably distinct from each other because adding inherent and/or unnecessary limitations/steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the insertion of an element, e.g., “each entry is the data structure indicates whether a corresponding word was interpreted differently by the first and second transcription services”, and its function is an obvious expedient if the remaining elements perform the same function as before. In re Karlson, 136 USPQ 184 (CCPA 1963). Also note Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Insertion of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.
The claims of the instant application (No. 17/127,235) are reproduced first, followed by the claims of co-pending Application No. 17/127,166.
1. A non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising: forwarding a separate copy of an audio file to each of multiple transcription services via a corresponding application programming interface; acquiring multiple transcripts by obtaining a separate transcript from each of the multiple transcription services via the corresponding application programming interface; populating, based on the multiple transcripts, a series of data structures so as to produce a tuple for each word uttered in the audio file; and causing display of a master transcript derived from the series of data structures.

2. The non-transitory computer-readable medium of claim 1, wherein each tuple comprises all interpretations of the corresponding word by the multiple transcription services.

3. The non-transitory computer-readable medium of claim 2, wherein each tuple further comprises information regarding part of speech of the corresponding word.

4. The non-transitory computer-readable medium of claim 2, wherein each tuple further comprises information regarding relation of the corresponding word to one or more other words uttered in the audio file.

5. The non-transitory computer-readable medium of claim 1, wherein the master transcript is displayed is such a manner that any misaligned segments are visually distinguishable from the remainder of the master transcript.

6. The non-transitory computer-readable medium of claim 1, further comprising: identifying a misaligned segment of the master transcript, wherein the misaligned segment corresponds to a portion of the audio file for which the multiple transcription services have different interpretations as determined from the corresponding tuple; and embedding one or more suggested replacements in the master transcript proximate to the misaligned segment.

7. The non-transitory computer-readable medium of claim 6, wherein the one or more suggested replacements includes all interpretations of the corresponding word by the multiple transcription services.

8. A system comprising: a memory that includes instructions for deriving master transcripts through automated comparison of transcripts generated by different transcription services; and a processor that, upon executing the instructions, is configured to: obtain a series of n-tuples associated with an audio file for which a master transcript is to be generated, wherein each n-tuple comprises n interpretations of a corresponding word in the audio file, and wherein each of the n interpretations is provided by a different transcription service, derive the master transcript based on the series of n-tuples, and cause display of the master transcript on an interface through which the master transcript is alterable.

9. The system of claim 8, wherein each n-tuple is representative of a sequence of the n interpretations of the corresponding word.

10. The system of claim 9, wherein the n interpretations are ordered in terms of likelihood of being correct as determined based on historical accuracy of the corresponding transcription service.

11. The system of claim 9, wherein the n interpretations are ordered in a predetermined manner based on the corresponding transcription service such that each transcription service occupies a given field in each n-tuple.

12. The system of claim 8, wherein the processor is further configured to: for each of the series of n-tuples, indicate whether the corresponding word was interpreted differently by the different transcription services.

13. The system of claim 12, wherein said indicating comprises populating a field in each of the series of n-tuples to specify whether identical interpretations were provided by the different transcription services.

14. The system of claim 13, wherein words with identical interpretations across the n interpretations are deemed to be properly interpreted, and wherein words with dissimilar interpretations across the n interpretations are deemed to be improperly interpreted.

15. The system of claim 12, wherein the processor is further configured to: identify a word for which the interpretation is not identical across the n interpretations, and indicate, on the interface, a type of issue responsible for the nonidentical interpretation of the word.

16. The system of claim 15, wherein the type of issue is established based on an analysis of the corresponding n-tuple in the series of n-tuples.

17. The system of claim 15, wherein the type of issue is misinterpretation of a non-speech utterance, substitution of an acronym, mispronunciation of an acronym, or misuse of an acronym.

Claims of co-pending Application No. 17/127,166:
1. A method comprising: receiving, by a processor, input indicative of a selection of an audio file; retrieving, by the processor, the audio file from a storage medium; forwarding, by the processor, a first copy of the audio file to a first transcription service via a first application programming interface, and a second copy of the audio file to a second transcription service via a second application programming interface; receiving, by the processor, a first transcript from the first transcription service via the first application programming interface, and a second transcript from the second transcription service via the second application programming interface; generating, by the processor, a master transcript based on the first and second transcripts; identifying, by the processor, a misaligned segment of the master transcript, wherein the misaligned segment corresponds to a portion of the audio file for which the first and second transcription services have different interpretations; and causing, by the processor, display of the master transcript in such a manner that the misaligned segment is visually distinguishable from the remainder of the master transcript.

2. The method of claim 1, further comprising: embedding, by the processor, one or more suggested replacements in the master transcript proximate to the misaligned segment.

3. The method of claim 2, further comprising: receiving, by the processor, second input indicative of a selection of a given suggested replacement from amongst the one or more suggested replacements for the misaligned segment; and replacing, by the processor in response to receiving the second input, the misaligned segment with the given suggested replacement in the master transcript.

4. The method of claim 1, wherein said identifying comprises: populating a data structure representative of the master transcript based on analysis of the first and second transcripts, wherein each entry is the data structure indicates whether a corresponding word was interpreted differently by the first and second transcription services.

5. The method of claim 4, wherein words with identical interpretations in the first and second transcripts are deemed to be properly interpreted, and wherein words with dissimilar interpretations in the first and second transcripts are deemed to be improperly interpreted.

6. The method of claim 1, wherein the storage medium is accessible to the processor via a network.

7. A non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising determining that a master transcript is to be generated for an audio file; forwarding a separate copy of the audio file to each of multiple transcription services via a corresponding application programming interface; acquiring multiple transcripts by obtaining a separate transcript from each of the multiple transcription services via the corresponding application programming interface; deriving the master transcript by comparing the multiple transcripts on a per-word basis; and storing the master transcript in a storage medium.

8. The non-transitory computer-readable medium of claim 7, wherein said determining comprises establishing that input indicative of a selection of the audio file has been received.

9. The non-transitory computer-readable medium of claim 7, wherein the operations further comprise: identifying a word for which the interpretation is not identical across the multiple transcripts, and posting the master transcript to an interface for review, wherein the word is visually distinguishable from words for which the interpretation is identical across the multiple transcripts.

10. The non-transitory computer-readable medium of claim 9, wherein the operations further comprise: indicating, on the interface, a type of issue responsible for the nonidentical interpretation of the word.

11. The non-transitory computer-readable medium of claim 10, wherein the type of issue is misinterpretation of a non-speech utterance, substitution of an acronym, mispronunciation of an acronym, or misuse of an acronym.

12. The non-transitory computer-readable medium of claim 7, wherein the operations further comprise: receiving input indicative of a selection of the multiple transcription services.

13. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise: for each of the multiple transcription services, initiating, in response to said receiving, a connection with the corresponding application programming interface.

14. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise: receiving input indicative of a selection of a portion of the master transcript; identifying a portion of the audio file that corresponds to the selected portion of the master transcript; and forwarding the portion of the audio file to a transcription service that is not one of the multiple transcription services.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Thomson et al. (US Pub. No. 2020/0175987).
Regarding claim 1, Thomson discloses a non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising: 
forwarding a separate copy of an audio file to each of multiple transcription services via a corresponding application programming interface (Fig. 13, [0333]-[0336] the ASR systems 1320 may include multiple ASR systems (ASR 1 – ASR n) and receive audio input separately); 
acquiring multiple transcripts by obtaining a separate transcript from each of the multiple transcription services via the corresponding application programming interface (Figs. 13 and 14, [0333]-[0336][0344] a Fuser 1324 acquires multiple transcripts which are generated by each of the ASR systems (ASR 1 – ASR n)); 
populating, based on the multiple transcripts, a series of data structures so as to produce a tuple for each word uttered in the audio file (Figs. 13 and 14, [0345]-[0368] a series of tokens uttered in the audio input are aligned; [0230] dividing by the number of tokens in the transcription; the divided tokens are indicative of a “data structure”); and 
causing display of a master transcript derived from the series of data structures (Figs. 13 and 14, [0187][0343]-[0345] outputting transcription 1410 which may be performed by a fuser).
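For illustration only, the limitation of producing a tuple for each word and deriving a master transcript from those tuples can be sketched as follows. This is a minimal sketch, not a statement of what the claims require or of what Thomson implements. It assumes the transcripts have already been aligned word for word (equal length), and it assumes a simple majority vote to pick each master word; the function names build_word_tuples and derive_master are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_word_tuples(transcripts: List[List[str]]) -> List[Tuple[str, ...]]:
    """Produce one tuple per word position, holding each service's interpretation.

    Assumption: the transcripts are already aligned, so position i in every
    transcript corresponds to the same uttered word.
    """
    if not transcripts:
        return []
    length = len(transcripts[0])
    if any(len(t) != length for t in transcripts):
        raise ValueError("transcripts must be aligned to the same length")
    return [tuple(words) for words in zip(*transcripts)]


def derive_master(word_tuples: List[Tuple[str, ...]]) -> List[str]:
    """Pick one word per tuple by majority vote (an assumed selection rule).

    Ties are resolved in favor of the earliest service in the tuple, because
    max() returns the first maximal element it encounters.
    """
    master = []
    for interpretations in word_tuples:
        counts = Counter(interpretations)
        master.append(max(interpretations, key=lambda w: counts[w]))
    return master


# Three hypothetical services transcribing the same three-word utterance.
transcripts = [
    ["the", "cat", "sat"],
    ["the", "cap", "sat"],
    ["the", "cat", "sad"],
]
word_tuples = build_word_tuples(transcripts)
master = derive_master(word_tuples)
```

In this sketch each tuple has one field per transcription service, so a disagreement at a given word position is visible directly from the tuple contents.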
Regarding claim 2, Thomson discloses the non-transitory computer-readable medium of claim 1, and Thomson further discloses:
wherein each tuple comprises all interpretations of the corresponding word by the multiple transcription services ([0397]-[0400] an example of hypothesis ASR output from the multiple ASR systems).
Regarding claim 3, Thomson discloses the non-transitory computer-readable medium of claim 2, and Thomson further discloses:
wherein each tuple further comprises information regarding part of speech of the corresponding word ([0397]-[0400] an example of hypothesis transcriptions of ASR output from the multiple ASR systems).
Regarding claim 4, Thomson discloses the non-transitory computer-readable medium of claim 2, and Thomson further discloses:
wherein each tuple further comprises information regarding relation of the corresponding word to one or more other words uttered in the audio file ([0339] and Table 3, ASR1 and ASR2 use different language model which is based on n-gram).
Regarding claim 5, Thomson discloses the non-transitory computer-readable medium of claim 1, and Thomson further discloses:
wherein the master transcript is displayed is such a manner that any misaligned segments are visually distinguishable from the remainder of the master transcript ([0401]-[0407] an example of Error map which is displayed with word misalignment).
Regarding claim 6, Thomson discloses the non-transitory computer-readable medium of claim 1, and Thomson further discloses:
identifying a misaligned segment of the master transcript, wherein the misaligned segment corresponds to a portion of the audio file for which the multiple transcription services have different interpretations as determined from the corresponding tuple; and embedding one or more suggested replacements in the master transcript proximate to the misaligned segment ([0372] “an ASR system may generate multiple ranked hypotheses for a segment of audio. The tokens may be assigned weights according to each token's appearance in a particular one of the multiple ranked hypotheses”; [0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 7, Thomson discloses the non-transitory computer-readable medium of claim 6, and Thomson further discloses:
wherein the one or more suggested replacements includes all interpretations of the corresponding word by the multiple transcription services ([0475]-[0489] “the detector 1720 … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 8, Thomson discloses a system comprising: 
a memory that includes instructions for deriving master transcripts through automated comparison of transcripts generated by different transcription services ([0101][0111][0112][0182]-[0187] transcription system includes memory); and
a processor that, upon executing the instructions, is configured to ([0111][0112] a system includes at least one processor): 
obtain a series of n-tuples associated with an audio file for which a master transcript is to be generated (Figs. 13 and 14, [0345]-[0368] a series of tokens uttered in the audio input are aligned), 
wherein each n-tuple comprises n interpretations of a corresponding word in the audio file ([0339][0397]-[0400] an example of hypothesis transcriptions of ASR output from the multiple ASR systems), and 
wherein each of the n interpretations is provided by a different transcription service, derive the master transcript based on the series of n-tuples ([0372] “an ASR system may generate multiple ranked hypotheses for a segment of audio. The tokens may be assigned weights according to each token's appearance in a particular one of the multiple ranked hypotheses”), and 
cause display of the master transcript on an interface through which the master transcript is alterable ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 9, Thomson discloses the system of claim 8, and Thomson further discloses:
wherein each n-tuple is representative of a sequence of the n interpretations of the corresponding word ([0357]-[0360] an example of a sequence of interpretation of word tokens from each of the ASR systems). 
Regarding claim 10, Thomson discloses the system of claim 9, and Thomson further discloses:
wherein the n interpretations are ordered in terms of likelihood of being correct as determined based on historical accuracy of the corresponding transcription service ([0171][0229] ASR system uses historical accuracy).
Regarding claim 11, Thomson discloses the system of claim 9, and Thomson further discloses:
wherein the n interpretations are ordered in a predetermined manner based on the corresponding transcription service such that each transcription service occupies a given field in each n-tuple ([0427]-[0429][0457] treating tags as regular tokens for purposes of alignment and assigning weights for tags that are different from weights for other tokens for purposes of alignment and/or voting). 
Regarding claim 12, Thomson discloses the system of claim 8, and Thomson further discloses:
wherein the processor is further configured to: for each of the series of n-tuples, indicate whether the corresponding word was interpreted differently by the different transcription services ([0357]-[0360] parts of interpretation of word tokens from each of the ASR systems was generated differently).
Regarding claim 13, Thomson discloses the system of claim 12, and Thomson further discloses:
wherein said indicating comprises populating a field in each of the series of n-tuples to specify whether identical interpretations were provided by the different transcription services ([0357]-[0360] parts of interpretation of word tokens from each of the ASR systems was generated identically).

Regarding claim 14, Thomson discloses the system of claim 13, and Thomson further discloses:
wherein words with identical interpretations across the n interpretations are deemed to be properly interpreted, and wherein words with dissimilar interpretations across the n interpretations are deemed to be improperly interpreted ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
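For illustration only, the limitations of claims 12-14 (a field in each n-tuple that records whether the services gave identical interpretations, with identical words deemed properly interpreted) can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of the applicant's or Thomson's implementation. The WordEntry type, its field names, and the flag_entries function are hypothetical, and exact string equality is an assumed test for "identical".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordEntry:
    """One n-tuple for a word, plus a field recording agreement (hypothetical)."""
    interpretations: Tuple[str, ...]  # one interpretation per transcription service
    identical: bool                   # True when every service gave the same word


def flag_entries(word_tuples: List[Tuple[str, ...]]) -> List[WordEntry]:
    """Populate the agreement field for each n-tuple.

    Assumption: two interpretations are "identical" when the strings are equal;
    a real system might normalize case or punctuation first.
    """
    return [
        WordEntry(interpretations=t, identical=len(set(t)) == 1)
        for t in word_tuples
    ]


entries = flag_entries([("the", "the"), ("cat", "cap"), ("sat", "sat")])
# Words whose interpretations differ are the ones deemed improperly interpreted.
disputed = [e.interpretations for e in entries if not e.identical]
```

Storing the flag alongside the interpretations lets an interface decide which words to mark for review without re-comparing the transcripts.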
Regarding claim 15, Thomson discloses the system of claim 12, and Thomson further discloses:
wherein the processor is further configured to: identify a word for which the interpretation is not identical across the n interpretations, and indicate, on the interface, a type of issue responsible for the nonidentical interpretation of the word ([0401]-[0407] an example of Error map which is displayed with word misalignment).
Regarding claim 16, Thomson discloses the system of claim 15, and Thomson further discloses:
wherein the type of issue is established based on an analysis of the corresponding n-tuple in the series of n-tuples ([0401]-[0407] an example of Error map which is displayed with word misalignment).
Regarding claim 17, Thomson discloses the system of claim 15, and Thomson further discloses:
wherein the type of issue is misinterpretation of a non-speech utterance, substitution of an acronym, mispronunciation of an acronym, or misuse of an acronym ([0373][0374] the error type is a missing token from a token group; [0992] a data analyzer may use the feature from a punctuated term list 1512, which may include a list of abbreviations and acronyms).
Regarding claim 18, Thomson discloses a method comprising: 
obtaining, by a processor, multiple transcripts that are associated with an audio file, wherein each of the multiple transcripts is representative of an interpretation of the audio file by a different transcription service (Fig. 13, [0333]-[0336] the ASR systems 1320 may include multiple ASR systems (ASR 1 – ASR n) and receive audio input separately; [0344] a Fuser 1324 acquires multiple transcripts which are generated by each of the ASR systems (ASR 1 – ASR n)); 
deriving, by the processor, a master transcript for the audio file by comparing on the multiple transcripts on a per-word basis (Figs. 13 and 14, [0345]-[0368] a series of tokens uttered in the audio input are aligned; [0187][0343]-[0345] outputting transcription 1410 which may be performed by a fuser); and 
posting, by the processor, the master transcript to an interface, 
wherein words for which the interpretation is not identical across the multiple transcripts are visually distinguishable from words for the interpretation is identical across the multiple transcripts ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 19, Thomson discloses the method of claim 18, and Thomson further discloses:
forwarding, by the processor, separate copies of the audio file to each of multiple transcription services via a corresponding application programming interface (Fig. 13, [0333]-[0336] the ASR systems 1320 may include multiple ASR systems (ASR 1 – ASR n) and receive audio input separately);
wherein said obtaining comprises receiving, from each of the multiple transcription services, one of the multiple transcripts via the corresponding application programming interface (Figs. 13 and 14, [0333]-[0336][0344] a Fuser 1324 acquires multiple transcripts which are generated by each of the ASR systems (ASR 1 – ASR n)).
Regarding claim 20, Thomson discloses the method of claim 18, and Thomson further discloses:
receiving, by the processor, input indicative of a modification to the master transcript; and altering, by the processor in response to said receiving, the audio file to reflect the modification to the master transcript ([0169][0184][0407] ASR system includes modification to transcription).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/           Primary Examiner, Art Unit 2659