Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson (US 20200175961 A1) in view of Ohana (US 20190341023 A1).
With respect to claim 1, Thomson teaches a method comprising:
[[receiving, by a processor, input indicative of a selection of an audio file;  
retrieving, by the processor, the audio file from a storage medium]]
forwarding, by the processor, a first copy of the audio file ([0105] In some embodiments, each of the first and second devices 104 and 106 may be configured to obtain audio during a communication session…Furthermore, the term “audio” may be used generically to include audio in any format, such as a digital format…) to a first transcription service via a first application programming interface ([0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors. [0161] For example, in some embodiments, providing the transcriptions by the transcription system 108 may be described as a transcription service), and a second copy of the audio file ([0107] Alternatively or additionally, the second device 106 may provide the second audio to the first device 104.) to a second transcription service via a second application programming interface ([0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors.) 
receiving, by the processor, a first transcript from the first transcription service via the first application programming interface, and a second transcript from the second transcription service via the second application programming interface ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. [0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors); 
generating, by the processor, a master transcript based on the first and second transcripts ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. Additionally or alternatively, the fuser 1324 may generate multiple transcriptions.); 
identifying, by the processor, a misaligned segment of the master transcript, wherein the misaligned segment corresponds to a portion of the audio file for which the first and second transcription services have different interpretations ([0357] In this example, a speaker may say “OK, let's meet at four.” During the transcription generation process 1402, three different ASR systems (e.g., ASR systems 1320 of FIG. 13) may each generate one of the below hypotheses: [0358] 1. OK, let's meet more. [0359] 2. OK, says meet at 4:00. [0360] 3. OK, ha let's meet at far.); and 
causing, by the processor, display of the master transcript ([0095] In some embodiments, systems and methods in this disclosure may be configured to combine or fuse multiple transcriptions into a single transcription that is provided to a device for display to a user.) in such a manner that the misaligned segment is visually distinguishable from the remainder of the master transcript ([0159] Differences determined by the fuser 124 may be determined to be errors in the third transcription. Corrections of the errors may be provided to the first device 104 for correcting the third transcription being presented by the first device 104. Corrections may be marked in the presentation by the first device 104 in any manner of suitable methods including, but not limited to, highlighting, changing the font, or changing the brightness of the text that is replaced.)
Thomson does not explicitly disclose, but Ohana teaches, receiving, by a processor, input indicative of a selection of an audio file ([0022] The speech-to-text processing engine 110 may select which audio file stored within the audio file database 210 is to be transcribed.);
retrieving, by the processor, the audio file from a storage medium ([0022] The speech-to-text processing engine 110 may select which audio file stored within the audio file database 210 is to be transcribed.);
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson in view of Ohana, to retrieve the audio file from a storage medium in order to address customer's needs using the transcriptions of communications. ([0045], Ohana).

With respect to claim 4, Thomson teaches wherein said identifying comprises: populating a data structure representative of the master transcript based on analysis of the first and second transcripts, wherein each entry in the data structure indicates whether a corresponding word was interpreted differently by the first and second transcription services ([0095] In some embodiments, systems and methods in this disclosure may be configured to combine or fuse multiple transcriptions into a single transcription that is provided to a device for display to a user. [0159] Differences determined by the fuser 124 may be determined to be errors in the third transcription. Corrections of the errors may be provided to the first device 104 for correcting the third transcription being presented by the first device 104. Corrections may be marked in the presentation by the first device 104 in any manner of suitable methods including, but not limited to, highlighting, changing the font, or changing the brightness of the text that is replaced. [Both highlighting and display imply an underlying data structure.]).

With respect to claim 5 Thomson teaches wherein words with identical interpretations in the first and second transcripts are deemed to be properly interpreted ([0572] In some embodiments, the scorer 2216 may include an aligner 2204 configured to align two or more transcriptions in a manner that reduces the number of differences between similar tokens in the transcriptions.), and 
wherein words with dissimilar interpretations in the first and second transcripts are deemed to be improperly interpreted ([0572] In some embodiments, the scorer 2216 may include an aligner 2204 configured to align two or more transcriptions in a manner that reduces the number of differences between similar tokens in the transcriptions.). 

With respect to claim 6 Thomson does not explicitly recite but Ohana teaches wherein the storage medium is accessible to the processor via a network (Ohana: [0052] One skilled in the art will also appreciate that databases, systems, devices, servers or other components of the disclosed systems or machines may consist of any combination thereof at a single location or at multiple locations, wherein each database, system or machine may include of suitable security features, such as firewalls, access codes, encryption, decryption, compression, decompression, and/or the like. The special purpose systems, networks and/or computers discussed herein may provide a suitable website or other Internet-based graphical user interface which is accessible by users.). 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson in view of Ohana, to retrieve the audio file from a storage medium in order to address customer's needs using the transcriptions of communications. ([0045], Ohana).

Claims 7, 9, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson (US 20200175961 A1) in view of Moritz (US 20210183373 A1)

With respect to claim 7 Thomson teaches A non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device ([0217] In these and other embodiments, the method 300 may be performed based on the execution of instructions stored on one or more non-transitory computer-readable media. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation. [0272] Another optional component of the ASR system 520, for example, may be a domain-specific processor for application-specific needs such as address recognition,), cause the computing device to perform operations comprising determining that a master transcript is to be generated for an audio file ([0095] Each of the automatic ASR system and the revoicing system may generate a transcription of audio of a communication session. The transcriptions from each of the automatic ASR system and the revoicing system may be fused together to generate a finalized transcription that may be provided to a device for display.); 
forwarding a separate copy of the audio file to each of multiple transcription services via a corresponding application programming interface (Thomson: [0105] In some embodiments, each of the first and second devices 104 and 106 may be configured to obtain audio during a communication session…Furthermore, the term “audio” may be used generically to include audio in any format, such as a digital format…[0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors. [0161] For example, in some embodiments, providing the transcriptions by the transcription system 108 may be described as a transcription service. [0107] Alternatively or additionally, the second device 106 may provide the second audio to the first device 104.);
acquiring multiple transcripts by obtaining a separate transcript from each of the multiple transcription services via the corresponding application programming interface ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. [0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors); 
deriving the master transcript by comparing the multiple transcripts on a per-word basis ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. [0341] The process 1400, generally, may include generating transcriptions of audio and fusing the transcriptions of the audio. For example, the process 1400 may include a transcription generation process 1402, denormalize text process 1404, align text process 1406 [per-word basis], voting process 1408, normalize text process 1409, and output transcription process 1410.) 
Thomson does not explicitly recite but Moritz teaches storing the master transcript in a storage medium (Moritz: [0125] In some embodiments, the output interface 439 can display the transcription outputs on a display device 441, store the transcription outputs into storage medium and/or transmit the transcription outputs over the network 407.). 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson in view of Moritz, to store the master transcript in a storage medium in order to connect the system to an external device for performing various tasks. ([0125], Moritz).

With respect to claim 9 Thomson teaches identifying a word for which the interpretation is not identical across the multiple transcripts (Thomson: [0159] Differences determined by the fuser 124 may be determined to be errors in the third transcription. Corrections of the errors may be provided to the first device 104 for correcting the third transcription being presented by the first device 104. Corrections may be marked in the presentation by the first device 104 in any manner of suitable methods including, but not limited to, highlighting, changing the font, or changing the brightness of the text that is replaced.), and 
posting the master transcript to an interface for review ([0095] In some embodiments, systems and methods in this disclosure may be configured to combine or fuse multiple transcriptions into a single transcription that is provided to a device for display to a user), wherein the word is visually distinguishable from words for which the interpretation is identical across the multiple transcripts ([0159] Differences determined by the fuser 124 may be determined to be errors in the third transcription. Corrections of the errors may be provided to the first device 104 for correcting the third transcription being presented by the first device 104. Corrections may be marked in the presentation by the first device 104 in any manner of suitable methods including, but not limited to, highlighting, changing the font, or changing the brightness of the text that is replaced.).

With respect to claim 15 Thomson teaches A method comprising: 
forwarding, by a media production platform, separate copies of an audio file to a pair of transcription services (Thomson: [0105] In some embodiments, each of the first and second devices 104 and 106 may be configured to obtain audio during a communication session…Furthermore, the term “audio” may be used generically to include audio in any format, such as a digital format…[0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors. [0161] For example, in some embodiments, providing the transcriptions by the transcription system 108 may be described as a transcription service. [0107] Alternatively or additionally, the second device 106 may provide the second audio to the first device 104.);
receiving, by the media production platform from the pair of transcription services, a pair of transcripts for the audio file, wherein each of the pair of transcripts is representative of an interpretation of the audio file by the corresponding transcription service ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. [0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors);
deriving, by the media production platform, a master transcript from the pair of transcripts ([0335] FIG. 13 also illustrates a fuser 1324. In some embodiments, the fuser 1324 may be configured to merge the transcriptions generated by the ASR systems 1320 to create a fused transcription. In some embodiments, the fused transcription may include an accuracy that is improved with respect to the accuracy of the individual transcriptions combined to generate the fused transcription. [0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors); and 
posting, by the media production platform, the master transcript to an interface ([0095] In some embodiments, systems and methods in this disclosure may be configured to combine or fuse multiple transcriptions into a single transcription that is provided to a device for display [posting] to a user.), wherein words for which the interpretation is not identical in the pair of transcripts are visually distinguishable from words for which the interpretation is identical in the pair of transcripts ([0159] Differences determined by the fuser 124 may be determined to be errors in the third transcription. Corrections of the errors may be provided to the first device 104 for correcting the third transcription being presented by the first device 104. Corrections may be marked in the presentation by the first device 104 in any manner of suitable methods including, but not limited to, highlighting, changing the font, or changing the brightness of the text that is replaced.) 
Thomson does not explicitly recite but Moritz teaches posting, by the media production platform, the master transcript to an interface (Moritz: [0125] In some embodiments, the output interface 439 can display the transcription outputs on a display device 441, store the transcription outputs into storage medium and/or transmit the transcription outputs over the network 407.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson in view of Moritz, to post by the media production platform, the master transcript to an interface in order to connect the system to an external device for performing various tasks. ([0125], Moritz).

With respect to claim 16 Thomson teaches wherein each copy of the audio file is forwarded via an application programming interface that handles communication between the media production platform and the corresponding transcription service ([0333] Additionally or alternatively, the ASR systems 1320 may be part of different API services, such as services provided by different vendors). 


Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson and Ohana as applied to claim 1, and in further view of Skarbovsky (US 20180143956 A1).

With respect to claim 2 Thomson and Ohana do not explicitly disclose but Skarbovsky teaches further comprising: embedding, by the processor, one or more suggested replacements in the master transcript proximate to the misaligned segment ([0064] When a suggested text item 310 is selected from the replacement interface 250, the suggested text item 310 will replace the selected text item 240 in the captioning 220 and the transcript, and the confidence assigned to the suggested text item 310 and the former selected text item 240 will be adjusted upward and downward accordingly to affect future speech to text conversions.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Ohana in view of Skarbovsky, to embed one or more suggested replacements in the master transcript proximate to the misaligned segment in order for correction of the transcript, so that the transcript accessed after the event may optionally include the corrections. ([0005], Skarbovsky).


With respect to claim 3, Thomson and Ohana do not explicitly recite, but Skarbovsky teaches, further comprising: receiving, by the processor, second input indicative of a selection of a given suggested replacement from amongst the one or more suggested replacements for the misaligned segment ([0064] When a suggested text item 310 is selected from the replacement interface 250 [second input is the GUI interface shown in Fig. 3B], the suggested text item 310 will replace the selected text item 240 in the captioning 220 and the transcript, and the confidence assigned to the suggested text item 310 and the former selected text item 240 will be adjusted upward and downward accordingly to affect future speech to text conversions.); and
replacing, by the processor in response to receiving the second input, the misaligned segment with the given suggested replacement in the master transcript ([0064] When a suggested text item 310 is selected from the replacement interface 250, the suggested text item 310 will replace the selected text item 240 in the captioning 220 and the transcript, and the confidence assigned to the suggested text item 310 and the former selected text item 240 will be adjusted upward and downward accordingly to affect future speech to text conversions.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Ohana in view of Skarbovsky, to receive second input indicative of a selection of a given suggested replacement from amongst the one or more suggested replacements for the misaligned segment in order for correction of the transcript, so that the transcript accessed after the event may optionally include the corrections ([0005], Skarbovsky).


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Thomson and Moritz as applied to claim 7, in further view of Ohana.

With respect to claim 8, Thomson and Moritz do not explicitly disclose but Ohana teaches wherein said determining comprises establishing that input indicative of a selection of the audio file has been received (Ohana: [0022] The secure computing environment 102 also includes the speech-to-text processing engine 110. The speech-to-text processing engine 110 may select which audio file stored within the audio file database 210 is to be transcribed.) 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Moritz in view of Ohana, such that the determining comprises establishing that input indicative of a selection of the audio file has been received in order to address customer's needs using the transcriptions of communications. ([0045], Ohana).

Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson and Moritz as applied to claim 9, in further view of Howard (US 20210056960 A1).

With respect to claim 10 Thomson and Moritz do not explicitly disclose but Howard teaches wherein the operations further comprise: 
indicating, on the interface, a type of issue responsible for the nonidentical interpretation of the word ([0023] After the transcriptions and scores 110 are obtained by the ASR transcription and scoring module 108, the transcriptions and scores 110 are provided to an NLU interpretation and scoring module 112 that performs NLU interpretation of both of the ASR transcriptions. As illustrated in FIG. 1, the NLU interpretation and scoring module 112 implements two different domains to interpret the two different transcriptions. The domains [type of issue] can be selected by the NLU interpretation and scoring module 112 or some other manner based on the contents of the transcriptions or based on other factors that would be apparent to a person of ordinary skill in the art. As illustrated, the NLU interpretation and scoring module 112 implements a recipes domain 114 to interpret the phrase “how to make whirled peas” using particular grammars assigned to the recipes domain 114 and the NLU interpretation and scoring module 112 implements a politics domain 116 to interpret the phrase “how to make world peace” using particular grammars assigned to the politics domain 116. The result of the NLU interpretations are illustrated as interpretations and scores 118.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Moritz in view of Howard to indicate, on the interface, a type of issue responsible for the nonidentical interpretation of the word in order to increase the accuracy and performance of the speech recognition and natural language understanding ([0073], Howard).

With respect to claim 11 Thomson further teaches wherein the type of issue is misinterpretation of a non-speech utterance, substitution of an acronym, mispronunciation of an acronym, or misuse of an acronym ([0112] In these and other embodiments, the ASR system may include the computer system. In some embodiments, the transcription of the audio generated by the ASR systems may include capitalization, punctuation, and non-speech sounds. The non-speech sounds may include, background noise, vocalizations such as laughter, filler words such as “um,” and speaker identifiers such as “new speaker,” among others.). 

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson and Moritz as applied to claims 7 and 12, in further view of Nelson (US 20190108834 A1).

With respect to claim 12 Thomson and Moritz do not explicitly disclose but Nelson teaches wherein the operations further comprise: receiving input indicative of a selection of the multiple transcription services (Nelson: [0274] The selected translation/transcription services are selected in a manner to provide the most accurate results for particular text or audio data, and the translation/transcription services that are selected may be different for each set of text and audio data. [0275] FIG. 12 depicts example data that may be included in selection data 1152. In the examples depicted in FIG. 12, the labels “S1”, “S2”, and “S3” refer to three different translation/transcription services, such as translation/transcription services 1170, 1180, 1190, but data may be provided for any number of translation/transcription services [the input selection is shown in table]). 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Moritz in view of Nelson to receive input indicative of a selection of the multiple transcription services in order to reduce the amount of computational resources and/or time required to perform the translation/transcription. ([0291], Nelson).

With respect to claim 13 Thomson and Moritz do not explicitly disclose but Nelson teaches wherein the operations further comprise: for each of the multiple transcription services, initiating, in response to said receiving, a connection with the corresponding application programming interface (Nelson: [0267] Translation/transcription services 1130 include services with the capability to translate text data from one language to one or more other languages, transcribe audio data into text, or both. Translation/transcription services 1130 may be implemented, for example, as Web applications or other processes on servers or other networking elements and translation/transcription services 1130 may support one or more application programming interfaces (APIs).). 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Moritz in view of Nelson to initiate, for each of the multiple transcription services and in response to said receiving, a connection with the corresponding application programming interface in order to reduce the amount of computational resources and/or time required to perform the translation/transcription ([0291], Nelson).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Thomson, Moritz and Nelson as applied to claim 12, in further view of Vozila (US 20190272902 A1) and Han (US 11138970 B1).

With respect to claim 14 Thomson, Moritz and Nelson do not explicitly disclose but Vozila teaches receiving input indicative of a selection of a portion of the master transcript ([0098] The use of dedicated hardware (e.g., a peripheral device) may be utilized by ACD process 10 to improve editing efficiency for medical transcriptionists/physicians, etc. to help navigate and browse (e.g., sentence by sentence) through the draft medical report, the associated conversation transcript excerpt, and/or associated audio file, annotated (e.g., highlighted, bolded, etc.) with its corresponding audio cued up for easy playback.); 
identifying a portion of the audio file that corresponds to the selected portion of the master transcript ([0098] The use of dedicated hardware (e.g., a peripheral device) may be utilized by ACD process 10 to improve editing efficiency for medical transcriptionists/physicians, etc. to help navigate and browse (e.g., sentence by sentence) through the draft medical report, the associated conversation transcript excerpt, and/or associated audio file, annotated (e.g., highlighted, bolded, etc.) with its corresponding audio cued up for easy playback.); 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson, Moritz and Nelson in view of Vozila to receive input indicative of a selection of a portion of the master transcript in order to improve editing efficiency for transcriptionists ([0098], Vozila).

Thomson, Moritz, Nelson and Vozila do not explicitly recite, but Han teaches, forwarding the portion of the audio file to a transcription service that is not one of the multiple transcription services (Col 1, ll The system sends the modified audio recording and the extracted audio clips to separate transcription services.) Note: Thomson teaches the multiple transcription services, and Han teaches a third transcription service. Under the broadest reasonable interpretation, Han's separate transcription service satisfies the claimed transcription service that is not one of the multiple transcription services.

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson, Moritz, Nelson and Vozila in view of Han to forward the portion of the audio file to a transcription service in order to create a complete transcription of an audio recording from separately transcribed words, thus enabling a company to protect the privacy (Col 1 ll 32-35 Han).

Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson as applied to claims 15 and 17, respectively, in further view of Vozila and Han.

With respect to claim 17, Thomson does not explicitly disclose, but Vozila teaches,
receiving, by the media production platform, input indicative of a selection of a portion of the master transcript (Vozila: [0098] The use of dedicated hardware (e.g., a peripheral device) may be utilized by ACD process 10 to improve editing efficiency for medical transcriptionists/physicians, etc. to help navigate and browse (e.g., sentence by sentence) through the draft medical report, the associated conversation transcript excerpt, and/or associated audio file, annotated (e.g., highlighted, bolded, etc.) with its corresponding audio cued up for easy playback.);
identifying, by the media production platform, a portion of the audio file that corresponds to the selected portion of the master transcript (Vozila: [0098] The use of dedicated hardware (e.g., a peripheral device) may be utilized by ACD process 10 to improve editing efficiency for medical transcriptionists/physicians, etc. to help navigate and browse (e.g., sentence by sentence) through the draft medical report, the associated conversation transcript excerpt, and/or associated audio file, annotated (e.g., highlighted, bolded, etc.) with its corresponding audio cued up for easy playback.);
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson in view of Vozila to receive input indicative of a selection of a portion of the master transcript in order to improve editing efficiency for transcriptionists ([0098], Vozila).

Thomson and Vozila do not explicitly recite, but Han teaches:
forwarding, by the media production platform, the portion of the audio file to a third transcription service (Col 1, ll The system sends the modified audio recording and the extracted audio clips to separate transcription services. The transcription services may be third-party transcription services (i.e., an outside company hired to handle the transcription services) or in-house transcription services (i.e., the company itself performing any transcription services) or a combination of the two, so long as the transcription services assigned to transcribe the confidential information contained in the extracted audio clips have an appropriate level of security that is compliant with any applicable privacy laws.) 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Vozila in view of Han to forward the portion of the audio file to a transcription service in order to create a complete transcription of an audio recording from separately transcribed words, thus enabling a company to protect the privacy of confidential information (Col 1 ll 32-35, Han).

With respect to claim 18, Han further teaches:
receiving, by the media production platform from the third transcription service, a third transcript for the portion of the audio file (Col 1 ll 66 to Col 2 ll 9: When the system receives the transcribed modified audio recording and the transcribed audio clips back from the separate transcription services, the system combines the two by aligning the word-level timestamps of the transcribed audio clips with the word-level timestamps of the transcribed modified audio recording and inserting the transcriptions of the audio clips into the appropriate locations of the transcription of the modified audio recording so that the combination creates a complete transcription of the original audio recording while enabling the user to protect the privacy of any confidential information during the transcription process.); and
replacing, by the media production platform, the selected portion of the master transcript with the third transcript (Col 1 ll 66 to Col 2 ll 9: When the system receives the transcribed modified audio recording [master transcript] and the transcribed audio clips back from the separate transcription services, the system combines the two by aligning the word-level timestamps of the transcribed audio clips with the word-level timestamps of the transcribed modified audio recording and inserting the transcriptions of the audio clips into the appropriate locations of the transcription of the modified audio recording so that the combination creates a complete transcription of the original audio recording while enabling the user to protect the privacy of any confidential information during the transcription process.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson and Vozila in view of Han to receive, by the media production platform from the third transcription service, a third transcript for the portion of the audio file in order to create a complete transcription of an audio recording from separately transcribed words, thus enabling a company to protect the privacy of confidential information (Col 1 ll 32-35, Han).

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Thomson, Vozila and Han, as applied to claim 17, in further view of Nelson (US 20190108834 A1).

With respect to claim 19, Thomson, Vozila and Han do not explicitly disclose, but Nelson teaches, further comprising: receiving, by the media production platform, second input indicative of a selection of the third transcription service (Nelson: [0274] The selected translation/transcription services are selected in a manner to provide the most accurate results for particular text or audio data, and the translation/transcription services that are selected may be different for each set of text and audio data. [0275] FIG. 12 depicts example data that may be included in selection data 1152. In the examples depicted in FIG. 12, the labels “S1”, “S2”, and “S3” refer to three different translation/transcription services, such as translation/transcription services 1170, 1180, 1190, but data may be provided for any number of translation/transcription services [the input selection is shown in table]).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson, Vozila and Han in view of Nelson to receive, by the media production platform, second input indicative of a selection of the third transcription service in order to reduce the amount of computational resources and/or time required to perform the translation/transcription ([0291], Nelson).

With respect to claim 20, Thomson, Vozila and Han do not explicitly disclose, but Nelson teaches, wherein the third transcription service is automatically identified by the media production platform responsive to receiving the input (Nelson: [0287] For example, translation/transcription services may be selected randomly [automatically]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Thomson, Vozila and Han in view of Nelson such that the third transcription service is automatically identified by the media production platform responsive to receiving the input in order to reduce the amount of computational resources and/or time required to perform the translation/transcription ([0291], Nelson).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached on Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657     

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657