DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/06/2022 has been entered.

Response to Arguments
Applicant's arguments filed 04/06/2022 regarding the rejection of claims 1-7, 9-10 and 12-26 have been fully considered but they are not persuasive. 	

Regarding claim 1, applicant argues that Phillips fails to teach “determining a first transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content; and based on a correlation between first words of the first transcript and second words of a second transcript of the audio portion, determining an updated second transcript that comprises timing information synchronizing the second words, of the second transcript of the audio portion, with the media content”, as recited in claim 1.
Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that Phillips in p. 0017 shows how the media content can be edited and the new audio track can be further synchronized with a script or transcript. This shows the “second transcript” as claimed. The examiner contends that if the media is edited in a way in which spoken words are changed or eliminated then the script or transcript must be different from the original version. This editing is shown, for example, in Fig. 3 and p. 0020 of Phillips, where he describes that FIG. 3 shows the manual edits performed by the user after listening to the corresponding media. Three of the displayed phrases, 302, 304, and 308, required correction, while the fourth 306, which was matched with high confidence, required no changes. The spot dialog editor can be adjusted to limit the display of script text having a match confidence level below a certain threshold, with the threshold being adjustable by the user. In some embodiments, the spot dialog editor defaults to a mid-level confidence value for the threshold that has been empirically determined to cause the spot dialog editor to display only the script portions that depart enough from the final media program to require an edit. This can correspond to any discrepancy, even a very minor one, that may require a script change, no matter how small. The recitation of a script change denotes that there is a second version of the original script, which would correspond to the claimed “second transcript.” The examiner contends that the “augmented/updated audio track”, as argued in Pg. 3 of Remarks, in some cases, such as the cases described previously in Fig. 3, require that a new script be presented in order to accurately synchronize the script with the “augmented/updated audio track”. Thus, the examiner contends that Phillips fully discloses the limitations as recited in claim 1.

Regarding claim 10, applicant argues that Dow fails to teach or suggest “updating, based on a first time code associated with the first phrase in the first transcript, the second transcript to associate a second time code with a second phrase, in the second transcript, that is absent from the first transcript.”
Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that Dow does teach updating of the transcription in p. 0030-0031. P. 0030 recites: If program 200 determines that differences in the one or more transcriptions exists (yes branch, decision 210), then program 200 identifies information regarding the source of the one or more recordings (step 212). In various embodiments, program 200 may find multiple differences in the one or more similar recordings. In one example, program 200 identifies the first word in transcription 142 as “Hello,” but transcriptions 144 and 146 both have the first word as “Hi.” In this example, program 200 identifies information regarding the recordings related to transcriptions 142, 144, and 146 which include signal strength, volume, voice pattern recognition, white noise detection, etc. In the example, the information may have been previously stored in database 140 by a user of merging program 120, stored as metadata associated with the creator of the recording, or created by merging program 120. In some embodiments, transcriptions may also be associated with a corresponding confidence level. In these embodiments, the confidence levels may be created by weighing factors, such as signal strength, volume, voice pattern recognition, white noise detection, historical accuracy of the recording device, historical accuracy of the transcription device, GSPP, etc. The cited paragraph of Dow shows the detection of differences between one or more transcriptions, and, in p. 0031, if there are differences, then a new transcript is created based upon the differences in the one or more transcripts (e.g., transcriptions 142, 144, and 146) and the confidence level for the one or more transcripts. Therefore, Miller in view of Dow does teach the claim limitations of claim 10.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-7, 9, 17 and 26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Phillips (US PG Pub 20110239119).

As per claim 1, Phillips discloses:
A method comprising:
determining a first transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content (Phillips; Fig. 1, item 102; p. 0016 - The method uses as inputs the original captured time-based media (video and audio, or audio only), a script or transcript, and the dialog audio track of the final program media after completion of the editing. The methods described herein can use as text input any textual script corresponding to the time-based media for which a spot dialog master is being created, whether it is a pre-production script, a transcript of a non-final version of the media, or a transcript of the version of the media that is used to edit together the final program);
based on a correlation between first words of the first transcript and second words of a second transcript of the audio portion, determining an updated second transcript that comprises timing information synchronizing the second words, of the second transcript of the audio portion, with the media content (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media); and
determining, based on the updated second transcript, caption data associated with the media content (Phillips; Fig. 1, item 116; p. 0018 - The operator reviews (step 114) such portions and edits the text entries as needed to convert the script into a final, word-accurate transcript (116) that is used for the spot dialog master, and that corresponds accurately to the final edited media).

As per claim 2, Phillips discloses:
The method of claim 1, wherein the first words are associated with a first plurality of phonetic elements, and wherein the second words are associated with a second plurality of phonetic elements (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media).

As per claim 3, Phillips discloses:
The method of claim 1, wherein the first transcript further comprises location information associated with a sound occurrence, wherein determining the first transcript comprises determining, based on analyzing the audio portion, the location information (Phillips; p. 0023 - In the described embodiment, the spot dialog editor outputs an XML document, in which timing information is obtained from the phoneme-based indexing of the media dialog audio track matched with the phonemes derived from the transcript, the character is obtained from a character tag placed in the XML document, and location and scene information are obtained if desired from their corresponding tagged entries).

As per claim 5, Phillips discloses:
The method of claim 1, wherein the first transcript further comprises a sentiment associated with the audio portion, a speaker identification associated with the audio portion, a volume associated with the audio portion, or object information associated with the audio portion (Phillips; p. 0024 - The rapid and semi-automatic generation of an original dialog master list using audio indexing methods also enables the automation and streamlining of various processes and deliverables further downstream in the media creation and distribution process. This is facilitated by including additional metadata into an augmented dialog master document. Such metadata can include video related information including, but not limited to, frame rate, shoot date, camera roll, scene, take, sound roll, production timecode, HD tape name and timecode, SD tapename and sound code, pulldown, KeyKode.RTM., pan and scan parameters, character identification, and user-defined metadata such as comments and descriptions; also see p. 0015), and wherein determining the caption data further comprises modifying, based on the metadata, one or more of formatting associated with the caption data or text associated with the caption data (Phillips; p. 0015 - A spot dialog editing tool and interface then facilitates rapid final editing of the script, and automatically adds the timing and the name of the speaker, as well as other relevant metadata from the program master).

As per claim 6, Phillips discloses the method of claim 1, wherein determining the updated second transcript comprises at least one of: receiving a second transcript, generating, based on audio received via a low-latency transmission path, the second transcript, or generating, based on a plurality of transcriber outputs, the second transcript (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media).

As per claim 7, Phillips discloses:
The method of claim 2, wherein the correlation between the first words of the first transcript and the second words of the second transcript comprises:
a determination of timing information associated with one or more of the first words of the first transcript (Phillips; Fig. 1, item 102; p. 0016 - The method uses as inputs the original captured time-based media (video and audio, or audio only), a script or transcript, and the dialog audio track of the final program media after completion of the editing. The methods described herein can use as text input any textual script corresponding to the time-based media for which a spot dialog master is being created, whether it is a pre-production script, a transcript of a non-final version of the media, or a transcript of the version of the media that is used to edit together the final program); and
an association of one or more of the second words of the second transcript with the timing information based on phonetic elements of the one or more of the first words matching phonetic elements of the one or more of the second words (Phillips; Fig. 1, item 108; p. 0017 - The phoneme sequences are matched up, where found, with phonemes that correspond to the original script or a transcript 110, each match being assigned a confidence level based on the quality of the match. The confidence level assigned to each word or phrase of the script or transcript reflects a degree of confidence that the word or phrase has been correctly identified in the dialog audio track of the media, and that there are no discrepancies between the word or phrase from the script or transcript and the audio speech identified in the media. Timing information from final program dialog audio track 106 is added at the points in the script or transcript for which phoneme matching is available. The timing information may be added at a syllable, word, phrase, sentence, or even at a paragraph level of granularity, depending on the level of accuracy desired in the dialog master).

As per claim 9, Phillips discloses the method of claim 1, wherein the second transcript comprises one of a computer-generated transcript of the audio portion or a human-generated transcript of the audio portion (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript)).

As per claim 17, Phillips discloses:
A system similar to the method of claim 1 and further comprising:
a sending device (Phillips; p. 0032 - The components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers. The data produced by these components may be stored in a memory system or transmitted between computer systems (sending and receiving devices)) comprising:
one or more first processors (Phillips; p. 0032 - The computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network); and
first memory storing first instructions that, when executed by the one or more first processors (Phillips; p. 0032 - The various elements of the system, either individually or in combination may be implemented as one or more computer program products in which computer program instructions are stored on a computer readable medium for execution by a computer), cause the sending device to… (see claim 1);
a receiving device (Phillips; p. 0032 - The components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers. The data produced by these components may be stored in a memory system or transmitted between computer systems (sending and receiving devices)) comprising:
one or more second processors (Phillips; p. 0032 - The computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network); and
second memory storing second instructions that, when executed by the one or more second processors (Phillips; p. 0032 - The various elements of the system, either individually or in combination may be implemented as one or more computer program products in which computer program instructions are stored on a computer readable medium for execution by a computer), cause the receiving device to… (see claim 1).

As per claim 26, Phillips discloses:
The method of claim 1, further comprising sending the media content and the caption data (Phillips; p. 0032 - The data produced by these components may be stored in a memory system or transmitted between computer systems).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10, 12-14 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Miller (US PG Pub 20180358052) in view of Dow (US PG Pub 20160372107).

As per claim 10, Miller discloses a method comprising:
determining, based on the second phrase, caption data associated with the portion of the media content (Miller; Fig. 17, item 1722; p. 0310 - In act 1722, the computer system generates a new media file, and the process ends. The new media file may include the audio description data synchronized with the video data according to the time index); and
sending the media content and the caption data (Miller; p. 0018 - render, via the display, text from portions of the transcription data in synchrony with the one or more images; receive input identify at least one point within the time index (transmit to display); receive input specifying audio description data to associate with the at least one point; store, in the memory, the audio description data; and store, in the memory, an association between the audio description data and the at least one point (transmit to memory); transmit over a network such as in Fig. 1, item 116 & p. 0047).
Miller, however, fails to disclose determining a first phrase that is present in both a first transcript and a second transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content; and updating, based on a first time code associated with the first phrase in the first transcript, the second transcript to associate a second time code with a second phrase, in the second transcript, that is absent from the first transcript.
Dow does teach determining a first phrase that is present in both a first transcript and a second transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content (Dow; p. 0013 - merging program 120 generates confidence levels for one or more transcriptions (e.g., transcriptions 142, 144, and 146) of the same event and determines differences in the transcriptions of the same event; p. 0025 - transcriptions (e.g., transcriptions 142, 144, and 146) may also be time annotated based on a universally coordinated clock, such as coordinated universal time (UTC) or some other synchronization scheme for relative timekeeping. In some embodiments, some file formats (e.g., extensible markup language (XML)) may represent metadata and textual transcription within a single file instance for processing convenience); and updating, based on a first time code associated with the first phrase in the first transcript, the second transcript to associate a second time code with a second phrase, in the second transcript, that is absent from the first transcript (Dow; p. 0028 - In one example, two transcripts may have different words or characters corresponding to the same time in the recordings and surrounded by the same words in each recording; p. 0030-0031 - If program 200 determines that differences in the one or more transcriptions exists (yes branch, decision 210), then program 200 identifies information regarding the source of the one or more recordings (step 212). In various embodiments, program 200 may find multiple differences in the one or more similar recordings. In one example, program 200 identifies the first word in transcription 142 as “Hello,” but transcriptions 144 and 146 both have the first word as “Hi.” In this example, program 200 identifies information regarding the recordings related to transcriptions 142, 144, and 146 which include signal strength, volume, voice pattern recognition, white noise detection, etc. In the example, the information may have been previously stored in database 140 by a user of merging program 120, stored as metadata associated with the creator of the recording, or created by merging program 120. In some embodiments, transcriptions may also be associated with a corresponding confidence level. In these embodiments, the confidence levels may be created by weighing factors, such as signal strength, volume, voice pattern recognition, white noise detection, historical accuracy of the recording device, historical accuracy of the transcription device, GSPP, etc.).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller to include determining a first phrase that is present in both a first transcript and a second transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content; and updating, based on a first time code associated with the first phrase in the first transcript, the second transcript to associate a second time code with a second phrase, in the second transcript, that is absent from the first transcript, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).

As per claim 12, Miller in view of Dow discloses the method of claim 10, upon which claim 12 depends.
And further, Dow does teach correlating, based on a determination that a quantity of exact matches between first elements of the first transcript and second elements of a second transcript satisfy a threshold, the first transcript with the second transcript (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of an audio recording (e.g., transcriptions 142, 144, and 146), which were created from the same event, are available (first and second transcripts). In some examples, program 200 may identify the number of words in a transcript to determine if multiple transcripts are within a similarity threshold for the same number of words, such as 99% (matching elements in first and second transcripts)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller to include correlating, based on a determination that a quantity of exact matches between first elements of the first transcript and second elements of a second transcript satisfy a threshold, the first transcript with the second transcript, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).
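
For illustration only, the kind of threshold test on exact matches recited in claim 12 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not taken from Dow, Miller, or the application: the function names, the use of the standard-library difflib.SequenceMatcher to count exactly matching words, the normalization by the longer transcript, and the 0.8 default threshold are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def count_exact_matches(first_words, second_words):
    """Count words that match exactly, in order, between two word lists."""
    matcher = SequenceMatcher(a=first_words, b=second_words, autojunk=False)
    # Each matching block is a run of identical words; the final block has size 0.
    return sum(block.size for block in matcher.get_matching_blocks())


def transcripts_correlate(first_words, second_words, threshold=0.8):
    """Return True when the share of exact matches meets the threshold.

    The threshold value and the normalization by the longer transcript
    are assumptions made for this sketch only.
    """
    if not first_words or not second_words:
        return False
    matches = count_exact_matches(first_words, second_words)
    return matches / max(len(first_words), len(second_words)) >= threshold


first = "hello and welcome to the show".split()
second = "hi and welcome to the show".split()
print(count_exact_matches(first, second))    # 5 of 6 words match exactly
print(transcripts_correlate(first, second))  # 5/6 is above 0.8
```

A count-based threshold (for example, a minimum number of matching words) would fit the claim language equally well; the ratio form is used here only because the cited passage of Dow speaks of a percentage.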

As per claim 13, Miller in view of Dow discloses the method of claims 1 and 10, wherein the first transcript further comprises location information, wherein generating the first transcript comprises generating, based on analyzing the audio portion, the location information (Miller; p. 0148 - the customer interface 124 is configured to present the screen 900 with a supplemental view of the transcript (in its current state) that displays the locations and durations of the describable (given the current threshold configurations) regions, prior to ordering).
As per claim 14, Miller in view of Dow discloses the method of claims 1 and 10, wherein the first elements comprise a first plurality of phonetic elements, and wherein the second elements comprise a second plurality of phonetic elements (Miller; p. 0129 - the description engine 138 is configured to convert the entire audio description text into a phoneme sequence using a phonetic dictionary).

As per claim 23, Miller in view of Dow discloses the method of claim 10, upon which claim 23 depends.
And further, Dow teaches wherein determining the second time code comprises interpolating, based on a determination that the second phrase is between the first phrase and a third phrase, the second time code (Dow; p. 0028 - In one example, two transcripts may have different words or characters corresponding to the same time in the recordings and surrounded by the same words in each recording).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller to include wherein determining the second time code comprises interpolating, based on a determination that the second phrase is between the first phrase and a third phrase, the second time code, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).
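
For illustration only, the interpolation recited in claim 23 can be sketched as follows. This is a minimal sketch, not a characterization of Dow, Miller, or the application: it assumes time codes are plain seconds, that phrases shared with the first transcript are looked up by exact text, and that the interpolation is linear in phrase position. The function name fill_time_codes and the sample phrases are hypothetical.

```python
def fill_time_codes(second_phrases, first_time_codes):
    """Assign time codes to the phrases of a second transcript.

    second_phrases: phrases of the second transcript, in order.
    first_time_codes: dict mapping a phrase found in the first transcript
        to its time code in seconds (assumed representation).
    Phrases absent from the first transcript receive a time code linearly
    interpolated between the nearest timed phrases on either side; phrases
    with no timed phrase on one side are left as None.
    """
    known = [first_time_codes.get(phrase) for phrase in second_phrases]
    result = list(known)
    for i, time_code in enumerate(known):
        if time_code is not None:
            continue
        # Nearest preceding and following phrases that carry a time code.
        prev_i = next((j for j in range(i - 1, -1, -1) if known[j] is not None), None)
        next_i = next((j for j in range(i + 1, len(known)) if known[j] is not None), None)
        if prev_i is None or next_i is None:
            continue
        fraction = (i - prev_i) / (next_i - prev_i)
        result[i] = known[prev_i] + fraction * (known[next_i] - known[prev_i])
    return result


second_phrases = ["good evening", "and welcome", "to the show"]
first_time_codes = {"good evening": 10.0, "to the show": 14.0}
print(fill_time_codes(second_phrases, first_time_codes))  # [10.0, 12.0, 14.0]
```

Here "and welcome" is absent from the first transcript, lies between two phrases that carry time codes, and so receives the midpoint value.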
As per claim 24, Miller in view of Dow discloses the method of claim 10, wherein determining the second time code comprises determining that the first phrase is adjacent to the second phrase in the second transcript (Miller; p. 0013 - the at least one processor may be further configured to render additional text from additional portions of the transcription data adjacent to the portions of the transcription data).

Claims 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Miller.

As per claim 4, Phillips discloses the method of claim 1, upon which claim 4 depends.
Phillips, however, fails to disclose wherein determining the caption data further comprises determining metadata indicating a first location for displaying, within the media content, a first caption of the caption data.
Miller does teach wherein determining the caption data further comprises determining metadata indicating a first location for displaying, within the media content, a first caption of the caption data (Miller; p. 0148 - the customer interface 124 is configured to present the screen 900 with a supplemental view of the transcript (in its current state) that displays the locations and durations of the describable (given the current threshold configurations) regions, prior to ordering).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include wherein determining the caption data further comprises determining metadata indicating a first location for displaying, within the media content, a first caption of the caption data, as taught by Miller, in order to increase productivity of audio description professionals via the inventive use of particular arrangements of user interface elements (Miller; p. 0006).

As per claim 18, Phillips discloses the system of claim 17, upon which claim 18 depends.
Phillips, however, fails to disclose wherein the second instructions further cause the receiving device to display the captions of the caption data at corresponding locations, within the media content, indicated by the caption data.
Miller does teach wherein the second instructions further cause the receiving device to display the captions of the caption data at corresponding locations, within the media content, indicated by the caption data (Miller; p. 0148 - the customer interface 124 is configured to present the screen 900 with a supplemental view of the transcript (in its current state) that displays the locations and durations of the describable (given the current threshold configurations) regions, prior to ordering).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system of Phillips to include wherein the second instructions further cause the receiving device to display the captions of the caption data at corresponding locations, within the media content, indicated by the caption data, as taught by Miller, in order to increase productivity of audio description professionals via the inventive use of particular arrangements of user interface elements (Miller; p. 0006).

Claims 21, 22 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Dow.
As per claim 21, Phillips discloses the method of claim 1, including correlating the first words of the first transcript with the second words of the second transcript of the audio portion.
Phillips, however, fails to disclose determining an extra word of the second transcript that is not contained in the first transcript; determining an overlapping word that is contained in the second transcript and the first transcript, wherein the overlapping word is adjacent to the second words; and associating, based on a first time code of the overlapping word, the extra word with a second time code of the audio portion.
Dow does teach determining an extra word of the second transcript that is not contained in the first transcript; determining an overlapping word that is contained in the second transcript and the first transcript, wherein the overlapping word is adjacent to the second words; and associating, based on a first time code of the overlapping word, the extra word with a second time code of the audio portion (Dow; p. 0013 - merging program 120 generates confidence levels for one or more transcriptions (e.g., transcriptions 142, 144, and 146) of the same event and determines differences in the transcriptions of the same event; p. 0025 - transcriptions (e.g., transcriptions 142, 144, and 146) may also be time annotated based on a universally coordinated clock, such as coordinated universal time (UTC) or some other synchronization scheme for relative timekeeping. In some embodiments, some file formats (e.g., extensible markup language (XML)) may represent metadata and textual transcription within a single file instance for processing convenience).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include determining an extra word of the second transcript that is not contained in the first transcript; determining an overlapping word that is contained in the second transcript and the first transcript, wherein the overlapping word is adjacent to the second words; and associating, based on a first time code of the overlapping word, the extra word with a second time code of the audio portion, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).

	As per claim 22, Phillips discloses the method of claim 1, upon which claim 22 depends.
Phillips, however, fails to disclose wherein the correlating is further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold.
Dow does teach wherein the correlating is further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of an audio recording (e.g., transcriptions 142, 144, and 146), which were created from the same event, are available (first and second transcripts). In some examples, program 200 may identify the number of words in a transcript to determine if multiple transcripts are within a similarity threshold for the same number of words, such as 99% (matching elements in first and second transcripts)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include wherein the correlating is further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).
	As per claim 25, Phillips discloses the system of claim 17, upon which claim 25 depends.
Phillips, however, fails to disclose wherein the first instructions, when executed, cause the sending device to correlate the first words and the second words further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold.
Dow teaches wherein the first instructions, when executed, cause the sending device to correlate the first words and the second words further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of an audio recording (e.g., transcriptions 142, 144, and 146), which were created from the same event, are available (first and second transcripts). In some examples, program 200 may identify the number of words in a transcript to determine if multiple transcripts are within a similarity threshold for the same number of words, such as 99% (matching elements in first and second transcripts)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include wherein the first instructions, when executed, cause the sending device to correlate the first words and the second words further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).
	Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Miller in view of Dow and further in view of Matthews (US PG Pub 20170040017).
	As per claim 15, Miller in view of Dow discloses the method of claim 14, further comprising: converting the second words of the second transcript to a second plurality of phonemes.
Miller in view of Dow, however, fails to disclose wherein correlating the first words of the first transcript with the second words of the second transcript comprises comparing the first plurality of phonemes to the second plurality of phonemes.
Matthews does teach wherein correlating the first words of the first transcript with the second words of the second transcript comprises comparing the first plurality of phonemes to the second plurality of phonemes (Matthews; p. 0035 - redubbing application 140 compares the ordered phoneme list to the dynamic viseme sequence. In some implementations, redubbing application 140 may compare the suggested alternative phrase by testing the ordered phoneme sequence against the graph of the phonemes corresponding to the dynamic viseme sequence).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller and Dow to include wherein correlating the first words of the first transcript with the second words of the second transcript comprises comparing the first plurality of phonemes to the second plurality of phonemes, as taught by Matthews, in order to redub a suggested alternative phrase that matches the lip movements of the mouth of the speaker in the video corresponding to the dynamic sequence (Matthews; p. 0035).
	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Miller in view of Dow and further in view of Karthikeyan (US PG Pub 20180198732).
	As per claim 16, Miller in view of Dow discloses the method of claim 10, upon which claim 16 depends.
Miller in view of Dow, however, fails to disclose generating the second transcript based on audio received via a low-latency transmission path.
Karthikeyan does teach generating the second transcript based on audio received via a low-latency transmission path (Karthikeyan; p. 0004).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller and Dow to include generating the second transcript based on audio received via a low-latency transmission path, as taught by Karthikeyan, in order to prioritize critical network traffic, such as streaming packets or voice-over-IP, across a network (Karthikeyan; p. 0004).
	Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Scott (US PG Pub 20140304019).

As per claim 19, Phillips discloses the system of claim 17, upon which claim 19 depends.
Phillips, however, fails to teach wherein, to display the media content, the second instructions further cause the receiving device to: display a portion of the media content corresponding to an interactive field of view; determine that a first location associated with a first caption is outside the interactive field of view; and display an overlay indicating that the first location is outside the interactive field of view.
Scott does teach wherein, to display the media content, the second instructions further cause the receiving device to: display a portion of the media content corresponding to an interactive field of view; determine that a first location associated with a first caption is outside the interactive field of view; and display an overlay indicating that the first location is outside the interactive field of view (Scott; p. 0077 - In some embodiments, an interface is provided for the user to modify date/time/location stamping of media items, Audio Captions, Text Captions, and/or Comments, in the event a media item, Audio Caption, Text Caption, or Comment is stored or created significantly out of sequence or in a location not related to the intended location).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include wherein, to display the media content, the second instructions further cause the receiving device to: display a portion of the media content corresponding to an interactive field of view; determine that a first location associated with a first caption is outside the interactive field of view; and display an overlay indicating that the first location is outside the interactive field of view, as taught by Scott, in order to give the user a chance to correct mistakes that may happen in the rendering (Scott; p. 0077).
	As per claim 20, Phillips discloses the system of claim 17, upon which claim 20 depends.
Phillips, however, fails to disclose wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view.
Scott does teach wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view (Scott; p. 0077 - In some embodiments, an interface is provided for the user to modify date/time/location stamping of media items, Audio Captions, Text Captions, and/or Comments, in the event a media item, Audio Caption, Text Caption, or Comment is stored or created significantly out of sequence or in a location not related to the intended location).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Phillips to include wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view, as taught by Scott, in order to give the user a chance to correct mistakes that may happen in the rendering (Scott; p. 0077).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
Gardyne (US PG Pub 20190087870) discloses systems, devices, and processes to create a successful and effective personal video commercial through the use of one or more scripts, timecode commands, storyboarding, teleprompting displays, analyzers directed to static defects, eye contact, facial expression, and audio spoken word defects, automated video splicing, and video content and quality scoring, and feedback of the scoring (Gardyne; Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139.  The examiner can normally be reached on Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/RODRIGO A CHAVEZ/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658