DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to claim(s) 1-7, 9, 17-22 and 25-26 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The limitation “based on a correlation between first words of the first transcript and second words of a second transcript of the audio portion, determining an updated second transcript that comprises timing information synchronizing the second words, of the second transcript of the audio portion, with the media content” in amended claims 1 and 17 introduces new subject matter that required further search and consideration. After further search, new prior art (Phillips) was found and is applied in this Office action.
	Applicant's arguments filed 10/01/2021 regarding the rejection of claims 10, 12-16 and 23-24 have been fully considered but they are not persuasive. 	
Regarding claim 10, applicant argues that Dow makes no determinations about time codes and fails to teach or suggest “determining, based on a first time code associated with the first phrase in the first transcript, a second time code associated with a second phrase, in the .

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-7, 9, 17 and 26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Phillips (US PG Pub 20110239119).

	As per claim 1, Phillips discloses:
	A method comprising:
	determining a first transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content (Phillips; Fig. 1, item 102; p. 0016 - The method uses as inputs the original captured time-based media (video and audio, or audio only), a script or transcript, and the dialog audio track of the final program media after completion of the editing. The methods described herein can use as text input any textual script corresponding to the time-based media for which a spot dialog master is being created, whether it is a pre-production script, a transcript of a non-final version of the media, or a transcript of the version of the media that is used to edit together the final program);
	based on a correlation between first words of the first transcript and second words of a second transcript of the audio portion, determining an updated second transcript that comprises timing information synchronizing the second words, of the second transcript of the audio portion, with the media content (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media); The operator reviews (step 114) such portions and edits the text entries as needed to convert the script into a final, word-accurate transcript (116) that is used for the spot dialog master, and that corresponds accurately to the final edited media).

	As per claim 2, Phillips discloses:
	The method of claim 1, wherein the first words are associated with a first plurality of phonetic elements, and wherein the second words are associated with a second plurality of phonetic elements (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media).

	As per claim 3, Phillips discloses:
	The method of claim 1, wherein the first transcript further comprises location information associated with a sound occurrence, wherein determining the first transcript In the described embodiment, the spot dialog editor outputs an XML document, in which timing information is obtained from the phoneme-based indexing of the media dialog audio track matched with the phonemes derived from the transcript, the character is obtained from a character tag placed in the XML document, and location and scene information are obtained if desired from their corresponding tagged entries).

	As per claim 5, Phillips discloses:
	The method of claim 1, wherein the first transcript further comprises a sentiment associated with the audio portion, a speaker identification associated with the audio portion, a volume associated with the audio portion, or object information associated with the audio portion (Phillips; p. 0024 - The rapid and semi-automatic generation of an original dialog master list using audio indexing methods also enables the automation and streamlining of various processes and deliverables further downstream in the media creation and distribution process. This is facilitated by including additional metadata into an augmented dialog master document. Such metadata can include video related information including, but not limited to, frame rate, shoot date, camera roll, scene, take, sound roll, production timecode, HD tape name and timecode, SD tapename and sound code, pulldown, KeyKode.RTM., pan and scan parameters, character identification, and user-defined metadata such as comments and descriptions; also see p. 0015), and wherein determining the caption data further comprises modifying, based on the metadata, one or more of formatting associated with the caption data or text associated with the caption data (Phillips; p. 0015 - A spot dialog editing tool and interface then facilitates rapid final editing of the script, and automatically adds the timing and the name of the speaker, as well as other relevant metadata from the program master).

	As per claim 6, Phillips discloses the method of claim 1, wherein determining the updated second transcript comprises at least one of: receiving a second transcript, generating, based on audio received via a low-latency transmission path, the second transcript, or generating, based on a plurality of transcriber outputs, the second transcript (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript). Final program dialog audio track 106 is now passed through a time alignment process, such as one involving synchronization of a phonetically indexed dialog audio track of the media with a script or transcript (timing information synchronizing…). In the described embodiment, audio processing step (108) identifies phonemes in the edited media).

	As per claim 7, Phillips discloses:
	The method of claim 2, wherein the correlation between the first words of the first transcript and the second words of the second transcript comprises:
	a determination of timing information associated with one or more of the first words of the first transcript (Phillips; Fig. 1, item 102; p. 0016 - The method uses as inputs the original captured time-based media (video and audio, or audio only), a script or transcript, and the dialog audio track of the final program media after completion of the editing. The methods described herein can use as text input any textual script corresponding to the time-based media for which a spot dialog master is being created, whether it is a pre-production script, a transcript of a non-final version of the media, or a transcript of the version of the media that is used to edit together the final program); and
	an association of one or more of the second words of the second transcript with the timing information based on phonetic elements of the one or more of the first words matching phonetic elements of the one or more of the second words (Phillips; Fig. 1, item 108; p. 0017 - The phoneme sequences are matched up, where found, with phonemes that correspond to the original script or a transcript 110, each match being assigned a confidence level based on the quality of the match. The confidence level assigned to each word or phrase of the script or transcript reflects a degree of confidence that the word or phrase has been correctly identified in the dialog audio track of the media, and that there are no discrepancies between the word or phrase from the script or transcript and the audio speech identified in the media. Timing information from final program dialog audio track 106 is added at the points in the script or transcript for which phoneme matching is available. The timing information may be added at a syllable, word, phrase, sentence, or even at a paragraph level of granularity, depending on the level of accuracy desired in the dialog master).
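
For illustration only, the kind of association recited in claim 7 (carrying timing information from first words to second words whose phonetic elements match) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Phillips or of the application: `phonetic_key` is a hypothetical, deliberately crude stand-in for a phonetic dictionary, `transfer_timing` is a hypothetical helper name, and the alignment uses Python's standard `difflib.SequenceMatcher` rather than any technique named in the cited art.

```python
from difflib import SequenceMatcher


def phonetic_key(word):
    """Hypothetical, crude phonetic key: keep the first letter, drop later
    vowels and h/w/y, and collapse repeated letters. A real system would
    use a phonetic dictionary instead (assumption)."""
    letters = "".join(ch for ch in word.lower() if ch.isalpha())
    if not letters:
        return ""
    out = [letters[0]]
    for ch in letters[1:]:
        if ch in "aeiouyhw":
            continue
        if ch != out[-1]:
            out.append(ch)
    return "".join(out)


def transfer_timing(first, second_words):
    """first: list of (word, seconds) pairs from the timed transcript.
    second_words: list of words from the untimed transcript.
    Returns a list of (word, seconds or None) for the second transcript."""
    keys_a = [phonetic_key(w) for w, _ in first]
    keys_b = [phonetic_key(w) for w in second_words]
    timed = [(w, None) for w in second_words]
    matcher = SequenceMatcher(a=keys_a, b=keys_b, autojunk=False)
    # Copy time codes across every run of phonetically matching words.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            word = second_words[block.b + k]
            timed[block.b + k] = (word, first[block.a + k][1])
    return timed
```

In this sketch, "their" and "there" share a key, so the second word inherits the first word's time code; a word found only in the second transcript is left without a time code.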

	As per claim 9, Phillips discloses the method of claim 1, wherein the second transcript comprises one of a computer-generated transcript of the audio portion or a human-generated transcript of the audio portion (Phillips; Fig. 1, item 108; p. 0017 - In the post-production phase, media editing 104 results in possible reordering and elimination of portions of the media that include dialog, or the introduction of additional media, thus introducing a second source of variation between the original script and the dialog corresponding to the final program (second words of a second transcript)).

	As per claim 17, Phillips discloses:
	A system similar to the method of claim 1 and further comprising:
	a sending device (Phillips; p. 0032 - The components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers. The data produced by these components may be stored in a memory system or transmitted between computer systems (sending and receiving devices)) comprising:
	one or more first processors (Phillips; p. 0032 - The computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network); and
	first memory storing first instructions that, when executed by the one or more first processors (Phillips; p. 0032 - The various elements of the system, either individually or in combination may be implemented as one or more computer program products in which computer program instructions are stored on a computer readable medium for execution by a computer), cause the sending device to… (see claim 1);
	a receiving device (Phillips; p. 0032 - The components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers. The data produced by these components may be stored in a memory system or transmitted between computer systems (sending and receiving devices)) comprising:
	one or more second processors (Phillips; p. 0032 - The computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network); and
	second memory storing second instructions that, when executed by the one or more second processors (Phillips; p. 0032 - The various elements of the system, either individually or in combination may be implemented as one or more computer program products in which computer program instructions are stored on a computer readable medium for execution by a computer), cause the receiving device to… (see claim 1).

	As per claim 26, Phillips discloses:	The method of claim 1, further comprising sending the media content and the caption data (Phillips; p. 0032 - The data produced by these components may be stored in a memory system or transmitted between computer systems).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10, 12-14 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Miller (US PG Pub 20180358052) in view of Dow (US PG Pub 20160372107).

	As per claim 10, Miller discloses a method comprising:
	determining, based on the second phrase, caption data associated with the portion of the media content (Miller; Fig. 17, item 1722; p. 0310 - In act 1722, the computer system generates a new media file, and the process ends. The new media file may include the audio description data synchronized with the video data according to the time index); and
	sending the media content and the caption data (Miller; p. 0018 - render, via the display, text from portions of the transcription data in synchrony with the one or more images; receive input identify at least one point within the time index (transmit to display); receive input specifying audio description data to associate with the at least one point; store, in the memory, the audio description data; and store, in the memory, an association between the audio description data and the at least one point (transmit to memory); transmit over a network such as in Fig. 1, item 116 & p. 0047).
	Miller, however, fails to disclose determining a first phrase that is present in both a first transcript and a second transcript of an audio portion of media content, wherein the first transcript comprises timing information synchronizing first words of the first transcript with the media content; and determining, based on a first time code associated with the first phrase in the first transcript, a second time code associated with a second phrase, in the second transcript that is absent from the first transcript.
	Dow does teach determining a first phrase that is present in both a first transcript and a .
	As per claim 12, Miller in view of Dow discloses the method of claim 10, upon which claim 12 depends.
	Further, Dow teaches correlating, based on a determination that a quantity of exact matches between first elements of the first transcript and second elements of a second transcript satisfy a threshold, the first transcript with the second transcript (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of an audio recording (e.g., transcriptions 142, 144, and 146), which were created from the same event, are available (first and second transcripts). In some examples, program 200 may identify the number of words in a transcript to determine if multiple transcripts are within a similarity threshold for the same number of words, such as 99% (matching elements in first and second transcripts)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Miller to include correlating, based on a determination that a quantity of exact matches between first elements of the first transcript and second elements of a second transcript satisfy a threshold, the first transcript with the second transcript, as taught by Dow, because compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for closed captions (Dow; p. 0003).
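
For illustration only, the threshold test recited in claim 12 (correlating two transcripts when the quantity of exact matches between their elements satisfies a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the program of Dow or the claimed method: the helper names, the use of `difflib.SequenceMatcher` to count in-order exact matches, and the default ratio of 0.8 are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def exact_match_count(first_elems, second_elems):
    """Count elements that match exactly, in order, between two transcripts."""
    matcher = SequenceMatcher(a=first_elems, b=second_elems, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def should_correlate(first_elems, second_elems, min_ratio=0.8):
    """Return True when the share of exact matches meets the threshold.
    min_ratio is an assumed value, not one taken from the record."""
    if not first_elems or not second_elems:
        return False
    matches = exact_match_count(first_elems, second_elems)
    longest = max(len(first_elems), len(second_elems))
    return matches / longest >= min_ratio
```

Normalizing by the longer transcript is one design choice among several; a count-based threshold (an absolute number of matches) would fit the claim language equally well.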


	As per claim 14, Miller in view of Dow discloses the method of claims 1 and 10, wherein the first elements comprise a first plurality of phonetic elements, and wherein the second elements comprise a second plurality of phonetic elements (Miller; p. 0129 - the description engine 138 is configured to convert the entire audio description text into a phoneme sequence using a phonetic dictionary).

	As per claim 23, Miller in view of Dow discloses the method of claim 10, upon which claim 23 depends.
	Further, Dow teaches wherein determining the second time code comprises interpolating, based on a determination that the second phrase is between the first phrase and a third phrase, the second time code (Dow; p. 0028 - In one example, two transcripts may have different words or characters corresponding to the same time in the recordings and surrounded by the same words in each recording).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the 
	As per claim 24, Miller in view of Dow discloses the method of claim 10, wherein determining the second time code comprises determining that the first phrase is adjacent to the second phrase in the second transcript (Miller; p. 0013 - the at least one processor may be further configured to render additional text from additional portions of the transcription data adjacent to the portions of the transcription data).
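
For illustration only, the interpolation recited in claim 23 (deriving a time code for a second phrase that lies between a first phrase and a third phrase having known time codes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Miller, Dow, or the application: the function name is hypothetical, time codes are assumed to be seconds, and untimed phrases are assumed to be spaced evenly between their timed neighbors.

```python
def interpolate_time_codes(timed):
    """timed: list of (phrase, seconds or None) pairs in transcript order.
    Fills each run of None values by linear interpolation between the
    nearest timed phrase before it and the nearest timed phrase after it.
    Runs at the very start or end have only one anchor and stay None."""
    result = list(timed)
    n = len(result)
    i = 0
    while i < n:
        if result[i][1] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i][1] is None:
            i += 1
        end = i  # index of the first timed phrase after the gap, or n
        if start == 0 or end == n:
            continue
        t0 = result[start - 1][1]
        t1 = result[end][1]
        steps = end - start + 1
        for k in range(start, end):
            frac = (k - start + 1) / steps
            result[k] = (result[k][0], t0 + (t1 - t0) * frac)
    return result
```

Even spacing is the simplest assumption; weighting by phrase length or syllable count would be an alternative when phrases differ widely in duration.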

Claims 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Miller.

As per claim 4, Phillips discloses the method of claim 1, upon which claim 4 depends. 	Phillips, however, fails to disclose wherein determining the caption data further comprises determining metadata indicating a first location for displaying, within the media content, a first caption of the caption data.	Miller does teach wherein determining the caption data further comprises determining metadata indicating a first location for displaying, within the media content, a first caption of the caption data (Miller; p. 0148 - the customer interface 124 is configured to present the .

Claims 21, 22 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Dow.
	As per claim 21, Phillips discloses the method of claim 1, including correlating the first words of the first transcript with the second words of the second transcript of the audio portion.
	Phillips, however, fails to disclose determining an extra word of the second transcript that is not contained in the first transcript; determining an overlapping word that is contained in the second transcript and the first transcript, wherein the overlapping word is adjacent to the second words; and associating, based on a first time code of the overlapping word, the extra word with a second time code of the audio portion.
	Dow does teach determining an extra word of the second transcript that is not contained in the first transcript; determining an overlapping word that is contained in the second transcript and the first transcript, wherein the overlapping word is adjacent to the second words; and associating, based on a first time code of the overlapping word, the extra word with a second time code of the audio portion (Dow; p. 0013 - merging program 120 generates confidence levels for one or more transcriptions (e.g., transcriptions 142, 144, and 146) of the same event and determines differences in the transcriptions of the same event; p. 
	As per claim 22, Phillips discloses the method of claim 1, upon which claim 22 depends.
Phillips, however, fails to disclose wherein the correlating is further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold.	Dow does teach wherein the correlating is further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of 
	As per claim 25, Phillips discloses the system of claim 17, upon which claim 25 depends.
	Phillips, however, fails to disclose wherein the first instructions, when executed, cause the sending device to correlate the first words and the second words further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold.
	Dow teaches wherein the first instructions, when executed, cause the sending device to correlate the first words and the second words further based on a determination that an average similarity score between non-matching elements of the first and second elements satisfies a second threshold (Dow; p. 0026 - an audio recording of an event may contain multiple recordings in one file. In some embodiments, program 200 determines if transcripts of .
	Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Miller in view of Dow and further in view of Matthews (US PG Pub 20170040017).
	As per claim 15, Miller in view of Dow discloses the method of claim 14, further comprising: converting the second words of the second transcript to a second plurality of phonemes.
	Miller in view of Dow, however, fails to disclose wherein correlating the first words of the first transcript with the second words of the second transcript comprises comparing the first plurality of phonemes to the second plurality of phonemes.
	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Miller in view of Dow and further in view of Karthikeyan (US PG Pub 20180198732).
	As per claim 16, Miller in view of Dow discloses the method of claim 10, upon which claim 16 depends. 	Miller in view of Dow, however, fails to disclose generating the second transcript based on audio received via a low-latency transmission path.	Karthikeyan does teach generating the second transcript based on audio received via a .
	Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Phillips in view of Scott (US PG Pub 20140304019).

As per claim 19, Phillips discloses the system of claim 17, upon which claim 19 depends.	Phillips, however, fails to teach wherein, to display the media content, the second instructions further cause the receiving device to: display a portion of the media content corresponding to an interactive field of view; determine that a first location associated with a first caption is outside the interactive field of view; and display an overlay indicating that the first location is outside the interactive field of view.	Scott does teach wherein, to display the media content, the second instructions further cause the receiving device to: display a portion of the media content corresponding to an interactive field of view; determine that a first location associated with a first caption is outside the interactive field of view; and display an overlay indicating that the first location is outside the interactive field of view (Scott; p. 0077 - In some embodiments, an interface is provided for 
	As per claim 20, Phillips discloses the system of claim 17, upon which claim 20 depends.
	Phillips, however, fails to disclose wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view.
	Scott does teach wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view (Scott; p. 0077 - In some embodiments, an interface is provided for the user to modify date/time /location stamping of media items, Audio Captions, Text Captions, and/or 
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system of Phillips to include wherein the second instructions, when executed, cause the receiving device to display the media content by displaying an overlay that comprises text of a first caption and an indication that a location of the first caption is outside an interactive field of view, as taught by Scott, in order to give the user a chance to correct mistakes that may happen in the rendering (Scott; p. 0077).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892. 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658