DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action has been issued in response to Applicant’s Communication of application S/N 16/713,825 filed on December 13, 2019. Claims 1 to 13 and 15 to 21 are currently pending in the application.
	
Priority
This application has claimed foreign priority under 35 U.S.C. 119 (a)-(d) or (f), 365(a) or (b), or 386(a), of Application No. 1872971, filed on 12/14/2018. Receipt of certified copies of papers as required by 37 CFR 1.55 is acknowledged.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/13/2019 and 11/01/2021 were filed before the mailing date of the first action on the merits. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 to 13 and 15 to 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitations “each word”, “the other content”, “the contents” and “the type of content” in lines 9, 10, 13, and 14, respectively.  There is insufficient antecedent basis for these limitations in the claim. Same rationale applies to claims 15 and 16 since they recite similar limitations.
Claim 1 recites “offer a navigation of said content” in line 13. It is not clear which content this limitation refers to, since the claim has previously recited “a content”, “a first content”, “a second content”, “other content”, and “contents”. This renders the claim indefinite. Same rationale applies to claims 15 and 16 since they recite similar limitations.
Claim 4 recites the limitation “the status of the graphical element” in line 5.  There is insufficient antecedent basis for this limitation in the claim. Same rationale applies to claim 19 since it recites similar limitations.
Claim 13 recites the limitation “the named entities” in line 5.  There is insufficient antecedent basis for this limitation in the claim.
Claim 20 recites the limitation “the non-transitory computer-readable recording medium according to claim 16” in line 1.  There is insufficient antecedent basis for this limitation in the claim.
Same rationale applies to claims 2, 3, 5 to 12, 17 to 19 and 21, since they inherit the same deficiencies by virtue of their dependency on the previously rejected claims.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 to 11, 13, and 15 to 21 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Larsen et al. (U.S. Publication No. 2012/0236201) hereinafter Larsen.
As to claim 1:
Larsen discloses: 
A method for enabling a spatio-temporal navigation of content, comprising the following acts performed by a device: 
receiving a content request from a client, the content comprising a first content of a first type [Paragraph 0236 teaches search display GUI includes a search field input box where the user can type any word or phrase to perform a search for content, where the search results will be displayed including the content, e.g., video, audio, text, etc.]; and 
transmitting to the client: the first content [Paragraph 0236 teaches the search results will be displayed including the content, e.g., video, audio, text, etc., which the user can select]; 
second content of a second type generated from the first content [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content];  
synchronization metadata associating each word of one of the contents with a time marker to the other content [Paragraph 0020 teaches distinct segments of the video, audio, and text content may each be mapped to a respective timecode of the timecode synchronization information; Paragraph 0038 teaches generating indexes to synchronize ingested, edited, and displayed audio/video files and streams with other text; Paragraph 0045 teaches integration of multimedia files, streams, and features by linking them through time-based indexes]; and 
a script, for execution at the client, configured to re-establish at least one of the contents and offer a navigation of said content depending on the type of content re-established by using at least one among: the first content, the second content, and the synchronization metadata [Paragraph 0020 teaches presentation of the multimedia content may include maintaining, using at least a portion of the mapped timecode information, presentation synchronization among different portions of content being concurrently displayed; Paragraph 0021 teaches multimedia package includes a video content derived from the source video, an audio content derived from the source video, and a text transcription content representing a transcription of the audio content; Paragraph 0047 teaches combining, video, and transcribed scrolling text that is time-synced to the video; Paragraph 0049 teaches as the video plays, the text scrolls with the video, and if the user scrolls forward or backward in the text file, the video may move to the point in the production that matches that point in the text; Paragraph 0324 teaches displaying a synchronized scrolling transcript of the audio portion of the video which is playing in the Video Player GUI].  
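For illustration only, the word-level synchronization metadata recited in claim 1 (each word associated with a time marker in the other content) can be sketched as below. This sketch is not taken from Larsen or from the application; the names `WordMarker`, `word_index_at`, and `time_of_word`, the use of seconds as the time unit, and the sample data are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class WordMarker:
    """Hypothetical metadata record: one word and its time marker."""
    word: str       # a single word of the text content
    start_s: float  # time marker, in seconds, into the audio/video content


def word_index_at(markers, t):
    """Return the index of the last word whose marker is at or before time t.

    Assumes `markers` is sorted by start_s. Times before the first
    marker map to the first word.
    """
    starts = [m.start_s for m in markers]
    return max(bisect_right(starts, t) - 1, 0)


def time_of_word(markers, index):
    """Return the time marker associated with the word at `index`."""
    return markers[index].start_s


# Made-up sample metadata for a four-word transcript.
markers = [
    WordMarker("hello", 0.0),
    WordMarker("brave", 0.4),
    WordMarker("new", 0.9),
    WordMarker("world", 1.2),
]
```

Under these assumptions, a lookup by time supports media-driven navigation (which word to highlight at a given playback time), and a lookup by word index supports text-driven navigation (where to seek when a word is selected).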

As to claim 2:
Larsen discloses:
the first type of content and the second type of content are distinct types of contents among: a textual content and a media content [Paragraph 0021 teaches multimedia package includes a video content, an audio content, and a text transcription content representing a transcription of the audio content].  

As to claim 3:
Larsen discloses:
if the first content comprises a media content, the second content comprises a text generated from an audio track extracted from the media content and the synchronization metadata comprise first synchronization metadata associating each word of the generated text with a time marker in the audio track [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content using speech processing analysis, and automatically generating synchronization data for use in synchronizing distinct chunks of the text transcription with respective distinct chunks of the video content]; 
if the first content comprises a textual content, the second content comprises an audio file generated from the textual content and the synchronization metadata comprise second synchronization metadata associating each word of the textual content with a time marker in the generated audio file [Paragraph 0109 teaches conversion of text into speech, and the association of that text and speech with the time-code and time-base information linked to video/audio files and streams].  

As to claim 4:
Larsen discloses:
display, in a user interface of the client, a graphical element to allow a user to select between the two types of contents [Paragraph 0163 teaches the user can scroll the video/audio, and text, therefore, the interface provides the user with a graphical element with which the user can select between the two types of content]; and 
re-establish the content based on the status of the graphical element [Paragraph 0163 teaches synchronization of video, audio, and text with other digital media, where, when the user scrolls the video, the text scrolls along with it; and when the user scrolls the text, the video or audio stays in synch].  

As to claim 5:
Larsen discloses: 
the graphical element allows the user to select between a text view mode and a media view mode of the content, and wherein the script is configured to: in response to a selection of the text view mode by the graphical element, if the first content comprises the media content, activate a display of the generated text and of a progress bar of the audio track, the generated text and the progress bar of the audio track being synchronized using the first synchronization metadata [Paragraph 0161 teaches displays include words presented using scrolling text as well as voice-over audio that is synchronized to the visual media]; and 
if the first content comprises the textual content, activate a display of the textual content and of a progress bar of the audio file, the textual content and the progress bar of the audio file being synchronized using the second synchronization metadata [Paragraph 0062 teaches the user may also move the video forward or backward in time, hence, displaying a progress bar of the audio/video].

As to claim 6:
Larsen discloses:
allows the user to select between a text view mode and a media view mode of the content, and wherein the script is configured to: in response to a selection of the media view mode by the graphical element, if the first content comprises the textual content, activate a display of a progress bar of the audio file [Paragraph 0050 teaches as the video plays, the text scrolls with the video; Paragraph 0062 teaches the user may scroll forward or backward in the text file, and the video may move to the point in the production that matches that point in the text]; and 
if the first content comprises the media content, activate a display of a progress bar of the media content and of at least one image of the media content [Paragraph 0062 teaches the user may also move the video forward or backward in time].  

As to claim 7:
Larsen discloses:
wherein when the text view mode is selected and if the first content comprises the media content, the script is configured to: in response to a scrolling of the generated text, activate a synchronous displacement of a current playback position indicator of the progress bar of the audio track based on the first synchronization metadata [Paragraph 0062 teaches if the user scrolls forward or backward in the text file, the video may move to the point in the production that matches that point in the text; Paragraph 0253 teaches when a user flicks (e.g., up or down), a portion of the touchscreen displaying the text of the Resources Display GUI, the displayed text may scroll up/down (as appropriate), and synchronized with the scrolling of this text, the associated video displayed in the Video Player GUI (and corresponding audio) may maintain substantial synchronization with the scrolling of the text in the Resources Display GUI]; and 
in response to a movement of the current playback position indicator of the progress bar of the audio track, activate a synchronous scrolling of the generated text based on the first synchronization metadata [Paragraph 0062 teaches the user may also move the video forward or backward in time, and the text may automatically scroll to that point in the production that matches the same point in the video; Paragraph 0163 teaches let the video or audio play and the synchronized text scrolls along with the words being said; Paragraph 0251 teaches when a user moves the video slider bar, the automated scrolling of the text displayed in the Resources Display GUI stays in sync with the video (& audio) content displayed in the Video Player GUI].
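For illustration only, the two-way behavior recited in claim 7 (scrolling the text moves the playback position indicator, and moving the indicator scrolls the text) can be sketched as a small controller. This is a sketch under stated assumptions, not the method of Larsen or of the application: the class name `TranscriptSync`, its method names, and the representation of the progress bar as a fraction between 0 and 1 are hypothetical.

```python
from bisect import bisect_right


class TranscriptSync:
    """Hypothetical controller keeping a transcript and a progress bar in step."""

    def __init__(self, word_starts, duration_s):
        self.word_starts = list(word_starts)  # sorted time markers, one per word
        self.duration_s = duration_s          # total length of the audio track
        self.current_word = 0
        self.position_s = 0.0

    def on_text_scrolled(self, top_word_index):
        """Text was scrolled: move the playback indicator to the matching time.

        Returns the new progress-bar position as a fraction of the duration.
        """
        last = len(self.word_starts) - 1
        self.current_word = min(max(top_word_index, 0), last)
        self.position_s = self.word_starts[self.current_word]
        return self.position_s / self.duration_s

    def on_progress_moved(self, fraction):
        """Indicator was dragged: scroll the text to the matching word.

        Returns the index of the word to bring into view.
        """
        fraction = min(max(fraction, 0.0), 1.0)
        self.position_s = fraction * self.duration_s
        i = bisect_right(self.word_starts, self.position_s) - 1
        self.current_word = max(i, 0)
        return self.current_word
```

A user interface would call `on_text_scrolled` from its scroll handler and `on_progress_moved` from its slider handler; both paths consult the same time markers, which is what keeps the two views consistent in this sketch.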

As to claim 8:
Larsen discloses:
wherein when the text view mode is selected and if the first content comprises the textual content, the script is configured to: in response to a scrolling of the textual content, activate a synchronous displacement of a current playback position indicator of the progress bar of the audio file based on the second synchronization metadata [Paragraph 0062 teaches if the user scrolls forward or backward in the text file, the video may move to the point in the production that matches that point in the text; Paragraph 0253 teaches when a user flicks (e.g., up or down), a portion of the touchscreen displaying the text of the Resources Display GUI, the displayed text may scroll up/down (as appropriate), and synchronized with the scrolling of this text, the associated video displayed in the Video Player GUI (and corresponding audio) may maintain substantial synchronization with the scrolling of the text in the Resources Display GUI]; and 
in response to a movement of the current position indicator of the progress bar of the audio file, activate a synchronous scrolling of the textual content based on the second synchronization metadata [Paragraph 0062 teaches the user may also move the video forward or backward in time, and the text may automatically scroll to that point in the production that matches the same point in the video; Paragraph 0163 teaches let the video or audio play and the synchronized text scrolls along with the words being said; Paragraph 0251 teaches when a user moves the video slider bar, the automated scrolling of the text displayed in the Resources Display GUI stays in sync with the video (& audio) content displayed in the Video Player GUI].  

As to claim 9:
Larsen discloses:
if the first content comprises a media content, extracting the audio track from the media content [Paragraph 0021 teaches automatically generating an audio content derived from the source video];  
generating, from the audio track, said text and first synchronization metadata [Paragraph 0021 teaches a text transcription content representing a transcription of the audio content using speech processing analysis, automatically generating synchronization data for use in synchronizing distinct chunks of the text transcription with respective distinct chunks of the video content; Paragraph 0206 teaches synchronizing the spoken word with the written word along a timeline; Paragraph 0238 teaches words and phrases of the transcribed text relating to the audio content may be analyzed and indexed to facilitate subsequent user searchability]; and 
storing said text and first synchronization metadata [Paragraph 0109 teaches associating text and speech with the time-code and time-base information linked to video/audio files and streams; Paragraph 0238 teaches words and phrases of the transcribed text relating to the audio content may be analyzed and indexed to facilitate subsequent user searchability, hence, storing the text and synchronization metadata]; and 
if the first content comprises a textual content, generating, from the textual content, the audio file and second synchronization metadata [Paragraph 0109 teaches converting text into speech, and associating that text and speech with the time-code and time-base information linked to video/audio files and streams; Paragraph 0118 teaches text to speech conversion, metadata tagging, time-stamping, etc.]; and 
storing the audio file and second synchronization metadata [Paragraph 0109 teaches associating text and speech with the time-code and time-base information linked to video/audio files and streams; Paragraph 0238 teaches words and phrases of the transcribed text relating to the audio content may be analyzed and indexed to facilitate subsequent user searchability, hence, storing the text and synchronization metadata]. 

As to claim 10:
Larsen discloses:
identifying speakers whose voices are recorded in the audio track or in the audio file [Paragraph 0023 teaches analyzing the audio portion of the source video to automatically identify different vocal characteristics relating to voices of one or more different persons speaking in the audio portion of the source video]; and 
generating an index of said identified speakers [Paragraph 0023 teaches identifying and associating selected portions of the text transcription with a particular voice identified in the audio portion of the source video; Paragraph 0478 teaches creating a searchable index which may be used for identifying one or more video segments which are narrated by a specified person or speaker].  
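For illustration only, an index of identified speakers as recited in claim 10 can be sketched as a mapping from each speaker to the time spans in which that speaker is heard. The sketch assumes that speaker identification has already been performed by some upstream step and has produced labeled segments; the function name `build_speaker_index`, the tuple layout, and the sample labels are hypothetical and are not drawn from Larsen or from the application.

```python
from collections import defaultdict


def build_speaker_index(segments):
    """Group (speaker, start_s, end_s) segments into a per-speaker index.

    `segments` is assumed to be the output of a separate speaker
    identification step, which is outside the scope of this sketch.
    """
    index = defaultdict(list)
    for speaker, start_s, end_s in segments:
        index[speaker].append((start_s, end_s))
    for spans in index.values():
        spans.sort()  # chronological order within each speaker
    return dict(index)


# Made-up sample segments with two speakers.
segments = [
    ("speaker_A", 9.0, 12.5),
    ("speaker_B", 4.2, 9.0),
    ("speaker_A", 0.0, 4.2),
]
speaker_index = build_speaker_index(segments)
```

With such an index, a request such as "jump to the next passage by speaker_A" reduces to a lookup of the first span starting after the current playback time.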

As to claim 11:
Larsen discloses:
extracting a text from the textual content [Paragraph 0020 teaches distinct segments of the video, audio, and text content may each be mapped to a respective timecode of the timecode synchronization information; Paragraph 0238 teaches analyzing and indexing text-related content, e.g., words, phrases, characters, numbers, etc., hence, extracting text]; and 
generating the audio file and second synchronization metadata based on the extracted text [Paragraph 0109 teaches converting text into speech, and the association of that text and speech with the time-code and time-base information linked to video/audio files and streams]. 
  
As to claim 13:
Larsen discloses:
performing a lexical segmentation of the generated text or of the textual content to determine one or more element(s) among: an index of thematic segments, an index of keywords and an index of the named entities [Paragraph 0237 teaches dynamically analyze and index (e.g., for subsequent searchability) different types of characteristics, criteria, properties, etc. that relate to (or are associated with) the video, audio, and textual components].  

As to claim 15:
Larsen discloses: 
A non-transitory computer-readable recording medium on which is recorded a computer program including instructions, which when executed by a device configure the device to perform a method enabling a spatio-temporal navigation of content, the instructions configuring the device to:
receive a content request from a client, the content comprising a first content of a first type [Paragraph 0236 teaches search display GUI includes a search field input box where the user can type any word or phrase to perform a search for content, where the search results will be displayed including the content, e.g., video, audio, text, etc.]; and 
transmit to the client: the first content [Paragraph 0236 teaches the search results will be displayed including the content, e.g., video, audio, text, etc., which the user can select]; 
second content of a second type generated from the first content [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content];  
synchronization metadata associating each word of one of the contents with a time marker to the other content [Paragraph 0020 teaches distinct segments of the video, audio, and text content may each be mapped to a respective timecode of the timecode synchronization information; Paragraph 0038 teaches generating indexes to synchronize ingested, edited, and displayed audio/video files and streams with other text; Paragraph 0045 teaches integration of multimedia files, streams, and features by linking them through time-based indexes]; and 
a script, for execution at the client, configured to re-establish at least one of the contents and offer a navigation of said content depending on the type of content re-established by using at least one among: the first content, the second content, and the synchronization metadata [Paragraph 0020 teaches presentation of the multimedia content may include maintaining, using at least a portion of the mapped timecode information, presentation synchronization among different portions of content being concurrently displayed; Paragraph 0021 teaches multimedia package includes a video content derived from the source video, an audio content derived from the source video, and a text transcription content representing a transcription of the audio content; Paragraph 0047 teaches combining, video, and transcribed scrolling text that is time-synced to the video; Paragraph 0049 teaches as the video plays, the text scrolls with the video, and if the user scrolls forward or backward in the text file, the video may move to the point in the production that matches that point in the text; Paragraph 0324 teaches displaying a synchronized scrolling transcript of the audio portion of the video which is playing in the Video Player GUI].  

As to claim 16:
Larsen discloses: 
A device enabling a spatio-temporal navigation of content, comprising: a processor; and a non-transitory computer-readable medium comprising instructions stored thereon, which when executed by the processor configure the device to:
receive a content request from a client, the content comprising a first content of a first type [Paragraph 0236 teaches search display GUI includes a search field input box where the user can type any word or phrase to perform a search for content, where the search results will be displayed including the content, e.g., video, audio, text, etc.]; and 
transmit to the client: the first content [Paragraph 0236 teaches the search results will be displayed including the content, e.g., video, audio, text, etc., which the user can select]; 
second content of a second type generated from the first content [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content];  
synchronization metadata associating each word of one of the contents with a time marker to the other content [Paragraph 0020 teaches distinct segments of the video, audio, and text content may each be mapped to a respective timecode of the timecode synchronization information; Paragraph 0038 teaches generating indexes to synchronize ingested, edited, and displayed audio/video files and streams with other text; Paragraph 0045 teaches integration of multimedia files, streams, and features by linking them through time-based indexes]; and 
a script, for execution at the client, configured to re-establish at least one of the contents and offer a navigation of said content depending on the type of content re-established by using at least one among: the first content, the second content, and the synchronization metadata [Paragraph 0020 teaches presentation of the multimedia content may include maintaining, using at least a portion of the mapped timecode information, presentation synchronization among different portions of content being concurrently displayed; Paragraph 0021 teaches multimedia package includes a video content derived from the source video, an audio content derived from the source video, and a text transcription content representing a transcription of the audio content; Paragraph 0047 teaches combining, video, and transcribed scrolling text that is time-synced to the video; Paragraph 0049 teaches as the video plays, the text scrolls with the video, and if the user scrolls forward or backward in the text file, the video may move to the point in the production that matches that point in the text; Paragraph 0324 teaches displaying a synchronized scrolling transcript of the audio portion of the video which is playing in the Video Player GUI].  

As to claim 17:
Larsen discloses:
the first type of content and the second type of content are distinct types of contents among: a textual content and a media content [Paragraph 0021 teaches multimedia package includes a video content, an audio content, and a text transcription content representing a transcription of the audio content].  

As to claim 18:
Larsen discloses:
if the first content comprises a media content, the second content comprises a text generated from an audio track extracted from the media content and the synchronization metadata comprise first synchronization metadata associating each word of the generated text with a time marker in the audio track [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content using speech processing analysis, and automatically generating synchronization data for use in synchronizing distinct chunks of the text transcription with respective distinct chunks of the video content]; 
if the first content comprises a textual content, the second content comprises an audio file generated from the textual content and the synchronization metadata comprise second synchronization metadata associating each word of the textual content with a time marker in the generated audio file [Paragraph 0109 teaches conversion of text into speech, and the association of that text and speech with the time-code and time-base information linked to video/audio files and streams].  

As to claim 19:
Larsen discloses:
display, in a user interface of the client, a graphical element to allow a user to select between the two types of contents [Paragraph 0163 teaches the user can scroll the video/audio, and text, therefore, the interface provides the user with a graphical element with which the user can select between the two types of content]; and 
re-establish the content based on the status of the graphical element [Paragraph 0163 teaches synchronization of video, audio, and text with other digital media, where, when the user scrolls the video, the text scrolls along with it; and when the user scrolls the text, the video or audio stays in synch].  

As to claim 20:
Larsen discloses:
the first type of content and the second type of content are distinct types of contents among: a textual content and a media content [Paragraph 0021 teaches multimedia package includes a video content, an audio content, and a text transcription content representing a transcription of the audio content].  

As to claim 21:
Larsen discloses:
if the first content comprises a media content, the second content comprises a text generated from an audio track extracted from the media content and the synchronization metadata comprise first synchronization metadata associating each word of the generated text with a time marker in the audio track [Paragraph 0021 teaches automatically generating a text transcription of the source video’s audio content using speech processing analysis, and automatically generating synchronization data for use in synchronizing distinct chunks of the text transcription with respective distinct chunks of the video content]; 
if the first content comprises a textual content, the second content comprises an audio file generated from the textual content and the synchronization metadata comprise second synchronization metadata associating each word of the textual content with a time marker in the generated audio file [Paragraph 0109 teaches conversion of text into speech, and the association of that text and speech with the time-code and time-base information linked to video/audio files and streams].  

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Larsen et al. (U.S. Publication No. 2012/0236201) hereinafter Larsen, as applied to claim 3 above, and further in view of Raitio et al. (U.S. Publication No. 2017/0345411) hereinafter Raitio.
As to claim 12:
Larsen discloses:
generating an audio file [Paragraph 0109 teaches converting text into speech, and associating that text and speech with the time-code and time-base information linked to video/audio files].
Larsen does not appear to expressly disclose cutting the textual content into several text portions; generating partial audio files, a partial audio file being generated from a text portion of the textual content; and merging all of the partial audio files generated from all the text portions of the textual content to generate said audio file.  
Raitio discloses:
cutting the textual content into several text portions [Paragraph 0020 teaches a sequence of target units representing a spoken pronunciation of the text is generated; Paragraph 0159 teaches the text is received by text analysis module of text-to-speech module, and converted into a sequence of target units representing the spoken pronunciation of the text]; 
generating partial audio files, a partial audio file being generated from a text portion of the textual content [Paragraph 0161 teaches selecting one or more candidate speech segments for each target unit of the sequence of target units, therefore, partial audio files for each text portion]; and 
merging all of the partial audio files generated from all the text portions of the textual content to generate said audio file [Paragraph 0020 teaches speech corresponding to the received text is generated using the subset of candidate speech segments; Paragraph 0165 teaches joining the sequence of speech segments into a continuous speech waveform, where the speech waveform is an audio rendering of the spoken form of the text received at text analysis module].  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Larsen, by cutting the textual content into several text portions; generating partial audio files, a partial audio file being generated from a text portion of the textual content; and merging all of the partial audio files generated from all the text portions of the textual content to generate said audio file, as taught by Raitio [Paragraphs 0020, 0159, 0161], because both applications are directed to analysis and management of content, and because generating partial audio files for each text portion of the content, and generating the audio file by merging the partial audio files, enables the determination of predicted changes in one or more acoustic features at the end of a target unit, which leads to the selection of the most suitable candidate speech units that concatenate in the expected manner, and thereby improves the accuracy and naturalness of the resultant synthesized speech (See Raitio, Paragraphs 0180 and 0185).
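For illustration only, the three steps recited in claim 12 (cutting the textual content into portions, generating a partial audio file per portion, and merging the partial audio files) can be sketched as below. This is a sketch under stated assumptions and is not the unit-selection synthesis described by Raitio: `synthesize_portion` is a hypothetical stand-in that emits silence in place of real speech, the sentence-based splitting rule is an assumption, and the sample rate and sample width are arbitrary choices.

```python
import io
import re
import wave

SAMPLE_RATE = 16000  # assumed: 16 kHz mono
SAMPLE_WIDTH = 2     # assumed: 16-bit PCM samples


def cut_into_portions(text):
    """Cut the textual content into portions at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def synthesize_portion(portion):
    """Hypothetical stand-in for a text-to-speech engine.

    Returns raw PCM bytes: 0.25 s of silence per word, so that the
    output length depends on the text without needing a real synthesizer.
    """
    n_frames = int(0.25 * SAMPLE_RATE) * len(portion.split())
    return b"\x00" * (n_frames * SAMPLE_WIDTH)


def merge_partial_audio(partials):
    """Merge the partial PCM buffers, in order, into one WAV file (as bytes)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(SAMPLE_RATE)
        for pcm in partials:
            out.writeframes(pcm)
    return buf.getvalue()


def text_to_audio(text):
    """Run the three steps and return (wav_bytes, portions)."""
    portions = cut_into_portions(text)
    partials = [synthesize_portion(p) for p in portions]
    return merge_partial_audio(partials), portions
```

In this sketch the portions are processed independently, so the per-portion step could be replaced by any synthesizer that returns PCM in the same format without changing the merging step.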

	
	

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAQUEL PEREZ-ARROYO whose telephone number is (571)272-8969. The examiner can normally be reached Monday - Friday, 8:00am - 5:30pm, Alt Friday, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RAQUEL PEREZ-ARROYO/Examiner, Art Unit 2169