DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement
The references listed in the Information Disclosure Statements filed on July 23, 2020 and April 12, 2021 have been considered by the examiner (see attached PTO-1449 form).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 12, 14 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Nir (U.S. Pub. No. 2016/0066055).

Regarding claims 1 and 20, Nir discloses a method for pushing subtitle data, performed by a computer device, the method comprising:
obtaining video stream data and audio stream data, the audio stream data being data corresponding to an audio part in the video stream data (see paragraph 0012; received video signals accompanied by corresponding audio signals);
generating the subtitle data according to the audio stream data, the subtitle data comprising a subtitle text corresponding to a speech in the audio stream data and time information of the subtitle text (see paragraphs 0052, 0059; a speech recognition module 37 converts each audio time slice to text that includes the transcription of the audio time slice); and
pushing, in response to pushing the video stream data to a user terminal, the subtitle data to the user terminal, the subtitle data instructing the user terminal to synchronously display the subtitle text with live pictures in the video stream data and the audio part in the audio stream data according to the time information of the subtitle text (see paragraphs 0022, 0050, 0059, 0062-0063, figs. 1, 3; a synchronization module for synchronizing between each group of composite frames and their corresponding time slices of a sound track associated with the audio signal before outputting the synchronized composite frame group and audio channel to the video display).
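
For illustration only, the relationship between a subtitle text and its time information recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of Nir or of the applicant's implementation; the names `SubtitleCue`, `start_ms`, `end_ms`, and `cue_at` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubtitleCue:
    """One piece of subtitle data: a text plus its time information (hypothetical layout)."""
    start_ms: int  # assumed: time at which the text should appear
    end_ms: int    # assumed: time at which the text should disappear
    text: str


def cue_at(cues: List[SubtitleCue], playback_ms: int) -> Optional[str]:
    """Return the text whose interval [start_ms, end_ms) contains the playback position."""
    for cue in cues:
        if cue.start_ms <= playback_ms < cue.end_ms:
            return cue.text
    return None


# A terminal could call cue_at() with the current video timestamp to decide what to display.
cues = [
    SubtitleCue(0, 2000, "Hello"),
    SubtitleCue(2000, 4500, "Welcome to the broadcast"),
]
```

Under these assumptions, a terminal that receives the cues together with the video stream can look up the text for each playback position, which is one way the subtitle text could be displayed in synchronization with the pictures and audio.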

Examiner notes that instant claim 14 is directed to an apparatus comprising a memory and a processor for carrying out the method of claim 1.  Therefore, the supporting rationale of the rejection of claim 1 applies to instant claim 14.

Regarding claim 12, Nir discloses everything claimed as applied above (see claim 1).  Nir discloses wherein the obtaining the video stream data and the audio stream data comprises:
transcoding a video stream through a transcoding process in a transcoding device to obtain the video stream data and the audio stream data with synchronized time information (see paragraphs 0049, 0053-0054, 0059, fig. 3).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-6, 8, 11, 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Nir as applied to claim 1 above, and further in view of Erskine et al. (U.S. Pub. No. 2008/0284910).

Regarding claims 2 and 15, Nir discloses everything claimed as applied above (see claims 1 and 14).  Nir discloses that the proposed video subtitling device may be programmed to generate subtitles in any predetermined language and appearance and may further comprise user interface elements for allowing a user to configure it to operate according to user predetermined preferences such as destination language, subtitle font size, contrast and graphical properties of the subtitles (see paragraph 0025).
However, Nir is silent as to wherein the pushing, in response to the pushing the video stream data to the user terminal, the subtitle data to the user terminal comprises: receiving a subtitle obtaining request transmitted by the user terminal, the subtitle obtaining request including language indication information, and the language indication information being used for indicating a subtitle language; determining whether the subtitle language indicated by the language indication information is a language corresponding to the subtitle text; and based on determining that the language indication information is the language corresponding to the subtitle text, pushing the subtitle data to the user terminal.
Erskine et al. discloses wherein the pushing, in response to the pushing the video stream data to the user terminal, the subtitle data to the user terminal comprises:

determining whether the subtitle language indicated by the language indication information is a language corresponding to the subtitle text (see paragraph 0055); and
based on determining that the language indication information is the language corresponding to the subtitle text, pushing the subtitle data to the user terminal (see paragraph 0055).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir to include wherein the pushing, in response to the pushing the video stream data to the user terminal, the subtitle data to the user terminal comprises: receiving a subtitle obtaining request transmitted by the user terminal, the subtitle obtaining request including language indication information, and the language indication information being used for indicating a subtitle language; determining whether the subtitle language indicated by the language indication information is a language corresponding to the subtitle text; and based on determining that the language indication information is the language corresponding to the subtitle text, pushing the subtitle data to the user terminal as taught by Erskine et al. for the advantage of providing a video with closed captioning to a user.

Regarding claims 3 and 16, Nir and Erskine et al. disclose everything claimed as applied above (see claims 2 and 15).  Erskine et al. discloses wherein the generating the subtitle data according to the audio stream data comprises generating the subtitle data according to the audio stream data through a target service, the target service being any one of at least one subtitle generation service (see paragraph 0039),
wherein the subtitle obtaining request further includes a service identifier used for indicating a subtitle generation service (see paragraph 0039), and
wherein the pushing the subtitle data to the user terminal further comprises pushing the subtitle data to the user terminal based on determining that the subtitle generation service indicated by the service identifier is the target service (see paragraphs 0040-0041).

Regarding claims 4 and 17, Nir and Erskine et al. disclose everything claimed as applied above (see claims 2 and 15).  Erskine et al. discloses wherein the subtitle obtaining request further comprises a time identifier, the time identifier indicating time information of the requested subtitle data (see paragraph 0014), and
wherein the pushing the subtitle data to the user terminal further comprises (see paragraph 0024):
querying whether the subtitle data corresponding to the time information indicated by the time identifier is cached (see paragraph 0024); and
based on determining that the subtitle data corresponding to the time information is cached, pushing the cached subtitle data to the user terminal (see paragraph 0024).

Regarding claim 5, Nir and Erskine et al. disclose everything claimed as applied above (see claim 4).  Erskine et al. discloses based on determining that the subtitle data is not found, extracting the subtitle data from a subtitle database (see paragraphs 0044, 0046, 0048); and
caching the extracted subtitle data (see paragraphs 0044, 0046, 0048).
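
For illustration only, the cache query of claims 4 and 17 and the database fallback of claim 5 can be sketched as a read-through cache keyed by the time identifier. This is a minimal sketch under stated assumptions, not a description of Erskine et al. or of the applicant's implementation; the class `SubtitleStore`, the integer `time_id` key, and the dictionary standing in for the subtitle database are hypothetical.

```python
from typing import Dict, Optional


class SubtitleStore:
    """Read-through cache in front of a subtitle database (both modeled as dicts)."""

    def __init__(self, database: Dict[int, str]) -> None:
        self._database = database         # assumed stand-in for the subtitle database
        self._cache: Dict[int, str] = {}  # subtitle data already served at least once
        self.database_reads = 0           # counter kept only to make the behavior observable

    def get(self, time_id: int) -> Optional[str]:
        # Step 1: query whether subtitle data for this time identifier is cached.
        if time_id in self._cache:
            return self._cache[time_id]
        # Step 2: on a miss, extract the subtitle data from the database.
        self.database_reads += 1
        data = self._database.get(time_id)
        # Step 3: cache the extracted data so later requests are served from the cache.
        if data is not None:
            self._cache[time_id] = data
        return data


store = SubtitleStore({10: "hi"})
first = store.get(10)   # miss: read from the database, then cached
second = store.get(10)  # hit: served from the cache
```

In this sketch the second request for the same time identifier does not touch the database, which is the practical effect of the querying and caching steps.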

Regarding claims 6 and 18, Nir and Erskine et al. disclose everything claimed as applied above (see claims 2 and 15).  Erskine et al. discloses determining a next request time according to the time information of the subtitle data pushed to the user terminal (see paragraphs 0024, 0026); and
transmitting request indication information to the user terminal, the request indication information instructing the user terminal to transmit a new subtitle obtaining request when the next request time arrives (see paragraphs 0024, 0026).

Regarding claims 8 and 19, Nir and Erskine et al. disclose everything claimed as applied above (see claims 2 and 14).  Nir discloses wherein the generating the subtitle data according to the audio stream data comprises:
performing a speech recognition on the audio stream data to obtain a speech recognized text (see paragraphs 0035, 0049, 0059, 0067-0068); and
generating the subtitle data according to the speech recognized text (see paragraphs 0018, 0035, 0049, 0059, 0067-0068).

Regarding claim 11, Nir and Erskine et al. disclose everything claimed as applied above (see claim 8).  Erskine et al. discloses wherein the generating the subtitle data according to the speech recognized text comprises:
translating the speech recognized text into translated text corresponding to a target language (see paragraph 0055);
generating the subtitle text according to the translated text, the subtitle text comprising at least one of the translated text or the speech recognized text (see paragraph 0055); and
generating the subtitle data according to the subtitle text (see paragraph 0055).


Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Nir and Erskine et al. as applied to claim 8 above, and further in view of Sakai et al. (U.S. Pub. No. 2008/0243506).

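Regarding claim 9, Nir and Erskine et al. disclose everything claimed as applied above (see claim 8).  However, Nir and Erskine et al. fail to disclose wherein the performing speech recognition on the audio stream data to obtain the speech recognized text comprises: performing a speech start and end detection on the audio stream data to obtain a speech start frame and a speech end frame in the audio stream data, the speech start frame being an audio frame at the start of a speech segment and the speech end frame being an audio frame at the end of the speech segment; and performing the speech recognition on target speech data in the audio stream data to obtain the speech recognized text corresponding to the target speech data, the target speech data comprising a plurality of audio frames between any set of the speech start frame and the speech end frame in the audio stream data.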

Sakai et al. discloses wherein the performing speech recognition on the audio stream data to obtain the speech recognized text comprises: performing a speech start and end detection on the audio stream data to obtain a speech start frame and a speech end frame in the audio stream data, the speech start frame being an audio frame at the start of a speech segment and the speech end frame being an audio frame at the end of the speech segment (see paragraph 0007); and
performing the speech recognition on target speech data in the audio stream data to obtain the speech recognized text corresponding to the target speech data, the target speech data comprising a plurality of audio frames between any set of the speech start frame and the speech end frame in the audio stream data (see paragraph 0007).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir and Erskine et al. to include wherein the performing speech recognition on the audio stream data to obtain the speech recognized text comprises: performing a speech start and end detection on the audio stream data to obtain a speech start frame and a speech end frame in the audio stream data, the speech start frame being an audio frame at the start of a speech segment and the speech end frame being an audio frame at the end of the speech segment; and performing the speech recognition on target speech data in the audio stream data to obtain the speech recognized text corresponding to the target speech data, the target speech data comprising a plurality of audio frames between any set of the speech start frame and the speech end frame in the audio stream data as taught by Sakai et al.
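
For illustration only, speech start and end detection of the kind recited in claim 9 can be sketched with a simple per-frame energy threshold. This is a generic technique chosen for the sketch and is not asserted to be the method of Sakai et al. or of the applicant; the function name `detect_speech_segments`, the energy values, and the threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_speech_segments(frame_energies: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs for runs of frames at or above the threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for index, energy in enumerate(frame_energies):
        if energy >= threshold and start is None:
            start = index                        # speech start frame
        elif energy < threshold and start is not None:
            segments.append((start, index - 1))  # speech end frame is the last loud frame
            start = None
    if start is not None:
        # The audio ended while speech was still in progress.
        segments.append((start, len(frame_energies) - 1))
    return segments


# Assumed per-frame energies: silence, a speech burst, silence, a second burst.
energies = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.5]
segments = detect_speech_segments(energies, threshold=0.5)
```

Under these assumptions, the frames between each returned start index and end index would form the target speech data that is passed to the speech recognizer.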

Regarding claim 10, Nir and Erskine et al. disclose everything claimed as applied above (see claim 9).  However, Nir and Erskine et al. fail to disclose wherein the performing the speech recognition on the target speech data in the audio stream data further comprises: performing a speech frame extraction at predetermined time intervals according to the time information of the plurality of audio frames in the target speech data to obtain at least one piece of speech subdata, the speech subdata comprising at least one audio frame, among the plurality of audio frames, between the speech start frame and a target audio frame in the target speech data when the speech frame extraction operation of the speech subdata corresponds to the time information in the target speech data; performing the speech recognition on the at least one piece of speech subdata to obtain recognized subtext corresponding to the at least one piece of speech subdata; and obtaining the recognized subtext corresponding to the at least one piece of speech subdata as the speech recognized text corresponding to the target speech data.
Sakai et al. discloses wherein the performing the speech recognition on the target speech data in the audio stream data further comprises: 

performing the speech recognition on the at least one piece of speech subdata to obtain recognized subtext corresponding to the at least one piece of speech subdata (see paragraphs 0007, 0034); and 
obtaining the recognized subtext corresponding to the at least one piece of speech subdata as the speech recognized text corresponding to the target speech data (see paragraphs 0007, 0029, 0034).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir and Erskine et al. to include wherein the performing the speech recognition on the target speech data in the audio stream data further comprises: performing a speech frame extraction at predetermined time intervals according to the time information of the plurality of audio frames in the target speech data to obtain at least one piece of speech subdata, the speech subdata comprising at least one audio frame, among the plurality of audio frames, between the speech start frame and a target audio frame in the target speech data when the speech frame extraction operation of the speech subdata corresponds to the time information in the target speech data; performing the speech recognition on the at least one piece of speech subdata to obtain recognized subtext corresponding to the at least one piece of speech subdata; and obtaining the recognized subtext corresponding to the at least one piece of speech subdata as the speech recognized text corresponding to the target speech data as taught by Sakai et al.
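
For illustration only, one possible reading of the speech frame extraction at predetermined time intervals recited in claim 10 is that each piece of speech subdata is a growing prefix of the target speech data, running from the speech start frame up to the frame reached at each interval. This reading is an assumption made for the sketch and is not a construction of the claim or a description of Sakai et al.; the names `Frame` and `extract_subdata` and the 20 ms frame spacing are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, bytes]  # (timestamp_ms, audio payload); hypothetical layout


def extract_subdata(frames: List[Frame], interval_ms: int) -> List[List[Frame]]:
    """Return growing prefixes of the frames, cut every interval_ms from the first frame."""
    if not frames or interval_ms <= 0:
        return []
    last_timestamp = frames[-1][0]
    cutoff = frames[0][0] + interval_ms
    pieces: List[List[Frame]] = []
    while True:
        # Each piece holds every frame from the speech start frame up to the cutoff time.
        pieces.append([frame for frame in frames if frame[0] < cutoff])
        if cutoff > last_timestamp:
            break  # the last piece already covers the whole speech segment
        cutoff += interval_ms
    return pieces


# Five frames spaced 20 ms apart, extracted every 40 ms.
frames = [(0, b"a"), (20, b"b"), (40, b"c"), (60, b"d"), (80, b"e")]
pieces = extract_subdata(frames, interval_ms=40)
piece_lengths = [len(piece) for piece in pieces]
```

Under this reading, running recognition on each piece in turn would yield progressively longer recognized subtext, with the final piece covering the full target speech data.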

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nir as applied to claim 1 above, and further in view of Smith et al. (U.S. Pub. No. 2007/0300249).

Regarding claim 13, Nir discloses everything claimed as applied above (see claim 1).  However, Nir is silent as to wherein the video stream data is live video stream data.
Smith et al. discloses wherein the video stream data is live video stream data (see paragraph 0076).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir to include wherein the video stream data is live video stream data as taught by Smith et al. for the advantage of viewing content in real-time.

Allowable Subject Matter
Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NNENNA NGOZI EKPO whose telephone number is (571)270-1663.  The examiner can normally be reached on M-W 10:00am - 6:30pm, TH-F 8:00am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Pendleton, can be reached at 571-272-7527.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC).


NNENNA EKPO
Primary Examiner
Art Unit 2425



/NNENNA N EKPO/
Primary Examiner, Art Unit 2425
April 29, 2021