DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nir (U.S. Pub. No. 2016/0066055) in view of McCartney, Jr. et al. (U.S. Pub. No. 2020/0404386).

Regarding claims 1, 11 and 20, Nir discloses a video stream processing method, the method comprising:

performing, by the processing circuitry, speech recognition on the first audio stream data to generate speech recognition text (see paragraph 0059; a speech recognition module 37 converts each audio time slice to text that includes the transcription of the audio time slice);
generating, by the processing circuitry, caption data according to the speech recognition text, the caption data including caption text and time information corresponding to the caption text (see paragraph 0018; a speech recognition module for converting each audio time slice to text that contains the transcription of the audio time slice); and
adding, by the processing circuitry, the caption text to a corresponding picture frame in the live video stream data according to the time information corresponding to the caption text to generate captioned live video stream data (see paragraphs 0021-0022, 0039, 0049, 0062; a synchronization module for synchronizing between each group of composite frames and their corresponding time slices of a sound track associated with the audio signal before outputting the synchronized composite frame group and audio channel to the video display).
However, Nir fails to disclose the time information indicating a time point corresponding to a speech start frame of the segment of speech and a duration of the segment of speech.

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir to include the time information indicating a time point corresponding to a speech start frame of the segment of speech and a duration of the segment of speech, as taught by McCartney, Jr. et al., for the advantage of aligning a translation of caption data with an audio portion of the video.
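
For illustration only, the time information discussed above (a time point for the speech start frame plus a duration of the segment of speech) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Nir, McCartney, Jr. et al., or the application: the names Caption and caption_for_frame, the millisecond units, and the half-open interval convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Caption:
    text: str
    start_ms: int     # time point of the speech start frame (assumed unit: ms)
    duration_ms: int  # duration of the segment of speech (assumed unit: ms)


def caption_for_frame(captions: List[Caption], frame_ts_ms: int) -> Optional[str]:
    """Return the caption text whose [start, start + duration) interval
    covers the picture frame timestamp, or None if no caption applies."""
    for cap in captions:
        if cap.start_ms <= frame_ts_ms < cap.start_ms + cap.duration_ms:
            return cap.text
    return None


# Hypothetical caption data for two segments of speech.
captions = [Caption("hello", 1000, 500), Caption("world", 2000, 750)]
```

Under these assumptions, a picture frame time-stamped between two segments of speech receives no caption, while a frame inside a segment receives that segment's caption text.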

Regarding claims 2 and 12, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses wherein the adding comprises:
separating the live video stream data into second audio stream data and first picture frame stream data (see paragraph 0033);
determining a target picture frame in the first picture frame stream data, the target picture frame corresponding to the time information (see abstract, paragraphs 0013-0024);
generating a caption image of the caption text (see abstract, paragraphs 0013-0024);
superimposing the caption image on the target picture frame to generate superimposed picture frame stream data (see abstract, paragraphs 0013-0024); and


Regarding claims 3 and 13, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 2 and 12). Nir discloses wherein the combining comprises:
synchronizing the second audio stream data and the superimposed picture frame stream data according to the time information (see abstract, paragraphs 0022, 0040, 0062); and
combining the synchronized second audio stream data and the superimposed picture frame stream data to generate the captioned live video stream data (see abstract, paragraphs 0022, 0040, 0062).

Regarding claims 4 and 14, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses wherein before the adding, the method includes obtaining second picture frame stream data in the live video stream data (see abstract, paragraphs 0013-0024), and
the adding includes:
determining a target picture frame in the second picture frame stream data, the target picture frame corresponding to the time information (see abstract, paragraphs 0013-0024);

superimposing the caption image on the target picture frame to generate superimposed picture frame stream data (see abstract, paragraphs 0013-0024); and
combining the first audio stream data with the superimposed picture frame stream data to generate the captioned live video stream data (see abstract, paragraphs 0013-0024).

Regarding claims 5 and 15, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses adding, after a delay of a preset duration from a first moment, the caption text to the corresponding picture frame in the live video stream data according to the time information corresponding to the caption text to generate the captioned live video stream data, the first moment being a time the live video stream data is obtained (see paragraphs 0049, 0062, 0065).

Regarding claims 6 and 16, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses adding, after the caption data is stored, the caption text to the corresponding picture frame in the live video stream data according to the time information corresponding to the caption text to generate the captioned live video stream data (see paragraphs 0021-0022, 0039, 0049, 0062).

Regarding claims 7 and 17, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses wherein the performing the speech recognition comprises:
performing a speech start-end detection on the first audio stream data to obtain the speech start frame and a speech end frame in the first audio stream data, the speech start frame corresponding to a beginning of a segment of speech, and the speech end frame corresponding to an end of the segment of speech (see paragraphs 0030-0031);
extracting at least one segment of speech data from the first audio stream data according to the speech start frame and the speech end frame in the first audio stream data, the speech data including an audio frame between the speech start frame and the speech end frame (see paragraph 0058);
performing speech recognition on the at least one segment of speech data to obtain recognition sub-text corresponding to the at least one segment of speech data (see paragraphs 0018, 0059); and
determining the recognition sub-text corresponding to the at least one segment of speech data as the speech recognition text (see paragraphs 0018, 0059).
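
For illustration only, the speech start-end detection and segment extraction recited above can be sketched with a simple per-frame energy threshold. This is a minimal sketch and not the technique of Nir or of the application: the energy-threshold approach, the function name detect_speech_segments, and the inclusive (start frame, end frame) convention are assumptions made for the example.

```python
from typing import List, Tuple


def detect_speech_segments(
    frame_energy: List[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return (speech_start_frame, speech_end_frame) index pairs, inclusive,
    for each run of audio frames whose energy is at or above the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, energy in enumerate(frame_energy):
        if energy >= threshold and start is None:
            start = i  # first frame of a new segment of speech
        elif energy < threshold and start is not None:
            segments.append((start, i - 1))  # previous frame ended the segment
            start = None
    if start is not None:
        # Speech continued through the last frame of the audio stream data.
        segments.append((start, len(frame_energy) - 1))
    return segments
```

Each returned pair delimits one segment of speech data; the audio frames between the two indices would then be passed to speech recognition to obtain the recognition sub-text for that segment.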

Regarding claims 8 and 18, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses wherein the generating the caption data comprises:
translating the speech recognition text into translated text corresponding to a target language (see paragraphs 0029, 0049);

generating the caption data including the caption text (see paragraphs 0029, 0049).

Regarding claim 10, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claim 1). Nir discloses receiving a video stream obtaining request from a user terminal (see paragraphs 0010, 0025, 0029, 0032-0040, 0049 and 0059);
obtaining language indication information in the video stream obtaining request, the language indication information indicating a caption language (see paragraphs 0010, 0025, 0029, 0032-0040, 0049 and 0059); and
pushing the captioned live video stream data to the user terminal when the caption language indicated by the language indication information corresponds to the caption text (see paragraphs 0010, 0025, 0029, 0032-0040, 0049 and 0059).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nir and McCartney, Jr. et al. as applied to claims 1 and 11 above, and further in view of Cuthbert et al. (U.S. Patent No. 9,953,631).


Regarding claims 9 and 19, Nir and McCartney, Jr. et al. disclose everything claimed as applied above (see claims 1 and 11). Nir discloses wherein the generating the caption data comprises:
translating the speech recognition text into translated text corresponding to a target language (see paragraphs 0029, 0049).
However, Nir and McCartney, Jr. et al. are silent as to generating the caption text according to the translated text, the caption text including the speech recognition text and the translated text; and generating the caption data including the caption text.
Cuthbert et al. discloses generating the caption text according to the translated text, the caption text including the speech recognition text and the translated text (see col. 5, line 23 to col. 6, line 10; figs. 3A-3C; displaying both original speech text and translated text); and
generating the caption data including the caption text (see col. 5, line 23 to col. 6, line 10; figs. 3A-3C).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the method and system of Nir and McCartney, Jr. et al. to include generating the caption text according to the translated text, the caption text including the speech recognition text and the translated text, and generating the caption data including the caption text, as taught by Cuthbert et al., for the advantage of enhancing the conversation experience.


Citation of Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Mohamed et al.		U.S. Pub. No. 2004/0068410 (see figs. 9, 10, para. 31)
Meiri				U.S. Pub. No. 2013/0295534 (see figs. 6, 7, para. 30)
Cuthbert et al. 		U.S. Patent No. 9,355,094 (see fig. 4).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NNENNA NGOZI EKPO whose telephone number is (571)270-1663.  The examiner can normally be reached on M-W 10:00am - 6:30pm, TH-F 8:00am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Pendleton, can be reached on 571-272-7527. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


NNENNA EKPO
Primary Examiner
Art Unit 2425