DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 03/14/2022 have been fully considered but they are not persuasive. 
At pages 9-11, Applicant argues that,
“… 
It is clear that Hu acquires audio stream data from a live video stream data, instead of acquiring speech data directly. This means that the live video stream data in Hu is encoded, and it is necessary to decode audio stream data from the live video stream data for performing speech recognition (as shown in the above figure). However, the terminal of the present application directly acquires video image frames and speech data, and then obtains subtitle content by performing text recognition based on the speech data …”

In response, Examiner respectfully disagrees and submits that, at least at [0036] and [0042], Hu states,
[0036] The live video recording terminal 220 includes an image acquisition component and an audio acquisition component. The image acquisition component and the audio acquisition component may be parts of the live video recording terminal 220. For example, the image acquisition component and the audio acquisition component may be a built-in camera and a built-in microphone of the live video recording terminal 220. Alternatively, the image acquisition component and the audio acquisition component may be peripheral devices connected to the live video recording terminal 220. For example, the image acquisition component and the audio acquisition component may be an external camera and an external microphone connected to the live video recording terminal 220. Alternatively, one of the image acquisition component and the audio acquisition component may be built in the live video recording terminal 220, and the other being a peripheral device of the live video recording terminal 220. For example, the image acquisition component may be a built-in camera of the live video recording terminal 220, and the audio acquisition component may be an external microphone in an earphone connected to the live video recording terminal 220. Implementation forms of the image acquisition component and the audio acquisition component are not limited in this embodiment of this application.
[0042] During live streaming, the live video recording terminal runs the live streaming APP client, a user (who may also be referred to as an anchor) triggers and starts a live streaming function in a live streaming APP interface, and then the live streaming APP client invokes the image acquisition component and the audio acquisition component in the live video recording terminal to record a live video stream, and uploads the recorded live video stream to the live streaming server. The live streaming server receives the live video stream, and establishes a live streaming channel for the live video stream. A user of the user terminal may access the live streaming server through a live streaming APP client or a browser client installed in the user terminal, and selects the live streaming channel at the access interface. Then the live streaming server pushes the live video stream to the user terminal. The user terminal plays the live video stream in a live streaming APP interface or a browser interface.
(emphasis added)
Thus, Hu teaches that video image frames and speech data are directly acquired from a camera and a microphone, either built into or external to the live video recording terminal 220, as further shown in Fig. 2.
With respect to Applicant’s arguments at page 11 that the method of Hu involves encoding, decoding, and then encoding again, whereas the present application performs only one encoding and recognizes speech data in real time:
Examiner respectfully submits that, without acquiescing to any of Applicant’s characterizations of Hu, the claim does not recite encoding, decoding, or real-time processing of speech data. Therefore, Applicant’s arguments regarding this subject matter are moot.
As such, Applicant’s arguments with respect to claims 1-4, 7-13, 16-21, and 23 are not persuasive.
Applicant’s arguments with respect to new claims 22 and 24 are moot in view of the new ground of rejection.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 7-8, 10-11, 16-17, and 19-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Hu et al. (US 2020/0336796 A1 – hereinafter Hu).
Regarding claim 1, Hu discloses a video recording method, comprising: receiving a video recording triggering signal, the video recording triggering signal being configured to trigger a video recording operation ([0042] – a user triggers a live video recording terminal to record a live video stream); directly acquiring video image frames and speech data according to the video recording triggering signal ([0036] – collecting video image frames via a built-in camera and collecting speech data via a built-in microphone); determining a timestamp range of the video image frames corresponding to a duration of speech covered by the collected speech data in the video recording operation ([0081]-[0082] – determining a timestamp range of the video image frames corresponding to a duration of speech as described in [0077], e.g. start time and duration of a segment of speech); performing text recognition on the speech data to obtain subtitle content for a recorded video within the timestamp range ([0072]; [0077]; [0083] – performing text recognition on the speech data within the timestamp range as further shown in step 33 of Fig. 3); generating a target video based on the video image frames, the speech data and the subtitle content ([0086]; [0092]; Fig. 8 – generating a video stream with superimposed audio and caption data).
Regarding claim 2, Hu also discloses performing the text recognition on the speech data to obtain the subtitle content for the recorded video within the timestamp range comprises: performing the text recognition on the speech data to obtain corresponding text content ([0072]; [0077] – performing text recognition on the speech data within the timestamp range as further shown in step 33 of Fig. 3); and segmenting the text content by performing semantic recognition on the text content to obtain the subtitle content ([0072]-[0073]; [0077]; Fig. 6 – segmenting the text content by performing semantic recognition on the text content via recognition of complete segments of speech comprising one or more recognized sentences).
Regarding claim 7, Hu also discloses collecting the video image frames and the speech data according to the video recording triggering signal comprises: collecting the video image frames through a camera and collecting the speech data through a microphone according to the video recording triggering signal ([0036] – collecting video image frames via a built-in camera and collecting speech data via a built-in microphone).
Regarding claim 8, Hu also discloses collecting the video image frames and the speech data according to the video recording triggering signal comprises: acquiring display content of a terminal display screen as the video image frames according to the video recording triggering signal ([0042]); and acquiring audio playing content corresponding to the display content as the speech data ([0042]).
Claim 10 is rejected for the same reason as discussed in claim 1 above in view of Hu also disclosing a video recording apparatus, executed by the terminal, comprising: a processor and a memory, wherein the memory stores at least one instruction which is executable by the processor, and the processor is configured to perform the recited steps (Fig. 16; [0146]-[0149] – CPU 1601 and memory 1602).
Claim 11 is rejected for the same reason as discussed in claim 2 above.
Claim 16 is rejected for the same reason as discussed in claim 7 above.
Claim 17 is rejected for the same reason as discussed in claim 8 above.
Claim 19 is rejected for the same reason as discussed in claim 10 above.
Claim 20 is rejected for the same reason as discussed in claim 1 above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hu as applied to claims 1-2, 7-8, 10-11, 16-17, and 19-20 above.
Regarding claim 9, see the teachings of Hu as discussed in claim 1 above. However, Hu does not explicitly disclose before receiving the video recording triggering signal, further comprising: receiving a speech subtitle enabling signal, wherein the speech subtitle enabling signal is configured to enable a function for generating the subtitle content for the recorded video.
Official Notice is taken that user interfaces allowing a user to turn on or off a recording option are well known in the art.
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate a user setting option for enabling or disabling the function for generating the subtitle content for the recorded video before recording. This would allow the user to turn the subtitle generating function on or off as desired, e.g., turning the function off to save processing power when the user does not intend to have subtitles generated for the stream.
Claim 18 is rejected for the same reason as discussed in claim 9 above.
Claims 3-4, 12-13, 22, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Hu as applied to claims 1-2, 7-11, and 16-20 above, and further in view of Kim et al. (US 2019/0373336 A1 – hereinafter Kim).
	Regarding claim 3, Hu also discloses segmenting the text content by performing the semantic recognition on the text content to obtain the subtitle content comprises: segmenting the text content by performing the semantic recognition on the text content to obtain at least one text segment as the subtitle content ([0072]-[0073]; [0077] – segmenting the text content by performing semantic recognition on the text content via recognition of complete segments of speech comprising one or more recognized sentences). 
However, Hu does not disclose adding a punctuation mark to the at least one text segment by performing tone recognition on the speech data.
Kim discloses adding a punctuation mark to the at least one text segment by performing tone recognition on the speech data ([0116]; Fig. 8).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Kim into the method taught by Hu to correctly represent the tone of the speaker in the caption.
Regarding claim 4, see the teachings of Hu and Kim as discussed in claim 3 above, in which Hu and Kim also disclose after segmenting the text content by performing the semantic recognition on the text content to obtain the at least one text segment (Hu: [0072]-[0073]; [0077] – segmenting the text content by performing semantic recognition on the text content via recognition of complete segments of speech comprising one or more recognized sentences), further comprising: adding a display element corresponding to a recognized scene to the at least one text segment by performing scene recognition on the speech data (Kim: [0116]; Fig. 8). The motivation for incorporating the teachings of Kim into the method of Hu has been discussed in claim 3 above.
Claim 12 is rejected for the same reason as discussed in claim 3 above.
Claim 13 is rejected for the same reason as discussed in claim 4 above.
Regarding claim 22, see the teachings of Hu and Kim as discussed in claim 4 above, in which Kim also discloses the display element comprises at least one of an emoticon, an emoji, a kaomoji and an image ([0116]; Fig. 8). The motivation for incorporating the teachings of Kim into the method of Hu has been discussed in claim 3 above.
Claim 24 is rejected for the same reason as discussed in claim 22 above.
Claims 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Hu as applied to claims 1-2, 7-11, and 16-20 above, and further in view of Pornprasitsakul et al. (US 2014/0201631 A1 – hereinafter Pornprasitsakul).
Regarding claim 21, see the teachings of Hu as discussed in claim 1 above, in which Hu also discloses displaying a preview interface, wherein the preview interface is configured to play a preview video corresponding to the target video, and the subtitle content is displayed on the video image frames in an overlapping manner when the preview video is played to display the video image frames within the timestamp range (Fig. 11).
However, Hu does not disclose providing a subtitle editing control for the preview interface, wherein the subtitle editing control is disposed below the preview interface; receiving a selection operation on the subtitle editing control; displaying a subtitle editing area and a subtitle confirmation control according to the selection operation, wherein the subtitle editing area is disposed below the subtitle editing control and the subtitle confirmation control, the subtitle editing area displays a subtitle editing sub-area corresponding to at least one video segment corresponding to the preview video, and subtitle content corresponding to the video segment is edited in the subtitle editing sub-area; and updating the target video according to the subtitle content in the subtitle editing area when a triggering operation on the subtitle confirmation control is received.
Pornprasitsakul discloses providing a subtitle editing control for a preview interface, wherein the subtitle editing control is disposed below the preview interface ([0023], [0053]; Fig. 1 – the subtitle editing control is the timeline with a playhead); receiving a selection operation on the subtitle editing control ([0023] – receiving a selection to enter a caption); displaying a subtitle editing area and a subtitle confirmation control according to the selection operation, wherein the subtitle editing area is disposed below the subtitle editing control and the subtitle confirmation control, the subtitle editing area displays a subtitle editing sub-area corresponding to at least one video segment corresponding to the preview video ([0023], [0053]; Fig. 1 – a subtitle editing sub-area 116), and subtitle content corresponding to the video segment is edited in the subtitle editing sub-area ([0023], [0053]; Fig. 1 – subtitle content entered into area 116); and updating a target video according to the subtitle content in the subtitle editing area when a triggering operation on the subtitle confirmation control is received ([0068]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Pornprasitsakul into the method taught by Hu to allow the user to adjust the caption data as he or she intends.
Claim 23 is rejected for the same reason as discussed in claim 21 above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG Q DANG whose telephone number is (571)270-1116.  The examiner can normally be reached on IFT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Q Tran can be reached on 571-272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/HUNG Q DANG/Primary Examiner, Art Unit 2484