DETAILED ACTION
Notice of Pre-AIA or AIA Status
• The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
• This action is responsive to the following communication: an amendment filed on 10/03/2022.
• Claims 1-13, 15-26 are currently pending; claim 14 has been canceled.

Response to Arguments
• Applicant’s arguments, see pages 10-15, filed 10/3/2022, with respect to the rejection(s) of claim(s) 1, 22, and 26 under 35 U.S.C. 102(a)(2) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of a newly found prior art reference.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-10, 15-27 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 20160284354) in view of Muthlah (US 20220030286).
Regarding claim 1, Chen discloses an interactive information processing method, comprising: establishing a position correspondence (fig. 2 shows an example of detecting speech and transcribing it for display, par. 3) between a display text generated based on a multimedia data stream and the multimedia data stream; and presenting the display text and the multimedia data stream (displays transcribed contents based on the position correspondence, abstract, par. 3) corresponding to the display text based on the position correspondence.
Chen further teaches establishing the position (display transcribed text based upon correspondence position, pars. 18-26) correspondence between the display text generated based on the multimedia data stream and the multimedia data stream, but fails to teach and/or suggest a timestamp synchronization association between the display text and the multimedia data stream based on a timestamp of the display text and a timestamp of the multimedia data stream.
Muthlah, in the same field of endeavor (media streams), teaches a well-known example of a timestamp synchronization association (timestamp synchronization, pars. 80, 88, 97) between the display text and the multimedia data stream based on a timestamp of the display text and a timestamp of the multimedia data stream (figs. 6-8).
It would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the media stream of Chen to include timestamp synchronization as taught by Muthlah, in order to synchronize subtitle data with video frames of the video streams.
Therefore, it would have been obvious to combine Chen with Muthlah to obtain the invention as specified in claim 1.
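
For illustration only, the timestamp synchronization association recited in claim 1 may be sketched as a lookup from a playback timestamp of the multimedia data stream to the display-text segment in effect at that time. The names TextSegment and segment_at, and the use of seconds as the time unit, are hypothetical and are not taken from Chen, Muthlah, or the application.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TextSegment:
    start: float  # assumed unit: seconds from the start of the stream
    text: str


def segment_at(segments, playback_time):
    """Return the segment with the latest start not after playback_time.

    Returns None when playback_time precedes every segment.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    # bisect_right counts the segments that start at or before playback_time.
    i = bisect_right(starts, playback_time) - 1
    return ordered[i] if i >= 0 else None
```

Under these assumptions, the same index may be read in the other direction: the start value of a selected segment gives the playback position to which the stream would jump.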


Regarding claim 2, Chen further discloses the method of claim 1, further comprising: acquiring an audio-video frame of the multimedia data stream, and determining a user identity of a speaking user (identifying user via voiceprint, par. 18) corresponding to the audio-video frame; and generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame (audio/video, par. 18).

Regarding claim 3, Chen further discloses the method of claim 2, wherein the audio-video frame comprises an audio frame; and acquiring the audio-video frame of the multimedia data stream, and determining the user identity of the speaking user (voiceprint of user, par. 18) corresponding to the audio-video frame comprise at least one of: determining the user identity of the speaking user by performing a voiceprint recognition (par. 18) on the audio frame; or determining a client identity of a client to which the audio frame belongs, and determining the user identity of the speaking user based on the client identity (voiceprint identity, par. 18).

Regarding claim 4, Chen further discloses the method of claim 2, wherein the audio-video frame comprises an audio frame; and generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining a literal expression (expression, par. 18) corresponding to the audio frame by performing a speech-to-text (speech-to-text via transcribing, par. 18, fig. 2) processing on the audio frame, and generating a first display text in the display text based on the literal expression and the user identity.

Regarding claim 5, Chen further discloses the method of claim 2, wherein the audio-video frame comprises a video frame; and generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining characters in the video frame by performing an image-text recognition (pars. 18-21) on the video frame, and generating a second display text in the display text based on the characters and the user identity.

Regarding claim 6, Chen further discloses the method of claim 4, wherein obtaining the literal expression (par. 18) corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression and the user identity comprise: determining the literal expression corresponding to the audio frame, a timestamp (par. 23) currently corresponding to the audio frame and a user identity of a speaking user to which the audio frame belongs; and generating a display content in the display text based on the user identity, the timestamp and the literal expression; wherein the display content comprises at least one paragraph; and obtaining the literal expression corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression (pars. 18-23) and the user identity comprise: in a process of performing the speech-to-text processing based on the audio frame, in response to detecting that an interval duration between adjacent audio frames is greater than or equal to a preset interval duration threshold (time duration, par. 26) and a user identity of a latter audio frame of the adjacent audio frames is not changed, generating a next paragraph in the display content based on the latter audio frame.
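
For illustration only, the paragraph-generation limitation of claim 6 may be sketched as follows: a next paragraph is started when the interval between adjacent audio frames is greater than or equal to a preset threshold while the user identity is unchanged. The names Utterance and group_paragraphs, the 2.0-second default threshold, and the choice to also start a new paragraph on a change of speaker are assumptions made for this sketch and are not taken from the cited references.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # user identity of the speaking user
    start: float  # assumed unit: seconds
    end: float
    text: str     # literal expression from speech-to-text


def group_paragraphs(utterances, gap_threshold=2.0):
    """Group utterances into paragraphs of display content."""
    paragraphs = []
    prev = None
    for u in utterances:
        same_speaker = prev is not None and u.speaker == prev.speaker
        # Claimed condition: interval >= threshold with identity unchanged.
        long_pause = same_speaker and (u.start - prev.end) >= gap_threshold
        if not same_speaker or long_pause:
            # Assumption: a change of speaker also opens a new paragraph.
            paragraphs.append(
                {"speaker": u.speaker, "timestamp": u.start, "text": u.text}
            )
        else:
            paragraphs[-1]["text"] += " " + u.text
        prev = u
    return paragraphs
```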

Regarding claim 7, Chen further discloses the method of claim 5, wherein obtaining the second display text in the display text by performing the image-text recognition on the video frame comprises at least one of: in response to determining that the video frame comprises at least one uniform resource locator (URL) address, generating a first display content in the second display text based on the at least one URL address (Internet, par. 9); or in response to determining that the video frame comprises a character, determining a second display content (display screens, pars. 26, 32) in the second display text based on the character.

Regarding claim 8, Chen further discloses the method of claim 4, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: determining, based on the audio-video frame, a third display text (pars. 18-21) in the display text to determine a content corresponding to a target content from the first display text in response to detecting that the target content in the third display text is triggered, and display the content differentially; wherein the third display text comprises at least one of at least one keyword (keywords, par. 25) or at least one key sentence.

Regarding claim 9, Chen further discloses the method of claim 8, wherein the display text comprises a second display text; and the method further comprises: displaying the display text and the multimedia data stream on a target page; wherein displaying the display text and the multimedia data stream on the target page comprises: displaying the third display text, the second display text, and the first display text in the display text and a recording screen video in preset display regions on the target page, respectively.

Regarding claim 10, Chen further discloses the method of claim 9, further comprising: determining a region proportion of the first display text, a region proportion of the second display (display portions/regions on a display screen, pars. 26, 32) text and a region proportion of the third display text on the target page based on a display content in the first display text, a display content in the second display text and a display content in the third display text.

Regarding claim 15, Chen further discloses the method of claim 1, wherein presenting the display text and the multimedia data stream corresponding to the display text based on the position correspondence comprises: displaying a display content corresponding to the multimedia data stream differentially in the display text based on the position correspondence (pars. 18-23).

Regarding claim 16, Chen further discloses the method of claim 1, wherein the multimedia data stream and the display text are displayed on a target page, and the target page further comprises a controlling control; and the method further comprises: adjusting a currently displayed content of the display text and the multimedia data stream simultaneously based on the controlling control; wherein an adjustment precision corresponding to the controlling control is greater than an adjustment precision (pars. 18-24) of an adjustment control for adjusting a progress of a recording screen video in the multimedia data stream.

Regarding claim 17, Chen further discloses the method of claim 16, further comprising: in response to detecting a triggering operation triggering the controlling control on the target page, acquiring a playback timestamp corresponding to the controlling control, adjusting the multimedia data stream to jump to a playback position corresponding to the playback timestamp based on the playback timestamp, and displaying a content of a display text of a video frame (pars. 18-24) corresponding to the playback timestamp differentially in the display text.

Regarding claim 18, Chen further discloses the method of claim 1, further comprising at least one of: in response to detecting a triggering operation triggering a display content in the display text on a target page, adjusting, based on a timestamp corresponding to the display content, the multimedia data stream to jump to a video frame corresponding to the timestamp; in response to detecting a triggering operation for the multimedia data stream, acquiring a playback timestamp of the multimedia data stream corresponding to the triggering operation, and jumping the display text to a display content corresponding to the playback timestamp in the display text based on the playback timestamp; in response to detecting a triggering operation for an editing control on a target page (pars. 18-24), displaying a permission editing list, and determining a user permission of each interactive user based on the permission editing list, wherein the user permission is used for representing an access permission of a user to a content presented on the target page, and the permission editing list comprises at least one user permission of an interactive user; or in response to detecting a triggering operation triggering a sharing control on a target page, generating a target identity corresponding to the target page, and sending the target identity to a user to share with, to cause the user to share with to acquire the target page based on the target  identity.

Regarding claim 19, Chen further discloses the method of claim 2, further comprising: acquiring a search content edited in a search content editing control, and acquiring at least one target content corresponding to the search content from the display text; wherein each of the at least one target content is the same as the search content; and displaying the at least one target content differentially in the display text, and marking an audio-video frame corresponding to the at least one target content in a controlling control (pars. 18-24) corresponding to the multimedia data stream.

Regarding claim 20, Chen further discloses the method of claim wherein marking the audio-video frame corresponding to the at least one target content in the controlling control corresponding to the multimedia data stream comprises: determining a playback timestamp corresponding to each of the at least one target content, and marking an audio-video frame corresponding to the each of the at least one target content in the controlling control corresponding to the multimedia data stream according to the playback timestamp (pars. 18-26).
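
For illustration only, the search-and-mark limitations of claims 19 and 20 may be sketched as follows: each display content containing the search content is located, and its playback timestamp is returned so that a marker can be placed on the controlling control. The name find_markers, the (timestamp, text) pair layout, and the case-insensitive comparison are assumptions made for this sketch and are not taken from the cited references.

```python
def find_markers(paragraphs, query):
    """Return (timestamp, index) pairs for paragraphs containing query.

    paragraphs is a list of (timestamp, text) pairs; an empty query
    matches nothing.
    """
    q = query.casefold()
    if not q:
        return []
    return [
        (timestamp, index)
        for index, (timestamp, text) in enumerate(paragraphs)
        if q in text.casefold()  # assumption: case-insensitive match
    ]
```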

Regarding claim 21, Chen further discloses the method of claim 19, further comprising: in response to detecting a triggering operation triggering each of the at least one target content, determining a target playback timestamp of the each of the at least one target content; and displaying a marker corresponding to the target playback timestamp differentially in the controlling control (video conference, par. 20).

Regarding claim 27, Chen further discloses a non-transitory storage medium (fig. 3), comprising computer-executable instructions which, when executed by a computer processor, are configured to implement the method of claim 1.

Claims 22-26 recite limitations/features that are similar to, and in the same scope of invention as, those in claims 1-10 and 15-21 above and/or combinations thereof; therefore, claims 22-26 are rejected on the same rationale/basis as set forth for claims 1-10 and 15-21 above and/or combinations thereof.

Claim(s) 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Muthlah, as applied to claims 1-10, 15-27 above, and further in view of Horton et al. (US 20200411013).
	Regarding claims 11-13, Chen fails to teach and/or suggest language detection in a video/audio data stream.
	Horton, in the same field of endeavor (natural language processing), teaches a well-known example of language detection in video/audio stream data (pars. 43, 53, and 142).
	It would have been obvious to one of ordinary skill in the art at the time the invention was made to modify the natural language processing of Chen to include the language detection methods/steps taught by Horton. The transcription process may include an automatic language detection process which operates to analyze speech in an audio file and distinguish the languages spoken by the speakers. The automatic language detection process provides the ability to offer notification, alerting, and routing options based on the spoken languages, such as real-time notification when a speaker utters certain words or phrases in a particular language, and may deliver language-based statistics that can be used for resource planning and other management-level tasks at a facility implementing the system.
Therefore, it would have been obvious to combine Chen and Muthlah with Horton to obtain the invention as specified in claims 11-13.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THIERRY L PHAM whose telephone number is (571)272-7439. The examiner can normally be reached M-F, 11-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THIERRY L PHAM/            Primary Examiner, Art Unit 2674