DETAILED ACTION
1.	This communication is in response to the Amendments and Arguments filed on 7/12/2022. Claims 1-20 are pending and have been examined. 
Response to Amendments and Arguments
2.	 Applicant's arguments with respect to claim rejections under 35 U.S.C. 103 have been fully considered, but they are not persuasive. In particular, the applicant argues that, with respect to the amended independent claims, the references do not teach: “presenting the first text information and the second text information in a single window, and displaying the first text information and in the second text information in a corresponding manner .. modifying the first text information based on the second text information.” In response, the examiner respectfully disagrees. 
Note that CARRAUX teaches: [0003] "display of text in response to their speech" and JOHNSON teaches: [0059] “If the user selects the merge option (stage 740), the windows or panes containing related fields may be merged together and displayed in a single window or pane (stage 750). If the viewing window or pane is large enough, all of the merged windows or panes may appear on the display device simultaneously. If not, the merged windows or panes may appear in the same window or pane, but scrolling may be required to see all of the information. Even where scrolling is required, placing the information in a single window or pane simplifies a user's task by pushing the necessary data to the user, resulting in efficient error resolution.” Further note that “in a corresponding manner” is ambiguous and can be interpreted broadly.
THELEN teaches: [Abstract] “transcription of spoken and written utterances .. the utterances undergo speech or text recognition, and the recognition result (ME) is combined with a manually created transcription (MT) of the utterances in order to obtain the transcription” and [0010-0011] “retaining the process of the manual transcription of a spoken or written utterance as such, but then supporting it with pattern recognition .. An utterance is manually transcribed in order to be subsequently combined with the pattern-recognition result of the utterance. Since the pattern-recognition result adds additional information to the manual transcription, the human transcriber can take this into account in his working method in order to make the manual transcription e.g. faster or more convenient for him to produce ..”
Also note that if the applicant chooses to file another RCE as a response to this Office action, some recited references will be consolidated into a single reference that still teaches the recited limitations.
Claim Rejections - 35 USC § 103
3.	Claims 1-4, 7, 10-16, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Carraux, et al. (US 20120296645; hereinafter CARRAUX) in view of Chandler, et al. (US 6477491; hereinafter CHANDLER), further in view of Kahn, et al. (WO 2000046787A2; hereinafter KAHN), further in view of Johnson, et al. (US 20100138704; hereinafter JOHNSON) and further in view of Thelen, et al. (US 20060167685; hereinafter THELEN).
As per claim 1, CARRAUX (Title: Distributed Speech Recognition Using One Way Communication) discloses “An information processing method, comprising: 
[ receiving first text information through a first input device, wherein the first text information is generated according to a speech heard by a user and inputted by the user using the first input device ];
 	receiving audio information recorded by a second input device, wherein the audio information is generated and recorded according to the speech (CARRAUX, [0023], The device 106 may receive the speech 104 from the user 102 in any way, such as through a microphone); 
performing speech recognition on the audio information to obtain second text information (CARRAUX, [0026], The speech recognition client 140 transmits the speech 104 over a network 116 to a server-side speech recognition engine 120 located on a server 118 (step 204));
[ presenting the first text information and the second text information in a single window, and displaying the first text information and in the second text information in a corresponding manner ] (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for presenting any text>.); and [ modifying the first text information based on the second text information ].”
CARRAUX does not expressly disclose “receiving first text information through a first input device, wherein the first text information is generated according to a speech [ heard by a user and inputted by the user ] using the first input device ..” However, the feature is taught by CHANDLER (Title: System and method for providing speaker-specific records of statements of speakers). 
In the same field of endeavor, CHANDLER teaches: [Abstract] “A speech processing system for the generation of speaker specific text output. To automatically generate a transcript of a trial, hearing, or meeting, the system uses microphones dedicated to specific speakers along with one or more computers with speech recognition software assigned to each microphone. The system tracks the occurrences of speech and assembles a transcript of the participant's spoken words including the speaker's identity and a text version of the spoken words in the order the words were spoken.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of CHANDLER in the system (as taught by CARRAUX) for speech dictation and transcription (also read on text input means).
CARRAUX in view of CHANDLER does not expressly disclose “the first text information is generated according to a speech heard by a user and inputted by the user ..” However, the feature is taught by KAHN (Title: System and method for automating transcription services). 
In the same field of endeavor, KAHN teaches: [col. 2, lines 14-26] “The system further includes means for manually inputting and creating a transcribed file based on humanly perceived contents of the uniquely identified voice dictation file. Thus, for certain voice dictation files, a human transcriptionist manually transcribes a textual version of the audio — using a text editor or word processor — based on the output of the output of the audio player. The system also includes means for automatically converting the voice dictation file into written text. The automatic speech converting means may be a preexisting speech recognition program, such as Dragon Systems' Naturally Speaking, IBM's Via Voice or Philips Corporation's Magic Speech.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of KAHN in the system (as taught by CARRAUX and CHANDLER) for manual transcription and text input.
CARRAUX in view of CHANDLER and KAHN does not expressly disclose “presenting the first text information and the second text information in a single window, and displaying the first text information and in the second text information in a corresponding manner ..” However, the feature is taught by JOHNSON (Title: User interface messaging system and method permitting deferral of message resolution). Note that “in a corresponding manner” can be interpreted broadly.
In the same field of endeavor, JOHNSON teaches: [0059] “If the user selects the merge option (stage 740), the windows or panes containing related fields may be merged together and displayed in a single window or pane (stage 750). If the viewing window or pane is large enough, all of the merged windows or panes may appear on the display device simultaneously. If not, the merged windows or panes may appear in the same window or pane, but scrolling may be required to see all of the information. Even where scrolling is required, placing the information in a single window or pane simplifies a user's task by pushing the necessary data to the user, resulting in efficient error resolution.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of JOHNSON in the system (as taught by CARRAUX, CHANDLER and KAHN) for displaying related text data in a single window in any desired manner.
CARRAUX in view of CHANDLER, KAHN and JOHNSON does not expressly disclose “modifying the first text information based on the second text information.” However, the feature is taught by THELEN (Title: Method and device for the rapid, pattern-recognition-supported transcription of spoken and written utterances). 
In the same field of endeavor, THELEN teaches: [Abstract] “transcription of spoken and written utterances .. the utterances undergo speech or text recognition, and the recognition result (ME) is combined with a manually created transcription (MT) of the utterances in order to obtain the transcription” and [0010-0011] “retaining the process of the manual transcription of a spoken or written utterance as such, but then supporting it with pattern recognition .. An utterance is manually transcribed in order to be subsequently combined with the pattern-recognition result of the utterance. Since the pattern-recognition result adds additional information to the manual transcription, the human transcriber can take this into account in his working method in order to make the manual transcription e.g. faster or more convenient for him to produce ..”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of THELEN in the system (as taught by CARRAUX, CHANDLER, KAHN and JOHNSON) for modifying human transcription based on speech recognition results.
As per claim 2 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “wherein performing speech recognition comprises: 
recognizing identity information corresponding to the audio information; and recognizing the second text information expressed in the speech of the audio information, wherein the second text information is associated with the identity information (CHANDLER, [Abstract], A speech processing system for the generation of speaker specific text output. To automatically generate a transcript of a trial, hearing, or meeting, the system uses microphones dedicated to specific speakers along with one or more computers with speech recognition software assigned to each microphone. The system tracks the occurrences of speech and assembles a transcript of the participant's spoken words including the speaker's identity and a text version of the spoken words in the order the words were spoken <Examiner’s Note: with the two texts based on the same speech, the assignment of the same identity information to both texts is simply a system design choice>);
wherein the presenting accordingly comprises: presenting, in a corresponding manner, the identity information and the corresponding second text information (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for presenting any information as desired>).”
As per claim 3 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “sending the audio information to a server for determining, by the server, identity information corresponding to the audio information (CARRAUX, [0026], The speech recognition client 140 transmits the speech 104 over a network 116 to a server-side speech recognition engine <read on a ready mechanism for recognizing/determining identity information, provided that the identity information is part of the speech. Specification [0053] ‘the first text information can include the name of a corresponding speaker. As such, the first text information can present the identity of the speaker and the content more intuitively. Specifically, for example, the first text information can be "Xiao Ming says, 'Xiao Zhang owes me 100 dollars ...’>); and receiving the identity information corresponding to the audio information fed back from the server (CARRAUX, [Abstract], The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client <read on a ready mechanism for receiving feedback of any information>); wherein the presenting accordingly comprises: presenting, in a corresponding manner, the identity information and the corresponding second text information (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for presenting any information as desired>).”
As per claim 4 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “associating the first text information and the audio information received in adjacent time (CARRAUX, [0023], The device 106 may receive the speech 104 from the user 102 <‘in adjacent time’ can be broadly interpreted and is simply a system design choice>), to present, in a corresponding manner, the first text information and the second text information that is obtained by recognizing the audio information (CARRAUX, [0003], display of text in response to their speech <read on speech recognition and a ready mechanism for presenting any texts as desired>).”
As per claim 7 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “modifying the first text information according to input from the first input device (Examiner’s Note: This limitation is confusing based on Claim 1 ‘receiving first text information through a first input device’ and Specification [0077] ‘the court clerk can modify the first text information according to the second text information’ where the second text information is based on input of the second input device); and outputting the modified first text information (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for outputting any text as desired>).”
Claim 10 (similar in scope to claim 1) is rejected under the same rationale as applied above for claim 1. 
As per claim 11 (dependent on claim 10), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “a network communication unit configured to send the audio information or representation of the audio information to a server for performing, by the server, speech recognition and configured to receive second text information obtained by the speech recognition and fed back from the server (CARRAUX, [Abstract], A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network .. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client).” 
As per claim 12 (dependent on claim 10), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “wherein the processor is configured to perform typesetting according to the correspondence relationship between the first text information and the second text information, wherein the typeset first text information and second text information are used for presentation (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for presenting any text format/typeset as desired. Note that ‘typesetting according to the correspondence relationship’ can be broadly interpreted. Because the first text and the second text are based on the same speech, they inherently have a ‘correspondence relationship’ so any display of them satisfies the claimed limitation. The applicant must make the limitation more specific>).”
Claims 13-14, 15 and 16 (similar in scope to claims 1-2, 12 and 4, respectively) are rejected under the same rationale as applied above for claims 1-2, 12 and 4.
As per claim 20 (dependent on claim 13), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “presenting the first text information in a first region; and presenting the second text information in a second region, wherein the first region and the second region are located in the same interface (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism and interface for presenting any text as desired. Note that ‘a first region .. a second region’ can be broadly interpreted. Because the first text and the second text certainly cannot overlap each other, they of course must be presented in ‘different’ regions>).”

4.	Claims 5-6, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN and further in view of Wu, et al. (CN 106372122A; hereinafter WU).
As per claim 5 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “[ performing semantic matching ] on the first text information with the second text information of the audio information that is generated within a designated time frame to obtain the second text information corresponding to the first text information (see Claim 1, where designated time frame is a system design choice).”
CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN does not expressly disclose “performing semantic matching ..” However, the feature is taught by WU (Title: Wiki semantic matching-based document classification method and system).
In the same field of endeavor, WU teaches: [Abstract] “a wiki semantic matching-based document classification .. (1) obtaining a keyword set of a text document by utilizing keyword matching for each text document D in a document set, and performing matching in a wiki semantic reference space by utilizing a matching rule to obtain a reference concept set related to the text documents ..”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of WU in the system (as taught by CARRAUX, CHANDLER, KAHN, JOHNSON and THELEN) to provide semantic matching for correlating two texts.
As per claim 6 (dependent on claim 5), CARRAUX in view of CHANDLER, KAHN, JOHNSON, THELEN and WU further discloses “wherein performing semantic matching further comprises: setting a reference time as the time when the first text information is received; and setting the designated time frame according to the reference time, the reference time being within the designated time frame (WU, [Abstract], a wiki semantic matching-based document classification <Examiner’s Note: setting a reference time and a time frame is a system design choice>).”
Claims 17-18 (similar in scope to claims 5-6) are rejected under the same rationale as applied above for claims 5-6.
5.	Claims 8-9, 19 are rejected under 35 U.S.C. 103 as being unpatentable over CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN and further in view of Batchilo, et al. (WO 2002037223A2; hereinafter BATCHILO).
As per claim 8 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “when [ a triggering event ] occurs in the first text information or in the second text information, [ playing back the audio information ] corresponding to the first text information or the second text information in which the triggering event occurs.” 
CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN does not expressly disclose “a triggering event .. playing back the audio information ..” However, the feature is taught by BATCHILO (Title: Computer based integrated text and graphic document analysis).
In the same field of endeavor, BATCHILO teaches: “user can also select (click on) a "speak" button which will activate computer speech module which "reads" and "speaks" to the user the text segments ..” and “clicking on a specific displayed text segment, can display the full text including from several lines before to several lines of text after the selected text segment,”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of BATCHILO in the system (as taught by CARRAUX, CHANDLER, KAHN, JOHNSON and THELEN) to enable user clicking to activate any action such as audio playback.
As per claim 9 (dependent on claim 1), CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN further discloses “when [ a triggering event ] occurs in the first text information, [ displaying, with a designated style, the second text information ] corresponding to the first text information; or when the triggering event occurs in the second text information, displaying, with a designated style, the first text information corresponding to the second text information (CARRAUX, [0003], display of text in response to their speech <read on a ready mechanism for presenting any text in any chosen style as desired>).” 
CARRAUX in view of CHANDLER, KAHN, JOHNSON and THELEN does not expressly disclose “a triggering event .. displaying, with a designated style, the second text information ..” However, the feature is taught by BATCHILO (Title: Computer based integrated text and graphic document analysis).
In the same field of endeavor, BATCHILO teaches: “clicking on a specific displayed text segment, can display the full text including from several lines before to several lines of text after the selected text segment ..” 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of BATCHILO in the system (as taught by CARRAUX, CHANDLER, KAHN, JOHNSON and THELEN) to enable user clicking to activate any action such as text display with any style as system design choice.
Claim 19 (similar in scope to claim 8) is rejected under the same rationale as applied above for claim 8.
Conclusion
6.	THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).   
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 		
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:30-5:00). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on 571-272-7799. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/		10/13/2022
Primary Examiner, Art Unit 2659