Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on October 18, 2021 has been entered. 
The amendments to claims 1, 4-9, 13, 22, and 23 have been acknowledged and entered.
In view of the amendment to claim(s) 7-9 and 13, the objection to claim(s) 7-9 and 13 is withdrawn.
In view of the amendment to claim(s) 1, 4-6, 22, and 23, new objections are provided in the response below.
In view of the amendment to claim(s) 1, 4-9, 13, 22, and 23, the rejection of claims 1-23 under 35 U.S.C. §102 and 103 is withdrawn.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102/103, see pages 9-10 of the Response to Non-Final Office Action dated May 19, 2021, which was received on October 18, 2021 (hereinafter Response and Office Action
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of the previously cited references in view of newly cited references Stefani (U.S. Pat. No. 10,777,186, hereinafter Stefani) and Kashima (U.S. Pat. App. Pub. No. 2008/0300876, hereinafter Kashima).
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.

Claim Objections
Claims 1, 5-6, 22, and 23 are objected to because of the following informalities:  
In claims 5 and 6, the word “instuctions” should read “instructions.”
In claims 1, 22, and 23, the phrase “…in view of the evaluation according to a natural language grammar” should read “…in view of the evaluation according to the natural language grammar.”
Appropriate correction is required.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 4 is objected to under 37 CFR 1.75(c), as being of improper dependent form for failing to further limit the subject matter of a previous claim. Applicant is required to cancel the claim(s), or amend the claim(s) to place the claim(s) in proper dependent form, or rewrite the claim(s) in independent form. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7, 9-10, and 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over Crinon (U.S. Pat. App. Pub. No. 2008/0295040, hereinafter Crinon) in view of Stefani.

Regarding claim 1, Crinon discloses A conferencing endpoint comprising: (“real time conferencing component 102”; Crinon, ¶ [0022]); an audio input interface; a network interface; a processor; system memory coupled to the processor and storing instructions configured to cause the processor to (“real time conferencing component 102 can [include] an input component 202 (audio input interface) that can obtain the audio data” and real time conferencing component 102 can be substantially similar to exemplary computing system 1112, including “a processing unit 1114 (processor), a system memory 1116, and a system bus 1118... [which] couples system components including... the system memory 1116 to the processing unit 1114” and a “network interface 1148”; Crinon, ¶¶ [0027], [0063], [0069]): coordinate connection of the conferencing endpoint via the network interface to a network conference including one or more other conferencing endpoints (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network, . . .) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104; Crinon, ¶¶ [0023], [0069], [0022]); capture an audio stream that includes speech from the audio input interface (“the input component 202 can be a microphone that can capture the audio data and generate electrical impulses.”; Crinon, ¶ [0028]); transcribe the audio stream into a caption stream… (“text streaming component 106 can further include a speech to text conversion component 204 that converts the audio data to text data [and] can employ a speech recognition engine that can convert digital signals corresponding to the audio data to phonemes, words, and so forth, therefore recognizing speech.” “Moreover, the speech to text conversion component 204 can process continuous speech,” thus creating a text stream (caption stream) from the recognized speech.; Crinon, ¶ [0029]); multiplex the audio stream and the caption stream into a transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text, . . .). For example, the timestamps can be in the real time transport protocol (RTP) used by real time communication systems,” where “Separate streams of data including timestamps can be generated ... and the streams can be multiplexed over the RTP.” Therefore, the audio stream and the text stream (caption stream) are multiplexed into a transport stream.; Crinon, ¶ [0051]); and send the transport stream to the one or more other conferencing endpoints via the network interface (“Separate streams of data including timestamps can be generated ... and the streams can be multiplexed over the RTP,” where the audio stream and the text stream (caption stream) as part of a multiplexed transport stream, are transported (over the RTP) which are “correlated... for presentation to listening participants in the real time teleconference.” Further, the “receiving endpoints can utilize timestamps to identify correlation between data within the separate streams”, thus the transport stream was received by the receiving endpoints, also referred to as one or more disparate real time conferencing components 104 (conferencing endpoints).; Crinon, ¶¶ [0049], [0051]). However, Crinon fails to expressly recite convert the speech into a transcription hypothesis; evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition; and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar.
Stefani teaches systems and methods for “streaming real-time automatic speech recognition (ASR).” (Stefani, Col. 2, line 1). Regarding claim 1, Stefani teaches transcribe the audio stream into a caption stream, including: convert the speech into a transcription hypothesis (" For each chunk of audio data, the streaming ASR engine can use the acoustic model to break {convert} the audio data {the speech} into a series of words {transcription hypothesis}"; Stefani, Col. 4, lines 45-50); evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition ("models 126 can include an acoustic model and a language model… [where] The language model includes grammar rules {natural language grammar}, language constructs, and other language-specific nuances." and the output of the acoustic model, which is "a series of words" derived from the audio data {transcription hypothesis}, "can be passed through {evaluate...} the language model {according to natural language grammar} to identify phrases and/or sentences corresponding to the series of words identified by the acoustic model {...to inform speech recognition}."; Stefani, Col. 4, lines 38-40 and lines 48-55); and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar ("The output of the language model can be passed through a search algorithm to identify a highest confidence transcription" selected from "a plurality of hypotheses associated with each resulting sentence (or series of words)" where "the real-time transcription may be used to provide closed captions of a live event,"; Stefani, Col. 5, lines 1-6; Col. 7, lines 20-25).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Stefani to include convert the speech into a transcription hypothesis; evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition; and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar. The systems and methods described in Stefani can “automatically generate [accurate] transcripts of the speech [from] the audio data stream.” (Stefani, Col. 2, lines 2-17).
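For illustration only, the flow mapped to Stefani above (an acoustic model yields candidate word sequences, a language model containing grammar rules evaluates them, and the highest-confidence candidate becomes the caption) might be sketched as follows. The sketch is not taken from Stefani, Crinon, or the application. The names Hypothesis, ALLOWED_BIGRAMS, grammar_score, and best_caption, the use of a bigram set as a stand-in "grammar," and all score values are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One candidate transcription with an (invented) acoustic score."""
    words: Tuple[str, ...]
    acoustic_score: float  # higher is better; values are illustrative only


# Hypothetical toy "grammar": the set of word pairs considered well-formed.
ALLOWED_BIGRAMS = {
    ("recognize", "speech"),
    ("a", "nice"),
    ("nice", "beach"),
}


def grammar_score(words: Tuple[str, ...]) -> float:
    """Fraction of adjacent word pairs permitted by the toy grammar."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(pair in ALLOWED_BIGRAMS for pair in pairs) / len(pairs)


def best_caption(hypotheses: Iterable[Hypothesis],
                 grammar_weight: float = 1.0) -> str:
    """Pick the hypothesis with the best combined score and render it as text."""
    best = max(
        hypotheses,
        key=lambda h: h.acoustic_score + grammar_weight * grammar_score(h.words),
    )
    return " ".join(best.words)
```

With grammar_weight set to 0.0 the selection falls back to the acoustic score alone, which is one way to see how the grammar evaluation informs the result in this toy setting.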

Regarding claim 2, the rejection of claim 1 is incorporated. Crinon further discloses further comprising a video input interface (“it is contemplated that the real time conferencing component 102 (e.g., via the input component 202) can receive video data (not shown) along with the audio data,” thus the input component 202 includes a video input interface.; Crinon, ¶ [0027]); the system memory further storing instructions configured to cause the processor to capture a video stream from the video input interface (“the input component 202 can obtain audio data and/or video data from a participant in a teleconference (e.g., the active speaker).”; Crinon, ¶ [0033]); and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprise instructions configured to multiplex the audio stream, the caption stream, and the video stream into the transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text, . . .),” thus the audio stream, the video stream, and the text stream (caption stream), where the “separate streams of data including timestamps can be generated (e.g., at a sending endpoint, an AVMCU, . . .), and the streams can be multiplexed over the RTP,” thus the streams are multiplexed.; Crinon, ¶ [0051]).
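As a non-authoritative illustration of the timestamp-based multiplexing discussed above, separate audio, caption, and video streams that each carry timestamps can be interleaved into a single ordered stream. The sketch below is not Crinon's implementation and does not use RTP; the Packet type, its fields, and the multiplex function are hypothetical, and each input stream is assumed to be already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    """Hypothetical timestamped unit of media data."""
    timestamp_ms: int
    kind: str        # "audio", "caption", or "video"
    payload: bytes


def multiplex(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Interleave per-stream packets into one stream ordered by timestamp.

    Assumes every input stream is already sorted by timestamp_ms.
    Packets with equal timestamps keep the order of the input streams.
    """
    return heapq.merge(*streams, key=lambda p: p.timestamp_ms)
```

A receiver could then group packets that share a timestamp to present the caption together with the corresponding audio and video.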

Regarding claim 3, the rejection of claim 1 is incorporated. Crinon further discloses wherein instructions configured to coordinate connection to a network conference comprise instructions configured to coordinate connection to a plurality of conferencing endpoints including a first conferencing endpoint and a second endpoint (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network, . . .) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104. As shown in FIG. 4, the system can include “any number of receiving endpoints (e.g., a receiving endpoint 1 404, [and] a receiving endpoint 2 406),” which, in this embodiment,  is the first conferencing endpoint and the second conferencing endpoint, respectively.; Crinon, ¶¶ [0023], [0069], [0022]); and wherein instructions configured to send the transport stream to the one or more other conferencing endpoints comprises instructions configured to: send the transport stream directly to the first conferencing endpoint (“It is to be appreciated that the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and an endpoint. 
In this embodiment, the real time conferencing component 102 can, as “the AVMCU 402,” “route data to non-speaking participants.” Since, in this embodiment, the real time conferencing component 102 is also “the sending endpoint 302 ... associated with the active speaker,” the transport stream is sent directly to the receiving endpoint 1 (the first conferencing endpoint).; Crinon, ¶¶ [0022], [0038], FIG. 4); and send the transport stream directly to the second conferencing endpoint (In the same way, using the example provided in FIG. 4, the real time conferencing component 102 is Crinon, ¶¶ [0022], [0038], FIG. 4).

Regarding claim 7, the rejection of claim 1 is incorporated. Crinon further discloses further comprising a display interface and an audio output interface (the endpoints can include “output component 306 … [including] a display (e.g., monitor, television, projector,…) to present video data and/or text data” which forms the display interface, and “the output component 306 can comprise one or more speakers to render audio output,” which forms the audio output interface.; Crinon, ¶ [0035]); the system memory further storing instructions configured to: receive an other transport stream, including an other audio stream and an other caption stream corresponding to the other audio stream, directly from an other conferencing endpoint (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 302. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints.; Crinon, ¶¶ [0040], [0032], [0022], [0051]); and coordinate outputs at the conferencing endpoint, including coordinating output of the other audio stream at the audio output device with output of the other caption stream at the display interface (“The real time conferencing component 102 can Crinon, ¶¶ [0023], [0069], [0022], [0051]).

Regarding claim 9, the rejection of claim 7 is incorporated. Crinon further discloses wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 302. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints.; Crinon, ¶¶ [0040], [0032], [0022], [0051]); wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the other video stream in a window at the display interface (“at a receiving endpoint (e.g., the real time conferencing component 102, the receiving endpoint 304 of FIG. 3, the receiving endpoints 404-408 of FIGS. 4 and 5, . . .), when a video frame is received, data can be decoded to render the video frame while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]); and present the other caption stream in the window (“while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]).

Regarding claim 10, the rejection of claim 7 is incorporated. Crinon further discloses wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: determine that caption presentation is toggled off (“Pursuant to another example, the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning”; Crinon, ¶ [0040]); and not present the other caption stream at the display device in response to the determination (“hence, if closed captioning is disabled, the sending endpoint 302 can sent audio data and video data to the AVMCU 402 without text data, for instance.”; Crinon, ¶ [0040]).

Regarding claim 20, the rejection of claim 1 is incorporated. Crinon further discloses the system memory further storing instructions configured to mute the audio stream and suspend multiplexing the audio stream into the transport stream (“Hence, a speaker (e.g., the output component N 414) associated with the receiving endpoint N 408 can be muted,” where “the action can be triggered in the receiving endpoint N 408 by a mute button on a user interface,” and “In response to the request, the AVMCU 402 can halt sending of the audio data to the receiving endpoint N 408, and the text data can be transmitted instead with the video data,” where Crinon, ¶ [0043]).

Regarding claim 21, the rejection of claim 1 is incorporated. Crinon further discloses the system memory further storing instructions configured to mute the caption stream and suspend multiplexing the caption stream into the transport stream (“In the manual negotiation scenario, the participant employing each of the receiving endpoints 404-408 can select whether closed captions are desired,” where, in light of the request to receive text,  “the AVMCU 402 can forward text data to the receiving endpoint 2 406,” where a participant can choose the transmission of “text data or audio data” or “both text data and audio data”; Crinon, ¶ [0043]).

Regarding claim 22, Crinon discloses A method of generating a combined caption stream, the method comprising (“the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof”; Crinon, ¶ [0021]): performing automatic speech recognition on speech included in a first audio stream to create a first caption stream… (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein” where the “text streaming component 106 can further include a speech to text conversion component 204 that converts the audio data to text data… [which] can employ a speech recognition engine that can convert digital signals corresponding to the audio data to phonemes, words, and so forth…” which allows the system to “process continuous speech and/or isolated or discrete speech.” The real time conferencing component 102, when acting as the sending endpoint 302, uses the “the input component 202 … [to] capture the audio data and generate Crinon, ¶¶ [0040], [0029], [0028]) performing automatic speech recognition on other speech included in a second audio stream to create a second caption stream (the system further includes “the case where there are multiple concurrent active speakers in the conference and text streams are available for each of these participants” Thus, the system can have multiple active sending endpoints 302. The real time conferencing component 104, when acting as a second sending endpoint 302, uses the “the input component 202 … [to] capture the audio data and generate electrical impulses,” thus producing a second audio stream having other speech included. 
Then, the second audio stream is converted to a second text data (second caption stream) through the text streaming component 106 using the speech recognition engine (performing automatic speech recognition); Crinon, ¶¶ [0043], [0029], [0028]); combining the first caption stream and the second caption stream into a combined caption stream (In the case of “audio stream… represent[ing] a combination of all active speakers,” thus a combination audio stream, “the AVMCU 402 can… elect to send several text streams, each corresponding to one active speech track” and where “Separate streams of data including timestamps can be generated (e.g., at a sending endpoint, an AVMCU, . . .), and the streams can be multiplexed” the first caption stream and the second caption stream can be multiplexed together (combined) into a combined caption stream.; Crinon, ¶¶ [0041], [0051]); and transmitting the combined caption stream over a network (“The examples mentioned above can be extended to the case where there are multiple concurrent active speakers in the conference and text streams are available for each of these participants in which case manual selection can include the choice of which closed captions stream is selected for viewing in the receiving endpoint.” where text data (text streams) are “multiplexed over the RTP” and sent to receiving endpoints. Thus, both the first and the second caption streams can be transferred over the network for display at the display interface; Crinon, ¶¶ [0043], [0051]). Crinon fails to expressly recite converting the speech into a transcription hypothesis; evaluating the transcription hypothesis according to a natural language grammar to inform speech recognition; and generating the first caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar.
The relevance of Stefani is described above with relation to claim 1. Regarding claim 22, Stefani teaches converting the speech into a transcription hypothesis (" For each chunk of audio data, the streaming ASR engine can use the acoustic model to break {convert} the audio data {the speech} into a series of words {transcription hypothesis}"; Stefani, Col. 4, lines 45-50); evaluating the transcription hypothesis according to a natural language grammar to inform speech recognition ("models 126 can include an acoustic model and a language model… [where] The language model includes grammar rules {natural language grammar}, language constructs, and other language-specific nuances." and the output of the acoustic model, which is "a series of words" derived from the audio data {transcription hypothesis}, "can be passed through {evaluate...} the language model {according to natural language grammar} to identify phrases and/or sentences corresponding to the series of words identified by the acoustic model {...to inform speech recognition}."; Stefani, Col. 4, lines 38-40 and lines 48-55); and generating the first caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar ("The output of the language model can be passed through a search algorithm to identify a highest confidence transcription" selected from "a plurality of hypotheses associated with each resulting sentence (or series of words)" where "the real-time transcription may be used to provide closed captions of a live event,"; Stefani, Col. 5, lines 1-6; Col. 7, lines 20-25).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Stefani to include converting the speech into a transcription hypothesis; evaluating the transcription hypothesis according to a natural language grammar to inform speech recognition; and generating the first caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar. The systems and methods described in Stefani can “automatically generate [accurate] transcripts of the speech [from] the audio data stream.” (Stefani, Col. 2, lines 2-17).

Regarding claim 23, Crinon discloses A method comprising (“the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof”; Crinon, ¶ [0021]): coordinating connection of the conferencing endpoint via the network interface to a network conference including one or more other conferencing endpoints (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network, . . .) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104; Crinon, ¶¶ [0023], [0069], [0022]); capture an audio stream that includes speech from the audio input interface (“the input component 202 can be a microphone that can capture the audio data and generate electrical impulses.”; Crinon, ¶ [0028]); transcribe the audio stream into a caption stream… (“text streaming component 106 can further include a speech to text conversion component 204 that converts the audio data to text data [and] can employ a speech recognition engine that can convert digital signals corresponding to the audio data to phonemes, words, and so forth, therefore recognizing speech. 
“Moreover, the speech to text conversion component 204 can process continuous speech,” thus creating a text stream (caption stream) from the recognized speech.; Crinon, ¶ [0029]); multiplex the audio stream and the caption stream into a transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text, . . .). For example, the timestamps can be in the real time transport protocol (RTP) used by real time communication systems,” where “Separate streams of data including timestamps can be generated ... and the streams can be multiplexed over the RTP.” Therefore, the audio stream and the text stream (caption stream) are multiplexed into a transport stream.; Crinon, ¶ [0051]); and send the transport stream to the one or more other conferencing endpoints via the network interface (“Separate streams of data including timestamps can be generated ... and the streams can be multiplexed over the RTP,” where the audio stream and the text stream (caption stream) as part of a multiplexed transport stream, are transported (over the RTP) which are “correlated... for presentation to listening participants in the real time teleconference.” Further, the “receiving endpoints can utilize timestamps to identify correlation between data within the separate streams”, thus the transport stream was received by the receiving endpoints, also referred to as one or more disparate real time conferencing components 104 (conferencing endpoints).; Crinon, ¶¶ [0049], [0051]). However, Crinon fails to expressly recite convert the speech into a transcription hypothesis; evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition; and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar.
The relevance of Stefani is described above with relation to claim 1. Regarding claim 23, Stefani teaches transcribe the audio stream into a caption stream, including: convert the speech into a transcription hypothesis ("For each chunk of audio data, the streaming ASR engine can use the acoustic model to break {convert} the audio data {the speech} into a series of words {transcription hypothesis}"; Stefani, Col. 4, lines 45-50); evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition ("models 126 can include an acoustic model and a language model… [where] The language model includes grammar rules {natural language grammar}, language constructs, and other language-specific nuances." and the output of the acoustic model, which is "a series of words" derived from the audio data {transcription hypothesis}, "can be passed through {evaluate...} the language model {according to natural language grammar} to identify phrases and/or sentences corresponding to the series of words identified by the acoustic model {...to inform speech recognition}."; Stefani, Col. 4, lines 38-40 and lines 48-55); and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar ("The output of the language model can be passed through a search algorithm to identify a highest confidence transcription" selected from "a plurality of hypotheses associated with each resulting sentence (or series of words)" where "the real-time transcription may be used to provide closed captions of a live event,"; Stefani, Col. 5, lines 1-6; Col. 7, lines 20-25).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Stefani to include convert the speech into a transcription hypothesis; evaluate the transcription hypothesis according to a natural language grammar to inform speech recognition; and generate the caption stream from the transcription hypothesis in view of the evaluation according to a natural language grammar. The systems and methods described in Stefani can “automatically generate [accurate] transcripts of the speech [from] the audio data stream.” (Stefani, Col. 2, lines 2-17). 

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claim 1 above, and further in view of Stahl (U.S. Pat. App. Pub. No. 2018/0182385, hereinafter Stahl).

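Regarding claim 5, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly disclose wherein instructions configured to evaluate the transcription hypothesis according to a natural language grammar comprise inst[r]uctions configured to evaluate the transcription hypothesis according to a user-specific grammar.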
Stahl teaches “systems, methods, and algorithms that use speech characterization to condition automatic speech recognition and parsing according to natural language grammars.” (Stahl, ¶ [0007]). Regarding claim 5, Stahl teaches wherein instructions configured to evaluate the transcription hypothesis according to a natural language grammar comprise inst[r]uctions configured to evaluate the transcription hypothesis according to a user-specific grammar ("Interpreter module 22 consumes the UID and uses it to condition its interpretation according to a grammar" where "some embodiments maintain databases of UID-specific interpretation weights," where UID-specific interpretation weights are a user-specific grammar.; Stahl, ¶ [0033]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Stahl to include wherein instructions configured to evaluate the transcription hypothesis according to a natural language grammar comprise inst[r]uctions configured to evaluate the transcription hypothesis according to a user-specific grammar. Natural language grammar can be employed as part of an “improved approach for generating interpretations of speech inputs”, as recognized by Stahl. (Stahl, ¶¶ [0006], [0007]).

Regarding claim 6, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail(s) to expressly disclose wherein the natural language grammars include a topic-specific grammar.
The relevance of Stahl is described above with relation to claim 5. Regarding claim 6, Stahl teaches wherein instructions configured to evaluate a transcription hypothesis according to a natural language grammar comprise inst[r]uctions configured to evaluate the transcription hypotheses according to a topic-specific grammar ("Some embodiments allow for grammar rules related to mature or offensive subject matter," where mature or offensive subject matter is a topic.; Stahl, ¶ [0056]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Stahl to include wherein instructions configured to evaluate a transcription hypothesis according to a natural language grammar comprise inst[r]uctions configured to evaluate the transcription hypotheses according to a topic-specific grammar. Natural language grammar can be employed as part of an “improved approach for generating interpretations of speech inputs”, as recognized by Stahl. (Stahl, ¶¶ [0006], [0007]).

Claim 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon, Stefani, and Stahl as applied to claim 6 above, and further in view of Kashima.

Regarding claim 4, the rejection of claim 6 is incorporated. Crinon, Stefani, and Stahl disclose all of the elements of the current invention as stated above. However, Crinon, Stefani, and Stahl fail(s) to expressly disclose wherein instructions configured to recognize speech comprise instructions configured to evaluate transcription hypotheses according to natural language grammars to inform speech recognition.
Kashima teaches “a speech recognizing device and the like that performs recognition of natural speech using a speech application program… of grammar method.” (Kashima, ¶ [0002]). Regarding claim 4, Kashima discloses wherein instructions configured to evaluate the transcription hypothesis according to a topic-specific grammar comprise instructions configured to match a requisite number of words in the audio stream to words in the topic-specific grammar ("The candidate sentence generating means 19 evolves all the phrases that …"; Kashima, ¶¶ [0044], [0090], [0096]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, and the speech characterization systems of Stahl, to incorporate the teachings of Kashima to include wherein instructions configured to recognize speech comprise instructions configured to evaluate transcription hypotheses according to natural language grammars to inform speech recognition. The systems and methods of Kashima can recognize “natural speech by dictation” while avoiding the “necessity of collecting a vast amount of interaction data specific to each application and preparing a statistical language model”. (Kashima, ¶¶ [0009], [0010]).

Claims 8 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claims 1 and 7 above, and further in view of Casagrande (U.S. Pat. App. Pub. No. 2017/0171600, hereinafter Casagrande).

Regarding claim 8, the rejection of claim 7 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. Crinon further discloses wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the …”; Crinon, ¶¶ [0040], [0032], [0022], [0051]); and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the video stream in a window at the display interface (“at a receiving endpoint (e.g., the real time conferencing component 102, the receiving endpoint 304 of FIG. 3, the receiving endpoints 404-408 of FIGS. 4 and 5, . . .), when a video frame is received, data can be decoded to render the video frame while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]); and present the other caption stream in a… window supplementing presentation of the video in the window (“while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.” Thus, the system includes presenting the closed captions in a window. The captions correspond to the video frame, thus supplementing the presentation of the video in the window; Crinon, ¶ [0050]). However, Crinon and Stefani fail(s) to expressly disclose present the other caption stream in a different window.
Casagrande teaches systems and methods for synchronizing second screen content with audio/video programming. (Casagrande, ¶ [0002]). Regarding claim 8, Casagrande teaches present the other caption stream in a different window supplementing presentation of the video in the window (the system can “convert the received data to suitably formatted video …”; Casagrande, ¶¶ [0021], [0023]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Casagrande to include wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the video stream in a window at the display interface; and present the other caption stream in a different window supplementing presentation of the video in the window. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]).

Regarding claim 19, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail(s) to expressly disclose wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist; and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints.
The relevance of Casagrande is described above with relation to claim 8. Regarding claim 19, Casagrande teaches wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist (“the second screen electronic device 400 may utilize a ‘black list’ of unauthorized terms, to prevent unwanted caption words from being transmitted to the second screen content module 412 to be used for the retrieval and presentation of second screen content. For example, words designated by the Federal Communications Commission (FCC) as being unfit for broadcast television during daytime hours may be present on the ‘black list’ of words,” thus recognizing a portion included in a blacklist.; Casagrande, ¶ [0058]); and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints (the system can further “prevent the presentation of content associated with profane language. The ‘black list’ of words may be user-configurable, and may include not only profanity, but any word for which a user does not wish to view additional content,” thus the caption stream is adjusted to prevent the presentation of (obscure) the speech portion at the one or more conferencing devices (noting that this is user configurable to not receive black listed content and involves a plurality of “receiving endpoints N 408”, thus at the “one or more” conferencing devices).; Casagrande, ¶ [0058]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Casagrande to include wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist; and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]).

Claim 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claim 10 above, and further in view of Casagrande and Garrido (U.S. Pat. App. Pub. No. 2019/0342351, hereinafter Garrido).

Regarding claim 11, the rejection of claim 10 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly recite the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics or characteristics of speech included in the other audio stream.
The relevance of Casagrande is described above with relation to claim 8. Regarding claim 11, Casagrande teaches the system memory further storing instructions configured to toggle captioning off based on one or more of: … characteristics of speech included in the other audio stream (“the second screen electronic device 400 may utilize a ‘black list’ of unauthorized terms, to prevent unwanted caption words from being transmitted to the second screen content module 412 to be used for the retrieval and presentation of second screen content. For example, words designated by the Federal Communications Commission (FCC) as being unfit for broadcast television during daytime hours may be present on the ‘black list’ of words, to prevent the presentation of content associated with profane language. The ‘black list’ of words may be user-configurable, and may include not only profanity, but any word for which a user does not wish to view additional content,” thus captioning is toggled off based on profanity or unwanted …; Casagrande, ¶ [0058]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Casagrande to include the system memory further storing instructions configured to toggle captioning off based on one or more of: … characteristics of speech included in the other audio stream. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]). However, Crinon, Stefani, and Casagrande fail to expressly recite the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics.
Garrido teaches systems and methods for data stream management in a multi-party conference session. (Garrido, ¶ [0013]). Regarding claim 11, Garrido teaches the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds… The endpoints may generate multiple instances of each type feeds” and “each endpoint 110.1-110.n also may generate priority metadata representing a priority assignment conferred on the data feed(s) output by the respective endpoint.” In some examples, data feeds (though described as image feeds, the “types and content of the data feeds are immaterial”) with “associated priority values below a cut-off threshold may not be [presented] at all,” where “thresholds are determined based on network condition,” thus cutting closed caption feeds off (toggling captioning off) based on network characteristics; Garrido, ¶¶ [0015], [0016], [0039], [0053]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, as modified by the methods for synchronizing second screen content with audio/video programming of Casagrande to incorporate the teachings of Garrido to include the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics or characteristics of speech included in the other audio stream. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Claims 12 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claims 1 and 7 above, and further in view of Garrido.

Regarding claim 12, the rejection of claim 7 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. Crinon further discloses the system memory further storing instructions configured to receive a further transport stream, including a further audio stream and a further caption stream corresponding to the further audio stream, directly from a further conferencing endpoint, (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 302. Likewise, any of the disparate real time conferencing …; Crinon, ¶¶ [0040], [0032], [0022], [0051], [0038]), the further conferencing endpoint included in the one or more other conferencing endpoints (the receiving endpoints N are included in the one or more other conferencing endpoints, as shown in FIG. 4.; Crinon, ¶ [0038], FIG. 4); and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the other caption stream at the display interface with first visual characteristics; and present the further caption stream at the display interface with second visual characteristics (“The examples mentioned above can be extended to the case where there are multiple concurrent active speakers in the conference and text streams are available for each of these participants in which case manual selection can include the choice of which closed captions stream is selected for viewing in the receiving endpoint.” where “text data (text streams)” are “outputt[ed] via a display in the form of closed captions.” Thus, the other caption stream is presented at the display interface with first visual characteristics and the further caption stream is presented at the display interface with second visual characteristics.; Crinon, ¶¶ [0043], [0058]). However, Crinon fail(s) to expressly disclose the second visual characteristics differing from the first visual characteristics.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 12, Garrido teaches the second visual characteristics differing from the first visual characteristics (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds… The endpoints may generate multiple instances of each type feeds” and “each endpoint 110.1-110.n also may generate priority metadata representing a priority assignment conferred on the data feed(s) output by the respective endpoint” and “a feed with a …”; Garrido, ¶¶ [0015], [0016]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Garrido to include the second visual characteristics differing from the first visual characteristics. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Regarding claim 18, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. Crinon further discloses the system memory further storing instructions configured to receive consent to transcribe recognized speech via [a selection process] at the conferencing endpoint (“the listening participants can manually and/or automatically negotiate the use of closed captions,” at the receiving endpoints, where “the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning” thus an end user chooses whether or not to consent to the transcription of text data for closed captioning (transcribe recognized speech); Crinon, ¶¶ [0043], [0040]). However, Crinon fail(s) to expressly disclose wherein the selection process is a meeting registration.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 18, Garrido teaches wherein the selection process is a meeting registration (“the present technology can be configured to allow users to select to ‘opt in’ or ‘opt out’ of participation in the collection of personal information data during registration for services,” where the “participant's …”; Garrido, ¶ [0066]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Garrido to include wherein the selection process is a meeting registration. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claim 12 above, and further in view of Garrido and Le Roux (U.S. Pat. App. Pub. No. 2019/0318725, hereinafter Le Roux).

Regarding claim 13, the rejection of claim 12 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly recite the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream; wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream; wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in an other window.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 13, Garrido teaches wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds (the transport stream)” where “the relay server 120 [receives the] different data feeds... from the endpoints” and then “send[s] one or more media feeds 230.1, 230.3 and associated priorities 232.1, 232.3 to endpoint 110.n.” which performs “conference management of the media feeds 230.1, 230.3 it receives based on the associated priorities 232.1, 232.3,” where media feeds 230.1 is the other transport stream including a video feed (an other video stream).; Garrido, ¶¶ [0015]-[0017]); wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds (the transport stream)” where “the relay server 120 [receives the] different data feeds... from the endpoints” and then “send[s] one or more media feeds 230.1, 230.3 and associated priorities 232.1, 232.3 to endpoint 110.n.” which performs “conference management of the media feeds 230.1, 230.3 it receives based on the associated priorities 232.1, 232.3,” where media feeds 230.3 is the further transport stream including a video feed (a further video stream).; Garrido, ¶¶ [0015]-[0017]); wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds,” where the video feed can include “locally-captured video of an endpoint operator,” thus a person is depicted. Further, “all feeds above a threshold priority may be presented in a primary region of the endpoint display called a canvas,” thus feeds, …; Garrido, ¶¶ [0015], [0039]); and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in an other window (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds,” where the video feed can include “locally-captured video of an endpoint operator,” thus a person is depicted. Further, “feeds below a threshold may be presented in a secondary region of the endpoint display called a roster” thus feeds, such as a caption feed (further caption stream) and a video feed (further video stream), as received from a second endpoint 110, which are below a priority level threshold, will be presented in the roster area (an other window).; Garrido, ¶¶ [0015], [0039]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Garrido to include wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream; wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in an other window. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]). However, Crinon, Stefani, and Garrido fail to expressly recite the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream.
Le Roux teaches systems and methods for “recognizing speech from an acoustic signal with multiple overlapping speakers.” (Le Roux, ¶ [0010]). Regarding claim 13, Le Roux teaches the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream (“a speech recognition system for recognizing speech including overlapping speech by multiple speakers,” where the system is “trained to transform the received acoustic signal into a text for each target speaker” and “to output the text for each target speaker [using] an output interface to transmit the text for each target speaker.”; Le Roux, ¶ [0011]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, as modified by the systems and methods for data stream management in a multi-party conference session of Garrido to incorporate the teachings of Le Roux to include the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream. “In the system without explicit separation, recognition can be optimized directly for recognizing speech from an acoustic signal with multiple overlapping speakers, leading to improved performance,” as recognized by Le Roux. (Le Roux, ¶ [0010]).

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon, Stefani, and Garrido as applied to claim 12 above, and further in view of Calatano (U.S. Pat. App. Pub. No. 2019/0158927, hereinafter Calatano).

Regarding claim 14, the rejection of claim 12 is incorporated. Crinon, Stefani, and Garrido disclose all of the elements of the current invention as stated above. However, Crinon, Stefani, and Garrido fail to expressly recite wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color.
Calatano teaches a “smart closed caption systems for appearance and positioning of closed caption text in video content.” (Calatano, ¶ [0003]). Regarding claim 14, Calatano teaches wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color (“in some embodiments of the present invention, the respective text style for a first character may include at least one of font style, text size, bold, italic, or color that is different from a font style, text size, bold, italic, or color of a second character of the one or more identified speaking characters.” In a first example “a color of a text style of the first character may indicate that a first color should be used for closed caption text that is associated with the first character.” Thus, the first visual characteristics can include instructions to present the other caption stream in a first color; Calatano, ¶ [0068]); and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color (Using the same example of the first character above “a color of a text style of the second character may indicate that a second color, different from the first color (of the first character) is to be used for closed caption text that is associated with the second character.” Thus, the second visual characteristics can include …; Calatano, ¶ [0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, as modified by the systems and methods for data stream management in a multi-party conference session of Garrido to incorporate the teachings of Calatano to include wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color. Modifying the visual representation of the text can “improve recognition of which characters are speaking and their sentiment,” as recognized by Calatano. (Calatano, ¶ [0047]).

Claim 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claim 1 above, and further in view of Foster (U.S. Pat. App. Pub. No. 2008/0064326, hereinafter Foster).

Regarding claim 15, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly disclose the system memory further storing instructions configured to compress the audio stream; and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream.
Foster teaches systems and methods for presenting captions associated with a broadcast media stream. (Foster, ¶ [0001]). Regarding claim 15, Foster teaches the system memory further storing instructions configured to compress the audio stream (“the caption generator 216 embeds or encodes each word by compressing the audio segment (or audio word) 238a or 238n using a known audio compression format (such as MP3, Adaptive Multi-Rate Wideband Codec (AMR-WB), or AccPlus) and then inserting the compressed audio segment or word 402 into a packet 400 along with the corresponding caption 404, which may be compressed using the same technique used to compress each audio segment 238a-238n”; Foster, ¶ [0045]); and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream (where “inserting the compressed audio segment or word 402 into a packet 400 along with the corresponding caption 404” is multiplexing the compressed audio stream and the caption stream into a transport stream; Foster, ¶ [0045]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Foster to include the system memory further storing instructions configured to compress the audio stream; and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream. The systems and methods of Foster allow for the presentation of “captions for any live or pre-recorded content” such that deaf or hard of hearing people can enjoy such content without “specialized equipment,” as recognized by Foster. (Foster, ¶¶ [0004], [0005]).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Crinon and Stefani as applied to claim 1 above, and further in view of Bianco (U.S. Pat. App. Pub. No. 2015/0100315, hereinafter Bianco).

Regarding claim 16, the rejection of claim 1 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly disclose the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly.
Bianco teaches systems and methods for improving the quality for Internet Protocol (IP) communications. (Bianco, ¶ [0002]). Regarding claim 16, Bianco teaches the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly (“because the textual representation of a user's spoken audio input can be encapsulated in far fewer digital data packets than the data created by a CODEC, it is possible to redundantly send multiple copies of the textual representation data, or perform error correction techniques, to ensure that a substantially complete copy of the textual representation data arrives at the destination device.”; Bianco, ¶ [0046]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Bianco to include the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly. “If a portion of the digital data created by the CODEC is missing or corrupted, the corresponding portion of the transcription is used to fill in the missing portion,” thus helping to ensure the integrity of the transmission, as recognized by Bianco. (Bianco, ¶ [0044]).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Crinon, Stefani, and Bianco as applied to claim 16 above, and further in view of Thijssen (U.S. Pat. No. 6,230,163, hereinafter Thijssen).

Regarding claim 17, the rejection of claim 16 is incorporated. Crinon, Stefani, and Bianco disclose all of the elements of the current invention as stated above. However, Crinon, Stefani, and Bianco fail to expressly disclose wherein the redundancy is by use of a forward error correction code.
Thijssen teaches systems and methods for datastream control and processing. (Thijssen, Col. 1, lines 7-15). Regarding claim 17, Thijssen teaches wherein the redundancy is by use of a forward error correction code (“A final buffer 46 is fed by video decoder 40, for buffering user data such as closed-caption information,” where the user data corresponds to user data 52 having an ECC 54, and where the ECC 54 “contains error protection code, such as the redundancy symbols of a Reed-Solomon code that can be used for correcting a percentage of the symbols that have been received in an incorrect manner,” thus teaching redundancy by use of a forward error correction code (Reed-Solomon is a well-known forward error correction code); Thijssen, Col. 2, lines 62-64; Col. 3, lines 16-20; FIG. 3).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, as modified by the methods for quality improvement in IP transmissions of Bianco, to incorporate the teachings of Thijssen to include wherein the redundancy is by use of a forward error correction code. The use of readily disposable error correction codes allows for minimal buffer use while correcting “symbols that have been received in an incorrect manner,” as recognized by Thijssen. (Thijssen, Col. 3, lines 18-20).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about 





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657