Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on 12 September 2022 has been entered. 
The amendment of claims 1, 4-6, 12, 23-25, 27, and 28, the cancellation of claims 20-22, and the addition of claims 29-46 have been acknowledged and entered.
After entry of this response, claims 1-19 and 23-46 are pending.
In view of the amendment to claim(s) 27 and 28, the objection to claim(s) 27 and 28 is withdrawn.
In view of the amendment to claim(s) 28, the rejection of claim(s) 28 under 35 U.S.C. §112 is withdrawn.
In view of the amendment to claim(s) 1, 4-6, 12, 23-25, 27, and 28, the rejection of claims 1-28 under 35 U.S.C. §103  is withdrawn.
In light of the amended/newly added claims, new grounds for rejection under 35 U.S.C. §102 and 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Prior to entry of this amendment, claims 1-3, 7, 9-10, 20-24, and 26 were rejected under 35 U.S.C. §103 as being unpatentable over Crinon (U.S. Pat. App. Pub. No. 2008/0295040, hereinafter Crinon) in view of Stefani (U.S. Pat. No. 10,777,186, hereinafter Stefani). Claims 5 and 6 were rejected under 35 U.S.C. §103 as being unpatentable over Crinon in view of Stahl (U.S. Pat. App. Pub. No. 2018/0182385, hereinafter Stahl). Claim 4 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon and Stahl, and further in view of Kashima (U.S. Pat. App. Pub. No. 2008/0300876, hereinafter Kashima). Claims 8 and 19 were rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Casagrande (U.S. Pat. App. Pub. No. 2017/0171600, hereinafter Casagrande). Claim 11 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Casagrande and Garrido (U.S. Pat. App. Pub. No. 2019/0342351, hereinafter Garrido). Claims 12 and 18 were rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Garrido. Claim 13 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Garrido and Le Roux (U.S. Pat. App. Pub. No. 2019/0318725, hereinafter Le Roux). Claim 14 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon and Garrido, and further in view of Calatano (U.S. Pat. App. Pub. No. 2019/0158927, hereinafter Calatano). Claim 15 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Foster (U.S. Pat. App. Pub. No. 2008/0064326, hereinafter Foster). Claim 16 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon, and further in view of Bianco (U.S. Pat. App. Pub. No. 2015/0100315, hereinafter Bianco). Claim 17 was rejected under 35 U.S.C. §103 as being unpatentable over Crinon and Bianco, and further in view of Thijssen (U.S. Pat. No. 6,230,163, hereinafter Thijssen).
Claim 25 was rejected under 35 U.S.C. §103 as being obvious over Crinon and further in view of Drewes (U.S. Pat. App. Pub. No. 2017/0133007, hereinafter Drewes). Claim 27 was rejected under 35 U.S.C. §103 as being obvious over Crinon and further in view of Parc (U.S. Pat. App. Pub. No. 2021/0210072, hereinafter Parc).
Applicant’s arguments regarding the written description rejection under 35 U.S.C. §112 and the prior art rejections under 35 U.S.C. §102/103, see pages 17-21 of the Response, received on 12 September 2022, to the Non-Final Office Action dated 10 June 2022 (hereinafter Response and Office Action, respectively), have been fully considered.
Regarding the written description rejection, Applicant asserts equivalence between the limitations “muting… for a period of time and including muting … for a different period of time” and “muting… independently.” This argument is not persuasive. Muting independently means that each function is performed separately and without reliance on the other. The Office understands the term “independently” in this context as being unrelated to time, as the specification discloses only independence of the audio muting function from the caption muting function. By contrast, muting for a period of time calls for exactly that: a “period of time.” The Office understands a period of time as having a defined quantity, a defined beginning time and end time for the event, or some other form of periodicity that would create a period. Though Examiner agrees that “independently” muting is disclosed, Applicant’s specification fails to disclose or teach periods of time for muting of the audio or the captions. Therefore, the rejection as previously presented is maintained.
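For illustration only, the distinction drawn above between muting “independently” and muting “for a period of time” can be sketched in code. The following Python sketch is hypothetical and is not taken from Applicant’s specification or from any cited reference; the names Muxer, mute_audio, mute_captions, and mute_audio_for are assumptions introduced solely for this example. It is intended to show that independent muting needs only separate controls with no notion of duration, whereas muting for a period of time additionally needs a defined beginning and end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Muxer:
    """Hypothetical multiplexer with separately controlled audio and caption streams."""

    audio_muted: bool = False
    captions_muted: bool = False
    # Only the timed variant carries an end time (in seconds); assumed for illustration.
    audio_muted_until: Optional[float] = None

    def mute_audio(self) -> None:
        # Independent control: does not touch the caption state and has no duration.
        self.audio_muted = True

    def mute_captions(self) -> None:
        # Independent control: does not touch the audio state and has no duration.
        self.captions_muted = True

    def mute_audio_for(self, now: float, duration: float) -> None:
        # Timed control: defines a beginning (now) and an end (now + duration).
        self.audio_muted_until = now + duration

    def multiplex(self, now: float, audio: str, caption: str) -> List[str]:
        # A muted stream is simply left out of the multiplexed output.
        timed_mute = self.audio_muted_until is not None and now < self.audio_muted_until
        out: List[str] = []
        if not (self.audio_muted or timed_mute):
            out.append(audio)
        if not self.captions_muted:
            out.append(caption)
        return out
```

In this sketch, calling mute_audio or mute_captions changes one stream without reference to the other or to time, while mute_audio_for only has effect between its defined start and end times.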
With respect to the rejection(s) of claim(s) 1, and mutatis mutandis, claims 23 and 28, under 35 U.S.C. §103 in light of Crinon in view of Stefani, applicant asserts that Crinon, Kashima, Stahl, Casagrande, and Bianco fail to teach or suggest “muting the audio stream and suspending multiplexing of the compressed audio stream for a period of time and including muting the caption stream and suspending multiplexing of the caption stream for a different period of time”. However, these arguments are not persuasive.
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “muting the audio stream and suspending multiplexing of the compressed audio stream for a period of time and including muting the caption stream and suspending multiplexing of the caption stream for a different period of time”) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In light of applicant’s amendments, Examiner further notes that Crinon discloses “a function muting the audio stream and suspending multiplexing of the audio stream independently from a different function muting the caption stream and suspending multiplexing of the caption stream.” Crinon discloses that “a speaker …associated with the receiving endpoint N 408 can be muted,” where “the action can be triggered in the receiving endpoint N 408 by a mute button on a user interface,” and “In response to the request, the AVMCU 402 can halt sending of the audio data to the receiving endpoint N 408, and the text data can be transmitted instead with the video data,” where only the text data and the video data are multiplexed into the transport stream for receiving endpoint N. (Crinon, ¶ [0043]). Crinon further discloses that “the participant employing each of the receiving endpoints 404-408 can select whether closed captions are desired,” where, in light of the request to receive text, “the AVMCU 402 can forward text data to the receiving endpoint 2 406” where a participant can choose the transmission of “text data or audio data” or “both text data and audio data.” Thus, the participant may choose whether to send or receive either the audio data {audio stream} or the text data {caption stream}, each independently of the other. As such, the rejection of claim 1 is maintained in light of the above arguments.
However, in light of amendments to claims 1 and 23 and the cancellation of claims 20-22, the rejections of claims 1-28 are withdrawn.
Upon further consideration of currently amended and newly added claims, new ground(s) of rejection under 35 U.S.C. §102 and 35 U.S.C. §103 are made in light of combinations of Crinon, Stefani, Stahl, Kashima, Casagrande, Garrido, Le Roux, Calatano, Foster, Bianco, Thijssen, Drewes, Parc, and newly cited reference McCrossan (U.S. Pat. App. Pub. No. 2006/0210245, hereinafter McCrossan).
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-3, 7, 9-10, 23, 26, 29-30, and 34-37 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Crinon.

Regarding claim 1, Crinon discloses A conferencing endpoint comprising (“real time conferencing component 102”; Crinon, ¶ [0022]): an audio input interface; an audio output interface; a video input interface; a display interface; a network interface; a processor; and system memory coupled to the processor and storing instructions configured to cause the processor to (“real time conferencing component 102 can [include] an input component 202 (audio input interface) that can obtain the audio data” and real time conferencing component 102 can be substantially similar to exemplary computing system 1112, including “a processing unit 1114 (processor), a system memory 1116, and a system bus 1118... [which] couples system components including... the system memory 1116 to the processing unit 1114” and a “network interface 1148”; in addition, “[t]he real time conferencing component 102 can additionally include a video streaming component 602, an audio streaming component 604, and a synchronization component 606.”; Crinon, ¶¶ [0027], [0063], [0069], [0049]): coordinate connection of the conferencing endpoint via the network interface to a network conference including one or more other conferencing endpoints (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104; Crinon, ¶¶ [0023], [0069], [0022]); capture an audio stream from the audio input interface (“the input component 202 can be a microphone that can capture the audio data and generate electrical impulses.”; Crinon, ¶ [0028]); recognize speech to create a caption stream (“text streaming component 106 can further include a speech to text conversion component 204 that converts the audio data to text data [and] can employ a speech recognition engine that can convert digital signals corresponding to the audio data to phonemes, words, and so forth,” therefore recognizing speech. “Moreover, the speech to text conversion component 204 can process continuous speech,” thus creating a text stream (caption stream) from the recognized speech.; Crinon, ¶ [0029]); multiplex the audio stream and the caption stream into a transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text...). For example, the timestamps can be in the real time transport protocol (RTP) used by real time communication systems,” where “Separate streams of data including timestamps can be generated... and the streams can be multiplexed over the RTP.” Therefore, the audio stream and the text stream (caption stream) are multiplexed into a transport stream.; Crinon, ¶ [0051]), including a function muting the audio stream and suspending multiplexing of the audio stream (“Hence, a speaker (e.g., the output component N 414) associated with the receiving endpoint N 408 can be muted,” where “the action can be triggered in the receiving endpoint N 408 by a mute button on a user interface,” and “In response to the request, the AVMCU 402 can halt sending of the audio data to the receiving endpoint N 408, and the text data can be transmitted instead with the video data,” where only the text data and the video data are multiplexed into the transport stream for receiving endpoint N.; Crinon, ¶ [0043]) independently from a different function muting the caption stream and suspending multiplexing of the caption stream (“In the manual negotiation scenario, the participant employing each of the receiving endpoints 404-408 can select whether closed captions are desired,” where, in light of the request to receive text, “the AVMCU 402 can forward text data to the receiving endpoint 2 406, {independently from a different function muting the caption stream...}” where a participant can choose the transmission of “text data or audio data” or “both text data and audio data {... and suspending multiplexing of the caption stream}”; Crinon, ¶ [0043]); and send the transport stream to the one or more other conferencing endpoints via the network interface (“Separate streams of data including timestamps can be generated... and the streams can be multiplexed over the RTP,” where the audio stream and the text stream (caption stream), as part of a multiplexed transport stream, are transported (over the RTP) and are “correlated... for presentation to listening participants in the real time teleconference.” Further, the “receiving endpoints can utilize timestamps to identify correlation between data within the separate streams”, thus the transport stream was received by the receiving endpoints, also referred to as one or more disparate real time conferencing components 104 (conferencing endpoints).; Crinon, ¶ [0049]).

Regarding claim 2, the rejection of claim 1 is incorporated. Crinon further discloses further comprising a video input interface (“it is contemplated that the real time conferencing component 102 (e.g., via the input component 202) can receive video data (not shown) along with the audio data,” thus the input component 202 includes a video input interface.; Crinon, ¶ [0027]); the system memory further storing instructions configured to cause the processor to capture a video stream from the video input interface (“the input component 202 can obtain audio data and/or video data from a participant in a teleconference (e.g., the active speaker).”; Crinon, ¶ [0033]); and wherein instructions configured to multiplex the audio stream and the caption stream into the transport stream comprise instructions configured to multiplex the audio stream, the caption stream, and the video stream into the transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text, . . .),” thus the audio stream, the video stream, and the text stream (caption stream), where the “separate streams of data including timestamps can be generated (e.g., at a sending endpoint, an AVMCU, . . .), and the streams can be multiplexed over the RTP,” thus the streams are multiplexed.; Crinon, ¶ [0051]).

Regarding claim 3, the rejection of claim 1 is incorporated. Crinon further discloses wherein instructions configured to coordinate connection to a network conference comprise instructions configured to coordinate connection to a plurality of conferencing endpoints including a first conferencing endpoint and a second endpoint (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network, . . .) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104. As shown in FIG. 4, the system can include “any number of receiving endpoints (e.g., a receiving endpoint 1 404, [and] a receiving endpoint 2 406),” which, in this embodiment, are the first conferencing endpoint and the second conferencing endpoint, respectively.; Crinon, ¶¶ [0023], [0069], [0022]); and wherein instructions configured to send the transport stream to the one or more other conferencing endpoints comprises instructions configured to: send the transport stream directly to the first conferencing endpoint (“It is to be appreciated that the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and an endpoint. In this embodiment, the real time conferencing component 102 can, as “the AVMCU 402,” “route data to non-speaking participants.” Since, in this embodiment, the real time conferencing component 102 is also “the sending endpoint 302 ... associated with the active speaker”, the transport stream is sent directly to the receiving endpoint 1 (the first conferencing endpoint).; Crinon, ¶¶ [0022], [0038], FIG. 4); and send the transport stream directly to the second conferencing endpoint (In the same way, using the example provided in FIG. 4, the real time conferencing component 102 is both “the sending endpoint 302 ... associated with the active speaker” and “the AVMCU 402,” which directs “data to non-speaking participants,” and thus the transport stream is sent directly to the receiving endpoint 2 (the second conferencing endpoint).; Crinon, ¶¶ [0022], [0038], FIG. 4).

Regarding claim 7, the rejection of claim 1 is incorporated. Crinon further discloses further comprising a display interface and an audio output interface (the endpoints can include “output component 306 … [including] a display (e.g., monitor, television, projector,…) to present video data and/or text data” which forms the display interface, and “the output component 306 can comprise one or more speakers to render audio output,” which forms the audio output interface.; Crinon, ¶ [0035]); the system memory further storing instructions configured to: receive an other transport stream, including an other audio stream and an other caption stream corresponding to the other audio stream, directly from an other conferencing endpoint (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 304. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints.; Crinon, ¶¶ [0040], [0032], [0022], [0051]); and coordinate outputs at the conferencing endpoint, including coordinating output of the other audio stream at the audio output interface with output of the other caption stream at the display interface (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network, . . .) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104, where routing data between the endpoints includes coordinating output of the other audio stream at the audio output device with output of the other caption stream at the display interface (data has already been established to include “(e.g., audio, video, text, . . .)”); Crinon, ¶¶ [0023], [0069], [0022], [0051]).

Regarding claim 9, the rejection of claim 7 is incorporated. Crinon further discloses wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 304. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints.; Crinon, ¶¶ [0040], [0032], [0022], [0051]); wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the other video stream in a window at the display interface (“at a receiving endpoint (e.g., the real time conferencing component 102, the receiving endpoint 304 of FIG. 3, the receiving endpoints 404-408 of FIGS. 4 and 5, . . .), when a video frame is received, data can be decoded to render the video frame while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]); and present the other caption stream in the window (“while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]).

Regarding claim 10, the rejection of claim 7 is incorporated. Crinon further discloses wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: determine that caption presentation is toggled off (“Pursuant to another example, the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning”; Crinon, ¶ [0040]); and not present the other caption stream at the display interface in response to the determination (“hence, if closed captioning is disabled, the sending endpoint 302 can sent audio data and video data to the AVMCU 402 without text data, for instance.”; Crinon, ¶ [0040]).

Regarding claim 23, Crinon discloses A method comprising (the systems and methods described with reference to “real time conferencing component 102”; Crinon, ¶ [0022]): coordinating connection of a conferencing endpoint via a network interface to a network conference including one or more other conferencing endpoints (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104; Crinon, ¶¶ [0023], [0069], [0022]); capturing an audio stream from an audio input interface (“the input component 202 can be a microphone that can capture the audio data and generate electrical impulses.”; Crinon, ¶ [0028]); recognizing speech to create a caption stream (“text streaming component 106 can further include a speech to text conversion component 204 that converts the audio data to text data [and] can employ a speech recognition engine that can convert digital signals corresponding to the audio data to phonemes, words, and so forth,” therefore recognizing speech. “Moreover, the speech to text conversion component 204 can process continuous speech,” thus creating a text stream (caption stream) from the recognized speech.; Crinon, ¶ [0029]); multiplexing the audio stream and the caption stream into a transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text...). For example, the timestamps can be in the real time transport protocol (RTP) used by real time communication systems,” where “Separate streams of data including timestamps can be generated... and the streams can be multiplexed over the RTP.” Therefore, the audio stream and the text stream (caption stream) are multiplexed into a transport stream.; Crinon, ¶ [0051]), including a function muting the audio stream and suspending multiplexing of the audio stream (“Hence, a speaker (e.g., the output component N 414) associated with the receiving endpoint N 408 can be muted,” where “the action can be triggered in the receiving endpoint N 408 by a mute button on a user interface,” and “In response to the request, the AVMCU 402 can halt sending of the audio data to the receiving endpoint N 408, and the text data can be transmitted instead with the video data,” where only the text data and the video data are multiplexed into the transport stream for receiving endpoint N.; Crinon, ¶ [0043]) independently from a different function muting the caption stream and suspending multiplexing of the caption stream (“In the manual negotiation scenario, the participant employing each of the receiving endpoints 404-408 can select whether closed captions are desired,” where, in light of the request to receive text, “the AVMCU 402 can forward text data to the receiving endpoint 2 406, {independently from a different function muting the caption stream...}” where a participant can choose the transmission of “text data or audio data” or “both text data and audio data {... and suspending multiplexing of the caption stream}”; Crinon, ¶ [0043]); and sending the transport stream to the one or more other conferencing endpoints via the network interface (“Separate streams of data including timestamps can be generated... and the streams can be multiplexed over the RTP,” where the audio stream and the text stream (caption stream), as part of a multiplexed transport stream, are transported (over the RTP) and are “correlated... for presentation to listening participants in the real time teleconference.” Further, the “receiving endpoints can utilize timestamps to identify correlation between data within the separate streams”, thus the transport stream was received by the receiving endpoints, also referred to as one or more disparate real time conferencing components 104 (conferencing endpoints).; Crinon, ¶ [0049]).
  
Regarding claim 26, the rejection of claim 1 is incorporated. Crinon further discloses wherein instructions configured to send the transport stream to the one or more other conferencing endpoints comprises instructions configured to (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104 for a plurality of users; Crinon, ¶¶ [0023], [0069], [0022]): render the caption stream in video images encoded in a video stream to the one or more other end points (“the sending endpoint 302 can receive audio data and video data for a real time conference from the input component 202, and the speech to text conversion component 204 can generate text data corresponding to the audio data.”; Crinon, ¶ [0040]); and subsequent to rendering the caption stream, send the audio stream to the one or more other endpoints (“Thereafter, the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402.”; Crinon, ¶ [0040]).

Regarding claim 29, Crinon discloses, further comprising a video input interface (“it is contemplated that the real time conferencing component 102 (e.g., via the input component 202) can receive video data (not shown) along with the audio data,” thus the input component 202 includes a video input interface.; Crinon, ¶¶ [0027]); further comprising capturing a video stream from the video input interface (“the input component 202 can obtain audio data and/or video data from a participant in a teleconference (e.g., the active speaker).”; Crinon, ¶¶ [0033]); and wherein multiplexing the audio stream and the caption stream into the transport stream comprise multiplexing the audio stream, the caption stream, and the video stream into the transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text,..),” thus the audio stream, the video stream, and the text stream (caption stream), where the “separate streams of data including timestamps can be generated (e.g., at a sending endpoint, an AVMCU,..), and the streams can be multiplexed over the RTP,” thus the streams are multiplexed.; Crinon, ¶¶ [0055]). 

Regarding claim 30, Crinon discloses wherein coordinating connection to a network conference comprises coordinating connection to a plurality of conferencing endpoints including a first conferencing endpoint and a second endpoint (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104. As shown in FIG. 4, the system can include “any number of receiving endpoints (e.g., a receiving endpoint 1 404, [and] a receiving endpoint 2 406),” which, in this embodiment, are the first conferencing endpoint and the second conferencing endpoint, respectively.; Crinon, ¶¶ [0023], [0069], [0022]); and wherein sending the transport stream to the one or more other conferencing endpoints comprises: sending the transport stream directly to the first conferencing endpoint (“It is to be appreciated that the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and an endpoint. In this embodiment, the real time conferencing component 102 can, as “the AVMCU 402,” “route data to non-speaking participants.” Since, in this embodiment, the real time conferencing component 102 is also “the sending endpoint 302... associated with the active speaker”, the transport stream is sent directly to the receiving endpoint 1 (the first conferencing endpoint).; Crinon, ¶¶ [0022], [0038], FIG. 4); and sending the transport stream directly to the second conferencing endpoint (In the same way, using the example provided in FIG. 4, the real time conferencing component 102 is both “the sending endpoint 302... associated with the active speaker” and “the AVMCU 402,” which directs “data to non-speaking participants,” and thus the transport stream is sent directly to the receiving endpoint 2 (the second conferencing endpoint).; Crinon, ¶¶ [0022], [0038], FIG. 4).

Regarding claim 34, Crinon discloses, further comprising: receiving an other transport stream, including an other audio stream and an other caption stream corresponding to the other audio stream, directly from an other conferencing endpoint (“The system 100 can support real time peer-to-peer conferences and/or multi-party conferences. For example, in a peer-to-peer conference, the real time conferencing component 102 and the disparate real time conferencing component 104 can both be endpoints that can directly communicate with each other” where “the sending endpoint 302 can send audio data, video data and text data” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),”; Crinon, ¶¶ [0024], [0032]); and coordinating outputs at the conferencing endpoint, including coordinating output of the other audio stream at an audio output interface with output of the other caption stream at a display interface (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” where the “real time conferencing component 102 can further include a text streaming component 106 that can generate, transfer, route, receive, output, etc. streaming text (e.g., text data) utilized to yield closed captions associated with a real time audio/video conference.”; Crinon, ¶¶ [0023], [0025]). 

Regarding claim 35, Crinon discloses wherein receiving an other transport stream comprises receiving the other transport stream including an other video stream (“the text can correspond to audio data yielded by an active speaker at a particular time” where the text is received with “video associated with the real time conference concurrently being outputted”; Crinon, ¶ [0025]); and wherein coordinating outputs at the conferencing endpoint comprises: presenting the video stream in a window at the display interface (“video associated with the real time conference concurrently being outputted” on a display, wherein the video necessarily forms a window on the display (e.g., the boundaries of the video in the display form the boundaries of the window).; Crinon, ¶ [0025]); and presenting the other caption stream in a different window supplementing presentation of the video in the window (As well, the text can be presented “in an area above, below, to the side of, etc. the video, for instance,” thus, in some embodiments, the text is in a separate window from the window for the video.; Crinon, ¶ [0025]).

Regarding claim 36, Crinon discloses wherein receiving an other transport stream comprises receiving the other transport stream including an other video stream (“the text can correspond to audio data yielded by an active speaker at a particular time” where the text is received with “video associated with the real time conference concurrently being outputted”; Crinon, ¶ [0025]); and wherein coordinating outputs at the conferencing endpoint comprises: presenting the other video stream in a window at the display interface (“video associated with the real time conference concurrently being outputted” on a display, wherein the video necessarily forms a window on the display (e.g., the boundaries of the video in the display form the boundaries of the window).; Crinon, ¶ [0025]); and presenting the other caption stream in the window (“The text can be overlaid over video associated with the real time conference concurrently being outputted”.; Crinon, ¶ [0025]).

Regarding claim 37, Crinon discloses wherein coordinating outputs at the conferencing endpoint comprises: determining that caption presentation is toggled off (“Pursuant to another example, the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning”; Crinon, ¶ [0040]), and not presenting the other caption stream at the display interface in response to the determination (“hence, if closed captioning is disabled, the sending endpoint 302 can send audio data and video data... without text data, for instance,” such as between peers in the peer-to-peer network.; Crinon, ¶ [0040]).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. §102 and 103 (or as subject to pre-AIA  35 U.S.C. §102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. §103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6 and 31-33 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claim 1 above, and further in view of Stahl.

Regarding claim 4, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite wherein instructions configured to recognize speech comprise instructions configured to evaluate transcription hypotheses according to natural language grammars to inform speech recognition.
Stahl teaches “systems, methods, and algorithms that use speech characterization to condition automatic speech recognition and parsing according to natural language grammars.” (Stahl, ¶ [0007]). Regarding claim 4, Stahl teaches wherein instructions configured to recognize speech comprise instructions configured to evaluate transcription hypotheses according to natural language grammars to inform speech recognition (the system can include “NLP interpretation” of the “one or more transcription hypotheses” from an “utterance... [received by] ASR module 11” {instructions configured to evaluate transcription hypotheses…} where the NLP interpretation is performed “according to a [natural language] grammar” {...according to natural language grammars to inform speech recognition}.; Stahl, ¶¶ [0032], [0033]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Stahl to include wherein instructions configured to recognize speech comprise instructions configured to evaluate transcription hypotheses according to natural language grammars to inform speech recognition. Natural language grammar can be employed as part of an “improved approach for generating interpretations of speech inputs”, as recognized by Stahl. (Stahl, ¶¶ [0006], [0007]).

Regarding claim 5, the rejection of claim 4 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite wherein the natural language grammars include a user-specific grammar.
The relevance of Stahl is described above with relation to claim 4. Regarding claim 5, Stahl teaches wherein the natural language grammars include a user-specific grammar (“Interpreter module 22 consumes the UID and uses it to condition its interpretation according to a grammar” where “some embodiments maintain databases of UID-specific interpretation weights,” where the UID-specific interpretation weights are a user-specific grammar.; Stahl, ¶ [0033]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Stahl to include wherein the natural language grammars include a user-specific grammar. Natural language grammar can be employed as part of an “improved approach for generating interpretations of speech inputs”, as recognized by Stahl. (Stahl, ¶¶ [0006], [0007]).

Regarding claim 6, the rejection of claim 4 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite wherein the natural language grammars include a topic-specific grammar.
The relevance of Stahl is described above with relation to claim 4. Regarding claim 6, Stahl teaches wherein the natural language grammars include a topic-specific grammar (“Some embodiments allow for grammar rules related to mature or offensive subject matter,” where mature or offensive subject matter is a topic.; Stahl, ¶ [0056]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Stahl to include wherein the natural language grammars include a topic-specific grammar. Natural language grammar can be employed as part of an “improved approach for generating interpretations of speech inputs”, as recognized by Stahl. (Stahl, ¶¶ [0006], [0007]).

Regarding claim 31, the rejection of claim 23 is incorporated. Claim 31 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 32, the rejection of claim 31 is incorporated. Claim 32 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 33, the rejection of claim 31 is incorporated. Claim 33 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Claims 8, 19, and 46 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claims 1, 7, and 23 above, and further in view of Casagrande.

Regarding claim 8, the rejection of claim 7 is incorporated. Crinon discloses all of the elements of the current invention as stated above. Crinon further discloses wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 304. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints.; Crinon, ¶¶ [0040], [0032], [0022], [0051]); and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the video stream in a window at the display interface (“at a receiving endpoint (e.g., the real time conferencing component 102, the receiving endpoint 304 of FIG. 3, the receiving endpoints 404-408 of FIGS. 4 and 5, . . .), when a video frame is received, data can be decoded to render the video frame while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.”; Crinon, ¶ [0050]); and present the other caption stream in a… window supplementing presentation of the video in the window (“while the metadata including the text can also be decoded to render closed captions on a screen with the corresponding video frame.” Thus, the system includes presenting the closed captions in a window. The captions correspond to the video frame, thus supplementing the presentation of the video in the window; Crinon, ¶ [0050]). However, Crinon fails to expressly recite present the other caption stream in a different window.
Casagrande teaches systems and methods for synchronizing second screen content with audio/video programming. (Casagrande, ¶ [0002]). Regarding claim 8, Casagrande teaches present the other caption stream in a different window supplementing presentation of the video in the window (the system can “convert the received data to suitably formatted video signals that can be rendered for viewing... by the customer on the presentation device 106.” Further, the system can “be synchronized and utilized cooperatively with the second screen electronic device 112 to provide additional, supplemental content associated with the [received data]” and “communicate output data to a second screen electronic device 112, including closed captioning data and timing data for audio/video content,” thus presenting the other caption stream in a different window, where being synchronized and used cooperatively, including the timing data, indicates that the caption stream supplements the presentation of the video in the window.; Casagrande, ¶¶ [0021], [0023]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Casagrande to include wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the video stream in a window at the display interface; and present the other caption stream in a different window supplementing presentation of the video in the window. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]).

Regarding claim 19, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist; and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints.
The relevance of Casagrande is described above with relation to claim 8. Regarding claim 19, Casagrande teaches wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist (“the second screen electronic device 400 may utilize a ‘black list’ of unauthorized terms, to prevent unwanted caption words from being transmitted to the second screen content module 412 to be used for the retrieval and presentation of second screen content. For example, words designated by the Federal Communications Commission (FCC) as being unfit for broadcast television during daytime hours may be present on the ‘black list’ of words,” thus recognizing a portion included in a blacklist.; Casagrande, ¶ [0058]); and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints (the system can further “prevent the presentation of content associated with profane language. The ‘black list’ of words may be user-configurable, and may include not only profanity, but any word for which a user does not wish to view additional content,” thus the caption stream is adjusted to prevent (obscure) the presentation of the speech portion at the one or more conferencing devices (noting that this is user-configurable to not receive black-listed content and involves a plurality of “receiving endpoints N 408,” thus at the “one or more” conferencing devices).; Casagrande, ¶ [0058]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Casagrande to include wherein instructions configured to recognize speech comprise instructions configured to recognize a speech portion included in a blacklist; and the system memory further storing instructions configured to adjust the caption stream to obscure presentation of the speech portion at the one or more other conferencing endpoints. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]).

Regarding claim 46, the rejection of claim 23 is incorporated. Claim 46 is substantially the same as claim 19 and is therefore rejected under the same rationale as above.

Claims 11 and 38 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claims 10 and 37 above, and further in view of Casagrande and Garrido.

Regarding claim 11, the rejection of claim 10 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics or characteristics of speech included in the other audio stream.
The relevance of Casagrande is described above with relation to claim 8. Regarding claim 11, Casagrande teaches the system memory further storing instructions configured to toggle captioning off based on one or more of: … characteristics of speech included in the other audio stream (“the second screen electronic device 400 may utilize a ‘black list’ of unauthorized terms, to prevent unwanted caption words from being transmitted to the second screen content module 412 to be used for the retrieval and presentation of second screen content. For example, words designated by the Federal Communications Commission (FCC) as being unfit for broadcast television during daytime hours may be present on the ‘black list’ of words, to prevent the presentation of content associated with profane language. The ‘black list’ of words may be user-configurable, and may include not only profanity, but any word for which a user does not wish to view additional content,” thus captioning is toggled off based on profanity or unwanted caption words (characteristics of speech) included in the other audio stream.; Casagrande, ¶ [0058]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Casagrande to include the system memory further storing instructions configured to toggle captioning off based on one or more of: … characteristics of speech included in the other audio stream. The systems and methods described in Casagrande can present “intelligently selected content” on a second screen, where the content “is directly associated to particular events currently occurring in programming while it is being viewed.” (Casagrande, ¶¶ [0005], [0006]). However, Crinon and Casagrande fail to expressly recite the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics.
Garrido teaches systems and methods for data stream management in a multi-party conference session. (Garrido, ¶ [0013]). Regarding claim 11, Garrido teaches the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds… The endpoints may generate multiple instances of each type feeds” and “each endpoint 110.1-110.n also may generate priority metadata representing a priority assignment conferred on the data feed(s) output by the respective endpoint.” In some examples, data feeds (though described as image feeds, the “types and content of the data feeds are immaterial”) with “associated priority values below a cut-off threshold may not be [presented] at all,” where “thresholds are determined based on network condition,” thus cutting closed caption feeds off (toggling captioning off) based on network characteristics; Garrido, ¶¶ [0015], [0016], [0039], [0053]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the methods for synchronizing second screen content with audio/video programming of Casagrande to incorporate the teachings of Garrido to include the system memory further storing instructions configured to toggle captioning off based on one or more of: network characteristics or characteristics of speech included in the other audio stream. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Regarding claim 38, the rejection of claim 37 is incorporated. Claim 38 is substantially the same as claim 11 and is therefore rejected under the same rationale as above.

Claims 12, 18, 39, and 45 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claims 1 and 23 above, and further in view of McCrossan and Garrido.

Regarding claim 12, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. Crinon further discloses the system memory further storing instructions configured to receive a further transport stream, including a further audio stream and a further caption stream corresponding to the further audio stream, directly from a further conferencing endpoint (“the sending endpoint 302 can send audio data, video data and text data to the AVMCU 402,” where “sending endpoint 302 can be the real time conferencing component 102 (and/or one of the disparate real time conferencing component(s) 104) described herein (and similarly the receiving endpoint 304 can be the real time conferencing component 102 and/or one of the disparate real time conferencing component(s) 104),” and where “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be… an audio/video multi-point control unit (AVMCU), included within … an endpoint,” thus the real time conferencing component 102 can be both an AVMCU and a receiving endpoint 304. Likewise, any of the disparate real time conferencing components 104 can be the sending endpoint 302, where the sending component can multiplex the “synchronize[d] data (e.g., audio, video, text, . . .)” to create the transport stream to be received by the receiving endpoints. Further, the system includes “a receiving endpoint N 408, where N can be substantially any integer,” thus there are an unlimited number of sending endpoints and receiving endpoints, thus providing an unlimited number of transport streams to and from the AVMCU.; Crinon, ¶¶ [0040], [0032], [0022], [0051], [0038]), the further conferencing endpoint included in the one or more other conferencing endpoints (the receiving endpoints N are included in the one or more other conferencing endpoints, as shown in FIG. 4.; Crinon, ¶ [0038], FIG. 4); and wherein instructions configured to coordinate outputs at the conferencing endpoint comprise instructions configured to: present the other caption stream at the display interface with first visual characteristics; and… present the further caption stream at the display interface with second visual characteristics (“The examples mentioned above can be extended to the case where there are multiple concurrent active speakers in the conference and text streams are available for each of these participants in which case manual selection can include the choice of which closed captions stream is selected for viewing in the receiving endpoint.” where “text data (text streams)” are “outputt[ed] via a display in the form of closed captions.” Thus, the other caption stream is presented at the display interface with first visual characteristics and the further caption stream is presented at the display interface with second visual characteristics.; Crinon, ¶¶ [0043], [0058]). However, Crinon fails to expressly recite concurrent with presenting the other caption stream, presenting the further caption stream at the display interface with second visual characteristics.
McCrossan discloses systems and methods for presenting different data streams simultaneously. (McCrossan, ¶ [0002]). Regarding claim 12, McCrossan discloses concurrent with presenting the other caption stream, presenting the further caption stream at the display interface with second visual characteristics (McCrossan discloses a system which “simultaneously decodes a plurality of subtitle substreams to produce a corresponding plurality of decoded subtitle outputs” where said subtitle outputs can be presented on a single display where “the user can view two subtitle substreams simultaneously,” as displayed in FIGS. 5A and 5B.; McCrossan, ¶¶ [0086], [0067]; FIGS. 5A-5B).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of McCrossan to include concurrent with presenting the other caption stream, presenting the further caption stream at the display interface with second visual characteristics. The ability to present “two or more unique subtitle substreams” allows for better understanding of an ongoing conversation, as recognized by McCrossan. (McCrossan, ¶¶ [0066]-[0067]). However, Crinon and McCrossan fail to expressly recite the second visual characteristics differing from the first visual characteristics.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 12, Garrido teaches the second visual characteristics differing from the first visual characteristics (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds… The endpoints may generate multiple instances of each type feeds” and “each endpoint 110.1-110.n also may generate priority metadata representing a priority assignment conferred on the data feed(s) output by the respective endpoint” and “a feed with a highest priority value may be highlighted, for example by framing an associated image feed with a distinct color,” thus a second visual characteristic different from a first visual characteristic; Garrido, ¶¶ [0015], [0016]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the systems and methods for data presentation of McCrossan, to incorporate the teachings of Garrido to include the second visual characteristics differing from the first visual characteristics. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Regarding claim 18, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. Crinon further discloses the system memory further storing instructions configured to receive consent to transcribe recognized speech via [a selection process] at the conferencing endpoint (“the listening participants can manually and/or automatically negotiate the use of closed captions,” at the receiving endpoints, where “the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning,” thus an end user chooses whether or not to consent to the transcription of text data for closed captioning (transcribe recognized speech); Crinon, ¶¶ [0043], [0040]). However, Crinon fails to expressly recite wherein the selection process is a meeting registration.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 18, Garrido teaches wherein the selection process is a meeting registration (“the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services,” where the “participant's face or voice … may or may not be shared with other participants or the system based on the user's selection to “opt in” or “opt out” of sharing such information.”; Garrido, ¶ [0066]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the systems and methods for data presentation of McCrossan, to further incorporate the teachings of Garrido to include wherein the selection process is a meeting registration. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]).

Regarding claim 39, the rejection of claim 23 is incorporated. Claim 39 is substantially the same as claim 12 and is therefore rejected under the same rationale as above.

Regarding claim 45, the rejection of claim 23 is incorporated. Claim 45 is substantially the same as claim 18 and is therefore rejected under the same rationale as above.

Claims 13 and 40 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon, McCrossan, and Garrido as applied to claims 12 and 39 above, and further in view of Le Roux.

Regarding claim 13, the rejection of claim 12 is incorporated. Crinon, McCrossan, and Garrido disclose all of the elements of the current invention as stated above. However, Crinon and McCrossan fail to expressly recite the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream; wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream; wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in another window.
The relevance of Garrido is described above with relation to claim 11. Regarding claim 13, Garrido teaches wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds {the transport stream}” where “the relay server 120 [receives the] different data feeds... from the endpoints” and then “send[s] one or more media feeds 230.1, 230.3 and associated priorities 232.1, 232.3 to endpoint 110.n.” which performs “conference management of the media feeds 230.1, 230.3 it receives based on the associated priorities 232.1, 232.3,” where media feed 230.1 is the other transport stream including a video feed {an other video stream}.; Garrido, ¶¶ [0015]-[0017]); wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds {the transport stream}” where “the relay server 120 [receives the] different data feeds... from the endpoints” and then “send[s] one or more media feeds 230.1, 230.3 and associated priorities 232.1, 232.3 to endpoint 110.n.” which performs “conference management of the media feeds 230.1, 230.3 it receives based on the associated priorities 232.1, 232.3,” where media feed 230.3 is the further transport stream including a video feed {a further video stream}.; Garrido, ¶¶ [0015]-[0017]); wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds,” where the video feed can include “locally-captured video of an endpoint operator,” thus a person is depicted. Further, “all feeds above a threshold priority may be presented in a primary region of the endpoint display called a canvas,” thus feeds, such as a caption feed (other caption stream) and a video feed (other video stream), as received from a first endpoint 110, which are above a priority level threshold, will be presented in the canvas area (a window).; Garrido, ¶¶ [0015], [0039]); and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in an other window (“the endpoints 110.1-110.n may generate audio feeds, video feeds, and/or closed caption feeds,” where the video feed can include “locally-captured video of an endpoint operator,” thus a person is depicted. Further, “feeds below a threshold may be presented in a secondary region of the endpoint display called a roster,” thus feeds, such as a caption feed (further caption stream) and a video feed (further video stream), as received from a second endpoint 110, which are below a priority level threshold, will be presented in the roster area (an other window).; Garrido, ¶¶ [0015], [0039]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the systems and methods for data presentation of McCrossan, to incorporate the teachings of Garrido to include wherein instructions configured to receive an other transport stream comprise instructions configured to receive the other transport stream including an other video stream; wherein instructions configured to receive a further transport stream comprise instructions configured to receive the further transport stream including a further video stream;  wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream along with a person depicted in the other video stream in a window; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to simultaneously present the further caption stream along with a person depicted in the further video stream in an other window. Presenting or receiving feeds “based on associated priorities [can] improve a user experience at the receiving endpoint,” as recognized by Garrido. (Garrido, ¶ [0021]). However, Crinon, McCrossan, and Garrido fail to expressly recite the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream.
Le Roux teaches systems and methods for “recognizing speech from an acoustic signal with multiple overlapping speakers.” (Le Roux, ¶ [0010]). Regarding claim 13, Le Roux teaches the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream (“a speech recognition system for recognizing speech including overlapping speech by multiple speakers,” where the system is “trained to transform the received acoustic signal into a text for each target speaker” and “to output the text for each target speaker [using] an output interface to transmit the text for each target speaker.”; Le Roux, ¶ [0011]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the systems and methods for data presentation of McCrossan, and as modified by the systems and methods for data stream management in a multi-party conference session of Garrido, to incorporate the teachings of Le Roux to include the system memory further storing instructions configured to detect that the other caption stream includes speech that temporally overlaps with speech included in the further caption stream. “In the system without explicit separation, recognition can be optimized directly for recognizing speech from an acoustic signal with multiple overlapping speakers, leading to improved performance,” as recognized by Le Roux. (Le Roux, ¶ [0010]).

Regarding claim 40, the rejection of claim 39 is incorporated. Claim 40 is substantially the same as claim 13 and is therefore rejected under the same rationale as above.


Claims 14 and 41 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon, McCrossan, and Garrido as applied to claims 12 and 39 above, and further in view of Calatano.

Regarding claim 14, the rejection of claim 12 is incorporated. Crinon, McCrossan, and Garrido disclose all of the elements of the current invention as stated above. However, Crinon, McCrossan, and Garrido fail to expressly recite wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color.
Calatano teaches “smart closed caption systems for appearance and positioning of closed caption text in video content.” (Calatano, ¶ [0003]). Regarding claim 14, Calatano teaches wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color (“in some embodiments of the present invention, the respective text style for a first character may include at least one of font style, text size, bold, italic, or color that is different from a font style, text size, bold, italic, or color of a second character of the one or more identified speaking characters.” In a first example, “a color of a text style of the first character may indicate that a first color should be used for closed caption text that is associated with the first character.” Thus, the first visual characteristics can include instructions to present the other caption stream in a first color; Calatano, ¶ [0068]); and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color (Using the same example of the first character above, “a color of a text style of the second character may indicate that a second color, different from the first color (of the first character) is to be used for closed caption text that is associated with the second character.” Thus, the second visual characteristics can include instructions to present the further caption stream in a second color which is different from the first color; Calatano, ¶ [0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the systems and methods for data presentation of McCrossan, and as modified by the systems and methods for data stream management in a multi-party conference session of Garrido, to incorporate the teachings of Calatano to include wherein instructions configured to output the other caption stream at the display interface with first visual characteristics comprise instructions configured to present the other caption stream in a first color; and wherein instructions configured to output the further caption stream at the display interface with second visual characteristics comprise instructions configured to present the further caption stream in a second color, the second color differing from the first color. Modifying the visual representation of the text can “improve recognition of which characters are speaking and their sentiment,” as recognized by Calatano. (Calatano, ¶ [0047]).

Regarding claim 41, the rejection of claim 39 is incorporated. Claim 41 is substantially the same as claim 14 and is therefore rejected under the same rationale as above.

Claims 15 and 42 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claims 1 and 23 above, and further in view of Foster.

Regarding claim 15, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite the system memory further storing instructions configured to compress the audio stream; and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream.
Foster teaches systems and methods for presenting captions associated with a broadcast media stream. (Foster, ¶ [0001]). Regarding claim 15, Foster teaches the system memory further storing instructions configured to compress the audio stream (“the caption generator 216 embeds or encodes each word by compressing the audio segment (or audio word) 238a or 238n using a known audio compression format (such as MP3, Adaptive Multi-Rate Wideband Codec (AMR-WB), or AccPlus) and then inserting the compressed audio segment or word 402 into a packet 400 along with the corresponding caption 404, which may be compressed using the same technique used to compress each audio segment 238a-238n,”; Foster, ¶ [0045]); and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream (where “inserting the compressed audio segment or word 402 into a packet 400 along with the corresponding caption 404” is multiplexing the compressed audio stream and the caption stream into a transport stream.; Foster, ¶ [0045]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Foster to include the system memory further storing instructions configured to compress the audio stream; and wherein instructions configured to multiplex the audio stream and the caption stream into a transport stream comprises instructions configured to multiplex the compressed audio stream and the caption stream. The systems and methods of Foster allow for the presentation of “captions for any live or pre-recorded content” such that deaf or hard of hearing people can enjoy such content without “specialized equipment,” as recognized by Foster. (Foster, ¶¶ [0004], [0005]).

Regarding claim 42, the rejection of claim 23 is incorporated. Claim 42 is substantially the same as claim 15 and is therefore rejected under the same rationale as above.

Claims 16 and 43 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claims 1 and 23 above, and further in view of Bianco.

Regarding claim 16, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly.
Bianco teaches systems and methods for improving the quality for Internet Protocol (IP) communications. (Bianco, ¶ [0002]). Regarding claim 16, Bianco teaches the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly (“because the textual representation of a user's spoken audio input can be encapsulated in far fewer digital data packets than the data created by a CODEC, it is possible to redundantly send multiple copies of the textual representation data, or perform error correction techniques, to ensure that a substantially complete copy of the textual representation data arrives at the destination device.”; Bianco, ¶ [0046]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Bianco to include the system memory further storing instructions configured to send the caption stream to the one or more other conferencing endpoints redundantly. “If a portion of the digital data created by the CODEC is missing or corrupted, the corresponding portion of the transcription is used to fill in the missing portion,” thus helping to ensure the integrity of the transmission, as recognized by Bianco. (Bianco, ¶ [0044]).

Regarding claim 43, the rejection of claim 23 is incorporated. Claim 43 is substantially the same as claim 16 and is therefore rejected under the same rationale as above.

Claims 17 and 44 are rejected under 35 U.S.C. §103 as being unpatentable over Crinon and Bianco as applied to claims 16 and 43 above, and further in view of Thijssen.

Regarding claim 17, the rejection of claim 16 is incorporated. Crinon and Bianco disclose all of the elements of the current invention as stated above. However, Crinon and Bianco fail to expressly recite wherein the redundancy is by use of a forward error correction code.
Thijssen teaches systems and methods for datastream control and processing. (Thijssen, Col. 1, lines 7-15). Regarding claim 17, Thijssen teaches wherein the redundancy is by use of a forward error correction code (“A final buffer 46 is fed by video decoder 40, for buffering user data such as closed-caption information,” where the user data corresponds to user data 52 having an ECC 54. And where, further, the ECC 54 “contains error protection code, such as the redundancy symbols of a Reed-Solomon code that can be used for correcting a percentage of the symbols that have been received in an incorrect manner,” thus, teaching redundancy by use of forward error correction codes (Reed-Solomon is a well-known forward error correction code).; Thijssen, Col. 2, lines 62-64; Col. 3, lines 16-20; FIG. 3).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the methods for quality improvement in IP transmissions of Bianco, to incorporate the teachings of Thijssen to include wherein the redundancy is by use of a forward error correction code. The use of readily disposable error correction codes allows for minimal buffer use while correcting “symbols that have been received in an incorrect manner,” as recognized by Thijssen. (Thijssen, Col. 3, lines 18-20).

Regarding claim 44, the rejection of claim 43 is incorporated. Claim 44 is substantially the same as claim 17 and is therefore rejected under the same rationale as above.

Claim 24 is rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claim 1 above, and further in view of Stefani.

Regarding claim 24, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. Crinon further discloses wherein instructions configured to coordinate connection of the conferencing endpoint comprise instructions configured to coordinate connection of the conferencing endpoint for a user (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104 for a plurality of users; Crinon, ¶¶ [0023], [0069], [0022]). However, Crinon fails to expressly recite further comprising instructions configured to download a lexicon update comprise instructions configured to download a lexicon update containing user-specific information specific to the user and wherein instructions configured to generate the caption stream comprise instructions configured to generate the caption stream based on the user-specific information.
Stefani teaches systems and methods for “streaming real-time automatic speech recognition (ASR).” (Stefani, Col. 2, line 1). Regarding claim 24, Stefani teaches further comprising instructions configured to download a lexicon update comprise instructions configured to download a lexicon update containing user-specific information specific to the user (“the user can upload a file including the words or phrases to be included in the custom dictionary… [and] the frontend service can retrieve {download…} the dictionary file 111 {a lexicon update} and generate the custom dictionary 115 using the file.”; Stefani, Col. 5, lines 37-46); and wherein instructions configured to generate the caption stream comprise instructions configured to generate the caption stream based on the user-specific information (“At D, the decoder host can use the name or identifier assigned to the custom dictionary {user-specific information} to retrieve the custom dictionary 115 from the storage service 109 and use the custom dictionary in the transcription of the audio stream.”; Stefani, Col. 5, lines 52-56).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon to incorporate the teachings of Stefani to include further comprising instructions configured to download a lexicon update comprise instructions configured to download a lexicon update containing user-specific information specific to the user and wherein instructions configured to generate the caption stream comprise instructions configured to generate the caption stream based on the user-specific information. The systems and methods described in Stefani can “automatically generate [accurate] transcripts of the speech [from] the audio data stream.” (Stefani, Col. 2, lines 2-17).

Claim 25 is rejected under 35 U.S.C. §103 as being unpatentable over Crinon and Stefani as applied to claim 24 above, and further in view of Drewes (U.S. Pat. App. Pub. No. 2017/0133007, hereinafter Drewes).

Regarding claim 25, the rejection of claim 24 is incorporated. Crinon and Stefani disclose all of the elements of the current invention as stated above. However, Crinon and Stefani fail to expressly recite wherein instructions configured to download a lexicon update containing user-specific information specific comprises instructions configured to download a lexicon update containing at least one of: a names of a user's contact, information from a user's calendar, or content from a user's text message; and wherein instructions configured to generate the caption stream based on the user-specific information comprise instructions configured to generate the caption stream based on the at least one of: the names of the user's contacts, the information from the user's calendar, or the content from a user's text message.
Drewes teaches systems and methods for voice recognition data collection. (Drewes, ¶¶ [0102]-[0103]). Regarding claim 25, Drewes teaches wherein instructions configured to download a lexicon update containing user-specific information specific comprises instructions configured to download a lexicon update containing at least one of (“The RDB containing data relating to the speakers “user-id” and “speaker-mode” (i.e., speaker-dependent) may be used to periodically download mini vocabulary dictionaries containing only one speaker-dependent user’s cumulative data to the PC of each and every speaker-dependent user of the voice recognition system.”; Drewes, ¶ [0049]): a names of a user’s contact, information from a user’s calendar, or content from a user’s text message (Includes “User Name of Speaker in Voice Recognition Session” where “a single voice recognition session… [can include] Multiple Speakers”; Drewes, ¶¶ [0140], [0198]); and wherein instructions configured to generate the caption stream based on the user-specific information comprise instructions configured to generate the caption stream (“the digital text of each speakers’ name or some other indication of which speaker is talking may precede the digital text detailing what each speaker said.”; Drewes, ¶ [0199]) based on the at least one of: the names of the user’s contacts, the information from the user’s calendar, or the content from a user’s text message (Each speaker’s name is associated with the digital text; Drewes, ¶ [0199]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, as modified by the real-time ASR streaming systems of Stefani, to incorporate the teachings of Drewes to include wherein instructions configured to download a lexicon update containing user-specific information specific comprises instructions configured to download a lexicon update containing at least one of: a names of a user's contact, information from a user's calendar, or content from a user's text message; and wherein instructions configured to generate the caption stream based on the user-specific information comprise instructions configured to generate the caption stream based on the at least one of: the names of the user's contacts, the information from the user's calendar, or the content from a user's text message. The systems and methods described in Drewes can provide “performance improvement corresponding to the cumulative voice recognition error corrections” over the corresponding time frame and “will significantly reduce the rate (% amount) of voice recognition errors,” as recognized by Drewes. (Drewes, ¶¶ [0088]-[0089]).

Claim 27 is rejected under 35 U.S.C. §103 as being unpatentable over Crinon as applied to claim 1 above, and further in view of Parc (U.S. Pat. App. Pub. No. 2021/0210072, hereinafter Parc).

Regarding claim 27, the rejection of claim 1 is incorporated. Crinon discloses all of the elements of the current invention as stated above. However, Crinon fails to expressly recite further comprising instructions configured to: receive a negative acknowledgement from another endpoint included in the one or more other endpoints; and re-send the caption stream to the other endpoint in response to receiving the negative acknowledgement.
Parc teaches a “method of authenticating using a message transmitted to the intelligent electronic device.” (Parc, ¶ [0001]). Regarding claim 27, Parc teaches further comprising instructions configured to: receive a negative acknowledgment from another endpoint included in the one or more other endpoints (“A transmitter that performs the HARQ operation transmits data (e.g., a transport block, a codeword) and waits for an acknowledgment (ACK). A receiver that performs the HARQ operation sends an acknowledgment (ACK) only when data is properly received, and sends a negative acknowledgment (NACK) if an error occurs in the received data.”; Parc, ¶ [0144]); and re-send the caption stream to the other endpoint in response to receiving the negative acknowledgment (“The transmitter may transmit (new) data if ACK is received, and retransmit data if NACK is received.”; Parc, ¶ [0144]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for closed captioning in real time communication of Crinon, to incorporate the teachings of Parc to include further comprising instructions configured to: receive a negative acknowledgement from another endpoint included in the one or more other endpoints; and re-send the caption stream to the other endpoint in response to receiving the negative acknowledgement. The retransmission of data after a negative acknowledgement (NACK) from the receiving end can reduce transmission delay and can result in improved user convenience, as recognized by Parc. (Parc, ¶ [0144]).

Allowable Subject Matter
Claim 28 is allowed.
The following is an examiner’s statement of reasons for allowance: 
The closest prior art of record, Crinon, teaches a conferencing endpoint comprising (“real time conferencing component 102”; Crinon, ¶ [0022]): an audio input interface; an audio output interface; a video input interface; a display interface; a network interface; a processor; and system memory coupled to the processor and storing instructions configured to cause the processor to (“real time conferencing component 102 can [include] an input component 202 (audio input interface) that can obtain the audio data” and real time conferencing component 102 can be substantially similar to exemplary computing system 1112, including “a processing unit 1114 (processor), a system memory 1116, and a system bus 1118... [which] couples system components including... the system memory 1116 to the processing unit 1114” and a “network interface 1148” as well as “The real time conferencing component 102 can additionally include a video streaming component 602, an audio streaming component 604, and a synchronization component 606.”; Crinon, ¶¶ [0027], [0063], [0069], [0049]): coordinate connection of the conferencing endpoint via the network interface to a plurality of conferencing endpoints participating in network conference (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104; Crinon, ¶¶ [0023], [0069], [0022]), the plurality of conferencing endpoints including at least a first conferencing endpoint and a second conferencing endpoint (“The real time conferencing component 102 can send and/or receive data (e.g., via a network such as the internet, a corporate intranet, a telephone network,..) utilized in connection with audio/video teleconferences,” through the network interface 1148 and “the real time conferencing component 102 (and/or the disparate real time conferencing component(s) 104) can be... an audio/video multi-point control unit (AVMCU),... [which] can be a bridge that interconnects several endpoints and enables routing data between the endpoints,” thus coordinating the connection of the other disparate real time conferencing component(s) 104. As shown in FIG. 4, the system can include “any number of receiving endpoints (e.g., a receiving endpoint 1 404, [and] a receiving endpoint 2 406),” which, in this embodiment, are the first conferencing endpoint and the second conferencing endpoint, respectively.; Crinon, ¶¶ [0023], [0069], [0022]); capture an audio stream from the audio input interface (“the input component 202 can be a microphone that can capture the audio data and generate electrical impulses.” where the audio data comprises speech; Crinon, ¶¶ [0028]-[0029]); compress the audio stream (The video data, as multiplexed with the audio data, can be compressed; Crinon, ¶ [0050]); capture a video stream from the video input interface (A video stream is captured by the “video streaming component 602”; Crinon, ¶ [0049]); receive consent to transcribe recognized speech via meeting registration at the conferencing endpoint (“the listening participants can manually and/or automatically negotiate the use of closed captions upon receiving endpoints,” where “the sending endpoint 302 can select whether to disable or enable the ability of receiving endpoints 404-408 to obtain the text data for closed captioning”; thus, an end user chooses whether or not to consent to the transcription of text data for closed captioning (transcribe recognized speech); Crinon, ¶ [0040]); multiplex the compressed audio stream, the caption stream, and the video stream into a transport stream (“the synchronization component 606 can employ timestamps to synchronize data (e.g., audio, video, text,..). For example, the timestamps can be in the real time transport protocol (RTP) used by real time communication systems,” where “Separate streams of data including timestamps can be generated... and the streams can be multiplexed over the RTP.” Therefore, the audio stream and the text stream (caption stream) are multiplexed into a transport stream.; Crinon, ¶ [0051]).
However, Crinon does not specifically teach [elements from claim mapping which [Author1] does not teach]. 
Stefani further teaches download a lexicon update (“the user can upload a file including the words or phrases to be included in the custom dictionary… [and] the frontend service can retrieve {download…} the dictionary file 111 {a lexicon update} and generate the custom dictionary 115 using the file.”; Stefani, ¶¶ Col. 5, lines 37-46); transcribe the compressed audio stream into a caption stream, including: convert the speech into a transcription hypothesis (“For each chunk of audio data, the streaming ASR engine can use the acoustic model to break {convert} the audio data {the speech} into a series of words {transcription hypothesis}”; Stefani, ¶¶ Col. 4, lines 45-50); and evaluate the transcription hypothesis according to a plurality of natural language grammars to inform speech recognition, including (“The output of the acoustic model can be passed through the language model to identify phrases and/or sentences corresponding to the series of words identified by the acoustic model” where “the language model includes grammar rules”; Stefani, ¶¶ Col. 4, lines 48-52); generate the caption stream from the transcription hypothesis, in accordance with the lexicon update, and in view of the evaluation (“the real-time transcription may be used to provide closed captions of a live event {instructions configured to generate the caption stream…}” where “The language model includes grammar rules, language constructs, and other language-specific nuances” and where the “custom dictionary… [can] be used to transcribe audio {and in accordance with the lexicon update}”; Stefani, ¶¶ Col. 5, lines 1-6, and 33-35; Col. 7, lines 20-25). 
Kashima further teaches match a requisite number of words in the audio stream to words in a topic-specific grammar (“The candidate sentence generating means 19 evolves all the phrases that can be received by the application from the grammars stored by the grammar storing unit 12 to generate candidate sentences” where “The matching means 23 matches these candidate sentences {words in the topic-specific grammar} against the recognition result (N-gram recognition result) {a requisite number of words in the audio stream}” where “sentence data specific to the application stored in the specific sentence storing unit 15 is used to shift the general topic”; Kashima, ¶¶ [0044], [0090], [0096]). 
Stahl further teaches evaluate the transcription hypothesis according to the topic-specific grammar and a user-specific grammar (“Interpreter module 22 consumes the UID and uses it to condition its interpretation according to a grammar” where “some embodiments maintain databases of UID-specific interpretation weights,” where the UID-specific interpretation weights are a user-specific grammar.; Stahl, ¶¶ [0033]). 
Casagrande further teaches recognize a portion of the speech is included in a blacklist (“the second screen electronic device 400 may utilize a ‘black list’ of unauthorized terms, to prevent unwanted caption words from being transmitted to the second screen content module 412 to be used for the retrieval and presentation of second screen content. For example, words designated by the Federal Communications Commission (FCC) as being unfit for broadcast television during daytime hours may be present on the ‘black list’ of words,” thus recognizing a portion included in a blacklist.; Casagrande, ¶¶ [0058]); and obscure presentation of the blacklisted portion of speech within the caption stream (the system can further “prevent the presentation of content associated with profane language. The ‘black list’ of words may be user-configurable, and may include not only profanity, but any word for which a user does not wish to view additional content,” thus the caption stream is adjusted to prevent the presentation of (obscure) the speech portion at the one or more conferencing devices (noting that this is user configurable to not receive black listed content and involves a plurality of “receiving endpoints N 408”, thus at the “one or more” conferencing devices).; Casagrande, ¶¶ [0058]).
Bianco further teaches redundantly send the transport stream to the plurality of conferencing endpoints using a forward error correction code (“because the textual representation of a user’s spoken audio input can be encapsulated in far fewer digital data packets than the data created by a CODEC, it is possible to redundantly send multiple copies of the textual representation data, or perform error correction techniques, to ensure that a substantially complete copy of the textual representation data arrives at the destination device.”; Bianco, ¶¶ [0046]).
However, none of the prior art references of record, either alone or in combination, teaches, suggests, or makes obvious the combination of limitations as recited in the independent claims.
More specifically, the limitation of “multiplex the compressed audio stream, the caption stream, and the video stream into a transport stream including muting the audio stream and suspending multiplexing of the compressed audio stream independently from muting the caption stream and suspending multiplexing of the caption stream; redundantly send the transport stream to the plurality of conferencing endpoints using a forward error correction code, including sending the transport stream directly to the first conferencing endpoint and sending the transport stream directly the second conferencing endpoint, via the network interface; receive an other transport stream, including an other audio stream, an other caption stream corresponding to the other audio stream, and an other video stream, directly from an other conferencing endpoint including in the plurality of conferencing endpoints; receive a further transport stream, including a further audio stream, a further caption stream corresponding to the further audio stream, and a further video stream directly from a further conferencing endpoint included in the plurality of other conferencing endpoints; and coordinate outputs at the conferencing endpoint, including: coordinate output of the other audio stream at the audio output interface with output of the other caption stream and the video stream at the display interface, including presenting the other video stream in a window and presenting the other caption stream with first visual characteristics, including a first text color, and with a person depicted in the in the other video stream in another window supplementing presentation of the video stream in an other the window; and concurrent with presentation of the other caption stream, present the further caption stream at the display interface with second visual characteristics, including a second text color, and with another person depicted in the further video stream, the second visual characteristics differing from the first visual characteristics and the second color differing from the first color” alongside the remaining limitations of independent claim 28, is not taught by the prior art of record.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/Sean E Serraguard/Patent Examiner, Art Unit 2657    

/LAMONT M SPOONER/Primary Examiner, Art Unit 2657    12/3/2022