DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5, 7, 13, 15-20, 22, 24-25, and 28-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cutler et al. (US 2018/0218727 A1, “Cutler”).
As to claims 1, 18, 24, 29, Cutler discloses a device for communication (receiving user terminal 102b, installed with a communication client application 103; Fig. 1, para. 0027-0029) comprising: 
one or more processors (one or more processing units, para. 0029) configured to: 
receive, during an online meeting (multiparty voice or video or VoIP session, para. 0029, 0032), a speech audio stream representing speech of a first user (terminal 102b receives audio data comprising the speech of sending user 106a, para. 0031); 
receive a text stream representing the speech of the first user (sending client 103a converts the sending user’s locally captured speech to text, which is sent to the receiving user terminal 102b in parallel with the audio but in separate packets, para. 0035); and 
selectively generate an output based on the text stream in response to an interruption in the speech audio stream (received text is converted to synthesized speech in response to poor audio quality due to packet loss, delay, etc., para. 0011, 0024, 0026, 0034, 0045, 0064).
As to claims 2, 19, Cutler discloses: wherein the one or more processors are configured to detect the interruption in response to determining that no audio frames of the speech audio stream are received within a threshold duration of a last received audio frame of the speech audio stream (receiving client 103b detects that network conditions have fallen below a threshold quality, such as long delay or packet loss, para. 0034-0035).  
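For illustration only, the detection condition recited in claims 2 and 19 (no audio frame received within a threshold duration of the last received audio frame) can be sketched as follows. This sketch is not taken from the application or from Cutler; the class name, the method names, and the 0.5-second threshold are assumptions chosen for the example.

```python
from typing import Optional


class InterruptionDetector:
    """Hypothetical sketch: reports an interruption when no audio frame has
    arrived within `threshold_s` seconds of the last received frame."""

    def __init__(self, threshold_s: float = 0.5) -> None:
        # The threshold value is an assumption, not a figure from the record.
        self.threshold_s = threshold_s
        self.last_frame_time: Optional[float] = None

    def on_audio_frame(self, now: float) -> None:
        # Record the arrival time of the most recent audio frame.
        self.last_frame_time = now

    def is_interrupted(self, now: float) -> bool:
        # Before any frame arrives there is no "last received frame"
        # to measure against, so no interruption is reported.
        if self.last_frame_time is None:
            return False
        return (now - self.last_frame_time) > self.threshold_s
```

In this sketch the caller supplies timestamps in seconds, so the check can be exercised without a real clock or network connection.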
As to claims 3, 20, Cutler discloses: wherein the one or more processors are configured to detect the interruption in response to receiving the text stream (network condition measuring and speech-to-text conversion may be implemented at a server 104, para. 0063-0064, such that the reception of text indicates an interruption).
As to claims 5, 22, Cutler discloses: wherein the one or more processors are configured to provide the text stream as the output to a display (text information is output on a visual display screen 224, para. 0003, 0026, 0048, 0060).
As to claims 7, 25, Cutler discloses: wherein the one or more processors are further configured to: 
perform text-to-speech conversion on the text stream to generate a synthesized speech audio stream (text-to-speech converter 218 converts received text to synthesized speech at the receive end based on a model of the sending user’s voice, para. 0035); and 
provide the synthesized speech audio stream as the output to a speaker (synthesized speech is played out to the receiving user 106b through speaker(s) 222, para. 0035, 0045-0047, 0060-0061).
As to claim 13, Cutler discloses: wherein the text-to-speech conversion is performed based on a speech model (text is converted to synthesized speech based on a speech model 209 of the transmitting user’s voice, Abstract, para. 0006, 0035-0036, 0041-0042).
As to claims 15, 28, Cutler discloses: wherein the one or more processors are configured to, prior to the interruption, update the speech model based on the speech audio stream (voice model values are dynamically updated during the call, para. 0056).
As to claim 16, Cutler discloses: wherein the one or more processors are configured to: 
receive, during the online meeting, a second speech audio stream representing speech of a second user (multiparty voice or video or VoIP session, para. 0029, 0032); and 
provide the second speech audio stream to a speaker concurrently with generating the output (para. 0029, 0032).
As to claim 17, Cutler discloses: wherein the one or more processors are configured to: 
halt playback of the speech audio stream in response to the interruption in the speech audio stream (when network conditions are classified as poor, the received audio 250 and any video 252 are no longer played out, para. 0060); and 
in response to the interruption ending (controller 216 detects that the connection quality is classified as good again, para. 0046): 
refrain from generating the output based on the text stream (text-to-speech converter 218 stops playing out the synthesized speech, para. 0046; text is output only when the connection is classified as poor, as above with regard to the synthesized speech, para. 0048); and 
resume playback of the speech audio stream (controller 216 starts playing out the received audio 250 again, para. 0046).
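For illustration only, the switching behavior mapped above for claim 17 (halt playback of the received audio during the interruption, generate text-based output instead, and resume the received audio once the interruption ends) can be sketched as a two-state controller. The names `OutputSource` and `PlaybackController` and the boolean quality flag are assumptions made for this example; they do not appear in the application or in Cutler.

```python
from enum import Enum


class OutputSource(Enum):
    RECEIVED_AUDIO = "received_audio"  # normal playback of the audio stream
    TEXT_BASED = "text_based"          # output generated from the text stream


class PlaybackController:
    """Hypothetical sketch of selecting which stream drives the output."""

    def __init__(self) -> None:
        # Start in normal playback, before any interruption is detected.
        self.source = OutputSource.RECEIVED_AUDIO

    def on_connection_quality(self, is_poor: bool) -> OutputSource:
        if is_poor:
            # Interruption: halt received audio, generate text-based output.
            self.source = OutputSource.TEXT_BASED
        else:
            # Interruption ended: stop text-based output, resume received audio.
            self.source = OutputSource.RECEIVED_AUDIO
        return self.source
```

Only one source is active at a time in this sketch, which mirrors the mapping above in which text-based output is generated only while the connection is classified as poor.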
As to claim 30, Cutler discloses: wherein the means for receiving the speech audio stream, the means for receiving the text stream, and the means for selectively generating the output are integrated into at least one of a virtual assistant, a home appliance, a smart device, an internet of things (IoT) device, a communication device, a headset, a vehicle, a computer, a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a personal media player, a digital video player, a camera, or a navigation device (user terminals 102 may take any of a variety of different forms, such as desktop computer, smartphone, smart TV, set-top box, etc., para. 0028).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Maistri et al. (US 2017/0237784 A1, “Maistri”).
Cutler differs from claims 4, 21 in that it does not disclose: wherein the one or more processors are configured to detect the interruption in response to receiving an interruption notification.
Maistri discloses a multi-point video conference during which an alert may be generated to notify the system that one or more quality parameters, e.g. packet loss, exceed a threshold value so that the system may responsively adjust parameters to reduce bandwidth usage (para. 0046-0047).
Since Cutler teaches the server measuring and classifying network conditions (para. 0064), it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cutler with the above teaching of Maistri in order to notify a user terminal of a network condition which falls below a predetermined level of quality.
Claims 6 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Totzke et al. (US 2020/0302954 A1, “Totzke”).
Cutler differs from claims 6, 23 in that although it teaches transmitting one or more speech parameters 256 detected in the audio data (Fig. 2, para. 0055-0056), it does not disclose: wherein the one or more processors are further configured to: receive a metadata stream indicating intonations of the speech of the first user; and annotate the text stream based on the metadata stream.
Totzke discloses an audio/video conference in which speech-converted text is enriched with metadata indicating sentiment, such as sarcasm, seriousness, joking, etc. (para. 0015, 0033, 0046, 0048).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cutler with the above teaching of Totzke in order to determine the true content of a participant’s speech, as taught by Totzke (para. 0015).
Claims 8 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Yang et al. (US 2020/0035215 A1, “Yang”).
Cutler differs from claims 8, 26 in that although it teaches transmitting one or more speech parameters 256 detected in the audio data (Fig. 2, para. 0055-0056), it does not disclose: wherein the one or more processors are further configured to receive a metadata stream indicating intonations of the speech of the first user, wherein the text-to-speech conversion is based on the metadata stream.
Yang discloses synthesizing speech corresponding to received metadata based on determined emotion information (Abstract, claim 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cutler with the above teaching of Yang in order to provide a more lively and realistic speech synthesis to a user, as taught by Yang (para. 0296).
Claims 9-12 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Marlow et al. (US 2017/0332044 A1, “Marlow”).
Cutler differs from claims 9, 27 in that it does not disclose: wherein the one or more processors are further configured to display an avatar concurrently with providing the synthesized speech audio stream to the speaker.  Rather, in Cutler, only text and/or synthesized speech is played out when network conditions are classified as poor (para. 0060).
Marlow discloses replacing a videoconference stream image with an animated avatar in response to an interruption (para. 0070).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cutler with the above teaching of Marlow in order to provide a richer representation of a meeting participant during bandwidth-related interruptions, as taught by Marlow (Abstract, last sentence).
As to claim 10, Cutler in view of Marlow discloses: wherein the one or more processors are configured to receive a media stream during the online meeting, the media stream including the speech audio stream and a video stream of the first user (Cutler: user terminal transmits audio and video to each of the other user terminals, para. 0029, 0031, 0037, 0040).
As to claim 11, Cutler in view of Marlow discloses: wherein the one or more processors are configured to, in response to the interruption: halt playback of the speech audio stream; and halt playback of the video stream (Cutler: when network conditions are classified as poor, the received audio 250 and any video 252 are no longer played out, para. 0060).
As to claim 12, Cutler in view of Marlow discloses: wherein the one or more processors are configured to, in response to the interruption ending (Cutler: controller 216 detects that the connection quality is classified as good again, para. 0046): 
refrain from providing the synthesized speech audio stream to the speaker (Cutler: text-to-speech converter 218 stops playing out the synthesized speech, para. 0046); 
refrain from displaying the avatar (Marlow: live video is displayed during normal bandwidth conditions, para. 0031); 
resume playback of the video stream (Cutler: where it is said that audio data comprises the speech of the sending user, this will be understood to cover the typical scenario in which the audio is transmitted, and similarly for any transmitted video, para. 0031); and 
resume playback of the speech audio stream (Cutler: controller 216 starts playing out the received audio 250 again, para. 0046).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Cutler in view of Calle et al. (US 2018/0358003 A1, “Calle”).
Cutler differs from claim 14 in that it does not disclose: wherein the speech model corresponds to a generic speech model.
Calle teaches using generic voice models as an alternative to providing a custom voice model (Abstract, para. 0059, 0061). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cutler with the above teaching of Calle in order to increase speech quality, as taught by Calle (para. 0064).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Thagadur Shivapa et al. (US 2022/0217425 A1) teach determining frame loss when no audio frame has been received for a threshold duration since a last received audio frame (Abstract).
Vilke et al. (US 10,971,161 B1) teach detecting audio loss when an audio portion has not been received within a specific threshold duration (col. 9, lines 30-37).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STELLA L WOO whose telephone number is (571) 272-7512. The examiner can normally be reached Monday - Friday, 9 a.m. to 3 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Stella L. Woo/
Primary Examiner, Art Unit 2652