DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 17 June 2020 and 12 November 2020, respectively, are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 11-13, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over US 20150347395, hereinafter referred to as Cuthbert et al., in view of US 20170364509, hereinafter referred to as Cordell et al.    

Regarding claim 1, Cuthbert et al. discloses a method of processing audio communications over a network (Cuthbert et al., fig. 1(112) and para [0035]), comprising: 

at a first client device that has one or more processors and memory (Cuthbert et al., fig. 1(124). See also Cuthbert et al., fig. 4A(404(408)).), the first client device having established an audio and/or video communication session with a second client device over the network (Cuthbert et al., fig. 1 and para [0038]): 

during the audio and/or video communication session: 

receiving a first audio transmission from the second client device, wherein the first audio transmission is provided by the second client device in a source language that is distinct from a default language associated with the first client device (“The mobile computing device 108 can be associated with a speaker user 116 that is speaking to a group of listener users 120-1 . . . 120-N (N ≥ 1, collectively "listener users 120") associated with example computing devices 124-1 . . . 124-N (collectively "computing devices 124"), respectively. The mobile computing device 108 and the computing devices 124 can be referred to as a "speaker computing device 108" and "listener computing devices 124," respectively. When the speaker user 116 is speaking in a language that is not spoken or understood by a specific listener user 120, a real-time translation can be provided to the specific listener computing device 124 associated with the specific listener user 120 according to the techniques of the present disclosure,” Cuthbert et al.); 

in accordance with a determination that the one or more current user language attributes suggest a target language that is currently used for the audio and/or video communication session at the first client device, and in accordance with a determination that the target language is distinct from the default language associated with the first client device: 

obtaining a translation of the first audio transmission from the source language into the target language (Cuthbert et al., para [0039] and para [0045]. See also Cuthbert et al., fig. 2B.); and 

presenting the translation of the first audio transmission in the target language to a user at the first client device (Cuthbert et al., para [0039]).  

Cuthbert et al. does not specifically disclose obtaining one or more current user language attributes for the first client device, wherein the one or more current user language attributes are indicative of a current language that is used for the audio and/or video communication session at the first client device. 

Cordell et al. is cited to disclose obtaining one or more current user language attributes for the first client device, wherein the one or more current user language attributes are indicative of a current language that is used for the audio and/or video communication session at the first client device (“The configuration utilizes the capabilities of a mobile device corresponding to a user to augment a VRI session to enhance the language interpretation/translation with one or more features. For instance, a context of the language interpretation/translation session, e.g., geographical location, may be determined via the mobile device of the user,” Cordell et al., para [0015]. The geographical location is an attribute used to identify a location’s likely target language.). Cordell et al. benefits Cuthbert et al. by providing contextual information to the language interpreter/translator, thereby aiding a user with language translation without the user having to actively provide information (Cordell et al., para [0004]). Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Cordell et al. to improve the real-time translation of Cuthbert et al.

As to claim 11, device claim 11 and method claim 1 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 11 is similarly rejected under the same rationale as applied above with respect to method claim 1. 

As to claim 17, CRM claim 17 and method claim 1 are related as method and CRM of using same, with each claimed element’s function corresponding to the method. Accordingly claim 17 is similarly rejected under the same rationale as applied above with respect to method claim 1. 
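
For illustration only, the conditional logic recited in claim 1 (attributes suggest a target language; translate only when that target differs from the device default) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the function names, the use of language codes as attribute hints, the majority vote, and the fallback to the default language are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def suggest_target_language(attribute_hints: Sequence[str]) -> Optional[str]:
    """Return the language code suggested most often by the attribute hints.

    Each hint is a language code inferred from one attribute (hypothetical
    examples: geolocation, locally captured speech). Returns None if empty.
    """
    if not attribute_hints:
        return None
    return Counter(attribute_hints).most_common(1)[0][0]


def language_to_present(default_language: str,
                        attribute_hints: Sequence[str]) -> str:
    """Pick the language in which the translation is presented.

    The suggested target is used only when it is distinct from the device
    default; otherwise the default is kept (an assumed fallback).
    """
    target = suggest_target_language(attribute_hints)
    if target is not None and target != default_language:
        return target
    return default_language
```

Under these assumptions, a device whose default language is "en" but whose attributes mostly point to "fr" would receive the translation in "fr".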


Regarding claim 2, Cuthbert et al., as modified by Cordell et al., discloses the method of claim 1, wherein the obtaining the one or more current user language attributes for the first client device includes: 

obtaining facial features of the user at the first client device (“As another example, the user 102 may perform image, video, or audio capture with the mobile computing device 103. The mobile computing device 103 may then automatically send the captured images, videos, or audio to the augmentation engine 107 to perform an analysis. The augmentation engine 107 may then automatically perform the analysis and/or request that the language interpreter/translator 105 perform the analysis. For instance, facial recognition, object recognition, and speech recognition may be utilized to determine the contents of the captured data,” Cordell et al., para [0026].) and obtaining geolocation information of the first client device (“As an example, the augmentation engine 107 may generate popup messages to be sent to the mobile computing device 103 based on the contextual data and particular words or phrases spoken during the language interpretation/translation session. In other words, the augmentation engine 107 may be configured to automatically generate a particular popup message based on a particular context and a particular keyword that occurs during the language interpretation/translation session. For instance, the mobile computing device 103 may send contextual data to the augmentation engine 107 that indicates the GPS coordinates of the user 102,” Cordell et al., para [0025].); and 

the facial features of the user at the first client device and the geolocation information of the first client device are utilized in combination to suggest the target language as the current language that is used at the first client device.

As to claim 12, device claim 12 and method claim 2 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 12 is similarly rejected under the same rationale as applied above with respect to method claim 2. 

As to claim 18, CRM claim 18 and method claim 2 are related as method and CRM of using same, with each claimed element’s function corresponding to the method. Accordingly claim 18 is similarly rejected under the same rationale as applied above with respect to method claim 2. 


Regarding claim 3, Cuthbert et al., as modified by Cordell et al., discloses the method of claim 1, wherein the obtaining the one or more current user language attributes for the first client device includes: 

obtaining an audio input received locally at the first client device during the audio and/or video communication session (Cuthbert et al., para [0037]); and 

the audio input that is received locally at the first client device is analyzed linguistically to suggest the target language as the current language that is used at the first client device (Cordell et al., para [0026] – analysis of audio feed/speech recognition is a form of linguistic analysis to suggest a language.).  

As to claim 13, device claim 13 and method claim 3 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 13 is similarly rejected under the same rationale as applied above with respect to method claim 3. 

As to claim 19, CRM claim 19 and method claim 3 are related as method and CRM of using same, with each claimed element’s function corresponding to the method. Accordingly claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 3. 


Claims 4-5, 14-15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20150347395, hereinafter referred to as Cuthbert et al., in view of US 20170364509, hereinafter referred to as Cordell et al., and further in view of US 20100185434, hereinafter referred to as Burvall et al.        

Regarding claim 4, Cuthbert et al., as modified by Cordell et al., discloses the method of claim 1, but does not disclose the method further comprising: 

obtaining vocal characteristics of a voice in the first audio transmission; and 

according to the vocal characteristics of the voice in the first audio transmission, generating a simulated first audio transmission that includes the translation of the first audio transmission spoken in the target language in accordance with the vocal characteristics of the voice of the first audio transmission.

Burvall et al. is cited to disclose obtaining vocal characteristics of a voice in the first audio transmission (“In some embodiments, the speech synthesis unit 247 may generate the translated signal using a voice in accordance with a received indication of the age, sex, and/or other vocal characteristics of the user of the wireless terminal 100a,” Burvall et al., para [0046].); and 

according to the vocal characteristics of the voice in the first audio transmission, generating a simulated first audio transmission that includes the translation of the first audio transmission spoken in the target language in accordance with the vocal characteristics of the voice of the first audio transmission (Burvall et al., para [0046]). Burvall et al. benefits Cuthbert et al. by providing a translation which embodies the vocal characteristics of the speaker (Burvall et al., para [0046]), thereby conveying a more realistic translation to the listener. Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Burvall et al. to improve the real-time translation of Cuthbert et al.

As to claim 14, device claim 14 and method claim 4 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 14 is similarly rejected under the same rationale as applied above with respect to method claim 4. 

As to claim 20, CRM claim 20 and method claim 4 are related as method and CRM of using same, with each claimed element’s function corresponding to the method. Accordingly claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 4. 


Regarding claim 5, Cuthbert et al., as modified by Cordell et al. and Burvall et al., discloses the method of claim 1, wherein the presenting the translation of the first audio transmission in the target language to a user at the first client device includes: 

presenting a textual representation of the translation of the first audio transmission in the target language to the user at the first client device (Burvall et al., fig. 2(224b)); and 

presenting a simulated first audio transmission that is generated in accordance with vocal characteristics of a voice in the first audio transmission (Burvall et al., fig. 2(247b)).  

As to claim 15, device claim 15 and method claim 5 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 15 is similarly rejected under the same rationale as applied above with respect to method claim 5. 


Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US 20150347395, hereinafter referred to as Cuthbert et al., in view of US 20170364509, hereinafter referred to as Cordell et al., and further in view of US 20150347399, hereinafter referred to as Aue et al.   

Regarding claim 6, Cuthbert et al., as modified by Cordell et al., discloses the method of claim 1, further comprising: 

during the audio and/or video communication session: 

detecting continuous speech input from a user located at the first client device (Cuthbert et al., para [0039]).

Neither Cuthbert et al. nor Cordell et al., though, discloses marking a start time of the first continuous speech input as a beginning of a first audio paragraph detected at the first client device; detecting a first predefined break in the continuous speech input at the first client device; in response to detecting the first predefined break in the continuous speech input, marking a start time of the first predefined break as an end of the first audio paragraph detected at the first client device, wherein the first audio paragraph is included in a second audio transmission that is sent to the second client device.

Aue et al. is cited to disclose marking a start time of the first continuous speech input as a beginning of a first audio paragraph detected at the first client device (“An interval of speech activity may e.g. correspond to a sentence or small number of sentences preceded and followed by a pause in Alice's speech,” Aue et al., para [0037].);  

detecting a first predefined break in the continuous speech input at the first client device (“An interval of speech activity may e.g. correspond to a sentence or small number of sentences preceded and followed by a pause in Alice's speech,” Aue et al., para [0037].); 

in response to detecting the first predefined break in the continuous speech input, marking a start time of the first predefined break as an end of the first audio paragraph detected at the first client device, wherein the first audio paragraph is included in a second audio transmission that is sent to the second client device (“The live translation may be performed per-such interval of speech activity so a translation of Alice's immediately preceding interval of speech activity is triggered by a sufficient (e.g. predetermined) interval of speech inactivity ("immediately preceding" referring to the most recent interval of speech activity that has not already been translated). In this case, as soon as that translation is complete, it may be transmitted to Bob for outputting so that Bob hears it as soon as possible after hearing Alice's most recent period of natural speech activity i.e. so that a period of speech activity by Alice is heard by Bob, followed by a short pause (while the translation and transmission thereof are performed), followed by Bob hearing and/or seeing the translation of Alice's speech in that interval,” Aue et al., para [0037].). Aue et al. benefits Cuthbert et al. by providing real-time video to accompany an audio translation (Aue et al., para [0037]). Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Aue et al. to improve the real-time translation experience of Cuthbert et al.

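As to claim 16, device claim 16 and method claim 6 are related as method and device of using same, with each claimed element’s function corresponding to the method. Accordingly claim 16 is similarly rejected under the same rationale as applied above with respect to method claim 6. 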
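
For illustration only, the audio-paragraph marking recited in claim 6 (start of continuous speech marks the beginning; the start time of a predefined break marks the end) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Aue et al.: the per-frame voice-activity flags, the 100 ms frame length, the 500 ms break threshold, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioParagraph:
    start_ms: int  # start time of the continuous speech input
    end_ms: int    # start time of the predefined break that closed it


def segment_paragraphs(voiced: List[bool], frame_ms: int = 100,
                       min_break_ms: int = 500) -> List[AudioParagraph]:
    """Split per-frame voice-activity flags into audio paragraphs.

    A silent run counts as a "predefined break" once it has lasted at
    least min_break_ms; shorter pauses stay inside the paragraph.
    """
    min_break_frames = -(-min_break_ms // frame_ms)  # ceiling division
    paragraphs: List[AudioParagraph] = []
    start: Optional[int] = None          # frame where the paragraph began
    silence_start: Optional[int] = None  # frame where the silent run began
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i                # mark beginning of the paragraph
            silence_start = None         # any pause so far was too short
        elif start is not None:
            if silence_start is None:
                silence_start = i
            if i - silence_start + 1 >= min_break_frames:
                # Break detected: its start time is the paragraph's end.
                paragraphs.append(
                    AudioParagraph(start * frame_ms, silence_start * frame_ms))
                start = None
                silence_start = None
    if start is not None:
        # Input ended before a full break; close the open paragraph.
        end = silence_start if silence_start is not None else len(voiced)
        paragraphs.append(AudioParagraph(start * frame_ms, end * frame_ms))
    return paragraphs
```

Integer millisecond timestamps are used here only to keep the marked times exact; a real implementation could use any clock.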


Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over US 20150347395, hereinafter referred to as Cuthbert et al., in view of US 20170364509, hereinafter referred to as Cordell et al., further in view of US 20150347399, hereinafter referred to as Aue et al., and further in view of US 8706486, hereinafter referred to as Devarajan et al. 

Regarding claim 7, Cuthbert et al., as modified by Cordell et al. and Aue et al., discloses the method of claim 6, further comprising: 

after detecting the first predefined break in the continuous speech input, generating a first audio packet that includes the first audio paragraph (Devarajan et al., col. 5, line 50 – col. 6, line 3. Here, a break in communication due to a deleted word, is replaced with a “silent packet”. The communication before and after the break is sent in separate packets.); 

sending the first audio packet to the second client device as a first portion of the second audio transmission (Devarajan et al., col. 5, line 50 – col. 6, line 3.); and 

while generating the first audio packet and sending the first audio packet: 

continuing to detect the continuous speech input from the user located at the first client device, wherein at least a portion of the continuous speech input that is detected while generating and sending the first audio packet is included in the second audio transmission.

Devarajan et al. benefits Cuthbert et al. by enabling live censoring of ongoing conversations (Devarajan et al., col. 6, lines 2-3). Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Devarajan et al. to improve the real-time translation experience of Cuthbert et al.


Regarding claim 8, Cuthbert et al., as modified by Cordell et al., Aue et al., and Devarajan et al., discloses the method of claim 7, wherein: 

two or more audio paragraphs including the first audio paragraph and the second audio paragraph are translated into the source language of the first audio transmission for presentation at the second client device (Aue et al., para [0037]).  


Regarding claim 9, Cuthbert et al., as modified by Cordell et al. and Aue et al., discloses the method of claim 6, but does not disclose the method further comprising: 

during the audio and/or video communication session: 

identifying two or more audio paragraphs in a continuous speech input; and 

generating a respective audio packet for each of the two or more audio paragraphs; and

sending the respective audio packets for the two or more audio paragraphs to the second client device sequentially according to the respective start time stamps thereof.



Devarajan et al. is cited to disclose identifying two or more audio paragraphs in a continuous speech input (Devarajan et al., col. 5, line 50 – col. 6, line 3.); and 

generating a respective audio packet for each of the two or more audio paragraphs (Devarajan et al., col. 5, line 50 – col. 6, line 3.); and 

sending the respective audio packets for the two or more audio paragraphs to the second client device sequentially according to the respective start time stamps thereof (Devarajan et al., col. 7, lines 38-42.). Devarajan et al. benefits Cuthbert et al. by enabling live censoring of ongoing conversations (Devarajan et al., col. 6, lines 2-3). Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Devarajan et al. to improve the real-time translation experience of Cuthbert et al.
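
For illustration only, the step of sending one audio packet per audio paragraph sequentially according to start time stamps, as recited in claim 9, might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Devarajan et al.: the packet structure, the `send` callback standing in for the network transport, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AudioPacket:
    start_ms: int   # start time stamp of the audio paragraph
    payload: bytes  # encoded audio for that paragraph


def send_in_timestamp_order(packets: Iterable[AudioPacket],
                            send: Callable[[AudioPacket], None]) -> List[int]:
    """Send packets one at a time, earliest start time stamp first.

    Returns the start time stamps in the order the packets were sent.
    """
    sent_order: List[int] = []
    for packet in sorted(packets, key=lambda p: p.start_ms):
        send(packet)  # hypothetical transport to the second client device
        sent_order.append(packet.start_ms)
    return sent_order
```

For example, passing `outbox.append` as `send` with packets stamped 800 ms and 0 ms would place the 0 ms packet in `outbox` first.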


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over US 20150347395, hereinafter referred to as Cuthbert et al., in view of US 20170364509, hereinafter referred to as Cordell et al., further in view of US 20150347399, hereinafter referred to as Aue et al., further in view of US 8706486, hereinafter referred to as Devarajan et al., and further in view of US 8779265, hereinafter referred to as Gottlieb.   

Regarding claim 10, Cuthbert et al., as modified by Cordell et al., Aue et al., and Devarajan et al., discloses the method of claim 9, further comprising: 

during the audio and/or video communication session: 

continuously capturing video using a camera at the first client device, while capturing the continuous speech input at the first client device (Aue et al., para [0037]).

However, none of the references discloses marking the continuously captured video with the respective start timestamps of the two or more audio paragraphs, wherein the respective start timestamps to synchronize presentation of the video and respective translations of the two or more audio paragraphs at the second client device.

Gottlieb is cited to disclose marking the continuously captured video with the respective start timestamps of the two or more audio paragraphs, wherein the respective start timestamps to synchronize presentation of the video and respective translations of the two or more audio paragraphs at the second client device (Gottlieb, col. 29, line 60 – col. 30, line 13). Gottlieb benefits Cuthbert et al. by synchronizing video with audio translation (Gottlieb, col. 29, line 60 – col. 30, line 13). Therefore, it would have been obvious to one skilled in the art to combine the teachings of Cuthbert et al. with those of Gottlieb to improve the real-time translation experience of Cuthbert et al.

Conclusion
The prior art made of record and not relied upon is considered pertinent to the Applicant’s disclosure. See PTO 892 from previous office action(s). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANNE L THOMAS-HOMESCU/
Primary Examiner, Art Unit 2656