DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Changing First Voice at First Computing Device to Second Voice at Second Computing Device.
The disclosure is objected to because of the following informalities:
In ¶[0022], “change over time” should be “changes over time”.
In ¶[0022], “to a change the data” should be “to a change in the data”.
In ¶[0027], “maybe configured” should be “may be configured”.
In ¶[0035], “voice selection models 172” should be “voice selection models 170”.  See Figure 1.
In ¶[0046], “memory 315” should be “memory 215”, given that this is a description of Figure 2.
In ¶[0052], “the phone number one of the users” should be “the phone number of one of the users”.
In ¶[0053], “preferences 290 the users” should be “preferences 290 of the users”.
In ¶[0056], “maybe configured” should be “may be configured”.
In ¶[0069], “outputs the speech audio the server” should be “outputs the speech audio to the server”.
In ¶[0071], “hearing the voice other user” should be “hearing the voice of the other user”.
In ¶[0071], “receives the audio data the encodes” should be “receives the audio data that encodes”.  
In ¶[0098], “then the user 113 request” should be “then the user 113 requests”.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 3 to 4, 6, 9, 11 to 12, 14, 17, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Conway (U.S. Patent No. 8,650,035).
Regarding independent claims 1, 9, and 17, Conway discloses a method, system, and algorithm for speech conversion, comprising:
“receiving, from a first computing device, first audio data that includes representations of one or more words in a first voice” – system 10 may include a public switched telephone network (PSTN) or a wireless voice communication network such as a cellular phone network or Voice over Internet Protocol (VoIP) network (column 3, lines 1 to 4: Figure 1); speech conversion system 10 includes speech converter 18 configured to convert an original speech signal 20 received from a party 14 (“in a first voice”) (column 3, lines 13 to 16: Figure 1); parties 14 provide original, i.e., unconverted, speech signals 20 (column 3, lines 48 to 49: Figure 1); conversion server 26 receives an original speech signal 20 from a party 14 (column 4, lines 19 to 21: Figure 1); conversion server 26 receives a speech signal 20a from party 14a (column 7, lines 37 to 39: Figure 3: Step 100); implicitly, speech of a party “includes representations of one or more words”; that is, ‘speech’, by definition, is a representation of words by sounds; here, party 14a provides speech in “a first voice”, i.e., an unconverted voice, that is provided from a cellular telephone (“a first computing device”) to speech converter 18a (“from a first computing device”);
“generating second audio data that includes representations of the one or more words in a second voice” – at least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal (Abstract); a first speech converter 18a is deployed so as to be able to receive an original speech signal 20a from a first party 14a, and to convert the speech signal 20a to a converted speech signal 22a that is transmitted to a second party 14b; a second speech converter 18b is deployed so as to be able to receive an original speech signal 20b from a party 14b, and to convert the speech signal 20b to a converted speech signal 22b that is transmitted to the first party 14a (column 3, lines 15 to 24: Figure 1); speech converter 18 may be any speech converting device capable of receiving an original voice signal 20 and converting the received original signal 20 to a different voice signal 22; speech converter 18 may be configured to perform speech conversion including gender translation, accent translations, and speech tone translations (column 3, lines 33 to 41: Figure 1);
“providing, for output to a second computing device, the second audio data” – system 10 may include a public switched telephone network (PSTN) or a wireless voice communication network such as a cellular phone network or Voice over Internet Protocol (VoIP) network (column 3, lines 1 to 4: Figure 1); a first speech converter 18a is deployed so as to be able to receive an original speech signal 20a from a first party 14a and to convert the speech signal 20a to a converted speech signal 22a that is transmitted to a second party 14b (column 3, lines 16 to 20: Figure 1); conversion heuristics 25 perform an appropriate speech conversion on the original speech signal 20 to be transmitted over voice communication network 12 as a converted voice signal 22 (column 7, line 67 to column 8, line 4: Figure 3: Step 108); here, party 14b has a cellular telephone for receiving and outputting a converted speech signal 22a (“for output to a second computing device”).

Regarding claims 3 and 11, Conway discloses that a speech converter library 24 includes a number of speech conversion heuristics 25 that may be applied to convert the original speech signal 20 to a converted speech signal 22; speech converter library 24 includes a database for storing conversion heuristics for performing speech conversion including gender translations, accent translations, and speech tone translations (column 3, lines 35 to 67: Figure 1); two speech conversion libraries 24a and 24b corresponding to the two depicted parties 14a and 14b provide a separate conversion library 24 for each party 14 (column 4, lines 1 to 17: Figure 1); conversion heuristics 25 may be selected by a party 14 through a converter selection interface (column 4, line 66 to column 5, line 2: Figure 1); converter selection interface 32 is used to enable one or more of the parties 14 to manually select at least one conversion heuristic 25 from the speech converter library 24 for converting speech signals 20 in a desired manner (“selecting, from among multiple voices, the second voice”) (column 5, line 63 to column 6, line 10: Figure 1); here, speech converter library 24 provides “multiple voices”.
Regarding claims 4 and 12, Conway discloses that conversion heuristics 25 may be selected by a party 14 through a converter selection interface (column 4, line 66 to column 5, line 2: Figure 1); converter selection interface 32 is used to enable one or more of the parties 14 to manually select at least one conversion heuristic 25 from the speech converter library 24 for converting speech signals 20 in a desired manner (column 5, line 63 to column 6, line 10: Figure 1); a second party 14b provides input via converter selection interface 32 indicating a decision whether to manually select a conversion heuristic or heuristics 25 from the speech converter library 24 (column 7, lines 43 to 48: Figure 3: Step 104); a second party 14b manually selects the conversion heuristic or heuristics 25 from speech converter library 24 (“a second user of the second computing device selects the second voice”) (column 7, lines 57 to 59: Figure 3: Step 106); here, parties 14a and 14b have a converter selection interface 32 (“wherein a first user of the first computing device or a second user of the second computing device selects the second voice”).
Regarding claims 6, 14, and 19, Conway discloses an embodiment where one or more of the parties 14 is provided with an ability to disable the speech conversion system 10 using converter selection interface 32 such that communication over the voice communication network 12 can be accomplished without speech conversion (column 6, lines 35 to 40: Figure 1); here, if a party 14, e.g., a second party 14b, has an ability to disable speech conversion system 10, then, this is equivalent to “ceasing to provide, for output to the second computing device, the second audio data” or “providing, for output to the second computing device, the first audio data”; that is, disabling speech conversion ‘ceases to provide’ conversion and returns to communicating the original speech, i.e., “the first audio data”; a second party 14b provides input via converter selection interface 32 indicating a decision whether to manually select a conversion heuristic or heuristics 25 from the speech converter library 24 (column 7, lines 43 to 48: Figure 3: Step 104); disabling the speech conversion by a second party 14b via converter selection interface 32 is equivalent to “receiving, from the second computing device, a request to transmit the first audio data that includes the representation of the one or more words in the first voice”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (U.S. Patent No. 8,650,035) in view of Alameh et al. (U.S. Patent No. 10,755,695).
Conway discloses that conversion server 26 determines identification information 30 based on the received original speech signal, where party identification information 30 is determined from an area code and telephone number of a party 14.  (Column 5, Lines 3 to 62)  However, Conway does not expressly provide for “generating data indicating a voice change of the first audio data” and “providing, for output to the second computing device, the data indicating the voice change of the first audio data.”  
Still, Alameh et al. teaches voice synthesis and acoustic watermarking in electronic devices, where an audio synthesizer can apply an acoustic watermark to a voice-synthesized audio output stream, the acoustic watermark indicating that the voice-synthesized audio output stream is machine made.  (Abstract)  A voice generated by a voice assistant, which constitutes a voice-synthesized audio stream, is a function of one or more audible characteristics, such as pitch, timbre, cadence, rhythm, volume, and spectrum, extracted from voice input received from an authorized user of the electronic device.  Audible characteristics can be extracted from voice input to modify a predefined voice synthesis model of an audio synthesizer to give the voice-synthesized audio output stream characteristics of the authorized user’s voice.  The characteristics can be used to make the voice, i.e., the voice-synthesized audio output stream, of the voice assistant mimic or resemble the voice of the authorized user.  The acoustic watermark can carry data, such as the identity of the electronic device or its owner.  The voice-synthesized audio output stream includes an audio watermark that indicates that the voice-synthesized audio output stream was machine made.  (Column 3, Line 35 to Column 4, Line 7)  Here, an acoustic watermark is “data indicating a voice change of the first audio data”.  That is, a voice is changed from an original voice because it is voice-synthesized, and this acoustic watermark is “providing, for output to the second computing device” “data indicating a voice change of the first audio data” to synthesized speech.  A second party 14b, then, can determine that “a voice change” was performed on “the first audio data” due to the presence of an acoustic watermark indicating that a voice-synthesized audio output stream was machine made.
It would have been obvious to one having ordinary skill in the art to provide to a second computing device an acoustic watermark indicating a voice change as taught by Alameh et al. to perform speech conversion in Conway for a purpose of indicating that a voice-synthesized output stream was machine made.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (U.S. Patent No. 8,650,035) in view of Lobzakov (U.S. Patent No. 8,645,140).
Conway discloses that conversion server 26 determines identification information 30 based on the received original speech signal, where party identification information 30 is determined from an area code and telephone number of a party 14.  (Column 5, Lines 3 to 62)  Conway, then, discloses “determining an identity of a speaker of the first audio data”.  However, Conway does not expressly disclose “providing, for output to the second computing device, data identifying the speaker of the first audio data” or that an identity of a speaker is “based on the second audio data”.  Still, Conway discloses that parties 14a and 14b may be engaged in a call on cellular phones, and it is well known that caller identification provides an identity of a caller to a called party.
Generally, Lobzakov teaches associating a voice font with a contact for text-to-speech conversion at an electronic device.  (Abstract)  Upon receipt of a communication, such as a telephone call, output is commonly provided from an electronic device in the form of a notification of receipt of the communication.  Upon receipt of a telephone call, an audible notification such as a ring tone may be provided along with visual notification on the display such as a caller identification.  (Column 1, Lines 27 to 33)  A voice font is associated with a contact record, where the contact data can include the name, address, telephone numbers, and email addresses, and a voice font can be added to a contact record.  (Column 7, Lines 12 to 25: Figure 7)  A communication, such as a telephone call, is received at portable electronic device 100.  The originator of the communication is then determined by an identifier such as the phone number provided using caller identification in the case of a telephone call.  If a match is found to one of the contact data records, processor 103 determines if a voice font is stored in a contact data record.  If a voice font is stored in a contact data record, then voice units can be retrieved from memory 108 to begin text-to-speech conversion.  (Column 18, Line 62 to Column 19, Line 41: Figure 9)  Broadly, caller identification is “providing, for output to the second computing device, data identifying the speaker of the first audio data” and “determining an identity of a speaker of the first audio data”.  This caller identification information can be included within “the second audio data” that is transmitted from a cellular telephone of a first party 14a to a cellular telephone of a second party 14b for “determining an identity of a speaker of the audio data” “based on the second audio data”.  An objective is to provide a voice font that is associated with a caller based on contact data records.  (Abstract)
It would have been obvious to one having ordinary skill in the art to provide an identity of a speaker for output to a second computing device as taught by Lobzakov to perform speech conversion in Conway for a purpose of applying a voice font to perform text-to-speech conversion based on an identity of a caller.

Claims 8, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (U.S. Patent No. 8,650,035) in view of Cutler et al. (U.S. Patent Publication 2018/0218727).
Conway discloses the limitations of “determining a first location of the first computing device”, “determining a second location of the second computing device”, and “selecting the second voice based on one or more of: the first location of the first computing device” and “the second location of the second computing device”.  Here, Conway discloses that conversion server 26 determines identification information 30 about a party 14, where identification information 30 includes a geographic location.  (Column 4, Lines 19 to 25: Figure 1)  Party identification information 30 can be obtained in a variety of ways, and conversion server 26 can determine a geographic location from which speech signal 20 is received using a detected area code, and may use the detected area code to perform a search of a local telephone directory corresponding to the determined geographic area.  (Column 5, Lines 4 to 31: Figure 1)  A party 14 in Texas may have difficulty understanding a party 14 with a strong Michigan accent, and could accordingly select a speech conversion heuristic 25.  (Column 6, Lines 3 to 6)  Here, Conway discloses equivalently determining a geographic location for both parties in a telephone call (“determining a first location of the first computing device” and “determining a second location of the second computing device”).  Party 14a may choose at least one conversion heuristic 25a for converting a speech signal 20b provided by party 14b, that converts a speech signal 20b from speech spoken with a Texas accent to speech spoken with a British accent for transmitting to first party 14a, and selects at least one conversion heuristic 25b, that converts speech spoken with a Texas accent to speech spoken with a New York accent for transmitting to a second party 14b.  (Column 7, Lines 3 to 18: Figure 1)  A conversion heuristic or heuristics 25 are automatically selected based on identification information 30 determined by conversion server 26.  (Column 7, Lines 65 to 67: Figure 3: Step 108)
Conway, then, discloses “selecting the second voice based on one or more of: the first location of the first computing device” and “the second location of the second computing device”.  Conway omits only the limitation of “determining characteristics of a connection between the first computing device and the second computing device” for alternatively “selecting the second voice based on one or more of: . . . the characteristics of the connection between the first computing device and the second computing device.”
However, Cutler et al. teaches the limitation omitted by Conway.  Generally, Cutler et al. teaches artificially generating speech for a communication session that is dependent on the received audio quality.  (Abstract)  A text-to-speech module of a receiving device is able to detect a degradation in a quality of a packet-based voice communication session, and to output a transmitting user’s voice information using speech synthesis to convert the remote text into an audio output.  The voice-to-text module is then able to supplement poor quality voice communication with synthesized speech.  (¶[0004])  Controller 216 is configured to determine when to control the text-to-speech converter 218 to convert the received text data 254 into synthesized speech and to play this synthesized speech out through speakers 222 in place of the received audio 250.  The controller 216 makes this determination by detecting when the quality of the network connection formed over network 101 between the transmitting interface 210 of the transmitting terminal 102a and the receiving interface 212 of the receiving terminal 102b over which the audio 250 is transmitted falls below a predetermined level of quality.  Controller 216 classifies whether at the current moment in time its experience of the network conditions is such that the received audio 250 is deemed usable or not usable, e.g., too bad to be understandable or too annoying to be practically usable.  (¶[0045] - ¶[0046]: Figure 2)  Cutler et al., then, teaches “determining characteristics of a connection between the first computing device and the second computing device” for a purpose of determining when to provide artificially generated speech by text-to-speech synthesis.  An objective is to control a text-to-speech converter according to a drop of audio data below a predetermined quality level.  (¶[0011])
It would have been obvious to one having ordinary skill in the art to determine characteristics of a connection between a first computing device and a second computing device as taught by Cutler et al. to determine when to perform voice conversion in Conway for a purpose of disabling text-to-speech synthesis for poor quality communication sessions.

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (U.S. Patent No. 8,650,035) in view of Cutler et al. (U.S. Patent Publication 2018/0218727) as applied to claims 1, 8, 9, 16, 17, and 20 above, and further in view of Gupta et al. (U.S. Patent Publication 2018/0190278).
Generally, Conway and Cutler et al. disclose and teach all of the limitations of these claims with the exception of “determining a context of the conversation.”  Here, Conway discloses that conversion server 26 determines identification information 30 based on the received original speech signal, where party identification information 30 is determined from an area code and telephone number of a party 14.  After receiving an original speech signal 20, it may be possible to determine party identification information 30 such as the area code and telephone number of party 14.  This party identification information 30 may be provided to speech converter library 24 for the determination of a conversion heuristic or heuristics 25.  (Column 5, Lines 3 to 62)  A conversion heuristic or heuristics 25 are automatically selected based on identification information 30 determined by conversion server 26.  (Column 7, Lines 65 to 67: Figure 3: Step 108)  Conway, then, discloses “determining a first user of the second computing device”, “determining a second user of the first computing device” and “wherein generating the second audio data is based on one or more of: the first user of the second computing device” and “the second user of the first computing device”.  That is, Conway is providing identification for party 14a and party 14b to mutually select conversion heuristics in a two-way communication.  Implicitly, this communication is a telephone call that represents two parties “are conducting a conversation”.  That is, a telephone call between two parties would necessarily imply that the two parties are engaged in a conversation (“determining that the first user of the first computing device and the second user of the second computing device are conducting a conversation”).  
Cutler et al., then, teaches the limitations of claims 8, 16, and 20 directed to “determining characteristics of a connection between the first computing device and the second computing device” and “wherein generating the second audio data is based on one or more of: . . . the characteristics of the connection between the first computing device and the second computing device.”
Gupta et al. teaches the limitations of “determining a context of the conversation” and “wherein generating the second audio data is based on one or more of: . . . the context of the conversation”.  Generally, Gupta et al. teaches a digital assistant that dynamically selects a personality.  A context of a conversation may be determined based upon a request and a profile of a user.  A first personality may be selected for the digital assistant from one or more personalities based upon the context of the conversation.  (Abstract)  A first personality may be selected for the digital assistant from one or more personalities, e.g., actors, athletes, celebrities, etc.  (¶[0003])  A first personality may be selected over a second personality in response to determining that the context of the conversation is associated with politics.  The first personality may be stored in association with a first topic, while a second personality may be stored in association with a second topic.  (¶[0044]: Figure 4)  An objective is to improve upon a digital assistant with a single voice that may be monotonous, uninteresting, or tiring to a user, and where a user might then avoid using the digital assistant.  (¶[0040])  It would have been obvious to one having ordinary skill in the art to determine a context of a conversation to generate a personality as taught by Gupta et al. to perform speech conversion in Conway for a purpose of avoiding a monotonous, uninteresting, or tiring voice.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
Towell, Boss et al., Goldberg et al., Bi et al., Caldwell et al., Paul, and Huang et al. disclose related prior art.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657                                                                                                                                                                                                        September 23, 2022