Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 6-11, 14-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Park: 20180268820 in view of Gueta: 20170061955.

Regarding claim 1 
Park teaches:
A method comprising: 
receiving, from a client device of a posting user of an online system, a script for a voice-based content item (Park: ¶ 72-79, 94; Fig 1-3: a server of a content provider receives recorded user speech and extracted text thereof); 
retrieving a voice synthesis model stored in a user profile of the posting user (Park: ¶ 72-79, 94; Fig 1-3: user preferences stored in memory of the client device and comprising a specified voice model selected from a plurality thereof and operative to determine filtering for the recorded user speech);
generating a synthetic audio stream using the retrieved voice synthesis model and based on the received script (Park: ¶ 72-83, 94; Fig 1-3: server generates a response in a particular voice when the filtered recorded user speech is played to a subsequent user); 
presenting the generated synthetic audio stream to the posting user (Park: ¶ 72-83, 94; Fig 1-3: user provided with filtered recorded speech content, such as for verification); 
receiving instructions for modifying the synthetic audio stream (Park: ¶ 72-83, 94; Fig 1-3: system operative to correct, edit, modify the recorded user speech by re-recording and/or correct, edit, modify the text thereof); 
generating a second audio stream based on the received instructions (Park: ¶ 72-83, 94; Fig 1-3: system operative to perform a re-creation operation to automatically change the speech or text based on the modification instructions); 
composing the voice-based content item based on the generated second audio stream  (Park: ¶ 72-83, 94; Fig 1-3: system operative to provide a completed speech content after verification, editing, recreation, etc.); and 
presenting the voice-based content item to a viewing user of the online system (Park: ¶ 72-83, 94; Fig 1-3: user speech comments played out by content provider over a user interface).

Park discusses a voice model generated based on a particular user (Park: ¶ 74, 83) but does not explicitly discuss a voice synthesis model trained at least based on a plurality of voice samples of the posting user. 

In a related field of endeavor Gueta teaches a system and method for conversion of text into the speech of a particular user (Gueta: Abstract) comprising: receiving, from a client device of a posting user of an online system, a script for a voice-based content item (Gueta: ¶ 12, 22, Fig 2: user device receives a textual message from the user); 
retrieving a voice synthesis model stored in a user profile of the posting user, the voice synthesis model trained at least based on a plurality of voice samples of the posting user (Gueta: ¶ 32-36; Fig 2: system generates and maintains voice profiles suitable to select and apply a voice profile of the user upon a user message); 
generating a synthetic audio stream using the retrieved voice synthesis model and based on the received script (Gueta: ¶ 32-36: a text message converted into voice audio of the sender based on an extant voice profile thereof); 
composing the voice-based content item based on the generated audio stream (Gueta: ¶ 12, 22, 32-36; Fig 2: system operates to respond to a client request to convert text to speech and thereby compose a voice-based content item comprising or otherwise based on a generated audio stream and deliver same to a user over the network); and presenting the voice-based content item to a viewing user of the online system (Gueta: ¶ 12, 22, 32-36; Fig 2: a recipient provided with a synthesized specifically worded message in the voice of the user without the user having spoken the specific wording thereof).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to use the training taught by Gueta, in which a voice model is trained on and specific to a message-sending user, to provide the model of a particular user in the Park system and method. The average skilled practitioner would have been motivated to do so for the purpose of personalizing diverse communication among users and would have expected only predictable results therefrom.

Regarding claim 2
Park in view of Gueta teaches or suggests:
The method of claim 1, wherein the script comprises an indication of a mood for the voice-based content item, and wherein retrieving the voice synthesis model comprises: selecting the voice synthesis model from a set of voice synthesis model stored in the user profile of the posting user, the selection of the voice synthesis model based on the mood for the voice-based content item (Park: ¶ 72-83, 94; Fig 1-3: system operative to select a particular voice for voice synthesis such as by selection, provision, etc. of a charming voice or an angry voice); (Gueta: Abstract; ¶ 13, 20-25, 37-47, 91-100; Fig 2; Claim 6: a textual message converted into a user voice with respect to the emotional characteristics of the user, estimations thereof such as by detection of emotion or other moods such as by estimation logic, etc.). The claim is considered obvious over Park as modified by Gueta as addressed in the base claim as it would have been obvious to apply the further teaching of Park and/or Gueta to the modified device of Park and Gueta.

Regarding claim 3
Park in view of Gueta teaches or suggests:
The method of claim 1, wherein generating the second audio stream comprises: retrieving a second voice synthesis model stored in the user profile of the posting user; and generating the second audio stream using the retrieved second voice synthesis model and based on the received script and the received instructions for modifying the synthetic audio  (Park: ¶ 72-83, 94; Fig 1-3: system operative to select a particular voice for voice synthesis such as by selection, provision, etc. of a charming voice or an angry voice); (Gueta: Abstract; ¶ 13, 20-25, 37-47, 91-100; Fig 2; Claim 6: a textual message converted into a user voice with respect to the emotional characteristics of the user, estimations thereof such as by detection of emotion or other moods such as by estimation logic, etc.). The claim is considered obvious over Park as modified by Gueta as addressed in the base claim as it would have been obvious to apply the further teaching of Park and/or Gueta to the modified device of Park and Gueta.

Regarding claim 6
Park in view of Gueta teaches or suggests:
The method of claim 1, wherein the instructions for modifying the synthetic audio stream includes at least one of instructions for changing an intonation or pronunciation of one or more words or phrases in the generated synthetic audio stream  (Park: ¶ 72-83, 94; Fig 1-3: system operative to correct, edit, modify the recorded user speech by re-recording and/or correct, edit, modify the text thereof; such a modification would operate to instruct the system to adjust pronunciation), adding a pause in the generated audio stream, removing a pause in the generated synthetic audio stream, changing a cadence of at least a portion of the generated synthetic audio stream, and adding sound effects to the generated synthetic audio stream (Park: ¶ 74, 83: system operable to add an applauding, cheering, etc. sound). The claim is considered obvious over Park as modified by Gueta as addressed in the base claim as it would have been obvious to apply the further teaching of Park and/or Gueta to the modified device of Park and Gueta.

Regarding claim 7
Park in view of Gueta teaches or suggests:
The method of claim 1, further comprising: generating a stream of phonemes based on the received script, wherein the synthetic audio stream is generated by using the retrieved voice synthesis model based on the generated stream of phonemes. Examiner takes official notice that generating a stream of phonemes operable to synthesize a voice was well-known in the art before the effective filing date and would have comprised an obvious inclusion. The average skilled practitioner would have been motivated to do so for the purpose of generating a spoken form of a text message and would have expected only predictable results therefrom.

Regarding claim 8
Park in view of Gueta teaches or suggests:
The method of claim 7, further comprising: presenting the stream of phonemes to the posting user; and receiving, from the client device of the posting user, a modified stream of phonemes, wherein the synthetic audio stream is generated by using the retrieved voice synthesis model based on the received modified stream of phonemes  (Park: ¶ 72-83, 94; Fig 1-3: system operative to correct, edit, modify the recorded user speech by re-recording and/or correct, edit, modify the text thereof). The claim is considered obvious over Park as modified by Gueta as addressed in the base claim as it would have been obvious to apply the further teaching of Park and/or Gueta to the modified device of Park and Gueta.

Claims 9, 17 are considered substantially similar to claim 1 and are similarly rejected. 

Claims 10, 18 are considered substantially similar to claim 2 and are similarly rejected. 

Claim 11 is considered substantially similar to claim 3 and is similarly rejected. 

Claim 14 is considered substantially similar to claim 6 and is similarly rejected. 

Claim 15 is considered substantially similar to claim 7 and is similarly rejected. 

Claim 16 is considered substantially similar to claim 8 and is similarly rejected. 

Claim 20 is considered substantially similar to claims 7 and 8 and is similarly rejected.

Claims 4, 5, 12, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Park: 20180268820 in view of Gueta: 20170061955 as applied to claims 1-3, 6-11, 14-18, and 20 supra, and further in view of Binkowski: High Fidelity Speech Synthesis with Adversarial Networks (provided by Applicant; copyright 2020 and hereinafter Bink).

Regarding claim 4
Park in view of Gueta teaches or suggests:
The method of claim 1, further comprising: 
receiving the plurality of voice samples of the posting user (Park: ¶ 72-79, 94; Fig 1-3: a server of a content provider receives recorded user speech and extracted text thereof); (Gueta: ¶ 32-36; Fig 2: system generates and maintains voice profiles suitable to select and apply a voice profile of the user upon a user message); 
generating the voice synthesis model using the plurality of voice samples of the posting user (Park: ¶ 72-83, 94; Fig 1-3); (Gueta: ¶ 32-36); 
generating a test audio stream using the voice synthesis model (Park: ¶ 72-83, 94; Fig 1-3: user provided with filtered recorded speech content, such as for verification). 

Park and Gueta do not explicitly teach generating a discriminator model using the plurality of voice samples of the posting user, the discriminator model for determining whether an audio stream includes a voice recording of the posting user; and determining a classification for the test audio stream using the discriminator model; and refining the voice synthesis model and the discriminator model based on the determined classification for the test audio stream.

In a related field of endeavor Bink teaches a system and method operable to generate a voice of a particular user comprising receiving voice samples of a user and generating a voice synthesis model based on the received voice samples (Bink: section 3.1-3.4) including: 
generating a discriminator model using the plurality of voice samples of the posting user, the discriminator model for determining whether an audio stream includes a voice recording of the posting user (Bink: section 3.1-3.4; Fig 2); and 
determining a classification for the test audio stream using the discriminator model (Bink: section 2: system operates to differentiate between real voice samples and generated voice samples); and 
refining the voice synthesis model and the discriminator model based on the determined classification for the test audio stream (Bink: section 2: in an adversarial network the generator and discriminator are trained jointly and the training operates to iteratively refine each with respect to the other).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize machine learning to improve the Park in view of Gueta model such as using the GAN speech synthesizer of Bink. The average skilled practitioner would have been motivated to do so for at least the purpose of improving the fidelity of the Park in view of Gueta synthesized voice and would have expected only predictable results therefrom.

Regarding claim 5
Park in view of Gueta in view of Bink teaches or suggests:
The method of claim 4, further comprising: storing the voice synthesis model in the user profile of the posting user (Gueta: ¶ 25, 35, etc.: voice profile stored in user preferences). The claim is considered obvious over Park as modified by Gueta as addressed in the base claim as it would have been obvious to apply the further teaching of Park and/or Gueta to the modified device of Park and Gueta.

Claims 12, 19 are considered substantially similar to claim 4 and are similarly rejected. 

Claim 13 is considered substantially similar to claim 5 and is similarly rejected. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 730-630 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at (571)272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL C MCCORD/Primary Examiner, Art Unit 2654