Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-30 are pending. Claims 1 and 16 are independent.  Some of the Claims have been amended for informalities or improving form.
This Application was published as U.S. 2022/0051654.
Apparent priority 13 Aug 2020.
This Application includes numerous references to “input text utterance” by which it intends “input text.”  Another special lexicography of the instant Application is “prosodic domain/vertical” or “prosodic vertical” both of which mean a particular prosodic domain.
Applicant has clarified:

[Figure: media_image1.png]

Response 11.
Applicant’s arguments are persuasive and the pending Claims are allowed.
Response to Amendments
Objection to Claim 12 is withdrawn in view of the amendments to this Claim. 
Allowable Subject Matter
Pending Claims 1-30 are allowed.
The following is an examiner’s statement of reasons for allowance: In view of each of the particular limitations of the independent Claims when considered in the order established by the Claim language and in the context of the language of the independent Claims when each Claim is considered as a whole, the independent Claims of this Application were not found in the prior art that was viewed.
In particular ….
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Close Art of Record
Refer to the art applied to the Claims during the prosecution of the instant Application, including Kim (U.S. 2020/0082806) and Shekhar (U.S. 2022/0028367) that were used in the 35 U.S.C. 103 rejection.
Note also Wu (U.S. 2020/0380949) applied to Claim 1 in the Conclusion section.

[Figure: media_image2.png]

[Figure: media_image3.png]

Note application of Wu to Claim 1:
1. A method comprising: [Wu, the computer device and its components including the “processing hardware” / “processor 2001” are shown in Figure 20.  Figures 15-16 show the modules used by the “speech synthesis apparatus.”]
receiving, at data processing hardware, an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice; [Wu, Figure 1 showing “linguistic data” being input to the “second encoder.”  “[0039] S202: Obtain to-be-processed linguistic data.”  “[0040] The linguistic data may be text, a text feature, or a feature item….”  The “intended prosody and a target voice” are taught by the “style feature” which is input as the “speech signal” in Figure 1 and is the “[0046] … target reference speech data that correspond to the same reference linguistic data.”  “[0048] In an embodiment, before speech interaction between a user and the terminal, the terminal obtains reference linguistic data and reference speech data that has a style feature….”  (Note in Figure 2A of the instant Application, the inputs are 320 and 325: “[0043] … a text utterance 320 and optional other inputs 325, that may include speaker characteristics (e.g., speaker embedding Z) of the target voice. The other inputs 325 may additionally or alternatively include one or more of a language identifier, text normalization, or a prosodic vertical identifier of the corresponding prosodic domain….”  Such that the “intended prosody and a target voice” of the Claim are lumped into the “other information 325.”  This “other information 325” is taught by “speech data” in Figure 1 of Wu.)]
generating, by the data processing hardware, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance, the intermediate synthesized speech representation possessing the intended prosody, and [Wu, Figure 1, the “second encoder” teaches the “first text-to-speech (TTS) model” of the Claim and generates the “Average synthesized speech data.”  However, this “average synthesized speech data” does not yet have the “intended prosody.”  Wu in Figure 1 provides the “average synthesized speech” / “intermediate synthesized speech” to a combination of a superimposer, a residual model and a projection layer and “[0037] … The constituted average speech model, the superimposer, the residual model, and the projection layer are combined, and may be configured in an adaptive phase to obtain an embedded vector used for representing a style feature….”  Thus, the combination of second encoder/decoder and the elements in the center of Figure 1 (superimposer, residual model, and projection layer) generates an “embedded vector” which represents the “style feature” of the speech.]
providing, by the data processing hardware, the intermediate synthesized speech representation to a second TTS model, the second TTS model comprising: [Wu, Figure 1, the “Target Speech Model” of Figure 1 teaches the “second TTS model” of the Claim. But the “average synthesized speech” / “intermediate synthesized speech” of the Claim is not directly provided to the “first encoder” / “encoder portion of the second TTS.”] 
an encoder portion configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody, and [Wu, Figure 1, the portion that converts the “average synthesized speech data” / “intermediate synthesized speech” into the “embedded vector” which specifies the “intended prosody” is the middle portion in Figure 1 which is not the same as the “first encoder” of the “target speech model”/ “second TTS” of the Claim.]
a decoder portion configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech, the output audio signal having the intended prosody specified by the utterance embedding and speaker characteristics of the target voice. [Wu, the “target synthesized speech data” from “first decoder” is based on the  “linguistic data” / text received at the “first encoder.”  The “first decoder”  generates speech having the “style feature” that was intended.]

Wu has two TTS models like the Claim and the product of one model (Average Speech Model / TTS1) is provided to the other model (Target Speech Model / TTS2).  Wu is different from the Claim in that the “Embedded Vector” in Wu is provided to the decoder of the second TTS (Target Speech Model) directly, whereas the Claim requires the output of the first TTS to be input to the encoder of the second TTS.  The “Embedded Vector” of Wu has to be of the type that can be handled by the “First Decoder” and thus similar to one generated by the “First Encoder.”  This may be a minor implementation difference.  Still, Wu decides to perform its encoding and generation of the “Embedded Vector” not via the “First Encoder” of the “Target Speech Model” / “second TTS” at which the text / “linguistic data” is received.
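The difference in data flow noted above can be sketched as two small graphs. This is only an illustrative reading of the mapping in this action, not a representation of either system's implementation: the node labels and the helper `consumers` are hypothetical shorthand, and the edges for Wu are limited to those described in the bracketed notes on Claim 1.

```python
# Toy data-flow graphs; each edge is (producer, consumer).
# Node labels are illustrative shorthand, not terms defined by the Claim or by Wu.

# Flow recited in Claim 1 of the instant Application.
CLAIM_1 = [
    ("input text", "first TTS model"),
    ("first TTS model", "second TTS encoder"),     # intermediate synthesized speech
    ("second TTS encoder", "second TTS decoder"),  # utterance embedding
    ("input text", "second TTS decoder"),
]

# Flow of Wu, Figure 1, as characterized in this action (assumed reading).
WU_FIG_1 = [
    ("linguistic data", "average speech model"),
    ("average speech model", "superimposer/residual model/projection layer"),
    ("superimposer/residual model/projection layer", "first decoder"),  # embedded vector
    ("linguistic data", "first encoder"),
    ("first encoder", "first decoder"),
]


def consumers(edges, producer):
    """Return the sorted list of nodes that directly receive `producer`'s output."""
    return sorted(dst for src, dst in edges if src == producer)


if __name__ == "__main__":
    # In the Claim, the first model's output goes to the second model's encoder.
    print(consumers(CLAIM_1, "first TTS model"))
    # In Wu, the average speech model's output bypasses the first encoder.
    print(consumers(WU_FIG_1, "average speech model"))
```

Under these assumptions, the only consumer of the first model's output in the Claim is the encoder of the second TTS model, while in Wu the corresponding output never reaches the “First Encoder.”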

See also:
Yun (U.S. 2021/0210067):

[Figure: media_image4.png]


Chen (U.S. 2014/0222421):

[Figure: media_image5.png]


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659