DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Claims 1-6 and 14-20 have been withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention. Election was made without traverse in the reply filed on Sept. 09, 2022.
Claims 7-13 and newly-added claims 21-25 are now pending.
Priority
With respect to U.S. Provisional Patent Application 62/705,127, Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) is acknowledged. 
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/07/2021 has been considered by the examiner.
Specification
The disclosure is objected to because of the following informalities:
On page 26, para. 0118, line 23, the examiner suggests changing “Corresponding properties” to read “Corresponding probabilities” because the discriminator is described as computing probabilities in line 17 and further line 27 refers to “corresponding probabilities.”  If Applicant elects to make such change, the Applicant is reminded that Figure 14A, item 1410, may require a similar change.
On page 30, para. 0134, line 24 “Bis false” should read “B is false”
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 7-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Claim 7 recites a computerized method of synthesizing speech audio. Under the broadest reasonable interpretation, the limitations cover performance in the human mind with the assistance of physical aids (e.g., pen and paper), but for the recitation of generic computer components.  That is, other than reciting that the method is “computerized”, nothing in the claim precludes the steps from practically being performed in the mind.  For example, a human, such as a voice-over actor reading a script, could receive a string of text and at least one voice property value with a perceptible meaning (e.g., a script with text that includes cues telling the voice actor how to read, such as with a comedic and exaggerated Texas drawl), synthesize speech audio corresponding to the string of text that conditions a sound of speech audio on the at least one voice property value to generate synthesized speech audio (e.g., mentally rehearsing reading the script with a comedic and exaggerated Texas drawl), and output the synthesized speech audio (e.g., reading the script aloud, as mentally rehearsed), wherein the sound of the synthesized speech audio perceptually relates to the at least one voice property value (e.g., the speech output, when read aloud, perceptually relates to the at least one voice property value, such as a perceptible comedic and exaggerated Texas drawl).

The judicial exception is not integrated into a practical application. While the claim recites a “neural speech synthesis model”, the claim only recites the model at a high level of generality and the claim does not recite a computer-specific algorithm relied upon in synthesizing the audio.  Therefore, the neural speech synthesis model is a simple computer automation of the speech production process that could be performed by a human.  The remaining limitations only recite generic computing components, i.e., a “computerized method”. Such generic computing components are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of receiving, processing, and outputting information) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. With respect to the claimed “neural speech synthesis model,” use of a neural speech synthesis model for synthesizing speech is well-known, routine, and conventional, as evidenced by at least:
US 20180366138 A1 (Ramprashad) – Para. 0005 (with respect to speech coding and synthesis, explaining that “accuracy requirements of the neural network outputs are well understood, matching those known for the synthesis parameters.”)
US 20180330713 A1 (Hoory et al.) – Para. 0089 (It is “apparent to those of ordinary skill in the art” that deep neural networks can be applied to text-to-speech synthesis technology)
US 9792900 B1 (Kaskari et al.) – Col. 1, lines 22-25 (“As a previously known machine-listening process, speech recognition (and subsequent re-synthesis) often includes recognizing phonemes using statistical formalisms such as neural networks.”)
US 20220051655 A1 (Kanagawa et al.) – Para. 0002 (“Conventionally, voice synthesis devices are known, which learn an acoustic model according to a DNN (deep neural network)”)
The remaining limitations in claim 7 are not sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 7 is not patent eligible.
Claim 8 depends from claim 7 and further recites that the at least one voice property value includes at least one of a gender voice property, an age voice property, an accent voice property, a timbre voice property, or an attitude voice property.  As explained above with respect to claim 7, each of these voice properties could be considered by a human, such as a voice actor reading a script, when mentally rehearsing and speaking aloud the script, e.g., a particular actor can be cast who will naturally read in a male/female voice, a child/adult voice, an accent (e.g., Texas drawl, stiff British aristocrat), a timbre (e.g., nasally, excited), or attitude (e.g., happy, scared). None of the additional limitations recited in claim 8 amount to anything more than the same or a similar abstract idea as recited in claim 7.    Nor do any limitations in claim 8: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 8 is not patent eligible.
Claim 9 depends from claim 7 and further recites enabling download of the synthesized speech audio.  This additional limitation does not amount to anything more than the same or a similar abstract idea as recited in claim 7.  Downloading a file is merely insignificant post-solution activity that does not integrate the abstract idea into a practical application.  MPEP 2106.05(g).  Downloading a file does not amount to significantly more than the judicial exception because downloading a file is merely a well-understood, routine, and conventional activity.  Claim 9 is not patent eligible. See examples in MPEP 2106.05(d) II:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); but see DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245, 1258, 113 USPQ2d 1097, 1106 (Fed. Cir. 2014) ("Unlike the claims in Ultramercial, the claims at issue here specify how interactions with the Internet are manipulated to yield a desired result‐‐a result that overrides the routine and conventional sequence of events ordinarily triggered by the click of a hyperlink." (emphasis added)); 

Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93

Claim 10 depends from claim 7 and further recites enabling playback of the synthesized speech audio (e.g., a human voice actor reading the script aloud again).  None of the additional limitations recited in claim 10 amount to anything more than the same or a similar abstract idea as recited in claim 7.    Nor do any limitations in claim 10: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 10 is not patent eligible.
Claim 11 depends from claim 7 and further recites that the string of text is associated with at least one text tag (e.g., a script can be annotated with tags, such as to emphasize certain words or phrases, or to recite certain words as song lyrics).  None of the additional limitations recited in claim 11 amount to anything more than the same or a similar abstract idea as recited in claim 7.    Nor do any limitations in claim 11: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 11 is not patent eligible.
Claim 12 depends from claim 7 and further recites that the string of text indicates dynamically configurable voice parameter values (e.g., a script can be annotated with tags, such as to emphasize certain words or phrases, or to recite certain words as song lyrics, where the voice actor dynamically varies his/her voice).  None of the additional limitations recited in claim 12 amount to anything more than the same or a similar abstract idea as recited in claim 7.    Nor do any limitations in claim 12: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components amounts to no more than mere instructions to apply the exception using generic computer components.  Claim 12 is not patent eligible.
Claim 13 depends from claim 7 and further recites providing a graphical user interface that includes one of a text input field or a voice property value input field (e.g., providing the voice actor with a script with fill-in-the-blank fields and a second piece of paper with fields for filling in the blanks). None of the additional limitations recited in claim 13 amount to anything more than the same or a similar abstract idea as recited in claim 7.    Nor do any limitations in claim 13: (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea because a graphical user interface is merely a generic computer function or component, or (b) amount to significantly more than the judicial exception because the additional limitations of using generic computer components (e.g., a graphical user interface) amounts to no more than mere instructions to apply the exception using generic computer components.  Indeed, utilizing a graphical user interface with a text input field and voice property value input field is mere data gathering and insignificant pre-solution activity.  MPEP 2106.05(g) (citing CyberSource v. Retail Decisions, Inc., 654 F.3d 1366, 1375, 99 USPQ2d 1690, 1694 (Fed. Cir. 2011) for the proposition that obtaining information about transactions using the Internet to verify credit card transactions is mere data gathering and insignificant extra-solution activity). Claim 13 is not patent eligible.
Claim 21 depends from claim 7 and recites generating code for implementing the neural speech synthesis model, where specific details of the algorithm implementing the code for the neural speech synthesis model are specifically claimed.  Therefore, the limitations in claim 21 cannot practically be performed in the human mind and are considered to be eligible subject matter.
Claims 22-25 depend from claim 21 and are therefore considered to be eligible subject matter for the same reasons set forth above with respect to claim 21.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 7-12 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chicote et al., U.S. Patent Application Publication 2021/0097976 A1, hereinafter referenced as CHICOTE.

Regarding claim 7, CHICOTE discloses:
A computerized method of synthesizing speech audio, the computerized method comprising: (systems and methods for synthesizing speech from first input data; para. 0018; Figs. 10 and 11, device 710 and natural language processing system 720, are part of a computing system that may perform text-to-speech processing, where system 720 includes servers performing TTS processing, including TTS component 295 as depicted in Fig. 2; paras. 0100 and 0101)
receiving a string of text (Fig. 2, input text data 110, e.g., “Hello, world”; paras. 0023, 0026) and at least one voice property value with a perceptible meaning; (Fig. 2, vocal characteristic data 120; para. 0026; vocal characteristic data may represent characteristics such as age, gender, accent, emotion, coarseness, and speed; paras. 0018, 0021)
synthesizing speech audio corresponding to the string of text (Fig. 2, TTS component/processor 295 takes input text data 110 and vocal characteristic data 120 and outputs audio data 290 representing synthesized speech; paras. 0026, 0034) using a neural speech synthesis model (neural networks are used to perform TTS processing; paras. 0060-63; neural networks may be in encoders 102/104 or decoders 106, which are in speech model 100 (see Fig. 1), which is part of speech synthesis engine 218 (see Fig. 2), which is part of TTS component/processor 295; para. 0026, 0028, 0060) that conditions a sound of speech audio on the at least one voice property value to generate synthesized speech audio; and (neural networks performing TTS process use the vocal characteristic 120; para. 0060; for example, synthesized speech audio may sound like a 40-year-old news anchor, or Arnold Schwarzenegger, or some other celebrity as designated by the one or more vocal characteristics 120; paras. 0018, 0021, 0098, 0099).
outputting the synthesized speech audio, (Fig. 2, output audio data 290 represents synthesized speech; para. 0034)
wherein the sound of the synthesized speech audio perceptually relates to the at least one voice property value. (Fig. 2, speech synthesis engine 218 performs synthesis on the input text 110 and vocal characteristic 120 and outputs audio data 290, which represents synthesized speech; for example, synthesized speech audio may sound like a 40-year-old news anchor, or Arnold Schwarzenegger, or some other celebrity as designated by the one or more vocal characteristics 120; paras. 0018, 0021, 0098, 0099).

Regarding claim 8, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
wherein the at least one voice property value includes at least one of a gender voice property, an age voice property, an accent voice property, a timbre voice property, or an attitude voice property. (vocal characteristics include gender, age, accent, and emotion, e.g., attitude; paras. 0018, 0021; whispered and excited speech; paras. 0027, 0037, 0038; the examiner notes that the broadest reasonable interpretation of timbre voice property includes an excited voice property as set forth in para. 0061 to the instant specification)

Regarding claim 9, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
enabling download of the synthesized speech audio. (output audio data 290 may be returned to a device 710 as requested by a user; paras. 0098, 0099)

Regarding claim 10, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
enabling playback of the synthesized speech audio. (e.g., Fig. 10, a user can ask an Alexa-enabled smart speaker device 710a, “Alexa, read the Gettysburg Address like Celebrity A.”, where the input text is the Gettysburg Address and the output audio data 290 is returned to device 710a to be read by Alexa; paras. 0098, 0099; system 720 includes servers performing TTS processing, including TTS component 295 as depicted in Fig. 2; paras. 0100 and 0101)

Regarding claim 11, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
wherein the string of text is associated with at least one text tag. (input text data 110 may include text tags, such as <begin whisper> and <end whisper>, and may use tags according to the speech synthesis markup language (SSML); para. 0028)

Regarding claim 12, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
wherein the string of text indicates dynamically configurable voice parameter values. (input text data 110 may include text tags, such as <begin whisper> and <end whisper>, to indicate the desired output speech quality in tags, where such tags may be written according to the speech synthesis markup language (SSML); para. 0028; the examiner notes that the broadest reasonable interpretation of “dynamically configurable voice parameter values” includes SSML tags as set forth in paras. 0044, 0102, and 0119 in the instant specification)
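
For illustration only, the way markup tags in a text string can indicate voice parameter values that change over the course of the text may be sketched as follows. This is a minimal sketch and not a characterization of CHICOTE's implementation or of the claimed method; the sample markup, the function name `tagged_spans`, and the choice of the `prosody` element with `rate` and `pitch` attributes are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SSML-style input; the tag and attribute choices are illustrative only.
SSML = '<speak>Hello, <prosody rate="slow" pitch="low">world</prosody>!</speak>'


def tagged_spans(ssml):
    """Return (text, parameters) pairs: each text run with the tag attributes in effect."""
    root = ET.fromstring(ssml)
    spans = []

    def walk(node, inherited):
        # Attributes on a tag override inherited parameters for the enclosed text.
        current = {**inherited, **node.attrib}
        if node.text:
            spans.append((node.text, current))
        for child in node:
            walk(child, current)
            # Tail text follows the child's closing tag, so the parent's parameters apply.
            if child.tail:
                spans.append((child.tail, current))

    walk(root, {})
    return spans


for text, params in tagged_spans(SSML):
    print(repr(text), params)
```

In this sketch, only the text enclosed by the tag carries the `rate` and `pitch` parameters; the text before and after the tag carries none.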

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over CHICOTE in view of Mori et al., U.S. Patent Application Publication 2020/0013409 A1, hereinafter referenced as MORI.

Regarding claim 13, CHICOTE teaches the computerized method of claim 7.  However, CHICOTE fails to explicitly teach:
providing a graphical user interface that includes one of a text input field or a voice property value input field.

However, in a related field of endeavor, MORI pertains to searching for voice speakers according to various voice quality features, where a user interface may be used to identify the different voice quality features (MORI, paras. 0040, 0041, 0081, 0082).  The combination of CHICOTE in view of MORI makes obvious:
providing a graphical user interface (MORI, Fig. 5 depicts a graphical user interface where a user can use knobs 30E and sliding bars 30F to input voice quality features, e.g., gender, age, etc.; MORI paras. 0081, 0082; in combination with CHICOTE, Fig. 3 shows that device 710 may be a smart phone, tablet computer, desktop computer, or laptop computer, which may each utilize the graphical user interface of MORI, and which may modify the graphical user interface of MORI to include a text input field, as set forth in CHICOTE at para. 0092) that includes one of a text input field (CHICOTE further discloses that the user of device 710 may provide a text input; para. 0092) or a voice property value input field (MORI, Fig. 6, knobs 30E and sliding bars 30F for voice quality features, e.g., gender, age; MORI paras. 0081, 0082).
	
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the graphical user interface teachings of MORI with CHICOTE so that device 710 in CHICOTE, which may be a device such as a smart phone, tablet, desktop computer, or laptop computer, may use graphical user interfaces to permit user input of text and/or voice quality features (e.g., age, gender).  One of ordinary skill would have been motivated to utilize the teachings of MORI to enable users to customize speech in a subjective manner, because voice quality features may be subjective to the user. (MORI, para. 0093).

Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over CHICOTE in view of Wang et al., U.S. Patent Application Publication 2021/0208916 A1, hereinafter referenced as WANG.

Regarding claim 21, CHICOTE teaches the computerized method of claim 7.  CHICOTE further discloses:
the code implementing the neural speech synthesis model (TTS processing may be performed using neural networks; paras. 0028, 0060-63) wherein a node in a hidden layer; (Fig. 8, hidden layer node 804 connections have an associated weight or score; para. 0060)
outputting the code, and (computer instructions are output to memory 1106/1206 as temporary working storage at runtime; para. 0102)
wherein the code implements a speech synthesis function of a speech synthesizer. (TTS processing, including speech synthesis engine 218, may be performed using neural networks, where speech synthesis engine 218 may maintain its own controller/processor, memory, and instructions; paras. 0028, 0060-63)

However, CHICOTE fails to explicitly teach:
generating code for execution by a computer
wherein a node in a hidden layer includes, in its summation, a constant term derived from a product of the at least one voice property value and a weight learned from a training process;

However, in a related field of endeavor, WANG pertains to deploying applications across multiple architectures, where an embodiment utilizes neural networks.  The combination of CHICOTE in view of WANG makes obvious:
generating code for execution by a computer (WANG discloses that code may be compiled into binaries for execution by a particular operating system; WANG, paras. 0074, 0075, 0100; in combination with CHICOTE, the instructions for performing TTS synthesis, which are updated and re-configured, are compiled into binaries for execution; CHICOTE, paras. 0025, 0028, 0036, 0063.)
wherein a node in a hidden layer includes, (CHICOTE, Fig. 8, hidden layer nodes 804 have an associated weight or score; CHICOTE, para. 0060; combined with WANG, Fig. 7, hidden layer nodes 705 in a deep neural network; WANG, para. 0088; CHICOTE and WANG both utilize a similar structure for a neural network, with an input layer of one or more nodes, one or more hidden layers, and an output layer with one or more nodes) in its summation, a constant term derived from a product of the at least one voice property value and a weight learned from a training process; (WANG discloses that neural networks may utilize summation functions; WANG, para. 0111; in combination with CHICOTE, CHICOTE further discloses in Fig. 8, hidden layer node 804 connections have an associated weight or score, learned during training; CHICOTE, paras. 0060, 0063; each node in input layer 802 represents an input to the neural network, such as an acoustic feature; CHICOTE, paras. 0060, 0062; acoustic features include emotion, speaker, accent, language, tone, pitch, rate of change of pitch, speed, intonation, nasality, breath; CHICOTE, para. 0031; as shown in Fig. 8, each node in the first hidden layer 804 is connected to all input layer nodes; CHICOTE, para. 0060, using the summation teachings of WANG, the first hidden layer nodes can perform a summation of the products of the acoustic feature input and an associated trained weight for each node connection in the first hidden layer.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of WANG, specifically the teachings related to compiling code into binary format for execution and neural networks that utilize summation functions, with the teachings of CHICOTE.  As disclosed in WANG, one of ordinary skill would be motivated to apply the teachings of WANG to CHICOTE because neural networks may implement artificial intelligence for performing complex tasks, such as evaluating source code or binaries from an unknown operating system.  (WANG, paras. 0080, 0085).  One of ordinary skill would further be motivated to apply the teachings of WANG to CHICOTE because, as disclosed in WANG, code needs to be generated that is capable of being run on a particular operating system and processor architecture, and therefore one of ordinary skill would be motivated to generate compiled code in binary format in a manner that ensures that the code properly executes on the particular operating system and processor, as disclosed in WANG.  (WANG, para. 0062).
Similarly, CHICOTE discloses using neural networks to perform text-to-speech processing and synthesis, determining weighted predictions utilizing one or more hidden layer nodes 804, which would benefit from the summation operation disclosed in WANG.  (CHICOTE, paras. 0060-0063).
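
For illustration only, a hidden-layer node whose summation includes a term formed from the product of a voice property value and a trained weight may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the code of CHICOTE, WANG, or the instant application; the function names, the `tanh` activation, and the sample numbers are hypothetical.

```python
import math


def pre_activation(inputs, input_weights, voice_values, voice_weights, bias=0.0):
    """Weighted sum for one hidden node, including a voice-property term."""
    # Ordinary weighted sum over the node's inputs from the previous layer.
    total = bias + sum(x * w for x, w in zip(inputs, input_weights))
    # Conditioning term: each voice property value times its trained weight.
    # If the voice property values are fixed for an utterance, this term is constant.
    total += sum(v * w for v, w in zip(voice_values, voice_weights))
    return total


def hidden_node(inputs, input_weights, voice_values, voice_weights, bias=0.0):
    """Apply an activation (tanh is an assumption) to the node's summation."""
    return math.tanh(
        pre_activation(inputs, input_weights, voice_values, voice_weights, bias)
    )


# Hypothetical values: two layer inputs and one voice property value.
print(hidden_node([1.0, 2.0], [0.5, -0.25], [1.0], [0.3]))
```

With these sample values the input terms cancel (0.5 and -0.5), so the node's summation equals the voice-property term alone before the activation is applied.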

Regarding claim 22, CHICOTE in view of WANG teaches the computerized method of claim 21.  The combination of CHICOTE in view of WANG makes obvious:
wherein the code is in a binary format. (WANG discloses that code may be compiled into a binary format; WANG, paras. 0074, 0075, 0100; in combination with CHICOTE, the instructions for performing TTS synthesis, including instructions for speech synthesis engine 218, are compiled into binary format for execution; CHICOTE, paras. 0025, 0028, 0036, 0063, 0101.)

Allowable Subject Matter
Claims 23-25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Regarding claim 23, the examiner notes that CHICOTE in view of WANG teaches the computerized method of claim 21, but that CHICOTE in view of WANG fails to explicitly teach:
reading at least one stored voice property vector from a brand database; and 
computing a distance between the at least one stored voice property vector and the voice property vector to generate a computed distance.

The examiner notes that Zhang et al., U.S. Patent Application Publication 2021/0075787 A1, hereinafter referenced as ZHANG, and Vemparala et al., U.S. Patent Application Publication 2014/0007155 A1, hereinafter referenced as VEMPARALA, collectively teach:

reading at least one stored voice property vector from a brand database; and (ZHANG discloses that voiceprints may be stored in a database; ZHANG paras. 0033, 0034, 0038, 0039, 0083; voiceprints may be converted into vectors; ZHANG para. 0065 equation 11; and voiceprints relate to particular voice features, e.g., accent, tone, volume; ZHANG paras. 0006, 0021; VEMPARALA discloses maintaining a fingerprint library 122 with audio fingerprints relating to brands; VEMPARALA, paras. 0029 and 0030)
computing a distance between the at least one stored voice property vector and the voice property vector to generate a computed distance. (ZHANG discloses computing a cosine distance on vectors representing a speaker voice input and a stored voiceprint; para. 0083; ZHANG further discloses using cosine similarity (i.e., 1-cosine distance) between first and second voiceprint templates and comparing to a threshold to determine if authentication succeeds or fails; ZHANG paras. 0068, 0118-0120 and 0156-0158)
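
For illustration only, a cosine distance between a stored vector and a candidate vector, and its comparison against a threshold, may be sketched as follows. This is a minimal sketch and does not reproduce ZHANG's equations or the claimed method; the function names, the sample vectors, and the threshold value are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def within_threshold(stored, candidate, max_distance):
    """True when the candidate is no farther than max_distance from the stored vector."""
    return cosine_distance(stored, candidate) <= max_distance


# Hypothetical vectors and threshold.
stored_vector = [1.0, 0.0]
candidate_vector = [0.0, 1.0]
print(cosine_distance(stored_vector, candidate_vector))
print(within_threshold(stored_vector, candidate_vector, max_distance=0.2))
```

Orthogonal vectors, as in the sample values, have a cosine similarity of 0 and therefore a cosine distance of 1, which exceeds the hypothetical threshold.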

While CHICOTE, WANG, ZHANG, and VEMPARALA collectively detail each limitation as claimed, one of ordinary skill in the art as of the effective filing date of the present application would not have found sufficient motivation to combine the references without the hindsight aid of Applicant’s disclosure. As such, claim 23 contains allowable subject matter because the prior art neither anticipates nor makes obvious the limitations as currently presented.

Claims 24 and 25 depend from claim 23 and would be allowable for the reasons set forth above with respect to claim 23.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 11410684 B1 (Klimkov et al.) pertains to text-to-speech processing with transfer of vocal characteristics, utilizing neural networks for speech synthesis.
US 11348601 B1 (Deshpande et al.) discloses a system for using voice characteristics to determine user intent in a natural language understanding system. The voice characteristics data may indicate a user's level of uncertainty when speaking the utterance, an age group of the user, a sentiment of the user when speaking the utterance, and other data.  Neural network learning models may be employed.
US 20210319780 A1 (Aher et al.) discloses a system that trains a model to provide information used to provide a synthesized speech response to a voice input. The model takes as input prosodic information that may include pitch, note, duration, prominence, timbre, rate, and rhythm, for example. The system receives a plurality of voice inputs, each associated with prosodic metrics, as well as a plurality of responses, each also associated with prosodic metrics.  A user interface 410 is included (see para. 0072).
US 20210256961 A1 (Garman et al.) discloses a system for parametric speech synthesis. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
US 20210225357 A1 (Zhao et al.) discloses extracting audio signal vectors from training examples and generating an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
US 10930263 B1 (Mahyar et al.) discloses an automatic voice dubbing system for replicating characteristics of an actor's or actress's voice across different languages.
US 20200193971 A1 (Feinauer et al.) discloses a method for selecting a target dialect and accent to use to modify voice communications based on a context and a method for selectively modifying one or more words in voice communications in one dialect and accent with one or more vocal features of a different accent.
US 20210075787 A1 (Zhang et al.) pertains to a biometric authentication system using voiceprint techniques for authentication.  (paras. 0002 and 0005). Voiceprint templates synthesize particular voice features, e.g., accent, tone, volume. (paras. 0006, 0021).  Voiceprints may be converted to vector format.  (para. 0065, equation (11)).
US 20140007155 A1 (Vemperala et al.) pertains to brand detection in audio-visual media.  A fingerprint library 122 includes metadata about visual or audio portions of audiovisual media found to be associated with a brand, which may include television advertisements.  (paras. 0029 and 0030).  Fig. 3 discloses a process for brand detection, in which a brand is detected using an audio fingerprint (step 302, para. 0039) and, based on the detection, an associated interactive experience for the brand can be presented (step 310, para. 0050).
US 20180130471 A1 (Trufinescu et al.) discloses a bot server program configured to speak in a voice that matches the voice of a company’s brand ambassador (para. 0028).
US 20060095265 A1 (Chu et al.) discloses using celebrity voice fonts (para. 0017).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL C. LEE/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655