Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

Status of the Application

The following is an Office Action in response to communication received on 12/08/2020. Claims 2-4, 6, 8-10, 12-14, 16, 18-26 and 30 have been examined in this application.
  
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/27/2019 has been entered.

Response to Amendment
Applicant’s amendments to claims 2-4, 6, 8-10, 12-14, 16, 18-26 and 30 are acknowledged. Applicant’s cancellation of claims 1, 5, 7, 11, 15, 17 and 27-29 is acknowledged.

35 USC § 101
1.	Step 1: Claims 2-4, 6, 14, 15, 18, 19, 26 and 30 are products and claims 8-10, 12, and 20-26 are methods. Thus, each independent claim, on its face, is directed to one of the statutory categories of 35 U.S.C. §101.
Step 2A
Prong 1: Claims 2-4, 6, 8-10, 12-14, 16, 18-26 and 30 include the abstract concept of providing ancillary merchandise that is capable of providing a plurality of marketing impressions per customer, and the specification explains at length that the invention is a marketing tool. Providing audio messages like ads is an idea ‘of itself’ as well as a method of organizing human activities that falls under commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); hence the claims recite an abstract idea.
Prong 2: This judicial exception is integrated into a practical application because:
The claims apply or use the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that each claim as a whole is more than a drafting effort designed to monopolize the exception - see MPEP 2106.05(e) and Vanda Memo. Specifically, the claims include producing a database of phonemes and diphones from recordings of a specific user, and although the specification does not explain how, these phonemes and diphones are used to generate audio in the voice of the recorded user in a language different than the language from which the phonemes and diphones were collected. The steps of the independent claims provide meaningful limitations in the technical field of converting text to speech (TTS), which is enough to go beyond generally linking the marketing or advertising concepts, such that the claims as a whole are more than a drafting effort designed to monopolize providing ancillary merchandise that is capable of providing a plurality of marketing impressions per customer.
Step 2B: Not reached.



Claim Rejections - 35 USC § 103

2.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 2-4, 6, 8-10, 12-14, 18-21 and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Freeland et al. (US 2003/0028380, Hereafter: Freeland) further in view of Akabane et al. (US 2006/0253286, Hereafter: Akabane) further in view of Kurzweil et al. (US 2006/0253286, Hereafter: Kurzweil).
As per claim 13, Freeland in view of Akabane and Kurzweil;
13. A text-to-speech product as ancillary merchandise, comprising: 
Freeland discloses an electronic device including a processor and memory for storing one or more programs configured to receive a set of electronic text; and Examiner’s note: The user provides text in a first language. See, “Various embodiments are described below in detail. The system by which text is converted to speech is referred to as the TTS system. In certain embodiments, the user can enter text or retrieve text which represents the written language statements of the audible words or language constructs that the user desires to be spoken. The TTS system processes this text-based message and performs a conversion operation upon the message to generate an audio message. The audio message is in the voice of a character that is recognisable to most users,…” [0110].
Freeland discloses a non-transitory computer-readable storage medium comprising text-to-speech executable instructions generated from a digitized set of raw data from a voice recording, the voice recording being mined to produce a database of phonemes and diphones from which the executable instructions, when executed by the processor generate audio of the set of electronic text from a source pre-selected by the user; Examiner’s note: Voices are mined from speech in a second language. See, “A synthesised TTS system uses advanced text, phonetic and grammatical processing to enhance the range of phrases and sentences understood by the TTS system and relies to a lesser extent on pre-recorded words and phrases than does the concatenative TTS system but rather, synthesises the audio output based on a stored theoretical model of the selected character's voice and individual phoneme or diphone recordings.” [0121].
Freeland discloses the voice of the text-to-speech product selected by the user from one or more voices, such that the audio generated by the text-to-speech executable instructions from the set of electronic text is in the voice selected by the user; Examiner’s note: The user selects a character. See, “In embodiments that are implemented in software, the chosen character is selected from a database of supported characters, either automatically or by the user. The conversion process of generating an audio message is described in greater detail 
Freeland does not disclose wherein the text-to-speech product is configured to generate audio in a foreign language selected by the user from a plurality of foreign languages and in the voice selected by the user, Examiner’s note: As set forth above, Freeland discloses translating from a first language to a second language. The example translates to the selected receiver’s language; thus Freeland does not disclose that the user selects between a plurality of foreign languages.
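However, Kurzweil discloses characters (voices) that are associated with multiple voice models for different moods. The moods include language and the user can select a voice from the drop down menu of voices and a language via a drop down menu. See, “Another way to produce TTS voices concatenates small parts of speech which were recorded from an actual person. This concatenated TTS sounds more natural…Additionally, the computer voices can be associated with different languages (e.g., English, French, Spanish, Cantonese, Japanese, etc).” Column 3, Lines 1-5. See, “As used herein a "character" refers to an entity and is typically stored as a data structure or file, etc. on computer storage media and includes a graphical representation, e.g., picture, animation, or another graphical representation of the entity and which may in some embodiments be associated with a voice model. A "mood" refers to an instantiation of a voice model according to a particular "mood attribute" that is desired for the character. A character can have multiple associated moods. "Mood attributes" can be various attributes of a character. For instance, one attribute can be "normal," other attributes include "native language," "foreign language," "hushed voice "loud voice," etc.” [0022]. Column 3, Line 50-65. See, “In some embodiments, a character can be associated with multiple voice models. If a character is associated with multiple voice models, the character has multiple moods that can be selected by the user. Each mood has an associated (single) voice model. When the user selects a character the user also selects the mood for the character such that the appropriate voice model is chosen. For example, a character could have multiple moods in which the character speaks in a different language in each of the moods.”[0032]. Column 5, Line 63 to Column 6, Line 4. See, “The edit cast member window 136 also includes a portion 147 for selecting a voice to be associated with the character. For example, the system can include a drop down menu of available voices and the user can select a voice from the drop down menu of voices and a language 148 via a drop down menu, as shown.” [0035]. Column 6, Line 45-50.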
Freeland does not disclose the foreign language of the electronic text received by the electronic device and the generated audio being different from the native language of the voice recording of the voice selected by the user from a plurality of foreign languages, thereby eliminating a need to record multiple speakers for multiple foreign languages; and Examiner’s note: As set forth above, Kurzweil has already been cited to disclose “from a plurality of foreign languages”. Examiner’s interpretation is that the selected language, which could be the same as that of the user-provided text, must be a language that is different than that of the voice recording used to create the audio output, which is a concept not disclosed by Freeland. Kurzweil discloses TTS from parts of speech being used to generate audio in many languages, but Kurzweil does not explain whether the recorded parts of speech are limited to the speaker’s native language.
Freeland discloses similar steps to those claimed as part of a conversation where a user provides text in a first language, selects a celebrity and determines that a message should be provided in the first language or another language, such as French. When another language is selected, the system translates the message and provides the message in the voice of a celebrity. In this application, the claims receive the input text in the same language as the desired output, which is no different than how Freeland receives input text in English and provides the output in English.
The difference is that the celebrity does not speak the selected language (i.e. English), so synthetic speech is generated as if a famous character did speak the selected language, using phonemes and diphones of the character extracted from the character speaking in their native language.
However, Akabane discloses receiving phoneme data for a celebrity and drama data (text selected by the user) and generating synthetic speech as if a famous character speaks the language. For example, a user inputs Japanese text, selects Japanese as the output audio and selects the voice of a famous foreign character that does not speak Japanese. Examiner respectfully asserts that it is clear from Akabane that phonemes and diphones of the selected character speaking in a different language are used to synthesize the audio in Japanese. See, “For example, the phoneme data may be of celebrities and entertainers (actors, actresses, voice actors, politicians, and so on) regardless of their nationalities and of any generations (infants, grade-schoolers, junior high school students, high school students, college students, full members of society, and so on).” [0036]. See, “Then, the text-to-speech synthesis processing section 16 generates a synthetic speech from the phoneme data supplied from the search section 15 and the drama data supplied from the search section 15 or the receiving section 11.” [0044]. See, “It should be noted that the text-to-speech synthesizing section 24 may be adapted to generate a synthetic speech on the basis of prosody data made up of pitch, power, and duration of tone, in addition to phoneme data. In this case, a synthetic speech may be generated as if a famous foreign character speaks Japanese, for example. Since prosody data are determines personality, a combination of the phoneme data of voice actor A and the phoneme data of voice actor B may also generate a synthetic speech of voice actor A in the manner of voice actor B. In addition, the text-to-speech synthesizing section 24 may convert one piece of speech data into another.” [0049]. See also [0045-0048] for an example of drama data, which in this case is supplied in Japanese and output in Japanese, but the voice could be synthetic speech as if a famous foreign character speaks Japanese.
Freeland discloses wherein the text-to-speech product as ancillary merchandise is a self-contained text-to-speech engine marketed and purveyed to a customer as ancillary merchandise and capable of creating a plurality of marketing impressions per said customer. See, “As a further example, it is possible to provide, in accordance with the inventive concept, a physical toy which can be configured by a user to play one or more voice messages in the voice of a character or personality represented by the stylistic design of the toy (for example, Elvis Presley or Homer Simpson). In either case, the text-based message can be constructed by the user by typing or otherwise constructing the text message representative of the desired audio message.” [0009]. See also, [0058, 0070, 0075, 0081-0085, 0109-0111, 0229, 0233, 0236-0244, 0320-0338] where Freeland provides a detailed disclosure of the toy’s features and operation.
Therefore, from the teaching of Kurzweil, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention for the voice synthesis of celebrity voices and the creation of audio in multiple languages, as disclosed by Freeland, to select from a plurality of foreign languages, as taught by Kurzweil, for the purpose of providing a narration of a text in multiple different voices and moods. [Abstract].


As per claim 20, Freeland in view of Akabane and Kurzweil;
20. A method for marketing using a text-to-speech product as ancillary merchandise, the method comprising: 
Freeland discloses selecting a voice from one or more voices such that the audio generated by the text-to-speech executable instructions from a set of electronic text is in the voice selected by the user; Examiner’s note: The user provides text in a first language. See, “Various embodiments are described below in detail. The system by which text is converted to speech is referred to as the TTS system. In certain embodiments, the user can enter text or retrieve text which represents the written language statements of the audible words or language constructs that the user desires to be spoken. The TTS system processes this text-based message and performs a conversion operation upon the message to generate an audio message. The audio message is in the voice of a character that is recognisable to most users,…” [0110].
Freeland discloses acquiring a digitized set of raw data from a recording of the voice selected by a user, the voice recording being mined to produce a database of phonemes and diphones; Examiner’s note: Voices are mined from speech in a second language. See, “A synthesised TTS system uses advanced text, phonetic and grammatical processing to enhance the range of phrases and sentences understood by the TTS system and relies to a lesser extent on pre-recorded words and phrases than does the concatenative TTS system but rather, synthesises the audio output based on a stored theoretical model of the selected character's voice and individual phoneme or diphone recordings.” [0121].
Freeland does not disclose generating text-to-speech executable instructions from the acquired voice recordings, wherein the text-to-speech executable instructions configured to generate audio in a foreign language selected by the user from a plurality of foreign languages and in the voice selected by the user, Examiner’s note: As set forth above, Freeland discloses translating from a first language to a second language. The example translates to the receiver’s language; thus Freeland does not disclose that the user selects between a plurality of foreign languages.
However, Kurzweil discloses characters (voices) that are associated with multiple voice models for different moods. The moods include language and the user can select a voice from the drop down menu of voices and a language via a drop down menu. See, “Another way to produce TTS voices concatenates small parts of speech which were recorded from an actual person. This concatenated TTS sounds more natural…Additionally, the computer voices can be associated with different languages (e.g., English, French, Spanish, Cantonese, Japanese, etc).” Column 3, Lines 1-5. See, “As used herein a "character" refers to an entity and is typically stored as a data structure or file, etc. on computer storage media and includes a graphical representation, e.g., picture, animation, or another graphical representation of the entity and which may in some embodiments be associated with a voice model. A "mood" refers to an instantiation of a voice model according to a particular "mood attribute" that is desired for the character. A character can have multiple associated moods. "Mood attributes" can be various attributes of a character. For instance, one attribute can be "normal," other attributes include "native language," "foreign language," "hushed voice "loud voice," etc.” [0022]. Column 3, Line 50-65. See, “In some embodiments, a character can be associated with multiple voice models. If a character is associated with multiple voice models, the character has multiple moods that can be selected by the user. Each mood has an associated (single) voice model. When the user selects a character the user also selects the mood for the character such that the appropriate voice model is chosen. For example, a character could have multiple moods in which the character speaks in a different language in each of the moods.”[0032]. Column 5, Line 63 to Column 6, Line 4. See, “The edit cast member window 136 also includes a portion 147 for selecting a voice to be associated with the character. 
For example, the system can include a drop down menu of available voices and the user can select a voice from the drop down menu of voices and a language 148 via a drop down menu, as shown.” [0035]. Column 6, Line 45-50.
Freeland does not disclose the foreign language of the electronic text received by an electronic device and the generated audio being different from a native language of a source of the voice recording of the voice selected by the user, thereby eliminating a need to record multiple speakers for multiple foreign languages; and Examiner’s note: As set forth above, Kurzweil has already been cited to disclose “from a plurality of foreign languages”. Examiner’s interpretation is that the selected language, which could be the same as that of the user-provided text, must be a language that is different than that of the voice recording used to create the audio output, which is a concept not disclosed by Freeland. Kurzweil discloses TTS from parts of speech being used to generate audio in many languages, but Kurzweil does not explain whether the recorded parts of speech are limited to the speaker’s native language.
Freeland discloses similar steps to those claimed as part of a conversation where a user provides text in a first language, selects a celebrity and determines that a message should be provided in the first language or another language, such as French. When another language is selected, the system translates the message and provides the message in the voice of a celebrity. In this application, the claims receive the input text in the same language as the desired output, which is no different than how Freeland receives input text in English and provides the output in English.
The difference is that the celebrity does not speak the selected language (i.e. English), so synthetic speech is generated as if a famous character did speak the selected language, using phonemes and diphones of the character extracted from the character speaking in their native language. As set forth above, Freeland discloses translating between a native language and a foreign language as well as “synthesises the audio output based on a stored theoretical model of the selected character's voice and individual phoneme or diphone.” Freeland discloses providing the output audio in the voice of a celebrity whose native language is the output language and fails to explicitly disclose using the phoneme or diphone of celebrities to output audio in a language that is different from the language of the voice recording of the voice selected by the user. Simply put, Freeland just fails to contemplate that a user might input French text and want audio in French with the voice of Elvis instead of a French speaker like Gerard Depardieu.
However, Akabane discloses receiving phoneme data for a celebrity and drama data (text selected by the user) and generating synthetic speech as if a famous character speaks the language. For example, a user inputs Japanese text, selects Japanese as the output audio and selects the voice of a famous foreign character that does not speak Japanese. Examiner respectfully asserts that it is clear from Akabane that phonemes and diphones of the selected character speaking in a different language are used to synthesize the audio in Japanese. See, “For example, the phoneme data may be of celebrities and entertainers (actors, actresses, voice actors, politicians, and so on) regardless of their nationalities and of any generations (infants, grade-schoolers, junior high school students, high school students, college students, full members of society, and so on).” [0036]. See, “Then, the text-to-speech synthesis processing section 16 generates a synthetic speech from the phoneme data supplied from the search section 15 and the drama data supplied from the search section 15 or the receiving section 11.” [0044]. See, “It should be noted that the text-to-speech synthesizing section 24 may be adapted to generate a synthetic speech on the basis of prosody data made up of pitch, power, and duration of tone, in addition to phoneme data. In this case, a synthetic speech may be generated as if a famous foreign character speaks Japanese, for example. Since prosody data are determines personality, a combination of the phoneme data of voice actor A and the phoneme data of voice actor B may also generate a synthetic speech of voice actor A in the manner of voice actor B. In addition, the text-to-speech synthesizing section 24 may convert one piece of speech data into another.” [0049]. See also [0045-0048] for an example of drama data, which in this case is supplied in Japanese and output in Japanese, but the voice could be synthetic speech as if a famous foreign character speaks Japanese.
Freeland discloses providing the text-to-speech executable instructions as a self-contained text-to-speech product configured to provide the user selected voice as audio generated from a source of electronic text; purveying said self-contained text-to-speech product as a souvenir, for marketing said text-to-speech product as ancillary merchandise at a venue; and wherein the self-contained text-to-speech product as ancillary merchandise is marketed and purveyed to a customer as ancillary merchandise and is capable of creating a plurality of marketing impressions. See, “As a further example, it is possible to provide, in accordance with the inventive concept, a physical toy which can be configured by a user to play one or more voice messages in the voice of a character or personality represented by the stylistic design of the toy (for example, Elvis Presley or Homer Simpson). In either case, the text-based message can be constructed by the user by typing or otherwise constructing the text message representative of the desired audio message.” [0009]. See also, [0058, 0070, 0075, 0081-0085, 0109-0111, 0229, 0233, 0236-0244, 0320-0338] where Freeland provides a detailed disclosure of the toy’s features and operation.
Therefore, from the teaching of Kurzweil, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention for the voice synthesis of celebrity voices and the creation of audio in multiple languages, as disclosed by Freeland, to select from a plurality of foreign languages, as taught by Kurzweil, for the purpose of providing a narration of a text in multiple different voices and moods. [Abstract].
Therefore, from the teaching of Akabane, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention for the voice synthesis of celebrity voices and the creation of audio in multiple languages, as disclosed by Freeland in view of Kurzweil, to use the voice samples in a first language to synthesize audio in a second language, as taught by Akabane in view of Kurzweil, for the purpose of providing a text-to-speech synthesis system which is capable of easily making input data be recited by user's desired readers. [0005].

As per claim 26, Freeland in view of Akabane and Kurzweil;
26. A text-to-speech product as ancillary merchandise, comprising:
Freeland discloses an electronic device including a processor and memory for storing one or more programs; and a non-transitory computer-readable storage medium comprising text-to-speech executable instructions configured to, when executed by a processor, generate audio of a source of electronic text, Examiner’s note: The user provides text in a first language. See, “Various embodiments are described below in detail. The system by which text is converted to speech is referred to as the TTS system. In certain embodiments, the user can enter text or retrieve text which represents the written language statements of the audible words or language constructs that the user desires to be spoken. The TTS system processes this text-based message and performs a conversion operation upon the message to generate an audio message. The audio message is in the voice of a character that is recognisable to most users,…” [0110].
Freeland discloses wherein the audio generated is created from training pronunciation sounds from a voice recording in a native language; Examiner’s note: Voices are mined from speech in a second language. See, “A synthesised TTS system uses advanced text, phonetic and grammatical processing to enhance the range of phrases and sentences understood by the TTS system and relies to a lesser extent on pre-recorded words and phrases than does the concatenative TTS system but rather, synthesises the audio output based on a stored theoretical model of the selected character's voice and individual phoneme or diphone recordings.” [0121].
Freeland discloses wherein the audio generated is in a voice selected by a user from a plurality of voices;  Examiner’s note: The user selects a character. See, “In embodiments that 
Freeland does not disclose wherein a language of the electronic text and the audio generated is in the language selected by the user from a plurality of languages, Examiner’s note: As set forth above, Freeland discloses translating from a first language to a second language. The example translates to the receiver’s language; thus Freeland does not disclose that the user selects between a plurality of foreign languages.
However, Kurzweil discloses characters (voices) that are associated with multiple voice models for different moods. The moods include language and the user can select a voice from the drop down menu of voices and a language via a drop down menu. See, “Another way to produce TTS voices concatenates small parts of speech which were recorded from an actual person. This concatenated TTS sounds more natural…Additionally, the computer voices can be associated with different languages (e.g., English, French, Spanish, Cantonese, Japanese, etc).” Column 3, Lines 1-5. See, “As used herein a "character" refers to an entity and is typically stored as a data structure or file, etc. on computer storage media and includes a graphical representation, e.g., picture, animation, or another graphical representation of the entity and which may in some embodiments be associated with a voice model. A "mood" refers to an instantiation of a voice model according to a particular "mood attribute" that is desired for the character. A character can have multiple associated moods. "Mood attributes" can be various attributes of a character. For instance, one attribute can be "normal," other attributes include "native language," "foreign language," "hushed voice "loud voice," etc.” [0022]. Column 3, Line 50-65. See, “In some embodiments, a character can be associated with multiple voice models. If a character is associated with multiple voice models, the character has multiple moods that can be selected by the user. Each mood has an associated (single) voice model. When the user selects a character the user also selects the mood for the character such that the appropriate voice model is chosen. For example, a character could have multiple moods in which the character speaks in a different language in each of the moods.”[0032]. Column 5, Line 63 to Column 6, Line 4. See, “The edit cast member window 136 also includes a portion 147 for selecting a voice to be associated with the character. 
For example, the system can include a drop down menu of available voices and the user can select a voice from the drop down menu of voices and a language 148 via a drop down menu, as shown.” [0035]. Column 6, Line 45-50.
Freeland does not disclose said language being different from the native language of the voice recording; and Examiner’s note: As set forth above, Kurzweil has already been cited to disclose “from a plurality of foreign languages”. Examiner’s interpretation is that the selected language, which could be the same as that of the user-provided text, must be a language that is different than that of the voice recording used to create the audio output, which is a concept not disclosed by Freeland. Kurzweil discloses TTS from parts of speech being used to generate audio in many languages, but Kurzweil does not explain whether the recorded parts of speech are limited to the speaker’s native language.
Freeland discloses similar steps to those claimed as part of a conversation where a user provides text in a first language, selects a celebrity and determines that a message should be provided in the first language or another language, such as French. When another language is selected, the system translates the message and provides the message in the voice of a celebrity. In this application, the claims receive the input text in the same language as the desired output, which is no different than how Freeland receives input text in English and provides the output in English.
The difference is that the celebrity does not speak the selected language (i.e. English), so synthetic speech is generated as if a famous character did speak the selected language, using phonemes and diphones of the character extracted from the character speaking in their native language. As set forth above, Freeland discloses translating between a native language and a foreign language as well as “synthesises the audio output based on a stored theoretical model of the selected character's voice and individual phoneme or diphone.” Freeland discloses providing the output audio in the voice of a celebrity whose native language is the output language and fails to explicitly disclose using the phoneme or diphone of celebrities to output audio in a language that is different from the language of the voice recording of the voice selected by the user. Simply put, Freeland just fails to contemplate that a user might input French text and want audio in French with the voice of Elvis instead of a French speaker like Gerard Depardieu.
However, Akabane discloses receiving phoneme data for a celebrity and drama data (text selected by the user) and generating synthetic speech as if a famous character speaks the language. For example, a user inputs Japanese text, selects Japanese as the output audio and selects the voice of a famous foreign character that does not speak Japanese. Examiner respectfully asserts that it is clear from Akabane that phonemes and diphones of the selected character speaking in a different language are used to synthesize the audio in Japanese. See, “For example, the phoneme data may be of celebrities and entertainers (actors, actresses, voice actors, politicians, and so on) regardless of their nationalities and of any generations (infants, grade-schoolers, junior high school students, high school students, college students, full members of society, and so on).” [0036]. See, “Then, the text-to-speech synthesis processing section 16 generates a synthetic speech from the phoneme data supplied from the search section 15 and the drama data supplied from the search section 15 or the receiving section 11.” [0044]. See, “It should be noted that the text-to-speech synthesizing section 24 may be adapted to generate a synthetic speech on the basis of prosody data made up of pitch, power, and duration of tone, in addition to phoneme data. In this case, a synthetic speech may be generated as if a famous foreign character speaks Japanese, for example. Since prosody data are determines personality, a combination of the phoneme data of voice actor A and the phoneme data of voice actor B may also generate a synthetic speech of voice actor A in the manner of voice actor B. In addition, the text-to-speech synthesizing section 24 may convert one piece of speech data into another.” [0049]. See also [0045-0048] for an example of drama data, which in this case is supplied in Japanese and output in Japanese, but the voice could be synthetic speech as if a famous foreign character speaks Japanese.
Freeland discloses wherein the text-to-speech product as ancillary merchandise is a voice-font product that runs on an existing text-to-speech engine, and that is purveyed to a customer as ancillary merchandise at venue. See, “As a further example, it is possible to provide, in accordance with the inventive concept, a physical toy which can be configured by a user to play one or more voice messages in the voice of a character or personality represented by the stylistic design of the toy (for example, Elvis Presley or Homer Simpson). In either case, the text-based message can be constructed by the user by typing or otherwise constructing the text message representative of the desired audio message.” [0009]. See also, [0058, 0070, 0075, 0081-0085, 0109-0111, 0229, 0233, 0236-0244, 0320-0338] where Freeland provides a detailed disclosure of the toy’s features and operation.
Therefore, from the teaching of Kurzweil, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention for the voice synthesis of celebrity voices and the creation of audio in multiple languages, as disclosed by Freeland, to select from a plurality of foreign languages, as taught by Kurzweil, for the purpose of providing a narration of a text in multiple different voices and moods. [Abstract].
Therefore, from the teaching of Akabane, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention for the voice synthesis of celebrity voices and the creation of audio in multiple languages, as disclosed by Freeland in view of Kurzweil, to use the voice samples in a first language to synthesize audio in a second language, as taught by Akabane in view of Kurzweil, for the purpose of providing a text-to-speech synthesis system which is capable of easily making input data be recited by user's desired readers. [0005].

Claims that depend from Claim 13:

As per claim 2, Freeland in view of Akabane and Kurzweil;
 	Freeland discloses 2. The product of claim 13 wherein a source of the voice has an occupation selected from the group consisting of an activist, an actor, an astronaut, a banker, a celebrity, a chef, a comedian, a singer, a songwriter, a musician, an entertainer, a television commercial spokesperson, a disk jockey, a company owner, a millionaire, a rentier, an investor, a philanthropist, a religious leader, a psychology leader, a health leader, a doctor, a television personality, a talk show host, a reality star, an heir, a motivational speaker, a butler, a maid, a production assistant, a movie star, a real estate mogul, a consultant, a marketing consultant, a policeman, a soldier, a firefighter, a spokesperson, a cheerleader, a race car driver, a founder, an executive, a manager, an athlete, a referee, a team owner, a newscaster, a person who has become famous by appearing in a video, a social media blogger, a fashion model, a photography model, a radio announcer, a blog owner, a deceased celebrity, and a specialist. See, “The audio message is in the voice of a character that is recognisable to most users, such as a popular cartoon character (for example, Homer Simpson) or real-life personality (for example, Elvis Presley). Alternatively "stereotypical" characters may be used, such as a "rap artist" (e.g. Puffy), whereby the message is in a voice typical of how a rap artist speaks. Or the voice could be a "granny'' (for grandmother) "spaced" (for a spaced-out drugged person) or in a "sexy" voice. Many other stereotypical character voices can be used).” [0110].
 
As per claim 3, Freeland in view of Akabane and Kurzweil;
Freeland discloses 3. The product of claim 13 wherein said electronic device is selected from the group consisting of a computing device, a vehicle, a station, an unmanned vehicle, an appliance, a cloud, a smart phone, and a robotic device. See, “In the toy embodiment, the toy may optionally have electromechanical mechanisms for performing animation of moving parts of the toy during the replay of recorded messages. The toy has a number of mechanically actuated lugs for the connection of accessories. Optionally, the accessories represent stylised body parts, such as eyes, hat, mouth, ears etc. or stylised personal accessories, such as musical instruments, glasses, handbags etc;” [0322]. See, “Shown in FIG. 

As per claim 4, Freeland in view of Akabane and Kurzweil;
Freeland discloses 4. The product of claim 13, further comprising a venue offering for procurement the executable instructions product, wherein said venue is selected from the group consisting of an amusement park, a blog post, a booth, an in-program messaging, a conference, a concert, a concession stand, a contest, a convention, an entertainment event, a social media post, an internet, a fundraising/charity event, a gossip magazine, a gossip television show, a jumbotron, a kiosk, a launch, a magazine, a meeting, a movie, a musical, an Olympics event, an opera, a play, a rally, a race, a reality show, a rodeo, an email campaign, a sports event, a radio broadcast, a space event, a speech, a stage production, a souvenir place, a store, a commercial, a show, a short message, a theme park, a video, a wrestling event, a merchandise store, and a training event. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. Thereafter, upon shipping of the product to the recipient of the gift, instead of printing the giver's personal greeting message (for example, "Happy birthday Richard, I thought this Elma Fudd character would appeal to your sense of humour. From Peter'') upon a card or gift certificate to accompany the gift, said greeting message is preferably stored in a database on the Internet server computer(s)” [0337]. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. 
Thereafter, upon shipping of the product to the recipient of the gift, instead of printing the giver's personal greeting message (for example, "Happy birthday Richard, I thought this Elma Fudd character would appeal to your sense of humour. From Peter'') upon a card or gift certificate to accompany the gift, said greeting message is preferably stored in a database on the Internet server computer(s)).” [0338].

As per claim 6, Freeland in view of Akabane and Kurzweil;
Freeland discloses 6. The product of claim 13, further including an intermediate customer having a means of conveying said text-to-speech product to a first customer, wherein said marketing & purveying takes as its direct object said intermediate customer and as its indirect object said first customer. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the 

As per claim 14, Freeland in view of Akabane and Kurzweil;
Freeland discloses 14. The product of claim 13 wherein the set of text is generated from a source selected from the group consisting of email, short message service, social media posts, the internet, navigation systems, user recordings, app messages, computing device messages, vehicle messages, station messages, unmanned vehicle messages, appliance messages, internet of things messages, messages from a cloud, smart phone messages, and robot messages. See, “Preferably, the connetion means allows text-based messages (such as email) or recorded audio messages to be provided to the toy for playback 

As per claim 18, Freeland in view of Akabane and Kurzweil;
Freeland discloses 18. The product of claim 13, wherein the text-to-speech is general text-to-speech of a production quality with varying pitch and volume, capable of saying any message fluently. See, “Likewise, text could be highlighted and the toolbar button pressed to adjust the speed of the spoken text, the accent, the emotion, the volume etc. Visual coding (for example, by colour or via charts or graphs) indicate to the user, where the speech markers are set and what they mean.” [0175]. See, “When a text message is converted to a voice message via the TTS system, the prosidy (pitch and speaking speed) of the message is determined by one or another of the methods previously described. It would be advantageous, however, for the speaking speed of the message to be variable, depending upon factors, such as” [0153]. See, “For example, a TTS system would probably generate the word 'going' with rising pitch in the text message "So where do you think you're going?". A markup language can be used to instruct the TTS system to generate the word 'you're' with a sarcastic emphasis and the word 'going' with an elongated duration and falling pitch” [0165]. See, “For example, the F0 signal could be frequency shifted from typical male to typical female (ie, to a higher frequency), 

As per claim 19, Freeland in view of Akabane and Kurzweil;
Freeland discloses 19. The product of claim 13, wherein the audio generated is in a foreign language and generated from a set of electronic text in that foreign language in the voice selected by the user. See, Freeland at [0142, 0143 and 0144] for generating audio that is in a foreign language in a voice selected by the user. The set of electronic text received is in a first language; the text is then converted to a second language (foreign text), and the audio is output in the second language. Thus, in the scenario described by Freeland, the received text is not already in the foreign language. However, Claim 13 recites “receive a set of electronic text” and this limitation separately recites “a set of electronic text in that foreign language”, so the two sets of text need not be interpreted as the same set, and this limitation is met by the disclosure in Freeland. See, “A language conversion system (LCS) can be used with certain embodiments to convert a text message in one language to a text message in another language. The character TTS system is consequently adapted to include a supported word-base of voice samples in one or more characters, speaking in the target language; and paragraph 0143, note: Thus a user can convert a message from one language into another language, wherein the message is subsequently converted to an audio format message, representative of the voice of a character or personality, such as one well known in the culture of the second target language).” [0042].


Claims that depend from Claim 20:

As per claim 8, Freeland in view of Akabane and Kurzweil;
Freeland discloses 8. The method of claim 20 wherein  a source of the voice has an occupation selected from the group consisting of an activist, an actor, an astronaut, a banker, a celebrity, a chef, a comedian, a singer, a songwriter, a musician, an entertainer, a television commercial spokesperson, a disk jockey, a company owner, a millionaire, a rentier, an investor, a philanthropist, a religious leader, a psychology leader, a health leader, a doctor, a television personality, a talk show host, a reality star, an heir, a motivational speaker, a butler, a maid, a production assistant, a movie star, a real estate mogul, a consultant, a marketing consultant, a policeman, a soldier, a firefighter, a spokesperson, a cheerleader, a race car driver, a founder, an executive, a manager, an athlete, a referee, a team owner, a newscaster, a person who has become famous by appearing in a video, a social media blogger, a fashion model, a photography model, a radio announcer, a blog owner, a deceased celebrity, and a specialist. See, “The audio message is in the voice of a character that is recognisable to most users, such as a popular cartoon character (for example, Homer Simpson) or real-life personality (for example, Elvis Presley). Alternatively "stereotypical" characters may be used, such as a "rap artist" (e.g. Puffy), whereby the message is in a voice typical of how a rap artist speaks. Or the voice could be a 

As per claim 9, Freeland in view of Akabane and Kurzweil;
Freeland discloses 9. The method of claim 20 wherein said electronic device is selected from the group consisting of a computing device, a vehicle, a station, an unmanned vehicle, an appliance, a cloud, a smart phone, and a robotic device. See, “In the toy embodiment, the toy may optionally have electromechanical mechanisms for performing animation of moving parts of the toy during the replay of recorded messages. The toy has a number of mechanically actuated lugs for the connection of accessories. Optionally, the accessories represent stylised body parts, such as eyes, hat, mouth, ears etc. or stylised personal accessories, such as musical instruments, glasses, handbags etc;” [0322]. See, “Shown in FIG. 12 is a toy 70 that may be connectable to a computing means 72 via a connection means 74 through link 76 that may be wireless and therefore connected to a network or by fixed cable. The toy 70 has a non volatile memory 71 and a controller means 75. An audio message may be downloaded though various software to the computing means 72 via the Internet for example and subsequently transferred to the toy through the connection means 74;” [0330] and See, “A number of features specific to toy-based embodiments are now described. In one feature the audio format message remains in non-volatile memory 71 within the toy70 and can be replayed many times until the user instructs the microprocessor in the toy, by way of the controller means 75, to erase the message from the toy. Preferably, the toy is capable of storing multiple audio format messages and replaying any of these messages by operation of the controller means 75. 

As per claim 10, Freeland in view of Akabane and Kurzweil;
Freeland discloses 10. The method of claim 20 further comprising offering for procurement the executable instructions product at said venue, wherein said venue is selected from the group consisting of an amusement park, a blog post, a booth, an in-program messaging, a conference, a concert, a concession stand, a contest, a convention, an entertainment event, a social media post, an internet, a fundraising/charity event, a gossip magazine, a gossip television show, a jumbotron, a kiosk, a launch, a magazine, a meeting, a movie, a musical, an Olympics event, an opera, a play, a rally, a race, a reality show, a rodeo, an email campaign, a sports event, a radio broadcast, a space event, a speech, a stage production, a souvenir place, a store, a commercial, a show, a short message, a theme park, a video, a wrestling event, a merchandise store, and a training event. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. Thereafter, upon shipping of the product to the recipient of the gift, instead of printing the giver's personal greeting message (for example, "Happy birthday Richard, I thought this Elma Fudd character would appeal to your sense of humour. From Peter'') upon a card or gift certificate to accompany the gift, said greeting message is preferably stored in a database on the Internet server computer(s)” [0337]. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address 

As per claim 12, Freeland in view of Akabane and Kurzweil;
Freeland discloses 12. The method of claim 20 further including an intermediate customer having a means of conveying said text-to-speech product to a first customer, wherein said marketing & purveying takes as its direct object said intermediate customer and as its indirect object said first customer. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. Thereafter, upon shipping of the product to the recipient of the gift, instead of printing the giver's personal greeting message (for example, "Happy birthday Richard, I thought this Elma Fudd character would appeal to your sense of humour. From Peter'') upon a card or gift certificate to accompany the gift, said greeting message is preferably stored in a database on the Internet server computer(s)” [0337]. See, “further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. Thereafter, upon shipping of the 

As per claim 21, Freeland in view of Akabane and Kurzweil;
Freeland discloses 21. The method of claim 20 wherein the source of electronic text is generated from a source selected from the group consisting of email, short message service, social media posts, the internet, navigation systems, user recordings, app messages, computing device messages, vehicle messages, station messages, unmanned vehicle messages, appliance messages, internet of things messages, messages from a cloud, smart phone messages, and robot messages. See, “Preferably, the connetion means allows text-based messages (such as email) or recorded audio messages to be provided to the toy for playback through the speaker means. Alternatively, the connection means allows an audio signal to be provided directly to the speaker means for playback of audio message; abstract, note: Either a voice message or text based message may be used to construct the audio message” [0082]. See, “The inventive concept resides in a recogniton that text can desirably be converted into a voice representative of a particular character, such as a well-known entertainment personality or fictional character).” [0009].

As per claim 23, Freeland in view of Akabane and Kurzweil;
Freeland discloses 23. The method of claim 20 wherein the user selected voice is selected from a plurality of available voices. See, “In embodiments that are implemented in software, the chosen character is selected from a database of supported characters, either automatically or by the user. The conversion process of generating an audio message is described in greater detail below under the heading "TTS System." In the toy embodiment, the voice is desirably compatible with the visual design of the toy and/or the toy's accessories such as clip-on components.” [0111].

As per claim 24, Freeland in view of Akabane and Kurzweil;
Freeland discloses 24. The method of claim 20 wherein the text-to-speech is general text-to-speech of a production quality with varying pitch and volume, capable of saying any message fluently. See, “Likewise, text could be highlighted and the toolbar button pressed to adjust the speed of the spoken text, the accent, the emotion, the volume etc. Visual coding (for example, by colour or via charts or graphs) indicate to the user, where the speech markers are set and what they mean.” [0175]. See, “When a text message is converted to a voice message via the TTS system, the prosidy (pitch and speaking speed) of the message is determined by one or another of the methods previously described. It would be advantageous, however, for the speaking speed of the message to be variable, depending upon factors, such as” [0153]. See, “For example, a TTS system would probably generate the word 'going' with rising pitch in the text message "So where do you think you're going?". A markup language can be used to instruct the TTS system to generate the word 'you're' with a sarcastic emphasis and the word 'going' with an elongated duration and falling pitch” [0165]. See, “For example, the F0 signal could be frequency shifted from typical male to typical female (ie, to a higher frequency),  

As per claim 25, Freeland in view of Akabane and Kurzweil;
Freeland discloses 25. The method of claim 20 wherein the audio generated is in a foreign language and generated from the set of electronic text in the foreign language in the voice selected by the user. See, Freeland at [0142, 0143 and 0144] for generating audio that is in a foreign language in a voice selected by the user. The set of electronic text received is in a first language; the text is then converted to a second language (foreign text), and the audio is output in the second language. Thus, in the scenario described by Freeland, the received text is not already in the foreign language. However, Claim 13 recites “receive a set of electronic text” and this limitation separately recites “a set of electronic text in that foreign language”, so the two sets of text need not be interpreted as the same set, and this limitation is met by the disclosure in Freeland. See, “A language conversion system (LCS) can be used with certain embodiments to convert a text message in one language to a text message in another language. The character TTS system is consequently adapted to include a supported word-base of voice samples in one or more characters, speaking in the target language; and paragraph 0143, note: Thus a user can convert a message from one language into another language, wherein the message is subsequently converted to an audio format message, representative of the voice of a character or personality, such as one well known in the culture of the second target language).” [0042].

Claims 16 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Freeland in view of Akabane and Kurzweil further in view of Meeker et al. (United States Patent Application Publication Number: US 2012/0073482).
As per claim 16, Freeland in view of Akabane, Kurzweil and Meeker teaches;
Freeland discloses 16. The product of claim 13, wherein the text-to-speech executable instructions are provided through a server (see paragraph 0043, note: Embodiments of the invention are preferably facilitated using a network which allows for communication of text-based messages and/or audio messages between users. Preferably, a network server can be used to distribute one or more audio messages generated in accordance with embodiments of the invention).
Freeland et al. does not expressly teach providing information through a cloud server. 
However Meeker et al. teaches providing information through a cloud server (see paragraph 0046, note: As alluded to above, several of the elements in cash management safe 1 have on-board programmable, or reprogrammable, memory. Cloud Server 8, through its connection to the safes, can re-program and update the instructions on flash memory devices 72 and 73. This provides the ability to globally update a system of several disparately located safes from a single remote location nearly simultaneously. Alternatively, a system of safes wherein the safes are acquired and installed over time may have variations among the models of individual elements. In that case, the cloud based server 8 can maintain a table, or index, of the component 
Before the effective filing date of the claimed invention it would have been obvious for one of ordinary skill in the art to have modified Freeland in view of Akabane and Kurzweil with the aforementioned teachings from Meeker et al. with the motivation of providing a common way to update information remotely through use of a cloud server (see Meeker et al. paragraph 0046), when the use of servers to provide information to users is known (see Freeland et al. paragraph 0043).

As per claim 22, Freeland in view of Akabane, Kurzweil and Meeker teaches;
Freeland discloses 22. The method of claim 20, wherein the text-to-speech executable instructions are provided through a server (see paragraph 0043, note: Embodiments of the invention are preferably facilitated using a network which allows for communication of text-based messages and/or audio messages between users. Preferably, a network server can be used to distribute one or more audio messages generated in accordance with embodiments of the invention).
Freeland et al. does not expressly teach providing information through a cloud server. 
However Meeker et al. teaches providing information through a cloud server (see paragraph 0046, note: As alluded to above, several of the elements in cash management safe 1 
Before the effective filing date of the claimed invention it would have been obvious for one of ordinary skill in the art to have modified Freeland in view of Akabane and Kurzweil with the aforementioned teachings from Meeker et al. with the motivation of providing a common way to update information remotely through use of a cloud server (see Meeker et al. paragraph 0046), when the use of servers to provide information to users is known (see Freeland et al. paragraph 0043).

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Freeland in view of Akabane and Kurzweil, further in view of Noyes (United States Patent Application Publication Number: US 2003/0028377).
As per claim 30, Freeland in view of Akabane, Kurzweil, and Noyes teaches;
30.    (New) The product of claim 13, wherein a customer provides the electronic device component and the text-to-speech executable instructions are sold to the customer as a software component, thereby eliminating a need for sales projections and capital investment in on-site physical ancillary merchandise stocking inventory, thereby eliminating a need for pre-sales physical storage of ancillary merchandise product inventory on location at an entertainment venue sales booth, and thereby allowing unlimited ancillary merchandise sales from road trips, small venues, or any location. (see paragraphs 0244, 0309, 0310, and 0336-0339. Paragraph 0309, note: Some or all of the components of the system can either be distributed as server or client software in a networked or internetworked environment and the split between functions of server and client is arbitrary and based on communications load, file size, compute power etc. Additionally, the complete system may be contained within a single stand alone device which does not rely on a network for operation. In this case, the system can be further refined to be embedded within a small appliance or other application with a relatively small memory and computational footprint for use in devices such as set-top boxes, Net PCs, Internet appliances, mobile phones etc, Paragraph 0337 note: A further feature relates to a novel way of purchasing a toy product online (such as over the Internet) as a gift. The product is selected, the shipping address is entered, the billing addres and payment details and a peronalised greeting message is entered in a manner similar to regular online purchases. Thereafter, upon shipping of the product to the recipient of the gift, instead of printing the giver's personal greeting message (for example, "Happy birthday Richard, I thought this Elma Fudd character would appeal to your sense of humour. 
From Peter") upon a card or gift certificate to accompany the gift, said greeting message is preferably stored in a database on the Internet server computer(s)).

However, Noyes discloses selling synthesized voice of celebrities for devices provided by customers. See, “FIG. 1 is a schematic diagram of one embodiment of a Voice Flavor (VF) selection, distribution and application system. A voice flavors database 10 contains a variety of Voice Flavor Components (VFCs) 12 and Voice Flavor Profiles (VFPs) 14 which, when combined, describe the desired VFs 16 and enable them to be synthesized and processed in voice-enabled devices. This database 10 will be accessible from a Voice Flavors web site 20 over the Internet, satellite or other wireless network 30. Once purchased, VFs 16 will be downloaded over the network 30 or delivered on floppy disks, CDs, DVDs, microchips or other storage media and stored by the user on their home computer or other connected computer device 40.” [0016].
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to have modified the translation of celebrity voices in Freeland in view of Akabane and Kurzweil to sell celebrity voices for customer devices as taught by Noyes with the motivation of allowing flexibility in the selection of voices for any particular voice-enabled device. (see Noyes paragraph 0024). 


Response to Arguments
Applicant’s arguments are moot because the claims are no longer rejected under 35 USC 101.
Regarding 35 USC 112(b): Applicant points to [0191] and [0243] as showing support for using parameters, such as diphones extracted from a speaker’s native language, to synthesize speech in a foreign language. Paragraph [0191] explains downloading software and following its instructions. Paragraph [0243] explains the case where the famous personality is no longer living, so samples are matched to obtain the needed parameters. Short of downloading the software and following the instructions, there is no way to prove that this does not support what is claimed or enable a viable result. However, Examiner is at least concerned that a great deal of experimentation is required to achieve usable voices of the speaker in foreign languages. Further, the specification also concedes that most if not all of the technology was “known to those skilled in the art”.
Applicant’s arguments with respect to the prior art rejections are acknowledged. However, the arguments are moot in view of the new grounds of rejection with respect to the additional prior art that is cited in this office action.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Schultz et al. (United States Patent Application Publication Number: US 2002/0010584)
Gabai (World Intellectual Property Organization (WIPO) Publication Number: WO 01/69829) teaches a networked interactive toy apparatus operative to promote sales (see abstract).

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Eric Netzloff whose telephone number is (571)270-3109. The examiner can normally be reached on Monday to Friday, 8:00 a.m. to 5:00 p.m. EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAMBIZ ABDI, can be reached on (571)272-6702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ERIC R NETZLOFF/Primary Examiner, Art Unit 3688