Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1 and 19-20 are independent.
This Application was published as U.S. 20220005460.
Apparent priority: 2 July 2020.
Figure 1 provides an overview of the process, and steps 100 to 400 of Figure 1 are expounded in the flowcharts of Figures 2-5.

[Image: Figure 1 of the instant Application]


The instant Application uses the term “walla”:
[0003] Background speech audio--known as walla--is an audio component that mimics the vocal murmur and background dialogue of crowds that people are accustomed to hearing as they go about in public. Incorporating walla into a film or television production can lend it a sense of realism; a production that is lacking walla can feel empty and unreal.

Claims 1-18 make no mention of “walla,” which appears to be the main focus of the instant Application.
Claims 19-20 have limitations similar in scope to the limitations of Claim 1, but their preambles state that the method is for generating “walla.”  In other words, whatever Claims 1, 19, and 20 generate is generated by nearly identical methods, except that Claims 19 and 20 call the end product “walla.”  A preamble is accorded weight if it breathes life and meaning into the Claim.  In this situation, how is the Examiner to interpret the preamble as breathing life and meaning into the Claim when the Applicant has provided Claims with identical limitations, all ending in the synthesis of speech, with two of the Claims describing the method as being for generation of “walla” and the other remaining silent? If generation of “walla” is intended, place “walla” inside the body of the Claim as a limitation, preferably as the last and resulting limitation.

Note also the following definitions:
[0046] In this embodiment, components of the syllabic profile pertain to which syllables occur within the input text and how many syllables are used as, or combined for, words. The syllabic profile includes syllables of the input text themselves, the frequencies at which each individual syllable occurs in the input text (the syllable-occurrence rates), and the rate at which n-syllable words occur in the text (the syllable-count-per-word rates). More or fewer components may be included in the syllabic profile.
[0013] In an embodiment, the grammatical profile comprises: a syllabic profile including the syllables, the syllable-occurrence rates and the syllable-count-per-word rates; and a structural profile defining the input text of actual words as a function of at least one of: word-count-per-sentence rates, sentence type occurrence rates, number of words per sentence, frequency of interjections, and distribution of word length by speaker.
Drawings
Figure 6 includes a series of numbered blocks that do not include any suitable legend for understanding the drawing. Accordingly, these drawings fail to convey the part of the invention to which they are intended to pertain without referring to the Specification.
37 C.F.R. 1.84 Standards for drawings:
(o) Legends. Suitable descriptive legends may be used subject to approval by the Office, or may be required by the examiner where necessary for understanding of the drawing. They should contain as few words as possible.
To overcome the objection, refer to Figures 1-5, which are good examples of the amount of information that is considered "suitable descriptive legends" and that a drawing should include.
Additionally, the drawings are objected to under 37 CFR 1.83(a) because they fail to show the names of the modules as described in the specification. Any structural detail that is essential for a proper understanding of the disclosed invention should be shown in the drawing. MPEP § 608.02(d). 
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
The language of the independent Claims 1 and 19-20 is unclear.
1. A computer-implemented method for synthesizing speech audio comprising: 
obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates; 
generating a dictionary of pseudo-words having the syllable-count-per-word rates, each pseudo-word consisting of one syllable or concatenated syllables selected from the input text, wherein substantially all of the pseudo-words are not actual words; 
constructing an output text product having the grammatical profile, the output text product comprising at least one sentence consisting of one or more pseudo-words selected from the dictionary; and 
synthesizing speech audio using the output text product. 

It is not clear from the language of “obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates;” whether the “input text” is an input to and the basis of determination of the “grammatical profile” or an output of and generated by the “grammatical profile.”
According to Figure 2, the “grammatical profile” is generated from the “input text.”

[Image: Figure 2 of the instant Application]

Additionally, it is not clear which one of the “grammatical profile,” the “input text,” or the “actual words” is “a function of at least syllable-occurrence rates ….”
Suggestion:
obtaining a grammatical profile [[defining]] from an input text of actual words, wherein the grammatical profile is [[as]] a function of at least syllable-occurrence rates and syllable-count-per-word rates; 

Claims 19-20 include similar language and must be corrected as well.

An indefiniteness rejection under 35 U.S.C. 112(b) is not handed out because, based on the remainder of the Claims and Disclosure, the meaning is clear to the Examiner.  However, the language of this Claim and its counterparts, on its face, is quite confusing and must be fixed.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Step 1: The independent Claims are directed to statutory categories: 
Claim 1 is a method claim and directed to the process category of patentable subject matter.
Claim 19 is a computer-readable-storage device claim and is directed to the machine or manufacture category of patentable subject matter.
Claim 20 is a system claim and directed to the machine or manufacture category of patentable subject matter.

Step 2A, Prong One: Does the Claim recite a Judicially Recognized Exception (Abstract Idea)? Are these Claims considered Abstract as a Mathematical Concept (mathematical relationships, mathematical formulas or equations, mathematical calculations), a Mental Process (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion), or Certain Methods of Organizing Human Activity (1- fundamental economic principles or practices, including hedging, insurance, mitigating risk; 2- commercial or legal interactions, including agreements in the form of contracts, legal obligations, advertising, marketing or sales activities or behaviors, and business relations; 3- managing personal behavior or relationships or interactions between people, including social activities, teaching, and following rules or instructions), such that they fall under the judicial exception to patentable subject matter?
The rejected Claims are directed to Mental Processes.
Step 2A, Prong Two: Additional Elements that Integrate the Judicial Exception into a Practical Application? This prong identifies whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluates those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element(s) or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. This prong uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application.

The rejected Claims do not include additional limitations that point to integration of the abstract idea into a practical application.
Claim 1 is a generic automation of a mental process because a human can go through the steps in his mind and then utter (synthesize) the word. We have a new question (prong 2 of step 2A) in the 101 analysis that asks whether the abstract idea is integrated with a practical application. The answer is no in this instance because there is no technological solution in the Claim that “integrates” the abstract idea. The Claim does not describe a practical application.  
Note Claims 19-20 provide “synthesizing walla” in the preamble.  However, this is an insufficient practical application because the relationship to technology remains unexpressed and the Claims do not include a limitation (aside from the mention in the preamble) of how this “walla” is generated.

1. A computer-implemented method for synthesizing speech audio comprising: 
obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates; [Person picks up an input text = “come here.”  Syllables: come, he, ere.  Syllable-occurrence rates: come = 1, he = 1, ere = 1.  Syllable-count-per-word rates: one 1-syllable word and one 2-syllable word.]
generating a dictionary of pseudo-words having the syllable-count-per-word rates, each pseudo-word consisting of one syllable or concatenated syllables selected from the input text, wherein substantially all of the pseudo-words are not actual words; [Person generates the pseudo-words: ere, erecome, hecome, and comeere.  The words include 1-syllable words and 2-syllable words.] [How would a machine do this?  Based on what procedure? Does it generate all such possible words through all possible combinations? Does it generate only some of such words? The limitation provides little particularity, and for a machine to perform a step such as this one, particularity is required.]
constructing an output text product having the grammatical profile, the output text product comprising at least one sentence consisting of one or more pseudo-words selected from the dictionary; and [Person constructs the sentence: “ere erecome.”  The sentence must have a 1-syllable made-up word and a 2-syllable made-up word to have the same grammatical profile as “come here.”] [Again, how? How many such sentences are constructed? How are some selected and others culled?  The limitation requires particularity.]
synthesizing speech audio using the output text product. [Person speaks out the sentence “ere erecome.”]

syllable-count-per-word rate = frequency of occurrence of n-syllable words in the input text for any n.  For example: there are 5 1-syllable words, 3 2-syllable words, and 10 3-syllable words in the input text.  This answers: how many few-syllable words and how many many-syllable words are there?
syllable-occurrence rates = a frequency of occurrence for each of the different syllables occurring in the input text.  For example, syllable “out” occurs 7 times and syllable “put” occurs 5 times.  This answers: which syllables occur in the input text, and how often?
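
For illustration only, the two rates defined above can be tallied as in the following minimal Python sketch. The function name syllabic_profile and the hand-split syllable input are assumptions made for this example; neither the Claims nor the Specification passages quoted in this action prescribe this procedure.

```python
from collections import Counter

def syllabic_profile(words):
    """words: list of words, each given as a list of its syllables (split by hand here)."""
    # Syllable-occurrence rates: how often each distinct syllable occurs in the text.
    syllable_occurrence = Counter(s for word in words for s in word)
    # Syllable-count-per-word rates: how many n-syllable words occur, for each n.
    syllable_count_per_word = Counter(len(word) for word in words)
    return syllable_occurrence, syllable_count_per_word

# The "come here" example from the Claim 1 walk-through, pre-split into syllables.
occurrence, per_word = syllabic_profile([["come"], ["he", "ere"]])
print(dict(occurrence))  # {'come': 1, 'he': 1, 'ere': 1}
print(dict(per_word))    # {1: 1, 2: 1}
```

The sketch returns raw counts, as in the examples above; dividing each count by the total would give relative frequencies instead.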

Step 2B: Search for Inventive Concept: Additional Elements Do Not Amount to Significantly More: The preamble refers to “a computer-implemented method for synthesizing speech audio” and the last limitation states “synthesizing speech audio using the output text product” without mentioning a particular TTS software or hardware or even stating that any product or module is used for the synthesis of speech.  Independent Claim 19 is a CRM claim and refers to a “computer-readable medium” and “computer code,” and Claim 20 is a system Claim and includes a “computer processor.”  These are all well-understood, routine, and conventional machine components that are being used for their conventional and rather generic functions.  Additionally, these limitations are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention of a machine. Accordingly, they are not sufficient to cause the Claim to amount to significantly more than the underlying abstract idea.
The Dependent Claims do not add limitations that could help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim:
2. The method of claim 1, wherein constructing the output text product comprises: 
constructing multiple sentences each consisting of one or more pseudo-words selected from the dictionary. [Person can make several nonsensical sentences.  No specific algorithm/method is articulated for the constructing of these sentences.]

3. The method of claim 2, wherein constructing multiple sentences further comprises: 
associating each of the multiple sentences with an intended one of a plurality of different speakers. [Person assigns each of the sentences to one of his friends to read out.]

Limitations of Claim 4 come close to not being Abstract.  Steps 2, 3, and 4 set forth a method that can be implemented by a computer.  But step 1 is still abstract in the sense that it does not set forth an algorithm for the machine to generate the pseudoword.  The remaining step of systematically comparing the pseudoword to the words of a regular dictionary, to ascertain that the dictionary (or dictionaries) does not include the word, is tailored to programming for a machine.  Further, because the key limitations of Claim 1 are abstract and lack nexus to a technological device or environment, the addition of the limitations of Claim 4 to Claim 1 would not be sufficient as additional elements that integrate the judicial exception into a practical application or as additional elements that cause the Claim as a whole to amount to significantly more than the underlying abstract idea.
4. The method of claim 1, wherein the generating comprises: 
generating a potential pseudo-word using one or more syllables selected from the input text; [Person picks out the syllables at will and generates word.]
comparing the potential pseudo-word with actual words in a set of actual words; [When the person is not certain if the word is actually a pseudoword, he checks the dictionary to make sure the generated word does not exist as an actual word.]
in the event that the potential pseudo-word is determined to substantially match one or more of the actual words, discarding the potential pseudo-word; and 
otherwise: adding the potential pseudo-word to the dictionary. 
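
As a hedged illustration of how the generate, compare, and discard-or-add steps of Claim 4 might be programmed, the sketch below enumerates every ordered arrangement of distinct syllables up to a chosen length and keeps only those not found in a set of actual words. The exhaustive enumeration, the exact-match comparison (the Claim recites "substantially match"), the name build_pseudo_dictionary, and the toy word set are all assumptions made for this example; the Claim itself does not specify any of them.

```python
from itertools import permutations

def build_pseudo_dictionary(syllables, actual_words, max_syllables=2):
    """Enumerate candidate pseudo-words and keep those absent from actual_words."""
    dictionary = []
    for n in range(1, max_syllables + 1):
        # Every ordered arrangement of n distinct syllables is a potential pseudo-word.
        for combo in permutations(syllables, n):
            candidate = "".join(combo)
            if candidate in actual_words:
                continue  # matches an actual word: discard the candidate
            dictionary.append(candidate)  # otherwise: add it to the dictionary
    return dictionary

# Hypothetical inputs: syllables of "come here" and a toy set of actual words.
pseudo = build_pseudo_dictionary(["come", "he", "ere"], {"come", "he", "here"})
print(pseudo)
# ['ere', 'comehe', 'comeere', 'hecome', 'heere', 'erecome', 'erehe']
```

Generating all combinations is only one of the options raised in the questions above; sampling a subset of candidates would fit the Claim language equally well, which is the lack of particularity noted for step 1.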

Claim 5 provides a definition for the “grammatical profile.”   
5. The method of claim 1, wherein the grammatical profile comprises: 
a syllabic profile including the syllables, the syllable-occurrence rates and the syllable-count-per-word rates; and 
a structural profile defining the input text of actual words as a function of at least one of: word-count-per-sentence rates, sentence type occurrence rates, number of words per sentence, frequency of interjections, and distribution of word length by speaker. 

Claim 6 provides a definition for the “syllable-occurrence rates.”   This definition should actually be present in Claim 1; its absence opens the door to a very broad interpretation of the phrase in Claim 1.
6. The method of claim 1, wherein the syllable-occurrence rates include at least: a frequency of occurrence for each of a plurality of different syllables occurring in the input text. 

Claim 7 provides a definition for the “syllable-count-per-word rates.”   This definition should actually be present in Claim 1; its absence opens the door to a very broad interpretation of the phrase in Claim 1.
7. The method of claim 1, wherein the syllable-count-per-word rates include at least: a frequency of monosyllabic words occurring in the input text, a frequency of bisyllabic words occurring in the input text, a frequency of trisyllabic words occurring in the input text, and a frequency of quadrisyllabic words occurring in the input text. 

Claim 8 reiterates Claim 7 only with a fancy formula and with the number of syllables being unlimited.
8. The method of claim 1, wherein the syllable-count-per-word rates comprise: for each of n=1 to x: a frequency of n-syllable words occurring in the input text; wherein x>1. 

Claim 9 sets a limit for the number of syllables.
9. The method of claim 8, further comprising: 
establishing x as a number of syllables in the highest-syllable count word occurring in the input text. 
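
For illustration, the n = 1 to x formulation of Claim 8, with x set as in Claim 9, can be sketched as below. The function name syllable_count_rates and the sample input are assumptions made for this example, and raw counts stand in for "frequency."

```python
def syllable_count_rates(syllable_counts):
    """syllable_counts: the number of syllables in each word of the input text."""
    # Claim 9: x is the syllable count of the highest-syllable-count word.
    x = max(syllable_counts)
    # Claim 8: one frequency for each n from 1 to x (zero where no such word occurs).
    return {n: syllable_counts.count(n) for n in range(1, x + 1)}

# Hypothetical text with five 1-syllable, three 2-syllable, and one 4-syllable word.
rates = syllable_count_rates([1, 1, 1, 1, 1, 2, 2, 2, 4])
print(rates)  # {1: 5, 2: 3, 3: 0, 4: 1}
```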

Claim 10 further defines the grammatical profile.  The grammatical profile is not an input.  The input text is an input.  The various components of the definition of the grammatical profile are obtained from the input text.  Claim 1 does not set forth a method of analyzing the “input text” to arrive at these components.
10. The method of claim 8, wherein the grammatical profile further defines average distributions of positions of n-syllable words within sentences in the input text. 

Claim 11 provides a type for the input text.
11. The method of claim 1, wherein the input text comprises at least one film/television script. [Person can take up a script to set the pattern for his walla generation.]

Claim 12 provides no steps that can be programmed for a computer to perform.  The Person can read the input text and come up with the syllabic and structural profiles.
12. The method of claim 5, further comprising: 
processing the input text to generate the syllabic profile and the structural profile. 

13. The method of claim 1, further comprising: 
causing user-selectable prosody to manifest in the speech audio. [Person gives each part to a friend of his to read out or uses a different accent to read each portion (Australian accent; Irish accent).]

14. The method of claim 13, wherein the user-selectable prosody is one or more of the characteristics selected from the group consisting of: energy, complexity, muting, volume, injection, speak over, grit, pitch, intonation, rhythm, and tempo. [Person can change how loudly he speaks out each portion.]

15. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
associating at least one selected prosody characteristic with one or more of the pseudo-words. [Person reads out each made-up word with a different volume or accent.]

16. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
during the synthesizing, providing at least one selected prosody characteristic and the output text product to a text-to-speech process. [This limitation is good but needs a bit more considering that the remaining limitations are so very devoid of technological components.]

17. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
after the synthesizing, modifying the speech audio in accordance with at least one selected prosody characteristic. [This limitation is not abstract by itself because a person cannot modify his speech after he has spoken it.  However, in view of the abstract nature of Claims 1+13, the limitation of this Claim amounts to post-solution activity that, while of a technological nature, is not sufficient to make the Claim as a whole amount to significantly more than the underlying abstract idea.  The key steps of Claim 1 need to be stated in technological terms and with an indivisible tie to technological components.  These additional limitations would not be sufficient as additional elements that integrate the judicial exception into a practical application or as additional elements that cause the Claim as a whole to amount to significantly more than the underlying abstract idea.]

18. The method of claim 13, further comprising: 
generating and displaying user interface elements enabling a user to select the user-selectable prosody. [Person can select his friends from a screen presenting their names and ask them to read out a sentence.]

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Oudeyer (U.S. 2002/0198717) in view of Butler (U.S. 20170154546).

Oudeyer is directed to synthesizing speech that is gibberish as is the overall frame of the Claim. 

Regarding Claim 1, Oudeyer teaches:
1. A computer-implemented method for synthesizing speech audio [Oudeyer, Figure 1, “Synthesize a voice/sound S4.”  The output speech in Oudeyer is gibberish/meaningless speech that is output by a Robot and intended only to sound a certain way in order to convey a certain emotion.]
comprising: 
obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates; [Oudeyer begins in Figure 1 with S1, which detects or obtains an “emotional state” that needs to be output by the synthesized speech.  So, it does not begin with text.  This limitation, however, does not quite begin with text either; rather, it starts out with a number and type of syllables.  The number of syllables in Oudeyer is a random number selected between one and a maximum number, where the maximum is one of the input parameters of step S3.]
generating a dictionary of pseudo-words having the syllable-count-per-word rates, each pseudo-word consisting of one syllable or concatenated syllables selected from the input text, [Oudeyer generates meaningless words that are formed into meaningless sentences.  For generating the meaningless words, it uses a certain number of syllables.  Number of syllables used is one of the parameters determined at S3 and S3 may come before or after S2.   “[0064] Generation of a meaningless sentence to be uttered can be realized by randomly combining words each of which is produced by randomly combining syllables. Herein, each syllable is composed of a combination of a consonantal phoneme C and a vowel phoneme V in the form of CV or CCV ….”  “[0061] Emotion and the algorithm of synthesizing meaningless words are described in detail below. An object of the present embodiment is to realize a technique of producing a meaningless sentence which is varied each time when it is uttered so that it seems to be a realistic speech….”  “[0072] [3-1] The number of syllables is determined for each word. For example, the number of syllables is given by a random number within the range from 2 to MAXSYLL. Herein, MAXSLYY is a voice synthesis parameter indicating the maximum number of syllables allowed to be included in one word.”  “[0053] … This step S2 may be performed before step S1 or after step S3 which will be described later….”]
wherein substantially all of the pseudo-words are not actual words; [Oudeyer, the sentence of step S2 is intended to be meaningless on purpose and is made up of meaningless words:  “[0053] In the following step S2, a sentence representing a content to be uttered in the form of a voice is outputted. This step S2 may be performed before step S1 or after step S3 which will be described later. A new sentence may be produced each time it is outputted or a sentence may be randomly selected from a plurality of sentences prepared in advance. However, in the present embodiment of the invention, the sentence should have a meaningless content, because, in contrast to meaningful dialogs which are difficult to produce, meaningless sentences can be easily produced by a simply-configured robot apparatus, and addition of emotional expressions allows meaningless sentences to seem to be realistic dialogs….” ]
constructing an output text product having the grammatical profile, the output text product comprising at least one sentence consisting of one or more pseudo-words selected from the dictionary; and [Oudeyer, Figure 1, the “output sentence” of step S2 which is synthesized into speech is a meaningless sentence.  “2 …. said sentence has a meaningless content.”  “[0043] From the above investigation results, it can be concluded that communication between a human being and a robot apparatus via a meaningless word is possible …”  See also [0053] above.]
synthesizing speech audio using the output text product. [Oudeyer, Figure 1, S4 which is based on the “output sentence” of step S2. “ … The robot apparatus (1) utters a sentence by means of voice synthesis …a sentence output step (S2) for outputting a sentence representing a content to be uttered in the form of a voice … a voice synthesis step (S4) for inputting, to a voice synthesis unit, the sentence output in the sentence output step (S2) and synthesizing a voice in accordance with the controlled parameter.”  Abstract.]

Oudeyer forms meaningless syllables according to a rule.  Oudeyer does not teach that it uses the “syllable-occurrence rates” and does not teach that it obtains its “syllable-occurrence rates” and “syllable-count-per-word rates” from an “input text.”
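
For illustration only, the rule quoted from Oudeyer at [0064] and [0072] (CV or CCV syllables, a random syllable count from 2 to MAXSYLL per word, words combined into a sentence) can be sketched as follows. The phoneme inventories, the value given to MAXSYLL, the equal weighting of CV and CCV, and the function names are assumptions made for this example and are not taken from Oudeyer.

```python
import random

CONSONANTS = ["b", "d", "k", "l", "m", "t"]  # hypothetical consonant inventory
VOWELS = ["a", "e", "i", "o", "u"]            # hypothetical vowel inventory
MAXSYLL = 4  # maximum syllables per word; the value is chosen for illustration

def random_syllable(rng):
    # CV or CCV: one or two consonantal phonemes followed by one vowel phoneme.
    n_consonants = rng.choice([1, 2])
    onset = "".join(rng.choice(CONSONANTS) for _ in range(n_consonants))
    return onset + rng.choice(VOWELS)

def random_word(rng):
    # The number of syllables is a random number in the range 2 to MAXSYLL.
    return [random_syllable(rng) for _ in range(rng.randint(2, MAXSYLL))]

def random_sentence(rng, n_words):
    # A meaningless sentence is a sequence of randomly produced words.
    return [random_word(rng) for _ in range(n_words)]

rng = random.Random(0)  # seeded only so that repeated runs give the same sentence
sentence = random_sentence(rng, 3)
print(" ".join("".join(word) for word in sentence))
```

Nothing in this sketch is derived from an input text: both the syllables and the syllable counts come from fixed rules and random draws.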

Butler is directed to lexical dialect analysis, analyzes words based on their sound patterns, and teaches obtaining the syllable count and syllable type of an input text.  Butler expressly teaches determining a syllable count.  It also teaches that the syllable to be counted is input by the user who is asking for the count; therefore, once the count is obtained, it is the count of a particular syllable.  Butler separately determines how many syllables are in a word:
obtaining a grammatical profile defining an input text of actual words [Butler is directed to lexical analysis of an input file which may be a text file. “[0066] … A result 802 may include an analysis of sound patterns of words from, for example, a text file, a transcript of an audio file, an audio file, a database of words from a source, or some other source….”   Butler, Figure 3, “Present Input Form 302” which is “input text of actual words.”  Figure 5 shows “file markup” where a user submits a file and the system marks up the locations of a particular sound/syllable in the file.  “[0049] … The input form 302 may be presented as a web page, or as an application, or as some other input form type….”  “[0013] … input files may be, for example, text files, audio files, video files or other input files….”  “[0030] … The lexical query may be based on typed input entered by the user, or on audio data spoken by the user, or on a text input file selected by the user, or from an audio input file selected by the user, or on a video input file selected by the user, or on some other type of data….” ] 
as a function of at least syllable-occurrence rates [ Butler teaches various statistics generated for a particular sound pattern/syllable that is requested by the user and the generated statistics show how many times this syllable occurs in the input file that is being searched.  Figure 2, “lexical analysis 216” and 204 and 208 showing the syllables/sounds found in the word “infinity.” Figure 4, “sound search 420,” shows that the user may ask 428 for the system to find the words with a particular sound/syllable (and at a particular location in the word).  This will generate the “syllable occurrence rate” of the input text. Butler identifies Sounds or Sound Patterns which form a syllable:  “[0021] The regular expression may be generated from the user interface elements by, for example, mapping each option of each user interface element to at least a portion of a regular expression. A user interface element configured to allow a user to specify a sound pattern in the start of a word may be a dropdown list of possible sounds, which allows a single selection. As used herein, a "sound pattern" is a sequence of one or more sounds associated with the pronunciation of a word. For example, the word "infinity" has four syllables, ("IH0 N," "F IH1," "N AX0," and "T IY0") in Arpabet and ("In," "`fI," "n," and "ti") in IPA with a stress on the second syllable (the "l" in Arpabet and the accent mark in IPA). The first syllable ("IH N" in Arpabet) has two sound patterns, the "IH" (as in "fish" or "sit") and the "N" (as in "nice" or "any"). The first syllable is also a sound pattern, "IH N" (as in "inner" or "spin"). Sound patterns can be comprised of additional sets of syllables. For example, the first two syllables of infinity ("IH N" and "F IH") also are a sound pattern (as in "infinite" or "Spinfisher.RTM."). 
Sound patterns may be specified for lexical queries as single elements (e.g., "IH" or "N") or as sequences of such elements (e.g., "IH N" or "IH N"; "F IH").” “[0054] The user interface 502 illustrates mark-up 520 functionality which may allow a user to analyze a file and to mark words within that file that match one or more specified sound patterns, by searching for those patterns within the lexicon database 510. A user may first browse for a file 522 and then may specify one or more pattern/color pairs that may be used to mark-up the file. In the example illustrated in FIG. 5, there is a first pattern 524 that specifies that words in the file that have the "IY" pattern anywhere in the word should be marked in blue and a second pattern 526 that specifies that words in the file that have the "EH" pattern anywhere in the medial position should be marked in red. ….”  Figure 6 finds the words that have a specific sound pattern/syllable:   “[0061] The user interface 602 illustrates a basic categorization 620 functionality which may allow a user to load a file and to categorize words within that file that match one or more specified sound patterns, by searching for those patterns within the lexicon database 610. The search, which may be the same as the search described herein in connection with FIG. 1, may be performed by generating queries from a user interface that are based on constraints such as sound position, number of syllables, language, sub-language, word frequency, and/or other such constraints….”  Figure 7 is an advanced search:  “[0065] …   For example, the advanced custom category 722 may include functionality to specify a custom category name, to specify a proceeding boundary for a sound, to specify a proceeding sound, to specify a primary sound, to specify a following sound, to specify a post-pattern boundary, and to specify whether the pattern may cross syllable boundaries. 
Specifications for syllable boundaries and/or whether patterns may cross syllable boundaries may introduce additional lexical query constraints and/or regular expressions into the lexical query 704….” ]  and 
syllable-count-per-word rates; [Butler provides a Syllable Count.  Butler, Figure 2, “Syllable Analysis 212.”  “[0043] The pronunciation dictionary 208 entry and the one or more pronunciations from the pronunciation translator 210 may be used by a syllable analysis 212 system to determine the number of syllables in a word. …”   Figures 4-7 show that the user may submit a query for words with a particular number of syllables or a range of syllables such as 1 to 9 (422) and the frequency of occurrence of such words (424).  “[0052] The user interface 402 illustrates sound search 420 functionality which may allow a user to search for words within the lexicon database 410 that may match one or more sound patterns and that may also match one or more other word parameters. ….  The user interface 402 may also allow a user to specify 422 words that have a certain number of syllables or a certain range of syllables….”  This teaches the “syllable-count-per-word rate” of the Claim.  The number of syllables of a word (syllable-count-per-word) is input as a constraint and the system returns the subset of words that have the specified number of syllables:  “1 …. receiving a second set of constraints, each constraint of the second set of constraints specifying one or more non-sound specific aspects of a word; …” and “2…. wherein the one or more non-sound specific aspects of the word include at least one of: a number of syllables of the word, a minimum number of syllables of the word, a maximum number of syllables of the word,…  or a frequency of the word.”]
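For illustration only, and not as a characterization of Butler or of the claimed method: the two quantities recited in this limitation can be sketched as simple relative counts over a text whose words have already been split into syllables. The function name `syllabic_profile` and the pre-syllabified input format are assumptions made for this sketch; neither appears in the cited references.

```python
from collections import Counter

def syllabic_profile(words):
    """Hypothetical helper: `words` is a list of words, each given as a
    list of syllable strings (an assumed, pre-syllabified input format)."""
    # How often each individual syllable occurs across the whole text.
    syllable_counts = Counter(s for w in words for s in w)
    total_syllables = sum(syllable_counts.values())
    # How many words have 1 syllable, 2 syllables, and so on.
    length_counts = Counter(len(w) for w in words)
    total_words = len(words)
    if total_syllables == 0:
        return {}, {}
    # Syllable-occurrence rates: share of all syllables taken by each syllable.
    occurrence = {s: c / total_syllables for s, c in syllable_counts.items()}
    # Syllable-count-per-word rates: share of words that have n syllables.
    count_per_word = {n: c / total_words for n, c in length_counts.items()}
    return occurrence, count_per_word

# Toy input loosely based on the words mentioned in the quoted passage.
sample = [["in", "fi", "ni", "ty"], ["in", "ner"], ["spin"], ["fish"]]
occurrence, count_per_word = syllabic_profile(sample)
```

The sketch normalizes counts into fractions; raw counts would serve equally well if "rate" is read as a number of occurrences.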

Oudeyer and Butler both pertain to language analysis.  It would have been obvious to combine the teachings of Butler, which provides a detailed lexical analysis of an input text, with the teachings of Oudeyer, which is directed to generating meaningless speech and begins by generating meaningless words of a particular syllable count, in order to impart more specificity to the meaningless speech generated by Oudeyer and to add the type of syllables as another parameter used for the generation and synthesis of the output meaningless speech.  The method of Oudeyer uses syllables that are made up according to a particular rule; instead of that rule, the syllable types obtained from the lexical analysis of Butler can be used.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 6, Oudeyer does not teach lexical analysis.
Butler teaches:
6. The method of claim 1, wherein the syllable-occurrence rates include at least: a frequency of occurrence for each of a plurality of different syllables occurring in the input text. [Butler, Figure 2, “lexical analysis 216” and 204 and 208 showing the syllables/sounds found in the word “infinity.” Figure 4, “sound search 420,” shows that the user may ask 428 for the system to find the words with a particular sound/syllable (and at a particular location in the word).  This will generate the “syllable occurrence rate” of the input text.  See [0021].  Figure 6 finds the words that have a specific sound pattern/syllable. See [0061].  Figure 7 is an advanced search.  See [0065].  All of these respond to the user asking for a particular Syllable/Sound pattern and therefore the statistics that are generated pertain to that particular syllable.]
Rationale for combination as provided for Claim 1.

Claim 19 is a computer program product system claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale.  Additionally, Oudeyer teaches:  “[0102] In the body unit 2, there is provided a control unit 16 including, as shown in FIG. 7, a CPU (Central Processing unit) 10, a DRAM (Dynamic Random Access Memory) 11, a flash ROM (Read Only Memory) 12, a PC (Personal Computer) card interface circuit 13, and a signal processor 14, wherein these components are connected to one another via an internal bus 15. In the body unit 2, there are also provided a battery 17 serving as a power source of the robot apparatus 1 and an angular velocity sensor 18 and an acceleration sensor 19 for detecting the orientation and the acceleration of motion of the robot apparatus 1.”
(Butler: “[0072] FIG. 9 is a simplified block diagram of a computer system 900 that may be used to practice an embodiment of the present invention. In various embodiments, the computer system 900 may be used to implement any of the systems illustrated and described above. For example, the computer system 900 may be used to implement processes for performing lexical queries according to the present disclosure. As shown in FIG. 9, the computer system 900 may include one or more processors 902 that may be configured to communicate with and are operatively coupled to a number of peripheral subsystems via a bus subsystem 904. These peripheral subsystems may include a storage subsystem 906, comprising a memory subsystem 908 and a file storage subsystem 910, one or more user interface input devices 912, user interface output devices 914, and a network interface subsystem 916.”  “[0077] The storage subsystem 906 may provide a computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of the present invention. Software (programs, code modules, instructions) that, when executed by one or more processors 902, may provide the functionality of the present invention, may be stored in storage subsystem 906. The storage subsystem 906 may also provide a repository for storing data used in accordance with the present invention. The storage subsystem 906 may comprise memory subsystem 908 and file/disk storage subsystem 910. The storage subsystem may include database storage for the lexicon database, file storage for results files, and/or other storage functionality.”)
(Kurzweil, Figure 1 shows the hardware structure including the “storage 16” which stores the operative software and the “processor 14” which executes the software.)
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.   Additionally, see Oudeyer, Figure 7 and [0102].
(Butler, Figure 9 and [0072] and [0077].)
(Kurzweil, Figure 1.)

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Oudeyer and Butler in view of Vanstone (U.S. 20090022309).
Regarding Claim 2, Oudeyer is directed to meaningless sentences, and if one sentence is generated, more can be generated by the same method.
Butler does not teach generating sentences.
Vanstone expressly teaches: 
2. The method of claim 1, wherein constructing the output text product comprises: 
constructing multiple sentences each consisting of one or more pseudo-words selected from the dictionary. [Vanstone teaches forming a paragraph of several sentences:  “…. The text representations can be further transformed to a paragraph in an apparently grammatically correct form.”  Abstract.  Figure 4, “[0070] A sentence is constructed at step 460. This construction step is more than simply aggregating together all components determined in the above steps. The sentence is constructed grammatically. For example, if a question sentence is to be constructed, an appropriate auxiliary verb is first determined and then placed at the beginning of the sentence….”  “[0071] After a sentence is constructed, if there are still more bits remaining in the bit stream (step 470), the process returns to step 410 to construct the next sentence. If all bits have been consumed, the sentence constructed will be the last sentence and all sentences constructed will be sent to an output (step 480). ….”]
Oudeyer, Butler, and Vanstone pertain to language analysis and generation, and it would have been obvious to combine Vanstone's generation of a dictionary of pseudowords and its generation of entire paragraphs made up of pseudowords with the system of the combination, because Vanstone provides an express teaching of generating multiple sentences.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Vanstone teaches the generation of a vocabulary of pseudowords from a text including real words and then generating a grammatically correct text that is made up of pseudowords.  Note the mapping to portions of Claim 1:
obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates; [Vanstone, Figure 1, “select collection of words {W} 110.”  “[0051] This process 100 is further illustrated in FIG. 1. First, at step 110, a collection of words is selected. As mentioned before, the collection of words may be a database of valid English words. As will be understood, other databases of words or groupings of characters can also be used…”]
generating a dictionary of pseudo-words having the syllable-count-per-word rates, each pseudo-word consisting of one syllable or concatenated syllables selected from the input text, [Vanstone teaches generating a lexicon/vocabulary/dictionary of pseudowords:  “… The text representation is formed from words selected from a vocabulary, which may include a collection of pseudowords.….”  Abstract.  Figure 1 shows the process of generating pseudowords which ends with “output pseudoword 180.”  The pseudowords are made from “segments” of the “initial words” and are thus a concatenation of segments “selected from the input text.”  Vanstone does not teach that the “segments” are selected to be “syllables.”  However, it teaches:  “[0052] … Of course, although four-letter segments are used in this example, other groupings, for example, pairs of letters or three-letter segments, can also be used….”]
wherein substantially all of the pseudo-words are not actual words; [Vanstone, Figure 1. The process checks to make sure that the generated word is not an actual word:  “[0052] …Optionally, the generated pseudoword can be compared with words in the collection at step 180 to determine if it is an exact match of one existing word. If so, the generated word is rejected, and the process returns to step 120 to form a word that is not one of the "valid" words.”]
constructing an output text product having the grammatical profile, the output text product comprising at least one sentence consisting of one or more pseudo-words selected from the dictionary; and [Vanstone, Figure 4, shows that the pseudowords are put together to generate what appears to be a grammatically correct sentence.  “Construct sentence 460” and [0070].  “[0066] FIG. 4 illustrates in detail the steps of an example adaptive process 400 for converting a bit stream to a text that is grammatically correct, but does not necessarily convey any meaning….”  The reason is that these grammatically correct sentences are easier to encode:  “[0064] Another approach to rendering a cryptographic value legible, the so-called "Grammatical Paragraph Method", is to make grammatically correct text. … Sentences that do not make semantic sense may well be usable as a user interface to cryptographic values. Because non-semantic sentences have less redundancy they can offer smaller representations.”]

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Oudeyer and Butler and Vanstone in view of Kurzweil (U.S. 20110288861).

Claims 3 and 13-18 pertain to Figure 5 of the instant Application:

    [Figure 5 of the instant Application (image media_image3.png not reproduced)]

Regarding Claim 3, Oudeyer is not specific on the particulars of speech synthesis.  Butler and Vanstone are not directed to speech synthesis.
Kurzweil teaches:
3. The method of claim 2, wherein constructing multiple sentences further comprises: 
associating each of the multiple sentences with an intended one of a plurality of different speakers. [Kurzweil, Figures 3 and 4.  User can select a portion of the text (104) and select a character/speaker (such as Homer Simpson or Charlie Brown) to narrate that highlighted portion of the text.  See Figure 7 for assigning each portion of the text to a separate Speaker including Henry and Sally:  “[0062] Referring to FIG. 7, a portion of an exemplary document rendered on a user display 171 that includes text based tags is shown. Here, the actors names are written inside square braces (using a technique that is common in theatrical play scripts). Each line of text has a character name associated with the text….”]
Oudeyer/Butler/Vanstone and Kurzweil pertain to speech synthesis and it would have been obvious to combine the elaborate speech synthesis teachings of Kurzweil with the system of combination to provide facets and features for the synthesis step as a tandem combination or as substitution of the elaborate scheme of Kurzweil for the simple synthesis of the combination.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
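For illustration only: the bracketed character tags quoted from Kurzweil [0062] lend themselves to a simple association of each line with a speaker. The sketch below is an assumption about how such tags could be parsed; the function name `assign_speakers` and the regular expression are hypothetical and are not taken from Kurzweil.

```python
import re

# Assumed tag format: a character name in square brackets at the start of a line.
TAG = re.compile(r"^\[([^\]]+)\]\s*(.*)$")

def assign_speakers(lines):
    """Hypothetical helper: return (speaker, text) pairs, with speaker=None
    for any line that carries no leading bracketed tag."""
    result = []
    for line in lines:
        stripped = line.strip()
        match = TAG.match(stripped)
        if match:
            result.append((match.group(1), match.group(2)))
        else:
            result.append((None, stripped))
    return result

# The two example lines quoted from Kurzweil, Figure 7.
script = ["[Henry] Hi Sally!", "[Sally] Hi Henry, how are you?"]
pairs = assign_speakers(script)
```

Each resulting pair could then be handed to whatever voice model is associated with the named speaker; that downstream step is outside this sketch.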

Claims 11 and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Oudeyer and Butler in view of Kurzweil.
Regarding Claim 11, Oudeyer and Butler do not specify that input text is a film script.
Kurzweil teaches:
11. The method of claim 1, wherein the input text comprises at least one film/television script. [Kurzweil, Figure 7 is an example of the input text being a script that includes a conversation between Henry and Sally.  “[0040] Referring to FIG. 2, text 50 is rendered on a user display 51. As shown, the text 50 includes only words and does not include images. However, in some examples, the text could include portions that are composed of images and portions that are composed of words. The text 50 is a technical paper, namely, "The Nature and Origin of Instructional Objects." Exemplary texts include but not limited to electronic versions of books, word processor documents, PDF files, electronic versions of newspapers, magazines, fliers, pamphlets, menus, scripts, plays, and the like. The system 10 can read the text using one or more stored voice models. In some examples, the system 10 reads different portions of the text 50 using different voice models. For example, if the text includes multiple characters, a listener may find listening to the text more engaging if different voices are used for each of the characters in the text rather than using a single voice for the entire narration of the text. In another example, extremely important or key points could be emphasized by using a different voice model to recite those portions of the text.” “[0062] Referring to FIG. 7, a portion of an exemplary document rendered on a user display 171 that includes text based tags is shown. Here, the actors names are written inside square braces (using a technique that is common in theatrical play scripts). Each line of text has a character name associated with the text. The character name is set out from the text of the story or document with a set of brackets or other computer recognizable indicator such as the pound key, an asterisks, parenthesis, a percent sign, etc. For example, the first line 172 shown in document 170 includes the text "[Henry] Hi Sally!" 
and the second line 174 includes the text "[Sally] Hi Henry, how are you?" Henry and Sally are both characters in the story and character models can be generated to associate a voice model, volume, reading speed, etc. with the character, for example, using the methods described herein….”  “[0107] In some additional examples, in addition to associating voice models to read various portions of the text, a user can additionally associate sound effects with different portions of the text. For example, a user can select a particular place within the text at which a sound effect should occur and/or can select a portion of the text during which a particular sound effect such as music should be played. For example, if a script indicates that eerie music plays, a user can select those portions of the text and associate a music file (e.g., a wave file) of eerie music with the text. When the system reads the story, in addition to reading the text using an associated voice model (based on voice model highlighting), the system also plays the eerie music (based on the sound effect highlighting).”]
Oudeyer/Butler and Kurzweil pertain to speech synthesis and it would have been obvious to combine the teaching of Kurzweil that the input text is a film script with the system of combination to provide a specific type of input text.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 13, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
13. The method of claim 1, further comprising: 
causing user-selectable prosody to manifest in the speech audio. [Kurzweil permits the user to select the character/voice prosody for each portion of the text that is to be narrated.  Figures 3, 4, and 7.  “[0049] Referring to FIG. 4 a process 100 for selecting different characters or voice models to be used when the system 10 reads a text is shown. The system 10 displays 102 the text on a user interface. In response to a user selection, the system 10 receives 104 a selection of a portion of the text and displays 106 a menu of available characters each associated with a particular voice model. In response to a user selecting a particular character (e.g., by clicking on the character from the menu), the system receives 108 the user selected character and associates the selected portion of the text with the voice model for the character…..”]
Oudeyer/Butler and Kurzweil pertain to speech synthesis and it would have been obvious to combine the elaborate speech synthesis teachings of Kurzweil with the system of combination to provide facets and features for the synthesis step as a tandem combination or as substitution of the elaborate scheme of Kurzweil for the simple synthesis of the combination.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 14, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
14. The method of claim 13, wherein the user-selectable prosody is one or more of the characteristics selected from the group consisting of: energy, complexity, muting, volume, injection, speak over, grit, pitch, intonation, rhythm, and tempo. [Kurzweil, Figure 5, 146, 149.  “[0054] …The edit cast member window 136 also includes a portion 145 for selecting the color or type of visual indicia to be applied to the text selected by a user to be read using the particular character. The edit cast member window 136 also includes a portion 149 for selecting a volume for the narration by the character.”  “[0055] As shown in FIG. 5, a sliding scale is presented and a user moves a slider on the sliding scale to indicate a relative increase or decrease in the volume of the narration by the corresponding character. In some additional examples, a drop down menu can include various volume options such as very soft, soft, normal, loud, very loud. The edit cast member window 136 also includes a portion 146 for selecting a reading speed for the character. The reading speed provides an average number of words per minute that the computer system will read at when the text is associated with the character….”]
Rationale for combination as provided for Claim 13.

Regarding Claim 15, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
15. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
associating at least one selected prosody characteristic with one or more of the pseudo-words. [Kurzweil, Figures 3 and 4. Kurzweil permits the user to select a portion of text (104) to be narrated and associate a voice (particular prosody) with it (106, 108).  The text can be any text and while the text in Kurzweil is not shown to be made up of pseudo-words, the type of text does not impact the principles of operation of Kurzweil.]
Rationale for combination as provided for Claim 13.

Regarding Claim 16, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
16. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
during the synthesizing, providing at least one selected prosody characteristic and the output text product to a text-to-speech process. [Kurzweil teaches that its narration may be produced by various methods including speech synthesis or Text-to-Speech process:  “[0038] Text is narrated by the narration software 30 using several possible technologies: text-to-speech (TTS); audio recording of speech; and possibly in combination with speech, audio recordings of music (e.g., background music) and sound effects (e.g., brief sounds such as gunshots, door slamming, tea kettle boiling, etc.). The narration software 30 controls generation of speech, by controlling a particular computer voice (or audio recording) stored on the computer 12, causing that voice to be rendered through the computer's speakers 22. Narration software often uses a text-to-speech (TTS) voice which artificially synthesizes a voice by converting normal language text into speech….”  “[0069] In some embodiments, the voice models associated with the characters can be electronic Text-To-Speech (TTS) voice models. TTS voices artificially produce a voice by converting normal text into speech. In some examples, the TTS voice models are customized based on a human voice to emulate a particular voice….”  “[0099] As previously described, a portion of text is marked to be read aloud either by a particular TTS voice (e.g., a default voice or user selected voice) or by playing an audio recording. A single document can contain one portion of text marked for TTS and another portion marked for an audio recording. When the system begins reading a portion of text, and the portion is marked for TTS, the system presents the text to be read to the TTS engine, which will produce speech from the text….”]
Rationale for combination as provided for Claim 13.

Regarding Claim 17, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
17. The method of claim 13, wherein causing user-selectable prosody to manifest in the speech audio comprises: 
after the synthesizing, modifying the speech audio in accordance with at least one selected prosody characteristic. [Kurzweil permits the user to select the character/voice prosody for each portion of the text that is to be narrated and also teaches that after the selection of a character voice for reading/synthesizing a portion of the text, the user may change the volume or speed of narration.  Figure 5, 145, 146, “[0055] As shown in FIG. 5, a sliding scale is presented and a user moves a slider on the sliding scale to indicate a relative increase or decrease in the volume of the narration by the corresponding character. In some additional examples, a drop down menu can include various volume options such as very soft, soft, normal, loud, very loud. The edit cast member window 136 also includes a portion 146 for selecting a reading speed for the character. The reading speed provides an average number of words per minute that the computer system will read at when the text is associated with the character. As such, the portion for selecting the reading speed modifies the speed at which the character reads. The edit cast member window 136 also includes a portion 138 for associating an image with the character. This image can be presented to the user when the user selects a portion of the text to associate with a character (e.g., as shown in FIG. 3). The edit cast member window 136 can also include an input for selecting the gender of the character (e.g., as shown in block 140) and an input for selecting the age of the character (e.g., as shown in block 142). Other attributes of the voice model can be modified in a similar manner.”  ]
Rationale for combination as provided for Claim 13.

Regarding Claim 18, Oudeyer and Butler are not specific on the particulars of speech synthesis.
Kurzweil teaches:
18. The method of claim 13, further comprising: 
generating and displaying user interface elements enabling a user to select the user-selectable prosody. [Kurzweil permits the user to select the character/voice prosody for each portion of the text that is to be narrated.  Figures 3, 4, and 7.  “Display menu of available characters for narration voices 106” and “Receive user selection of one of the characters 108.”]
Rationale for combination as provided for Claim 13.

Claims 4-5, 7-10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Oudeyer and Butler (U.S. 20170154546) in view of Kucera (U.S. 4773009).

Butler counts the syllables specified by a user.  Kucera has a general syllable counter that counts the syllables of words of an input text in order to determine the readability of the text.  (Not too many big words.)

Regarding Claim 4, Oudeyer and Butler do not teach the details of determining pseudowords in the manner of Claim 4.
Kucera teaches and therefore suggests:
4. The method of claim 1, wherein the generating comprises: [Kucera, Figure 6, looks up the word in a lexicon/list of words and, if it is not found (OOV term), checks it against an “exception list”; if the word is found in neither list, Kucera checks it to identify abbreviations, acronyms, and proper nouns.  Either the “exception list” words or the “other words” not found in any list can be considered pseudowords.  The process that Kucera uses for separating the different words teaches the process that is claimed, albeit for a different purpose.]
generating a potential pseudo-word using one or more syllables selected from the input text; [Kucera does not pertain to pseudowords but finds abbreviations and words that are not regular words which could suggest the “pseudoword” of the Claim.  Figure 6, “… Each word is stripped of punctuation in stage 170 and is looked up in the Dale-Chall list in a look-up operation 171. If the word is in the Dale-Chall list its syllable count is returned and a counter 172 tallies the syllable count data. If not, a check 173 determines if the word is in the exception list….”  Col. 10, lines 8-18.]
comparing the potential pseudo-word with actual words in a set of actual words; [Kucera, Figure 6, “lookup operation 171.”  The DC list is defined in col. 9, line 42 to col. 10, line 7 as a list of common words:  “The table is a modified Dale-Chall list of the words which have been determined on a statistical basis to be the most frequently used words in the English language. Several modifications to the table have been made…..”]
in the event that the potential pseudo-word is determined to substantially match one or more of the actual words, discarding the potential pseudo-word; and [Kucera, Figure 6, this is when the word is a listed word: “… Each word is stripped of punctuation in stage 170 and is looked up in the Dale-Chall list in a look-up operation 171. If the word is in the Dale-Chall list its syllable count is returned and a counter 172 tallies the syllable count data. ….”  Col. 10, lines 8-18.]
otherwise: adding the potential pseudo-word to the dictionary. [Kucera, the “exception list” of Kucera is a list of non-dictionary words and Kucera also includes “Other words” which are words that are not real words and not in the exceptions list either:  “… If not, a check 173 determines if the word is in the exception list. If so, the exception syllable count is returned and tallied….”  Col. 10, lines 8-18.  “For the remaining words, a syllable counter 176 performs an ordered sequence of substitutions 178 of certain letters and symbols for the occurrence of other letters and symbols, when occurring in a particular context in the text, to produce modified text words….”  Col. 10, lines 31-40.  “First, however, words not found in the Dale-Chall or exception lists are checked at stage 174 to identify abbreviations, acronyms, and proper nouns for special processing.”  Col. 10, lines 41-44.]
Oudeyer/Butler and Kucera pertain to lexical analysis and it would have been obvious to combine the more detailed process of Kucera that loops through different lists to check the words for being an OOV or abbreviation, etc. with the system of combination that does not provide how it checks to confirm that the generated word is not a dictionary word.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
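For illustration only: the generate, compare, discard-or-add sequence recited in Claim 4 can be sketched as a short loop. This is not the implementation of Kucera, Vanstone, or the instant Application. The function name `build_pseudo_dictionary`, its parameters, and the use of an exact, case-insensitive comparison as a stand-in for “substantially match” are all assumptions of this sketch.

```python
import random

def build_pseudo_dictionary(syllables, actual_words, size,
                            max_syllables=3, seed=0, max_attempts=10_000):
    """Hypothetical helper: build up to `size` pseudo-words by concatenating
    syllables, rejecting any candidate that equals an actual word."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    actual = {w.lower() for w in actual_words}
    dictionary = set()
    attempts = 0
    while len(dictionary) < size and attempts < max_attempts:
        attempts += 1
        # Generate a potential pseudo-word from one or more syllables.
        n = rng.randint(1, max_syllables)
        candidate = "".join(rng.choice(syllables) for _ in range(n))
        # Compare with the set of actual words; discard on a match.
        # (Exact match is an assumed simplification of "substantially match".)
        if candidate.lower() in actual:
            continue
        # Otherwise add the candidate to the dictionary.
        dictionary.add(candidate)
    return sorted(dictionary)

syllables = ["in", "fi", "ni", "ty", "ner"]
actual_words = ["infinity", "inner", "in"]
pseudo = build_pseudo_dictionary(syllables, actual_words, size=5)
```

The attempt cap keeps the loop from running indefinitely when the syllable inventory is too small to yield the requested number of non-words.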

Regarding Claim 5, Butler as applied to Claim 1 determines the types of syllables.  Oudeyer and Butler do not teach determining the structural profile.
Kucera teaches:
5. The method of claim 1, wherein the grammatical profile comprises: 
a syllabic profile including the syllables, the syllable-occurrence rates and the syllable-count-per-word rates; and [Kucera, “…A preferred embodiment includes a readability analyzer having a syllable counter for determining the number of syllables in each word….In a preferred embodiment, tallies are kept of words per sentence, syllable count, sentences per paragraph, and similar data, and readability scores based on the tallies are displayed.”  Abstract.  Figure 2, “syllable counter 64.”] 
a structural profile defining the input text of actual words as a function of at least one of: word-count-per-sentence rates, sentence type occurrence rates, number of words per sentence, frequency of interjections, and distribution of word length by speaker.  [Kucera, “An electronic text analyzer operates on an ordered block of digitally coded text by analyzing sequential strings thereof to determine paragraph and sentence boundaries. Each string is broken down into component words. Possible abbreviations are identified and checked against a table of common abbreviations to identify abbreviations which cannot end a sentence. End punctuation and the following string are analyzed to identify the terminal word of a sentence. When sentence boundaries have been determined, the test may be further processed by a grammar checker, a readability analyzer, or other higher-level text processing system. ….”  Abstract.   “5. A system according to claim 4, further including word table means, for storing a Dale-Chall table of common words together with an indication of the syllable count of a word, and wherein the counts of basic text units include at least one of word length in syllables and occurrences of familiar words in the text.”]
Rationale for combination as provided for Claim 4.  Kucera is similar to Oudeyer/Butler (particularly to Butler) but provides additional details more expressly.

Regarding Claim 7, Oudeyer and Butler both teach syllable count determination.
Kucera more expressly teaches:
7. The method of claim 1, wherein the syllable-count-per-word rates include at least: a frequency of monosyllabic words occurring in the input text, a frequency of bisyllabic words occurring in the input text, a frequency of trisyllabic words occurring in the input text, and a frequency of quadrisyllabic words occurring in the input text. [Kucera, Figure 2, “syllable counter 64,” which counts the syllables of each word, and “readability processor 68,” which determines how many high-syllable-count words exist in the text, provide data from which the frequency of n-syllable words can be determined, whatever n may be.  See Figure 7, the various Readability Measures.  “… The readability is a measure of the style difficulty of text, and has been empirically associated in the literature with a great number of different readability formulae. Among the known formulae for readability in the literature are the FOG index which measures the proportion of words having greater than three syllables; several indices related to the proportion of words occurring on the Dale-Chall list of common English words; and several formulae based on functions of both sentence length in words, and word length in syllables or characters.”  Col. 1, line 56 to Col. 2, line 2.]
Rationale for combination as provided for Claim 5.  Kucera is particularly similar to Butler but provides additional details more expressly.

Regarding Claim 8, Oudeyer and Butler teach syllable count.  Oudeyer gives a Maxsyl parameter to its system and Butler lets the user determine the number of syllables being counted in the words.
Kucera more expressly teaches:
8. The method of claim 1, wherein the syllable-count-per-word rates comprise: for each of n=1 to x: a frequency of n-syllable words occurring in the input text; wherein x>1. [Kucera, see rejection of Claim 7.  Kucera determines the number (n) of syllables in the words of the text, whatever n may be.  The term “rate” refers to the number of n-syllabic words that occur in the text.  (See the definition from the instant Application.) ]
Rationale for combination as provided for Claim 5.

Regarding Claim 9, Oudeyer gives a Maxsyl parameter to its system and Butler lets the user determine the number of syllables being counted in the words.
Kucera more expressly teaches:
9. The method of claim 8, further comprising: 
establishing x as a number of syllables in the highest-syllable count word occurring in the input text. [Kucera, see rejection of Claim 7.  Kucera determines the number (n) of syllables in the words of the text, whatever n may be.  The term “rate” refers to the number of n-syllabic words that occur in the text.  (See the definition from the instant Application.) Obviously, and as Kucera does, if the number of syllables of every word is to be determined, no matter what n may be, then the upper bound x has to be set to the largest number of syllables per word in the text.]
Rationale for combination as provided for Claim 5.
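For illustration only, the "syllable-count-per-word rates" discussed for Claims 7-9 (a frequency of n-syllable words for each n from 1 to x, with x taken as the syllable count of the longest word in the text) can be sketched as below. This is a minimal sketch under stated assumptions: the vowel-run syllable heuristic and both function names are hypothetical and are not taken from Kucera, Oudeyer, Butler, or the instant Application, and the "rate" is computed here as a fraction of all words rather than as a raw count.

```python
import re
from collections import Counter


def count_syllables(word):
    """Assumed heuristic: one syllable per run of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def syllable_count_per_word_rates(text):
    """Return {n: fraction of words having n syllables} for n = 1..x."""
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(count_syllables(w) for w in words)
    # x is established as the syllable count of the highest-syllable-count word.
    x = max(counts) if counts else 0
    total = len(words)
    return {n: counts.get(n, 0) / total for n in range(1, x + 1)}
```

Under this heuristic, the text "The cat sat on a banana." has five one-syllable words and one three-syllable word, so x is 3 and the rate for n=2 is zero.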

Regarding Claim 10, Oudeyer does not address position of particular meaningless words.  
Butler teaches in Figure 5 that the user sets the location/position at which a particular pattern (syllable) should occur in the word, as well as the structure of input sentences:  “[0038] … Retaining such capitalization and/or punctuation information may be used to reproduce the sentence and/or paragraph structure of a source document where multiple query words are processed….”
Kucera teaches and suggests:
10. The method of claim 8, wherein the grammatical profile further defines average distributions of positions of n-syllable words within sentences in the input text. [Kucera, Figure 2, “Sentence Splitter 58” which takes input from the “Token Processor 56” provides the “token position” to the “central processor 124” where the “tokens” are words or punctuation.  “Token position” is a byte count position in the input buffer.  See Figure 5.  Because the information regarding the positions of words within the sentence and paragraph is determined, the “average distributions of positions” of the words can be derived.  Further, the syllable count of each word is also known by the operation of “syllable counter 64” of Figure 2.]
Oudeyer/Butler and Kucera pertain to lexical analysis, and it would have been obvious to combine the more detailed process of Kucera, which determines the positions of n-syllable words in the input sentence, with the system of the combination to provide more information that links the input to the generated output.  This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

(Instant Application:  “[0052] In this embodiment, the syllabic profile portion of the grammatical profile also includes a component establishing the average distributions of the positions of n-syllable words within sentences of the input text. This component is for codifying trends arising in the input text whereby certain length words--those with a particular number of syllables--are typically clustered more often at the beginning, middle or ends of sentences. For example, in a given input text for a given language, monosyllabic words may tend to arise more often at the beginning of sentences, whereas quadrasyllabic words in the same input text may tend to arise only in the middle of sentences. The average distributions of the positions of n-syllable words within sentences can be used to inform the downstream construction of sentences in the output text product using the dictionary.”)
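For illustration only, one way to derive "average distributions of positions of n-syllable words within sentences" from known word positions and known syllable counts is sketched below. This is a minimal sketch under stated assumptions: the relative-position measure (0.0 for the first word of a sentence, 1.0 for the last), the naive sentence split on end punctuation, the vowel-run syllable heuristic, and the function names are all hypothetical and are not drawn from Kucera or from the instant Application.

```python
import re
from collections import defaultdict


def count_syllables(word):
    """Assumed heuristic: one syllable per run of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def mean_relative_positions(text):
    """Return {n: mean relative position of n-syllable words within sentences}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Naive sentence split on end punctuation (assumption for the sketch).
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        if not words:
            continue
        last = max(1, len(words) - 1)  # avoids division by zero for one-word sentences
        for i, word in enumerate(words):
            n = count_syllables(word)
            sums[n] += i / last  # 0.0 = sentence start, 1.0 = sentence end
            counts[n] += 1
    return {n: sums[n] / counts[n] for n in sorted(counts)}
```

A mean near 0.0 for a given n would indicate that n-syllable words cluster at the beginning of sentences, and a mean near 1.0 would indicate clustering at the end.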

Regarding Claim 12, Oudeyer generates the meaningless sentences.  
Butler teaches: “[0038] In an embodiment, a lexicon entry includes a word, one or more phonetic pronunciations such as those described herein, the number of syllables of the word, the frequency count of the word (i.e., how common the word is in a representative corpus), which language the word may belong to, which sub-language (if any) the word may belong to, and other such information. The example lexicon entry 204 is for the English word "infinity." The word may be stored in a lower case written form ("infinity," in this example) that is stripped of all capitalization and punctuation. This normalized form of the word may be configured to facilitate easier searches for the word, so that, for example, a search for "infinity," "Infinity," or "INFINITY" (and/or other word forms) may yield the same search results, all based on a search for the normalized form. The capitalization and/or punctuation information may be retained so that the original input may be reproduced after processing. Retaining such capitalization and/or punctuation information may be used to reproduce the sentence and/or paragraph structure of a source document where multiple query words are processed.”
Kucera more expressly teaches:
12. The method of claim 5, further comprising: 
processing the input text to generate the syllabic profile and the structural profile. [Kucera, Figure 2, “syllable counter 64” determines the “syllabic profile” of the input text and “token processor 56” and “sentence splitter 58” determine the “structural profile.”  See also Figure 5 which determines the “positions” of the tokens (words or punctuation) in the buffer storing the text.  See also the Abstract referring to the steps of the analysis of the text.]
Rationale for combination as provided for Claim 5.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kucera teaches obtaining the syllable count of the words of a text in order to determine readability of the text:
obtaining a grammatical profile defining an input text of actual words as a function of at least syllable-occurrence rates and syllable-count-per-word rates; [Kucera teaches, Figure 2, a “syllable counter 64,” which determines the “syllable-count-per-word rate” of the words of an input text:   “An electronic text analyzer operates on an ordered block of digitally coded text by analyzing sequential strings thereof to determine paragraph and sentence boundaries. Each string is broken down into component words. Possible abbreviations are identified and checked against a table of common abbreviations to identify abbreviations which cannot end a sentence. End punctuation and the following string are analyzed to identify the terminal word of a sentence. When sentence boundaries have been determined, the test may be further processed by a grammar checker, a readability analyzer, or other higher-level text processing system. A preferred embodiment includes a readability analyzer having a syllable counter for determining the number of syllables in each word. The system includes a modified common-word table having an empirical syllable-count field. A checker first determines if a word is in the table and, if so, returns its syllable count. An exception table identifies words not conforming to a syllable counting algorithm. Each word not in the common-word or exception tables is modified, and the modified word is processed to derive its syllable count. In a preferred embodiment, tallies are kept of words per sentence, syllable count, sentences per paragraph, and similar data, and readability scores based on the tallies are displayed.”  Abstract.]

Kucera in Figure 6 includes operations 176 and 178 where a word that does not yield to the regular counting of syllables is modified in order to yield a syllable count.   “For the remaining words, a syllable counter 176 performs an ordered sequence of substitutions 178 of certain letters and symbols for the occurrence of other letters and symbols, when occurring in a particular context in the text, to produce modified text words. Finally, a counter 179 counts occurrences of the vowels and certain symbols in the substituted word to yield the syllable count of the original word. The syllable counter thus provides the syllable count data for readability analysis discussed further below.”  Col. 10, lines 31-40.  This is a type of generation of pseudowords, but it is not based on the syllable count of the words of the text.
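For illustration only, the ordering of operations described in the Kucera Abstract and at Col. 10, lines 31-40 (common-word table lookup, then exception table lookup, then an ordered sequence of substitutions, then a count of vowels in the modified word) can be sketched as below. This is a minimal sketch and not Kucera's actual tables or rules: every table entry, every substitution pattern, and the function name are hypothetical and chosen only to make the example run.

```python
import re

# Hypothetical common-word table with an empirical syllable-count field.
COMMON_WORDS = {"the": 1, "people": 2, "every": 2}
# Hypothetical exception table for words that defeat the counting algorithm.
EXCEPTIONS = {"business": 2, "poem": 2}
# Hypothetical ordered, context-dependent substitutions (applied in sequence).
SUBSTITUTIONS = [
    (r"([^aeiouy])le$", r"\1ul"),  # final consonant + "le" becomes a voiced syllable
    (r"([^aeiouy])e$", r"\1"),     # drop a silent final "e" after a consonant
    (r"[aeiouy]{2,}", "a"),        # collapse a run of vowels into a single vowel
]


def syllable_count(word):
    """Table lookups first; otherwise modify the word and count its vowels."""
    w = word.lower()
    if w in COMMON_WORDS:
        return COMMON_WORDS[w]
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    modified = w
    for pattern, replacement in SUBSTITUTIONS:
        modified = re.sub(pattern, replacement, modified)
    # Count vowel occurrences in the modified word, with a minimum of one.
    return max(1, len(re.findall(r"[aeiouy]", modified)))
```

The modified word (for example "tabul" from "table" under these assumed rules) is an intermediate string used only to obtain a count for the original word.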

Regarding Claim 1, Kurzweil teaches: 
1. A computer-implemented method for synthesizing speech audio [Kurzweil is directed to “narration of a text” by various methods including “speech synthesis.”  “[0003] Recent advances in computer technology and computer based speech synthesis have opened various possibilities for the artificial production of human speech. A computer system used for artificial production of human speech can be called a speech synthesizer. One type of speech synthesizer is text-to-speech (TTS) system which converts normal language text into speech.”]
…
synthesizing speech audio using the output text product. [Kurzweil, “[0038] Text is narrated by the narration software 30 using several possible technologies: text-to-speech (TTS); audio recording of speech; and possibly in combination with speech, audio recordings of music (e.g., background music) and sound effects (e.g., brief sounds such as gunshots, door slamming, tea kettle boiling, etc.). The narration software 30 controls generation of speech, by controlling a particular computer voice (or audio recording) stored on the computer 12, causing that voice to be rendered through the computer's speakers 22. Narration software often uses a text-to-speech (TTS) voice which artificially synthesizes a voice by converting normal language text into speech….”]
Kurzweil is a straightforward speech synthesis reference and synthesizes the text that is provided to it.
Kurzweil does not teach generating a text of pseudowords and then synthesizing that text into speech.

Chang (U.S. 2013/0166303) is directed to determining statistics of a script.  See Figure 2, where the user can query a script's scenes and obtain a ranked list of the scenes that best match the query, this list being determined from the statistics of the script.  “[0023] A script converter 110 is included to capture movie and/or television scripts (e.g., "Hollywood Movie" or "Television Spec" scripts). In some implementations, script elements are systematically extracted from scripts by the script converter 110 and converted into a structured format. This may allow script elements (e.g., scenes, shots, action, characters, dialog, parentheticals, camera transitions) to be accessible as metadata to other applications, such as those that provide indexing, searching, and organization of video by textual content. The script converter 110 may capture scripts from a wide variety of sources, for example, from professional screenwriters using word processing or script writing tools, from fan-transcribed scripts of film and television content, and from legacy script archives captured by optical character recognition (OCR).”  “[0007] … Performing the online search may comprise entering the first term in an online search engine, receiving a search result from the online search engine for the first term, computing statistics of word occurrences in the search results, and selecting the second term from the search result based on the statistics.”

Bhamidipati (U.S. 20110093270): 
[0024] Various techniques can be used for identifying syllables. Examples of the techniques include, but are not limited to, a technique described in a publication titled "Syllable detection in read and spontaneous speech" by Hartmut R. Pfitzinger, Susanne Burger, Sebastian Heid, of Institut fur Phonetik and Sprachliche Kommunikation, University of Munich, Germany; and in a publication titled "Syllable detection and segmentation using temporal flow neural networks" by Lokendra Shastri, Shuangyu Chang, Steven Greenberg of International Computer Science Institute, which are incorporated herein by reference in their entirety.
[0025] Sound of consonants and sound of vowels are also identified in the first syllable in the first audio and in the second syllable in the second audio. The sound of vowels and sound of consonants can be identified using various techniques, for example a technique described in a publication titled "Robust Acoustic-Based Syllable Detection" by Zhimin Xie, Partha Niyogi of Department of Computer Science University of Chicago, Chicago, Ill.; in a publication titled "Vowel landmark detection" by A W Howitt, submitted on 15 Jan. 1999 to Eurospeech 99, the 6th European Conference on Speech Communication and Technology, 5-10 Sep. 1999, Budapest, Hungary, organized by ESCA, the European Speech Communication Association; in a publication titled "Detection of speech landmarks: Use of temporal information" by Ariel Salomon, Carol Y. Espy-Wilson, and Om Deshmukh in The Journal of the Acoustical Society of America, 2004; and in a publication titled "Speech recognition based on phonetic features and acoustic landmarks" by Amit Juneja in Pages: 169 Year of Publication: 2004 ISBN: 0-496-13166-4, Order Number: AAI3152591, ACM, which are incorporated herein by reference in their entirety.

Marcus (U.S. 20100162879) is directed to automated generation of a song.  In Marcus, Figure 1, “Load Process 310” provides the “Input Text” of the Claim and “Determine Syllabic Profile (Segments 350)” determines the “syllable occurrence rate” and “syllable count per word” of the Claim.  The “Syllabic Profile” determines a syllable count for the words of the text.   “5 … syllabic profile is a number of syllables in at least one of the lyrical segments.”  “6. … syllabic profile is a number of letters in a particular word in at least one of the lyrical segments.”  “[0021] In block 350, the syllabic profile of the lyrical segments can be determined. Specifically, a syllable count can be determined for the text of one or more of the lyrical segments. Alternatively, a letter count can be determined for a word in one or more of the lyrical segments deemed important for learning the process….”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659