DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 10/30/2020.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: page 11, [0058], line 6: "<Tome walked" should read "<Tom walked".  
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: utterance detecting module for detecting, context information determining module for determining, role determining module for determining, role attribute determining module for determining, voice model selecting module for selecting, and voice generating module for generating in claim 14.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
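35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.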

Claims 1, 14, and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite “detecting at least a first utterance from the document; determining context information of the first utterance from the document; determining a first role corresponding to the first utterance from the context information of the first utterance; determining attributes of the first role; selecting a voice model corresponding to the first role based at least on the attributes of the first role; and generating voice corresponding to the first utterance through the voice model.” The claim elements under their broadest reasonable interpretation cover the concepts of analyzing a document, determining information from the document, selecting a voice model, and generating a voice. These elements are mental processes and can be performed in the human mind by a person reading a document, determining information from the document, deciding on voice characteristics that match the information, and speaking aloud in a voice corresponding to those characteristics (see MPEP § 2106.04(a)(2), subsection III).
This judicial exception is not integrated into a practical application because the claimed elements for performing these functions ("at least one processor; and a memory storing computer-executable instructions [claim 15]") amount to no more than generic computer parts. Therefore, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above with respect to integration of the abstract idea

	Claim 13 is directed to an abstract idea. The claim recites "obtaining the document; detecting at least one utterance and at least one descriptive part from the document; for each utterance in the at least one utterance: determining a role corresponding to the utterance, and generating voice corresponding to the utterance through a voice model corresponding to the role; and generating voice corresponding to the at least one descriptive part." The claim elements under their broadest reasonable interpretation cover the concepts of obtaining a document, analyzing the document, determining information from the document, and generating a voice. These elements are mental processes and can be performed in the human mind by a person reading a document, determining information from the document, deciding on voice characteristics that match the information, and speaking aloud in a voice corresponding to those characteristics (see MPEP § 2106.04(a)(2), subsection III).
However, this judicial exception is integrated into a practical application. The claim also recites “providing the audio file based on voice corresponding to the at least one utterance and the voice corresponding to the at least one descriptive part.” This element is meaningful because it limits the use of the abstract idea to the practical application of providing a specific audio file. Therefore, the claim is patent eligible.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 14 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. Claim limitations 
“an utterance detecting module, for detecting at least a first utterance from the document; 
a context information determining module, for determining context information of the first utterance from the document; 
a role determining module, for determining a first role corresponding to the first utterance from the context information of the first utterance; 
a role attribute determining module, for determining attributes of the first role; 
a voice model selecting module, for selecting a voice model corresponding to the first role based at least on the attributes of the first role; and 
a voice generating module, for generating voice corresponding to the first utterance through the voice model” invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed functions and to clearly link the structure, material, or acts to the functions. The specification is devoid of adequate structure to perform the claimed functions. In particular, the specification provides hardware in the form of a general purpose computer comprising a processor and a memory but does not also provide any particular algorithm for performing the claimed functions. Mere reference to a general purpose computer with appropriate programming without providing an explanation of the appropriate programming is not adequate disclosure of the corresponding structure to satisfy the requirements of 35 U.S.C. 112(b)  or pre-AIA  35 U.S.C. 112, second paragraph. The specification does not provide sufficient details such that one of ordinary skill in the art would understand the written 
Applicant may:
(a)        Amend the claim so that the claim limitations will no longer be interpreted as limitations under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed functions, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the functions recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the functions so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed functions, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed functions and clearly links or associates the structure, material, or acts to the claimed functions, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed functions. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

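The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.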

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 14 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventors, at the time the application was filed, had possession of the claimed invention. As described above, the disclosure does not provide adequate structure to perform the claimed functions of: 
an utterance detecting module, for detecting at least a first utterance from the document; 
a context information determining module, for determining context information of the first utterance from the document; 
a role determining module, for determining a first role corresponding to the first utterance from the context information of the first utterance; 
a role attribute determining module, for determining attributes of the first role; 
a voice model selecting module, for selecting a voice model corresponding to the first role based at least on the attributes of the first role; and 
a voice generating module, for generating voice corresponding to the first utterance through the voice model. The specification does not demonstrate that applicant has made an invention that achieves the claimed functions because the invention is not described with sufficient detail such that one of 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4, 8, 10, 13, and 14-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Skuratovsky (Pat. No. US 8,326,629 B2).
Regarding claim 1, Skuratovsky teaches a method for generating audio for a plain text document (Spec. Col 1, lines 55-57), comprising: 
detecting at least a first utterance from the document (Spec. Col 4, line 39; the spoken passage, i.e. a first utterance, is “Hi Mary”); 
determining context information of the first utterance from the document (Spec. Col 4, lines 39-40; the sample text phrase “’Hi Mary’, Tom said. ‘How was your day?’” includes context information of the first utterance related to the speaker); 
determining a first role corresponding to the first utterance from the context information of the first utterance (Spec. Col 1, lines 57-60; the method determines speaker identity, i.e. a first role, corresponding to the spoken passage, i.e. the first utterance); 
determining attributes of the first role (Spec. Col 1, lines 57-61; the method determines speaker gender, i.e. an attribute); 

generating voice corresponding to the first utterance through the voice model (Spec. Col 1, line 67- Col 2, line 7; the method converts the utterance to speech, i.e. generating voice, by applying the voice configuration, i.e. the voice model).

Regarding claim 2, Skuratovsky further teaches wherein the context information of the first utterance comprises at least one of: 
the first utterance (Spec. Col 4, line 39; the spoken passage, i.e. first utterance, is “Hi Mary”); 
a first descriptive part in a first sentence including the first utterance (Spec. Col 4, line 39; the phrase “Tom said” is the first descriptive part, i.e. part of sentence which is not the utterance); and 
at least a second sentence adjacent to the first sentence including the first utterance (Spec. Col 4, line 40; “How was your day?”).

Regarding claim 3, Skuratovsky further teaches wherein the determining the first role corresponding to the first utterance (Spec. Col 4, lines 34-37; speaker identity is speaker role and spoken passages are utterances) comprises: 
performing natural language understanding on the context information of the first utterance (Spec. Col 4, lines 34-37; semantic interpretation is natural language understanding) to obtain at least one feature of the following features: part-of-speech of words in the context information (Spec. Col 4, lines 46-47; “Tom” is labelled a proper name, i.e. a part of speech), results of syntactic parsing on the context information (Spec. Col 4, lines 44-46; syntactic parsing on the input phrase “’Hi Mary’, Tom said. 
identifying the first role based on the at least one feature (Spec. Col 4, line 62-Col 5, line 1; the features discussed above are used for determining the speaker identity, i.e. first role, for the utterance as PROPER_NAME which corresponds to Tom).

Regarding claim 4, Skuratovsky further teaches wherein the determining the first role corresponding to the first utterance comprises: 
performing natural language understanding on the context information of the first utterance (Spec. Col 4, lines 34-37; semantic interpretation is natural language understanding) to obtain at least one feature of the following features: part-of-speech of words in the context information (Spec. Col 4, lines 46-47; “Tom” is labelled a proper name, i.e. a part of speech), results of syntactic parsing on the context information (Spec. Col 4, lines 44-46; syntactic parsing on the input phrase “’Hi Mary’, Tom said. ‘How was your day?’” [determined to be context information as described above with respect to claim 2] produces the resulting sentence structure “SPOKEN_PASSAGE COMMA PROPER_NAME SPEAKING_REF PERIOD SPOKEN_PASSAGE PERIOD”), and results of semantic understanding on the context information (Spec. Col 4, lines 47-48; semantic understanding of “said” produces the result of the meaning “speaking”);  
providing the at least one feature to a role classification model (Spec. Col 5, lines 3-5; the statistical model, which uses the features discussed above to determine speaker identity is considered to be a role classification model); and 


Regarding claim 8, Skuratovsky further teaches 
detecting at least a second utterance from the document (Fig. 1 element 110 shows a first and second utterance, underlined, from the document, the second utterance being “’So, as I was saying….’”); 
determining context information of the second utterance from the document (Fig. 1 element 110 shows context information in the form of the utterance [“’So, as I was saying….’”], a descriptive part in the sentence including the utterance [“Tom continued,”], and a second sentence adjacent to the sentence including the utterance [“’How are you?’, replied Mary.”], similar to the context information of claim 2); 
determining a second role corresponding to the second utterance from the context information of the second utterance (Spec. Col 4, lines 10-12; each spoken passage, i.e. utterance, is associated with a speaker identity, i.e. role, based on the text source, i.e. context information. “Tom” and “Tom Smith” are identified as first and second roles); 
determining that the second role corresponds to the first role (Fig. 1 element 115 shows “Tom” and “Tom Smith” in the same cell of the graph, indicating they correspond to one another); and 
performing co-reference resolution on the first role and the second role (Spec. Col 4, lines 14-16; the speakers “Tom” and “Tom Smith,” i.e. the first and second roles, are attributed to the same character).

Regarding claim 10, Skuratovsky further teaches wherein the generating the voice corresponding to the first utterance (Spec. Col 5, lines 58-61) comprises: 

generating the voice corresponding to the first utterance through applying the at least one voice parameter to the voice model (Spec. Col 6, lines 56–58; speech is generated using the voice configuration, i.e. voice model; Spec. Col 5, lines 48-51: the voice configurations are applied according to the attributes, i.e. parameters).

Regarding claim 13, Skuratovsky teaches a method for providing an audio file based on a plain text document (Spec. Col 1, lines 55-57; a method is provided for generating the audio, Spec. Col 3, lines 9-14; digital recording, i.e. an audio file, is made from the audio), comprising: 
obtaining the document (Spec. Col 5, line 67– Col 6, line 4; the text source being loaded into the text processing system is considered to be the system obtaining the document); 
detecting at least one utterance and at least one descriptive part from the document (Spec. Col 1, lines 55-57; spoken passages correspond to utterances and non-spoken passages correspond to descriptive parts of the document); 
for each utterance in the at least one utterance: 

generating voice corresponding to the utterance through a voice model corresponding to the role (Spec. Col 1, line 67- Col 2, line 7; the voice configuration is considered to be the voice model); and 
generating voice corresponding to the at least one descriptive part (Spec. Col 1, line 67- Col 2, line 7; the system also generates voice for the non-spoken passages, i.e. the descriptive parts); and 
providing the audio file based on voice corresponding to the at least one utterance and the voice corresponding to the at least one descriptive part (Spec. Col 3, lines 9-14; the invention makes a digital recording, i.e. an audio file, with the embodiments of the patent, i.e. the speech including the at least one utterance, i.e. spoken passage, and at least one descriptive part, i.e. non-spoken passages described above).

Regarding claim 14, the claim is directed to an apparatus for generating audio for a plain text document, comprising: 
an utterance detecting module, for detecting at least a first utterance from the document; 
a context information determining module, for determining context information of the first utterance from the document; 
a role determining module, for determining a first role corresponding to the first utterance from the context information of the first utterance; 
a role attribute determining module, for determining attributes of the first role; 
a voice model selecting module, for selecting a voice model corresponding to the first role based at least on the attributes of the first role; and 


Regarding claim 15, the claim is directed to an apparatus for generating audio for a plain text document (Spec. Col 1, lines 55-57), comprising: 
at least one processor (Spec. Col 2, lines 8-10: a computer system, which includes a processor); and 
a memory storing computer-executable instructions (Spec. Col 2, lines 26-30) that, when executed, cause the processor to perform the features presented in the claimed method of claim 1. Skuratovsky teaches an apparatus comprising these elements for performing the method of claim 1, therefore claim 15 is rejected under the same grounds. 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the 
Claims 5-7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Skuratovsky in view of Lester (Pat. No. US 8,972,265 B1), which incorporates Story et al. (Pat. No. US 8,606,775 B2), hereinafter Story.

Regarding claim 5, Skuratovsky teaches all of the elements of the current invention as stated above, except 
determining at least one candidate role from the document, 
wherein the determining the first role corresponding to the first utterance comprises: selecting the first role from the at least one candidate role.
Lester teaches a content customization service which renders audio versions of textual content documents such as audiobooks (Spec. Col 1, lines 24-28) in multiple voices (Spec. Col 1, lines 62-64).  Lester further teaches 
determining at least one candidate role from the document (Spec. Col 2, lines 18-24; named entity extraction comprises determining characters, etc., i.e. candidate roles, from the content), 
wherein the determining the first role corresponding to the first utterance comprises: selecting the first role from the at least one candidate role (Spec. Col 2, lines 24-26; mapping lines of dialog, i.e. utterances, to a character from the content is considered to be selecting a first role, i.e. the character, from the candidate roles, the characters determined from the text by named entity extraction).


Regarding claim 6, the combination of Skuratovsky and Lester teaches all of the elements of the current invention as stated above with regard to claim 5. The combination further teaches wherein 
the at least one candidate role is determined based on at least one of: a candidate role classification model (Spec. Col 5, lines 3-5; the statistical model, which uses the features discussed above to determine speaker identity is considered to be a role classification model), predetermined language patterns (Spec. Col 4, line 58 – Col 5 line 1; the statistical model uses the rules, i.e. predetermined language patterns, to determine the speaker, i.e. the role), and a sequence labeling model (Spec. Col 4 lines 34-37: the semantic interpreter and the statistical model combined is considered to be the 
the candidate role classification model adopts at least one feature of the following features: word frequency (Story, incorporated into Lester and providing an exemplary system and method for determining speaker presence in content, which is used to maintain the stack of candidate roles [Lester – Spec. Col 13, lines 35-50, Col 14, lines 13-22] - Spec. Col 5, lines 6-11; the user may view a graphical representation of the frequency of a character name referenced in text), boundary entropy (Spec. Col 4, lines 38 – 47; the statistical model uses boundary entropy to determine the ends and beginnings of the parts-of-speech to determine the speaker, i.e. the role), and part-of-speech (Spec. Col 4, lines 38 – 47; the statistical model uses the parts-of-speech to determine the speaker, i.e. the role), 
the predetermined language patterns comprise combinations of part-of-speech and/or punctuation (Spec. Col 4, line 58 – Col 5 line 1; the statistical model uses the rules, i.e. predetermined language patterns, involving punctuation and parts-of-speech to determine the speaker, i.e. the role), and 
the sequence labeling model adopts at least one feature of the following features: key word, a combination of part-of-speech of words, and probability distribution of sequence elements (Spec. Col 4, lines 46-48: the semantic interpreter identifies the keyword “said” in determining a speaker, Spec. Col 4, line 58 – Col 5 line 1: the statistical model uses the rules to label the input sequence given in Col. 4 lines 39-40 by part-of-speech to determine the speaker, i.e. the role, Spec. Col 7, lines 54-63: the statistical model calculates and uses confidence scores, i.e. probability distributions, to classify text).

Regarding claim 7, Skuratovsky teaches all of the elements of the current invention as stated above, except 
determining that part-of-speech of the first role is a pronoun; and
performing pronoun resolution on the first role.
Lester teaches a content customization service which renders audio versions of textual content documents such as audiobooks (Spec. Col 1, lines 24-28) in multiple voices (Spec. Col 1, lines 62-64). Lester further teaches 
determining that part-of-speech of the first role is a pronoun (Spec. Col 11, lines 35-38; dialog indicator including a pronoun is identified near dialog); and 
performing pronoun resolution on the first role (Spec. Col 11, lines 40-42; content customization service uses dialog indicator including a pronoun to map dialog, i.e. a first utterance, to a speaker, i.e. a first role. The pronoun in the dialog indicator corresponds to the speaker).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Skuratovsky to incorporate the teachings of Lester to provide the method according to claim 1, further comprising determining that part-of-speech of the first role is a pronoun; and performing pronoun resolution on the first role. Lester is considered to be analogous to Skuratovsky as both are directed towards rendering the audio version of digital text with different voices for different roles. Skuratovsky recognizes that when an audio rendition of a story has different vocal characteristics for different passages, such as different voices for male and female roles (Spec. Col 1, lines 27-33), the result is a more enjoyable listening experience for the user (Spec. Col 1, lines 44-51). Lester also understands that users may prefer a recording where different characters are voiced differently (Spec. Col 1, lines 9-11), and teaches a service for customizing the audio of text content to include multiple voices (Spec. Col 1, lines 62-64). Therefore, it would have been obvious to combine the features of both disclosures to solve the same problem, making digital text more pleasing to a user by using different voices for different roles.

Regarding claim 9, Skuratovsky teaches the method of claim 1 described above. Skuratovsky further teaches wherein the attributes of the first role comprise at least one of gender and physical condition (Spec. Col 6, lines 5-13: utterances are differentiated based on speaker identity, i.e. the role, then further by the attribute of gender. For examination purposes, the gender of the first role is also considered to be a physical condition), and the determining the attributes of the first role comprises: 
determining the attributes of the first role according to at least one of: an attribute table of a role voice database (Spec. Col 6, lines 30-37; attributes of the first role are determined from the reference table, i.e. attribute table, to select a corresponding voice configuration. As the system is selecting a particular voice configuration, there must be a database of such configurations from which to select), role address (Spec. Col 5, lines 23-28; the role “Tom” is addressed by name which is used to determine the gender attribute “male” for that role), and role name (Spec. Col 5, lines 23-28; role name “Tom” is used to determine the gender attribute “male” for that role).
However, Skuratovsky fails to teach wherein the attributes of the first role comprise age, profession, or character. Additionally, Skuratovsky fails to teach that the determining attributes of the first role comprises determining the attributes according to priori role information or role description. Lester teaches a content customization service which renders audio versions of textual content documents in multiple voices, as described above. Lester further teaches attributes of roles comprising age (Spec. Col 8, lines 25-29; attribute of role is age), profession (Fig. 7, elements 708 and 710 are labeled with roles comprising the profession, i.e. soldier, of the role), and character (Spec. Col 8, lines 47-48; attribute of role is character, i.e. Candide, the title character of the novel). Lester further teaches that the determining attributes of the first role comprises determining the attributes according to priori role information (Spec. Col 3, lines 20-30; it is known before the process of analyzing the audiobooks that the character Jane Eyre has the female gender attribute) and role description (Spec. Col 8, lines 29-
While Skuratovsky does not teach determining attributes of the first role according to pronoun resolution as claimed, Skuratovsky does detail that the gender attribute is determined from the pronoun used in the passage. Lester further teaches the use of pronouns in dialog indicators in text for pronoun resolution on the first role, as described above with regard to claim 7. Adapting Skuratovsky’s method of determining gender from pronoun usage using the features as taught by Lester further discloses that determining attributes of the first role comprises determining the attributes (Spec. Col 5, lines 11-15; the attribute gender is determined from the pronoun) according to pronoun resolution (Spec. Col 11, lines 40-42; content customization service uses dialog indicator including a pronoun to map dialog, i.e. a first utterance, to a speaker, i.e. a first role. The pronoun in the dialog indicator corresponds to the speaker and is now used with Skuratovsky to determine the gender of the speaker).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Skuratovsky to incorporate the teachings of Lester to provide the method according to claim 1, further comprising wherein the attributes of the first role comprise at least one of age, gender, profession, character and physical condition, and the determining the attributes of the first role comprises: determining the attributes of the first role according to at least one of: an attribute table of a role voice database, pronoun resolution, role address, role name, priori role information, and role description. Lester is considered to be analogous to Skuratovsky as both are directed towards rendering the audio version of digital text with different voices for different roles. Skuratovsky recognizes that when an audio rendition of a story has different vocal characteristics for different passages, such as different voices for male and female roles (Spec. Col 1, lines 27-33), the result is a more enjoyable listening experience for the user (Spec. Col 1, lines 44-51). Lester also understands that users may prefer a recording where different characters are voiced differently (Spec. Col 1, lines 9-.

Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Skuratovsky in view of VanBlon et al. (Pub. No. US 2017/0060365 A1), hereinafter VanBlon.

Regarding claim 11, Skuratovsky teaches all of the elements of the current invention as stated above, except determining a content category of the document and selecting a background music based on the content category.
VanBlon teaches a method of supplementing digital text with multimedia effects, involving determining a content category of the document (Spec. page 3, [0043], lines 1-4; genre, i.e. a content category, is used to select multimedia effects) and selecting a background music based on the content category (Spec. page 3, [0042], lines 5-6; multimedia effects include background music).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Skuratovsky to incorporate the teachings of VanBlon to provide the method according to claim 1, further comprising determining a content category of the document and selecting a background music based on the content category. VanBlon is considered to be analogous to Skuratovsky as both are directed towards rendering the audio version of digital text more pleasing to the listener. Skuratovsky understands that synthesized speech versions of textual media have traditionally been less pleasing to the listener as the speech sounds mechanical and monotone (Spec. Col 1, lines 44-48). VanBlon also recognizes the need for enhancing the user experience of a reader (Spec. page 1, [0003], lines 5-8) and provides methods for augmenting the reading

Regarding claim 12, Skuratovsky teaches all of the elements of the current invention as stated above, except 
detecting at least one sound effect object from the document, the at least one sound effect object comprising an onomatopoetic word, a scenario word or an action word; and 
selecting a corresponding sound effect for the sound effect object.
VanBlon teaches a method of supplementing digital text with multimedia effects, involving
detecting at least one sound effect object from the document, the at least one sound effect object comprising an onomatopoetic word, a scenario word or an action word (Spec. page 8, [0087], lines 1-5; the context module identifies sound effect objects “door slammed” and “birds chirp” i.e. scenario and action words, from the text); and 
selecting a corresponding sound effect for the sound effect object (Spec. page 8, [0087], lines 5-7; the matching module selects sound effects of door being slammed and birds chirping to correspond to the sound effect objects discussed above).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Skuratovsky to incorporate the teachings of VanBlon to provide the method according to claim 1, further comprising detecting at least one sound effect object from the document, the at least one sound effect object comprising an onomatopoetic word, a scenario word or an action word; and selecting a corresponding sound effect for the sound effect object. VanBlon is considered to be analogous to Skuratovsky as both are directed towards rendering the audio version of digital text more pleasing to the listener. Skuratovsky understands that .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kurzweil et al. (Pub. No. US 2016/0027431 A1) teaches a system and method for a user to change the voice model for selected portions of text such that the synthesized speech rendition of those portions is different from the narration voice model (Spec. page 1, [0007], lines 1-8).
Chen et al. (Pub. No. US 2013/0191130 A1) teaches a speech synthesis method for producing synthesized speech such that the prosody is similar to human speech (Spec. page 1, [0007]).
Bakis et al. (Pub. No. US 2005/0096909 A1) teaches a system and method for converting text to speech using style sheets to apply particular characteristics to the synthetic speech (Spec. page 1, [0005]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARKER L MAYFIELD whose telephone number is (571)272-4745. The examiner can normally be reached Monday - Friday 7:30 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PARKER L MAYFIELD/
Examiner
Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655