DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
No claims are amended. Claims 2-3, 5, 10-11 and 13 are cancelled. Claims 1, 4, 6-9, 12 and 14-17 are presented for examination. 
Response to Arguments
Applicant’s arguments filed on 2/23/2022 have been reviewed. The following are the responses to Applicant’s arguments:

Claim Rejections - 35 U.S.C. § 103
Applicant argues “the disclosure of paragraphs [0110] to [0119], especially the features "The weighting can be projected onto their own space, a "weights space" with initially a weight representing each dimension. This space can be rearranged into a different space which dimensions represent different voice attributes.... if the modelled voice characteristic is expression, one dimension may indicate happy voice characteristics, another nervous etc., the user may select to increase the weighting on the happy voice dimension so that this voice characteristic dominates... The system determines the weightings automatically.... The system may recognize from the text when something is being spoken by a character in the book opposed to the narrator, for example from quotation marks, and change the weighting to introduce a new voice characteristic to the output....", it can be seen that the weighting in the disclosure of Latorre actually represents different features in the voice attribute, and can be changed or adjusted in respect to specific requirements or according to the content of the text that to be modelled to voice, so as to change the voice attribute of the speaker voice. Therefore, the weighting for the speaker and speaker attribute is different from the determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speaker according to the voice parameter information of the pre-stored speakers as defined in the pending claim 1.”
However, the examiner has not relied on these paragraphs of Latorre for the concept of claim 1. Further, even if these paragraphs suggest weights different from those of claim 1, Latorre also teaches a speaker attribute related to style or accent (Para 0034-0037); one-to-one correspondence based on the pronounced word (Para 0086); and attributes having weights (Para 0097; Fig 4; Para 0113: prestored weights, memory).


Applicant further contends “Moreover, even though Latorre discloses in paragraphs [0094]-[0096] that "The system of FIG. 4 can output speech using a number of different speakers with a number of different voice attributes. For example, voice attributes may be selected from a voice sounding, happy, sad, angry, nervous, calm, commanding, etc. The speaker may be selected from a range of potential speaking voices such as a male voice, young female voice etc. In step S204, the desired speaker is determined.... In step 206, the speaker attribute which to be used for the voice is selected. The speaker attribute may be selected from a number of different categories. For example, the categories may be selected from emotion, accent, etc., ... the attributes may be: happy, sad, anger, etc", it can be seen in combination with the disclosure about the weighting in the Latorre that, the speaker and speaker attribute are not directly obtained from a plurality of speakers with different voice attribute according to the voice attribute. On the contrary, the speaker and the speaker attributes of the output speech are determined by adjusting the weighting of different voice attribute and applying the weight to model parameters of the acoustic model. Therefore, Latorre also fails to disclose the determining of the speakers in one-to-one correspondence with the characters according to the additional attribute from speakers having the basic attribute corresponding to the characters, and specifically using speakers with highest additional attribute priorities as the speakers in one-to-one correspondence with the characters as defined in pending claim 1.”
However, the examiner relied on the combination of Meng and Latorre to teach the above concepts. The plurality of speakers with the same basic attributes is taught by Meng, and Meng can be modified by the concept of Latorre to have different weights for the additional attributes, which can be selected.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 6-7, 9, 14-15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Meng (CN 108091321) and further in view of Latorre (US Pub: 20130262119).

Regarding claim 1, Meng teaches a voice synthesis method, comprising: obtaining text information (obtaining text, Page 3) and determining characters in the text information and a text content of each of the characters (since a speaking role may appear in the statement text multiple times, therefore, after identifying all the speaking character in the full text of the statement text, regular text, the speaker role to arranging multiple same speaking role identified to perform unified processing. For example, in an identified sentence text has multiple "female", then after the final regular, multiple "woman" will be to do uniform process, that is uniform corresponds to a persona and its synthesizer parameter set, Page 4); performing a character recognition on the text content of each of the characters, to determine character attribute information of the each of the characters (character role, Page 4-5); obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters (one-to-one correspondence, e.g. role, sub-role, etc., Page 6), wherein the speakers are pre-stored speakers having the character attribute information (characters are pre-set, Page 6); and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information (multi-character synthesis, Page 4-6), wherein the character attribute information comprises a basic attribute (gender and age, Page 6), and the basic attribute comprises at least one of a gender attribute and an age attribute; before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the basic attribute corresponding to each of the pre-stored speakers according to voice parameter information of the pre-stored speakers; and correspondingly the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters comprises: for each of the characters, obtaining a speaker having the basic attribute corresponding to the each of the characters (speaker with the characteristics required, Page 6-7).

Meng teaches that the character attribute has formant information (similar to pronunciation) (Page 6); however, Meng does not explicitly teach wherein the character attribute information further comprises an additional attribute, and the additional attribute comprises at least one of the following: regional information, timbre information, and pronunciation style information; before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speakers according to the voice parameter information of the pre-stored speakers, and correspondingly the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters further comprises: determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute.

However, Latorre teaches wherein the character attribute information further comprises an additional attribute, and the additional attribute comprises at least one of the following: regional information, timbre information, and pronunciation style information (speaker attribute related to the style or the accent, Para 0034-0037; one-to-one correspondence based on the pronounced word, Para 0086); before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speakers according to the voice parameter information of the pre-stored speakers (attributes have weights, Para 0097, Fig 4; Para 0113: prestored weights, memory), and correspondingly the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters further comprises: determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute (male/female and speaking style, accent, etc., Fig 4, Para 0094, 0195).

Meng teaches the base concept of a voice synthesis method in the technical field of voice processing, in which multiple personas and corresponding synthesizer parameter sets are preset. The method comprises: obtaining a sentence text; analyzing the sentence text to obtain each reference part and the speaking character corresponding to each reference part; globally regularizing the speaking characters of the sentence text and matching them with the preset personas, so as to determine, according to the matching result, the persona and synthesizer parameter set corresponding to each speaking character; and performing voice synthesis on each reference part using the synthesizer parameter set of the corresponding speaking character, to form and output the synthesized speech of the sentence text. Meng differs from the claimed invention in the additional character information and the keyword information, which can help determine the semantic meaning of the scene. Latorre teaches the concept of the accent/speaking style of the stored speaker and the scenario information. It would have been obvious, before the effective filing date, to include the concept of Latorre in Meng to make systems sound more like a human voice (Para 0004, Latorre).
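
For illustration only, the selection limitation discussed above (filter the pre-stored speakers by the basic attribute, then pick by the additional attribute priority) can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the claims, Meng, or Latorre: the `Speaker` and `Character` records, their field names, the convention that a larger number means a higher priority, and the function `select_speakers` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Speaker:
    # Hypothetical record for a pre-stored speaker.
    name: str
    gender: str                     # basic attribute
    age: str                        # basic attribute
    # additional attribute -> priority (assumed: larger means higher priority)
    additional: Dict[str, int] = field(default_factory=dict)


@dataclass
class Character:
    # Hypothetical record for a character found in the text.
    name: str
    gender: str
    age: str
    additional: Optional[str] = None  # e.g. an accent or speaking style


def select_speakers(characters: List[Character],
                    speakers: List[Speaker]) -> Dict[str, str]:
    """Assign one distinct speaker to each character.

    Step 1: keep unused speakers whose basic attributes match the character.
    Step 2: among them, take the one with the highest priority for the
            character's additional attribute (0 if the speaker lacks it).
    """
    assignment: Dict[str, str] = {}
    used = set()
    for ch in characters:
        candidates = [s for s in speakers
                      if s.name not in used
                      and s.gender == ch.gender
                      and s.age == ch.age]
        if not candidates:
            continue  # no speaker with the required basic attribute
        best = max(candidates, key=lambda s: s.additional.get(ch.additional, 0))
        assignment[ch.name] = best.name
        used.add(best.name)  # keeps the correspondence one-to-one
    return assignment
```

In this sketch the one-to-one correspondence is enforced simply by removing a speaker from the pool once assigned; how ties or unmatched characters should be handled is left open because the record does not address it.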

Regarding claim 6, Meng as above in claim 1 teaches wherein the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters comprises: obtaining a candidate speaker for each of the characters according to the character attribute information of the each of the characters; displaying description information of the candidate speaker to a user and receiving an indication of the user; and obtaining the speakers in one-to-one correspondence with the characters in the candidate speaker of each of the characters according to the instruction of the user (user confirms, Page 7: in the preferred embodiment of the present invention, in the step S3, are respectively matched corresponding to the characters for each of the speaker role, then outputting the matching result, and the user is viewing and confirming the matching result and turn to the step S4. Specifically, when the set each persona or each sub-role needs to respectively set the corresponding role tags, such as "man" in the above, "male 1", "male 2" and "male 3". The more directly or a label such as "male", "young male", "mature male" and "middle-aged man" and so on. when the output matching result, which can be the basis of the statement text, the distribution of character roles corresponding to the role mark is added on the corresponding positions of sentence text, so as to form a character text and outputting it to the user to view. and outputting the final synthesized speech after the user confirms the character text. the user also can modify distribution of characters, so that the user can manually intervene in the synthesized voice to reach the better output effect).

Regarding claim 7, Meng as above in claim 1 teaches wherein the generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information comprises: processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices (claim 5. The speech synthesis method according to claim 1, wherein the predetermined plurality of the persona includes a representation voiceover of the voiceover role, in said step S3, the statement text is removed from the reference portion of the speaking character part matching with the voiceover role, in the step S4, using the synthesizer parameters corresponding to the vo entence speech character of the reference part and the part for voice synthesis, Page 7).


Regarding claim 9, arguments analogous to those for claim 1 are applicable.
Regarding claim 14, arguments analogous to those for claim 6 are applicable.
Regarding claim 15, arguments analogous to those for claim 7 are applicable.
Regarding claim 17, Meng as above in claim 1 teaches a storage medium comprising a non-transitory readable storage medium and computer instructions stored in the non-transitory readable storage medium; the computer instructions are configured to implement the voice synthesis method according to claim 1 (executed by the processor, Page 4).

Claims 4, 8, 12 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Meng (CN 108091321) in view of Latorre (US Pub: 20130262119) and further in view of Bao (CN 108962217).

Regarding claim 4, Meng modified by Latorre as above in claim 1 does not explicitly teach wherein the determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute comprises: obtaining a character sound description class keyword in text contents of the characters; determining the additional attribute corresponding to the characters according to the character voice description class keyword; in the speakers having the basic attribute corresponding to the characters, determining the speakers in one-to-one correspondence with the characters having the additional attribute corresponds to the characters.
However, Bao teaches the speakers in one-to-one correspondence with the characters according to the additional attribute comprises: obtaining a character sound description class keyword in text contents of the characters; determining the additional attribute corresponding to the characters according to the character voice description class keyword; in the speakers having the basic attribute corresponding to the characters, determining the speakers in one-to-one correspondence with the characters having the additional attribute corresponds to the characters (keyword class matching based on the scene, Page 34: character terminal further is simulating, but can also can judge the content of the input text corresponding to the input voice of the user is related to content of the character simulating by the DM module. in specific implementation, the DM module by full text matching, keyword matching and semantic similarity matching is determined such as to reply content of the role these content includes simulating, lyrics, sound effect, movie actor and animation and dialogue script. wherein the text matching mode is that a portion of the input text with the corresponding video or musical works are the same, keyword matching way is that inputted text and a portion of video or music key words are the same, semantic similarity matching way is that inputted text and a portion of video or music semantic similarity matching. For example, the input text is " he has been the leading role, his PVRs day is not error, not dream of human is salty only. on the path is the dream , our effort will be harvested is enough. using the mode after the matching content is found in the input text of "not dream of human is salty only is belonging to matching content, content matching the dialogues of movie" Shaolin football "in dimensional is not ideal, and salted with what differences", voice is the role "" dubbing. then, setting the current dialog is character simulation " of the scene.)
It would have been obvious, before the effective filing date, to one having the teachings of Meng and Latorre to further include the concepts of Bao, so that the content of the dialogue can be determined even without a clear intention and the weights/voice can be incorporated for the particular content automatically via semantic analysis (Page 5, Bao).


Regarding claim 8, Meng as above in claim 7 does not explicitly teach wherein after the processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices, the method further comprises: obtaining background audios that are matched with a plurality of consecutive text contents in the text information; and adding the background audio to voices corresponding to the plurality of text contents, in the multi-character synthesized voices.
However, Bao teaches wherein after the processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices, the method further comprises: obtaining background audios that are matched with a plurality of consecutive text contents in the text information; and adding the background audio to voices corresponding to the plurality of text contents, in the multi-character synthesized voices (superimposing the background sound, Page 35: In the specific embodiment of the invention, after the synthesized speech, in order to improve the expression effect of each TTS parameters, also can be the output synthesized speech, superposed background sound. Hereinafter the synthetic voice superposing background sound scene as an example to describe the speech synthesis method of the embodiment of the invention, referring to FIG. 29, the method can be described by following several steps:
step 701, the terminal is preset with a music library. specific examples, in the TTS parameter database of the terminal is pre-set with a music library, the music library comprises a plurality of music files, the music files for providing background sound in the speech synthesis process, the background sound is music of a music section (such as pure music or song) or a sound effect (such as movie sound, game sound, language sound, animation sound, etc.).

step 702, the terminal determines the reply text is suitable for overlapping content of the background music.
embodiments, the terminal can be determined by DM module suitable for overlapping content of the background music. the suitable background music content can have emotional polar character can be poetry and music, can be video lines. For example, the terminal may identify an emotion tendency word in the sentence by the DM) 

It would have been obvious, before the effective filing date, to one having the teachings of Meng to further include the concept of Bao to improve the user experience (Abstract, Bao).
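
For illustration only, the background audio limitation of claim 8 (one background audio matched to a plurality of consecutive text contents) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Bao or from the application: the per-segment mood label, the `library` mapping from mood to track name, and the function `plan_background_audio` are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def plan_background_audio(segments: List[Tuple[str, str]],
                          library: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Plan which background track to overlay on which run of text contents.

    segments: (text, mood) pairs in reading order; the mood label is an
              assumed stand-in for whatever matching criterion is used.
    library:  assumed mapping from mood label to a background track name.
    Returns (first_index, last_index, track) for every run of two or more
    consecutive segments that share a mood with a track in the library.
    """
    plan = []
    index = 0
    for mood, run in groupby(segments, key=lambda seg: seg[1]):
        length = len(list(run))
        track = library.get(mood)
        # "A plurality of consecutive text contents": require at least two.
        if track is not None and length >= 2:
            plan.append((index, index + length - 1, track))
        index += length
    return plan
```

The returned index ranges would then tell a mixing step which synthesized voice segments receive which track; the mixing itself is outside this sketch.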

Regarding claim 12, arguments analogous to those for claim 4 are applicable.
Regarding claim 16, arguments analogous to those for claim 8 are applicable.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHA MISHRA whose telephone number is (571)272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu can be reached on (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RICHA MISHRA/
Primary Examiner, Art Unit 2674