DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application claims priority to foreign application CN201811567415.1, filed on 12/20/2018.
Information Disclosure Statement
The information disclosure statement submitted on 2/26/2021 was filed after the mailing date of the first Office action. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment 
Claims 1, 9 and 17 are amended. Claims 2-3 and 10-11 are cancelled. Claims 1, 4-9 and 12-17 are presented for examination. 
Response to Arguments
Applicant’s arguments filed on 4/20/2021 have been reviewed. Following are the responses: 
Claim Rejections - 35 U.S.C.  101 
In light of the amendments, the rejection under 35 U.S.C. 101 is withdrawn.
Claim Rejections - 35 U.S.C.  102 
Applicant notes “Meng fails to disclose or suggest the features of original claim 3 now incorporated into claim 1, including the feature of determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speakers according to the voice parameter information of the pre-stored speakers. As such, claim 1 cannot be anticipated by Meng.” Examiner agrees with applicant’s arguments, and hence the rejection under 35 U.S.C. 102(a)(3) as being unpatentable over Meng (CN 108091321) is withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 U.S.C. 103, as set forth below.
Claim Rejections - 35 U.S.C.  103 
Applicant argues “Additionally, Bao also fails to teach or suggest the above underlined feature, and thus cannot cure the defect of Meng. Specifically, Bao discloses a voice synthesis method, comprising: determining an identity of a user according to currently input voice of the user; acquiring an acoustic model from a preset acoustic model library according to the currently input voice, wherein preset information of the acoustic model comprises more than one of preset sound speed, preset volume, preset pitch, preset timbre, preset intonation, and preset rhythmic rhythm; determining basic voice synthesis information according to the identity of the user, wherein the basic voice synthesis information comprises a change amount of one or more of the preset sound speed, the preset volume, and the preset pitch; determining a reply text according to the currently input voice; determining an enhanced voice synthesis information according to the reply text and context information of currently input voice, wherein the enhanced voice synthesis information comprises a change amount of one or more of the preset timbre, the preset intonation, and the preset rhythmic rhythm; and performing, by the acoustic model, voice synthesis on the reply text according to the basic voice synthesis information and the enhanced voice synthesis information. According to the voice synthesis method of Bao, the enhanced voice synthesis information (also called as enhanced TTS parameter, see par. 0005 line 10 of Bao) is determined according to reply text and context information of currently input voice. Regarding reply text, claim 1 of Bao defines "determining a reply text according to the currently input voice", that is, the reply text is determined according to the currently input voice, but not according to the voice parameter information of the pre-stored speakers.” However, as in claim 1 of Bao, the information is pre-stored, and the preset/pre-stored user information is acquired based on the current input voice.
Refer to Page 10 of Bao: The historical input voice of the user determined target character and the corresponding relationship between user preferred pronunciation, corresponding relation of the target character and the pronunciation of user preferences are associated with the identity of the user, and corresponding relation of the target character and the pronunciation of user preferences stored in the speech synthesis parameter database, the processor is specifically used for: When the target character in the reply text associated with the identity of the user by the acoustic model according to the target character with the corresponding relation between user preferred pronunciation, said basic speech synthesis information and the enhanced speech synthesis information to the response text to speech synthesis. The same concept is taught on Page 16. Hence, the concept of a pre-stored user is taught.

Applicant further contends “Further, claim 2 of Bao defines "the determining an enhanced voice synthesis information according to the reply text and context information of currently input voice, comprises determining literary style feature of the reply text according to the reply text, wherein the literary style feature comprises one or more of the number of sentences, the number of words in each sentence, and arrangement order of the number of sentences in part or all of contents of the reply text." It can be seen that the reply text corresponds to one or more of the number of sentences, the number of words in each sentence, and arrangement order of the number of sentences, which is obviously not corresponding to voice parameter information as defined above in claim 1.” However, the voice parameter is taught as well: the method of Bao takes into account multiple parameters, including the voice and the text. Bao teaches, e.g., in claim 7 of Bao: wherein, before determining the identity of the user according to the user current input voice, further comprising: The historical input voice of the user determined target character and the corresponding relationship between user preferred pronunciation, corresponding relation of the target character and the pronunciation of user preferences association between the identity of the user, accordingly, the speech synthesis through the acoustic model, the response text according to the basic speech synthesis information and the enhanced speech synthesis information. Comprises: the target character in the reply text associated with the identity of the user by the acoustic model according to the target character with the corresponding relation between user preferred pronunciations, said basic speech synthesis information and the enhanced speech synthesis information to the response text to speech synthesis.

Applicant notes that “Regarding context information of currently input voice, par. 0008 of Bao records "the context information may represent context of the currently input voice or historically input voice before the currently input voice". It is obviously here that "context of the currently input voice or historically input voice" does not correspond to voice parameter information of claim 1 as listed above.” However, that is not the case. In Bao, the context information of the currently input voice can incorporate the historically input voice, and further the model is preset for the particular user (hence the concept of a pre-stored user is taught in Bao).

Appl. Ser. No. 16/565,784 
Response to Office Action 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4-9 and 12-17 are rejected under 35 U.S.C. 103 as being unpatentable over Meng (CN 108091321) and further in view of Bao (CN 108962217).


Regarding claim 1, Meng teaches a voice synthesis method, comprising: obtaining text information (obtaining text, Page 3) and determining characters in the text information and a text content of each of the characters (since a speaking role may appear in the statement text multiple times, therefore, after identifying all the speaking character in the full text of the statement text, regular text, the speaker role to arranging multiple same speaking role identified to perform unified processing. For example, in an identified sentence text has multiple " female ", then after the final regular, multiple "woman" will be to do uniform process, that is uniform corresponds to a persona and its synthesizer parameter set, Page 4); performing a character recognition on the text content of each of the characters, to determine character attribute information of the each of the characters (character role, Page 4-5); obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters (one-to-one correspondence, e.g., role, sub-role, etc., Page 6), wherein the speakers are pre-stored speakers having the character attribute information (characters are pre-set, Page 6); and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information (multi-character synthesis, Page 4-6), wherein the character attribute information comprises a basic attribute (gender and age, Page 6), and the basic attribute comprises at least one of a gender attribute and an age attribute; before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the basic attribute corresponding to each of the pre-stored speakers according to voice parameter information (speaker with the characteristics required, Page 6-7).

Meng teaches that the character attribute has formant information (somewhat similar to pronunciation), Page 6; however, Meng does not explicitly teach wherein the character attribute information further comprises an additional attribute, and the additional attribute comprises at least one of the following: regional information, timbre information, and pronunciation style information; before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speakers according to the voice parameter information of the pre-stored speakers, and correspondingly the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters further comprises: determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute.
However, Bao teaches wherein the character attribute information further comprises an additional attribute, and the additional attribute comprises at least one of the following: regional information, timbre information, and pronunciation style information (pronunciation/timbre information, Page 16); before the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, the method further comprises: determining the additional attribute and additional attribute priority corresponding to each of the pre-stored speakers according to the voice parameter information of the pre-stored speakers (parameters are pre-stored, Page 14; could be based on target character associated with user property or a particular speaker, Page 15-16), and correspondingly the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters further comprises: determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute (based on all the parameters, choose a TTS voice, for example, preset information of the general acoustic model comprises the model of the preset speed, the preset volume, preset pitch, timbre, a preset predetermined intonation, rhythm preset in two or more than two; outside of the individual acoustic model of the preset information comprises the model of the preset speed, the preset volume, preset pitch, timbre, a preset predetermined intonation, rhythm preset in two or more than two, and may include other personalized information, such as a phrase, a response mode with the specified scene. Intelligent type, character type, language or dialect of the appellation of the special character language style features.
to be understood that pre- set velocity of different acoustic models, the preset volume, preset pitch, timbre, a preset predetermined intonation, rhythm preset default information such as each is different, for example, personalized acoustic models of the preset information can be obviously different from the preset information of the universal acoustic model. In the embodiment of the invention, the acoustic model according to the change information of the preset information and the preset information, the reply text into response voice. the preset information of the change information that represents basic TTS parameter selected in speech synthesis, strengthening the TTS parameter, the target character and the mapping relation between the user preferred pronunciation, background sound and other information. synthesized by the general acoustic model of voice presentation of sound effects under normal and universal dialogue scene, sound effects dialogue scene and voice synthesized by a customized acoustic model capable of simulating an "char e dialogue scene realizing method will be described in detail hereinafter; and Referring to FIG. 7, after obtaining the input voice of the user voice response system, obtaining the reply text via speech recognition module and a voice conversation module; speech dialogue module determines basic TTS parameter associated with the identity from the TTS parameter database based on the current user identity, determining enhanced TTS parameters, background sound from the TTS parameter database based on the reply text, context information; if there is reply text associated with the user identity of the target character, then also determines the target character corresponding to the pronunciation of user preferences. 
after the voice synthesis module based on input voice or user preference (the preference of the user of the user associated with the identity of the user) or reply text, to invoke an appropriate acoustic model from the acoustic model database. and the acoustic model combined with TTS TTS parameters (basic parameters, strengthening TTS parameters, target character and the mapping relation between the user preferred pronunciation d in one or more) voice synthesis so as to generate reply speech for presentation to the user, Page 16; Claim 7) 

Meng has a base concept of a voice synthesis method, belonging to the technical field of voice processing; in said method, persona preset multiple and preset synthesizer parameter set, further comprising: obtaining a sentence text; from the sentence in the text analysis to obtain each reference part, and the corresponding part of speech of each reference character for sentence text global regular role and the role with the preset characters to match, respectively determining speech corresponding to the role of personas and synthesizer parameter set according to the matching result; The synthesizer parameter of each talking character set corresponding to the referenced part carrying out the voice synthesis to form corresponding to the synthetic speech of the statement text and output. Meng differs from the claimed invention in the concept of additional character information and keyword information, which can help determine the semantic meaning of the scene. Bao teaches the concept of timbre/pronunciation information and scenario information. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the concept of Bao into Meng to improve the voice interaction experience of the user (Abstract, Bao).

Regarding claim 4, Bao, as applied to claim 1 above, teaches wherein the determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute comprises: obtaining a character sound description class (keyword class matching based on the scene, Page 34-- character terminal further is simulating, but can also can judge the content of the input text corresponding to the input voice of the user is related to content of the character simulating by the DM module. in specific implementation, the DM module by full text matching, keyword matching and semantic similarity matching is determined such as to reply content of the role these content includes simulating, lyrics, sound effect, movie actor and animation and dialogue script. wherein the text matching mode is that a portion of the input text with the corresponding video or musical works are the same, keyword matching way is that inputted text and a portion of video or music key words are the same, semantic similarity matching way is that inputted text and a portion of video or music semantic similarity matching. For example, the input text is " he has been the leading role, his PVRs day is not error, not dream of human is salty only. on the path is the dream , our effort will be harvested is enough. using the mode after the matching content is found in the input text of "not dream of human is salty only is belonging to matching content, content matching the dialogues of movie" Shaolin football "in dimensional is not ideal, and salted with what differences", voice is the role "" dubbing. then, setting the current dialog is character simulation " of the scene.)
Regarding claim 5, Bao, as applied to claim 1 above, teaches wherein the determining, from speakers having the basic attribute corresponding to the characters, the speakers in one-to-one correspondence with the characters according to the additional attribute comprises: in the speakers having the basic attribute corresponding to the characters, using speakers with highest additional attribute priorities as the speakers in one-to-one correspondence with the characters (closest matching acoustic model, Page 33; also inherent from claim 15 of Bao: The method according to claim 10-12 any one of said claims, wherein the acoustic model in the acoustic model library with a plurality of, the voice synthesis module is specifically used for: The identity of the user of the plurality of acoustic models selected in the acoustic model, determining the weight value of each acoustic model of the plurality of acoustic models, wherein the weight value of each acoustic model is a user-set, or the weight value of each acoustic model is pre-determined according to the favor of the user; the individual acoustic model based on the weight value fusion, the obtained acoustic model after fusion).



Regarding claim 6, Meng, as applied to claim 1 above, teaches wherein the obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters comprises: obtaining a candidate speaker for each of the characters according to the character attribute information of the each of the characters; displaying description information of the candidate speaker to a user and receiving an indication of the user; and obtaining the speakers in one-to-one correspondence with the characters in the candidate speaker of each of the characters according to the instruction of the user (user confirms, Page 7- in the preferred embodiment of the present invention, in the step S3, are respectively matched corresponding to the characters for each of the speaker role, then outputting the match wing, and the user is viewing and confirming the matching result and turn to the step S4l Specifically, when the set each persona or each sub-role needs to respectively set the corresponding role tags, such as "man" in the above, male 1, male 2 - "and" male 3. The more directly or a label such as "male", "young male", "mature male" and "middle-aged man" and so on. when the output matching result, which can be the basis of the statement text, the distribution of character roles corresponding to the role mark is added on the corresponding positions of sentence text, so as to form a character text and outputting it to the user to view. and outputting the final synthesized speech after the user confirms the character text. the user also can modify distribution of characters, so as to modify and output a er can manual intervention, the synthesized voice to reach the better output effect).

Regarding claim 7, Meng, as applied to claim 1 above, teaches wherein the generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information comprises: processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices (claim 5 of Meng: The speech synthesis method according to claim 1, wherein the predetermined plurality of the persona includes a representation voiceover of the voiceover role, in said step S3, the statement text is removed from the reference portion of the speaking character part matching with the voiceover role, in the step S4, using the synthesizer parameters corresponding to the vo entence speech character of the reference part and the part for voice synthesis, Page 7).


Regarding claim 8, Meng, as applied to claim 7 above, does not explicitly teach wherein after the processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices, the method further comprises: obtaining background audios that are matched with a plurality of consecutive text contents in the text information; and adding the background audio to voices corresponding to the plurality of text contents, in the multi-character synthesized voices.
However, Bao teaches wherein after the processing a corresponding text content in the text information according to the speakers corresponding to the characters, to generate the multi-character synthesized voices, the method further comprises: obtaining background audios that are matched with a plurality of consecutive text contents in the text information; and adding the background audio to voices corresponding to the plurality of text contents, in the multi-character synthesized voices (superimposing the background sound, Page 35-- In the specific embodiment of the invention, after the step 701, the terminal is preset with a music library. specific examples, in the TTS parameter database of the terminal is pre-set with a music library, the music library comprises a plurality of music files, the music files for providing background sound in the speech synthesis process, the background sound is music of a music section (such as pure music or song) or a sound effect (such as movie sound, game sound, language sound, animation sound, etc.).

step 702, the terminal determines the reply text is suitable for overlapping content of the background music.
embodiments, the terminal can be determined by DM module suitable for overlapping content of the background music. the suitable background music content can have emotional polar character can be poetry and music, can be video lines. For example, the terminal may identify an emotion tendency word in the sentence by the DM) 

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Meng, to further include the concept of Bao to improve the user experience (Abstract, Bao).

Regarding claim 9, arguments analogous to those for claim 1 are applicable.
Regarding claim 12, arguments analogous to those for claim 4 are applicable.
Regarding claim 13, arguments analogous to those for claim 5 are applicable.
Regarding claim 14, arguments analogous to those for claim 6 are applicable.
Regarding claim 15, arguments analogous to those for claim 7 are applicable.

Regarding claim 17, Meng, as applied to claim 1 above, teaches a storage medium comprising a non-transitory readable storage medium and computer instructions stored in the non-transitory readable storage medium; the computer instructions are configured to implement the voice synthesis method according to claim 1 (executed by the processor, Page 4).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHA MISHRA whose telephone number is (571)272-5357.  The examiner can normally be reached on M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained 






/RICHA MISHRA/Primary Examiner, Art Unit 2674