Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 1/22/2020 and 7/28/2020 are being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 15 is drawn to a “signal” per se as recited in the preamble and as such is non-statutory subject matter. In [00293] of the As Filed Specification, the term “computer readable recording medium” is not defined; therefore, the examiner is using the plain meaning of the term, which includes data signals. Hence, one of ordinary skill in the art can interpret such term to include transitory signals and non-transitory signals. It does not appear that a claim reciting a signal encoded with functional descriptive material falls within any of the categories of patentable subject matter set forth in § 101. First, a claimed signal is clearly not a "process" under § 101 because it is not a series of steps. The other three § 101 classes of machine, compositions of matter and manufactures "relate to structural entities and can be grouped as 'product' claims in order to contrast them with process claims." 1 D. Chisum, Patents § 1.02 (1994).
The Applicant's As Filed Specification, in [00293], refers to the “computer-readable recording medium” but does not define the term.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 6, 7, 8, 10, 13, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Edrenkin (US 20170092258 A1) in view of Kim (US 20080010070 A1) and Schroeter (US 20080065383 A1).
With respect to claims 1, 8 and 15, Edrenkin teaches an electronic device/method/computer-readable medium including a program/comprising: a memory and a processor connected to the memory ([0040] The implementations of the server 102 are well known in the art. So, suffice it to state, that the server 102 comprises inter alia a network communication interface 109 (such as a modem, a network card and the like) for two-way communication over a communication network 110; and a processor 108 coupled to the network communication interface 109 and the information storage medium 104, the processor 108 being configured to execute various routines, including those described herein below. To that end the processor 108 may have access to computer readable instructions stored on the information storage medium 104, which instructions, when executed, cause the processor 108 to execute the various routines described herein, and [0069] The client device 112 further comprises a computer usable information storage medium (also referred to as a local memory 114). Local memory 114 can comprise any type of media, including but not limited to RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drivers, etc.), USB keys, solid state-drives, tape drives, etc. Generally speaking, the purpose of the local memory 114 is to store computer readable instructions as well as any other data.).
Edrenkin does not teach wherein the processor is configured to: acquire text to respond to a user's speech received by the electronic device, acquire a plurality of pieces of parameter information for determining a style of an output speech corresponding to the text based on information on a type of a plurality of text-to-speech (TTS) databases and the user's speech, identify a TTS database corresponding to the plurality of pieces of parameter information among the plurality of TTS databases, identify a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjust information on the output speech stored in the TTS database based on the weight set, synthesize the output speech based on the adjusted information on the output speech, and output the output speech corresponding to the text.  
Kim teaches wherein the processor is configured to: acquire text to respond to a user's speech received by the electronic device ([0038] The sentence contents database 106 stores therein the sentence contents to be used in a user response sentence, for example, a weather search, a schedule management, a news search, a TV program guide, an email management, etc.), acquire a plurality of pieces of parameter information for determining a style of an output speech corresponding to the text based on information on a type of a plurality of text-to-speech (TTS) databases and the user's speech ([0061] The sentence selector 1082 extracts harmonizing features from the user's input speech by using the user speech harmonizing rules stored in database 1083 (S110). The harmonizing rule database 1083 stores therein data of harmonizing features (i.e., harmonizing rules), e.g., such as a table for difficulty levels of words; a table for adverbs which expresses intensity of meaning; a table for emotional interjections, emotional adjectives, emotional nouns, and the like. [there are multiple tables and each table is a database, and the collection of tables maps to a plurality of databases]), identify a TTS database corresponding to the plurality of pieces of parameter information among the plurality of TTS databases ([0061] The harmonizing rule database 1083 stores therein data of harmonizing features (i.e., harmonizing rules), e.g., such as a table for difficulty levels of words; a table for adverbs which expresses intensity of meaning; a table for emotional interjections, emotional adjectives, emotional nouns, and the like.)
[[identify a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjust information on the output speech stored in the TTS database based on the weight set, synthesize the output speech based on the adjusted information on the output speech, and output the output speech]] 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin to include the teachings of Kim, the motivation being to make interactive speech as natural as a dialog between persons by generating an output that corresponds to intentions and situations (Kim, [0014]).


Schroeter teaches identify a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model ([0005] A system, method and computer readable medium that trains a text-to-speech synthesis system for use in speech synthesis is disclosed. The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies, storing the recorded audio files in a speech database; and training a text-to-speech synthesis system using the speech database, wherein the text-to-speech synthesis system selects audio selects audio segments having a prosody based on at least one dialog state and one speech act. [the weights are the weights of the trained network. Input prosody is a parameter], and [0028] In addition, the audio files may be tagged for dialog state and speech act. At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150), adjust information on the output speech stored in the TTS database based on the weight set ([0028] At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150. Therefore, the unit selector 220 of the TTS system 150 may select audio segments from respective audio files having an appropriate prosody based on a given dialog state and a speech act. [TTS system 150 is trained and the trained weights are used to adjust output speech]), synthesize the output speech based on the adjusted information on the output speech ([0028] At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150. Therefore, the unit selector 220 of the TTS system 150 may select audio segments from respective audio files having an appropriate prosody based on a given dialog state and a speech act. [TTS system 150 is trained and the trained weights are used to adjust output speech]), and output the output speech corresponding to the text ([0028] At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150. Therefore, the unit selector 220 of the TTS system 150 may select audio segments from respective audio files having an appropriate prosody based on a given dialog state and a speech act. [TTS system 150 is trained and the trained weights are used to adjust output speech]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin and Kim to include the teachings of Schroeter, the motivation being that the use of TTS systems trained and maintained by domain-specific speech knowledge greatly improves IVR services (Schroeter, [0033]).

With respect to claims 3 and 10, Kim further teaches acquire information on an acoustic feature of the user's speech based on the user's speech, and acquire at least one of the plurality of pieces of parameter information based on the acquired information on the acoustic feature ([0036] The speech recognition unit 100 performs a speech recognition and delivers a character sequence corresponding to the recognized speech to the dialog model unit 102. The speech recognition includes a process of detecting a user's input speech; a process of amplifying the speech detected to a specific level; a process of extracting feature parameters from the speech; and other processes necessary to perform the speech recognition.).


With respect to claims 6 and 13, Kim further teaches wherein the plurality of pieces of parameter information comprises at least one of information on a language of the output speech, information on a speaker of the output speech, information on a type of an application that provides information on the output speech, information on a tone of the output speech, information on a user's preference regarding the output speech, context information of a user corresponding to the user's speech, or context information of the electronic device ([0040] The system response unit 108 generates an output sentence by initially generating a plurality of candidate sentences, then selecting one of the candidate sentences which is determined to be harmonized with the user's input speech or expressing the situation of the system, and finally assigning an ending form of the sentence and an intonation pattern to the selected sentence.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin to include the teachings of Kim, the motivation being to make interactive speech as natural as a dialog between persons by generating an output that corresponds to intentions and situations (Kim, [0014]).

With respect to claims 7 and 14, Schroeter further teaches
wherein the plurality of weight sets comprises a plurality of weights for adjusting information on output speeches stored in the plurality of TTS databases ([0005] A system, method and computer readable medium that trains a text-to-speech synthesis system for use in speech synthesis is disclosed. The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies, storing the recorded audio files in a speech database [recorded files based on various prosodies], and [0028] In addition, the audio files may be tagged for dialog state and speech act. At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150. Therefore, the unit selector 220 of the TTS system 150 may select audio segments from respective audio files having an appropriate prosody based on a given dialog state and a speech act. The dialog state may be the beginning (e.g, greeting), the middle (e.g, gathering customer account information), or the end or closing of the dialog (transfer to an agent/transition to another sub-system for further processing/end the call by thanking the caller and hanging up) for example. The speech act may concern the content of what is trying to be conveyed such as a greeting, an apology, thanks, a request for information, a question, a confirmation, etc. Therefore, speech database 170 used by the TTS system 150 can provide close coverage of the specific domain (i.e., airline reservations, customer care, etc.) not only by content, but how that content is spoken (i.e., prosody, speech acts, emotion, etc.). The process goes to step 4500 and ends [different sets of weights would be produced by the training depending on the different audio segments based on the appropriate prosody]), respectively, and 
wherein the plurality of weight sets is acquired by inputting a learning speech corresponding to the plurality of pieces of parameter information to the trained artificial intelligence model ([0005] The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies, storing the recorded audio files in a speech database; and training a text-to-speech synthesis system using the speech database, wherein the text-to-speech synthesis system selects audio selects audio segments having a prosody based on at least one dialog state and one speech act [the weights are the weights of the trained network. Input prosody is a parameter], and [0028] In addition, the audio files may be tagged for dialog state and speech act. At step 4400, the domain-specific speech knowledge module 160 trains the TTS system 150 using the speech database 150. Therefore, the unit selector 220 of the TTS system 150 may select audio segments from respective audio files having an appropriate prosody based on a given dialog state and a speech act. The dialog state may be the beginning (e.g, greeting), the middle (e.g, gathering customer account information), or the end or closing of the dialog (transfer to an agent/transition to another sub-system for further processing/end the call by thanking the caller and hanging up) for example. The speech act may concern the content of what is trying to be conveyed such as a greeting, an apology, thanks, a request for information, a question, a confirmation, etc. Therefore, speech database 170 used by the TTS system 150 can provide close coverage of the specific domain (i.e., airline reservations, customer care, etc.) not only by content, but how that content is spoken (i.e., prosody, speech acts, emotion, etc.). The process goes to step 4500 and ends. [different sets of weights would be produced by the training depending on the different audio segments based on the appropriate prosody]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin and Kim to include the teachings of Schroeter, the motivation being that the use of TTS systems trained and maintained by domain-specific speech knowledge greatly improves IVR services (Schroeter, [0033]).

Claims 2 and 9 are rejected over Edrenkin, Kim and Schroeter as applied to claims 1 and 8, and further in view of Mulherkar (US 10365887 B1).
With respect to claims 2 and 9 Edrenkin does not teach acquire text corresponding to the user's speech by recognizing the user's speech, acquire the text to respond to the user's speech based on natural language processing for the text corresponding to the user's speech, and acquire at least one of   
Kim teaches wherein the processor is further configured to: acquire text corresponding to the user's speech by recognizing the user's speech ([0036] The speech recognition unit 100 performs a speech recognition and delivers a character sequence corresponding to the recognized speech to the dialog model unit 102.) [[acquire the text to respond to the user's speech based on natural language processing for the text corresponding to the user's speech]]
None of Edrenkin, Kim, or Schroeter teaches acquire the text to respond to the user's speech based on natural language processing for the text corresponding to the user's speech.
Mulherkar teaches acquire the text to respond to the user's speech based on natural language processing for the text corresponding to the user's speech (Col 19 ll 50-55 If the format of the command (as stored in the lookup table) does not allow for the command to be output according to the determined output, text-to-speech (“TTS”), ASR, and/or NLU processes may be performed on the command data to convert it into a format that may be output using the determined output type.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin, Kim and Schroeter to include the teachings of Mulherkar, the motivation being to improve performance and TTS processing, as the TTS module may revise contents of the TTS storage based on feedback (Mulherkar, Col 27 ll 25-32).


Claims 4 and 11 are rejected over Edrenkin, Kim and Schroeter as applied to claims 1 and 8, and further in view of Junqua (US 20020120450 A1) and Nicolis (US 10319365 B1).

With respect to claims 4 and 11, Junqua teaches wherein the plurality of pieces of parameter information comprises at least one of context information of a user corresponding to the user's speech or context information of the electronic device ([0007] In accordance with yet another aspect of the invention, the previously described speaker dependent parameters and speaker independent parameters may be obtained by decomposing the initial set of parameters into two groups: context independent parameters and context dependent parameters.) [[wherein the processor is further configured to acquire at least one of the context information of the user and the context information of the electronic device based on sensing information acquired from a sensing device.]]
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin, Kim and Schroeter to include the teachings of Junqua, the motivation being that associating context independent parameters with speaker dependent parameters enables excellent personalization with minimal computational burden (Junqua, [0008]).
Edrenkin, Kim, Schroeter and Junqua do not teach wherein the processor is further configured to acquire at least one of the context information of the user and the context information of the electronic device based on sensing information acquired from a sensing device.
Nicolis teaches wherein the processor is further configured to acquire at least one of the context information of the user and the context information of the electronic device based on sensing information acquired from a sensing device (Col 21 ll 31-33 For example, for TTS processing by a global positioning system (GPS) device, the TTS storage 320 may include customized speech specific to location and navigation.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin, Kim, Schroeter and Junqua to include the teachings of Nicolis, the motivation being that, by using GPS, the TTS storage may include customized speech specific to location and navigation (Nicolis, Col 21 ll 27-44).

Claims 5 and 12 are rejected over Edrenkin, Kim and Schroeter as applied to claims 1 and 8, and further in view of Qian (US 20100066742 A1).
With respect to claims 5 and 12, Edrenkin, Kim and Schroeter do not teach a user interface, wherein the processor is further configured to change at least one of the plurality of pieces of parameter information based on a user instruction input through the user interface.
Qian teaches a user interface, wherein the processor is further configured to change at least one of the plurality of pieces of parameter information based on a user instruction input through the user interface ([0010] FIG. 3 is a representation of a graphical interface for interacting with speech output to change prosody.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Edrenkin, Kim and Schroeter to include the teachings of Qian, the motivation being that it is powerful to synthesize speech based on user-specific requirements (Qian, [0002]).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657