DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 12/01/2021. Claims 1, 3-9, and 11-16 are pending and have been examined. 
All previous objections/rejections not mentioned in this Office action have been withdrawn by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on October 23, 2019. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
	Response to Arguments and Amendments
	The applicant amends independent claims 1 and 9 by adding the limitations of what were previously claims 2 and 10, respectively. The applicant further argues that the prior art of record does not specifically teach training a speech correction model based on (1) a synthesized speech feature set, (2) a human speech feature set, and (3) input syntax analysis information for the same learning target text. The Examiner respectfully disagrees with this assertion; the mapping can be found below under McDuff. On page 9 of the Remarks, the applicant cites claim 1 of McDuff when describing the limitation. The Examiner notes that the claim discussed in those paragraphs is not relied on for specific mappings to the applicant’s claims. In McDuff, the learning processor is interpreted to be the “conversational style manager 402.”
Hence, the Applicant’s arguments are not persuasive.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4-9, and 12-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by McDuff (U.S. Patent Application Publication No. 20200279553).
Regarding claim 1, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) comprising memory (114), a processor (112), and a learning processor (112). The memory is configured to store learning target text ([0035] - The labeled dataset may be a collection of text labeled with intent data) and human speech of a person who pronounces the text ([0023] - The memory may store instructions for implementing detection of voice activity, speech recognition, paralinguistic parameter recognition, for processing audio signals generated by the microphone that are representative of detected sound). The processor is configured to generate synthesized speech in which the text is pronounced by synthesized sound ([0045] – A speech synthesizer converts a symbolic linguistic representation of the utterance to be generated by the conversational agent into an audio file or electronic signal that can be provided to the local computing device for output by the speaker) and extracts a synthesized speech feature set including information on a feature pronounced in the synthesized speech ([0045] -  The speech synthesizer may create a completely synthetic voice output such as by use of a model of the vocal tract and other human voice characteristics) and a human speech feature set including information on a feature pronounced in the human speech ([0029] – Output from the voice activity recognizer is also provided to a prosody recognizer that performs paralinguistic parameter recognition of the audio segments that contain voice activity). The learning processor is configured to train a speech correction model ([0064] - conversational style manager) for outputting a corrected speech feature set to allow predetermined synthesized speech to be corrected based on a human pronunciation feature when a synthesized speech feature set extracted from predetermined synthesized speech is input, based on the synthesized speech 
Regarding claim 4, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) wherein the synthesized speech feature set and the human speech feature set include information on at least one of a pitch of speech, a tone of speech, a rate of speech or way of talking of speech ([0029] – The paralinguistic parameters may be extracted using a digital signal processing approach. Paralinguistic parameters extracted by the voice activity recognizer may include, but are not limited to, speech rate, the fundamental frequency (f0), which is perceived by the ear as pitch, and the root mean squared (RMS) energy which reflects the loudness of the speech).
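For illustration only, the two acoustic parameters named in the cited passage, RMS energy and the fundamental frequency (f0), can be computed from a frame of audio samples roughly as sketched below. The sketch is not taken from McDuff or from the application: the function names, the 50-500 Hz search range, and the use of a plain autocorrelation peak for f0 are all assumptions chosen to keep the example short.

```python
import math

def rms_energy(samples):
    """Root mean squared (RMS) energy of one frame; tracks perceived loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate f0 (Hz) as the lag with the largest autocorrelation.

    fmin/fmax are hypothetical bounds on the pitch search range.
    """
    min_lag = int(sample_rate / fmax)
    # Never search a lag longer than the frame itself.
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz for 0.1 s.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
print(rms_energy(tone))         # about 0.707 for a unit-amplitude sine
print(estimate_f0(tone, 8000))  # 200.0
```

A production system would typically window the signal and use a more robust pitch tracker; the autocorrelation peak is used here only because it is the simplest instance of a digital signal processing approach to f0.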

Regarding claim 6, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) that comprises a communication interface ([0097] – embodied conversational agent), a processor (112), and a learning processor (112). The communication interface is configured to receive first text which is a speech synthesis target ([0090] – an alternate source of conversational input from the user, text input, may be received). The processor generates first synthesized speech in which the first text is pronounced by synthesized sound and extracts a first synthesized speech feature set including information on a feature pronounced in the first synthesized speech ([0037] – The dialogue manager captures input from the linguistic style extractor and the custom intent recognizer to generate for dialogue that will be produced by the conversational agent. Thus, the dialogue manager can combine dialogue generated by the neural models of the neural dialogue generator and domain-specific scripted dialogue from the custom intent recognizer). The learning processor inputs the first synthesized speech feature set to the speech correction model ([0064] - conversational style manager)  and acquires first corrected speech feature set to allow the first 
Regarding claim 7, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) wherein the processor corrects the first synthesized speech based on the first corrected speech feature set and generates a second synthesized speech ([0046] – the speech synthesizer will generate synthetic speech which not only provides appropriate response content in response to an utterance of the user but also is modified based on the content variables identified in the user’s utterance).
	Regarding claim 8, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) that comprises a processor (112) and a learning processor (112). The processor extracts first syntax analysis information including information necessary to pronounce the first text ([0038] – The dialogue manager generates a representation of an utterance in computer-readable form. This may be a textual form representing the words to be “spoken” by the conversational agent… Alternatively, the output from the dialogue manager may be provided in a richer format such as… Java Speech Markup Language (JSML)…. JSML defines elements which define a document’s structure, the pronunciation of certain words and phrases, features of speech such as emphasis and intonation, etc.). The learning processor inputs the first synthesized speech feature set and the first syntax analysis information to the speech correction model and acquires the first corrected speech feature set ([0066] – The speech synthesizer converts a symbolic linguistic 
	Regarding claim 9, McDuff teaches (Figure 1) a method of correcting synthesized speech ([0055] – conversational agent system) for an artificial intelligence apparatus (110). This method comprises storing learning target text ([0035] – The labeled dataset may be a collection of text labeled with intent data) and human speech of a person who pronounces the text ([0023] – The memory may store instructions for implementing detection of voice activity, speech recognition, paralinguistic parameter recognition, for processing audio signals generated by the microphone that are representative of detected sound). This method also comprises generating synthesized speech ([0055] – conversational agent system) in which the text is pronounced by synthesized sound ([0045] – A speech synthesizer converts a symbolic linguistic representation of the utterance to be generated by the conversational agent into an audio file or electronic signal that can be provided to the local computing device for output by the speaker) and extracting a synthesized speech feature set including information on a feature pronounced in the synthesized speech ([0045] – The speech synthesizer may create a completely synthetic voice output such as by use of a model of the vocal tract and other human voice characteristics) and a human speech 
	Regarding claim 12, McDuff teaches the synthesized speech feature set and the human speech feature set ([0045] – The speech synthesizer may create a completely synthetic voice […] the fundamental frequency (f0), which is perceived by the ear as pitch, and the root mean squared (RMS) energy which reflects the loudness of the speech).
Regarding claim 13, McDuff teaches the syntax analysis information ([0068] – synthesized speech output), which includes information on at least one of a phoneme included in the text, a position of a phoneme, the number of phonemes ([0068] – a phoneme recognizer receives the synthesized speech output from the speech synthesizer and outputs a corresponding sequence of visual groups of phonemes or visemes), a syllable, a position of a syllable, the number of syllables, a position of a word, the number of words, a position of a phrase, the number of phrases, a position of a stress, a position of an accent, presence/absence of a stress or presence/absence of an accent ([0089] – a linguistic style of speech is determined. The linguistic style may include the content variables and acoustic variables of the speech).
	Regarding claim 14, McDuff teaches receiving first text which is a speech synthesis target ([0090] – an alternate source of conversational input from the user, text input, may be received) and generating first synthesized speech in which the first text is pronounced by synthesized sound and extracting a first synthesized speech feature set including information on a feature pronounced in the first synthesized speech ([0037] – The dialogue manager captures input from the linguistic style extractor and the custom 
	Regarding claim 15, McDuff teaches correction of the first synthesized speech based on the first corrected speech feature set and generating a second synthesized speech ([0046] – the speech synthesizer will generate synthetic speech which not only provides appropriate response content in response to an utterance of the user but also is modified based on the content variables identified in the user’s utterance).
	Regarding claim 16, McDuff teaches the generation of the first synthesized speech feature set which includes extracting first syntax analysis information including information necessary to pronounce the first text ([0038] – The dialogue manager generates a representation of an utterance in computer-readable form. This may be a textual form representing the words to be “spoken” by the conversational agent… Alternatively, the output from the dialogue manager may be provided in a richer format such as… Java Speech Markup Language (JSML)…. JSML defines elements which define a document’s structure, the pronunciation of certain words and phrases, features of speech such as emphasis and intonation, etc.) and the first corrected speech feature set.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over McDuff in view of Jeon (U.S. Patent No. 9697820B2), hereinafter Jeon.
Regarding claim 3, McDuff teaches (Figure 1) an artificial intelligence apparatus (110) wherein the learning processor (112) trains the speech correction model based on a machine learning algorithm or a deep learning algorithm ([0028] – The speech recognizer recognizes words in the electronic signals corresponding to the user’s speech. The speech recognizer may use any suitable algorithm or technique for speech recognition including, but not limited to, a Hidden Markov Model, dynamic time warping (DTW), a neural network, a deep feedforward neural network (DNN), or a recurrent neural network).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McDuff to incorporate the teachings of Jeon to provide an input layer and an output layer for the correction model to use in order to transfer data of the synthesized speech feature set, syntax analysis information, and human speech feature set. Doing so would enable speech segments to be selected based on actual data rather than arbitrarily defined acoustic features that are envisioned as ideal, which results in more natural-sounding synthesized speech (Jeon, Col. 25, Lines 64-67).
Regarding claim 11, McDuff teaches learning ([0064] – conversational agent), which includes training the correction model ([0064] – conversational style manager) based on a machine learning algorithm or a deep learning algorithm ([0028] – The speech recognizer recognizes words in the electronic signals corresponding to the user’s speech. The speech recognizer may use any suitable algorithm or technique for speech recognition including, but not limited to, a Hidden Markov Model, dynamic time warping (DTW), a neural network, a deep feedforward neural network (DNN), or a recurrent neural network).
 Jeon teaches that this model is set to input the synthesized speech feature set and the syntax analysis information to an input layer (Column 25, Lines 7-10 – Input layer can be configured to receive as inputs the set of linguistic features of the current target unit and the set 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified McDuff to incorporate the teachings of Jeon to provide an input layer and an output layer for the correction model to use in order to transfer data of the synthesized speech feature set, syntax analysis information, and human speech feature set. Doing so would enable speech segments to be selected based on actual data rather than arbitrarily defined acoustic features that are envisioned as ideal, which results in more natural-sounding synthesized speech (Jeon, Col. 25, Lines 64-67).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Liu (U.S. Patent No. 10902841) teaches a personalized custom synthetic speech apparatus. Ogawa (U.S. Patent Application Publication No. 20200349932) teaches an oral communication device and computing systems for processing data and outputting oral feedback, and related methods. Peng (U.S. Patent No. 11017761) teaches a parallel neural text-to-speech apparatus. Yang (U.S. Patent No. 11074904) teaches a speech synthesis method and apparatus based on emotion information.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571)272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on N/A.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to 
/ETHAN DANIEL KIM/Examiner, Art Unit 2658       

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658