DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 02/27/2020. Claims 1-16 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	
Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1, 4-9 and 12-16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Y. Y., Wu, C. H., & Huang, Y. F. (2016, September), Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis. In INTERSPEECH (pp. 3176-3180) in view of Tachibana, M., Yamagishi, J., Masuko, T., & Kobayashi, T. (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing. IEICE transactions on information and systems, 88(11), 2484-2491.
Regarding claim 1, Chen teaches a method for controlling a speech style (see Chen, pg. 3176, sect. 1), the method comprising: acquiring audio data having a predetermined speech style (see Chen, pg. 317, sect. 3.1, the MRHSMMs are trained using an emotional corpus containing a large amount of speech utterances for M emotion classes along with transcripts and the corresponding control vectors in CAT emotion/style space as shown in Figure 1. First, the user inputs the text and a vector designating the AV values of the desired emotion in the AV space. Then, the context-dependent labels are analyzed by text analysis; emotion is interpreted as predetermined speech style); generating a condition vector relating to a condition for determining the speech style of the audio data (see Chen, pg. 3178, sect. 3.2, In this study, the defined emotions contain happy, angry, sad, and neutral emotions, and their control vectors are defined as (0, 1, 0), (1, 0, 0), (0, 0, 1), and (0, 0, 0), respectively; this is interpreted as condition vector for speech style/emotion); reducing a dimension of the condition vector to a predetermined reduction dimension (see Chen, pg. 3176-3177, sect. 1, sect. 2.2 and sect. 3.2, Before the transformation, the multidimensional scaling (MDS) is adopted to cope with the non-orthogonal problem by projecting the emotion representations in the AV space and CAT space onto their corresponding orthogonal coordinates. Therefore, the MDS is performed to obtain the vector space with m (m = 2 in this study) orthogonal axes (so-called MDS-AV), in which the distance matrix in the MDS procedure is based on Euclidean distance between every two representative AV vectors; the orthogonal projection and Euclidean distance processing of the emotion representation is interpreted as reducing the condition vector to a predetermined reduction dimension).
However, Chen does not teach acquiring a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension; and changing a vector element value included in the sparse code vector. However, Tachibana teaches acquiring a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension (see Tachibana, pg. 2485, sect. 3.1, Let λ1, λ2,...,λN be models of N representative styles S1, S2,..., SN, and λ˜ be a model of style S˜ obtained by interpolating N representative style models with interpolation weights a1, a2,..., aN, where $\sum_{k=0}^{N} a_k = 1$. Furthermore, suppose that µk and Uk are the mean vector and the covariance matrix of the output pdf of style Sk; the models of N representations are interpreted as the dictionary and the style is interpreted as the sparse code vector having the predetermined reduction dimension); and changing a vector element value included in the sparse code vector (see Tachibana, pg. 1325, sect. 2.2, Speech Synthesis with a Desired Style: In speech synthesis stage, for a given style control vector v, the mean parameters of each synthesis unit, μi and mi, are calculated from (3) and (4). Then synthetic speech is generated in the same manner as the speech synthesis framework based on HMM. Consequently, by setting the style vector to a desired point in the style space, we can change the style expressivity of the synthetic speech; calculating the mean parameters is interpreted as changing the element value, as shown in Tachibana Fig. 1).
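Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).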
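For illustration only, the interpolation relied upon above can be sketched as a weighted combination of per-style mean vectors whose weights sum to 1. The sketch assumes the simplest form, in which the interpolated mean is the weighted sum of the style means; the function name and the numeric values are hypothetical and are not taken from Tachibana, and covariance interpolation is omitted.

```python
from typing import List, Sequence


def interpolate_style_means(
    means: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Return the weighted sum of the style mean vectors (assumed form)."""
    if len(means) != len(weights):
        raise ValueError("one weight is required per style")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    dim = len(means[0])
    if any(len(m) != dim for m in means):
        raise ValueError("all mean vectors must have the same dimension")
    # Element d of the result is sum_k a_k * mu_k[d].
    return [sum(a * m[d] for a, m in zip(weights, means)) for d in range(dim)]


# Hypothetical 2-dimensional mean vectors for three representative styles.
neutral, joyful, sad = [0.0, 0.0], [2.0, 4.0], [-2.0, 1.0]

# Changing the element values of the weight vector changes the resulting style.
halfway = interpolate_style_means([neutral, joyful, sad], [0.5, 0.5, 0.0])
```

Under these assumptions, setting the weights to (0.5, 0.5, 0) yields a mean halfway between the neutral and joyful styles, which is the sense in which a change to one vector element changes the synthesized style.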
Regarding claim 4, Chen and Tachibana teach the method according to claim 1. Tachibana further teaches wherein the changing of the vector element value comprises changing a vector element value for a valid vector element included in the sparse code vector (see Tachibana, pg. 1325 sect. 2.2. Speech Synthesis with a Desired Style In speech synthesis stage, for a given style control vector v, the mean parameters of each synthesis unit, μi and mi, are calculated from (3) and (4). Then synthetic speech is generated in the same manner as the speech synthesis framework based on HMM. Consequently, by setting the style vector to a desired point in the style space, we can change the style expressivity of the synthetic speech; mean parameters calculated is interpreted as changing element value as shown in Tachibana Fig. 1).

[Embedded image: media_image1.png]
Regarding claim 5, Chen and Tachibana teach the method according to claim 1. Tachibana further teaches wherein, when the sparse code vector is plural, the changing of the vector element value comprises changing a vector element value based on a valid vector element included in each of the plurality of sparse code vectors (see Tachibana, pg. 2485, section 3.1 and Fig. 1 shows how the interpolated Style is calculated by different Styles; the styles are interpreted as sparse code vectors).
Regarding claim 6, Chen and Tachibana teach the method according to claim 1. Tachibana further teaches acquiring the condition vector having the predetermined reduction dimension from the sparse code vector having the changed vector element value based on the dictionary vector (see Tachibana, pg. 2485, section 3.1, furthermore, suppose that µk and Uk are the mean vector and the covariance matrix of the output pdf of style Sk, and that µ˜ and U˜ are the mean vector and the covariance matrix of the output pdf for the interpolated style S˜, respectively; µk is interpreted as the condition vector having predetermined reduction dimension); and acquiring the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension (see Tachibana, pg. 2485, section 3.1, let λ1, λ2,...,λN be models of N representative styles S1, S2,..., SN, and λ˜ be a model of style S˜ obtained by interpolating N representative style models with interpolation weights a1, a2,..., aN, where $\sum_{k=0}^{N} a_k = 1$; λ1 is interpreted as the condition vector with the extended dimension).
Regarding claim 7, Chen and Tachibana teach the method according to claim 6. Tachibana further teaches acquiring a prosody vector representing each of at least one speech style (see Tachibana, pg. 2484, sect. 1, In the same way as the style modeling , we refer to one of the emotional expressions or speaking styles as the style ; style is interpreted as the prosody vector representing one speech style); and generating a prosody embedding vector having a changed speech style using the prosody vector and the condition vector having the changed condition for determining the speech style (see Tachibana, pg. 2484, sect. 1, we choose four representative styles, i.e., neutral, joyful, sad, and rough styles in read speech and synthesize speech from models obtained by interpolating two models for every combination of two styles).
Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).
Regarding claim 8, Chen and Tachibana teach the method according to claim 7. Tachibana further teaches acquiring text data (see Tachibana, pg. 2485, sect. 2, We utilize an HMM-based TTS system in this study as the platform for the style interpolation approach. (TTS : Text to Speech)); and generating a synthesized speech based on the text data and the prosody embedding vector (see Tachibana, pg. 2490, sect. 5 We have investigated a technique for synthesizing speech with an intermediate style by applying model interpolation techniques. The results of subjective evaluation tests, we have shown that we can add various emotional expressions).
Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).
	Regarding claim 9, Chen teaches an artificial intelligence device comprising:
a memory configured to store audio data having a predetermined speech style (see Chen, pg. 317, sect. 3.1, the MRHSMMs are trained using an emotional corpus containing a large amount of speech utterances for M emotion classes along with transcripts and the corresponding control vectors in CAT emotion/style space as shown in Figure 1. First, the user inputs the text and a vector designating the AV values of the desired emotion in the AV space. Then, the context-dependent labels are analyzed by text analysis; emotion is interpreted as predetermined speech style); and a processor configured to: generate a condition vector relating to a condition for determining the speech style of the audio data (see Chen, pg. 3178, sect. 3.2, In this study, the defined emotions contain happy, angry, sad, and neutral emotions, and their control vectors are defined as (0, 1, 0), (1, 0, 0), (0, 0, 1), and (0, 0, 0), respectively; this is interpreted as condition vector for speech style/emotion); reduce a dimension of the condition vector to a predetermined reduction dimension (see Chen, pg. 3176-3177, sect. 1, sect. 2.2 and sect. 3.2, Before the transformation, the multidimensional scaling (MDS) is adopted to cope with the non-orthogonal problem by projecting the emotion representations in the AV space and CAT space onto their corresponding orthogonal coordinates. Therefore, the MDS is performed to obtain the vector space with m (m = 2 in this study) orthogonal axes (so-called MDS-AV), in which the distance matrix in the MDS procedure is based on Euclidean distance between every two representative AV vectors; the orthogonal projection and Euclidean distance processing of the emotion representation is interpreted as reducing the condition vector to a predetermined reduction dimension).
However, Chen does not teach acquire a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension; and change a vector element value included in the sparse code vector. However, Tachibana teaches acquire a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension (see Tachibana, pg. 2485, sect. 3.1, Let λ1, λ2,...,λN be models of N representative styles S1, S2,..., SN, and λ˜ be a model of style S˜ obtained by interpolating N representative style models with interpolation weights a1, a2,..., aN, where $\sum_{k=0}^{N} a_k = 1$. Furthermore, suppose that µk and Uk are the mean vector and the covariance matrix of the output pdf of style Sk; the models of N representations are interpreted as the dictionary and the style is interpreted as the sparse code vector having the predetermined reduction dimension); and change a vector element value included in the sparse code vector (see Tachibana, pg. 1325, sect. 2.2, Speech Synthesis with a Desired Style: In speech synthesis stage, for a given style control vector v, the mean parameters of each synthesis unit, μi and mi, are calculated from (3) and (4). Then synthetic speech is generated in the same manner as the speech synthesis framework based on HMM. Consequently, by setting the style vector to a desired point in the style space, we can change the style expressivity of the synthetic speech; calculating the mean parameters is interpreted as changing the element value, as shown in Tachibana Fig. 1).
Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).
	Regarding claim 12, Chen and Tachibana teach the artificial intelligence device according to claim 9, wherein the processor is configured to change a vector element value for a valid vector element included in the sparse code vector(see Tachibana, pg. 1325 sect. 2.2. Speech Synthesis with a Desired Style In speech synthesis stage, for a given style control vector v, the mean parameters of each synthesis unit, μi and mi, are calculated from (3) and (4). Then synthetic speech is generated in the same manner as the speech synthesis framework based on HMM. Consequently, by setting the style vector to a desired point in the style space, we can change the style expressivity of the synthetic speech; mean parameters calculated is interpreted as changing element value as shown in Tachibana Fig. 1).

[Embedded image: media_image1.png]
	Regarding claim 13, Chen and Tachibana teach the artificial intelligence device according to claim 9, wherein when the sparse code vector is plural, the processor is configured to change a vector element value based on a valid vector element included in each of the plurality of sparse code vectors (see Tachibana, pg. 2485, section 3.1 and Fig. 1 shows how the interpolated Style is calculated by different Styles; the styles are interpreted as sparse code vectors).
Regarding claim 14, Chen and Tachibana teach the artificial intelligence device according to claim 9. Tachibana further teaches acquire the condition vector having the predetermined reduction dimension from the sparse code vector having the changed vector element value based on the dictionary vector (see Tachibana, pg. 2485, section 3.1, furthermore, suppose that µk and Uk are the mean vector and the covariance matrix of the output pdf of style Sk, and that µ˜ and U˜ are the mean vector and the covariance matrix of the output pdf for the interpolated style S˜, respectively; µk is interpreted as the condition vector having predetermined reduction dimension); and acquire the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension (see Tachibana, pg. 2485, section 3.1, let λ1, λ2,...,λN be models of N representative styles S1, S2,..., SN, and λ˜ be a model of style S˜ obtained by interpolating N representative style models with interpolation weights a1, a2,..., aN, where $\sum_{k=0}^{N} a_k = 1$; λ1 is interpreted as the condition vector with the extended dimension).
Regarding claim 15, Chen and Tachibana teach the artificial intelligence device according to claim 14. Tachibana further teaches to acquire a prosody vector representing each of at least one speech style (see Tachibana, pg. 2484, sect. 1, In the same way as the style modeling , we refer to one of the emotional expressions or speaking styles as the style ; style is interpreted as the prosody vector representing one speech style); and generate a prosody embedding vector having a changed speech style using the prosody vector and the condition vector having the changed condition for determining the speech style (see Tachibana, pg. 2484, sect. 1, we choose four representative styles, i.e., neutral, joyful, sad, and rough styles in read speech and synthesize speech from models obtained by interpolating two models for every combination of two styles).
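Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).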
Regarding claim 16, Chen and Tachibana teach the artificial intelligence device according to claim 15. Tachibana further teaches to acquire text data (see Tachibana, pg. 2485, sect. 2, We utilize an HMM-based TTS system in this study as the platform for the style interpolation approach. (TTS : Text to Speech)); and generate a synthesized speech based on the text data and the prosody embedding vector (see Tachibana, pg. 2490, sect. 5 We have investigated a technique for synthesizing speech with an intermediate style by applying model interpolation techniques. The results of subjective evaluation tests, we have shown that we can add various emotional expressions).
Chen and Tachibana are considered to be analogous to the claimed invention because they relate to generating speech with emotional expressivity and speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen on style modeling and style adapting by using the method of Tachibana of synthesizing speech with an intermediate style obtained from different styles by applying model interpolation techniques, to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).
Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Y. Y., Wu, C. H., & Huang, Y. F. (2016, September), Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis. In INTERSPEECH (pp. 3176-3180) in view of Tachibana, M., Yamagishi, J., Masuko, T., & Kobayashi, T. (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing. IEICE transactions on information and systems, 88(11), 2484-2491 further in view of Junqua et.al. (US Patent 6,970,820).
Regarding claim 2, Chen and Tachibana teach the method according to claim 1, but fail to teach reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector. However, Junqua teaches reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector (see Junqua, col. 6, lines 18-25, Next, at step 72, a dimensionality reduction process is performed. Principal Component Analysis (PCA) is one such reduction technique. The reduction process generates an eigenspace 74, having a dimensionality that is low compared with the supervectors used to construct the eigenspace. The eigenspace thus represents a reduced-dimensionality vector space to which the context-independent parameters of all training speakers are confined).
Chen, Tachibana and Junqua are considered to be analogous to the claimed invention because they relate to generating speech with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen and Tachibana on style modeling and style adapting by using the speaker parameter adaptation process of Junqua, which uses minimal data, to reduce the amount of enrollment data (see Junqua, col. 3, lines 4-15).
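For illustration only, a PCA-style dimensionality reduction of the kind quoted from Junqua can be sketched as follows. The sketch assumes a reduction to a single dimension, estimating the leading eigenvector of the sample covariance matrix by power iteration; the function name and the data are hypothetical, and further components (for a larger predetermined reduction dimension) are omitted.

```python
from typing import List, Sequence, Tuple


def pca_first_component(
    data: Sequence[Sequence[float]], iters: int = 100
) -> Tuple[List[float], List[float]]:
    """Return (principal axis, 1-D scores) for the rows of `data`."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[d] for row in data) / n for d in range(dim)]
    centered = [[row[d] - mean[d] for d in range(dim)] for row in data]
    # Sample covariance matrix of the centered vectors (dim x dim).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Power iteration from a unit-length start vector (assumed not to be
    # orthogonal to the leading eigenvector).
    axis = [dim ** -0.5] * dim
    for _ in range(iters):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    # Project each centered vector onto the axis: the reduced representation.
    scores = [sum(r[d] * axis[d] for d in range(dim)) for r in centered]
    return axis, scores


# Hypothetical 2-dimensional condition vectors lying on the line y = 2x.
vectors = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axis, scores = pca_first_component(vectors)
```

Because the hypothetical vectors lie on a single line, one coordinate per vector retains all of their variance, which is the sense in which the eigenspace has a dimensionality that is low compared with the original vectors.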
Regarding claim 10, Chen and Tachibana teach the artificial intelligence device according to claim 9, but fail to teach to reduce the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector. However, Junqua teaches to reduce the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector (see Junqua, col. 6, lines 18-25, Next, at step 72, a dimensionality reduction process is performed. Principal Component Analysis (PCA) is one such reduction technique. The reduction process generates an eigenspace 74, having a dimensionality that is low compared with the supervectors used to construct the eigenspace. The eigenspace thus represents a reduced-dimensionality vector space to which the context-independent parameters of all training speakers are confined).
Chen, Tachibana and Junqua are considered to be analogous to the claimed invention because they relate to generating speech with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen and Tachibana on style modeling and style adapting by using the speaker parameter adaptation process of Junqua, which uses minimal data, to reduce the amount of enrollment data (see Junqua, col. 3, lines 4-15).
Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Y. Y., Wu, C. H., & Huang, Y. F. (2016, September), Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis. In INTERSPEECH (pp. 3176-3180) in view of Tachibana, M., Yamagishi, J., Masuko, T., & Kobayashi, T. (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing. IEICE transactions on information and systems, 88(11), 2484-2491 further in view of M. J. Gangeh, P. Fewzee, A. Ghodsi, M. S. Kamel and F. Karray, "Multiview Supervised Dictionary Learning in Speech Emotion Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1056-1068, June 2014.
Regarding claim 3, Chen and Tachibana teach the method according to claim 1, but fail to teach acquiring a plurality of pieces of audio training data for the sparse dictionary coding; acquiring condition training vectors relating to the condition for determining the speech style with respect to the plurality of pieces of audio training data; reducing the dimension of each of the condition training vectors to the predetermined reduction dimension; and acquiring a dictionary vector and a sparse representation coefficient vector, which are capable of acquiring the condition training vector, through sparse coding. However, Gangeh teaches acquiring a plurality of pieces of audio training data for the sparse dictionary coding (see Gangeh, pg. 1060, sect. IV, Although dozens of emotional speech databases have been collected in the past few years, not all could attract the attention of the research community. SEMAINE, however, has been one of the most well-received databases; pg. 1058, algorithm 1, Input); acquiring condition training vectors relating to the condition for determining the speech style with respect to the plurality of pieces of audio training data (see Gangeh, pg. 1058, algorithm 1, step 4, S λ interpreted as conditional training vector); reducing the dimension of each of the condition training vectors to the predetermined reduction dimension (see Gangeh, pg. 1058, algorithm 1, step 4, Compute Training Coefficients; interpreted as condition training vector reduced to a predetermined reduction dimension); and acquiring a dictionary vector and a sparse representation coefficient vector, which are capable of acquiring the condition training vector, through sparse coding (see Gangeh, pg. 1058, algorithm 1, Output Dictionary vector, Training Coefficients is interpreted as the sparse representation coefficient vector).
[Embedded image: media_image2.png]
Chen, Tachibana and Gangeh are considered to be analogous to the claimed invention because they relate to speech processing with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen and Tachibana on style modeling and style adapting by using the dictionary learning and sparse representation techniques of Gangeh to improve speech emotion recognition (see Gangeh, pg. 1056, sect. 1).
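For illustration only, the claimed relationship between a dictionary, a sparse code vector, and a changed vector element value can be sketched as follows. The sketch assumes a fixed dictionary of unit-norm atoms and uses greedy matching pursuit as a stand-in sparse coder; it is not the dictionary learning algorithm of Gangeh, and all names and values are hypothetical.

```python
from typing import List, Sequence


def matching_pursuit(
    x: Sequence[float],
    atoms: Sequence[Sequence[float]],
    max_atoms: int = 2,
    tol: float = 1e-12,
) -> List[float]:
    """Greedy sparse code of `x` over unit-norm `atoms` (assumed coder)."""
    residual = list(x)
    code = [0.0] * len(atoms)
    for _ in range(max_atoms):
        # Correlation of the current residual with every atom.
        dots = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(dots[k]))
        if abs(dots[best]) <= tol:
            break
        code[best] += dots[best]
        residual = [r - dots[best] * a for r, a in zip(residual, atoms[best])]
    return code


def reconstruct(code: Sequence[float], atoms: Sequence[Sequence[float]]) -> List[float]:
    """Recover a vector as the code-weighted sum of the dictionary atoms."""
    dim = len(atoms[0])
    return [sum(c * atom[d] for c, atom in zip(code, atoms)) for d in range(dim)]


# Hypothetical dictionary of four unit-norm atoms in three dimensions.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
condition = [0.0, 3.0, 0.0]  # hypothetical reduced condition vector

code = matching_pursuit(condition, dictionary)  # one non-zero element
edited = list(code)
edited[1] = 1.5  # change the value of the single valid (non-zero) element
changed_condition = reconstruct(edited, dictionary)
```

Under these assumptions the code has a single non-zero element, and changing that element and reconstructing through the dictionary yields a changed condition vector of the same reduced dimension.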
Regarding claim 11, Chen and Tachibana teach the artificial intelligence device according to claim 9, but fail to teach to acquire a plurality of pieces of audio training data for the sparse dictionary coding; acquire condition training vectors relating to the condition for determining the speech style with respect to the plurality of pieces of audio training data; reduce the dimension of each of the condition training vectors to the predetermined reduction dimension; and acquire a dictionary vector and a sparse representation coefficient vector, which are capable of acquiring the condition training vector, through sparse coding. However, Gangeh teaches to acquire a plurality of pieces of audio training data for the sparse dictionary coding (see Gangeh, pg. 1060, sect. IV, Although dozens of emotional speech databases have been collected in the past few years, not all could attract the attention of the research community. SEMAINE, however, has been one of the most well-received databases; pg. 1058, algorithm 1, Input); acquire condition training vectors relating to the condition for determining the speech style with respect to the plurality of pieces of audio training data (see Gangeh, pg. 1058, algorithm 1, step 4, S λ interpreted as conditional training vector); reduce the dimension of each of the condition training vectors to the predetermined reduction dimension (see Gangeh, pg. 1058, algorithm 1, step 4, Compute Training Coefficients; interpreted as condition training vector reduced to a predetermined reduction dimension); and acquire a dictionary vector and a sparse representation coefficient vector, which are capable of acquiring the condition training vector, through sparse coding (see Gangeh, pg. 1058, algorithm 1, Output Dictionary vector, Training Coefficients is interpreted as the sparse representation coefficient vector).
[Embedded image: media_image2.png]
Chen, Tachibana and Gangeh are considered to be analogous to the claimed invention because they relate to speech processing with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chen and Tachibana on style modeling and style adapting by using the dictionary learning and sparse representation techniques of Gangeh to improve speech emotion recognition (see Gangeh, pg. 1056, sect. 1).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Makoto Tachibana, Shinsuke Izawa, Takashi Nose and Takao Kobayashi, "Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis," 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 4633-4636, teaches adapting the average voice model to a target speaker’s styles using a technique for simultaneous adaptation of speaker and style (see Makoto, Fig. 1 and sect. 3).
Lin et al., US Patent Application Publication 2012/0166198, teaches a prosody re-estimation system that reduces the prosody difference between TTS synthesized speech and recorded speech in order to generate synthesized speech with higher naturalness (see Lin, Fig. 4 and [0032]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 2:00pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656

/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656