DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/07/2022 has been entered.
Claims 1-5, 9-13 and 17-18 are pending in the application and have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The response filed on 09/07/2022 has been entered and considered in this Office Action. Claims 1-5, 9-13 and 17-18 have been examined, including new claims 17-18.
Response to Arguments
Applicant's arguments filed 09/07/2022 have been fully considered as follows:
Applicant’s arguments with respect to amended claims 1 and 9, on pg. 10, state that:
“Applicant submits that none of the applied references disclose or suggest Applicant's clarified features of: a) the speech style is changed based on the sparse code vector mixed a first valid vector element value determining first emotion and a second valid vector element value determining a second emotion different from the first emotion, and b) the second valid vector element value is different from the first valid vector element value.”
	
Applicant’s arguments above with respect to amended claims 1 and 9 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
With respect to the art rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are discussed or argued for at least the same rationale presented in the Remarks filed 09/07/2022, Examiner respectfully notes as follows. For completeness, should the dependent claims likewise be traversed for reasons similar to those given for independent claims 1 and 9, Examiner respectfully directs Applicant to the reasons provided above in the response directed to claim 1. For at least those same reasons, Examiner likewise respectfully disagrees. Applicant's arguments have been fully considered but are not persuasive, and therefore the rejections of the claims under 35 U.S.C. 103 are sustained and updated accordingly.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 3-5, 9, 11-13 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over M. J. Gangeh, P. Fewzee, A. Ghodsi, M. S. Kamel and F. Karray, "Multiview Supervised Dictionary Learning in Speech Emotion Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1056-1068, June 2014, in view of Tachibana, M., Yamagishi, J., Masuko, T., & Kobayashi, T. (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing, IEICE Transactions on Information and Systems, 88(11), 2484-2491, further in view of Yang et al., US Patent Application Publication 2020/0035215.
Regarding claim 1, Gangeh teaches a method for generating a synthesized speech having a different speech style, the method comprising: acquiring audio data having (see Gangeh, pg. 1059, sect. III, Given the speech signal, there are two major phases into a solution for speech emotion recognition: 1) extraction of low-level descriptors (LLDs) (acoustic features) from speech, and 2) statistical modeling); generating a condition vector relating to a condition for determining the speech style of the audio data (see Gangeh, pg. 1058, algorithm 1, step 4, Sλ interpreted as the condition vector); reducing a dimension of the condition vector to a predetermined reduction dimension (see Gangeh, pg. 1058, algorithm 1, step 4, Compute Training Coefficients; interpreted as the condition training vector reduced to a predetermined reduction dimension); acquiring a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension (see Gangeh, pg. 1058, algorithm 1, step 4, Compute Training Coefficients; interpreted as the condition training vector reduced to a predetermined reduction dimension, and Output Dictionary vector, Training Coefficients is interpreted as the sparse representation coefficient vector); wherein the reduced dimension of the condition vector is determined by discarding an eigen vector of the dimension with variance smaller than a reference variance based on the order of the eigen values (see Gangeh, pg. 1058, sect. IIB, According to the Rayleigh-Ritz Theorem [32], the solution for the optimization problem given in (4) is the corresponding eigenvectors of the top eigenvalues of [equation image: media_image1.png]); the acquired (see Gangeh, pg. 1058, sect. IIB, After finding the dictionary, the sparse coefficients can be computed using the formulation given in (2)); and the sparse code vector includes a plurality of vector element values having at least one valid vector element value and the remainder 0 (see Gangeh, pg. 1058, sect. IIB, where Sλ (interpreted as the sparse vector) is defined as follows: [equation image: media_image2.png]).
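For illustration only, the following Python sketch shows how a thresholding step can yield a code vector with a few non-zero ("valid") element values and zeros elsewhere, as recited in the last limitation above. It is a generic soft-thresholding example under stated assumptions, not the formula shown in the cited Gangeh image: the dictionary `D`, the input `x`, the threshold `lam`, and the function names are all hypothetical, and the dictionary columns are assumed to be orthonormal.

```python
import numpy as np

def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries with |value| <= lam become 0."""
    return np.sign(values) * np.maximum(np.abs(values) - lam, 0.0)

def sparse_code(dictionary, x, lam):
    """Sparse code of x over a dictionary with orthonormal columns (assumption).

    Under that assumption, minimizing 0.5*||x - D a||^2 + lam*||a||_1 is solved
    by projecting x onto the dictionary and soft-thresholding the result.
    """
    return soft_threshold(dictionary.T @ x, lam)

# Hypothetical data: three orthonormal atoms in a 4-dimensional space.
D = np.eye(4)[:, :3]
x = np.array([0.9, 0.05, -0.6, 0.3])
code = sparse_code(D, x, lam=0.1)

# Count the non-zero ("valid") element values; the remaining entries are 0.
num_valid = int(np.count_nonzero(code))
```

The small projection (0.05) falls below the threshold and is set to zero, so only the larger coefficients remain as valid element values.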
However, Gangeh does not teach changing a vector element value included in the sparse code vector; acquiring the condition vector having the predetermined reduction dimension from the sparse code vector having the changed vector element value based on the dictionary vector; acquiring the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension; acquiring a prosody vector representing each of at least one speech style; generating a prosody embedding vector having a changed speech style using the prosody vector and the condition vector having the changed condition for determining the speech style; acquiring text data; and generating a synthesized speech based on the text data and the prosody embedding vector; wherein the speech style is changed based on the sparse code vector mixed a first valid vector element value determining first emotion and a second valid vector element value determining a second emotion different from the first emotion, and the second valid vector element value is different from the first valid vector element value.
However, Tachibana teaches changing a vector element value included in the sparse code vector (see Tachibana, pg. 1325 sect. 2.2. Speech Synthesis with a Desired Style: In speech synthesis stage, for a given style control vector v, the mean parameters of each synthesis unit, μi and mi, are calculated from (3) and (4). Then synthetic speech is generated in the same manner as the speech synthesis framework based on HMM. Consequently, by setting the style vector to a desired point in the style space, we can change the style expressivity of the synthetic speech; the calculated mean parameters are interpreted as the changed element value, as shown in Tachibana Fig. 1); acquiring the condition vector having the predetermined reduction dimension from the sparse code vector having the changed vector element value based on the dictionary vector (see Tachibana, pg. 2485, section 3.1, furthermore, suppose that µk and Uk are the mean vector and the covariance matrix of the output pdf of style Sk, and that ˜µ and U˜ are the mean vector and the covariance matrix of the output pdf for the interpolated style S˜, respectively; µk is interpreted as the condition vector having the predetermined reduction dimension); acquiring the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension (see Tachibana, pg. 2485, section 3.1, let λ1, λ2,...,λN be models of N representative styles S1, S2,..., SN, and λ˜ be a model of style S˜ obtained by interpolating N representative style models with interpolation weights a1, a2,..., aN, where \sum_{k=1}^{N} a_k = 1; λ1 is interpreted as the condition vector with the extended dimension); acquiring a prosody vector representing each of at least one speech style (see Tachibana, pg. 2484, sect. 1, In the same way as the style modeling, we refer to one of the emotional expressions or speaking styles as the style; style is interpreted as the prosody vector representing one speech style); generating a prosody embedding vector having a changed speech style using the prosody vector and the condition vector having the changed condition for determining the speech style (see Tachibana, pg. 2484, sect. 1, we choose four representative styles, i.e., neutral, joyful, sad, and rough styles in read speech and synthesize speech from models obtained by interpolating two models for every combination of two styles); acquiring text data (see Tachibana, pg. 2485, sect. 2, We utilize an HMM-based TTS system in this study as the platform for the style interpolation approach (TTS: Text to Speech)); and generating a synthesized speech based on the text data and the prosody embedding vector (see Tachibana, pg. 2490, sect. 5, We have investigated a technique for synthesizing speech with an intermediate style by applying model interpolation techniques. The results of subjective evaluation tests, we have shown that we can add various emotional expressions).
Gangeh and Tachibana are considered to be analogous to the claimed invention because they relate to speech processing with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gangeh on speech emotion recognition using dictionary learning and sparse representation techniques with the different speech styles obtained by applying the model interpolation techniques of Tachibana, in order to change the speaking styles and emotional expressions of synthetic speech while maintaining its naturalness (see Tachibana, pg. 2484, sect. 1).
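For illustration only, the interpolation of representative style models with weights a1, ..., aN that sum to 1, as quoted from Tachibana sect. 3.1 above, can be sketched in Python for the mean vectors alone. The function name, the two-dimensional mean vectors, and the weights are hypothetical, and the sketch does not show how covariance matrices would be combined.

```python
import numpy as np

def interpolate_style_means(means, weights):
    """Weighted sum of per-style mean vectors; weights must sum to 1."""
    means = np.asarray(means, dtype=float)      # shape (N, dim)
    weights = np.asarray(weights, dtype=float)  # shape (N,)
    if means.shape[0] != weights.shape[0]:
        raise ValueError("one weight is required per style")
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("interpolation weights must sum to 1")
    return weights @ means

# Hypothetical mean vectors for two representative styles.
mu_neutral = [1.0, 2.0]
mu_joyful = [3.0, 6.0]

# 75% neutral and 25% joyful gives an intermediate style mean.
mu_mixed = interpolate_style_means([mu_neutral, mu_joyful], [0.75, 0.25])
```

Moving the weights toward one style moves the interpolated mean toward that style's mean, which is the sense in which the style of the synthesized speech is changed.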
However, Gangeh in view of Tachibana does not teach wherein the speech style is changed based on the sparse code vector mixed a first valid vector element value determining first emotion and a second valid vector element value determining a second emotion different from the first emotion, and the second valid vector element value is different from the first valid vector element value.

[Figure image: media_image3.png]
However, Yang teaches wherein the speech style is changed based on the sparse code vector mixed a first valid vector element value determining first emotion and a second valid vector element value determining a second emotion different from the first emotion, and the second valid vector element value is different from the first valid vector element value (see Yang, Fig. 17, where EVs and EVc are the first emotion and the second emotion, respectively, used to create the speech style change vector EV).
Gangeh, Tachibana and Yang are considered to be analogous to the claimed invention because they relate to speech processing with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gangeh and Tachibana on speech emotion recognition using dictionary learning and sparse representation techniques with different speech styles by applying the emotion merging techniques of Yang, in order to synthesize speech in which the intention or emotion of the user who actually delivered the text is reflected in the speech output (see Yang, [0003]).
Regarding claim 3, Gangeh in view of Tachibana further in view of Yang teaches the method according to claim 1. Gangeh further teaches acquiring a plurality of pieces of audio training data for the sparse dictionary coding (see Gangeh, pg. 1060, sect. IV, Although dozens of emotional speech databases have been collected in the past few years, not all could attract the attention of the research community. SEMAINE, however, has been one of the most well-received databases; pg. 1058, algorithm 1, Input); acquiring condition training vectors relating to the condition for determining the speech style with respect to the plurality of pieces of audio training data (see Gangeh, pg. 1058, algorithm 1, step 4, Sλ interpreted as the condition training vector); reducing the dimension of each of the condition training vectors to the predetermined reduction dimension (see Gangeh, pg. 1058, algorithm 1, step 4, Compute Training Coefficients; interpreted as the condition training vector reduced to a predetermined reduction dimension); and acquiring a dictionary vector and a sparse representation coefficient vector, which are capable of acquiring the condition training vector, through sparse coding (see Gangeh, pg. 1058, algorithm 1, Output Dictionary vector, Training Coefficients is interpreted as the sparse representation coefficient vector).
[Figure image: media_image4.png]
Regarding claim 4, Gangeh in view of Tachibana further in view of Yang teaches the method according to claim 1. Yang further teaches wherein the changing of the vector element value comprises changing the valid vector element value included in the sparse code vector (see Yang, Fig. 17 and [0294], which discusses that the first weight Ws can be applied to the first emotion vector EVs, the second weight Wc can be applied to the second emotion vector EVc, and the two emotion vectors can be summed. As a result, vector values respectively corresponding to a plurality of emotion items constituting the emotion vector EV can be calculated, as illustrated in FIG. 17. For example, “neutral” 0.0, “love” 0.84, “happy” 0.16, “anger” 0.0 and “sad” 0.0 can be calculated).
Regarding claim 5, Gangeh in view of Tachibana further in view of Yang teaches the method according to claim 1. Yang further teaches wherein, when the sparse code vector is plural, the changing of the vector element value comprises changing the valid vector element value included in each of the plurality of sparse code vectors (see Yang, Fig. 17 and [0294]-[0295]: The emotion determination module can transmit the calculated vector values to the speech synthesis engine. For example, vector values corresponding to the plurality of emotion items can be transmitted in a sequence form or a one-dimensional matrix form to the speech synthesis engine. For example, the vector values can be transmitted in the form of [0.0, 0.84, 0.16, 0.0, 0.0]).
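For illustration only, the weighted combination cited from Yang Fig. 17 for claims 4 and 5 (weight Ws applied to EVs, weight Wc applied to EVc, then summed) can be sketched in Python. The input vectors and weights below are hypothetical values chosen so that the sum matches the example output [0.0, 0.84, 0.16, 0.0, 0.0] quoted above; they are not taken from Yang, and the function name is likewise an assumption.

```python
import numpy as np

# Emotion items in the order quoted from Yang [0294].
EMOTIONS = ("neutral", "love", "happy", "anger", "sad")

def mix_emotion_vectors(ev_s, ev_c, w_s, w_c):
    """Return w_s * ev_s + w_c * ev_c as a one-dimensional emotion vector."""
    ev_s = np.asarray(ev_s, dtype=float)
    ev_c = np.asarray(ev_c, dtype=float)
    if ev_s.shape != ev_c.shape:
        raise ValueError("emotion vectors must have the same length")
    return w_s * ev_s + w_c * ev_c

# Hypothetical inputs (assumptions, not values disclosed in Yang).
ev_s = [0.0, 0.6, 0.4, 0.0, 0.0]  # first emotion vector EVs
ev_c = [0.0, 1.0, 0.0, 0.0, 0.0]  # second emotion vector EVc
ev = mix_emotion_vectors(ev_s, ev_c, w_s=0.4, w_c=0.6)

# Pair each emotion item with its mixed value for inspection.
ev_by_item = dict(zip(EMOTIONS, ev.tolist()))
```

In this sketch the mixed vector has two non-zero element values ("love" and "happy") that differ from each other, with the remaining elements equal to 0.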
Claim 9 is directed to an artificial intelligence device corresponding to the method of claim 1 and is rejected on the same grounds stated above regarding claim 1.
Claim 11 is directed to an artificial intelligence device corresponding to the method of claim 3 and is rejected on the same grounds stated above regarding claim 3.
Claim 12 is directed to an artificial intelligence device corresponding to the method of claim 4 and is rejected on the same grounds stated above regarding claim 4.
Claim 13 is directed to an artificial intelligence device corresponding to the method of claim 5 and is rejected on the same grounds stated above regarding claim 5.
Regarding claim 17, Gangeh in view of Tachibana further in view of Yang teaches the method according to claim 1. Gangeh further teaches wherein the reduced dimension of the condition vector corresponds to a pre-set low loss rate (see Gangeh, pg. 1056, The main goal in classical dictionary learning and sparse representation (DLSR) is to decompose the data over a few dictionary atoms by minimizing a loss function, as shown in eqn. 1).
Claim 18 is directed to an artificial intelligence device corresponding to the method of claim 17 and is rejected on the same grounds stated above regarding claim 17.
Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over M. J. Gangeh, P. Fewzee, A. Ghodsi, M. S. Kamel and F. Karray, "Multiview Supervised Dictionary Learning in Speech Emotion Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1056-1068, June 2014, in view of Tachibana, M., Yamagishi, J., Masuko, T., & Kobayashi, T. (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing, IEICE Transactions on Information and Systems, 88(11), 2484-2491, further in view of Yang et al., US Patent Application Publication 2020/0035215, further in view of Junqua et al. (US Patent 6,970,820).
Regarding claim 2, Gangeh in view of Tachibana further in view of Yang teaches the method according to claim 1, but fails to teach reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector.
However, Junqua teaches reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector (see Junqua, col. 6, lines 18-25, Next, at step 72, a dimensionality reduction process is performed. Principal Component Analysis (PCA) is one such reduction technique. The reduction process generates an eigenspace 74, having a dimensionality that is low compared with the supervectors used to construct the eigenspace. The eigenspace thus represents a reduced-dimensionality vector space to which the context-independent parameters of all training speakers are confined).
Gangeh, Tachibana, Yang and Junqua are considered to be analogous to the claimed invention because they relate to speech processing with speaking style variability. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Gangeh, Tachibana and Yang on style modeling and style adaptation with the dimensionality reduction used in the speaker parameter adaptation process of Junqua, which operates on minimal data, in order to reduce the amount of enrollment data (see Junqua, col. 3, lines 4-15).
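For illustration only, the PCA-based reduction discussed for claim 2, together with the claim 1 limitation of discarding eigenvectors whose variance is smaller than a reference variance, can be sketched in Python with NumPy. The function names, the sample matrix `X`, and the threshold value are hypothetical; the sketch does not reproduce the procedure of Junqua or of the application.

```python
import numpy as np

def pca_reduce(X, reference_variance):
    """Project rows of X onto eigenvectors whose variance >= reference_variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= reference_variance     # discard low-variance directions
    basis = eigvecs[:, keep]
    return centered @ basis, basis, mean

def pca_restore(Z, basis, mean):
    """Extend reduced vectors back to the original dimension."""
    return Z @ basis.T + mean

# Hypothetical condition vectors: the third column carries almost no variance.
X = np.array([
    [ 2.0,  0.0,  0.01],
    [-2.0,  0.0, -0.01],
    [ 0.0,  1.0,  0.01],
    [ 0.0, -1.0, -0.01],
])
Z, basis, mean = pca_reduce(X, reference_variance=0.1)
X_restored = pca_restore(Z, basis, mean)
```

Here the three-dimensional vectors are reduced to two dimensions, and `pca_restore` shows the reverse step of extending a reduced vector back to the original dimension.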
Claim 10 is directed to an artificial intelligence device corresponding to the method of claim 2 and is rejected on the same grounds stated above regarding claim 2.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Makoto Tachibana, Shinsuke Izawa, Takashi Nose and Takao Kobayashi, "Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis," 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 4633-4636, teaches adapting the average voice model to a target speaker’s styles using a technique for simultaneous adaptation of speaker and style (see Makoto, Fig. 1 and sect. 3).
Lin et al., US Patent Application Publication 2012/0166198, teaches a prosody re-estimation system that reduces the prosody difference between TTS synthesized speech and recorded speech in order to generate synthesized speech with higher naturalness (see Lin, Fig. 4 and [0032]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached at (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656                                                                                                                                                                                                        
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656