DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 5, 2021 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-6, and 8-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, 19-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann et al. (US 2016/0372110 A1, “Waltermann”) in view of Martinez (US 2014/0236596 A1).
As to claims 1 and 19-20, Waltermann discloses an intelligent interactive method (para. 0014-0018, 0029), comprising: 
obtaining an emotion recognition result according to a user message, wherein the user message comprises at least a user voice message (estimate characteristics of a speaker, including emotional state, para. 0017-0018, 0031-0032); 
performing an intention analysis according to a text content of the user voice message to obtain corresponding basic intention information (plain word meaning of the voice data, para. 0035); and 
determining corresponding emotional intention information according to the emotion recognition result and the basic intention information (user’s original intent is determined using contextual voice characteristics and the plain meaning of the voice input, para. 0031, 0035-0036); and 
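determining a corresponding interactive instruction according to the emotional intention information, or determining the corresponding interactive instruction according to the emotional intention information and the basic intention information (a subsequent response or action is taken according to the determined intent, para. 0037-0039, 0041).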
Waltermann differs from claims 1, 19-20 in that although it discloses: obtaining an audio emotion recognition result according to audio data of the user voice message (para. 0017-0018, 0031-0032), it does not teach:
obtaining a text emotion recognition result according to the text content of the user voice message; and
obtaining an emotion recognition result according to the audio emotion recognition result and the text emotion recognition result.
Martinez teaches determining an emotion expressed in a voice message based on a combination of inputs, including acoustic features mapped to different emotions in the audio, combined with linguistic parameters, such as words associated with particular emotions in the transcription (para. 0025-0028).  It would have been obvious to one of ordinary skill in the art to modify Waltermann with the above feature of Martinez in order to more accurately determine an emotion result by relying on a combination of inputs.  One of ordinary skill in the art would have recognized that the results of the combination of prior art elements of Waltermann in view of Martinez according to known methods were predictable.
As to claim 3, Waltermann in view of Martinez discloses: wherein the interactive instruction comprises presenting a feedback content of the emotional intention information (Martinez: audio, tactile or visual indication of the determined emotion is presented, para. 0029-0030, 0034, 0039).
As to claim 4, Waltermann in view of Martinez discloses: wherein the interactive instruction comprises one or more of the following sentiment presentation modes: a text output sentiment presentation mode, a music play sentiment presentation mode, a voice sentiment 
As to claim 5, Waltermann in view of Martinez discloses: wherein the emotional intention information comprises sentiment need information corresponding to the emotion recognition result; or the emotional intention information comprises the sentiment need information corresponding to the emotion recognition result and an association relationship between the emotion recognition result and the basic intention information (Waltermann: based on determined stress or urgency emotion and a query “Where is that file I need to present,” a sentiment need is inferred for the system to access calendar entries in the near future instead of adding to the stress by asking additional questions, para. 0018, 0036).
As to claim 6, Waltermann in view of Martinez discloses: wherein the user message comprises at least a user voice message; and the obtaining an emotion recognition result according to an obtained user message comprises: obtaining the emotion recognition result according to the user voice message (Waltermann: para. 0017-0018, 0036).
As to claim 22, Waltermann in view of Martinez discloses: wherein the determining corresponding emotional intention information according to the emotion recognition result and the basic intention information comprises: 
determining corresponding emotional intention information according to the emotion recognition result and the basic intention information, in combination with an emotion recognition result and basic intention information of a previous user voice message and/or a subsequent user voice message.
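Claims 1 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Kalinli-Akbacak (US 2014/0112556 A1, “Kalinli-Akbacak ‘556”).
As to claim 1, Waltermann discloses an intelligent interactive method (para. 0014-0018, 0029), comprising: 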
obtaining an emotion recognition result according to a user message, wherein the user message comprises at least a user voice message (estimate characteristics of a speaker, including emotional state, para. 0017-0018, 0031-0032); 
performing an intention analysis according to a text content of the user voice message to obtain corresponding basic intention information (plain word meaning of the voice data, para. 0035); and 
determining corresponding emotional intention information according to the emotion recognition result and the basic intention information (user’s original intent is determined using contextual voice characteristics and the plain meaning of the voice input, para. 0031, 0035-0036); and 
determining a corresponding interactive instruction according to the emotional intention information, or determining the corresponding interactive instruction according to the emotional intention information and the basic intention information (a subsequent response or action is taken according to the determined intent, para. 0037-0039, 0041).
Waltermann differs from claim 1 in that although it discloses: obtaining an audio emotion recognition result according to audio data of the user voice message (para. 0017-0018, 0031-0032), it does not teach:
obtaining a text emotion recognition result according to the text content of the user voice message; and
obtaining an emotion recognition result according to the audio emotion recognition result and the text emotion recognition result.
Kalinli-Akbacak ‘556 teaches determining an emotional state of a user from analysis of a combination of two or more different types of features, including acoustic and linguistic (para. 0017, 0021, 0026, 0039).  It would have been obvious to one of ordinary skill in the art before 
As to claim 8, Waltermann in view of Kalinli-Akbacak ‘556 discloses: 
wherein the audio emotion recognition result and the text emotion recognition result respectively correspond to one coordinate point in a multi-dimensional emotion space (Kalinli-Akbacak ‘556: three-dimensional valence-arousal-dominance model, para. 0031), 
wherein each dimension in the multi-dimensional emotion space corresponds to a psychologically defined sentiment factor, and each of the emotion classifications comprises a plurality of emotion intensity levels (Kalinli-Akbacak ‘556: para. 0031-0032, Table I), and 
the obtaining an emotion recognition result according to the audio emotion recognition result and the text emotion recognition result comprises: 
obtaining coordinate points by weighting averaging processing of coordinate values of the coordinate point of the audio emotion recognition result and the coordinate point of the text emotion recognition result in the multi-dimensional emotion space (Kalinli-Akbacak ‘556: fusing process may take the average of probability scores for estimating emotion class and weighting, para. 0043); 
using the coordinate points as the emotion recognition result (Kalinli-Akbacak ‘556: para. 0031).
Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Kalinli-Akbacak ‘556, as applied to claim 1 above, and further in view of Kalinli-Akbacak (US 2014/0114655 A1, “Kalinli-Akbacak ‘655”).
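For illustration only, the weighted averaging recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and does not characterize the implementation of any cited reference: the function name fuse_emotion_points, the single audio_weight parameter, and the example coordinate values are hypothetical, and each point is assumed to be a (valence, arousal, dominance) triple.

```python
from typing import Sequence, Tuple


def fuse_emotion_points(
    audio_point: Sequence[float],
    text_point: Sequence[float],
    audio_weight: float = 0.5,
) -> Tuple[float, ...]:
    """Weighted average of two coordinate points in an emotion space.

    audio_weight is a hypothetical parameter; the text result receives
    the complementary weight (1 - audio_weight).
    """
    if len(audio_point) != len(text_point):
        raise ValueError("both points must have the same number of dimensions")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    text_weight = 1.0 - audio_weight
    # Average each dimension (e.g. valence, arousal, dominance) separately.
    return tuple(
        audio_weight * a + text_weight * t
        for a, t in zip(audio_point, text_point)
    )


# Example with made-up coordinates: equal weighting of the two results.
fused = fuse_emotion_points((1.0, 0.0, 0.5), (0.0, 1.0, 0.5), audio_weight=0.5)
```

With equal weights the fused point is the midpoint of the two inputs; raising audio_weight moves the result toward the audio emotion recognition result.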
As to claim 9, Waltermann in view of Kalinli-Akbacak ‘556 differs from claim 9 in that it does not disclose: wherein the obtaining an audio emotion recognition result according to audio data of the user voice message comprises: 

matching the audio feature vector of the user voice message with a plurality of emotional feature models, wherein the plurality of emotional feature models respectively correspond to one of a plurality of emotion classifications; and 
using an emotion classification corresponding to the emotional feature model that is matched as an emotion classification of the user voice message.
Kalinli-Akbacak ‘655 teaches extracting an audio feature vector for matching with a plurality of emotional feature models and classification (Fig. 1A, para. 0040-0044).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Waltermann in view of Kalinli-Akbacak ‘556 with the above feature of Kalinli-Akbacak ‘655 in order to improve emotion recognition by analyzing only salient parts of a speech signal, as taught by Kalinli-Akbacak ‘655 (para. 0018).
As to claim 10, Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 discloses: wherein the plurality of emotional feature models are established by pre-learning respective audio feature vector sets of a plurality of preset voice segments comprising emotion classification labels corresponding to the plurality of emotion classifications (Kalinli-Akbacak ‘655: para. 0042-0044).
As to claims 14, 15, and 16, Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 discloses: wherein the audio feature vector comprises one or more of the following audio features: an energy feature, a speech frame number feature, a pitch frequency feature, a formant feature, a harmonic to noise ratio feature, and a mel-frequency cepstral coefficient feature (Waltermann: para. 0031-0032; Kalinli-Akbacak ‘556: para. 0021; Kalinli-Akbacak ‘655: para. 0053-0063).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Kalinli-Akbacak ‘556 and ‘655, as applied to claim 10 above, and further in view of Kalinli-Akbacak et al. (US 2016/0027452 A1, “Kalinli-Akbacak ‘452”).

performing clustering processing on the respective audio feature vector sets of the plurality of preset voice segments comprising the emotion classification labels corresponding to the plurality of emotion classifications, to obtain a clustering result of a preset emotion classification; and 
training, according to the clustering result, an audio feature vector set of the preset voice segment in each cluster to be one of the emotional feature models.
Kalinli-Akbacak ‘452 teaches the use of a clustering process for generating emotion recognition models (para. 0012-0018).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 with the above teaching of Kalinli-Akbacak ‘452 in order to provide improved emotion recognition models adaptive to different speaking styles to maximize accuracy, as taught by Kalinli-Akbacak ‘452 (para. 0043-0045).
Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Kalinli-Akbacak ‘556 and ‘655, as applied to claim 9 above, and further in view of Tsiartas et al. (US 2017/0084295 A1, “Tsiartas”).
Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 differs from claim 12 in that it does not specifically disclose: determining a voice start frame and a voice end frame in the to-be-identified audio stream; and extracting an audio stream portion between the voice start frame and the voice end frame as the user voice message.
Tsiartas teaches determining a time window starting when the user first starts to speak (para. 0028, 0037), start and end times (para. 0048) and audio segmentation which identifies segments defined by start and end points of each portion of speech (para. 0088).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 with the above teaching of Tsiartas in order to identify the audio input window which contains speech for 
As to claim 13, Waltermann in view of Kalinli-Akbacak ‘556 and ‘655 and Tsiartas discloses: 
determining whether a voice frame in the to-be-identified audio stream is a speech frame or a non-speech frame (Tsiartas: para. 0116); 
after the voice end frame of a previous voice segment, or a first voice segment is not yet identified, and when a first preset quantity of voice frames are consecutively determined as speech frames, using the first voice frame of the first preset quantity of the voice frames as the voice start frame of a current voice segment (Tsiartas: para. 0088, 0116); and 
after the voice start frame of the current voice segment, and when a second preset quantity of voice frames are consecutively determined as non-speech frames, using the first voice frame of the second preset quantity of the voice frames as the voice end frame of the current voice segment (Tsiartas: para. 0116).
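For illustration only, the start-frame and end-frame logic recited in claim 13 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Tsiartas or of the application: the function name find_voice_segments and the parameters start_run and end_run (standing in for the first and second preset quantities) are hypothetical, and the per-frame speech/non-speech decisions are assumed to be supplied as a list of booleans.

```python
from typing import List, Optional, Tuple


def find_voice_segments(
    is_speech: List[bool], start_run: int, end_run: int
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs for completed segments.

    start_run: consecutive speech frames needed to open a segment.
    end_run: consecutive non-speech frames needed to close a segment.
    """
    if start_run < 1 or end_run < 1:
        raise ValueError("start_run and end_run must be at least 1")
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start frame of the currently open segment
    run = 0  # length of the current run of qualifying frames
    for i, flag in enumerate(is_speech):
        if start is None:
            # Looking for a start: count consecutive speech frames.
            run = run + 1 if flag else 0
            if run == start_run:
                start = i - start_run + 1  # first frame of the run
                run = 0
        else:
            # Inside a segment: count consecutive non-speech frames.
            run = run + 1 if not flag else 0
            if run == end_run:
                segments.append((start, i - end_run + 1))  # first frame of the run
                start = None
                run = 0
    return segments


# Example with made-up frame decisions (T = speech, F = non-speech).
T, F = True, False
segments = find_voice_segments([F, T, T, T, F, T, F, F, T, T], start_run=2, end_run=2)
```

A segment that has been opened but not yet closed when the input ends is not reported in this sketch; a streaming implementation would carry that state into the next batch of frames.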
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Martinez, as applied to claim 1 above, and further in view of Wang et al. (US 2018/0314689 A1, “Wang”).
Waltermann in view of Martinez differs from claim 17 in that it does not specifically disclose: 
matching the text content of the user voice message with a plurality of preset semantic templates in a semantic knowledge repository to determine a matched semantic template; and 
obtaining the basic intention information corresponding to the matched semantic template, 
wherein a correspondence between the semantic template and the basic intention information is pre-established in the semantic knowledge repository, and same intention information corresponds to one or more semantic templates.

As to claim 18, Waltermann in view of Martinez and Wang discloses: performing calculation on a similarity between the text content of the user voice message and the plurality of preset semantic templates; and using a semantic template with a highest similarity as the matched semantic template (Wang: determines, with a statistically high degree of confidence, an output most closely matched to the person’s intent, using semantic rules/models, para. 0169).
Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Waltermann in view of Martinez, as applied to claim 1 above, and further in view of Tsiartas.
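For illustration only, the highest-similarity template selection recited in claims 17-18 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Wang's method: token-level Jaccard overlap is a swapped-in similarity measure chosen for brevity, and the function names, template strings, and intent labels are hypothetical. Several templates may map to the same intent, as claim 17 permits.

```python
from typing import Dict, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (an assumed similarity measure)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_intent(text: str, template_to_intent: Dict[str, str]) -> Tuple[str, str]:
    """Return (matched_template, basic_intent) for the most similar template."""
    if not template_to_intent:
        raise ValueError("the template repository is empty")
    best = max(template_to_intent, key=lambda tpl: jaccard(text, tpl))
    return best, template_to_intent[best]


# Hypothetical repository: two templates share the same intent.
templates = {
    "where is my file": "find_file",
    "locate the file": "find_file",
    "play some music": "play_music",
}
matched = match_intent("where is that file", templates)
```

In this example the first template shares three of five distinct tokens with the input, which is the highest overlap in the repository, so its intent is returned.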
Waltermann in view of Martinez differs from claim 21 in that it does not specifically disclose: calculating confidence of an emotion classification in the audio emotion recognition result and confidence of an emotion classification in the text emotion recognition result; obtaining the emotion recognition result according to the confidence in the audio emotion recognition result and the confidence in the text emotion recognition result.
Tsiartas teaches the calculation of a measure of degree or confidence in an emotion classification (para. 0027).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Waltermann in view of Martinez with the above teaching of Tsiartas by applying the confidence calculation to both audio and text emotion recognition results in order to provide a more accurate indication of speaker state.
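For illustration only, one way of obtaining an emotion recognition result according to the two confidences recited in claim 21 can be sketched as follows. This is a minimal sketch under stated assumptions: the selection rule (keep an agreed label, otherwise take the label with the higher confidence, with ties going to the audio result) is hypothetical and is not drawn from Tsiartas or from the application, as are the function name and the example labels and scores.

```python
from typing import Tuple

Result = Tuple[str, float]  # (emotion classification, confidence in [0, 1])


def fuse_by_confidence(audio: Result, text: Result) -> str:
    """Pick one emotion classification from audio and text results."""
    audio_label, audio_conf = audio
    text_label, text_conf = text
    if audio_label == text_label:
        # Both modalities agree, so the shared label is kept.
        return audio_label
    # Otherwise prefer the more confident modality (ties go to audio).
    return audio_label if audio_conf >= text_conf else text_label


# Example with made-up scores: the audio result is more confident.
label = fuse_by_confidence(("angry", 0.8), ("neutral", 0.6))
```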
As to claim 22, Waltermann in view of Martinez and Tsiartas discloses: wherein the determining corresponding emotional intention information according to the emotion recognition result and the basic intention information comprises: 
.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Wang et al. (US 2018/0357286 A1) teach assigning an emotional state based on prior interactions (para. 0018).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STELLA L WOO whose telephone number is (571)272-7512.  The examiner can normally be reached on Monday - Friday, 9 a.m. to 3 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Stella L. Woo/            Primary Examiner, Art Unit 2652