DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3-6, 8-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran et al. (US 2017/0160813 A1, “Divakaran”) in view of Oudeyer et al. (US 2002/0198717 A1, “Oudeyer”).
As to claims 1 and 19-20, Divakaran discloses an intelligent interactive method, comprising: 
obtaining an emotion recognition result according to a user message, wherein the user message comprises at least a user voice message (Fig. 12, audio input 1210 and speech emotion detection 1214, para. 0164-0165; virtual personal assistant uses emotion detection to determine emotion such as anxiety or frustration in a user’s tone of voice, para. 0328-0329, 0333-0334); 
performing an intention analysis according to a text content of the user voice message to obtain corresponding basic intention information (speech recognition 1212, para. 0164-0165); 
determining corresponding emotional intention information according to the emotion recognition result and the basic intention information (combined analysis of speech recognition and speech emotion detection, para. 0165, to determine what a person wants, para. 0039-0040); and 
determining a corresponding interactive instruction according to the emotional intention information, or determining the corresponding interactive instruction according to the emotional intention information and the basic intention information (virtual personal assistant responds in a reassuring manner, para. 0329, such as “I’m sorry. Did you try aisle six?” to alleviate frustration, para. 0333-0334).
Divakaran differs from claim 1 in that although it teaches outputting a voice broadcast in a reassuring manner (para. 0329), it does not disclose:
determining an intonation and a speaking speed of a voice broadcast according to the emotional intention information; 
outputting the voice broadcast corresponding to the interactive instruction to the user at the intonation and the speaking speed.  
Oudeyer teaches a robot apparatus which, in response to a determined emotion state of a human, synthesizes a voice sentence expressing a particular emotion (Abstract; Fig. 1), such as calm and comfort (para. 0056-0057, 0088) by adjusting prosodic parameters including duration, pitch, intensity of each phoneme, etc. (para. 0040, 0056-0067, 0090).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran with the above teaching of Oudeyer in order to establish a good relationship between the virtual assistant and the human user, as taught by Oudeyer (para. 0036).

As to claim 4, Divakaran in view of Oudeyer discloses: wherein the interactive instruction comprises one or more of the following sentiment presentation modes: a text output sentiment presentation mode, a music play sentiment presentation mode, a voice sentiment presentation mode, an image sentiment presentation mode, and a mechanical action sentiment presentation mode (Divakaran: output 106 includes vocalized output, display of text, graphics or video, action, para. 0051, 0081, 0143).
As to claim 5, Divakaran in view of Oudeyer discloses: wherein the emotional intention information comprises sentiment need information corresponding to the emotion recognition result (Divakaran: detection of anxiety indicates a need for reassurance, para. 0329); or the emotional intention information comprises the sentiment need information corresponding to the emotion recognition result and an association relationship between the emotion recognition result and the basic intention information (Divakaran: system associates the emotion of “frustration” with the concept of “unable to find Annie’s Macaroni and Cheese,” para. 0335).
As to claim 6, Divakaran in view of Oudeyer discloses: wherein the user message comprises at least a user voice message; and the obtaining an emotion recognition result according to an obtained user message comprises: obtaining the emotion recognition result according to the user voice message (Divakaran: detect frustration from user’s tone of voice, para. 0334).
Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer, as applied to claim 1 above, and further in view of Kalinli-Akbacak (US 2014/0112556 A1, “Kalinli-Akbacak ‘556”).
Divakaran in view of Oudeyer discloses: obtaining an audio emotion recognition result according to audio data of the user voice message (Divakaran: emotional state is identified from verbal cues, such as the manner in which words were spoken and/or verbalizations that were not words, para. 0139; speech emotion detection engine 1214, para. 0165), but differs from claim 8 in that it does not disclose: 
obtaining a text emotion recognition result according to the text content of the user voice message; 
obtaining an emotion recognition result according to the audio emotion recognition result and the text emotion recognition result, 
wherein the audio emotion recognition result and the text emotion recognition result respectively correspond to one coordinate point in a multi-dimensional emotion space, 
wherein each dimension in the multi-dimensional emotion space corresponds to a psychologically defined sentiment factor, and each of the emotion classifications comprises a plurality of emotion intensity levels; and 
the obtaining an emotion recognition result according to the audio emotion recognition result and the text emotion recognition result comprises: 
obtaining coordinate points by weighting averaging processing of coordinate values of the coordinate point of the audio emotion recognition result and the coordinate point of the text emotion recognition result in the multi-dimensional emotion space; 
using the coordinate points as the emotion recognition result.
Kalinli-Akbacak ‘556 teaches determining an emotional state of a user from analysis of a combination of two or more different types of features, including acoustic and linguistic features (para. 0017, 0021, 0026, 0039), and a three-dimensional valence-arousal-dominance model (para. 0031).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer with the above teaching of Kalinli-Akbacak ‘556 in order to provide for reliable emotion recognition by fusing multi-modal inputs, as taught by Kalinli-Akbacak ‘556 (para. 0017).
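For illustration only, the weighted-averaging limitation of claim 8 (fusing the audio and text emotion recognition results as coordinate points in a shared multi-dimensional emotion space) may be sketched as follows. This sketch is not drawn from Kalinli-Akbacak ‘556 or from the applicant's disclosure; the function name `fuse_emotion_points`, the three-dimensional valence-arousal-dominance coordinates, and the 0.75 audio weight are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def fuse_emotion_points(audio_point: Sequence[float],
                        text_point: Sequence[float],
                        audio_weight: float = 0.5) -> List[float]:
    """Weighted average of two coordinate points in the same emotion space.

    Hypothetical helper: each coordinate of the result is
    audio_weight * audio + (1 - audio_weight) * text.
    """
    if len(audio_point) != len(text_point):
        raise ValueError("points must have the same dimensionality")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    text_weight = 1.0 - audio_weight
    return [audio_weight * a + text_weight * t
            for a, t in zip(audio_point, text_point)]


# Hypothetical valence-arousal-dominance coordinates for one user voice message.
audio_vad = [-0.6, 0.8, 0.2]  # result from the audio data
text_vad = [-0.2, 0.4, 0.0]   # result from the recognized text content
fused = fuse_emotion_points(audio_vad, text_vad, audio_weight=0.75)
# fused is approximately [-0.5, 0.7, 0.15]
```

Exposing the weight as a parameter reflects that the claim recites weighted averaging without fixing particular weights.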
Claims 9-10, 14-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer, as applied to claim 1 above, and further in view of Kalinli-Akbacak (US 2014/0114655 A1, “Kalinli-Akbacak ‘655”).
Divakaran in view of Oudeyer differs from claim 9 in that it does not disclose: wherein the obtaining an audio emotion recognition result according to audio data of the user voice message comprises: 
extracting an audio feature vector of the user voice message, wherein the user voice message corresponds to a segment of a to-be-identified audio; 
matching the audio feature vector of the user voice message with a plurality of emotional feature models, wherein the plurality of emotional feature models respectively correspond to one of a plurality of emotion classifications; and 
using an emotion classification corresponding to the emotional feature model that is matched as an emotion classification of the user voice message.
Kalinli-Akbacak ‘655 teaches extracting an audio feature vector for matching with a plurality of emotional feature models and classification (Fig. 1A, para. 0040-0044).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer with the above feature of Kalinli-Akbacak ‘655 in order to improve emotion recognition by analyzing only salient parts of a speech signal, as taught by Kalinli-Akbacak ‘655 (para. 0018).
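For illustration only, the matching step of claim 9 (comparing an audio feature vector against a plurality of emotional feature models, one per emotion classification, and adopting the classification of the best-matching model) may be sketched as below. The choice of a diagonal Gaussian model scored by log-likelihood, the class `DiagonalGaussianModel`, the two-feature vectors (energy, pitch in Hz), and all numeric values are assumptions made for the example; they are not asserted to be the models used in Kalinli-Akbacak ‘655 or in the application.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagonalGaussianModel:
    """Hypothetical emotional feature model: per-feature mean and variance."""
    label: str
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Sum of independent 1-D Gaussian log-densities (diagonal covariance).
        total = 0.0
        for xi, mu, v in zip(x, self.mean, self.var):
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        return total


def classify(feature_vector: Sequence[float],
             models: Sequence[DiagonalGaussianModel]) -> str:
    """Return the emotion classification of the best-matching model."""
    best = max(models, key=lambda m: m.log_likelihood(feature_vector))
    return best.label


# Hypothetical models over two features: [energy, pitch in Hz].
models = [
    DiagonalGaussianModel("calm", mean=[0.2, 120.0], var=[0.01, 400.0]),
    DiagonalGaussianModel("angry", mean=[0.8, 220.0], var=[0.04, 900.0]),
]
label = classify([0.75, 210.0], models)  # "angry"
```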

As to claims 14, 15, 16, Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655 discloses: wherein the audio feature vector comprises one or more of the following audio features: an energy feature, a speech frame number feature, a pitch frequency feature, a formant feature, a harmonic to noise ratio feature, and a mel-frequency cepstral coefficient feature (Kalinli-Akbacak ‘655: para. 0053-0063).
Claim 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655, as applied to claim 10 above, and further in view of Kalinli-Akbacak et al. (US 2016/0027452 A1, “Kalinli-Akbacak ‘452”).
Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655 differs from claim 11 in that it does not specifically disclose: 
performing clustering processing on the respective audio feature vector sets of the plurality of preset voice segments comprising the emotion classification labels corresponding to the plurality of emotion classifications, to obtain a clustering result of a preset emotion classification; and 
training, according to the clustering result, an audio feature vector set of the preset voice segment in each cluster to be one of the emotional feature models.
Kalinli-Akbacak ‘452 teaches the use of a clustering process for generating emotion recognition models (para. 0012-0018).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655 with the above teaching of Kalinli-Akbacak ‘452 in order to provide improved emotion recognition models adaptive to different speaking styles to maximize accuracy, as taught by Kalinli-Akbacak ‘452 (para. 0043-0045).
Claims 12-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655, as applied to claim 9 above, and further in view of Tsiartas et al. (US 2017/0084295 A1, “Tsiartas”).
Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655 differs from claim 12 in that it does not specifically disclose: determining a voice start frame and a voice end frame in the to-be-identified audio stream; and extracting an audio stream portion between the voice start frame and the voice end frame as the user voice message.
Tsiartas teaches determining a time window starting when the user first starts to speak (para. 0028, 0037), start and end times (para. 0048) and audio segmentation which identifies segments defined by start and end points of each portion of speech (para. 0088).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer and Kalinli-Akbacak ‘655 with the above teaching of Tsiartas in order to identify the audio input window which contains speech for subsequent analysis, which would have been recognized by one of ordinary skill in the art as a predictable result.
As to claim 13, Divakaran in view of Oudeyer, Kalinli-Akbacak ‘655 and Tsiartas discloses: 
determining whether a voice frame in the to-be-identified audio stream is a speech frame or a non-speech frame (Tsiartas: para. 0116); 
after the voice end frame of a previous voice segment, or a first voice segment is not yet identified, and when a first preset quantity of voice frames are consecutively determined as speech frames, using the first voice frame of the first preset quantity of the voice frames as the voice start frame of a current voice segment (Tsiartas: para. 0088, 0116); and 
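after the voice start frame of the current voice segment, and when a second preset quantity of voice frames are consecutively determined as non-speech frames, using the first voice frame of the second preset quantity of the voice frames as the voice end frame of the current voice segment.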
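For illustration only, the start-frame and end-frame determination of claims 12-13 may be sketched as follows, assuming each frame has already been labeled speech or non-speech and assuming, by analogy to the start-frame rule, that the end frame is the first frame of a run of consecutive non-speech frames. The function name `segment_speech` and the parameters `n_start` and `n_end` (the first and second preset quantities) are hypothetical; the sketch does not purport to reproduce Tsiartas.

```python
from typing import List, Optional, Tuple


def segment_speech(is_speech: List[bool], n_start: int,
                   n_end: int) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs for each voice segment.

    start_frame: first frame of the first run of n_start consecutive speech
                 frames outside a segment.
    end_frame:   first frame of the first run of n_end consecutive non-speech
                 frames after start_frame (exclusive bound, so the segment's
                 frames are is_speech[start_frame:end_frame]).
    A segment still open when the stream ends is not reported in this sketch.
    """
    segments: List[Tuple[int, int]] = []
    in_segment = False
    run_start = 0          # index where the current qualifying run began
    run_len = 0            # length of the current qualifying run
    start: Optional[int] = None
    for i, speech in enumerate(is_speech):
        # Outside a segment we count speech frames; inside, non-speech frames.
        counts = speech if not in_segment else not speech
        if not counts:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if not in_segment and run_len == n_start:
            start, in_segment, run_len = run_start, True, 0
        elif in_segment and run_len == n_end:
            segments.append((start, run_start))
            in_segment, run_len = False, 0
    return segments


F, T = False, True
frames = [F, T, T, T, T, F, F, F, T, F, T, T, T, F, F, F]
result = segment_speech(frames, n_start=3, n_end=3)  # [(1, 5), (10, 13)]
```

The isolated speech frame at index 8 does not open a segment because it is not followed by two further speech frames.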
Claims 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer, as applied to claim 1, and further in view of Wang et al. (US 2018/0314689 A1, “Wang”).
Divakaran in view of Oudeyer differs from claim 17 in that it does not specifically disclose: 
matching the text content of the user voice message with a plurality of preset semantic templates in a semantic knowledge repository to determine a matched semantic template; and 
obtaining the basic intention information corresponding to the matched semantic template, 
wherein a correspondence between the semantic template and the basic intention information is pre-established in the semantic knowledge repository, and same intention information corresponds to one or more semantic templates.
Wang teaches applying semantic rules and/or models to determine intent associated with verbal input (para. 0003, 0169).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer with the above feature of Wang in order to more accurately identify user intent.
As to claim 18, Divakaran in view of Oudeyer and Wang discloses: performing calculation on a similarity between the text content of the user voice message and the plurality of preset semantic templates; and using a semantic template with a highest similarity as the matched semantic template (Wang: determines, with a statistically high degree of confidence, an output most closely matched to the person’s intent, using semantic rules/models, para. 0169).
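For illustration only, the template matching of claims 17-18 (computing a similarity between the text content and each preset semantic template, and taking the template with the highest similarity, where several templates may map to the same intention) may be sketched as below. Jaccard similarity over word tokens is a swapped-in similarity measure assumed for the example; the repository contents, intent labels, and function names are hypothetical and are not drawn from Wang or from the application.

```python
import re
from typing import Dict, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens (hypothetical, simplistic tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_template(text: str,
                   repository: Dict[str, str]) -> Tuple[str, str, float]:
    """Return (best template, its basic intention, similarity score)."""
    text_tokens = tokens(text)
    best = max(repository, key=lambda t: jaccard(text_tokens, tokens(t)))
    return best, repository[best], jaccard(text_tokens, tokens(best))


# Hypothetical semantic knowledge repository: template -> basic intention.
# Two templates share the same intention, as claim 17 permits.
repository = {
    "where can i find macaroni and cheese": "locate_product",
    "where is the macaroni": "locate_product",
    "how much does this cost": "ask_price",
}
template, intent, score = match_template(
    "Where can I find the macaroni and cheese?", repository)
# template == "where can i find macaroni and cheese", intent == "locate_product", score == 0.875
```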
Claims 21-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Divakaran in view of Oudeyer, as applied to claim 1 above, and further in view of Tsiartas.

Tsiartas teaches the calculation of a measure of degree or confidence in an emotion classification (para. 0027).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakaran in view of Oudeyer with the above teaching of Tsiartas by applying the confidence calculation to both audio and text emotion recognition results in order to provide a more accurate indication of speaker state.
As to claim 22, Divakaran in view of Oudeyer and Tsiartas discloses: wherein the determining corresponding emotional intention information according to the emotion recognition result and the basic intention information comprises: 
determining corresponding emotional intention information according to the emotion recognition result and the basic intention information, in combination with an emotion recognition result and basic intention information of a previous user voice message and/or a subsequent user voice message (Tsiartas: para. 0047, 0086, 0098-0099).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Bodin et al. (US 2007/0100628 A1) teach synthesizing voice using a particular prosody setting, which adjusts intonation, rate of speech, etc. (para. 0153).  Zimmerman (US 2018/0124242 A1) teaches monitoring an emotional state of a conversation and performing an intervention action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STELLA L WOO whose telephone number is (571)272-7512.  The examiner can normally be reached on Monday - Friday, 9 a.m. to 3 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar can be reached on 571-272-7488.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Stella L. Woo/            Primary Examiner, Art Unit 2652