DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 9/12/2019. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because there are no  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8-11, 13-15, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 8, 13 and 20 recite the phrase “certain criteria,” which is unclear. Specifically, it is unclear which criteria qualify as “certain” criteria and which do not, which renders the scope of the claims indefinite.
Claims 9-11, 14, and 15 are also indefinite, as they inherit the deficiency from an indefinite base claim and do not correct the problem.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-4, 6, 7 and 17-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wang et al. (US Patent Pub. No. 2021/0191506), hereinafter Wang.

Regarding claim 1, Wang teaches an emotion recognition device (Wang [0032] An affective interaction system based on an affective computing user interface (“AUI”) may enable a user to make affective interaction in one or more modalities with the system and receive affective feedbacks from the system through a process comprising, e.g., emotion-related data collection, emotion recognition, user intention computing, affective strategy formulation, and affective computing expression generation)
comprising: an uni-modal preprocessor (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
configured to include a plurality of recognition processors (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
each corresponding to a different one of a plurality of modals (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.), 
and learned to recognize emotion information of a user contained in uni-modal input data (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.); 
and a multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature)
configured to merge output data from each of the plurality of recognition processors (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
and to be learned to recognize the emotion information of the user contained in the merged data (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
wherein the emotion recognition device is to output a complex emotion recognition result (Wang [0069] Referring back to FIG. 3A, emotion recognizer 204 may derive an emotion state 304 based on emotion-related data 302, and then transmit it to a user intention computing processor 206 at module 120)
that includes a plurality of emotion recognition results (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.)
each corresponding to a different one of the plurality of recognition processors (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.) 
and an emotion recognition result of the multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature).
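For illustration only, the "complex emotion recognition result" discussed in the mapping of claim 1 can be pictured as a structure that carries each uni-modal result alongside a fused multi-modal result. The sketch below is neither Wang's implementation nor Applicant's. It assumes the linear "weighted superposition" option named in Wang [0067], and the emotion classes, weights, scores, and function names are all hypothetical.

```python
from typing import Dict

# Hypothetical preset emotion classes (not taken from Wang or the claims).
EMOTIONS = ("happy", "sad", "neutral")

def fuse_weighted(per_modality: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Linear weighted superposition of per-modality class scores."""
    total = sum(weights[m] for m in per_modality)
    return {
        e: sum(weights[m] * scores[e] for m, scores in per_modality.items()) / total
        for e in EMOTIONS
    }

def complex_result(per_modality: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> dict:
    """Bundle every uni-modal result together with the fused multi-modal result."""
    return {
        "uni_modal": per_modality,
        "multi_modal": fuse_weighted(per_modality, weights),
    }

# Made-up scores from two hypothetical uni-modal recognizers.
per_modality = {
    "text":  {"happy": 0.2, "sad": 0.5, "neutral": 0.3},
    "voice": {"happy": 0.6, "sad": 0.2, "neutral": 0.2},
}
result = complex_result(per_modality, {"text": 1.0, "voice": 1.0})
```

With equal weights the fused score for each class is simply the mean of the per-modality scores, while the per-modality results remain available in the same output.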

Regarding claim 2, Wang teaches the emotion recognition device of claim 1.
Wang further teaches
further comprising 
a modal separator for separating input data into a plurality of uni-modal input data each being uni-modal (Wang [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously. Then, emotion recognizer 204 may further include a multimodal fusion processor 512 to fuse the recognized emotion features into multimodal emotion feature. In some embodiments, multimodal fusion processor 512 may just fuse the emotion feature data, if such data is of the same structure and format. However, in some other embodiments, multimodal fusion processor 512 may align emotion features obtained from emotion-related data of different modalities and construct vector quantity of aligned features. For example, when emotion features are extracted from a video and an audio, the multimodal fusion processor may synchronize the features based on the timeline. Then it may derive vector quantity for both emotion features in order for them to be processed as a whole in later stages. For instance, multimodal fusion processor 512 may be implemented to fuse emotion features extracted from audio and video based on a convolutional neural network, as illustrated in FIG. 15), 
and to provide the plurality of uni-modal input data to the uni-modal preprocessor (Wang [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously. Then, emotion recognizer 204 may further include a multimodal fusion processor 512 to fuse the recognized emotion features into multimodal emotion feature. In some embodiments, multimodal fusion processor 512 may just fuse the emotion feature data, if such data is of the same structure and format. However, in some other embodiments, multimodal fusion processor 512 may align emotion features obtained from emotion-related data of different modalities and construct vector quantity of aligned features. For example, when emotion features are extracted from a video and an audio, the multimodal fusion processor may synchronize the features based on the timeline. Then it may derive vector quantity for both emotion features in order for them to be processed as a whole in later stages. For instance, multimodal fusion processor 512 may be implemented to fuse emotion features extracted from audio and video based on a convolutional neural network, as illustrated in FIG. 15).
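As a rough illustration of the separator limitation of claim 2, the sketch below splits one multi-modal sample into its uni-modal parts and hands each part to a recognizer for that modality. It is an assumption-laden sketch, not a description of Wang's system: the modality names, the dictionary layout of the sample, and the placeholder recognizers are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical modality names used only for this sketch.
MODALITIES = ("image", "speech", "text")

def separate_modalities(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Split one multi-modal sample into its non-empty uni-modal parts."""
    return {m: sample[m] for m in MODALITIES if sample.get(m) is not None}

def dispatch(uni_modal: Dict[str, Any],
             recognizers: Dict[str, Callable[[Any], str]]) -> Dict[str, str]:
    """Hand each uni-modal part to the recognizer registered for that modality."""
    return {m: recognizers[m](data) for m, data in uni_modal.items() if m in recognizers}

# Placeholder recognizers; a real system would use trained models here.
recognizers = {
    "speech": lambda data: "speech-result",
    "text": lambda data: "text-result",
}
sample = {"image": None, "speech": b"\x00\x01", "text": "hello"}
parts = separate_modalities(sample)
results = dispatch(parts, recognizers)
```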

Regarding claim 3, Wang teaches the emotion recognition device of claim 2.
Wang further teaches
wherein the plurality of uni-modal input data comprises 
image uni-modal input data (Wang [0061] Facial expression emotion-related data 316 and gesture emotion-related data 318 may be captured with similar tools and compiled in similar data format, as illustrated in FIG. 6. Therefore, with reference to FIG. 7, facial expression emotion recognizer 706 and gesture emotion recognizer 708 may also be implemented similarly based on image and video processing because of the similarities of facial expression emotion-related data 316 and gesture emotion-related data 318. Taking facial expression emotion recognizer 706 as an example, in some embodiments, it can be implemented based on recognizing facial features. In such embodiments, after obtaining facial expression emotion-related data such as an images or a video, the facial expression emotion recognizer may extract static facial feature from an image and extract a series of static facial features and/or facial motion feature from a video. Based on the extracted features, the facial expression emotion recognizer 706 may recognize an emotion feature in such facial expression emotion-related data by using a matching model, a probabilistic model, and/or a support vector machine. Moreover, in some other embodiments, facial expression emotion recognizer 706 may be implemented based on machine learning of human facial expressions by using a 3D morphable model (3DMM), as illustrated in FIG. 14. The 3DMM is a statistical model of 3D facial shape and texture. It can represent a novel face in an image by model coefficients and reconstruct a 3D face (including a facial shape and image textures) from single images based on rendering or scene parameters), 
speech uni-modal input data (Wang [0060] With reference to FIG. 7, voice emotion recognizer 704 may separately or jointly analyze the acoustic features and/or linguistic features in voice emotion-related data 314 in order to recognize the emotion thereof. Acoustic features include features such as energy, frame numbers, a fundamental tone frequency, formant, a noise rate of a harmonic wave, etc. Such features may be described in a form of an example value, a mean value, a greatest value, a median value, a standard deviation, etc. Linguistic features in voice emotion-related data may be the characteristics of the words and language used therein. In some embodiments, voice emotion recognizer 704 may be implemented based on analysis of linguistic features. It may convert the voice emotion-related data into text and process it in a similar way as for text emotion-related data 312 with possible exceptions of different ways of an expression in oral language and written language. In some other embodiments, voice emotion recognizer 704 may be implemented based on analysis of acoustic features by using machine learning. During the learning process, the voice emotion recognizer may extract acoustic features of certain voice emotion-related data from a training database and comprehend the matching rules for such acoustic features and their matched emotion thereof. Therefore, in the future, the voice emotion recognizer may be able to match a certain type of an acoustic feature with a certain emotion based on the matching rules it has learned during the learning process. Furthermore, in some embodiments, voice emotion recognizer 704 may be implemented based on analysis of both acoustic features and linguistic features of voice emotion-related data 314. When there is more than one output, the voice emotion recognizer in such embodiments may make selections and determine a final output based on analysis of a credence and tendentiousness level thereof), 
and text uni-modal input data (Wang [0059] In some embodiment, text emotion recognizer 702 may be implemented based on machine learning. Based on a database that contains certain type of text emotion-related data and its matched emotion state, text emotion recognizer 702 may be able to learn the recognition and output pattern. It may therefore be able to derive a desired emotion state based on a certain text emotion-related data input. In some other embodiments, text emotion recognizer 702 may be implemented based on natural language processing methods. Such text emotion recognizer may reply on an emotion semantic database and an emotion expression word database to extract key words, determine a property of certain words, and analyze a sentence structure in order to recognize an emotion in the text. They emotion semantic database may contain sematic information of certain polysemous words and the usage of each meaning thereof, in order to enable the text emotion recognizer to eliminate ambiguity and determine an exact emotion expression that is contained in such words. The emotion expression word database may include matching rules for various emotion expression words, which enables the text emotion recognizer to recognize an emotion expressed by difference words when matched together. An exemplary embodiment of the emotion expression word database can be structured as below)
that are separated from moving image data that includes the user (Wang [0056] With reference to FIG. 6, in some embodiments, data collector 202 may further include a data analyzer 618 to analyze captured emotion communication data 616 to obtain emotion-related data 302. Data analyze 618 may compile captured emotion communication data 616 into emotion-related data 302 of a desired structure, format, annotation, method of storage, and inquiry mode based on the modality of the emotion, different scenarios, and need of further processing. Emotion-related data 302, for example, may be text emotion-related data 312, voice emotion-related data 314, facial expression emotion-related data 316, gesture emotion-related data 318, physiological emotion-related data 320, and multimodality emotion-related data 322. Emotion-related data 302 may be static data or dynamic data. Static emotion-related data may be a certain type of data that records affective interaction between a user and an affective interaction system of a certain moment, such as a photo, a text, an electrocardiogram, or an emoji. Dynamic emotion-related data may be a certain type of streaming data that records the affective interaction between a user and an affective interaction system of a time span, such as a clip of video, a sonogram video, and a clip of audio. Dynamic data may reflect a dynamic change of the affective interaction of a certain time span. Whether to obtain/use static or dynamic data depends on the modality of emotion communication 102 and/or the need of further processing. The format of emotion-related data 302 may be structured such as a data record, or non-structured such as video, audio, signal, text, and so on).

Regarding claim 4, Wang teaches the emotion recognition device of claim 3.
Wang further teaches
wherein the text uni-modal input data is data obtained by converting a speech, separated from the moving image data, into text (Wang [0097] the processor can use a semantic conversion to convert substance of the image or video into a text or symbols as a focus part for further processing).

Regarding claim 6, Wang teaches the emotion recognition device of claim 1.
Wang further teaches
wherein the multi-modal recognizer comprises: 
a merger for combining feature point vectors separately outputted by the plurality of recognition processors based on the corresponding modal (Wang [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously. Then, emotion recognizer 204 may further include a multimodal fusion processor 512 to fuse the recognized emotion features into multimodal emotion feature. In some embodiments, multimodal fusion processor 512 may just fuse the emotion feature data, if such data is of the same structure and format. However, in some other embodiments, multimodal fusion processor 512 may align emotion features obtained from emotion-related data of different modalities and construct vector quantity of aligned features. For example, when emotion features are extracted from a video and an audio, the multimodal fusion processor may synchronize the features based on the timeline. Then it may derive vector quantity for both emotion features in order for them to be processed as a whole in later stages. For instance, multimodal fusion processor 512 may be implemented to fuse emotion features extracted from audio and video based on a convolutional neural network, as illustrated in FIG. 15); 
and a multi-modal emotion recognizer learned to recognize the emotion information of the user based on output data of the merger (Wang [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously. Then, emotion recognizer 204 may further include a multimodal fusion processor 512 to fuse the recognized emotion features into multimodal emotion feature. In some embodiments, multimodal fusion processor 512 may just fuse the emotion feature data, if such data is of the same structure and format. However, in some other embodiments, multimodal fusion processor 512 may align emotion features obtained from emotion-related data of different modalities and construct vector quantity of aligned features. For example, when emotion features are extracted from a video and an audio, the multimodal fusion processor may synchronize the features based on the timeline. Then it may derive vector quantity for both emotion features in order for them to be processed as a whole in later stages. For instance, multimodal fusion processor 512 may be implemented to fuse emotion features extracted from audio and video based on a convolutional neural network, as illustrated in FIG. 15).
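The timeline synchronization and vector construction described in the cited paragraph [0064] can be sketched as follows. This is a simplified stand-in, not Wang's method: Wang describes fusion by a convolutional neural network, whereas the sketch uses plain concatenation of feature vectors that share a timestamp. The timestamp keys, the feature values, and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: frame timestamp in milliseconds -> feature vector.
Features = Dict[int, List[float]]

def merge_by_timeline(video: Features, audio: Features) -> Features:
    """Keep timestamps present in both modalities and concatenate their vectors."""
    shared = sorted(video.keys() & audio.keys())
    return {t: video[t] + audio[t] for t in shared}

video = {0: [0.1, 0.9], 40: [0.2, 0.8], 80: [0.3, 0.7]}
audio = {0: [0.5], 40: [0.4]}
merged = merge_by_timeline(video, audio)
```

Here the frame at 80 ms is dropped because the audio stream has no feature vector at that timestamp, so only aligned features reach the later recognition stage.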

Regarding claim 7, Wang teaches the emotion recognition device of claim 1.
Wang further teaches
wherein the emotion recognition result of each separate one of the plurality of recognition processors includes a probability for each of preset emotion classes (Wang [0066] The softmax function provides probabilities for each class label and is often used in a final layer of a neural network-based classifier).
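For reference, the softmax function mentioned in the cited paragraph [0066] converts a vector of raw classifier scores into one probability per class. The sketch below uses the standard definition. The class labels and score values are hypothetical and are not drawn from Wang.

```python
import math
from typing import Dict, List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Standard softmax; the maximum is subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(logits: Sequence[float],
                        labels: Sequence[str]) -> Dict[str, float]:
    """Pair each preset class label with its softmax probability."""
    return dict(zip(labels, softmax(logits)))

# Made-up raw scores for three hypothetical emotion classes.
probs = class_probabilities([2.0, 1.0, 0.1], ["happy", "sad", "neutral"])
```

The resulting probabilities sum to one and preserve the ordering of the raw scores.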

Regarding claim 17, Wang teaches a server (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user)
comprising: 
a communication device configured to receive (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
from a robot (Wang [0036] FIG. 1 is a block diagram illustrating an exemplary affective interaction system 100. Exemplary system 100 may be any type of system that provides affective interaction to a user based on an AUI, such as a service robot, a companion robot, a smart wearable, smart furniture, a smart home device, etc.), 
moving image data including a user, 
and transmit (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
to the robot (Wang [0036] FIG. 1 is a block diagram illustrating an exemplary affective interaction system 100. Exemplary system 100 may be any type of system that provides affective interaction to a user based on an AUI, such as a service robot, a companion robot, a smart wearable, smart furniture, a smart home device, etc.), 
a complex emotion recognition result that includes a plurality of emotion recognition results; 
and an emotion recognition device (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user)
configured to include an uni-modal preprocessor (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
and a multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
the uni-modal preprocessor configured to include a plurality of recognition processors (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
each corresponding to a different one of a plurality of modals (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.), 
and learned to recognize emotion information of a user contained in uni-modal input data (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.), 
and the multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature)
configured to merge output data from each of the plurality of recognition processors (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
and be learned to recognize the emotion information of the user contained in the merged data (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
and to output a complex emotion recognition result (Wang [0069] Referring back to FIG. 3A, emotion recognizer 204 may derive an emotion state 304 based on emotion-related data 302, and then transmit it to a user intention computing processor 206 at module 120)
that includes a plurality of emotion recognition results (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.)
each corresponding to a different one of the plurality of recognition processors (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.)
and an emotion recognition result of the multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature).
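
As an illustrative sketch only, the arrangement mapped above (one recognition result per modality plus one result from the fused data, output together as a complex emotion recognition result) can be pictured as follows. The label set, the modality names, the equal weights, and the function names are hypothetical and are not taken from Wang or from the claims. The fusion step uses the linear weighted superposition mentioned in Wang [0067] as a simple stand-in for a learned multi-modal recognizer.

```python
from typing import Dict, List

# Hypothetical label set, chosen only for illustration.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def argmax_label(probs: List[float]) -> str:
    """Return the label whose probability is highest."""
    return EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]


def fuse(per_modal: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Linear weighted superposition of per-modality probabilities (assumed fusion rule)."""
    total = sum(weights[m] for m in per_modal)
    return [
        sum(weights[m] * per_modal[m][i] for m in per_modal) / total
        for i in range(len(EMOTIONS))
    ]


def complex_result(per_modal: Dict[str, List[float]], weights: Dict[str, float]) -> Dict[str, str]:
    """One result per uni-modal recognizer plus one result for the fused data."""
    result = {m: argmax_label(p) for m, p in per_modal.items()}
    result["multimodal"] = argmax_label(fuse(per_modal, weights))
    return result


# Hypothetical per-modality scores, ordered as in EMOTIONS.
scores = {
    "voice": [0.1, 0.6, 0.2, 0.1],
    "face": [0.7, 0.1, 0.1, 0.1],
    "text": [0.5, 0.2, 0.1, 0.2],
}
weights = {"voice": 1.0, "face": 1.0, "text": 1.0}
result = complex_result(scores, weights)
```

In this example the voice result differs from the face, text, and fused results, which is the kind of mismatch addressed in the later claims.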

Regarding claim 18, Wang teaches the server of claim 17.
Wang further teaches
wherein, through the communication device (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
video call data (Wang [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously. Then, emotion recognizer 204 may further include a multimodal fusion processor 512 to fuse the recognized emotion features into multimodal emotion feature. In some embodiments, multimodal fusion processor 512 may just fuse the emotion feature data, if such data is of the same structure and format. However, in some other embodiments, multimodal fusion processor 512 may align emotion features obtained from emotion-related data of different modalities and construct vector quantity of aligned features. For example, when emotion features are extracted from a video and an audio [video and audio map to video call data], the multimodal fusion processor may synchronize the features based on the timeline)
is received from the robot (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user)
and emotion recognition result of the user included in the received video call data (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature)
is transmitted to the robot (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user).

Regarding claim 19, Wang teaches the server of claim 17.
Wang further teaches
wherein the emotion recognition device (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user)
includes a modal separator for separating input data into a plurality of uni-modal input data each being uni-modal (Wang [0060] With reference to FIG. 7, voice emotion recognizer 704 may separately or jointly analyze the acoustic features and/or linguistic features in voice emotion-related data 314 in order to recognize the emotion thereof; [0064] When emotion recognizer 204 receives more than one type of emotion-related data at the same time, it may utilize different forms of emotion recognizers as illustrated above to recognize such emotion-related data separately but simultaneously), 
and to provide the plurality of uni-modal input data to the uni-modal preprocessor (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user).
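
As an illustrative sketch only, the separating and providing steps recited in claim 19 can be pictured as splitting one multi-modal record into per-modality inputs and routing each input to its own recognizer. The modality names, the dictionary layout, and the stub recognizers below are hypothetical and are not drawn from Wang.

```python
from typing import Any, Callable, Dict

# Hypothetical modality names, chosen only for illustration.
MODALITIES = ("voice", "image", "text")


def separate(input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Split multi-modal input into uni-modal pieces, skipping absent modalities."""
    return {m: input_data[m] for m in MODALITIES if input_data.get(m) is not None}


def preprocess(unimodal: Dict[str, Any], recognizers: Dict[str, Callable[[Any], str]]) -> Dict[str, str]:
    """Route each uni-modal input to the recognizer registered for that modality."""
    return {m: recognizers[m](x) for m, x in unimodal.items() if m in recognizers}


# Stub recognizers standing in for trained models (assumption).
stub_recognizers = {
    "voice": lambda audio: "sad",
    "text": lambda text: "happy",
}
parts = separate({"voice": b"\x00\x01", "image": None, "text": "hi", "meta": 1})
outputs = preprocess(parts, stub_recognizers)
```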

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Kim et al. (US Patent Pub. No. 2019/0213400), hereinafter Kim.

Regarding claim 5, Wang teaches the emotion recognition device of claim 1.
Wang does not teach
wherein the plurality of recognition processors each separately include an artificial neural network corresponding to input characteristic of uni-modal input data inputted respectively.
Kim teaches
wherein the plurality of recognition processors each separately include an artificial neural network corresponding to input characteristic of uni-modal input data inputted respectively (Kim [0008] The extracting may include the plurality of features for each modality from the plurality of pieces of data using a respective one of first neural networks each including layers trained for a respective modality).
Kim is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Kim to use a separate neural network for each modality. Doing so would allow each modality to be processed according to its own characteristic features.

Claims 8-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Fu (US Patent Pub. No. 2017/0270922).

Regarding claim 8, Wang teaches the emotion recognition device of claim 1.
Wang does not teach
further comprising a post-processor for outputting a final emotion recognition result according to a certain criteria, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match.
Fu teaches
further comprising a post-processor for outputting a final emotion recognition result according to a certain criteria, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.
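
As an illustrative sketch only, the determination rules quoted from Fu [0094]-[0097] can be written out as a small decision function. The `Emotion` type, the integer `level` field, and the reading of "low level" as the lower of the two input levels are assumptions made for this sketch. The quoted passages do not state an outcome for one commendatory and one derogatory result, so the sketch returns `None` for that case, loosely following the recollection approach mentioned in Fu [0092].

```python
from typing import NamedTuple, Optional


class Emotion(NamedTuple):
    polarity: str  # "commendatory", "derogatory", or "neutral"
    level: int     # hypothetical intensity scale; 0 for neutral


def resolve(first: Emotion, second: Emotion) -> Optional[Emotion]:
    """Pick a single result from two recognition results that may not match."""
    if first == second:
        return first
    # Rule of [0096]: a neutral result yields to the result with a tendency.
    if first.polarity == "neutral":
        return second
    if second.polarity == "neutral":
        return first
    # Rules of [0094]-[0095]: same tendency, different levels -> lower level (assumed reading).
    if first.polarity == second.polarity:
        return Emotion(first.polarity, min(first.level, second.level))
    # Opposite tendencies are not covered by the quoted rules; signal recollection.
    return None


same_tendency = resolve(Emotion("commendatory", 3), Emotion("commendatory", 1))
with_neutral = resolve(Emotion("neutral", 0), Emotion("derogatory", 2))
opposite = resolve(Emotion("commendatory", 2), Emotion("derogatory", 2))
```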

Regarding claim 9, Wang in view of Fu teaches the emotion recognition device of claim 8.
Wang does not teach
wherein the post-processor outputs, as the final emotion recognition result, an emotion recognition result that matches the emotion recognition result of the multi-modal recognizer from among the emotion recognition results of the recognition processors, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match.
Fu teaches
wherein the post-processor outputs, as the final emotion recognition result, an emotion recognition result that matches the emotion recognition result of the multi-modal recognizer from among the emotion recognition results of the recognition processors, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.

Regarding claim 10, Wang in view of Fu teaches the emotion recognition device of claim 8.
Wang does not teach
wherein the post-processor outputs, as the final emotion recognition result, a contradictory emotion that includes two emotion classes among the complex emotion recognition result, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match.
Fu teaches
wherein the post-processor outputs, as the final emotion recognition result, a contradictory emotion that includes two emotion classes among the complex emotion recognition result, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.

Regarding claim 11, Wang in view of Fu teaches the emotion recognition device of claim 10.
Wang teaches
wherein the post-processor selects, two emotion classes having a highest probability among the emotion recognition result of the multi-modal recognizer (Wang [0066] The softmax function provides probabilities for each class label and is often used in a final layer of a neural network-based classifier).
Wang does not teach
wherein the post-processor selects, as the contradictory emotion, two emotion classes having a highest probability among the emotion recognition result of the multi-modal recognizer.
Fu teaches
wherein the post-processor selects, as the contradictory emotion (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.
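
As an illustrative sketch only, selecting the two emotion classes having the highest probability from a softmax output, as discussed for claim 11, can be pictured as follows. The label names and logit values are hypothetical, and the helper names are not drawn from Wang or Fu.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_two(labels: List[str], logits: List[float]) -> Tuple[str, str]:
    """Return the two labels with the highest softmax probability, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked[1][0]


# Hypothetical labels and classifier outputs.
pair = top_two(["happy", "sad", "angry", "neutral"], [2.0, 1.5, -1.0, 0.3])
```

Here `pair` holds the two most probable classes, which a post-processor of the kind recited could report together as the contradictory emotion.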

Regarding claim 12, Wang teaches a robot (Wang [0036] FIG. 1 is a block diagram illustrating an exemplary affective interaction system 100. Exemplary system 100 may be any type of system that provides affective interaction to a user based on an AUI, such as a service robot, a companion robot, a smart wearable, smart furniture, a smart home device, etc.)
comprising: a communication device configured to transmit to a server (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
moving image data including a user (Wang [0054] Furthermore, visual data, such as image, video, etc., containing facial expression emotion-related data 316 and gesture emotion-related data 318 may be used by emotion recognizer), 
the server including an emotion recognition device (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user)
that is learned to recognize emotion information of the user included in input data (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
and the communication device to receive, from the server (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
a complex emotion recognition result (Wang [0069] Referring back to FIG. 3A, emotion recognizer 204 may derive an emotion state 304 based on emotion-related data 302, and then transmit it to a user intention computing processor 206 at module 120)
that includes a plurality of emotion recognition results of the user (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.); 
and an output device configured to output an audio or visual display (Wang [0034] An AUI refers to a user interface that a user uses to interact his emotions with the affective interaction system. A user may initiate an affective interaction by expressing his emotions to the AUI by any available means of operation and control. And the AUI may deliver any relevant command, emotion, information, data, user input, request, and other information to the computing module of the affective interaction system, and simultaneously feed a result and an output produced by the affective interaction system back to the user)
for determining an emotion of the user (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
when the complex emotion recognition result (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature).
Wang does not teach
for determining an emotion of the user based on two or more of the emotion recognition results that do not match, 
when the complex emotion recognition result is based on the two or more of the emotion recognition results that do not match.
Fu teaches
based on the two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.
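For illustration only, the determination method quoted from Fu [0092]-[0097] can be sketched as follows. The tuple encoding of a result as (polarity, level), the function name combine_results, and the use of None to signal that data should be recollected are assumptions made for this sketch and do not appear in Fu. The sketch also assumes that a "low level" result means the lower of the two reported levels, consistent with the degradation approach summarized in Fu [0097].

```python
from typing import Optional, Tuple

# Hypothetical encoding (not from Fu): polarity is one of
# "commendatory", "derogatory", or "neutral"; level is an integer intensity.
Result = Tuple[str, int]


def combine_results(first: Result, second: Result) -> Optional[Result]:
    """Combine two emotion recognition results along the lines of Fu [0092]-[0097]."""
    (p1, l1), (p2, l2) = first, second
    if first == second:
        # The results match, so there is nothing to reconcile.
        return first
    if p1 == "neutral":
        # One neutral result: keep the result that has an emotional tendency.
        return second
    if p2 == "neutral":
        return first
    if p1 == p2:
        # Same tendency at different levels: degrade to the lower level.
        return (p1, min(l1, l2))
    # Opposite tendencies: contradictory results, caller should recollect data.
    return None
```

Under these assumptions, a commendatory result paired with a derogatory result returns None, which corresponds to the recollect-and-redo approach described in Fu [0092].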

Regarding claim 13, Wang in view of Fu teaches the robot of claim 12.
Wang does not teach
further comprising a post-processor for outputting a final emotion recognition result according to a certain criteria, when the received complex emotion recognition result is based on the two or more of the emotion recognition results that do not match.
Fu teaches
further comprising a post-processor for outputting a final emotion recognition result according to a certain criteria, when the received complex emotion recognition result is based on the two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.

Regarding claim 14, Wang in view of Fu teaches the robot of claim 13.
Wang does not teach
wherein the post-processor outputs a contradictory emotion that includes two emotion classes among the complex emotion recognition result, as the final emotion recognition result, when the complex emotion recognition result is based on the two or more of the emotion recognition results that do not match.
Fu teaches
wherein the post-processor outputs a contradictory emotion that includes two emotion classes among the complex emotion recognition result, as the final emotion recognition result, when the complex emotion recognition result is based on the two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.

Regarding claim 15, Wang in view of Fu teaches the robot of claim 14.
Wang teaches
wherein the post-processor selects, two emotion classes having a highest probability among the complex emotion recognition result (Wang [0066] The softmax function provides probabilities for each class label and is often used in a final layer of a neural network-based classifier).
Wang does not teach
wherein the post-processor selects, as the contradictory emotion, two emotion classes having a highest probability among the complex emotion recognition result.
Fu teaches
wherein the post-processor selects, as the contradictory emotion (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.
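For illustration only, selecting the two emotion classes having the highest probability from a softmax output of the kind described in Wang [0066] can be sketched as follows. The function names softmax and top_two, the dictionary input format, and the example class labels are assumptions made for this sketch and are not taken from Wang or Fu.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_two(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the two (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:2]


# Hypothetical class scores for a single input.
selected = top_two({"happy": 2.0, "sad": 1.5, "angry": -1.0})
```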

Regarding claim 16, Wang in view of Fu teaches the robot of claim 12.
Wang further teaches
wherein the server comprises: an uni-modal preprocessor (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
configured to include a plurality of recognition processors (Wang [0141] The systems, apparatus, and methods disclosed herein may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine readable storage device, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers)
each corresponding to a different one of a plurality of modals (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.), 
and learned to recognize emotion information of a user contained in uni-modal input data (Wang [0037] Multichannel front-end terminal 116 may be a hardware device such as a robot, a smart terminal, a smartphone, an instant message (“IM”) platform, or any electronic device capable of providing an interface for a human user to make affective interaction with system 100. Through an affective interface of terminal 116, the user may make an emotion communication 102 in one or more modalities, such as a text 104, a voice 106, a facial expression 108, a gesture 110, a physiological signal 112, and/or a multimodality 114, and receive affective feedbacks also in one or more modalities. Text 104 may be any written information or expression in human or computer readable language, such as a word, a text message, an emoji, etc. Voice 106 may be any sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Facial expression 108 may be an observed facial movement that reflects one or more motions or positions of the muscles beneath the skin of a user's face, such as a sad look, laughing, raising eyebrows, an eye contact, etc.); 
and a multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature)
configured to merge output data from each of the plurality of recognition processors (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
and to be learned to recognize the emotion information of the user contained in the merged data (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature), 
wherein the server transmits (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user), 
to the robot (Wang [0036] FIG. 1 is a block diagram illustrating an exemplary affective interaction system 100. Exemplary system 100 may be any type of system that provides affective interaction to a user based on an AUI, such as a service robot, a companion robot, a smart wearable, smart furniture, a smart home device, etc.), 
a plurality of emotion recognition results (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.)
each corresponding to a different one of the plurality of recognition processors (Wang [0067] Referring back to FIG. 7, in some other embodiments, multimodal fusion processor 512 may be implemented based on models of emotion feature of each modality that are inter-connected with each other. For example, a video and an audio may be processed based on a hidden Markov model in order to build connections and complementarity between emotion features of two modalities based on the needs of processing. In addition, in some other embodiments, multimodal fusion processor 512 may also be implemented based on separate models of emotion feature of each modality. In such embodiments, each model independently recognizes an emotion feature and outputs all recognized emotion features at the end. For example, recognized emotion features in voice emotion-related data, facial expression emotion-related data, and physiological signal emotion-related data may be output together based on weighted superposition (linear), or the multi-layer perceptron in the convolutional neural network (non-linear), etc.)
and a complex emotion recognition result (Wang [0069] Referring back to FIG. 3A, emotion recognizer 204 may derive an emotion state 304 based on emotion-related data 302, and then transmit it to a user intention computing processor 206 at module 120)
based on the emotion recognition result of the multi-modal recognizer (Wang [0043] Emotion recognizer 204 may be implemented as a hardware device running one or more computing programs to receive an emotion-related data, recognize an emotion feature based on different forms of emotion-related data. Further, it may fuse the recognized emotion features into a multimodal emotion feature).
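For illustration only, the weighted superposition (linear) option that Wang [0067] names for outputting per-modality emotion features together can be sketched as follows. The function name fuse, the dictionary layout, the modality names, and the weight values are assumptions made for this sketch. Wang does not specify how weights are chosen, and the normalization by total weight is likewise an assumption.

```python
from typing import Dict


def fuse(per_modality: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> Dict[str, float]:
    """Linearly combine per-modality class probabilities using modality weights."""
    total_weight = sum(weights[modality] for modality in per_modality)
    fused: Dict[str, float] = {}
    for modality, probs in per_modality.items():
        share = weights[modality] / total_weight  # normalized modality weight
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + share * p
    return fused


# Hypothetical per-modality outputs and equal weights.
fused = fuse(
    {
        "voice": {"happy": 0.8, "sad": 0.2},
        "face": {"happy": 0.4, "sad": 0.6},
    },
    {"voice": 1.0, "face": 1.0},
)
```

In this sketch the fused value for each class is the weighted average of that class across modalities, so a modality with a larger weight pulls the fused result toward its own output.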

Regarding claim 20, Wang teaches the server of claim 17.
Wang further teaches
wherein the emotion recognition device (Wang [0038] Terminal 116 provides an affective computing user interface that is capable of collecting a user's emotion communication and deriving emotion-related data for the purpose of further processing. In later stages of the affective interaction session, terminal 116 may receive commands from another device, e.g. module 120, and execute such commands and generate affective expressions to feed back to the user. For example, in the embodiment illustrated in FIG. 1, a user may make emotion communication 102, which may be collected by terminal 116. Terminal 116 may then send the received emotion communication 102 to module 120 through network 118 for further processing. Module 120 may accordingly complete the processing and transmit the results back to terminal 116 in order to enable terminal 116 to accordingly provide affective expressions as a feedback to the user).
Wang does not teach
includes a post-processor for outputting a final emotion recognition result according to a certain criteria, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match.
Fu teaches
includes a post-processor for outputting a final emotion recognition result according to a certain criteria, when the complex emotion recognition result is based on two or more of the emotion recognition results that do not match (Fu [0092-0097] Due to a complexity of the emotion recognition, there may be a contradictory situation between two emotion recognition results. In this case, in order to ensure an accuracy of the recognition results, recollecting the data and redo the identification is a better approach. [0093] Specifically, the emotion recognition result determination method preset in the step S3 is specifically stated as follows: [0094] when the said first emotion recognition result and the second emotion recognition result are different levels of commendatory emotion, determining the current user emotion recognition result as a low level commendatory emotion; [0095] when the first emotion recognition result and the second emotion recognition result are different levels of derogatory emotion, determining the current user emotion recognition result as a low level derogatory emotion; [0096] when one of the first emotion recognition result and the second emotion recognition result is a neutral emotion, and the other is a derogatory or commendatory emotion, then the current user emotion recognition result is determined as the said commendatory or derogatory emotion. [0097] All together, when both the first emotion recognition result and the second emotion recognition result have an emotion tendency (commendatory or derogatory), it adopts a degradation method by choosing a lower emotion type. And when one of the two is a neutral result, choosing the result with emotional tendency).
Fu is considered to be analogous to the claimed invention because it is in the same field of emotion recognition. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang further in view of Fu to allow for handling contradictory results. Doing so would allow for taking appropriate action (such as controlling a user’s smart home) based on multiple emotion recognition results which may be contradictory.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 7:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PAUL J MUELLER/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657