DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). 
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/25/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because of the following informalities: In Fig. 4, element S406 is referred to as “406” in paragraph [0054] of the specification. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
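The three-prong test and the two rebuttable presumptions set out above can be sketched as a simple check. This is a minimal illustration only: the `Limitation` record, its field names, and the `interpreted_under_112f` function are hypothetical, and each yes/no flag stands in for a legal determination that in practice is made on the claim language and the specification, not computed.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of how one claim limitation is drafted."""
    text: str
    uses_means_or_step: bool            # literal word "means" or "step"
    uses_generic_placeholder: bool      # nonce term with no specific structural meaning
    has_functional_language: bool       # e.g. "for acquiring", "configured to"
    recites_sufficient_structure: bool  # structure, material, or acts for the function


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Assumed inputs for illustration: a generic "unit" coupled with a function,
# and a limitation that names a concrete structure for the same kind of function.
unit = Limitation("an emotion estimating unit for estimating the user's emotion",
                  False, True, True, False)
microphone = Limitation("a microphone for acquiring the user's spoken voice",
                        False, False, True, True)

print(interpreted_under_112f(unit))        # True
print(interpreted_under_112f(microphone))  # False
```

In this sketch, reciting sufficient structure defeats prong (C) and therefore rebuts the presumption that arises from the word “means”, while a generic placeholder coupled with function and no structure rebuts the opposite presumption.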
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a first feature-value information acquiring unit”, “a second feature-value information acquiring unit”, and “an emotion estimating unit” in claims 1 and 9, and “a voice capturing unit”, “an image capturing unit”, and “a device control unit” of claim 6.
Regarding the terms “a first feature-value information acquiring unit”, “a second feature-value information acquiring unit”, “an emotion estimating unit”, “a voice capturing unit”, “an image capturing unit”, and “a device control unit”, the terms are generic placeholders. There is no evidence that one of ordinary skill in the art would understand the structure by looking at the terms. Further, the terms are modified by the functional language “for acquiring”, “for estimating”, and “for controlling”, but are not modified by sufficient structure for performing the claimed function. Specifically, “a first feature-value information acquiring unit”, “a second feature-value information acquiring unit”, “an emotion estimating unit”, “a voice capturing unit”, “an image capturing unit”, and “a device control unit” are mere functional descriptions.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The “first feature-value information acquiring unit”, “second feature-value information acquiring unit”, “emotion estimating unit”, and “device control unit” are embodied as a processor, as per the specification at [0020-0022], [0070], and [0073]. The “voice capturing unit” is embodied as a microphone, and the “image capturing unit” is embodied as a camera, as per the specification at [0020].
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
	
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, and 8-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim(s) 1, 9, and 10, the limitation(s) of “acquiring an acoustic feature-value vector”, “acquiring...a language feature-value vector”, “acquiring an image feature-value vector”, “generating a first output vector”, and “generating a second output vector”, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the limitations describe the mental process of a human reading vector values off of a piece of paper and using a specific set of algorithms to calculate additional vectors from the read vector values. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application because the recitations of “an information-processing device” in claim 1, “a non-transitory computer-readable storage medium” and a “computer” in claim 9, and “a first feature-value information acquiring unit”, “a second feature-value information acquiring unit”, and “an emotion estimating unit” in claims 1 and 9 read on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using [0020-0022], [0070], and [0073] of the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Claim 10 does not recite any additional elements. The claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using generalized computer components to acquire and generate amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.
	
	
With respect to claim(s) 2, the claim(s) recite(s) specific information reflected in the vector data. No additional limitations are present.

With respect to claim(s) 8, the claim(s) recite(s) the information-processing device residing in a vehicle, which reads on a generic computer component as per the specification at [0020-1].

These dependent claims do not integrate the judicial exception into a practical application and fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim 3 and its dependents do not fall under the category of mental process due to the recitation of using teaching data to acquire a neural network model. 
Claim 6 and its dependent do not fall under the category of mental process due to the recitation of using the determined emotion to control a device. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 6, 9, and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalinli-Akbacak (U.S. PG Pub No. 2014/0112556), hereinafter Kalinli-Akbacak, in view of Ray et al. (“Multi-level Attention Network using Text, Audio and Video for Depression Prediction”, AVEC ’19, 2019), hereinafter Ray.

Regarding claims 1, 9, and 10, Kalinli-Akbacak teaches
(claim 1) An information-processing device comprising (an apparatus for performing emotion estimation [0052]):
(claim 9) A non-transitory computer-readable storage medium storing a program thereon, the program is for causing a computer to function as (instructions for emotion estimation, i.e. program, may be stored in a computer readable storage medium, i.e. A non-transitory computer-readable storage medium storing a program, that can be retrieved, interpreted, and executed by a computer processing device, i.e. for causing a computer to function [0062]):
(claim 10) An information-processing method comprising (a method for emotion recognition [0017]):

a first feature-value information acquiring unit for acquiring an acoustic feature-value vector and a language feature-value vector, extracted from a user's spoken voice (a processor, i.e. a first feature-value information acquiring unit, may filter the signals from the various sensors to extract relevant features from a user’s speech, i.e. acquiring...extracted from a user’s spoken voice, such as a set of acoustic features, i.e. acoustic feature-value vector, and the linguistic features from the voice signal, i.e. language feature-value vector [0020],[0021],[0026]);
a second feature-value information acquiring unit for acquiring an image feature-value vector extracted from the user's facial image (a processor, i.e. a second feature-value information acquiring unit, may filter the signals from the sensors to extract relevant visual features, i.e. acquiring an image feature-value vector, from the positions of the eyes, eyebrows, lips, nose, and mouth, i.e. extracted from the user’s facial image [0020],[0022],[0029]);
an emotion estimating unit comprising a learned model including (analyzing the acoustic, visual, and linguistic features may include the use of a machine learning algorithm, i.e. a learned model [0039], where the method is performed by a processor, i.e. an emotion estimating unit [0052]):
a first --model-- for generating a first output --value-- based on the acoustic feature-value vector and the image feature-value vector (a machine learning algorithm analyzes acoustic features, i.e. acoustic feature-value vector, to produce a first estimated emotional state, which is fed into a machine learning algorithm, i.e. a first model, that takes the visual features, i.e. image feature-value vector, and first estimated emotional state into account to produce a second estimated emotional state, i.e. generating a first output value [0041]); and
a second --model-- for generating a second output --value-- based on the first output vector and the language feature-value vector, wherein the emotion estimating unit is for estimating the user's emotion based on the second output vector (the second estimated emotional state, i.e. first output vector, is fed into the next machine learning algorithm, i.e. a second model, and taken into account when analyzing the linguistic features, i.e. language feature-value vector, to produce a final estimated emotional state, i.e. generating a second output value...wherein the emotion estimating unit is for estimating the user's emotion based on the second output value [0041-2]).  
While Kalinli-Akbacak provides for machine learning algorithms to evaluate three different modalities and produce outputs used to estimate a user’s emotion, Kalinli-Akbacak does not specifically teach that the machine learning algorithms include attention layers or that the outputs are vectors, and thus does not teach
a first attention layer for generating a first output vector...;
a second attention layer for generating a second output vector... estimating the user's emotion based on the second output vector;
Ray, however, teaches a first attention layer for generating a first output vector...(the fusion model uses the vector output from two modalities and passes them through an attention layer, i.e. first attention layer, to fuse the modalities, i.e. first output vector (Sec 3, para 1),(Sec. 4.4));
a second attention layer for generating a second output vector... estimating the user's emotion based on the second output vector (the fusion model uses all the modalities and passes them through an attention layer, i.e. second attention layer, to fuse the modalities, i.e. second output vector, which are then processed to produce a PHQ8 score, i.e. estimating the user's emotion based on the second output vector  (Sec 3, para 1),(Sec. 4.4));
Where Kalinli-Akbacak teaches that the fusion of all three modalities can be done in a specific sequence [0041-2].
Kalinli-Akbacak and Ray are analogous art because they are from a similar field of endeavor in processing audio, video, and linguistic features of a speaker to identify emotion. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kalinli-Akbacak’s teachings of machine learning algorithms that evaluate three different modalities and produce outputs used to estimate a user’s emotion with the use of attention layers to fuse modality vectors and output vectors, as taught by Ray. It would have been obvious to combine the references to give insight into which features in a modality are more influential in learning and to understand the ratio of contribution of each modality towards the prediction (Ray (Sec 4.4, para. 1)).
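The two-stage structure recited in claim 1 (a first attention layer fusing the acoustic and image vectors, then a second attention layer fusing that result with the language vector) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a generic dot-product attention with hypothetical query vectors `query1` and `query2`, and it is not taken from Ray, Kalinli-Akbacak, or the application; the function names are likewise hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(vectors, query):
    """Fuse equal-length vectors into one vector by attention weighting.

    Each input is scored by its dot product with the (assumed) query vector,
    the scores are normalized with softmax, and the output is the weighted sum.
    """
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    fused = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return fused, weights


def cascaded_fusion(acoustic, image, language, query1, query2):
    """First layer fuses acoustic + image; second fuses that output + language."""
    first_output, first_weights = attention_fuse([acoustic, image], query1)
    second_output, second_weights = attention_fuse([first_output, language], query2)
    return second_output, first_weights, second_weights


# Toy two-dimensional vectors; zero queries give every input equal weight.
second, w1, w2 = cascaded_fusion([1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                                 [0.0, 0.0], [0.0, 0.0])
print(second)  # [0.75, 0.75]
print(w1, w2)  # [0.5, 0.5] [0.5, 0.5]
```

The returned weights show how much each modality contributed at each stage, which corresponds to the kind of per-modality contribution insight that the rationale above attributes to attention-based fusion.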

Regarding claim 2, Kalinli-Akbacak in view of Ray teaches claim 1, and Kalinli-Akbacak further teaches
the acoustic feature-value vector includes at least one of a feature-value vector of sound pitch, a feature-value vector of speaking speed, a feature-value vector of voice intensity (extracted acoustic features, i.e. acoustic feature-value vector, may include prosodic features such as pitch, i.e. a feature-value vector of sound pitch, and energy, i.e. a feature-value vector of voice intensity, and speaking rate, i.e. a feature-value vector of speaking speed [0021]).  

Regarding claim 3, Kalinli-Akbacak in view of Ray teaches claim 1, and Kalinli-Akbacak further teaches
the learned model is a neural-network model acquired by machine learning using teaching data including (a machine learning algorithm analyzes features to determine estimated emotional state, i.e. learned model...acquired by machine learning [0041], where the machine learning algorithms may be neural networks, i.e. neural-network model [0039], and may be trained using training data, i.e. using teaching data [0043]):
an acoustic feature-value vector and a language feature-value vector extracted from a person's spoken voice (the algorithm is trained to analyze the acoustic and linguistic feature types, i.e. an acoustic feature-value vector and a language feature-value vector extracted from a person's spoken voice, where the algorithm is trained with data that has emotion class labels, and an emotion is associated with an extracted feature [0020-1],[0026],[0031],[0043-4]);
an image feature-value vector extracted from the person's facial image (the algorithm is trained to analyze the visual feature types, i.e. an image feature-value vector, where the algorithm is trained with data that has emotion class labels, and an emotion is associated with an extracted feature [0020],[0022],[0029],[0031],[0043-4]); and
information indicating the person's emotion (the training data has emotion class labels, where an emotion is associated with an extracted feature, i.e. information indicating the person’s emotion [0031],[0044]).  

Regarding claim 6, Kalinli-Akbacak in view of Ray teaches claim 1, and Kalinli-Akbacak further teaches
a voice capturing unit for acquiring the user's spoken voice (signals relating to the user’s voice may be obtained, i.e. acquiring the user's spoken voice, using a microphone or microphone array as a sensor, i.e. a voice capturing unit [0020-1]);
an image capturing unit for acquiring the user's image (a user image may be obtained, i.e. acquiring the user's image, with a camera, i.e. an image capturing unit [0019],[0029]);
a device control unit for controlling a device based on the user's emotion estimated by the emotion estimating unit (different applications may use the determined emotion of the user, i.e. based on the user's emotion estimated by the emotion estimating unit, to decide what to do next, such as a game providing more points to whoever stays calm under stress, or an intelligent man-machine interface such as a virtual agent dynamically adapting behavior, i.e. a device control unit for controlling a device [0066]).  

Claim(s) 4 and 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalinli-Akbacak, in view of Ray, and further in view of Lin et al. (U.S. PG Pub No. 2021/0097887), hereinafter Lin.

Regarding claim 4, Kalinli-Akbacak in view of Ray teaches claim 3, and Kalinli-Akbacak further teaches
wherein the learned model includes:
a first --machine learning model--, for outputting a first vector with the acoustic feature-value vector defined as an input (a machine learning algorithm, i.e. model, analyzes the acoustic features, i.e. acoustic feature-value vector defined as an input, to produce a reduced-dimension feature set to be analyzed by another machine learning algorithm, i.e. outputting a first vector [0039]);
a second --machine learning model--, for outputting a second vector with the image feature-value vector defined as an input (a machine learning algorithm, i.e. second machine learning model, analyzes the visual features, i.e. image feature-value vector defined as an input, to produce a reduced-dimension feature set to be analyzed by another machine learning algorithm, i.e. outputting a second vector [0039]); and
a third --machine learning model--, for outputting a third vector with the language feature-value vector defined as an input (a machine learning algorithm, i.e. third machine learning model, analyzes the linguistic features, i.e. language feature-value vector defined as an input, to produce a reduced-dimension feature set to be analyzed by another machine learning algorithm, i.e. outputting a third vector [0039]), 
wherein the first attention layer is for outputting the first output vector based on the first vector and the second vector (the reduced acoustic feature set is fed into another machine learning algorithm to produce a first estimated emotional state, which is then fed into another machine learning algorithm with the reduced visual feature set, i.e. first model, to produce a second estimated emotional state, i.e. outputting the first output value [0039],[0041]), and the second attention layer is for outputting the second output vector based on the first output vector and the third vector (the reduced linguistic feature set is fed into another machine learning algorithm with the second estimated emotional state, i.e. first output value, to produce a final estimated emotional state, i.e. outputting the second output value [0039],[0041-2]).  
Where Ray teaches the use of an attention layer for processing the vectors from the different modalities to output a vector (Sec 3, para 1),(Sec. 4.4).
While Kalinli-Akbacak in view of Ray provides the use of machine learning algorithms to process acoustic, visual, and linguistic features to reduce the dimensions of the feature sets, Kalinli-Akbacak in view of Ray does not specifically teach that the machine learning algorithms are neural networks, and thus does not teach
a first neural-network layer having a first recurrent-neural-network layer, for outputting a first vector...;
a second neural-network layer having a second recurrent-neural-network layer, for outputting a second vector...;
a third neural-network layer having a third recurrent-neural-network layer, for outputting a third vector...;
Lin, however, teaches a first neural-network layer having a first recurrent-neural-network layer, for outputting a first vector...(multimodal speaker neural network is a DNN that determines speaker audio, speaker text, and speaker video and passes the data to an input layer of a CNN, where the output of the CNN is passed to a speaker audio specific RNN layer of the multimodal neural network, i.e. a first neural-network layer having a first recurrent-neural-network layer, that extracts the features into audio features, and the output of the RNN is a normalized distribution of the features identified, i.e. outputting a first vector Figs.3,5,[0025],[0039],[0040-1]);
a second neural-network layer having a second recurrent-neural-network layer, for outputting a second vector...(multimodal speaker neural network is a DNN that determines speaker audio, speaker text, and speaker video and passes the data to an input layer of a CNN, where the output of the CNN is passed to a speaker video specific RNN layer of the multimodal neural network, i.e. a second neural-network layer having a second recurrent-neural-network layer, that extracts the features into video features, and the output of the RNN is a normalized distribution of the features identified, i.e. outputting a second vector Figs.3,5,[0025],[0039],[0040-1]);
a third neural-network layer having a third recurrent-neural-network layer, for outputting a third vector...(multimodal speaker neural network is a DNN that determines speaker audio, speaker text, and speaker video and passes the data to an input layer of a CNN, where the output of the CNN is passed to a speaker text specific RNN layer of the multimodal neural network, i.e. a third neural-network layer having a third recurrent-neural-network layer, that extracts the features into text features, and the output of the RNN is a normalized distribution of the features identified, i.e. outputting a third vector Figs.3,5,[0025],[0039],[0040-1]);
Kalinli-Akbacak, Ray, and Lin are analogous art because they are from a similar field of endeavor in processing audio, video, and linguistic features of a speaker to analyze speaker behavior. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Kalinli-Akbacak, as modified by Ray, of using machine learning algorithms to process acoustic, visual, and linguistic features to reduce the dimensions of the feature sets, with the use of a multimodal DNN with modality-specific RNN layers to process audio, video, and text features as taught by Lin. It would have been obvious to combine the references to enable training of the multimodal DNN to improve the accuracy of the DNN and provide an adaptable program to provide speaker guidance to improve the effectiveness of the user’s speech (Lin [0029-30]).
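The modality-specific recurrent layers recited in claim 4, each taking a sequence of feature-value vectors for one modality and outputting a single vector, can be sketched as follows. This is a minimal illustration that assumes a plain tanh recurrent layer in place of the GRU or LSTM layers named in claim 5; the function name `rnn_encode` and the weight matrices `w_in` and `w_rec` are hypothetical and are not taken from Lin or the application.

```python
import math


def rnn_encode(sequence, w_in, w_rec):
    """Run a simple tanh recurrent layer and return the final hidden state.

    sequence: list of input vectors, one per time step
    w_in:     hidden_size x input_size weight matrix (assumed, untrained)
    w_rec:    hidden_size x hidden_size weight matrix (assumed, untrained)
    """
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            )
            for j in range(len(hidden))
        ]
    return hidden


# One encoder call per modality; toy one-dimensional inputs and weights.
first_vector = rnn_encode([[0.5], [0.1]], [[1.0]], [[0.0]])   # acoustic
second_vector = rnn_encode([[0.2], [0.3]], [[1.0]], [[0.0]])  # image
third_vector = rnn_encode([[0.4], [0.0]], [[1.0]], [[0.0]])   # language
print(first_vector, second_vector, third_vector)
```

Each resulting vector would then serve as an input to the attention-based fusion described for the first and second attention layers.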

Regarding claim 5, Kalinli-Akbacak in view of Ray and Lin teaches claim 4, and Ray further teaches
the first recurrent-neural-network layer, the second recurrent-neural-network layer, and the third recurrent-neural-network layer are GRU (Gated Recurrent Unit) layers or LSTM (Long short-term memory) layers (the audio, visual, and text features are fed into a BLSTM layer before having attention applied, i.e. first recurrent-neural-network layer, the second recurrent-neural-network layer, and the third recurrent-neural-network layer (Sec. 4.1-4.4, para 2)).  
Where the motivation to combine is the same as previously presented.

Claim(s) 7 and 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalinli-Akbacak, in view of Ray, and further in view of Yang (U.S. PG Pub No. 2021/0362725), hereinafter Yang.

Regarding claim 7, Kalinli-Akbacak in view of Ray teaches claim 6.
While Kalinli-Akbacak in view of Ray provides applications utilizing the determined user emotion to alter the behavior of the application, Kalinli-Akbacak in view of Ray does not specifically teach that the system provides a voice output to the user based on the determined emotion, and thus does not teach
the device is a voice output device for outputting a voice to the user, wherein the device control unit is for generating voice data to be outputted from the voice output device, based on the user's emotion estimated by the emotion estimating unit.  
Yang, however, teaches the device is a voice output device for outputting a voice to the user (the terminal device managed by the in-vehicle control system includes sound output units, i.e. device, positioned around the vehicle that can provide directional voice output to the driver and passengers, i.e. voice output device for outputting a voice to the user [0100-4]), wherein the device control unit is for generating voice data to be outputted from the voice output device, based on the user's emotion estimated by the emotion estimating unit (the terminal device controls the sound output units to provide output such as directional voice output, i.e. device control unit is for generating voice data to be outputted from the voice output device [0100-4], where the voice output is a pacification manner determined by the central controller and sent to the terminal device, and where the pacification manner is a voice interaction based on the emotion of the user determined by the pacification device, i.e. based on the user's emotion estimated by the emotion estimating unit  [0106],[0116-124]). 
Kalinli-Akbacak, Ray, and Yang are analogous art because they are from a similar field of endeavor in processing multimodal input to identify the emotional status of a user. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Kalinli-Akbacak, as modified by Ray, of applications utilizing the determined user emotion to alter the behavior of the application, with a system providing a voice interaction to pacify the user based on the determined emotion as taught by Yang. It would have been obvious to combine the references to enable the avoidance of safety hazards to vehicle operation when a user in the car is experiencing an abnormal emotion (Yang [0037]).

Regarding claim 8, Kalinli-Akbacak in view of Ray teaches claim 1. 
While Kalinli-Akbacak in view of Ray provides a system that identifies user emotion, Kalinli-Akbacak in view of Ray does not specifically teach that the system is part of a vehicle, and thus does not teach
A vehicle comprising the information-processing device....
Yang, however, teaches a vehicle comprising the information-processing device (the central controller can be combined with the central control system of the in-vehicle system, i.e. vehicle comprising the information-processing device, to manage terminal devices and pacification manners [0104]).
Kalinli-Akbacak, Ray, and Yang are analogous art because they are from a similar field of endeavor in processing multimodal input to identify the emotional status of a user. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Kalinli-Akbacak, as modified by Ray, of a system that identifies user emotion, with a controller combined with the central control system of an in-vehicle system, as taught by Yang. It would have been obvious to combine the references to enable the avoidance of safety hazards to vehicle operation when a user in the car is experiencing an abnormal emotion (Yang [0037]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Yadav et al. (U.S. PG Pub No. 2021/0201004): Recognizing emotions using visual, voice, and text features.
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659 

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659