DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office action of 4/14/2022, the applicant has submitted an amendment, filed 7/11/2022, amending claims 1, 3, 9, 17, 18 and cancelling claims 2, 6, and 7, while arguing to traverse the prior art rejections. Applicant’s arguments have been fully considered, but the previous grounds of rejection are maintained for the reasons explained in the response to arguments.
Response to Arguments
On page 11, the first two ¶’s provide a broad overview of the latest amendments and present no arguments.
On page 11, the 3rd and 4th ¶’s discuss the previous 112(b) rejections.
In view of the latest amendments, said rejections are withdrawn.
To address the first action’s 102 rejections, from the end of page 11 to the end of the first ¶ on page 15, the remarks discuss the primary reference BROMAND et al. (US 2019/0325896) and, on certain pages, reproduce portions of the reference. Those pages contain no arguments. Then, in the last ¶ of page 15, it is concluded: “Accordingly, BROMAND may determine the pivot indicating an emotion change of the user” ….. “It appears that the pivot detection and response adaption in BROMAND is different from the feature “deciding the intention type of the voice information is an instruction intention if the emotion types corresponding to the emotion keywords are different” as recited in the amended claim 1”.
Respectfully, this lacks any reasoning as to why BROMAND fails to teach the claim element shown in bold, which corresponds to the original claim 7.
The claim is silent on the scope of the limitation quoted in bold. According to specification ¶ 0074 sentence 3: “When the emotion types corresponding to the plurality of emotion types are different, the intention type of the voice information can be determined as instruction intention”. This is also void of any details; e.g. it does not make any specific correlation between the “instruction intention” and the “plurality of” “different” “emotion types” that have been detected. It can simply be interpreted that if there exist subsequent utterances from a user which possess different emotions, the system simply interprets the user’s “intention” to be an “instruction”. The “instruction” according to specification ¶ 0088 first sentence is: “The instruction intention is what the user needs to execute”; according to specification ¶ 0088 last 6 lines: “correspondingly, the method further includes: when the instruction corresponding to the instruction intention is executed, the excitation voice is generated and played according to the instruction”.
This very clearly defines the “instruction” to correspond to something that the system is to “execute” for a user and, as one example, it is something “play[ed]” back to the user. Thus, where BROMAND et al. “determine[s] pivot indicating an emotion change”, this does fulfil the part of the claim limitation requiring “emotion keywords are different”, and the response by the system of playing “what a wonderful world” does map to the claim element’s “instruction”. Certainly, if an “instruction” associated with an “intention” is executed, it amounts to having determined its associated “intention” as a pre-requisite.
The remainder of the remarks on pages 16-17 discuss the independent claims 17 and 18 and all the dependent claims. As regards the independent claims, it is asserted that they “recite elements that are similar to the elements discussed above with respect to amended independent claim 1” (page 16 ¶ 2).
Therefore the responses provided above with respect to claim 1 apply to them.
As regards the dependent claims, it is asserted that they “are patentable for at least the same reasons as stated above with respect to independent claims” (page 16 ¶ 3).
Since applicants have not argued the merits of these dependent claims, but assert patentability solely through their dependence on the allegedly patentable parent claims, they stand or fall with said parent claims and hence no further response to applicant’s arguments is necessary.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-5, 8-15, 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over BROMAND et al. (US 2019/0325896), and further in view of Wu et al. (US 2020/0044999).

Regarding claim 1, BROMAND et al. do teach an emotion-based voice interaction method (Title, Abstract), 
wherein comprising:
receiving voice information to be processed, and acquiring an intention type of the voice information (for a “natural utterance” “detectable by a microphone” (¶ 0024 lines 1-3), according to ¶ 0033 lines 2+: “receiving” “via a user device, a natural utterance” (receiving a voice information) “including a command” (with an intention type));
determining an emotion type of the voice information when the intention type is an emotion intention (¶ 0033 lines 4+: “extracting the command” (using the intention) “extracting an emotion feature” (determining an emotion type) “from the natural utterance” (from the voice information which can indicate the intention type is an emotion type)); and
generating a response voice of the voice information according to the emotion type, and playing the response voice (¶ 0033 lines 5+: “mapping the emotion feature to an emotion” “and responding to the natural utterance at least by executing the command and providing a synthesized utterance” (generating a response voice to the voice information) “adapted to the emotion” (according to the emotion type));
wherein the receiving voice information to be processed, and acquiring an intention type of the voice information comprises:
receiving the voice information to be processed, performing word division on the voice information to obtain several words, and judging whether the several words being divided comprise an emotion keyword (¶ 0080: “user device” “includes a speech to text (STT) engine 124 for converting an analog signal of the natural language utterance into a digitalized textual component” (performing a word division on the “utterance” (voice information) to obtain “textual component” (several words)); e.g. ¶ 0099 last sentence: “if the word "Great" is in the textual component of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (where the “textual component” (several words) comprises “Great” (an emotion keyword mapped to joyful emotion))); and
deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (if the emotion keyword is within the “textual component” (several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (then the “natural utterance” (voice information) associated with the “textual component” (the several words) is “joyful” (is an emotion intention)));
wherein the deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword comprises:
acquiring a number of emotion keywords comprised when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (acquiring one emotion keyword in the “textual component” (among several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion”);
deciding the intention type of the voice information is the emotion intention when the number is equal to 1 (¶ 0099 last sentence: “if the word "Great" is in the textual component” (based on one emotion keyword (“Great”)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (the “textual component” (transcript of the voice information and thus the voice information) is determined to be “joyful”));
deciding the intention type of the voice information is an instruction intention if the emotion types corresponding to the emotion keywords are different (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” “mapped to a “neutral feeling”” (i.e., no word in this “command” can be “joyful”) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion” (the combination of the original command and the word “Yes!” by the “user” together possess at least two keywords that possess two different emotion types), and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (resulted in treating the overall “user” input as an instruction intention)).
BROMAND et al. do not specifically disclose:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same.
Wu et al. do teach:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same (¶ 0111 last 7 lines: “An exemplary sentence with the emotion "sad" may be "I don't like it and want to cry"” (the “sentence” (e.g. voice information) as a whole is decided to correspond to emotion type “sad”, and the sentence possesses the keywords “don’t like” as well as “cry” that are each “sad” emotion type as well)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “emotion model” of Wu et al. into the “emotion sensitive responses” prediction of BROMAND et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable BROMAND et al. to benefit from “an emotion model, which is time-sensitive to an input” as disclosed in Wu et al. ¶ 0153 lines 4-5.
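By way of illustration only, the intention-type decision recited in claim 1 (together with the no-keyword branch recited in claim 8) may be sketched as below. The lexicon, the whitespace-based word division, and the function names are hypothetical assumptions made for this sketch; they are taken neither from the application nor from BROMAND et al. or Wu et al.

```python
from typing import Dict, List

# Hypothetical lexicon mapping emotion keywords to emotion types.
# Assumption for illustration only; not drawn from the application or the references.
EMOTION_LEXICON: Dict[str, str] = {
    "great": "joyful",
    "happy": "joyful",
    "sad": "sad",
    "cry": "sad",
}


def divide_words(text: str) -> List[str]:
    """Naive word division: split on whitespace, strip punctuation, lowercase."""
    stripped = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in stripped if w]


def decide_intention_type(text: str) -> str:
    """Return "emotion" or "instruction" following the recited branches."""
    words = divide_words(text)
    # Emotion types of every emotion keyword found among the divided words.
    emotion_types = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]
    number = len(emotion_types)
    if number == 0:
        return "instruction"  # no emotion keyword (claim 8)
    if number == 1:
        return "emotion"      # exactly one emotion keyword
    if len(set(emotion_types)) == 1:
        return "emotion"      # several keywords, all of the same emotion type
    return "instruction"      # several keywords of different emotion types


if __name__ == "__main__":
    for utterance in ("Great!", "I am sad and want to cry", "great but I cry"):
        print(utterance, "->", decide_intention_type(utterance))
```

Under these assumptions, an utterance with one keyword or with several keywords of a single emotion type is classified as an emotion intention, while an utterance with no keyword or with keywords of different emotion types is classified as an instruction intention.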

Regarding claim 3, BROMAND et al. main embodiment do teach the emotion-based voice interaction method according to claim 1, wherein the receiving the voice information to be processed, performing word division on the voice information to obtain several words, and judging whether the several words being divided comprise an emotion keyword comprises:
receiving the voice information to be processed, and converting the voice information into text information (¶ 0080: “user device” “includes a speech to text (STT) engine 124 for converting an analog signal of the natural language utterance into a digitalized textual component” (converting the “utterance” (voice information) into “textual component” (text information))).
BROMAND et al. main embodiment do not specifically disclose:
dividing the text information into several words, and selecting words meeting a preset condition from the several words being divided;
judging whether the selected words meeting the preset condition comprises the emotion keyword.
BROMAND et al. alternative embodiment do teach:
dividing the text information into several words, and selecting words meeting a preset condition from the several words being divided (¶ 0044 sentence 1: “a user’s natural utterance” “Ugh” (obtained as one word from voice to text conversion of an “utterance” (voice information) into several words) “having no words from which a command can be extracted” (selecting a word “Ugh” meeting a preset condition, in that it not only cannot function as a “command” but also does not belong to any known part of speech category)); and
judging whether the selected words meeting the preset condition comprises an emotion keyword (¶ 0044 sentence 1: “a user’s natural utterance” “Ugh” (the selected word) “having no words from which a command can be extracted” “emotion features” (may be judged as an emotion keyword) “based on the tone of the natural utterance are extracted”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of “tone of the natural utterance” to determine “emotion features” of an “utterance”, as in the alternative embodiment, into the main embodiment, because doing so would enable the combined systems and their associated methods of the two embodiments to function in combination as they do separately and would further enable the main embodiment to use “tone” as well as its other methods to determine, e.g., whether the word “Great” (the emotion keyword) can also be judged to be an emotion keyword using this other “tone” based technique, and thus enhance its reliability and accuracy should the word “Great” or similar emotion words appear in an “utterance”.

Regarding claim 4, BROMAND et al. main embodiment do not specifically disclose the emotion-based voice interaction method according to claim 3, wherein the preset condition is that a part-of-speech of a word does not belong to a preset part-of-speech list.
BROMAND et al. alternative embodiment do teach the emotion-based voice interaction method according to claim 3, wherein the preset condition is that a part-of-speech of a word does not belong to a preset part-of-speech list (¶ 0044 sentence 1: “a user’s natural utterance” “Ugh” “having no words from which a command can be extracted” (selecting a word “Ugh” meeting a preset condition that it does not belong to any known or preset part of speech category)).
For obviousness to combine BROMAND et al. main and alternative embodiments see claim 3.

Regarding claim 5, BROMAND et al. main embodiment do not specifically disclose the emotion-based voice interaction method according to claim 4, wherein the preset parts-of-speech list comprises non-key part-of-speech, wherein the non-keyword part-of-speech is a part-of-speech without an emotion characteristic or an action characteristic.
BROMAND et al. alternative embodiment do teach the emotion-based voice interaction method according to claim 4, wherein the preset parts-of-speech list comprises non-key part-of-speech, wherein the non-keyword part-of-speech is a part-of-speech without an emotion characteristic or an action characteristic (¶ 0044 sentence 1: “a user’s natural utterance” “Ugh” “having no words from which a command can be extracted” (the word “Ugh” cannot be used as a “command” (an action characteristic))).
For obviousness to combine BROMAND et al. main and alternative embodiments see claim 3.
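As a rough illustration of the word selection recited in claims 3-5, the sketch below keeps only words whose part-of-speech is not on a preset list of non-key parts-of-speech and then checks the selected words for an emotion keyword. The part-of-speech table, the preset list, and the keyword set are hypothetical assumptions chosen for this example; neither the application nor BROMAND et al. specifies them.

```python
from typing import Dict, List, Set

# Hypothetical part-of-speech lookup (assumption; a real system would use a tagger).
POS_TABLE: Dict[str, str] = {
    "i": "pronoun",
    "the": "article",
    "to": "preposition",
    "feel": "verb",
    "play": "verb",
    "great": "adjective",
    "song": "noun",
}

# Hypothetical preset list of non-key parts-of-speech, i.e. parts-of-speech
# carrying neither an emotion characteristic nor an action characteristic.
NON_KEY_POS: Set[str] = {"pronoun", "article", "preposition"}

# Hypothetical emotion keywords.
EMOTION_KEYWORDS: Set[str] = {"great", "sad"}


def select_words(words: List[str]) -> List[str]:
    """Keep words whose part-of-speech does not belong to the preset list."""
    return [w for w in words if POS_TABLE.get(w, "unknown") not in NON_KEY_POS]


def comprises_emotion_keyword(words: List[str]) -> bool:
    """Judge whether the selected words comprise an emotion keyword."""
    return any(w in EMOTION_KEYWORDS for w in select_words(words))


if __name__ == "__main__":
    print(select_words(["i", "feel", "great"]))
    print(comprises_emotion_keyword(["play", "the", "song"]))
```

In this sketch a word absent from the table, such as "ugh", is tagged "unknown" and therefore survives the selection, since "unknown" is not on the preset list.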

Regarding claim 8, BROMAND et al. do teach the emotion-based voice interaction method according to claim 1, wherein further comprising:
deciding the intention type of the voice information is the instruction intention when the voice information does not comprise the emotion keyword (¶ 0031 sentence 3: “a "neutral feeling" emotion feature conveys that the associated utterance/utterance portion does not imply any of the other feelings” (e.g., in that case, according to ¶ 0049 lines 2+, “user’s” “the command” “play me an uplifting song” “mapped to a” “neutral feeling” is treated as an instruction to just “execut[e]” “the command” with no audible response, since, if any “emotion” is detected, the response according to ¶ 0033 last 4 lines is “executing the command and providing” “utterance” “adapted to the emotion”)).

Regarding claim 9, BROMAND et al. do teach the emotion-based voice interaction method according to claim 1, wherein further comprising:
judging whether the instruction intention can determine instruction content when the intention type is the instruction intention (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” (the instruction intention) “mapped to a “neutral feeling”” “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion”, and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (is assessed to correspond to this instruction content)); 
and querying a user in a domain clarification manner until the instruction content can be determined when the instruction intention can determine the instruction content, and executing an instruction corresponding to the instruction intention (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” (the instruction intention) “mapped to a “neutral feeling”” “In response to the first natural utterance, a synthesized utterance is generated proposing that a first action of playing the song” “What a Wonderful World” “by Louis Armstrong be executed” (a domain clarification query is made by the system) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion”, and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (the answer “Yes!” authorizes the instruction content to be executed)).

Regarding claim 10, BROMAND et al. do teach the emotion-based voice interaction method according to claim 9, wherein further comprising:
generating and playing an excitation voice according to the instruction when the instruction corresponding to the instruction intention is executed (in response to the user requesting “Play my favorite song” (¶ 0036 lines 3+), according to ¶ 0036 lines 9+: “A synthesized utterance acknowledging the command is then provided” “You seem a little down, Playing your favorite song” (generating an excitation voice according to the instruction) “and the user’s favorite song is played” (while the instruction intention is executed)).

Regarding claim 11, BROMAND et al. do teach the emotion-based voice interaction method according to claim 1, wherein the determining an emotion type of the voice information when the intention type is an emotion intention comprises:
performing emotion analysis on the voice information to obtain the emotion type corresponding to the voice information when the intention type is the emotion intention, the emotion analysis comprises one or more analysis modes of vocabulary emotion analysis, sentence meaning emotion analysis and sound rhythm emotion analysis (¶ 0034: “The emotion feature” (emotion type obtained) “is extracted from one or more of a variety of cues” (by an emotion analysis) “examples of such cues include” “inflection in the natural utterance, a volume of the utterance, a pitch of the utterance” (comprising sound rhythm emotion analysis) “one or more words in the natural utterance” (and a vocabulary emotion analysis)).

Regarding claim 12, BROMAND et al. do teach the emotion-based voice interaction method according to claim 1, wherein the generating a response voice of the voice information according to the emotion type, and playing the response voice comprises:
generating the response voice corresponding to the voice information according to the emotion type based on an emotion empathy principle and an emotion guiding principle, and playing the response voice, the response voice comprises an emotion response statement and a function guiding statement (in response to the user requesting “play my favorite song” (¶ 0036 lines 3+), then according to ¶ 0036 lines 9+: “A synthesized utterance acknowledging the command” (response voice generated) “is then provided” “You seem a little down” (this part of the response abides by an emotion empathy principle) “playing your favorite song” (this part of the response abides by an emotion guiding principle statement and also serves as a function guiding statement)).

Regarding claim 13, BROMAND et al. do teach the emotion-based voice interaction method according to claim 12, wherein the generating the response voice corresponding to the voice information according to the emotion type based on an emotion empathy principle and an emotion guiding principle, and playing the response voice comprises:
generating the response voice corresponding to the voice information according to the emotion type based on the emotion empathy principle and the emotion guiding principle (¶ 0036 lines 9+: “A synthesized utterance acknowledging the command” (response voice generated) “is then provided” “You seem a little down” (this part of the response abides by an emotion empathy principle) “playing your favorite song” (this part of the response abides by an emotion guiding principle statement)); and
acquiring a voice characteristic of the voice information, and playing the response voice according to the voice characteristic (¶ 0036 lines 2+: “user’s natural utterance” “play my favorite song” “is received” “one or more emotion features associated with the natural utterance” “are extracted” “from the words of the natural utterance” “and/or the tone/cadence/volume/pitch/pace of the natural utterance” (acquiring a voice characteristic of the “utterance” (voice information)) “mapped to the emotion of sadness” “A synthesized utterance” (playing the response) “acknowledging the command is then provided that is adapted to the sadness emotion” (according to the characteristic)).

Regarding claim 14, BROMAND et al. do teach the emotion-based voice interaction method according to claim 12, wherein the generating the response voice corresponding to the voice information according to the emotion type based on an emotion empathy principle and an emotion guiding principle, and playing the response voice comprises:
generating the response voice corresponding to the voice information according to the emotion type based on the emotion empathy principle and the emotion guiding principle (¶ 0036 lines 9+: “A synthesized utterance acknowledging the command” (response voice generated) “is then provided” “You seem a little down” (this part of the response abides by an emotion empathy principle) “playing your favorite song” (this part of the response abides by an emotion guiding principle statement)); and
generating an emotion visual image according to the response voice, and deducing the corresponding response voice through the visual image (¶ 0038 last 5 lines: “providing” “both of a synthesized utterance acknowledging the command” “and/or displaying a text” (generating a visual image of the emotion of the response voice) “acknowledging the command” “adapted to the detected emotion”).

Regarding claim 15, BROMAND et al. do teach the emotion-based voice interaction method according to claim 1, wherein before receiving voice information to be processed, and acquiring an intention type of the voice information, comprising:
activating a voice listening mode and actively playing a preset voice when a voice awakening instruction is received (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” “mapped to a “neutral feeling”” “In response to the first natural utterance, a synthesized utterance is generated proposing that a first action of playing the song” “What a Wonderful World” “by Louis Armstrong be executed” (before receiving “the second utterance” (voice information functioning as a voice awakening) and its associated intention) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion”, and according to ¶ 0049 last sentence this results in “What a Wonderful World” “by Louis Armstrong is executed” (voice listening mode is activated to play “What a Wonderful World” “by Louis Armstrong” (playing a preset voice) upon assessment of the instruction)).

Regarding claim 17, BROMAND et al. do teach a non-transitory computer-readable storage medium, wherein having one or more programs stored thereon, the one or more programs being executable by one or more processors (¶ 0038: “In some embodiments of the present disclosure, a non-transitory computer readable medium comprises: an emotion processor having one or more sequences of emotion processor instructions that, when executed by one or more processors, causes the one or more processors to generate an output adapted to a detected emotion in a natural utterance from a user”);
to implement:
receiving voice information to be processed, and acquiring an intention type of the voice information (for a “natural utterance” “detectable by a microphone” (¶ 0024 lines 1-3), according to ¶ 0033 lines 2+: “receiving” “via a user device, a natural utterance” (receiving a voice information) “including a command” (with an intention type));
determining an emotion type of the voice information when the intention type is an emotion intention (¶ 0033 lines 4+: “extracting the command” (using the intention) “extracting an emotion feature” (determining an emotion type) “from the natural utterance” (from the voice information which can indicate the intention type is an emotion type)); and
generating a response voice of the voice information according to the emotion type, and playing the response voice (¶ 0033 lines 5+: “mapping the emotion feature to an emotion” “and responding to the natural utterance at least by executing the command and providing a synthesized utterance” (generating a response voice to the voice information) “adapted to the emotion” (according to the emotion type));
wherein the receiving voice information to be processed, and acquiring an intention type of the voice information comprises:
receiving the voice information to be processed, performing word division on the voice information to obtain several words, and judging whether the several words being divided comprise an emotion keyword (¶ 0080: “user device” “includes a speech to text (STT) engine 124 for converting an analog signal of the natural language utterance into a digitalized textual component” (performing a word division on the “utterance” (voice information) to obtain “textual component” (several words)); e.g. ¶ 0099 last sentence: “if the word "Great" is in the textual component of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (where the “textual component” (several words) comprises “Great” (an emotion keyword mapped to joyful emotion))); and
deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (if the emotion keyword is within the “textual component” (several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (then the “natural utterance” (voice information) associated with the “textual component” (the several words) is “joyful” (is an emotion intention)));
wherein the deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword comprises:
acquiring a number of emotion keywords comprised when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (acquiring one emotion keyword in the “textual component” (among several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion”);
deciding the intention type of the voice information is the emotion intention when the number is equal to 1 (¶ 0099 last sentence: “if the word "Great" is in the textual component” (based on one emotion keyword (“Great”)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (the “textual component” (transcript of the voice information and thus the voice information) is determined to be “joyful”));
deciding the intention type of the voice information is an instruction intention if the emotion types corresponding to the emotion keywords are different (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” “mapped to a “neutral feeling”” (i.e., no word in this “command” can be “joyful”) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion” (the combination of the original command and the word “Yes!” by the “user” together possess at least two keywords that possess two different emotion types), and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (resulted in treating the overall “user” input as an instruction intention)).
BROMAND et al. do not specifically disclose:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same.
Wu et al. do teach:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same (¶ 0111 last 7 lines: “An exemplary sentence with the emotion "sad" may be "I don't like it and want to cry"” (the “sentence” (e.g. voice information) as a whole is decided to correspond to emotion type “sad”, and the sentence possesses the keywords “don’t like” as well as “cry” that are each “sad” emotion type as well)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “emotion model” of Wu et al. into the “emotion sensitive responses” prediction of BROMAND et al., because doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable BROMAND et al. to benefit from “an emotion model, which is time-sensitive to an input” as disclosed in Wu et al. ¶ 0153 lines 4-5.

Regarding claim 18, BROMAND et al. do teach a terminal device (Abstract line 3: “an utterance-based user interface”), 
comprising: a processor and a memory; the memory having a computer-readable program executable by the processor stored thereon; and the processor, when executing the computer-readable program (¶ 0038: “In some embodiments of the present disclosure, a non-transitory computer readable medium comprises: an emotion processor having one or more sequences of emotion processor instructions that, when executed by one or more processors, causes the one or more processors to generate an output adapted to a detected emotion in a natural utterance from a user”);
implementing:
receiving voice information to be processed, and acquiring an intention type of the voice information (for a “natural utterance” “detectable by a microphone” (¶ 0024 lines 1-3), according to ¶ 0033 lines 2+: “receiving” “via a user device, a natural utterance” (receiving a voice information) “including a command” (with an intention type));
determining an emotion type of the voice information when the intention type is an emotion intention (¶ 0033 lines 4+: “extracting the command” (using the intention) “extracting an emotion feature” (determining an emotion type) “from the natural utterance” (from the voice information which can indicate the intention type is an emotion type)); and
generating a response voice of the voice information according to the emotion type, and playing the response voice (¶ 0033 lines 5+: “mapping the emotion feature to an emotion” “and responding to the natural utterance at least by executing the command and providing a synthesized utterance” (generating a response voice to the voice information) “adapted to the emotion” (according to the emotion type)).
wherein the receiving voice information to be processed, and acquiring an intention type of the voice information comprises:
receiving the voice information to be processed, performing word division on the voice information to obtain several words, and judging whether the several words being divided comprise an emotion keyword (¶ 0080: “user device” “includes a speech to text (STT) engine 124 for converting an analog signal of the natural language utterance into a digitalized textual component” (performing a word division on the “utterance” (voice information) to obtain “textual component” (several words)); e.g. ¶ 0099 last sentence: “if the word "Great" is in the textual component of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (where the “textual component” (several words) comprises “Great” (an emotion keyword mapped to joyful emotion))); and
deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (if the emotion keyword is within the “textual component” (several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (then the “natural utterance” (voice information) associated with the “textual component” (the several words) is “joyful” (is an emotion intention)));
wherein the deciding the intention type of the voice information is an emotion intention when the several words comprise the emotion keyword comprises:
acquiring a number of emotion keywords comprised when the several words comprise the emotion keyword (¶ 0099 last sentence: “if the word "Great" is in the textual component” (acquiring one emotion keyword in the “textual component” (among several words)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion”);
deciding the intention type of the voice information is the emotion intention when the number is equal to 1 (¶ 0099 last sentence: “if the word "Great" is in the textual component” (based on one emotion keyword (“Great”)) “of the natural utterance, that textual component emotion feature is mapped to a joyful emotion” (the “textual component” (transcript of the voice information and thus the voice information) is determined to be “joyful”));
deciding the intention type of the voice information is an instruction intention if the emotion types corresponding to the emotion keywords are different (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” “mapped to a “neutral feeling”” (i.e., no word in this “command” can be “joyful”) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion” (the combination of the original command and the word “Yes!” by the “user” together possess at least two keywords that possess two different emotion types), and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (resulted in treating the overall “user” input as an instruction intention)).
BROMAND et al. do not specifically disclose:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same.
Wu et al. do teach:
detecting whether the emotion types corresponding to the emotion keywords are the same when the number is greater than 1, and deciding the intention type of the voice information is the emotion intention if the emotion types corresponding to the emotion keywords are the same (¶ 0111 last 7 lines: “An exemplary sentence with the emotion "sad" may be "I don't like it and want to cry"” (the “sentence” (e.g. voice information) as a whole is decided to correspond to emotion type “sad”, and the sentence possesses the keywords “don’t like” as well as “cry” that are each “sad” emotion type as well)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “emotion model” of Wu et al. into the “emotion sensitive responses” prediction of BROMAND et al., which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable BROMAND et al. to benefit from “an emotion model, which is time-sensitive to an input” as disclosed in Wu et al. ¶ 0153 lines 4-5.
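For illustration only, the keyword-count decision rule recited in claim 18 may be sketched as below. This is a minimal sketch and not an implementation taken from the application, BROMAND et al., or Wu et al. The lexicon, the function name `classify_intention`, the regular-expression word division, and the "undetermined" result for an utterance with no emotion keyword (a case the quoted limitations do not address) are all assumptions made for the example.

```python
import re

# Hypothetical keyword lexicon. Only "great" -> joyful and "cry" -> sad echo
# examples quoted in this action; the other entries are assumptions.
EMOTION_LEXICON = {
    "great": "joyful",
    "wonderful": "joyful",
    "cry": "sad",
    "miserable": "sad",
}


def classify_intention(text):
    """Return "emotion", "instruction", or "undetermined" for a transcript."""
    # Word division (assumed here to be a simple lowercase regex split).
    words = re.findall(r"[a-z']+", text.lower())

    # Emotion type of every emotion keyword found among the divided words.
    emotion_types = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]

    if not emotion_types:
        # Not covered by the quoted limitations; placeholder assumption.
        return "undetermined"
    if len(emotion_types) == 1:
        # Exactly one emotion keyword: emotion intention.
        return "emotion"
    if len(set(emotion_types)) == 1:
        # Several keywords that share one emotion type: emotion intention.
        return "emotion"
    # Several keywords with different emotion types: instruction intention.
    return "instruction"
```

Under these assumptions, a transcript containing both "great" and "cry" is classified as an instruction intention, while one containing both "miserable" and "cry" is classified as an emotion intention.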

Regarding claim 19, BROMAND et al. do teach the emotion-based voice interaction method according to claim 8, wherein further comprising:
judging whether the instruction intention can determine instruction content when the intention type is the instruction intention (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” (the instruction intention) “mapped to a “neutral feeling”” “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion”, and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (is assessed to correspond to this instruction content)); 
and querying a user in a domain clarification manner until the instruction content can be determined when the instruction intention can determine the instruction content, and executing an instruction corresponding to the instruction intention (¶ 0049 lines 2+: “user’s” “the command” “play me an uplifting song” (the instruction intention) “mapped to a “neutral feeling”” “In response to the first natural utterance, a synthesized utterance is generated proposing that a first action of playing the song” “What a Wonderful World” “by Louis Armstrong be executed” (a domain clarification query is made by the system) “the second utterance of “Yes!” is received from the user” “and mapped to a” “joyful” “emotion”, and according to ¶ 0049 last sentence this results in “what a Wonderful World” “by Louis Armstrong is executed” (the answer “Yes!” authorizes the instruction content to be executed)).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over BROMAND et al. in view of Wu et al., and further in view of Taki et al. (US 2019/0019512).
Regarding claim 16, BROMAND et al. in view of Wu et al. do not specifically disclose the emotion-based voice interaction method according to claim 1, wherein after generating a response voice of the voice information according to the emotion type, and playing the response voice, comprising:
recording a number of the voice information of which the emotion type is the emotion intention, and activating a preset active emotion mode when the number reaches a preset threshold value, wherein a terminal device actively plays voice in the active emotion mode.
Taki et al. do teach recording a number of the voice information of which the emotion type is the emotion intention, and activating a preset active emotion mode when the number reaches a preset threshold value, wherein a terminal device actively plays voice in the active emotion mode (¶ 0031: “FIG. 23 is a diagram illustrated to describe an example of controlling whether to automatically activate” (activating) “a symbol input mode” (a preset emotion mode) “on the basis of emotion information of a user”, where according to ¶ 0102 sentence 2: “the emotion information of the user U1 may be obtained by analyzing the sound information” (the emotion is determined on the basis of “sound information” (speech) of the user); and furthermore according to ¶ 0166: “Alternatively, in a case where the volume change of the sound information collected by the sound collection unit 120” (a number of voice information recorded) “is larger than a threshold value” (is compared to a preset threshold) “the output control unit 143 may determine that the user's emotion” (for determining the user’s emotion) “is stronger than the threshold value”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the techniques of sound management based on user’s emotion of TAKI et al.’s “SPEECH RECOGNITION PROCESSING” into the “Speech to text” of BROMAND et al. in view of Wu et al., which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable BROMAND et al. in view of Wu et al. to obtain “the emotion information of the user U1” in order to help enhance the “accuracy” of the speech recognizer as disclosed in TAKI et al. ¶ 0102-103.
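For illustration only, the counting and threshold limitation of claim 16 may be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce any implementation from the application or from Taki et al. The class name `EmotionModeTracker`, the string label "emotion", and the choice to keep the mode active once the threshold is reached are hypothetical.

```python
class EmotionModeTracker:
    """Counts emotion-intention utterances and activates a mode at a threshold."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # preset threshold value (assumed integer)
        self.count = 0              # number of emotion-intention utterances
        self.active_mode = False    # preset active emotion mode flag

    def record(self, intention_type):
        """Record one utterance and return whether the active mode is on."""
        if intention_type == "emotion":
            self.count += 1
        if self.count >= self.threshold:
            # Assumption: once activated, the mode stays on.
            self.active_mode = True
        return self.active_mode
```

With a threshold of 3, the tracker in this sketch reports the active emotion mode only after the third utterance labeled as an emotion intention; utterances with other labels leave the count unchanged.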
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farzad Kazeminezhad/
Art Unit 2657
August 24th 2022.