DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Arguments
Applicant’s arguments with respect to claims 1, 2, 4-9, 11-16, 18, and 20-24 have been considered but are not persuasive.
Applicant has amended the claims to include “enabling providing feedback to the user to make the user have a feeling of being understood during interacting with the human-machine interaction device”. Upon further consideration of the Lee reference, Lee provides such a feeling because Lee confirms a user's proposed action in par. [0233]: “In some embodiments, the voice assist devices 1144 and 1148 may request confirmation before a proposed action is performed, such as “would you like to increase the temperature?,” “would you like to decrease the temperature by 2 degrees Fahrenheit?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may request additional information in response to detecting an utterance relating to the temperature, such as asking “how hot are you?,” “how cold are you?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may acknowledge the utterance or describe a proposed action, such as “increasing the temperature,” “lowering the temperature,” “setting the temperature to 72 degrees Fahrenheit,” and the like.” For these reasons, the examiner believes the claims are still taught by the references.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-9, 11-16, 18, and 20-24 are rejected under 35 U.S.C. 103 as being unpatentable over Huang U.S. PAP 2019/0164554 A1 in view of Lee U.S. PAP 2019/0353379 A1 further in view of Penilla U.S. PAP 2016/0104486 A1.
Regarding claim 1 Huang teaches a method for human-machine interaction (A method, computer-readable medium, and system including a speech-to-text module to receive an input of speech, see abstract), comprising: 
recognizing a word used in a speech instruction from a human-machine interaction device used by a user (receive an input of speech including one or more words generated by a human and to output data including text, see par. [0005]); 
determining an emotion contained in the speech instruction and feedback to be provided to the user based on a predetermined mapping between words, emotions and feedback, wherein the feedback is adapted to the emotion (generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component, see par. [0005]). 
However Huang does not teach the feedback comprises a touching form; and wherein, enabling providing the feedback to the user comprises enabling changing a temperature of the human-machine interaction device used by the user in accordance with the emotion contained in the speech instruction; and enabling providing the feedback to the user to make the user have a feeling of being understood during interaction with the human-machine interaction device.
In the same field of endeavor Lee teaches a method for controlling an HVAC system using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069]. The method includes receiving, by one or more processors, utterance data from a voice assist device, determining, by the one or more processors, a location of the voice assist device, analyzing, by the one or more processors, the utterance data to identify a sentiment relating to a temperature of the location, and controlling, by the one or more processors, the HVAC system to adjust the temperature of the location based on the sentiment, see par. [0022]; Lee confirms a user's proposed action in par. [0233]: In some embodiments, the voice assist devices 1144 and 1148 may request confirmation before a proposed action is performed, such as “would you like to increase the temperature?,” “would you like to decrease the temperature by 2 degrees Fahrenheit?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may request additional information in response to detecting an utterance relating to the temperature, such as asking “how hot are you?,” “how cold are you?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may acknowledge the utterance or describe a proposed action, such as “increasing the temperature,” “lowering the temperature,” “setting the temperature to 72 degrees Fahrenheit,” and the like.
It would have been obvious to one of ordinary skill in the art to combine the Huang invention with the teachings of Lee for the benefit of using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069].
However, although Lee teaches tactile feedback, it does not teach that the feedback is in touch form.
In a similar field of endeavor Penilla teaches a vehicle which may respond to user input or provide recommendations without being prompted by user input. The system selected supplemental content can be sent to the displays of the vehicle or output via the audio system. As noted above, the type of information, such as the select supplemental content is tailored for the interaction mode selected for the vehicle, see par. [0276]. One example of a vehicle recommendation may be to inquire if the user wishes a particular setting. In one example, if it is cold outside, the vehicle may automatically heat the seats to levels previously set by the user or provide recommendations to the user. In other examples, the vehicle can automatically seek input from the user with customized dialogs. By way of example, dialogs may be audio dialogs, text dialogs, icon dialogs, sound and text, voice and text, or simply voice output. One example of voice output may be, “It's cold outside, do you want me to heat the seats?,” or “Hi Fred, it's cold outside, do you want more heat?”, or “Your seats have been set to level 3 heat, etc.” These are only some examples, of recommendations that can be provided to the user, based on one or more of the user's voice tone, mood, learned prior settings, use patterns, predictions, and combinations thereof, see par. [0277]. Examiner's note: heated seats provide tactile or touch feedback to a user; in this case the system provides this feedback to the user based on voice tone, mood and learned prior patterns.
It would have been obvious to one of ordinary skill in the art to combine the Huang in view of Lee invention with the teachings of Penilla for the benefit of tailoring interactions for the user, see par. [0276]. 
Regarding claim 2 Huang teaches the method according to claim 1, wherein recognizing the word used in the speech instruction from the user comprises: 
obtaining an audio signal comprising the speech instruction (The audio 107 from human(s) 105 is received by a speech-to-text (STT) module 110 as an input thereto, see par. [0020]);
converting the speech instruction into text information (speech-to-text (STT) module, see par. [0020]); 
and extracting the word from the text information (STT module 110 might operate to parse the received speech into its constituent individual words, see par. [0020]). 
Regarding claim 4 Huang teaches the method according to claim 1, wherein the predetermined mapping is obtained by training based on history information of the words, the emotions, and the feedback (STT module 110 may receive at least some knowledge information 142 from response generator 130. The knowledge information may include indications or representations of one or more of the following types of information: expected human responses, keywords, probability distributions, and knowledge of prior or on-going conversations, see par. [0026]). 
Regarding claim 5 Huang teaches the method according to claim 1, wherein the method is implemented in a cloud side or a human-machine interaction device (Apparatus 700 may comprise an implementation of server, a dedicated processor-enabled device, a user entity device, and other systems, including a cloud server embodiment of at least parts of a platform or framework disclosed herein, see par. [0048]).
Regarding claim 6 Huang teaches the method according to claim 5, wherein, when the method is implemented in the cloud side, the method further comprises: 
receiving an audio signal comprising the speech instruction from the human-machine interaction device (mechanism for receiving audio including speech 107 from one or more human, see par. [0019]; input devices 710 to receive inputs from other systems and entities, see par. [0049]); 
and enabling providing the feedback to the user comprises:
sending information to the human-machine interaction device, wherein the information indicates the feedback to be provided to the user, such that the human-machine interaction device provides the feedback to the user (response generator 140 generates a reply based on more than the text parsed from the human(s) generated speech 107, as determined by STT 110. Framework 100 may generate a reply to human-supplied speech based on text 115 parsed from the spoken words and other information and data, including sentiment information 120 and other parameters information 125 associated with the text and additional information received from other sources, see par. [0026]; one or more output devices 720. Communication device 715 may facilitate communication with other systems and components, Output device(s) 720 may comprise, for example, a display or a speaker, see par. [0048]). 
Regarding claim 7 Huang teaches the method according to claim 6, wherein the information comprises text information representing a predetermined speech to be played to the user, and providing the feedback to the user comprises: converting the text information into the predetermined speech (text-to-speech engine 178 of TTS module 175 may generate speech output 185, see par. [0030]). 


Regarding claim 8 Huang teaches an electronic device (A method, computer-readable medium, and system including a speech-to-text module to receive an input of speech, see abstract), comprising: one or more processors (processor-enabled device or system, see par. [0006]); 
and a memory, configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to perform a method for human-machine interaction (Data storage device 730 may comprise any appropriate persistent storage device, see par. [0050]), wherein the method comprises: 
recognizing a word used in a speech instruction from a human-machine interaction device used by a user (receive an input of speech including one or more words generated by a human and to output data including text, see par. [0005]); 
determining an emotion contained in the speech instruction and feedback to be provided to the user based on a predetermined mapping between words, emotions and feedback, wherein the feedback is adapted to the emotion (generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component, see par. [0005]). 
However Huang does not teach the feedback comprises a touching form; and wherein, enabling providing the feedback to the user comprises enabling changing a temperature of the human-machine interaction device used by the user in accordance with the emotion contained in the speech instruction; and enabling providing the feedback to the user to make the user have a feeling of being understood during interaction with the human-machine interaction device.
In the same field of endeavor Lee teaches a method for controlling an HVAC system using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069]. The method includes receiving, by one or more processors, utterance data from a voice assist device, determining, by the one or more processors, a location of the voice assist device, analyzing, by the one or more processors, the utterance data to identify a sentiment relating to a temperature of the location, and controlling, by the one or more processors, the HVAC system to adjust the temperature of the location based on the sentiment, see par. [0022]; Lee confirms a user's proposed action in par. [0233]: In some embodiments, the voice assist devices 1144 and 1148 may request confirmation before a proposed action is performed, such as “would you like to increase the temperature?,” “would you like to decrease the temperature by 2 degrees Fahrenheit?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may request additional information in response to detecting an utterance relating to the temperature, such as asking “how hot are you?,” “how cold are you?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may acknowledge the utterance or describe a proposed action, such as “increasing the temperature,” “lowering the temperature,” “setting the temperature to 72 degrees Fahrenheit,” and the like.
It would have been obvious to one of ordinary skill in the art to combine the Huang invention with the teachings of Lee for the benefit of using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069].
However, although Lee teaches tactile feedback, it does not teach that the feedback is in touch form.
In a similar field of endeavor Penilla teaches a vehicle which may respond to user input or provide recommendations without being prompted by user input. The system selected supplemental content can be sent to the displays of the vehicle or output via the audio system. As noted above, the type of information, such as the select supplemental content is tailored for the interaction mode selected for the vehicle, see par. [0276]. One example of a vehicle recommendation may be to inquire if the user wishes a particular setting. In one example, if it is cold outside, the vehicle may automatically heat the seats to levels previously set by the user or provide recommendations to the user. In other examples, the vehicle can automatically seek input from the user with customized dialogs. By way of example, dialogs may be audio dialogs, text dialogs, icon dialogs, sound and text, voice and text, or simply voice output. One example of voice output may be, “It's cold outside, do you want me to heat the seats?,” or “Hi Fred, it's cold outside, do you want more heat?”, or “Your seats have been set to level 3 heat, etc.” These are only some examples, of recommendations that can be provided to the user, based on one or more of the user's voice tone, mood, learned prior settings, use patterns, predictions, and combinations thereof, see par. [0277]. Examiner's note: heated seats provide tactile or touch feedback to a user; in this case the system provides this feedback to the user based on voice tone, mood and learned prior patterns.
It would have been obvious to one of ordinary skill in the art to combine the Huang in view of Lee invention with the teachings of Penilla for the benefit of tailoring interactions for the user, see par. [0276]. 
Regarding claim 9 Huang teaches the electronic device according to claim 8, wherein recognizing the word used in the speech instruction from the user comprises: 
obtaining an audio signal comprising the speech instruction (The audio 107 from human(s) 105 is received by a speech-to-text (STT) module 110 as an input thereto, see par. [0020]);
converting the speech instruction into text information (speech-to-text (STT) module, see par. [0020]); 
and extracting the word from the text information (STT module 110 might operate to parse the received speech into its constituent individual words, see par. [0020]). 
Regarding claim 11 Huang teaches the electronic device according to claim 8, wherein the predetermined mapping is obtained by training based on history information of the words, the emotions, and the feedback (STT module 110 may receive at least some knowledge information 142 from response generator 130. The knowledge information may include indications or representations of one or more of the following types of information: expected human responses, keywords, probability distributions (e.g., Bayesian), and knowledge of prior or on-going conversations, see par. [0026]). 
Regarding claim 12 Huang teaches the electronic device according to claim 8, wherein the electronic device is implemented in a cloud side or a human-machine interaction device (Apparatus 700 may comprise an implementation of server, a dedicated processor-enabled device, a user entity device, and other systems, including a cloud server embodiment of at least parts of a platform or framework disclosed herein, see par. [0048]). 
Regarding claim 13 Huang teaches the electronic device according to claim 12, wherein, when the electronic device is implemented in the cloud side, the method further comprises: 
receiving an audio signal comprising the speech instruction from the human-machine interaction device (mechanism for receiving audio including speech 107 from one or more human, see par. [0019]; input devices 710 to receive inputs from other systems and entities, see par. [0049]); 
and enabling providing the feedback to the user comprises:
sending information to the human-machine interaction device, wherein the information indicates the feedback to be provided to the user, such that the human-machine interaction device provides the feedback to the user (response generator 140 generates a reply based on more than the text parsed from the human(s) generated speech 107, as determined by STT 110. Framework 100 may generate a reply to human-supplied speech based on text 115 parsed from the spoken words and other information and data, including sentiment information 120 and other parameters information 125 associated with the text and additional information received from other sources, see par. [0026]; one or more output devices 720. Communication device 715 may facilitate communication with other systems and components, Output device(s) 720 may comprise, for example, a display or a speaker, see par. [0048]). 

Regarding claim 14 Huang teaches the electronic device according to claim 13, wherein the information comprises text information representing a predetermined speech to be played to the user, and providing the feedback to the user comprises: converting the text information into the predetermined speech (text-to-speech engine 178 of TTS module 175 may generate speech output 185, see par. [0030]). 
Regarding claim 15 Huang teaches a non-transitory computer-readable storage medium, having computer programs stored thereon, when executed by a processor, causing the processor to perform a method for human-machine interaction (A method, computer-readable medium, and system including a speech-to-text module to receive an input of speech, see abstract), wherein the method comprises: 
recognizing a word used in a speech instruction from a human-machine interaction device used by a user (receive an input of speech including one or more words generated by a human and to output data including text, see par. [0005]); 
determining an emotion contained in the speech instruction and feedback to be provided to the user based on a predetermined mapping between words, emotions and feedback, wherein the feedback is adapted to the emotion (generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component, see par. [0005]). 
However Huang does not teach the feedback comprises a touching form; and wherein, enabling providing the feedback to the user comprises enabling changing a temperature of the human-machine interaction device used by the user in accordance with the emotion contained in the speech instruction; and enabling providing the feedback to the user to make the user have a feeling of being understood during interaction with the human-machine interaction device.
In the same field of endeavor Lee teaches a method for controlling an HVAC system using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069]. The method includes receiving, by one or more processors, utterance data from a voice assist device, determining, by the one or more processors, a location of the voice assist device, analyzing, by the one or more processors, the utterance data to identify a sentiment relating to a temperature of the location, and controlling, by the one or more processors, the HVAC system to adjust the temperature of the location based on the sentiment, see par. [0022]; Lee confirms a user's proposed action in par. [0233]: In some embodiments, the voice assist devices 1144 and 1148 may request confirmation before a proposed action is performed, such as “would you like to increase the temperature?,” “would you like to decrease the temperature by 2 degrees Fahrenheit?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may request additional information in response to detecting an utterance relating to the temperature, such as asking “how hot are you?,” “how cold are you?,” and the like. In some embodiments, the voice assist devices 1144 and 1148 may acknowledge the utterance or describe a proposed action, such as “increasing the temperature,” “lowering the temperature,” “setting the temperature to 72 degrees Fahrenheit,” and the like.
It would have been obvious to one of ordinary skill in the art to combine the Huang invention with the teachings of Lee for the benefit of using one or more voice assist devices to control an HVAC system, while reducing or minimizing energy consumption or costs, see par. [0069].
However, although Lee teaches tactile feedback, it does not teach that the feedback is in touch form.
In a similar field of endeavor Penilla teaches a vehicle which may respond to user input or provide recommendations without being prompted by user input. The system selected supplemental content can be sent to the displays of the vehicle or output via the audio system. As noted above, the type of information, such as the select supplemental content is tailored for the interaction mode selected for the vehicle, see par. [0276]. One example of a vehicle recommendation may be to inquire if the user wishes a particular setting. In one example, if it is cold outside, the vehicle may automatically heat the seats to levels previously set by the user or provide recommendations to the user. In other examples, the vehicle can automatically seek input from the user with customized dialogs. By way of example, dialogs may be audio dialogs, text dialogs, icon dialogs, sound and text, voice and text, or simply voice output. One example of voice output may be, “It's cold outside, do you want me to heat the seats?,” or “Hi Fred, it's cold outside, do you want more heat?”, or “Your seats have been set to level 3 heat, etc.” These are only some examples, of recommendations that can be provided to the user, based on one or more of the user's voice tone, mood, learned prior settings, use patterns, predictions, and combinations thereof, see par. [0277]. Examiner's note: heated seats provide tactile or touch feedback to a user; in this case the system provides this feedback to the user based on voice tone, mood and learned prior patterns.
It would have been obvious to one of ordinary skill in the art to combine the Huang in view of Lee invention with the teachings of Penilla for the benefit of tailoring interactions for the user, see par. [0276]. 
Regarding claim 16 Huang teaches the non-transitory computer-readable storage medium according to claim 15, wherein recognizing the word used in the speech instruction from the user comprises: 
obtaining an audio signal comprising the speech instruction (The audio 107 from human(s) 105 is received by a speech-to-text (STT) module 110 as an input thereto, see par. [0020]);
converting the speech instruction into text information (speech-to-text (STT) module, see par. [0020]); 
and extracting the word from the text information (STT module 110 might operate to parse the received speech into its constituent individual words, see par. [0020]). 
Regarding claim 18 Huang teaches the non-transitory computer-readable storage medium according to claim 15, wherein the predetermined mapping is obtained by training based on history information of the words, the emotions, and the feedback (STT module 110 may receive at least some knowledge information 142 from response generator 130. The knowledge information may include indications or representations of one or more of the following types of information: expected human responses, keywords, probability distributions, and knowledge of prior or on-going conversations, see par. [0026]). 
Regarding claim 20 Huang teaches the non-transitory computer-readable storage medium according to claim 19, wherein, when the electronic device is implemented in the cloud side, the method further comprises: 
receiving an audio signal comprising the speech instruction from the human-machine interaction device (mechanism for receiving audio including speech 107 from one or more human, see par. [0019]; input devices 710 to receive inputs from other systems and entities, see par. [0049]); 
and enabling providing the feedback to the user comprises:
sending information to the human-machine interaction device, wherein the information indicates the feedback to be provided to the user, such that the human-machine interaction device provides the feedback to the user (response generator 140 generates a reply based on more than the text parsed from the human(s) generated speech 107, as determined by STT 110. Framework 100 may generate a reply to human-supplied speech based on text 115 parsed from the spoken words and other information and data, including sentiment information 120 and other parameters information 125 associated with the text and additional information received from other sources, see par. [0026]; one or more output devices 720. Communication device 715 may facilitate communication with other systems and components, Output device(s) 720 may comprise, for example, a display or a speaker, see par. [0048]). 
Regarding claim 21 Huang teaches the method according to claim 1, wherein enabling providing the feedback to the user further comprises at least one of: enabling displaying a predetermined color to the user; enabling playing a predetermined speech to the user; and enabling playing a predetermined video to the user (a text-to-speech module to receive the textual component, sentimental information, and contextual information and to generate, based on the received textual component and its associated sentimental information and contextual information, a speech output including one or more spoken words).
Regarding claim 22 Huang teaches the electronic device according to claim 8, wherein enabling providing the feedback to the user further comprises at least one of: enabling displaying a predetermined color to the user; enabling playing a predetermined speech to the user; and enabling playing a predetermined video to the user (a text-to-speech module to receive the textual component, sentimental information, and contextual information and to generate, based on the received textual component and its associated sentimental information and contextual information, a speech output including one or more spoken words).  
Regarding claim 23 Huang teaches the non-transitory computer-readable storage medium according to claim 15, wherein enabling providing the feedback to the user further comprises at least one of: enabling displaying a predetermined color to the user; enabling playing a predetermined speech to the user; and enabling playing a predetermined video to the user (a text-to-speech module to receive the textual component, sentimental information, and contextual information and to generate, based on the received textual component and its associated sentimental information and contextual information, a speech output including one or more spoken words).
Regarding claim 24 Penilla teaches the method according to claim 1, wherein enabling changing a temperature of the human-machine interaction device used by the user in accordance with the emotion comprises: enabling changing a temperature of a housing of the human-machine interaction device in accordance with the emotion (seats have been set to level 3 heat, see par. [0277]).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571) 270-3711.  The examiner can normally be reached on Monday-Friday 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656