DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
2.	Applicant’s arguments with respect to claims 1 – 3, 5 – 7, 9 – 14, 16 – 23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant argues that Amini et al. do not teach generating a response dialogue based on the content of the speech, through use of a neural network; selecting a one of the multiple response dialog choices based on comparison of prosodic qualities of the one of the multiple response dialog choices and based on the conversational context associated with the speech (Amendment, pages 8 – 12).

Claim Rejections - 35 USC § 103
3.	The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
4.	Claims 1 – 3, 5, 6, 21 are rejected under 35 U.S.C. 103 as being unpatentable over Amini et al. (US PAP 2018/0144761) in view of Galley et al. (US PAP 2016/0352656).
As per claim 1, Amini et al. teach a method comprising:

determining a linguistic style of the speech; generating a response dialogue based on the content of the speech; and modifying the response dialogue based on the linguistic style of the speech/prosodic qualities (“better understanding by the virtual agent of user utterances based on affective context (e.g., emotion, mood, personality and/or satisfaction of the user)… determining, by the computer, at least one of a facial expression, body gesture, vocal expression, or verbal expression for the virtual agent based on a content of the particular user utterance”; paragraphs 10, 11, claim 1).
However, Amini et al. do not specifically teach generating a response dialogue based on the content of the speech, through use of a neural network.
Galley et al. disclose that the neural network 322 can be trained from end to end on massive amounts of social media conversational data.  In this example, the response generation engine 318 utilizes the neural network 322 model to improve open-domain response generation in conversations (paragraph 98).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate a response dialogue based on the content of the speech, through use of a neural network as taught by Galley et al. in Amini et al., because the improved quality and accuracy of machine-generated responses would enable more efficient communication between users and the response generation systems (paragraph 37).



As per claim 3, Amini et al. in view of Galley et al. further disclose that the content variables include at least one of repetition or utterance length (“the one or more dialogue metrics relative length of dialogue, number of misunderstandings, number of repetitions”; Amini et al., paragraph 21).

As per claim 5, Amini et al. in view of Galley et al. further disclose generating a synthetic facial expression for an embodied conversational agent based on a sentiment identified from the response dialogue (“facial expression”; Amini et al., paragraphs 11, 22; Galley et al., paragraph 137).

As per claim 6, Amini et al. in view of Galley et al. further disclose identifying a facial expression of the user; and generating a synthetic facial expression for an embodied conversational agent based on the facial expression of the user (“Applying the emotion vector of the user, the mood vector of the user and/or the personality vector of the user to the virtual agent can involve instructing the virtual agent to modify one or more statements, facial expressions, vocal expressions, or body language to match and/or change an emotional state of the user.”; Amini et al., paragraphs 41, 65, 82).


5.	Claims 7, 10 – 14, 16 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Amini et al. (US PAP 2018/0144761) in view of Horling et al. (US PAP 2018/0197542).
As per claim 7, Amini et al. teach a system comprising:
a microphone configured to generate an audio signal representative of sound; a speaker configured to generate audio output; one or more processors (paragraphs 3, 41, 245); and
memory storing instructions that, when executed by the one or more processors, cause the one or more processors (paragraphs 248, 249) to:
detect speech in the audio signal; recognize a content of the speech (“a content of the user utterance”; paragraph 11);
determine a conversational context associated with the speech (“better understanding by the virtual agent of user utterances based on affective context (e.g., emotion, mood, personality and/or satisfaction of the user)… determining, by the computer, at least one of a facial expression, body gesture, vocal expression, or verbal expression for the virtual agent based on a content of the particular user utterance”; paragraphs 10, 11, claim 1).
However, Amini et al. do not specifically teach generate multiple response dialogue choices having response content based on the content of the speech; and

Horling et al. disclose that a response is selected from a plurality of candidate responses based on the state expressed by the user.  In various implementations, the state expressed by the user may be a negative sentiment, and the response selected from the plurality of candidate responses may include an empathetic statement… In various implementations, the one or more signals may include detection of a change in a context of the user since a last interaction between the user and the chatbot (paragraphs 8, 9).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select one response dialog based on prosodic qualities as taught by Horling et al. in Amini et al., because that would help provide an appropriate response (paragraph 39).

As per claim 10, Amini et al. further disclose a display, and wherein the instructions cause the one or more processors to generate an embodied conversational agent on the display, and wherein the embodied conversational agent has a synthetic facial expression based on the conversational context associated with the speech (“automatically generating at least one of facial expressions, body gestures, vocal expressions, or verbal expressions for a virtual agent”; Abstract; paragraphs 11, 22, 134).



As per claim 12, Amini et al. further disclose a camera, wherein the instructions cause the one or more processors to identify a facial expression of a user in an image generated by the camera, and wherein the conversational context comprises the facial expression of the user (“can involve receiving voice of the user (e.g., via a microphone) and/or facial expression of the user (e.g., via a camera).  In some embodiments, the method involves receiving one or more dialogue performance metrics (e.g., from a dialogue manager of the virtual agent).”; paragraphs 41, 65, 82).

As per claim 13, Amini et al. further disclose a camera, wherein the instructions cause the one or more processors to identify a head orientation of a user in an image generated by the camera, and wherein the embodied conversational agent has head pose based on the head orientation of the user (paragraphs 41, 133, 166 – 169).

As per claim 14, Amini et al. teach a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing system, cause the computing system to:
receive conversational input from a user; receive video input including a face of the user (“a content of the user utterance”; paragraphs 11, 41);

generate an embodied conversational agent having lip movement based on the response dialogue and a synthetic facial expression based on the facial expression of the user (“applying, by the computer, the facial expression, body gesture, vocal expression, verbal expression, or any combination thereof to the virtual agent to produce control of the virtual agent's vocal expression, facial expression or both.”; paragraph 179, claim 1).
	However, Amini et al. do not specifically teach generating a plurality of response dialogue choices based on the conversational input of the user, wherein each of the response dialogue choices is a possible response to the conversational input of the user and each is characterized by linguistic style variables; select a response dialog from the plurality of response dialog choices based on linguistic style variables of the plurality of response dialog choices and the linguistic style of conversational input of the user.
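Horling et al. disclose that a response is selected from a plurality of candidate responses based on the state expressed by the user.  In various implementations, the state expressed by the user may be a negative sentiment, and the response selected from the plurality of candidate responses may include an empathetic statement… In various implementations, the one or more signals may include detection of a change in a context of the user since a last interaction between the user and the chatbot (paragraphs 8, 9).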
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select one response dialog based on linguistic style variables as taught by Horling et al. in Amini et al., because that would help provide an appropriate response (paragraph 39).

As per claim 16, Amini et al. further disclose that the conversational input comprises speech of the user and wherein the linguistic style comprises content variables and acoustic variables (paragraphs 64, 102).

As per claim 17, Amini et al. further disclose that determination of the facial expression of the user comprises identifying an emotional expression of the user (Abstract).

As per claim 18, Amini et al. further disclose that the computing system is further caused to: identify a head orientation of the user; and cause the embodied conversational agent to have a head pose that is based on the head orientation of the user (paragraphs 41, 133, 166 – 169).



As per claim 20, Amini et al. further disclose that the synthetic facial expression is based on a sentiment identified in the speech of the user (“facial expression”; paragraphs 11, 22).

6.	Claims 9, 22, 23 are rejected under 35 U.S.C. 103 as being unpatentable over Amini et al. (US PAP 2018/0144761) in view of Horling et al. (US PAP 2018/0197542), and further in view of Gong (US PAP 2003/0167167).
As per claim 9, Amini et al. do not specifically teach that the conversational context comprises a device usage pattern of the system.
Gong discloses that the machine learning module 332 determines a basic profile of the user that includes information about the verbal and non-verbal styles of the user, application program usage patterns, and the internal and external context of the user (paragraph 34).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine a usage pattern of the system as taught by Gong in Horling et al. in view of Amini et al., because that would help provide an improved experience for the user as the agent assists the user in operating a computing device or computing device application program (paragraph 21).


Gong discloses that a video camera or a vision tracking device may provide non-verbal data about the user's eye focus, head orientation, and other body position information (paragraph 25).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use an eye tracking device as taught by Gong in Horling et al. in view of Amini et al., because that would help provide an improved experience for the user as the agent assists the user in operating a computing device or computing device application program (paragraph 21).

As per claim 23, Amini et al., in view of Horling et al. do not specifically teach the   comprise word choice and utterance length. 
Gong discloses that the verbal extractor 322 also parses the verbal content to determine the linguistic style of the user, such as word choice, grammar choice, and syntax style. Speech style may include speech rate, pitch average, pitch range, intensity, voice quality, pitch changes, and level of articulation (paragraphs 27, 69).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine linguistic style variables as taught by Gong in Horling et al. in view of Amini et al., because that would help provide an improved experience for the user as the agent assists the user in operating a computing device or computing device application program (paragraph 21).

Conclusion
7.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Liu et al. teach personalized custom synthetic speech.  Chandrasekaran et al. teach empathetic personal virtual digital assistant.  Wu teaches systems and methods for an emotionally intelligent chat bot.

9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT CYR whose telephone number is (571) 272-4247.  The examiner can normally be reached on Monday – Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/LEONARD SAINT CYR/
Primary Examiner, Art Unit 2658