DETAILED ACTION
This final action is in response to the amendment filed on October 11, 2021. In this amendment, claims 1-2, 6-7 and 11-12 have been amended. Claims 1-15 are pending, with claims 1, 6 and 11 being independent.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Objections
Claims 5, 10 and 15 are objected to because of the following: 
“wherein the generating of the control parameter” should read “generating of a control parameter”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
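The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
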
Claims 1-2, 5-7, 10-12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Lembersky et al. (US 2019/0095775, Pub. Date Mar. 28, 2019), in view of Liu et al. (US 2016/0240195, Pub. Date Aug. 18, 2016), and further in view of Wu (US 2018/0150749, Pub. Date May 31, 2018).
As per claim 1, Lembersky discloses a method for generating information (Lembersky Para. [0022], The techniques herein provide an AI character or avatar capable of natural verbal and visual interactions with a human), comprising: 
receiving a video and an audio of a user that are sent by a client (Lembersky Para. [0022], the techniques herein receive user input (e.g., data) indicative of a user's speech 102 through an audio processor 104 (e.g., speech-to-text) and of a user's face 106 through a video processor 108); 
identifying the video to obtain user feature information (Lembersky Para. [0022], through a facial recognition API 110 and/or skeletal tracking, the techniques herein can determine the mood of the user; Lembersky Para. [0027], analyze the user's mood [user feature information] based on the emotions on the user's face via facial recognition (based on the video input 106)), identifying the audio to obtain text information (Lembersky Para. [0022], receive user input (e.g., data) indicative of a user's speech 102 through an audio processor 104 (e.g., speech-to-text)), and generating text reply information according to the user feature information and the text information (Lembersky Para. [0022], The user's converted text (speech) and mood 110 [user feature information] may then be passed to an AI engine 112 to determine a proper response 114 to the user (e.g., an answer to a question and specific emotion), which results in the proper text and emotional response being sent to a processor 116), the user feature information comprising at least one of: gender recognition information, age recognition information, expression recognition information (Lembersky Para. [0027], analyze the user's mood [user feature information] based on the emotions on the user's face via facial recognition (based on the video input 106)), posture recognition information, gesture recognition information, and dress recognition;
generating a reply audio according to the user feature information and the text reply information (Lembersky Para. [0022], The user's converted text (speech) and mood 110 [user feature information] may then be passed to an AI engine 112 to determine a proper response 114 to the user (e.g., an answer to a question and specific emotion), which results in the proper text and emotional response being sent to a processor 116, which then translates the responsive text back to synthesized speech 118);
generating a video of a three-dimensional virtual portrait based on a control parameter and the reply audio (Lembersky Para. [0022], The user's converted text (speech) and mood 110 may then be passed to an AI engine 112 to determine a proper response 114 to the user (e.g., an answer to a question and specific emotion), which results in the proper text and emotional response being sent to a processor 116, which then translates the responsive text back to synthesized speech 118, and also triggers visual display "blend shapes" 120 to morph a face of the AI character or avatar (two-dimensional (2D) display or even more natural three-dimensional (3D) holograph) into a proper facial expression to convey the appropriate emotional response and mouth movement (lip synching) for the response; Lembersky Para. [0041], a mesh [control parameter] (a collection of vertices, edges, and faces that describe the shape of a 3D object, essentially something 3D) called "A" would just be a face with the mouth closed in a neutral manner. A mesh called "B" would be the same face with the mouth open to make an "O" sound. Using morph target animation the two meshes are "merged" so to speak, and the base mesh (which is the neutral closed mouth) can be morphed into an "O" shape seamlessly. This method allows a variety of combinations to generate facial expressions, and phonemes); and 
presenting the video of the three-dimensional virtual portrait to the user (Lembersky Para. [0103], the device can control (generate) audio and visual responses of the avatar based on communication with the user, such as visually displaying/animating the avatar (2D, 3D, holographic, etc.), playing audio for the avatar's speech, etc., where the responses are based on the audio user input and/or the visual user input (e.g., the emotion of the user)).
Lembersky does not explicitly disclose:
receiving a video and an audio of a user that are sent by a client by instant communication;
a timbre of the reply audio being a male voice, a female voice, or a child's voice determined according to the user feature information; 

Liu teaches:
a timbre of the reply audio being a male voice, a female voice, or a child's voice determined according to the user feature information (Liu Para. [0097], setting timbre for the voice reply information to be first timbre corresponding to the gender feature, as the presenting form. Using the above example again, in the case that it is determined that the voice information is input from a male user, the electronic device may set the timbre for the voice replay information as timbre liked by the male user, for example a female voice, or a male voice).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lembersky in view of Liu for generating a reply audio according to the user feature information and the text reply information, a timbre of the reply audio being a male voice, a female voice, or a child's voice determined according to the user feature information.
One of ordinary skill in the art would have been motivated because it offers the advantage of improving interaction experience (Liu Para. [0084]).
Lembersky-Liu does not explicitly disclose:
receiving a video and an audio of a user that are sent by a client by instant communication;
transmitting the video of the three-dimensional virtual portrait to the client by the instant communication, for the client to present to the user.
Wu teaches:
receiving a video and an audio of a user that are sent by a client (Wu Para. [0097], the query component 300 may include a user interface 310 that provides an input area in which the user can provide input to the query component 300. The input may include typed text, provide or otherwise attach an image file, provide voice input, select emoji symbols, make an audio or voice call, and/or initiate a video conversation with the artificial intelligence entity advertisement system 200) by instant communication (See Wu Fig. 4, instant communication between the user and the artificial intelligence);
transmitting a video response to the client by the instant communication, for the client to present to the user (Wu Para. [0097], the user interface 310 may also be used to receive responses from the artificial intelligence entity. As with the input provided by the user, the response provided by the artificial intelligence entity may include text, images, sound, video and so on; See Wu Fig.4, instant communication between the user and the artificial intelligence).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lembersky in view of Wu for receiving a video and an audio of a user that are sent by a client by instant communication; and transmitting the video of the three-dimensional virtual portrait to the client by the instant communication, for the client to present to the user.
One of ordinary skill in the art would have been motivated because it offers the advantage of enabling a user to interact with an artificial intelligence system over a network (Wu Para. [0044]).

As per claim 2, Lembersky-Liu-Wu discloses the method according to claim 1 as set forth above. Lembersky also discloses wherein the generating text reply information according to the user feature information and the text information comprises: 
acquiring relevant information, the relevant information comprising historical user feature information and historical text information (Lembersky Fig. 1 and Para. [0031], collect for example, a historical activity database, the sentiment from the user using facial recognition, and stores this in their emotional history in the database of emotions and responses 124 for a particular user. The machine learning tools and techniques 122 may then be used to improve the virtual assistant's responses based on the user's past experiences such as shopping and dining habits from questions they ask the virtual assistant); and
generating the text reply information based on the user feature information, the text information (Lembersky Para. [0022], The user's converted text (speech) and mood 110 [user feature information] may then be passed to an AI engine 112 to determine a proper response 114 to the user (e.g., an answer to a question and specific emotion), which results in the proper text and emotional response being sent to a processor 116) and the relevant information (Lembersky Para. [0031], the virtual assistant can learn more about the user and make appropriate responses based on their past experiences. For instance, the techniques herein may collect for example, a historical activity database, the sentiment from the user using facial recognition, and stores this in their emotional history in the database of emotions and responses 124 for a particular user. The machine learning tools and techniques 122 may then be used to improve the virtual assistant's responses based on the user's past experiences such as shopping and dining habits from questions they ask the virtual assistant).

As per claim 5, Lembersky-Liu-Wu discloses the method according to claim 1 as set forth above, Lembersky also discloses wherein the user feature information comprises a user expression (Lembersky Para. [0022], through a facial recognition API 110 and/or skeletal tracking, the techniques herein can determine the mood [user expression] of the user), and wherein the generating of the control parameter for the three-dimensional virtual portrait according to the user feature information and the reply audio comprises: 
generating the control parameter for the three-dimensional virtual portrait according to the user expression and the reply audio (Lembersky Para. [0022], The user's converted text (speech) and mood 110 may then be passed to an AI engine 112 to determine a proper response 114 to the user (e.g., an answer to a question and specific emotion), which results in the proper text and emotional response being sent to a processor 116, which then translates the responsive text back to synthesized speech 118, and also triggers visual display "blend shapes" 120 to morph a face of the AI character or avatar (two-dimensional (2D) display or even more natural three-dimensional (3D) holograph) into a proper facial expression to convey the appropriate emotional response and mouth movement (lip synching) for the response; Lembersky Para. [0041], a mesh [control parameter] (a collection of vertices, edges, and faces that describe the shape of a 3D object, essentially something 3D) called "A" would just be a face with the mouth closed in a neutral manner. A mesh called "B" would be the same face with the mouth open to make an "O" sound. Using morph target animation the two meshes are "merged" so to speak, and the base mesh (which is the neutral closed mouth) can be morphed into an "O" shape seamlessly. This method allows a variety of combinations to generate facial expressions, and phonemes).

Claims 6-7 and 10 are apparatus claims reciting similar subject matter to those recited in the method claims 1-2 and 5, and are rejected under similar rationale. Lembersky also discloses an apparatus for generating information (Lembersky Fig. 1, system 100), comprising:
at least one processor (Lembersky Fig. 4, Processor(s) 420); and
a memory (Lembersky Fig. 4, Memory 440) storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations (Lembersky Para. [0081], The memory 440 comprises a plurality of storage locations that are addressable by the processor 420 for storing software programs and data structures associated with the embodiments described herein).

Claims 11-12 and 15 are computer readable medium claims reciting similar subject matter to those recited in the method claims 1-2 and 5 respectively, and are rejected under similar rationale. Lembersky also discloses a non-transitory computer readable medium, storing a computer program, wherein the computer program, when (Lembersky Para. [0110], a tangible (non-transitory) computer-readable medium (e.g., disks/CDs/RAM/EEPROM/etc.) having program instructions executing on a computer, hardware, firmware, or a combination thereof).

Claims 3-4, 8-9 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lembersky et al. (US 2019/0095775, Pub. Date Mar. 28, 2019), in view of Liu et al. (US 2016/0240195, Pub. Date Aug. 18, 2016), in view of Wu (US 2018/0150749, Pub. Date May 31, 2018), and further in view of Brown et al. (US 2014/0317502, Pub. Date Oct. 23, 2014).
As per claim 3, Lembersky-Liu-Wu discloses the method according to claim 2 as set forth above. Lembersky also discloses storing the user feature information and the text information (see Lembersky Fig. 1, Database of User Audio Interactions and Emotions 124; Lembersky Para. [0022], through a facial recognition API 110 and/or skeletal tracking, the techniques herein can determine the mood of the user. The user's converted text (speech) and mood 110 may then be passed to an AI engine 112).
Lembersky-Liu-Wu does not explicitly disclose:
storing the user feature information and the text information in association into a session information set that is set for a current session.  
Brown teaches:
storing data in association into a session information set that is set for a current session (see Brown Para. [0046-0047]: contextual information may be stored in a context data store 138 and may include conversation history during a current session).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Lembersky in view of Brown for storing the user feature information and the text information in association into a session information set that is set for a current session.
One of ordinary skill in the art would have been motivated because it offers the advantage of allowing generation of a response to the user that more closely emulates human-to-human interaction (Brown Para. [0044]).

As per claim 4, Lembersky-Liu-Wu discloses the method according to claim 3 as set forth above. Lembersky-Liu-Wu does not explicitly disclose wherein the acquiring of the relevant information comprises: 
acquiring the relevant information from the session information set.  
Brown teaches:
acquiring the relevant information from the session information set (see Brown Para. [0115], the smart device 104 may identify contextual information that is related to the conversation. The contextual information may comprise conversation history of the user with the virtual assistant in a current conversation, conversation history of the user with virtual assistant in a previous conversation).

One of ordinary skill in the art would have been motivated because it offers the advantage of allowing generation of a response to the user that more closely emulates human-to-human interaction (Brown Para. [0044]).

Claims 8-9 are apparatus claims reciting similar subject matter to those recited in the method claims 3-4 respectively, and are rejected under similar rationale.

Claims 13-14 are computer readable medium claims reciting similar subject matter to those recited in the method claims 3-4 respectively, and are rejected under similar rationale.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Poltorak (US 2011/0283190) Electronic Personal Interactive Device;
Shukla (US 2019/0371318) System And Method For Adaptive Detection Of Spoken Language Via Multiple Speech Models;
Ponomarev (US 2015/0133025) Interactive Toy Plaything Having Wireless Communication Of Interaction-Related Information With Remote Entities;
Shukla (US 2019/0206407) System And Method For Personalizing Dialogue Based On User's Appearances.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINH NGUYEN whose telephone number is (571)272-4487. The examiner can normally be reached Monday-Friday: 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAMAL B DIVECHA can be reached on (571)272-5863. The fax phone 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VINH NGUYEN/Examiner, Art Unit 2453

/KAMAL B DIVECHA/Supervisory Patent Examiner, Art Unit 2453