DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-5, 7-8, and 12-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 2, 7, and 12 recite the limitation “a text content.” Claim 2 depends on claim 1, which also recites the limitation “a text content” in line 9. Similarly, claim 7 depends on claim 6 and claim 12 depends on claim 11, each of which also recites the limitation “a text content.” Claims 2, 7, and 12 are therefore indefinite because it is unclear whether “a text content” refers to the same limitation recited in claims 1, 6, and 11, respectively, or to an additional limitation. The examiner suggests amending claims 2, 7, and 12 to read “the text content” to resolve the issue. For expedited prosecution, “a text content” shall be interpreted as “the text content.”

Claim 3 recites the limitation “a lip shape” in line 4. Claim 3 depends on claim 1, which also recites the limitation “a lip shape” in line 10. Claim 3 is therefore indefinite because it is unclear whether “a lip shape” refers to the same limitation recited in claim 1 or to an additional limitation. The examiner suggests amending claim 3, line 4, to read “the lip shape” to resolve the issue. For expedited prosecution, “a lip shape” shall be interpreted as “the lip shape.” Claim 3 further recites the limitations “a video” in line 7, “the video” in line 10, and “the target video” in line 11. Claim 1 also recites the limitations “a target video” and “the target video” in its last two lines. Claim 3 is therefore indefinite because it is unclear whether “a video,” “the video,” “a target video,” and “the target video” refer to the same limitation or to different limitations. The examiner suggests amending claim 3 to read “the target video” throughout to resolve the issue. For expedited prosecution, “a video” and “the video” shall each be interpreted as “the target video.” Claims 8 and 13 are similarly rejected.

Claims 4 and 14 recite the limitation “wherein before converting” in lines 1 and 2. There is insufficient antecedent basis for this limitation in the claims. The examiner suggests amending the claims to read “wherein before the converting.” For examination purposes, the term “wherein before converting” has been construed as “wherein before the converting.”

Claims 5 and 15 recite the limitation “wherein before simulating” in lines 1 and 2. There is insufficient antecedent basis for this limitation in the claims. The examiner suggests amending the claims to read “wherein before the simulating.” For examination purposes, the term “wherein before simulating” has been construed as “wherein before the simulating.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-2, 6-7, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Vadodaria (Patent No. US 10178218 B1) in view of Guo et al. (Patent No. US 7019749 B2), hereinafter Guo.
Regarding claim 1, Vadodaria teaches a method for dialogue with a virtual object, applied to a client end (Spec. Col. 1, line 64- Col. 2, line 15) and comprising: 
converting a first voice collected by the client end into a first text content (Spec. Col. 35, lines 20-27; the user can speak a voice command, i.e. a first voice, on a client device which can convert the voice to text, i.e. a first text content); 
acquiring a second text content responding to the first text content based on natural language processing (NLP) and/or a target database pre-stored by the client end; wherein the target database stores, in an associated manner, a target text content and a text content responding to the target text content (Spec. Col. 21, lines 33-35; the NLP Engine is used to construct the response sentence, i.e. the second text content, which is read out to the user if the Text To Speech is active. Fig. 7 shows a “Common Sense Knowledge Database” to which the Action + Entity from the user’s query is input and which stores responses to the query, i.e. text content responding to the target text content, which is then output by the “NLG” natural language generator. This is further detailed in Col. 18, lines 3-8); 
performing voice synthesis on the second text content to acquire a second voice (Spec. Col. 21, lines 33-35; the NLP Engine is used to construct the response sentence which is read out to the user if the Text To Speech is active); 
simulating a lip shape of the second voice by using the virtual object to acquire a target video in which the virtual object says the second voice (Col. 23, lines 51-61; the text to speech audio file of the response is played with the associated facial expression animation, in which lip movement of a virtual face is rendered as a talking segment to speak the response); and 
playing the target video (Col. 23, lines 51-61; the text to speech audio file of the response is played with the associated facial expression animation and displayed on the client device with the NLG response).
However, Vadodaria does not teach that the client end is in an offline mode or that the natural language processing (NLP) is done offline. Guo teaches a technique for rendering a talking head or agent to simulate a conversation with smooth transitions between talking states (Abstract). Guo further teaches that the talking head can be generated offline with prerecorded phrases and a template-based language generator (Spec. Col. 12, lines 45-49).
Adapting Vadodaria to incorporate the features as taught by Guo provides a method for dialogue with a virtual object, applied to a client end (Vadodaria: Spec. Col. 1, line 64- Col. 2, line 15) and comprising: 
converting a first voice collected by the client end into a first text content, in a case that the client end is in an offline mode (Vadodaria: Spec. Col. 35, lines 20-27; the user can speak a voice command, i.e. a first voice, on a client device which can convert the voice to text, i.e. a first text content, now adapted to be done in an offline mode of the client device as taught by the system for rendering a talking head virtual object in Guo Spec. Col. 12, lines 45-49); 
acquiring a second text content responding to the first text content based on offline natural language processing (NLP) and/or a target database pre-stored by the client end; wherein the target database stores, in an associated manner, a target text content and a text content responding to the target text content (Vadodaria: Spec. Col. 21, lines 33-35; the NLP Engine is used to construct the response sentence, i.e. the second text content, which is read out to the user if the Text To Speech is active, now adapted to be done in an offline mode of the client device as taught by the system for rendering a talking head virtual object in Guo Spec. Col. 12, lines 45-49. Vadodaria: Fig. 7 shows a “Common Sense Knowledge Database” to which the Action + Entity from the user’s query is input and which stores responses to the query, i.e. text content responding to the target text content, which is then output by the “NLG” natural language generator. This is further detailed in Col. 18, lines 3-8); 
performing voice synthesis on the second text content to acquire a second voice (Vadodaria: Spec. Col. 21, lines 33-35; the NLP Engine is used to construct the response sentence which is read out to the user if the Text To Speech is active); 
simulating a lip shape of the second voice by using the virtual object to acquire a target video in which the virtual object says the second voice (Vadodaria: Col. 23, lines 51-61; the text to speech audio file of the response is played with the associated facial expression animation, in which lip movement of a virtual face is rendered as a talking segment to speak the response); and 
playing the target video (Vadodaria: Col. 23, lines 51-61; the text to speech audio file of the response is played with the associated facial expression animation and displayed on the client device with the NLG response).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vadodaria by incorporating the teachings of Guo. Vadodaria and Guo are considered to be analogous art because both are directed to the use of a virtual agent that provides information to a user via conversation with a talking head. Vadodaria recognizes that there may be instances where a client device needs to manage limited resources (Spec. Col. 10, lines 4-11). Similarly, Guo recognizes that the availability of computing resources may affect the operation of the device (Spec. Col. 12, lines 45-49). Guo teaches a particular solution to this problem, namely generating the talking head in an offline mode. Given this overlap, in particular the use of a talking head virtual object to deliver information in conversation with a user, incorporating the features of Guo into Vadodaria would have been predictable to one of ordinary skill in the art at the time of filing.

Regarding claim 2, in addition to the elements stated above regarding claim 1, the combination of Vadodaria and Guo further teaches the method according to claim 1, wherein the acquiring the second text content responding to the first text content based on the offline natural language processing (NLP) and/or the target database pre-stored by the client end (Vadodaria: Spec. Col. 21, lines 33-35; the NLP Engine is used to construct the response sentence, i.e. the second text content, which is read out to the user if the Text To Speech is active) comprises: 
performing the offline natural language processing (NLP) on the first text content to acquire the second text content (the NLP and NLG system of Vadodaria, adapted to be done in an offline mode of the device as taught by Guo. Vadodaria: Spec. Col. 35, lines 60-65; the NLP engine performs a semantic analysis of the user input to acquire a Semantic Action to be performed by the system and associated parameters. Col. 36, lines 15-20; the Backend Proxy Module receives the Action and parameters and returns results for the user’s query. Col. 36, lines 34-46; the Natural Language Generation Engine receives the results and generates a response sentence as text, which is sent to the Conversation management module).

Regarding claim 6, the claim is directed to a device for dialogue with a virtual object, applied to a client end and comprising: 
at least one processor (Spec. Col. 1, line 64- Col. 2, line 15); and 
a memory communicatively connected to the at least one processor; wherein the memory stores an instruction executable by the at least one processor (Spec. Col. 1, line 64- Col. 2, line 15), and when executing the instruction, the at least one processor is configured to perform the features presented in the claimed method of claim 1. The combination of Vadodaria and Guo teaches a system comprising these elements for performing the method of claim 1, therefore claim 6 is rejected under the same grounds. 

Regarding claim 7, the claim is directed to the device according to claim 6 corresponding to the claimed method of claim 2 and is rejected under the same grounds.

Regarding claim 11, the claim is directed to a non-transitory computer-readable storage medium, storing a computer instruction thereon (Spec. Col. 1, line 64- Col. 2, line 15), wherein the computer instruction is configured to be executed to cause a computer to perform the features presented in the claimed method of claim 1. The combination of Vadodaria and Guo teaches a non-transitory computer-readable storage medium comprising these elements for performing the method of claim 1; therefore, claim 11 is rejected under the same grounds.

Regarding claim 12, the claim is directed to the non-transitory computer-readable storage medium according to claim 11 corresponding to the claimed method of claim 2 and is rejected under the same grounds.

Claims 3, 8, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Vadodaria in view of Guo and Habra (Pub. No. US 2019/0172241 A1).

Regarding claim 3, the combination of Vadodaria and Guo teaches the method of claim 1 as detailed above. However, the combination does not teach wherein the simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice comprises: 
simulating, based on lip shape pictures that are locally stored, a lip shape when the virtual object says the second voice, to acquire a plurality of target pictures in a process of the virtual object saying the second voice; 
processing the plurality of target pictures to acquire a video in which the lip shape continuously changes in the process of the virtual object saying the second voice; and 
synthesizing the video in which the lip shape continuously changes and an audio signal of the second voice to acquire the target video.
Habra teaches a processor and memory storing instructions for real-time lip synchronization animation of animation models according to phonemes used to answer user requests, i.e. a second voice (Abstract). Habra further teaches wherein the simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice (Spec. page 2, [0026], lines 6-10; the animated virtual assistant speaks the answer, i.e. says the second voice, by animating the mouth shape of the assistant to match the speech) comprises: 
simulating, based on lip shape pictures that are locally stored, a lip shape when the virtual object says the second voice, to acquire a plurality of target pictures in a process of the virtual object saying the second voice (Spec. page 3, [0037]; the answer to the user query is received as a text which is converted to phonemes which are used to create the speech and animate the assistant with the models [0038]; a client device may store a plurality of animation models, i.e. lip shape pictures that are locally stored, corresponding to a phoneme or set of phonemes. The models are used to generate the first animation model and second animation model [0041] at operation of the client device according to the phonemes of the response); 
processing the plurality of target pictures to acquire a video in which the lip shape continuously changes in the process of the virtual object saying the second voice (Figure 4 is a flowchart describing the process of processing the models, i.e. the plurality of target pictures, in elements 402, 404, and 406. Page 4, [0059]; frames are determined for the phoneme and transition models as the duration periods are determined, which is considered to be acquiring a video in which the lip shape continuously changes. Page 5, [0064]; corresponding speech audio is created according to the determined durations); and 
synthesizing the video in which the lip shape continuously changes and an audio signal of the second voice to acquire the target video (Figure 4 elements 408, 410, and 412 describe displaying the models to the user, i.e. synthesizing the video. Spec. page 2, [0026], lines 6-10; the animated virtual assistant speaks the answer, i.e. says the second voice, by animating the mouth shape of the assistant to match the speech).
Adapting the combination of Vadodaria and Guo to incorporate the features as taught by Habra provides the method according to claim 1 (as taught above by Vadodaria and Guo, now adapted to simulate lip movement according to the teachings of Habra) wherein the simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice (Habra: Spec. page 2, [0026], lines 6-10; the animated virtual assistant speaks the answer, i.e. says the second voice, by animating the mouth shape of the assistant to match the speech) comprises: 
simulating, based on lip shape pictures that are locally stored, a lip shape when the virtual object says the second voice, to acquire a plurality of target pictures in a process of the virtual object saying the second voice (Habra: Spec. page 3, [0037]; the answer to the user query is received as a text which is converted to phonemes which are used to create the speech and animate the assistant with the models [0038]; a client device may store a plurality of animation models, i.e. lip shape pictures that are locally stored, corresponding to a phoneme or set of phonemes. The models are used to generate the first animation model and second animation model [0041] at operation of the client device according to the phonemes of the response); 
processing the plurality of target pictures to acquire a video in which the lip shape continuously changes in the process of the virtual object saying the second voice (Habra: Figure 4 is a flowchart describing the process of processing the models, i.e. the plurality of target pictures, in elements 402, 404, and 406. Page 4, [0059]; frames are determined for the phoneme and transition models as the duration periods are determined, which is considered to be acquiring a video in which the lip shape continuously changes. Page 5, [0064]; corresponding speech audio is created according to the determined durations); and 
synthesizing the video in which the lip shape continuously changes and an audio signal of the second voice to acquire the target video (Habra: Figure 4 elements 408, 410, and 412 describe displaying the models to the user, i.e. synthesizing the video. Spec. page 2, [0026], lines 6-10; the animated virtual assistant speaks the answer, i.e. says the second voice, by animating the mouth shape of the assistant to match the speech).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vadodaria and Guo by incorporating the teachings of Habra. Habra and Vadodaria are considered to be analogous art because both disclosures are directed to providing real-time responses to user queries using an animated virtual assistant. Vadodaria recognizes that there may be instances where a client device needs to manage limited resources (Spec. Col. 10, lines 4-11). Similarly, Habra recognizes that real-time lip synchronization can be negatively impacted by limited computational resources, causing delay (Spec. page 2, [0023]). Habra teaches a particular solution to this problem, namely relying on stored lip shape models for animation. Given this overlap, in particular the use of a talking head virtual object to deliver information in conversation with a user, incorporating the features of Habra into the combination of Vadodaria and Guo would have been predictable to one of ordinary skill in the art at the time of filing.

Regarding claim 8, the claim is directed to the device according to claim 6 corresponding to the claimed method of claim 3 and is rejected under the same grounds.

Regarding claim 13, the claim is directed to the non-transitory computer-readable storage medium according to claim 11 corresponding to the claimed method of claim 3 and is rejected under the same grounds.

Claims 4, 9, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Vadodaria in view of Guo and Zheng et al. (Doc. ID. US 2013/0246617 A1), hereinafter Zheng.

Regarding claim 4, the combination of Vadodaria and Guo teaches the method of claim 1 as detailed above. However, the combination does not teach wherein before converting the first voice collected by the client end into the first text content, in a case that the client end is in the offline mode, the method further comprises: 
detecting a network transmission rate of the client end; and 
determining that the client end is in the offline mode, in a case that the network transmission rate is lower than a preset value.
Zheng teaches a method for processing network data, including detecting a network status and determining if the status reflects normal network operating conditions (Abstract). Zheng further teaches 
detecting a network transmission rate of the client end (Spec. page 2, [0029]; network status is determined according to data transfer rate); and 
determining that the client end is in the offline mode, in a case that the network transmission rate is lower than a preset value (Spec. page 2, [0029-30]; if the data transfer rate is lower than a preset threshold, S103 is performed, which puts the device in offline mode).
Adapting the combination of Vadodaria and Guo to incorporate the features as taught by Zheng provides the method according to claim 1 (as taught by Vadodaria and Guo above), wherein before converting the first voice collected by the client end into the first text content, in a case that the client end is in the offline mode, the method further comprises: 
detecting a network transmission rate of the client end (Zheng: Spec. page 2, [0029]; network status is determined according to data transfer rate); and 
determining that the client end is in the offline mode, in a case that the network transmission rate is lower than a preset value (Zheng: Spec. page 2, [0029-30]; if the data transfer rate is lower than a preset threshold, S103 is performed, which puts the device in offline mode).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vadodaria and Guo by incorporating the teachings of Zheng. Guo teaches that the talking head may be rendered and operated in an offline mode (Spec. Col. 12, lines 45-49). Zheng teaches a method for the device to determine that it is offline from the network and to enter the offline mode (Abstract). Therefore, incorporating the features of Zheng into the combination of Vadodaria and Guo would have been predictable to one of ordinary skill in the art at the time of filing.

Regarding claim 9, the claim is directed to the device according to claim 6 corresponding to the claimed method of claim 4 and is rejected under the same grounds.

Regarding claim 14, the claim is directed to the non-transitory computer-readable storage medium according to claim 11 corresponding to the claimed method of claim 4 and is rejected under the same grounds.

Claims 5, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Vadodaria in view of Guo and Baik et al. (Doc. ID. US 2018/0047391 A1), hereinafter Baik.

Regarding claim 5, the combination of Vadodaria and Guo teaches the method of claim 1 as detailed above. However, the combination does not teach wherein before simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice, the method further comprises: 
determining a type of the virtual object based on the first text content; and 
selecting the virtual object of the type from a preset virtual object library.
Baik teaches a method for providing audio and visual feedback in response to a voice command (Abstract). Baik further teaches before simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice (Fig. 5; prior to outputting the character with audio and video feedback in element S5090, a character type is selected in element S5020), the method further comprises: 
determining a type of the virtual object based on the first text content (Spec. page 10, [0103], lines 1-6; a character type, i.e. a virtual object type, is determined in step S5020 based on the type of voice command received); and 
selecting the virtual object of the type from a preset virtual object library (Spec. page 10, [0101], lines 1-10; the character generating server includes a database of predefined characters mapped to action and sentence types from which the character is chosen).
Adapting the combination of Vadodaria and Guo to incorporate the features as taught by Baik provides the method according to claim 1 (as taught by the combination of Vadodaria and Guo, now adapted to choose a type of virtual object as taught by Baik), wherein before simulating the lip shape of the second voice by using the virtual object to acquire the target video in which the virtual object says the second voice (Baik: Fig. 5; prior to outputting the character with audio and video feedback in element S5090, a character type is selected in element S5020), the method further comprises: 
determining a type of the virtual object based on the first text content (Baik: Spec. page 10, [0103], lines 1-6; a character type, i.e. a virtual object type, is determined in step S5020 based on the type of voice command received); and 
selecting the virtual object of the type from a preset virtual object library (Baik: Spec. page 10, [0101], lines 1-10; the character generating server includes a database of predefined characters mapped to action and sentence types from which the character is chosen).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vadodaria and Guo by incorporating the teachings of Baik. Baik and Vadodaria are considered to be analogous art because both disclosures are directed to providing real-time responses to user queries using an animated virtual assistant. Furthermore, Vadodaria (Spec. Col. 1, “Background of the Invention”) and Baik (Spec. pages 3-4, [0056]) are both directed to providing a virtual object that can express emotion via various facial expressions. Vadodaria recognizes a need for providing emotional context with the animated interactive agent. Baik addresses this need not only by providing facial expressions but also by providing different character types according to the context of the user request. Given this overlap, in particular the use of animated virtual objects as agents for responding to user requests, incorporating the features of Baik into the combination of Vadodaria and Guo would have been predictable to one of ordinary skill in the art at the time of filing.

Regarding claim 10, the claim is directed to the device according to claim 6 corresponding to the claimed method of claim 5 and is rejected under the same grounds.

Regarding claim 15, the claim is directed to the non-transitory computer-readable storage medium according to claim 11 corresponding to the claimed method of claim 5 and is rejected under the same grounds.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lyberg (Pub. No. WO 98/43235) teaches a device for prosody generation involving storing half-syllables with registered movement patterns of a face, synthesizing speech with the half-syllables, and applying the movement patterns to a model which is applied to a real face to obtain lifelike animation for the generated speech (Abstract).
Navaratnam (Pub. No. US 2017/0011745 A1) teaches a system for servicing customers with an interactive display providing two-way audio/visual communication delivered by a virtual digital actor (Abstract). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARKER L MAYFIELD whose telephone number is (571)272-4745. The examiner can normally be reached Monday - Friday 7:30 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PARKER L MAYFIELD/
Examiner
Art Unit 2655



/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655