Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6, 8-10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sachs et al (US20190122411) in view of Xu et al (US20170330319).

Regarding Claim 1. Sachs teaches A method for generating a three dimensional (3D) visual representation of a sensed object that is three dimensional (Sachs, abstract, the invention describes systems and methods for computer animations of 3D models of heads generated from images of faces. A 2D captured image that includes an image of a face can be received and used to generate a static 3D model of a head. A rig can be fit to the static 3D model to generate an animation-ready 3D generative model. Sets of rigs can be parameters that each map to particular sounds or particular facial movement observed in a video. These mappings can be used to generate a play lists of sets of rig parameters based upon received audio or video content. The playlist may be played in synchronization with an audio rendition of the audio content. Methods can receive a captured image, identify taxonomy attributes from the captured image, select a template model for the captured image, and perform a shape solve for the selected template model based on the identified taxonomy attributes.), the method comprises:
obtaining at least one 3D visual representation parameter, the visual representation parameters is selected out of a size parameter, a resolution parameter, and a resource consumption parameter (Sachs, [0098] In accordance with a number of embodiments, the generation and/or optimization of the customized 3D model is performed in stages. In accordance with many of these embodiments, a first stage can solve for the camera properties. The camera properties may include, but are not limited to, camera rotation, camera translation, Field of View (FOV), and focal length. In a second stage, the blendshape weights may be solved. In a third stage, a free-form deformation of the model is solved. In a fourth stage, the texture and light and lighting components may be solved. Finally, in a fifth stage, eye details are solved. In accordance with some embodiments, the eye details may include, but are not limited to, iris shape and eyelid folds. In accordance with a number of embodiments, different resolution meshes may be used in the different stages of the optimization process. In accordance some particular embodiments, a low-resolution mesh may be used for the first three stages and a high-resolution mesh is used in the fourth and fifth stages.);
obtaining object information that represents the sensed object (Sachs, [0009] In accordance with some embodiments, the one or more processes determine a position for each of a plurality of facial landmarks in the image by performing a Mnemonic Descent Method (MDM) using a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) that are jointly trained. In accordance with many of these embodiments,);

Sachs fails to explicitly teach, but Xu teaches, selecting, based on the at least one parameter, a neural network for generating the visual representation of the sensed object (Xu, abstract, the invention describes methods that relate to detecting multiple landmarks in medical images. By way of introduction, the present embodiments described below include apparatuses and methods for detecting landmarks using hierarchical feature learning with end-to-end training. Multiple neural networks are provided with convolutional layers for extracting features from medical images and with a convolutional layer for learning spatial relationships between the extracted features. Each neural network is trained to detect different landmarks using a different resolution of the medical images, and the convolutional layers of each neural network are trained together with end-to-end training to learn appearance and spatial configuration simultaneously. The trained neural networks detect multiple landmarks in a test image iteratively by detecting landmarks at different resolutions, using landmarks detected a lesser resolutions to detect additional landmarks at higher resolutions.
[0007] In a third aspect, a method for multiple landmark detection is provided. The method includes receiving medical image data from a medical scanner and identifying a first subset of a plurality of landmarks from the medical image data at a first resolution using a first learned deep neural network. The method also includes identifying a second subset of the plurality of landmarks from the medical image data at a second resolution using a second learned deep neural network. The method further includes displaying a medical image from the medical image data identifying the identified first subset of landmarks and the identified second subset of landmarks.
[0020] A hierarchy of the neural networks is also established for detecting landmarks at different resolutions. A first neural network is trained at a lower resolution of the image to detect a subset of landmarks at the lower resolution. After the first subset of landmarks is determined, then a second neural network is trained at a higher resolution of the image to detect another subset of landmarks at the higher resolution. Any number of resolutions and corresponding neural networks may be used. With each increase in resolution, a denser set of landmarks is detected, providing an iterative, coarse-to-fine landmark detection process.); and
Sachs and Xu are analogous art because both teach methods of generating a 3D model of a target object using neural networks. Xu further teaches using a different neural network for each resolution of the image. Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the 3D model generation method taught in Sachs to further use the hierarchical neural networks taught in Xu to detect image landmarks at different resolutions, so as to provide an iterative, coarse-to-fine landmark detection process (Xu, [0020]).

The combination of Sachs and Xu further teaches generating the 3D visual representation of the 3D object by the selected neural network (Sachs, [0086], in accordance with some of these embodiments, the generated 3D model of the face is animated-ready mappings of generating specific sets of rig parameters based upon to samples of audio data, video data, and/or text data. In many embodiments, a user device executes an application that obtains a captured 2D image of a face and transmits the image to a server system that performs processes that generate an animation-ready 3D model and animate the generated model using mappings of sets of rig parameters. The user device (or another user device) can receive the 3D model and necessary mappings of sets of rig parameters from the server systems and can use this information to display computer animations.).

Regarding Claim 2. The combination of Sachs and Xu further teaches The method according to claim 1 wherein the generating of the visual representation comprising generating a 3D model of the 3D object and at least one 2D texture map of the 3D object (Sachs, [0075] A static 3D model generating process in accordance with some embodiments of the invention generates a static 3D model of a face or head from the image data that includes an image of a face.
[0076] To determine the texture of the face, the appearance of the face is factorized in accordance with some embodiments. In accordance with many embodiments, the appearance is factorized as a product of the skin albedo parameterized with a texture map and a low-frequency environmental lighting.).

Regarding Claim 3. The combination of Sachs and Xu further teaches The method according to claim 2 wherein the generating comprises further processing the 3D model and the 2D texture map during a rendering process of at least one rendered image (Sachs, [0075] A static 3D model generating process in accordance with some embodiments of the invention generates a static 3D model of a face or head from the image data that includes an image of a face. In accordance with several embodiments, the process is performed by a server system that receives the captured image from an execution on a user device. In some embodiments, the static 3D model generating process generates a static 3D model of a head from the image of the face in the captured image. In many embodiments, the process uses a generative animation model to model the face in the captured image. In accordance with embodiments, the generative animation model is based upon internal camera parameters, the shape of the face, the texture of the face and a translation vector.
[0076] To determine the texture of the face, the appearance of the face is factorized in accordance with some embodiments. In accordance with many embodiments, the appearance is factorized as a product of the skin albedo parameterized with a texture map and a low-frequency environmental lighting. In accordance with some embodiments, a Lambertian reflectance model of the face is used to model the low-frequency lighting and the lighting is represented as a combination of point light sources and/or spherical harmonic sources.).

Regarding Claim 6. The combination of Sachs and Xu further teaches The method according to claim 1 comprising outputting the 3D visual representation from the selected set of neural network outputs (Sachs, [0009] In accordance with some embodiments, the one or more processes determine a position for each of a plurality of facial landmarks in the image by performing a Mnemonic Descent Method (MDM) using a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) that are jointly trained. In accordance with many of these embodiments,
[0086], in accordance with some of these embodiments, the generated 3D model of the face is animated-ready mappings of generating specific sets of rig parameters based upon to samples of audio data, video data, and/or text data. In many embodiments, a user device executes an application that obtains a captured 2D image of a face and transmits the image to a server system that performs processes that generate an animation-ready 3D model and animate the generated model using mappings of sets of rig parameters. The user device (or another user device) can receive the 3D model and necessary mappings of sets of rig parameters from the server systems and can use this information to display computer animations.).

Claim 8 has the same scope as Claim 1 and is therefore rejected under the same rationale. Claim 8 further requires:
A non-transitory computer readable medium (Sachs, page 19, claim 13, a non-transitory machine readable medium containing processor instructions for generating a three dimensional (3D) head model from a captured image, where execution of the instructions by a processor.).

Claim 9 has the same scope as Claim 2 and is therefore rejected under the same rationale.
Claim 10 has the same scope as Claim 3 and is therefore rejected under the same rationale.
Claim 13 has the same scope as Claim 6 and is therefore rejected under the same rationale.

Claims 4-5, 7, 11-12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Sachs et al in view of Xu et al, and further in view of Wang et al (US9613450).

Regarding Claim 4. The combination of Sachs and Xu further teaches The method according to claim 2 wherein the generating is executed by a first computerized unit, wherein the generating is followed by sending the 3D model and the at least one 2D texture map to a second computerized unit configured to render at least one rendered image based on the 3D model (Sachs, [0086] In many embodiments, a user device executes an application that obtains a captured 2D image of a face and transmits the image to a server system that performs processes that generate an animation-ready 3D model and animate the generated model using mappings of sets of rig parameters. The user device (or another user device) can receive the 3D model and necessary mappings of sets of rig parameters from the server systems and can use this information to display computer animations.).

The combination of Sachs and Xu fails to explicitly teach; however, Wang teaches render at least one rendered image based on the 3D model and the at least one 2D texture map (Wang, abstract, the invention describes dynamic texture mapping which is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.
Col 4, line 10-32, A general 3D face model is applied for personalized 3D face reconstruction. The 3D shapes have been compressed by the Principal Component Analysis (PCA). After the 2D face alignment, the key feature points are used to compute the 3D shape coefficients of the eigenvectors. Then, the coefficients are used to reconstruct the 3D face shape. Finally, the face texture is extracted from the input image. By mapping the texture onto the 3D face geometry, the 3D face model for the input 2D face image is reconstructed.).
Sachs, Xu and Wang are analogous art because they all teach methods of generating a 3D model of a target object. Sachs and Wang both further teach creating a texture map. Wang further teaches mapping the texture onto the 3D model. Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the neural-network-based 3D model generation method taught in Sachs and Xu to further apply the texture mapping taught in Wang to the 3D model, so as to generate a photo-realistic 3D talking head that looks like a real person (Wang, col 1, line 16-22).

Regarding Claim 5. The combination of Sachs, Xu and Wang further teaches The method according to claim 1 wherein the 3D object is a participant of a 3D video conference (Wang, col 2, line 50-59, The application 100 can use a talking head for a variety of purposes. For example, the application 100 can be a computer assisted language learning applications, a language dictionary (e.g., to demonstrate pronunciation), an email reader, a news reader, a book reader, a text-to-speech system, an intelligent voice agent, an avatar of an individual for a virtual meeting room, a virtual agent in dialogue system, video conferencing, online chatting, gaming, movie animation, or other application that provides visual and speech-based interaction with an individual.).
The reasoning for combining Sachs, Xu and Wang is the same as described for Claim 4.

Regarding Claim 7. The combination of Sachs, Xu and Wang further teaches The method according to claim 6 wherein the 3D object is a participant of a 3D video conference (Wang, col 2, line 50-59, The application 100 can use a talking head for a variety of purposes. For example, the application 100 can be a computer assisted language learning applications, a language dictionary (e.g., to demonstrate pronunciation), an email reader, a news reader, a book reader, a text-to-speech system, an intelligent voice agent, an avatar of an individual for a virtual meeting room, a virtual agent in dialogue system, video conferencing, online chatting, gaming, movie animation, or other application that provides visual and speech-based interaction with an individual.).
The reasoning for combining Sachs, Xu and Wang is the same as described for Claim 4.

Claim 11 has the same scope as Claim 4 and is therefore rejected under the same rationale.
Claim 12 has the same scope as Claim 5 and is therefore rejected under the same rationale.
Claim 14 has the same scope as Claim 7 and is therefore rejected under the same rationale.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN SHENG whose telephone number is (571) 272-5734. The examiner can normally be reached M-F 9:30AM-3:30PM and 6:00PM-8:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Xin Sheng/
Primary Examiner, Art Unit 2611