DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 9/13/2022 has been entered. Claims 1-21 remain pending in the application. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3, 12, 14 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, and further in view of Gefen U.S. Patent Application 20110063415.
Regarding claim 20, Chan discloses a system, comprising:
a memory (computer memory) that includes instructions; and
a processor (computer processor) that is coupled to the memory and, when executing the instructions: 
accesses a machine learning model that has been trained via first image data of a 3D animatable asset generated from movements of the 3D animatable asset based on first rig vector data that is associated with a plurality of rig poses; receives second rig vector data (Abstract: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation; Figure 3: (Top) Training: Our model uses a pose detector P to create pose stick figures from video frames of the target subject… (Bottom) Transfer: We use a pose detector P to obtain pose joints for the source person that are transformed by our normalization process Norm into joints for the target person for which pose stick figures (rig vector data) are created. Then we apply the trained mapping G; page 2 section 2 Related Work: Several approaches rely on calibrated multi-camera setups to ‘scan’ a target actor and manipulate their motions in a new video through a fitted 3D model of the target; Figure 1: a video of a graduate student performing various motions, our method transfers the ballerina’s performance (movements) onto the student); and 
generates, via the machine learning model, second image data of the 3D animatable asset based on the second rig vector data (Abstract: To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject).
Chan discloses all the features with respect to claim 20 as outlined above. However, Chan fails to explicitly disclose that the animatable asset is generated by rendering movements of the 3D animatable asset based on first rig vector data that is associated with a plurality of rig poses, wherein at least one of the movements of the 3D animatable asset is rendered from a plurality of camera views.
Loper discloses animatable asset generated from movements of the 3D animatable asset based on first rig vector data that is associated with a plurality of rig poses (paragraph [0072]: The method comprises the step S101 of providing a parametric three-dimensional body model, which allows shape and pose variations (movements)... the step S104 of automatically providing an animation by processing the 3D coordinate marker signals in order to provide a personalized three-dimensional body model, based on estimated shape and an estimated pose of the body by means of predicted marker locations).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine Chan’s method with the generation of animation from rig data as taught by Loper, in order to generate animation conveniently.
Chan as modified by Loper discloses all the features with respect to claim 20 as outlined above. However, Chan as modified by Loper fails to disclose that at least one of the movements of the 3D animatable asset is rendered from a plurality of camera views.
Gefen discloses at least one of the movements of the 3D animatable asset is rendered from a plurality of camera views (paragraph [0056]: using 3D modeling techniques, including, for example and without limitation, texture loading, virtual camera modeling, and rendering to a view port, such as are widely used in gaming applications; paragraph [0058]: changing the appearance of an object includes revealing the interior of the virtual object... or changing the spatial orientation of the virtual object... Such changes in the appearance of the virtual object may be effected by, for example, renderer 37; paragraph [0031]: video tracking and object tracking can be used to estimate the camera parameters and track moving objects; Gefen’s rendering from a plurality of camera views can be combined with Chan and Loper’s device, such that a moving pose stick figure in a particular pose can be trained and mapped, and rendered from a plurality of virtual camera views).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan and Loper with rendering from a plurality of views as taught by Gefen, in order to improve interactivity while minimizing interference with the program viewing experience.

Claim 1 recites the functions of the apparatus recited in claim 20 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 20 applies to the method steps of claim 1.

Regarding claim 3, Chan as modified by Loper and Gefen discloses the computer-implemented method of claim 1, wherein the second rig vector data includes at least one of position data or orientation data (Chan’s Page 4, section 3.1. Pose Encoding and Normalization: analyzing the heights and ankle positions for the poses of each subject and use a linear mapping between the closest and farthest ankle positions in both videos. After gathering these positions, we calculate the scale and translation for each frame based on its corresponding pose detection; Gefen’s paragraph [0058]: changing the appearance of an object includes revealing the interior of the virtual object... or changing the spatial orientation of the virtual object... Such changes in the appearance of the virtual object may be effected by, for example, renderer 37). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine Chan’s method with scene interaction as taught by Loper, in order to present an object of interest at a given location in the scene; and to combine the system of Chan and Loper with rendering from a plurality of views as taught by Gefen, in order to improve interactivity while minimizing interference with the program viewing experience.
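
For illustration only, the "linear mapping between the closest and farthest ankle positions" quoted from Chan above may be sketched as follows. This is a minimal sketch under stated assumptions, not Chan's implementation: the function name, the parameter names, and the use of a single scalar ankle coordinate per frame are all hypothetical.

```python
def map_ankle_position(y_source, source_far, source_close,
                       target_far, target_close):
    """Linearly map a source ankle position into the target subject's range.

    All names are hypothetical. The 'far' and 'close' values stand for the
    ankle positions observed when a subject is farthest from and closest to
    the camera in the respective video.
    """
    if source_close == source_far:
        raise ValueError("source range is empty; cannot build a linear mapping")
    # Fraction of the way from the farthest to the closest source position.
    fraction = (y_source - source_far) / (source_close - source_far)
    # Same fraction, expressed in the target subject's range.
    return target_far + fraction * (target_close - target_far)


def frame_translation(y_source, source_far, source_close,
                      target_far, target_close):
    """Translation that moves the source ankle onto its mapped target position."""
    mapped = map_ankle_position(y_source, source_far, source_close,
                                target_far, target_close)
    return mapped - y_source
```

Under these assumptions, a source ankle halfway between its farthest and closest positions is placed halfway between the target's farthest and closest positions.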

Claim 12 recites the functions of the apparatus recited in claim 20 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 20 applies to the medium steps of claim 12.

Claim 14 recites the functions of the method recited in claim 3 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 3 applies to the medium steps of claim 14.

Regarding claim 21, Chan as modified by Loper and Gefen discloses the computer-implemented method of claim 1, wherein the second rig vector data includes at least one of a virtual camera position or a virtual camera orientation (Gefen’s paragraph [0056]: using 3D modeling techniques, including, for example and without limitation, texture loading, virtual camera modeling, and rendering to a view port, such as are widely used in gaming applications; paragraph [0031]: video tracking and object tracking can be used to estimate the camera parameters and track moving objects; Chan’s page 2 section 2 Related Work: Several approaches rely on calibrated multi-camera setups to ‘scan’ a target actor and manipulate their motions in a new video through a fitted 3D model of the target; page 4 section 3.1 Global pose normalization: subjects may have different limb proportions or stand closer or farther to the camera than one another).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine Chan’s method with scene interaction as taught by Loper, in order to present an object of interest at a given location in the scene; and to combine the system of Chan and Loper with rendering from a plurality of views as taught by Gefen, in order to improve interactivity while minimizing interference with the program viewing experience.

Claims 4-5 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, in view of Gefen U.S. Patent Application 20110063415, and further in view of Kim U.S. Patent Application 20170046563.
Regarding claim 4, Chan as modified by Loper and Gefen discloses all the features with respect to claim 1 as outlined above. Chan further discloses recent works have applied deep learning for reanimation in different applications and rely on more detailed input representations. Given synthetic renderings, an interior face model, and a gaze map as input, Kim et al. [19] transfer head position and facial expressions between human subjects and render their results in detailed portrait videos (Page 2, Related Work); Figure 3: (Top) Training: Our model uses a pose detector P to create pose stick figures from video frames of the target subject. However, Chan as modified by Loper and Gefen fails to disclose at least one of albedo image data, surface normal image data, depth image data, or mask image data. 
Kim discloses at least one of albedo image data, surface normal image data, depth image data, or mask image data (paragraph [0014]: extracting, from a training image, an albedo image of a face area and a surface normal image; paragraph [0021]: The albedo image may indicate a texture component of a face area without regard to an illumination of the face area, and the surface normal image may indicate a three-dimensional (3D) shape component of the face area without regard to the illumination).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper and Gefen with the display of albedo image data as taught by Kim, in order to generate a mapping between the input pattern and an output pattern, which may be expressed as a learning ability of the neural network model.

Regarding claim 5, Chan as modified by Loper, Gefen and Kim discloses the computer-implemented method of claim 1, wherein the machine learning model includes a plurality of layers, and wherein the machine learning model has been trained by:
training a first layer included in the plurality of layers; and subsequent to training the first layer, training a second layer included in the plurality of layers that receives an input from the first layer (Kim’s paragraph [0054]: According to at least one example embodiment, the model trainer 130 may train the illumination compensation model based on a deformation model of an auto-encoder. The auto-encoder refers to a neural network model enabling a desired output to be identical to an input. The auto-encoder may include an encoder provided in an input layer and encoding layers (first layer), and a decoder provided in an output layer and decoding layers (second layer). The auto-encoder may have a neural network structure in which an output value of the encoder is input to the decoder and the decoder outputs the output value identical to an input value input to the encoder; Chan’s Page 5, section 3.3. Full Objective: compares pretrained VGGNet [35] features at different layers of the network. For training details see the supplementary material; Page 8, section 6. Potential Applications: an augmented reality stage performance art piece where a 3D-rendered dancer appears to float next to a real dancer [30]. Another is an in-game entertainment application making NBA players dance [44]). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper and Gefen with the display of albedo image data as taught by Kim, in order to generate a mapping between the input pattern and an output pattern, which may be expressed as a learning ability of the neural network model.

Claim 15 recites the functions of the method recited in claim 4 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 4 applies to the medium steps of claim 15.
Claim 16 recites the functions of the method recited in claim 5 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 5 applies to the medium steps of claim 16.

Claims 6-8 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, in view of Gefen U.S. Patent Application 20110063415, in view of Kim U.S. Patent Application 20170046563, and further in view of Gu U.S. Patent Application 20200160546.
Regarding claim 6, Chan as modified by Loper, Gefen and Kim discloses all the features with respect to claim 5 as outlined above. However, Chan as modified by Loper, Gefen and Kim fails to disclose that the second layer includes a convolution block that doubles a spatial resolution of the first layer.
Gu discloses that the second layer includes a convolution block that doubles a spatial resolution of the first layer (paragraph [0191]: The final layer in the first stage is a transposed convolution layer that applies a 4×4 convolution kernel to double the spatial resolution of the output of the first stage, in pixel space).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper, Gefen and Kim with doubling of spatial resolution as taught by Gu, in order to accurately capture the depth information and increase spatial resolution.
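
For illustration only, the following sketch shows how a transposed convolution with a 4×4 kernel can double spatial resolution. The quoted passage of Gu states only the kernel size and the doubling; the stride of 2 and the padding of 1 used here are assumptions chosen because they yield an output exactly twice the input size, and the function names are hypothetical. The single-channel NumPy loop is a teaching sketch, not Gu's network.

```python
import numpy as np


def transposed_output_size(n, kernel=4, stride=2, padding=1):
    """Standard output-size relation for a transposed convolution."""
    return (n - 1) * stride - 2 * padding + kernel


def conv_transpose2d(x, kernel, stride=2, padding=1):
    """Single-channel transposed convolution (hypothetical helper).

    Each input pixel scatters a scaled copy of the kernel into the output,
    with copies placed `stride` pixels apart; `padding` rows and columns are
    then cropped from every border.
    """
    h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k), dtype=float)
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + k,
                 j * stride:j * stride + k] += x[i, j] * kernel
    if padding:
        full = full[padding:-padding, padding:-padding]
    return full


feature_map = np.ones((8, 8))
upsampled = conv_transpose2d(feature_map, np.ones((4, 4)))
print(upsampled.shape)
```

With these assumed settings, an 8×8 input produces a 16×16 output, consistent with the size relation in `transposed_output_size`.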

Regarding claim 7, Chan as modified by Loper, Gefen, Kim and Gu discloses the computer-implemented method of claim 1, wherein a first branch of the machine learning model generates albedo image data, and wherein a second branch of the machine learning model generates surface normal image data, depth image data, and mask image data (Kim's paragraph [0014]: extracting, from a training image, an albedo image of a face area and a surface normal image; paragraph [0048]: training refers to a machine learning and the training data stored in a training data storage 140 includes various image data; Gu’s paragraph [0131]: Vertices may be, e.g., specified as a 4-coordinate vector (e.g., <x, y, z, w>) associated with one or more vertex attributes (e.g., color, texture coordinates, surface normal, etc.); paragraph [0135]: geometric primitives may each be scaled based on a depth of the viewing frustum; paragraph [0136]: The rasterization stage 660 may also compute a coverage mask for a plurality of pixels that indicates whether one or more sample locations for the pixel intercept the geometric primitive; paragraph [0147] Neural networks rely heavily on matrix math operations... the PPU 300 is a computing platform capable of delivering performance required for deep neural network-based artificial intelligence and machine learning applications).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper, Gefen and Kim with doubling of spatial resolution as taught by Gu, in order to accurately capture the depth information and increase spatial resolution.

Regarding claim 8, Chan as modified by Loper, Gefen, Kim and Gu discloses the computer-implemented method of claim 7, wherein at least one of the surface normal image data and the mask image data is derived from the depth image data (Gu’s paragraph [0131]: Vertices may be, e.g., specified as a 4-coordinate vector (e.g., <x, y, z, w>) associated with one or more vertex attributes (e.g., color, texture coordinates, surface normal, etc.); paragraph [0135]: geometric primitives may each be scaled based on a depth of the viewing frustum).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper, Gefen and Kim with doubling of spatial resolution as taught by Gu, in order to accurately capture the depth information and increase spatial resolution.
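
For illustration only, one common way in which surface normal image data and mask image data can be derived from depth image data is sketched below. Neither the claim language nor the quoted passages of Gu prescribe this procedure; the finite-difference approach, the function name, and the `far` threshold that separates the asset from the background are all assumptions.

```python
import numpy as np


def normals_and_mask_from_depth(depth, far=1000.0):
    """Derive per-pixel unit normals and a foreground mask from a depth image.

    Assumes `depth` is a finite 2D float array and that background pixels
    hold a depth of at least `far` (a hypothetical convention).
    """
    # Mask: pixels nearer than the assumed far threshold belong to the asset.
    mask = depth < far
    # Finite differences along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    # A surface z = f(x, y) has a normal proportional to (-df/dx, -df/dy, 1).
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals, mask


# A plane whose depth increases by one unit per pixel along x.
tilted_plane = np.tile(np.arange(4.0), (4, 1))
normals, mask = normals_and_mask_from_depth(tilted_plane)
```

For the tilted plane in this sketch, every normal points equally toward negative x and positive z, and every pixel falls inside the mask.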

Claim 17 recites the functions of the method recited in claim 6 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 6 applies to the medium steps of claim 17.
Claim 18 recites the functions of the method recited in claim 7 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 7 applies to the medium steps of claim 18.
Claim 19 recites the functions of the method recited in claim 8 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 8 applies to the medium steps of claim 19.

Claims 2, 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, in view of Gefen U.S. Patent Application 20110063415, and further in view of Gu U.S. Patent Application 20200160546.
Regarding claim 2, Chan as modified by Loper and Gefen discloses the computer-implemented method of claim 1, wherein the second rig vector data includes positional data (Page 4, section 3.1. Pose Encoding and Normalization: analyzing the heights and ankle positions for the poses of each subject and use a linear mapping between the closest and farthest ankle positions in both videos. After gathering these positions, we calculate the scale and translation for each frame based on its corresponding pose detection). However, Chan as modified by Loper and Gefen fails to explicitly disclose x, y, z data.
Gu discloses x, y, z data (paragraph [0131]: Vertices may be, e.g., specified as a 4-coordinate vector (e.g., <x, y, z, w>) associated with one or more vertex attributes (e.g., color, texture coordinates, surface normal, etc.)).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper and Gefen with the use of coordinate data as taught by Gu, in order to accurately capture the depth information.

Regarding claim 9, Chan as modified by Loper, Gefen and Gu discloses the computer-implemented method of claim 1, further comprising compositing the second image data with image data associated with one or more additional 3D animatable objects to generate a composited scene (Chan’s Page 4, section 3.2. Pose to Video Translation, Face GAN: generating the full image of the scene with the main generator G; see Figure 8 background scene; Gu’s paragraph [0013]: receiving a sequence of input image data including image frames of a scene; paragraph [0127]: The vertex shader program and pixel shader program may execute concurrently, processing different data from the same scene in a pipelined fashion until all of the model data for the scene has been rendered to the frame buffer. Then, the contents of the frame buffer are transmitted to a display controller for display on a display device).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper and Gefen with the generation of a scene as taught by Gu, in order to accurately capture the depth information.

Claim 13 recites the functions of the method recited in claim 2 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 2 applies to the medium steps of claim 13.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, in view of Gefen U.S. Patent Application 20110063415, in view of Gu U.S. Patent Application 20200160546, and further in view of Kim U.S. Patent Application 20170046563.
Regarding claim 10, Chan as modified by Loper, Gefen and Gu discloses image data associated with one or more additional 3D animatable objects (see Chan’s Figure 8 background scene). However, Chan as modified by Loper, Gefen and Gu fails to disclose the second image data includes surface normal image data, and wherein compositing the second image data comprises applying a light model to the 3D asset based on the surface normal image data. 
Kim discloses the second image data includes surface normal image data, and wherein compositing the second image data comprises applying a light model to the 3D asset based on the surface normal image data (paragraph [0069]: In the encoder 310, a training image 330 is input to an illumination compensation model 340 based on a convolutional neural network (CNN) model, and the illumination compensation model 340 may output an albedo image 350, a surface normal image 360, and an illumination feature 370 with respect to the training image 330. The surface normal image 360 represents a 3D shape of a face when light is reflected from each direction of an x-axis, a y-axis, and a z-axis).
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the system of Chan, Loper, Gefen and Gu with the application of a light model as taught by Kim, in order to generate a mapping between the input pattern and an output pattern, which may be expressed as a learning ability of the neural network model.
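
For illustration only, the following sketch shows what applying a light model based on surface normal image data can look like in practice. The Lambertian (diffuse) model, the single directional light, and the function name are assumptions; the quoted passage of Kim does not specify them, and this is not presented as Kim's illumination compensation model.

```python
import numpy as np


def shade_lambertian(albedo, normals, light_dir):
    """Shade an albedo image with a directional light (hypothetical helper).

    albedo:    (H, W, 3) reflectance values.
    normals:   (H, W, 3) unit surface normals.
    light_dir: 3-vector pointing from the surface toward the light.
    """
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    # Cosine of the angle between normal and light, clamped so that surfaces
    # facing away from the light receive no illumination.
    n_dot_l = np.clip(normals @ light, 0.0, None)
    return albedo * n_dot_l[..., None]


albedo = np.full((2, 2, 3), 0.8)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0  # every normal faces the +z direction
lit = shade_lambertian(albedo, normals, light_dir=(0.0, 0.0, 1.0))
unlit = shade_lambertian(albedo, normals, light_dir=(0.0, 0.0, -1.0))
```

In this sketch, a light aligned with the normals returns the albedo unchanged, while a light from behind the surface yields zero intensity.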

Allowable Subject Matter

Claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
The following is a statement of reasons for the indication of allowable subject matter:  
Claim 11 is directed to the computer-implemented method of claim 9, wherein the second image data includes first albedo image data and first depth image data, and wherein compositing the second image data with image data associated with one or more additional 3D animatable objects comprises: determining a difference between the first depth image data with second depth image data associated with a second 3D animatable object included in the one or more additional 3D animatable objects; and displaying at least one of the first albedo image data and second albedo image data associated with the second 3D animatable object based on the difference. This feature is distinguished from the combined prior art; these limitations, when read in light of the rest of the limitations in the claim and the claims from which it depends, make the claim allowable subject matter.
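
For illustration only, the depth-difference compositing recited in claim 11 may be pictured with the following sketch. It is one possible reading and not the applicant's implementation: the per-pixel comparison, the convention that a smaller depth value is nearer to the camera, and the function name are assumptions.

```python
import numpy as np


def composite_by_depth(albedo_a, depth_a, albedo_b, depth_b):
    """Per pixel, display the albedo of whichever asset is nearer the camera.

    albedo_*: (H, W, 3) albedo images; depth_*: (H, W) depth images.
    Assumes smaller depth means nearer (a hypothetical convention).
    """
    # Difference between the first and second depth image data.
    difference = depth_a - depth_b
    # A non-positive difference means the first asset is in front (ties go
    # to the first asset).
    first_in_front = difference <= 0
    return np.where(first_in_front[..., None], albedo_a, albedo_b)


albedo_a = np.zeros((2, 2, 3))
albedo_b = np.ones((2, 2, 3))
depth_a = np.array([[1.0, 5.0], [1.0, 5.0]])
depth_b = np.full((2, 2), 3.0)
scene = composite_by_depth(albedo_a, depth_a, albedo_b, depth_b)
```

In this sketch the left column shows the first asset, which is nearer there, and the right column shows the second asset.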

Response to Arguments

Applicant's arguments filed 9/13/2022, pages 8-9, with respect to the rejection(s) of claim(s) 1, 12 and 20 under 35 U.S.C. 103 have been fully considered and are moot in view of the new ground(s) of rejection made under 35 U.S.C. 103 as being unpatentable over Chan non-patent literature titled “Everybody Dance Now” in view of Loper U.S. Patent Application 20150206341, and further in view of Gefen U.S. Patent Application 20110063415, as outlined above.

Applicant argues on pages 8-9 that Chan is completely silent with respect to any pose stick figure in a particular pose being rendered from a plurality of virtual camera views, as required by the amended claim language; in Gefen, the virtual object does not move based on rig vector data... Gefen is completely silent with respect to the rendering of the virtual object from different spatial orientations being used to train a machine learning model.

In reply, the rejection is based on the combination of Chan, Loper and Gefen. Chan discloses a machine learning model that has been trained via first image data of a 3D animatable asset generated from movements of the 3D animatable asset based on first rig vector data that is associated with a plurality of rig poses (Abstract: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation; Figure 3: (Top) Training: Our model uses a pose detector P to create pose stick figures from video frames of the target subject… (Bottom) Transfer: We use a pose detector P to obtain pose joints for the source person that are transformed by our normalization process Norm into joints for the target person for which pose stick figures (rig vector data) are created. Then we apply the trained mapping G; page 2 section 2 Related Work: Several approaches rely on calibrated multi-camera setups to ‘scan’ a target actor and manipulate their motions in a new video through a fitted 3D model of the target; Figure 1: a video of a graduate student performing various motions, our method transfers the ballerina’s performance (movements) onto the student).
Gefen discloses at least one of the movements of the 3D animatable asset is rendered from a plurality of camera views (paragraph [0056]: using 3D modeling techniques, including, for example and without limitation, texture loading, virtual camera modeling, and rendering to a view port, such as are widely used in gaming applications; paragraph [0058]: changing the appearance of an object includes revealing the interior of the virtual object... or changing the spatial orientation of the virtual object... Such changes in the appearance of the virtual object may be effected by, for example, renderer 37; paragraph [0031]: video tracking and object tracking can be used to estimate the camera parameters and track moving objects; Gefen’s rendering from a plurality of camera views can be combined with Chan and Loper’s device, such that a moving pose stick figure in a particular pose can be trained and mapped, and rendered from a plurality of virtual camera views).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Yi Yang whose telephone number is (571)272-9589.  The examiner can normally be reached on Monday-Friday 9:00 AM-6:00 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

/YI YANG/
Examiner, Art Unit 2616