DETAILED ACTION
		Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statement (IDS) was submitted on 06/15/2021.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

4.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103(a) are summarized as follows:
1.	Determining the scope and contents of the prior art.
2.	Ascertaining the differences between the prior art and the claims at issue.
3.	Resolving the level of ordinary skill in the pertinent art.
4.	Considering objective evidence present in the application indicating obviousness or nonobviousness.

5.	Claims 1-25 are rejected under 35 U.S.C. 103(a) as being unpatentable over Montserrat Mora et al. (US 2015/0178988 A1) in view of Amer et al. (US 2019/0304157 A1).
6.	With reference to claim 1, Montserrat Mora teaches A method for improving a three-dimensional (3D) representation of objects using semantic data, (“The method generating said 3D reconstruction model as an articulation model further using semantic information enabling animation in a fully automatic framework.” [0053] “the system is required to be compatible with standard human motion capture data, which implies that the internal skeleton cannot have virtual bones in order to improve the animation results, or at least, realistic human body animation should be obtainable without using them.” [0161]) Montserrat Mora also teaches receiving an input data generated in response to captured video; (“The process includes mesh generation, texture atlas creation, texture mapping, rigging and skinning. The resulting model is able to be animated using a standard animation engine, which allows using it in a wide range of applications, including movies or videogames.” [0089] “A sequence of images is captured from all peripheral cameras. These sequences are synchronized between them. They are added to the training sequences previously stored. Additionally, a sequence of images is captured from all frontal cameras meanwhile a structured light pattern is projected on the face of the RHM. These sequences are synchronized between them.” [0169]) Montserrat Mora further teaches setting at least one parameter for each region in the input data; (“These cameras must be previously calibrated in a common reference frame. This implies to retrieve their intrinsic parameters (focal distance, principal point, lens distortion), which model each camera generating a 3D representation based in part on the at least one parameter and semantic data associated with the input data. (“The global mesh is merged with the local mesh in order to obtain a quality improved global mesh. After registering the 3D mesh with the animation engine, rigging and skinning algorithms are applied. 
Meanwhile, the texture atlas is generated from the subset of global images and the capture room information. This texture atlas is then mapped to the improved and registered 3D mesh. Finally, the rigged and skinned 3D 

Montserrat Mora does not explicitly teach a filming area. This is what Amer teaches (“Animation module 324 may perform functions relating to generating animations or movies from a data structure. Given one or more composition graphs 328, animation module 324 may instantiate a possible movie corresponding to a grounded state within a 3D engine, which may involve raw coordinates, timings, and textures. In some examples, animation module 324 may include interactive three-dimensional animation software. In such an example, animation module 324 may be implemented similar to an actual movie production pipeline, yet may eliminate many of the barriers associated with the creation of an animated film. Animation module 324 may include a library of assets enabling rapid assembly of a scene, and may enable a user (e.g., user 
7.	With reference to claim 2, Montserrat Mora teaches labeling at least a portion of the input data using at least one of: a blendshape process (“The system is trained. A sequence of images is captured from all peripheral cameras. The room is empty during this process. The training sequences are stored in a temporary directory in capture servers. A background statistical model is computed from these frames for each peripheral camera. 2. The real human model or RHM is positioned in the capture room, in a predefined position. 3. A sequence of images is captured from all peripheral cameras. These sequences are synchronized between them. They are added to the training sequences previously stored. Additionally, a sequence of images is captured from all frontal cameras meanwhile a structured light pattern is projected on the face of the RHM. These sequences are synchronized between them. They are stored in a temporary directory present in the capture servers. 4. At this point, all the information necessary to generate the animatable 3D model is have. On the one hand, it can be grabbed the RHM acquisition in an external storage system, in order to capture other RHMs and carry on the 3D model generation later. On the other hand, it can be load a previously grabbed sequence of images from the external storage system into the 
Montserrat Mora does not explicitly teach a deep learning method. This is what Amer teaches (“Machine learning module 109 may represent a system that uses machine learning techniques (e.g., neural networks, deep learning, generative adversarial networks (GANs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), or other artificial intelligence and/or machine learning techniques) to generate data structure 151 based on data of a type corresponding to textual 
8.	With reference to claim 3, Montserrat Mora teaches the at least one parameter is a mesh parameter, and wherein the mesh parameter includes a mesh density of a generated mesh. (“FIG. 11 shows a depth map with its corresponding reference image and the partial mesh recovered from that viewpoint (obtaining this partial mesh from the depth map is trivial). Multiple depth maps should be fused in order to generate a complete 3D model. This requires several images to be captured from different viewing directions. Some multi-view stereo methods which carry out complex depth map fusion have been presented previously. The invention uses VH for a global reconstruction and only critical areas are enhanced using local high accuracy meshes obtained by means of stereo reconstruction methods.” [0126])
9.	With reference to claim 4, Montserrat Mora teaches determining if a region in the labeled input data is labeled as a face; and generating a mesh of the region with a higher mesh density when the region in the input data is determined as a face relative to a region in the input data determined as a non-face. (“FIG. 11 shows a depth map with its corresponding reference image and the partial mesh recovered from that viewpoint (obtaining this partial mesh from the depth map is trivial). Multiple depth maps should be fused in order to generate a complete 3D model. This requires several images to be captured from different viewing directions. Some multi-view stereo methods which carry out complex depth map fusion have been presented 
10.	With reference to claim 5, Montserrat Mora teaches meshing the labeled input data, wherein the meshing further comprises: selecting a mesh process for one or more regions based on their respective labels; and creating a unified mesh by unifying the meshes created for the one or more regions. (“Before starting to move vertices, it is necessary to clean the mesh obtained from the depth map (it will be called "cloud mesh" from now) to avoid the information of different body parts which are not the face, which sometimes include noise and irregularities. To do this, as seen in FIG. 13, an auxiliary mesh will first be created, which is the head section of the original mesh. To determine which triangles belong to the mesh the triangles need to be classified according on how the t-barycenter sees them (i.e. from the front or from the back) using the dot product of the triangle normal and the vector which goes from the t-barycenter to the triangle. Starting with the seed (which, as it was described before, is the closest triangle found in the intersection of the line which joins the camera position and the center of the facial circle, and the body mesh), the face section is a continuous region of triangles seen from the back. As the shape of the head could include some irregular patterns that will not match with the last criteria to determine the head area of triangles (such a pony tail), it is important to use another system to back the invention method up: using the same information of the head situation in the original photograph, a plane is defined which will be used as a guillotine, rejecting possible non-desired 
11.	With reference to claim 6, Montserrat Mora does not explicitly teach the one or more regions are labeled at least as eyes or ears. This is what Amer teaches (“Input processing module 321 uses the information to recognize gestures made by the individual including emblematic, deictic, and iconic gestures. Behavior analytics system 314 uses a head node of the skeleton to extract the Region Of Interest (ROI) of the head/face from the high resolution RGB video. Behavior analytics system 314 outputs information about the ROI to input processing module 321. Input processing module 321 uses the information to track the head pose (orientation) as well as landmarks on the subject's face, and to recognize affective and communicative facial expressions. Input processing module 321 may also use head pose tracking to identify communicative gestures such as head nods and head shakes. Input processing module 321 may further track a subject's iris and eye pupil to determine 3D eye gaze vectors in conjunction with the head pose.” [0137]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Amer into Montserrat Mora, in order to enable an understanding of at least some of the decisions made by the AI system.
12.	With reference to claim 7, Montserrat Mora teaches generating a mesh of a region by applying a human blendshape fitting method to generate a mesh. (“The system is trained. A sequence of images is captured from all peripheral cameras. The room is empty during this process. The training sequences are stored in a temporary directory in capture servers. A background statistical model is computed from these frames for each peripheral camera. 2. The real human model or RHM is positioned in the capture room, in a predefined position. 3. A sequence of images is captured from all peripheral cameras. These sequences are synchronized between them. They are added to the training sequences previously stored. Additionally, a sequence of images is captured from all frontal cameras meanwhile a structured light pattern is projected on the face of the RHM. These sequences are synchronized between them. They are stored in a temporary directory present in the capture servers. 4. At this point, all the information necessary to generate the animatable 3D model is have. On the one hand, it can be grabbed the RHM acquisition in an external storage system, in order to capture other RHMs and carry on the 3D model generation later. On the other hand, it can be load a previously grabbed sequence of images from the external storage system into the capture servers' temporary repositories, to perform a 3D model generation. 5. The sequences of images from the peripheral cameras (global images) are used to perform the foreground segmentation. A subset of synchronized images is chosen by taking one image from each sequence of global images. All global images of the subset correspond to the same time. Then the binary mask depicting the RHM silhouette is computed for the images of this subset. 6. The obtained subset of global masks is used to extract the visual hull of the RHM. A three-dimensional scalar field expressed in 
13.	With reference to claim 8, Montserrat Mora teaches determining if a region in the input data is labeled as a rigid body part based on semantic data; and tracking each region determined to be a rigid body part. (“Once the mesh structure has been recovered, semantic information must be added to the model in order to make its animation possible. Model animation is usually carried out considering joint angle changes as the measures to characterize human pose changing and gross motion. This means that poses can be defined by joint angles. By defining poses and motion in such a way, the body shape variations caused by pose changing and motion will consist of both rigid and non-rigid deformation. Rigid deformation is associated with the orientation and position of segments that connect joints. Non-rigid deformation is related to the changes in shape of soft tissues associated with segments in motion, which, however, excludes local deformation caused by muscle action alone. The most common method for measuring and defining joint angles is using a skeleton model. In the model, the 
14.	With reference to claim 9, Montserrat Mora teaches tracking at least movement, deformation, or other changes in the rigid body part across a time sequence. (“Once the mesh structure has been recovered, semantic information must be added to the model in order to make its animation possible. Model animation is usually carried out considering joint angle changes as the measures to characterize human pose changing and gross motion. This means that poses can be defined by joint angles. By defining poses and motion in such a way, the body shape variations caused by pose changing and motion will consist of both rigid and non-rigid deformation. Rigid deformation is associated with the orientation and position of segments that connect joints. Non-rigid deformation is related to the changes in shape of soft tissues associated with segments in motion, which, however, excludes local deformation caused by muscle action alone. The most common method for measuring and defining joint angles is using a skeleton model. In the model, the human body is divided into multiple segments according to major joints of the body, each segment is represented by a rigid linkage, and an appropriate joint is placed between the two corresponding 
15.	With reference to claim 10, Montserrat Mora teaches determining if a region in the labeled input data is labeled as a non-rigid body part based on semantic data; and tracking each region determined to be a non-rigid body part. (“Once the mesh structure has been recovered, semantic information must be added to the model in order to make its animation possible. Model animation is usually carried out considering 
16.	With reference to claim 11, Montserrat Mora teaches determining at least one property of each identified non-rigid body part to improve mesh creation. (“Once the mesh structure has been recovered, semantic information must be added to the model in order to make its animation possible. Model animation is usually carried out considering joint angle changes as the measures to characterize human pose changing and gross motion. This means that poses can be defined by joint angles. By defining poses and motion in such a way, the body shape variations caused by pose changing 
17.	With reference to claim 12, Montserrat Mora teaches determining a set of compression parameters based at least on importance of each identified region; and applying a compression process on the generated mesh based on the set of the determined compression parameters. (“a local high accuracy reconstruction rig may be composed by two or more cameras (see FIG. 4), relying only in passive methods to find correspondences. The method described in [27] may be used to generate the local high accuracy mesh. Once the depth map has been obtained, every pixel of the reference image can be assigned to a 3D position which defines a vertex in the local mesh (neighbor pixel connections can be assumed). This usually generates 
18.	With reference to claim 13, Montserrat Mora does not explicitly teach A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute the method of claim 1. This is what Amer teaches (“this disclosure describes a computer-readable storage medium comprising instructions that, when executed, configure processing circuitry of a computing system to: receive information about a sequence of events involving a plurality of objects,  ... 
19.	Claim 14 is similar in scope to claim 1, and thus is rejected under similar rationale. Montserrat Mora does not explicitly teach a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system. This is what Amer teaches (“this disclosure describes a computer-readable storage medium comprising instructions that, when executed, configure processing circuitry of a computing system to: receive information about a sequence of events involving a plurality of objects,  ... information sufficient to create an animation illustrating the sequence of events.” [0016] “one or 
20.	Claims 15-25 are similar in scope to claims 2-12, and they are rejected under similar rationale.

Conclusion
21.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michelle Chin whose telephone number is (571)270-
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866)217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800)786-9199 (IN USA OR CANADA) or (571)272-1000.

/MICHELLE CHIN/

Primary Examiner, Art Unit 2619