DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-19 are pending in the present application, with claims 1, 10, and 16-19 being independent.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 28 February 2022 has been considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 1 recites “specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object.” Given the plain and ordinary meaning of the words themselves or when interpreted in light of the corresponding disclosure, the scope of the limitation is unclear. As an initial matter, it is unclear what “an element forming a three-dimensional space” in the claim limitation is referencing (e.g., is it an object, a voxel, a pixel, or an area?). Secondly, it is not clear how the pixel or area corresponds to the element in the area of the object. Is the element a second object near the object? A region near the object? The examiner respectfully requests the applicant clarify the scope of the aforementioned claimed limitation.
Claim 16 recites a substantially similar limitation as to that of claim 1 and is also rejected using substantially similar rationale as to that of claim 1. 
Claims depending thereon do not cure the noted deficiency and are accordingly rejected using substantially similar rationale as to that set forth for the claims from which they depend. 
Claim 3 recites “wherein the three-dimensional shape data corresponding to the object is generated by removing, from the three-dimensional shape data generated based on the obtained image data representing the area of the object and the obtained image data representing the area of the structure, data corresponding to an element for which the specified number of the image capturing apparatuses is less than a threshold value.” Given the plain and ordinary meaning of the words themselves or when interpreted in light of the corresponding disclosure, the scope of the limitation is unclear. For instance, it is unclear how the 3D shape data is generated by removing data corresponding to an element for which the specified number of the image capturing apparatuses is less than a threshold value. Furthermore, it is unclear whether the element of claim 3 is the same as or different from the element set forth in claim 1. The examiner respectfully requests the applicant clarify the scope of the aforementioned limitation.
Claim 12 recites a substantially similar limitation as to that of claim 3 and is also rejected using substantially similar rationale as to that of claim 3. 
Claims depending thereon do not cure the noted deficiency and are accordingly rejected using substantially similar rationale as to that set forth for the claims from which they depend. 
Claims 9 and 15 recite “the element is a point or a voxel forming the three-dimensional space.” It is not immediately clear, given the plain and ordinary meaning of the words themselves or when interpreted in light of the corresponding disclosure, how the element, which forms a three-dimensional space for generating three-dimensional shape data corresponding to the object, could be a single point. The examiner respectfully requests the applicant clarify the scope of the claimed limitation.
Claim 10 recites “specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.” Given the plain and ordinary meaning of the words themselves or when interpreted in light of the corresponding disclosure, the scope of the limitation is unclear. As an initial matter, it is unclear what “an element forming a three-dimensional space” in the claim limitation is referencing (e.g., is it an object, a voxel, a pixel, or an area?). Secondly, it is not clear how the pixel or area corresponds to the element in the area of the structure. Is the element a second object near the structure? A region near the structure? The examiner respectfully requests the applicant clarify the scope of the aforementioned claimed limitation.
Claims 17-19 recite a substantially similar limitation as to that of claim 10 and are also rejected using substantially similar rationale as to that of claim 10. 
Claims depending thereon do not cure the noted deficiency and are accordingly also rejected using substantially similar rationale as to that set forth for the claims from which they depend.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 7-11, and 13-19 are rejected under 35 U.S.C. 103 as being unpatentable over Würmlin et al. (US PG Publication 2009/0315978) in view of Kuhn et al. (“Multi-View Reconstruction of Unknown Objects within a Known Environment,” 2009), further in view of Shin et al. (“Multi-Object Reconstruction from Dynamic Scenes: An Object-Centered Approach,” 2013).
Regarding claim 1, Würmlin teaches a generation device comprising: one or more memories storing instructions (see for instance, paragraphs 107 and 128); and one or more processors executing the instructions (see for instance, paragraphs 107 and 126-128) to:
obtain image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
obtain image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object; and
generate the three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and information of the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object, and thus does not teach “specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object.”
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on its silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumptions that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects. Static objects are racks, tables, etc., whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
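For illustration only, the voxel test summarized above can be sketched as follows. The sketch is not taken from Kuhn: the function name surface_voxels, the per-camera dictionaries, and the representation of a voxel as an integer triple are hypothetical choices made solely to show the logic of testing each candidate voxel against all other cameras.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # assumed voxel index (x, y, z); not from Kuhn


def surface_voxels(
    candidates: Dict[str, Set[Voxel]],
    free: Dict[str, Set[Voxel]],
) -> Set[Voxel]:
    """Keep a candidate voxel only if no other camera marks it as free.

    candidates[cam] holds the potential surface voxels proposed by cam.
    free[cam] holds the voxels that cam observes as free space.
    """
    confirmed: Set[Voxel] = set()
    for cam, voxels in candidates.items():
        for voxel in voxels:
            # Test the voxel in every camera other than the proposing one.
            rejected = any(
                voxel in free_set
                for other, free_set in free.items()
                if other != cam
            )
            if not rejected:
                confirmed.add(voxel)
    return confirmed
```

Under these assumptions, a voxel proposed by one camera is discarded as soon as any other camera sees it as free, which mirrors the sequential test described in the cited passage.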
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras, classifying objects and labeling them such as described by Kuhn was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles. Würmlin with Kuhn also teach specifying, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object (e.g., when the number of cameras is two).
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Würmlin in view of Kuhn teach the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two). In the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
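For illustration only, the observation that not every one of the K objects is visible in all N images can be sketched as a simple visibility table. The sketch is not taken from Shin: the function names and the boolean table visible[k][n] are hypothetical, and how visibility is actually established (e.g., by recognition across images) is outside the sketch.

```python
from typing import List, Sequence


def views_containing(visible: Sequence[Sequence[bool]], k: int) -> List[int]:
    """Return the indices of the images in which object k is visible."""
    return [n for n, seen in enumerate(visible[k]) if seen]


def views_per_object(visible: Sequence[Sequence[bool]]) -> List[int]:
    """Return, for each object, the number of images in which it is visible."""
    return [len(views_containing(visible, k)) for k in range(len(visible))]
```

With two objects and three images, a table such as [[True, True, False], [False, True, False]] yields counts of 2 and 1, respectively.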
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object such as described by Shin was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
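For illustration only, and under the interpretation applied above in which the element is treated as a voxel, the step of specifying the number of image capturing apparatuses for an element can be sketched as follows. The sketch is not taken from the claims or from any cited reference: the function name count_cameras, the use of 3x4 pinhole projection matrices, and the binary object masks are assumptions made solely to show one way such a count could be computed.

```python
import numpy as np


def count_cameras(point, projections, masks):
    """Count the cameras whose object mask contains the pixel that point projects to.

    point       -- (x, y, z) position of the element, e.g., a voxel center (assumed)
    projections -- one 3x4 projection matrix per camera (assumed pinhole model)
    masks       -- one 2D boolean array per camera, True inside the object area
    """
    homogeneous = np.append(np.asarray(point, dtype=float), 1.0)
    count = 0
    for P, mask in zip(projections, masks):
        u, v, w = P @ homogeneous
        if w <= 0:
            continue  # the point lies behind this camera
        col, row = int(round(u / w)), int(round(v / w))
        height, width = mask.shape
        # Count the camera only if the pixel is in bounds and inside the mask.
        if 0 <= row < height and 0 <= col < width and mask[row, col]:
            count += 1
    return count
```

The returned count corresponds to the "information of the specified number" in this sketch; comparing it against a threshold would be a separate step that the sketch does not perform.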
Regarding claim 2, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 1 and further teach wherein the three-dimensional shape data corresponding to the object is generated based on the information of the specified number of the image capturing apparatuses and three-dimensional shape data generated based on the obtained image data representing the area of the object and the obtained image data representing the area of the structure (3D reconstruction of the scene is performed, see for instance, Würmlin paragraphs 4, 52, and 195. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see Würmlin paragraph 170. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see Würmlin paragraphs 195-201. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. 
Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on Kuhn, pages 790 and 791. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 1.
Regarding claim 7, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 1 and further teach wherein the obtained image data representing an area of the object is a first image representing the area of the object (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see Würmlin paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see Würmlin paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see Würmlin paragraph 168), and the specified number of the image capturing apparatuses is the number of image capturing apparatuses corresponding to the first image whose the area of the object includes a pixel or an area corresponding to the element (Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see Kuhn, page 791, paragraph 1. The first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see Shin, section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 1.
Regarding claim 8, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 1 and further teach wherein the obtained image data representing an area of the object is image data of an image representing the area of the object (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see Würmlin paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see Würmlin paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see Würmlin paragraph 168), and the specified number of the image capturing apparatuses is the number of image capturing apparatuses corresponding to the image whose the area of the object includes a pixel or an area corresponding to the element (Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see Kuhn, page 791, paragraph 1. The first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see Shin, section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 1.
Regarding claim 9, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 1 and further teach wherein the element is a point or a voxel forming the three-dimensional space (A voxel-based algorithm of our approach is provided in section 3, see for instance, Kuhn, page 785, last paragraph). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 1.
Regarding claim 10, Würmlin teaches a generation device comprising: one or more memories storing instructions (see for instance, paragraphs 107 and 128); and one or more processors executing the instructions (see for instance, paragraphs 107 and 126-128) to:
obtain image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
obtain image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure; and 
generate three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and information of the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure, and thus does not teach “specify, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.”
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on its silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumptions that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects. Static objects are racks, tables, etc., whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
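For illustration only, the voxel test summarized above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Kuhn's implementation: the label names (FREE, OCCUPIED, OCCLUDED), the function names, the per-camera dictionary lookup, and the choice to treat a voxel with no label in a given camera as not free are all hypothetical and introduced here.

```python
from enum import Enum


class Label(Enum):
    # Hypothetical per-camera voxel labels (assumed names, not from Kuhn)
    FREE = 0
    OCCUPIED = 1
    OCCLUDED = 2


def is_surface_voxel(voxel, camera_labels):
    """Return True if no camera marks the voxel as free.

    camera_labels: one dict per camera mapping voxel -> Label.
    Assumption: a voxel missing from a camera's dict is treated as
    not free (for example, outside that camera's view).
    """
    return all(
        labels.get(voxel, Label.OCCLUDED) is not Label.FREE
        for labels in camera_labels
    )


def surface_voxels(candidates_per_camera, camera_labels):
    """Test every camera's candidate voxels against all cameras."""
    result = set()
    for candidates in candidates_per_camera:
        for voxel in candidates:
            if is_surface_voxel(voxel, camera_labels):
                result.add(voxel)
    return result
```

In this sketch a single camera labeling a candidate as free is enough to reject it, which mirrors the "if no camera marks a voxel as free" condition quoted above.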
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras, classifying objects, and labeling them, such as described by Kuhn, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles.  Würmlin with Kuhn also teach specifying, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Würmlin in view of Kuhn teach the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two). In the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object/structure.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object, such as described by Shin, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
Regarding claim 11, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 10 and further teach wherein the three-dimensional shape data corresponding to the object is generated based on the information of the specified number of the image capturing apparatuses and three-dimensional shape data generated based on the obtained image data representing the area of the object and the obtained image data representing the area of the structure (3D reconstruction of the scene is performed, see for instance, Würmlin paragraphs 4, 52, and 195. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see Würmlin paragraph 170. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see Würmlin paragraphs 195-201. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. 
Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on Kuhn, pages 790 and 791. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 10.
Regarding claim 13, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 10 and further teach wherein the obtained image data representing an area of the structure is image data of an image representing the area of the structure (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see Würmlin paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see Würmlin paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see Würmlin paragraph 168), and the specified number of the image capturing apparatuses is the number of image capturing apparatuses corresponding to the image whose the area of the structure includes a pixel or an area corresponding to the element (Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. 
Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see Kuhn, page 791, paragraph 1. The first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see Shin, section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 10.
Regarding claim 14, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 10 and further teach wherein the obtained image data representing an area of the structure is image data of the image representing the area of the structure (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see Würmlin paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see Würmlin paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see Würmlin paragraph 168), and the specified number of the image capturing apparatuses is the number of image capturing apparatuses corresponding to the image whose the area of the structure includes a pixel or an area corresponding to the element (Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components see Kuhn, page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. 
Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, Kuhn, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see Kuhn, page 791, paragraph 1. The first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see Shin, section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see Shin, section 3.2 Camera Calibration). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 10.
Regarding claim 15, Würmlin in view of Kuhn in further view of Shin teach the generation device according to claim 10 and further teach wherein the element is a point or a voxel forming the three-dimensional space (A voxel-based algorithm of our approach is provided in section 3, see for instance, Kuhn, page 785, last paragraph). The motivation to combine Würmlin, Kuhn, and Shin is the same as that which was set forth in claim 10.
Regarding claim 16, Würmlin teaches a method of generating three-dimensional shape data corresponding to an object (see for instance, Würmlin, abstract), the method comprising:
obtaining image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
obtaining image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object; and 
generating three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object, and thus does not teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object.
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on their silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumption that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects – Static objects are racks, tables, etc., whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras, classifying objects, and labeling them, such as described by Kuhn, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles.  Würmlin with Kuhn also teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object. 
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Würmlin in view of Kuhn teach the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two). In the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object, such as described by Shin, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the object. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
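Purely as an illustrative aid to the limitation addressed in the claim 16 rejection above, counting the cameras whose object area contains the pixel corresponding to an element might be sketched as below. This is a sketch under assumptions and is not drawn from Würmlin, Kuhn, Shin, or the application: the function names, the pinhole projection with 3x4 camera matrices, the treatment of the element as a single 3D point, and the binary object masks stored as nested lists are all hypothetical.

```python
def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P.

    Returns pixel coordinates (u, v), or None if the point is not in
    front of the camera (non-positive homogeneous depth).
    """
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return u / w, v / w


def count_observing_cameras(X, cameras, masks):
    """Count cameras whose mask is set at the pixel that X projects to.

    cameras: list of 3x4 matrices (assumed calibration data).
    masks: list of 2D nested lists, truthy where the object area is.
    """
    count = 0
    for P, mask in zip(cameras, masks):
        uv = project(P, X)
        if uv is None:
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if inside and mask[row][col]:
            count += 1
    return count
```

A camera contributes to the count only when the projected pixel lies inside its image and inside its object mask, so an element hidden or out of view in some cameras yields a smaller count.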
Regarding claim 17, Würmlin teaches a method of generating three-dimensional shape data corresponding to an object (see for instance, Würmlin, abstract), the method comprising: 
obtaining image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
obtaining image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168); 
specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure; and
generating three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure, and thus does not teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on their silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumption that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects – Static objects are racks, tables, etc., whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras, classifying objects, and labeling them, such as described by Kuhn, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles.  Würmlin with Kuhn also teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Würmlin in view of Kuhn teach the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two). In the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object/structure.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object, such as described by Shin, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
Regarding claim 18, Würmlin teaches a non-transitory computer readable storage medium storing a program for causing a computer to execute a method of generating three-dimensional shape data corresponding to an object (see for instance, Würmlin, paragraph 107), the method comprising:
obtaining image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
obtaining image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure; and
generating three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure, and thus does not teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on its silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumptions that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects – Static objects are racks, tables, etc, whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
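For illustration only, the voxel test summarized above can be sketched in Python as follows. This is a minimal sketch and not Kuhn's implementation: the label names ("free", "occupied", "occluded"), the per-camera labeling functions, and the helper names `make_camera` and `surviving_voxels` are all assumptions introduced here for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Voxel = Tuple[int, int, int]
# Hypothetical per-camera classifier: returns "free", "occupied", or "occluded".
CameraLabeler = Callable[[Voxel], str]


def make_camera(labels: Dict[Voxel, str]) -> CameraLabeler:
    """Build a toy camera; voxels it cannot decide on default to "occluded"."""
    return lambda voxel: labels.get(voxel, "occluded")


def surviving_voxels(candidates: Iterable[Voxel],
                     cameras: List[CameraLabeler]) -> List[Voxel]:
    """Keep a candidate voxel unless at least one camera marks it as free."""
    kept = []
    for voxel in candidates:
        if not any(label(voxel) == "free" for label in cameras):
            kept.append(voxel)
    return kept


# Toy data (assumed): camera A sees through voxel (1, 0, 0),
# and has no information about voxel (2, 0, 0).
cam_a = make_camera({(0, 0, 0): "occupied", (1, 0, 0): "free"})
cam_b = make_camera({(0, 0, 0): "occupied", (1, 0, 0): "occupied",
                     (2, 0, 0): "occupied"})

result = surviving_voxels([(0, 0, 0), (1, 0, 0), (2, 0, 0)], [cam_a, cam_b])
print(result)  # [(0, 0, 0), (2, 0, 0)]
```

In this sketch a voxel that one camera cannot observe is retained as long as no other camera marks it as free, which mirrors the "if no camera marks a voxel as free" condition quoted above.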
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras and classifying and labeling objects, such as described by Kuhn, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles. Würmlin in view of Kuhn also teaches specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Although Würmlin in view of Kuhn teaches the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two), in the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object/structure.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object, such as described by Shin, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
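For illustration only, one possible reading of the "specifying ... the number of image capturing apparatuses" limitation is sketched below in Python. This is a hypothetical sketch and does not reproduce the applicant's disclosure or any cited reference: the `Camera` class, the toy orthographic projection functions, the boolean structure masks, and the function name `count_cameras_seeing_structure` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Voxel = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (row, col)


@dataclass
class Camera:
    # Hypothetical projection: maps a voxel to a pixel, or None if out of view.
    project: Callable[[Voxel], Optional[Pixel]]
    # Hypothetical binary mask: True where the image shows the structure.
    structure_mask: List[List[bool]]


def count_cameras_seeing_structure(voxel: Voxel, cameras: List[Camera]) -> int:
    """Count cameras whose structure mask contains the voxel's projected pixel."""
    count = 0
    for cam in cameras:
        pixel = cam.project(voxel)
        if pixel is None:
            continue  # voxel is outside this camera's field of view
        row, col = pixel
        mask = cam.structure_mask
        in_bounds = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if in_bounds and mask[row][col]:
            count += 1
    return count


# Toy orthographic cameras (assumed): front drops z, side drops x,
# and a third camera never sees the voxel.
front = Camera(lambda v: (int(v[1]), int(v[0])), [[False, True], [False, False]])
side = Camera(lambda v: (int(v[1]), int(v[2])), [[False, False], [False, False]])
blind = Camera(lambda v: None, [[True, True], [True, True]])

n = count_cameras_seeing_structure((1.0, 0.0, 1.0), [front, side, blind])
print(n)  # 1
```

How the resulting count would then be used in generating the three-dimensional shape data is not addressed by this sketch.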
Regarding claim 19, Würmlin teaches a non-transitory computer readable storage medium storing a program for causing a computer to execute a method of generating three-dimensional shape data corresponding to an object (see for instance, Würmlin, paragraph 107), the method comprising:
obtaining image data representing an area of the object within a plurality of captured images obtained by a plurality of image capturing apparatuses that perform image capturing from a plurality of image capturing directions (The view from these cameras allows tracking of all objects, as long as they do not leave the field, see paragraph 54. The object identification method associates, for each visible object in each video stream, the object’s 2D position and shape in the color texture data with a real object (e.g., players, goalkeepers, referees, ball, etc) based on the camera calibration data, the information on the real-world objects contained in a resource data module and possibly also the extrapolated 3D object position and the 2D position and shape for essentially all objects in all frames of all cameras, see paragraph 147. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
obtaining image data representing an area of a structure that occludes the object at a time of image capturing from at least one image capturing direction of the plurality of image capturing directions (The situation can be seen in the view from one camera with both objects colliding in 2D, whereas the same situation from another camera view shows no collision of the objects…1. The tracking method can use the information that two or more objects collide in a certain view…2. The tracking method can keep track of the objects after a collision since it knows where the objects are located or are expected to be in 3D space, see paragraphs 143-145. Fig. 5 schematically shows a 2D object position and size structure, in a bounding box and also depicts the difference between foreground and background and the alpha mask resulting from the cutout or segmentation method, see paragraph 169. If the bounding box does not contain the entire object or intersects the bounding box, the bounding box can be enlarged by a certain size, see paragraph 170. This method calculates a segmentation or cutout of the color texture data inside the area defined by the object’s position and size between foreground (object) pixels and background, so-called alpha mask, see paragraph 168);
specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure; and 
generating three-dimensional shape data corresponding to the object based on the obtained image data representing the area of the object, the obtained image data representing the area of the structure, and the specified number of the image capturing apparatuses (3D reconstruction of the scene is performed, see for instance, paragraphs 4, 52, and 195. Objects are rendered from a virtual view using a particular 3D representation of the scene and using the object textures and either fixed or view dependent alpha values…rendering of geometry of the background by alpha blending and depth buffering with the already rendered objects and by blending one or more hole-filled background textures…, see paragraphs 195-201).
Würmlin does not teach specifying/determining the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure, and thus does not teach specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.
In the same art of multi-view reconstruction, Kuhn teaches that the reconstruction of objects based on its silhouettes in multiple cameras is known as surface from silhouette or inferred visual hull…almost all approaches have the assumptions that the object(s) to reconstruct is not occluded by static obstacles, see page 784, paragraph 2. We propose a new approach for treating occlusions in a general and more accurate way by additionally utilizing the geometrical information of the known environment, see page 785, paragraph 2. A surveyed scene contains static and dynamic objects – Static objects are racks, tables, etc, whose geometry, position and appearance are known and do not change over time (apart from possibly occurring shadows and from illumination changes caused by the dynamic obstacles), see page 786, section 2.1 Object Types. Dynamic objects are robots, conveyor belts, humans…This group must be divided into two subgroups: Subgroup 1 contains known dynamic objects, with changing geometry, position, and appearance (e.g., robots and conveyor belts); Subgroup 2 contains dynamic objects with unknown changing geometry, position, and appearance (e.g., humans), see page 786, section 2.1 Object Types. Using several cameras with different perspectives onto the surveyed scene, each camera that is used provides a different occlusion and visibility situation, as discovered in the previous section that now has to be merged…all different labeled regions of all cameras are intersected among each other…the resulting intersections are grouped into connected components, see page 788, section 2.3 Simultaneous Use of Several Cameras and figs. 2 and 3. Plausibility checks are utilized to revise reconstruction artifacts that do not contain an object, see for instance, page 785, paragraph 2 and page 789, section 2.4, Plausibility Checks. The surface voxel determination is described in section 3.1 on pages 790 and 791.
In summary, each camera provides lists of potential surface voxels due to the segmentation – these voxels are sequentially tested in all other cameras – if no camera marks a voxel as free, it actually is a surface voxel, see page 791, paragraph 1.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin and Kuhn in front of them before the effective filing date of the claimed invention to incorporate occlusion handling as taught by Kuhn into Würmlin's 3D rendering system, as reconstructing objects using multiple cameras and classifying and labeling objects, such as described by Kuhn, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin.
The modification of Würmlin with Kuhn would have explicitly allowed the plurality of images to include both dynamic and static obstacles. Würmlin in view of Kuhn also teaches specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure.
The motivation for combining Würmlin with Kuhn would have been to improve occlusion and visibility handling, see for instance, Kuhn, abstract and page 785, paragraphs 1 and 2 and fig. 1.
Although Würmlin in view of Kuhn teaches the broadest reasonable interpretation of the claimed limitation (e.g., when the number of cameras is two), in the interest of compact prosecution, Shin is being brought in to teach determining which views contain an image of an object/structure.
In the same art of object reconstruction, Shin teaches that the first step of our algorithm is object recognition across images…we start by determining the number and locations of objects in images, see section 3.1 Recognition. “We compute the camera projection matrices of each individual object using SfM technique. The point correspondences are optimized by the bundle adjustment optimization and the SfM yields both sparsely reconstructed set of 3D point coordinates X with camera matrices P. Assuming we are given K objects in N images, we can have following set of camera matrices and points…Note that not all objects have to be visible in all images; thus, if an object is missing in some images, the corresponding camera matrices and 3D points are not available”, see section 3.2 Camera Calibration.
It would have been obvious to one of ordinary skill in the art having the teachings of Würmlin, Kuhn, and Shin in front of them before the effective filing date of the claimed invention to incorporate object visibility in dynamic scenes as taught by Shin into Würmlin's 3D rendering system, as determining which camera views visualize a given object, such as described by Shin, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Würmlin and Kuhn.
The modification of Würmlin and Kuhn with Shin would have explicitly allowed specifying, for an element forming a three-dimensional space for generating the three-dimensional shape data corresponding to the object, the number of image capturing apparatuses that obtain a captured image including a pixel or an area corresponding to the element in the area of the structure. 
The motivation for combining Würmlin and Kuhn with Shin would have been to improve dynamic scene reconstruction and visibility handling, see for instance, Shin, abstract.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J COBB whose telephone number is (571)270-3875. The examiner can normally be reached Monday - Friday, 11am - 7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J COBB/            Primary Examiner, Art Unit 2613