DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-18 and 20-30 are pending under this Office action.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/11/2022 has been entered.

Response to Amendment
Applicant's arguments filed on February 11, 2022, have been fully considered.
Applicant argues that the independent claims 1 and 12-13 are amended with new limitations of "a second generation device configured to generate a part image corresponding to a part of an image of an imaging region captured by a second imaging apparatus different from the first imaging apparatus, configured to transmit the part image, and configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated" (emphasis added). Applicant argues that the prior arts 
The Examiner replies that the newly added limitations may overcome the current rejection of independent claims 1 and 12-13. However, the fourth reference of record, Handa et al. (US 20190356906 A1), teaches the limitation "configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image" (See Handa: Figs. 34-35, and [0371], "Next, the background extraction unit 05004 reads a portion of the background image 05002 and transmits the portion of the background image 05002 to the transmission unit 06120. In a case where a plurality of cameras 112 are installed so that the entire field may be subjected to imaging without a blind angle when an image of a game, such as a soccer game, is to be captured in the stadium or the like, large portions of background information of the cameras 112 overlap with one another. Since the background information is large, the images may be transmitted after deleting the overlapping portions in terms of the transmission band restriction so that a transmission amount may be reduced. A flow of this process will be described with reference to FIG. 35D. In step S05010, the background extraction unit 05004 sets a center portion of the background image as denoted by a partial region 3401 surrounded by a dotted line in FIG. 34C, for example. Specifically, the partial region 3401 indicates a background region which is transmitted by the camera 112 itself and other portions in the background region are transmitted by the others of the cameras 112. In step S05011, the background extraction unit 05004 reads the set partial region 3401 in the background image. In step S05012, the background extraction unit 05004 outputs the partial region 3401 to the transmission unit 06120. The output background images are collected in the image computing server 200 and used as texture of a background model.
Positions of extraction of the background images 05002 in the camera adapters 120 are set in accordance with predetermined parameter values so that lack of texture information for a background model does not occur. Normally, requisite minimum of the extraction regions is set so that an amount of transmission data is reduced. Accordingly, a large transmission amount of background information may be efficiently reduced and the system may cope with high resolution"). Note that the portion of the background image is transmitted for the virtual viewpoint image generation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18, 10-22, 24-25, 27-28, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Tamir et al. (US 20080192116 A1) in view of Wurmlin et al. (US 20090315978 A1), further in view of Mizuno (US 20180089842 A1) and Handa et al. (US 20190356906 A1).
Regarding claim 1, Tamir teaches that an image processing system (See Tamir: Figs. 6A-B, and [0078], "FIG. 6a shows an automatic players/ball tracking and motion capture system 600 based on multiple (typically 2-3) pan/tilt/zoom robotic cameras 604 a ... n for automatic individual player identification. FIG. 6b shows a flow chart of a method of use. The system in FIG. 6a comprises in addition to the elements of system 200 an Identification Processing Unit (IDPU) 602 connected through a preferably Ethernet connection to system server 206 and operative to receive video streams from multiple robotic cameras 604") comprising:
a first generation device configured to generate a foreground image including a foreground object based on an image of an imaging region captured by a first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "The calculated background is subtracted from the video frame by IPU 204 to create a foreground image in step 304. Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods. The foreground/background separation step is followed by thresholding, binarization, morphological noise cleaning processes and connection analysis (connecting isolated pixels in the generated foreground image to clusters) to specify "blobs"  representing foreground objects") and configured to transmit the foreground image (See Tamir: Figs. 2 and 4, and [0072], "In step 408, the team colors and/or uniform textures are analyzed by the IPU based on the locations of each segmented object and their count. For example, the goalkeeper of team 1 is specified by (a) being a single object and (b) a location near goal 1. The color and intensity histograms, as well as their vertical distributions, are then stored into the IPU to be later used for the assignment step of blobs to teams"), the foreground image being used for generating a virtual viewpoint image;
a second generation device configured to generate a part image corresponding to a part of an image of an imaging region captured by a second imaging apparatus different from the first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "In one embodiment, system 200 is used to locate and track players in a team and assign each object to a particular team in real-time. The assignment is done without using any personal identification (ID). The process follows the steps shown in FIG. 3. The dynamic background of the playing field is calculated by IPU 204 in step 302. The dynamic background image is required in view of frequent lighting changes expected in the sports arena. It is achieved by means of median filter processing (or other appropriate methods) used to avoid the inclusion of moving objects in the background image being generated"), configured to transmit the part image (See Tamir: Fig. 2, and [0069], "Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods"), and configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated; and
an image generation device configured to generate the virtual viewpoint image according to a virtual viewpoint based on the foreground image generated by the first generation device and the part image generated by the second generation device (See Tamir: Figs. 7, and [0085], "In step 742, a dynamic graphical environment may be created at the user's computer. This environment is composed of 3D specific player models having temporal behaviors selected in step 740, composed onto a 3D graphical model of the stadium or onto the real playing field separated in step 732. In step 744, the user may select a static or dynamic viewpoint to watch the play. For example, he/she can decide that they want to watch the entire match from the eyes of a particular player. The generated 3D environment is then dynamically rendered in step 746 to display the event from the chosen viewpoint. This process is repeated for every video frame, leading to  a generation of a 3D graphical representation of the real match in real time").
However, Tamir fails to explicitly disclose that configured to transmit the foreground image, the foreground image being used for generating a virtual viewpoint image; configured to transmit the part image, and configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated; and an image generation device configured to generate the virtual viewpoint image.
However, Wurmlin teaches that an image generation device configured to generate the virtual viewpoint image (See Wurmlin: Figs. 7 and 12, and [0077-0080], "In a preferred variant of the invention, a synthesized view is provided which shows the scene from a virtual viewpoint that is distinct from the positions of the real cameras. This includes the steps of: providing camera parameters of a virtual camera; determining a background image as seen by the virtual camera; determining a projection of each of the objects into the virtual camera and superimposing it on the background image").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Tamir to have an image generation device configured to generate the virtual viewpoint image as taught by Wurmlin in order to improve the final quality of the object rendering (See Wurmlin: [0092], "In another preferred variant of the invention, the billboards are augmented with a height field defining the coarse geometry of the object. That is, height fields are determined from two or more real camera views by for example shape-from-silhouettes or shape-from-stereo methods, as described e.g. in "Multiple View Geometry in Computer Vision", Richard Hartley and Andrew Zisserman, Cambridge University Press, 2000. These height fields are then preferably used to improve the final quality of the object rendering. The billboards can also be augmented using displacement-maps representing finer details of the object geometry. The latter is useful for faster rendering"). Tamir teaches a method and system that may track and identify various acting entities and capture the full motion of these entities in a sports event by separating the input images into background images and foreground images, recognizing the objects in the foreground images, and providing the "on air" camera's field of view to the users, while Wurmlin teaches a system and method that may generate a 3D representation of the dynamically changing 3D scene, track the objects in 3D position, and present a virtual viewpoint image to users by processing two or more input video streams. Therefore, it would have been obvious to one of ordinary skill in the art to modify Tamir by Wurmlin to track the objects in 3D environments and present virtual viewpoint videos to users. The motivation to modify Tamir by Wurmlin is "Use of known technique to improve similar devices (methods, or products) in the same way".
However, Tamir, modified by Wurmlin, fails to explicitly disclose that configured to transmit the foreground image, the foreground image being used for generating a virtual viewpoint image; and configured to transmit the part image, and configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated.
However, Mizuno teaches that configured to transmit the foreground image, the foreground image being used for generating a virtual viewpoint image (See Mizuno: Figs. 1-2, and [0029], "The database also holds an image of an object (a specific object) such as a player playing a game, and this image is held as a foreground image. The foreground image can be generated, by detecting an object from an image captured by the imaging apparatus 100 and separating a region representing this object"; and [0027], "The imaging apparatuses 100 are each, for example, a digital camera, and simultaneously perform image capturing based on a synchronization signal from an external synchronization apparatus (not illustrated). The image captured by each of the imaging apparatuses 100 is transmitted to an image generation apparatus 200, via a communication cable such as a local area network (LAN) cable. The communication cable is described using the LAN cable as an example, but may be a video transmission cable such as a DisplayPort cable and a High Definition Multimedia Interface (HDMI, registered trademark) cable. Images used in the present exemplary embodiment may each be an image captured using a still-image capturing function of the imaging apparatus 100, or an image captured using a moving-image capturing function of the imaging apparatus 100. The images will each be expressed below merely as an image or a captured image, without making a distinction in terms of whether the image is a still or moving image"); and
configured to transmit the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated (See Mizuno: Figs. 1-2, and [0029], "The image generation apparatus 200 is, for example, a server apparatus. The image generation apparatus 200 is an example of an image processing apparatus including a database function and an image processing function. A database of the image generation apparatus 200 holds beforehand a captured image of the sports stadium in a state where no object is present, such as a state before start of a game. This captured image is held as a background image"; and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Tamir to have configured to transmit the foreground image, the foreground image being used for generating a virtual viewpoint image; and configured to transmit the part image, the part image being used for determining a color of a background region in the virtual viewpoint image to be generated as taught by Mizuno in order to move the virtual viewpoint toward a player supported by the user, while performing less complicated user operation (See Mizuno: Fig. 5, and [0069], "This combined object is semitransparent and thus can be recognized as an object not appearing in the first virtual viewpoint image. In addition, the orientation and the shape of the body of this combined object are displayed in an as-is state. Therefore, the user can recognize the status of the object in a simple and intuitive manner. Accordingly, the user can move the virtual viewpoint toward a desired object, without performing complicated user operation"). Tamir teaches a method and system that may track and identify various acting entities and capture the full motion of these entities in a sports event by separating the input images into background images and foreground images, recognizing the objects in the foreground images, and providing the "on air" camera's field of view to the users, and Mizuno teaches a system and method that may generate the virtual viewpoint image by combining the foreground image and background image with object detection within the foreground image so that users may focus on the specific object using less complicated user operations. Therefore, it would have been obvious to one of ordinary skill in the art to modify Tamir by Mizuno to track the objects in 3D environments within the foreground image and present virtual viewpoint videos to users. The motivation to modify Tamir by Mizuno is "Use of known technique to improve similar devices (methods, or products) in the same way".
However, Tamir, modified by Wurmlin and Mizuno, fails to explicitly disclose that configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image.
However, Handa teaches that configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Handa: Figs. 34-35, and [0371], "Next, the background extraction unit 05004 reads a portion of the background image 05002 and transmits the portion of the background image 05002 to the transmission unit 06120. In a case where a plurality of cameras 112 are installed so that the entire field may be subjected to imaging without a blind angle when an image of a game, such as a soccer game, is to be captured in the stadium or the like, large portions of background information of the cameras 112 overlap with one another. Since the background information is large, the images may be transmitted after deleting the overlapping portions in terms of the transmission band restriction so that a transmission amount may be reduced. A flow of this process will be described with reference to FIG. 35D. In step S05010, the background extraction unit 05004 sets a center portion of the background image as denoted by a partial region 3401 surrounded by a dotted line in FIG. 34C, for example. Specifically, the partial region 3401 indicates a background region which is transmitted by the camera 112 itself and other portions in the background region are transmitted by the others of the cameras 112. In step S05011, the background extraction unit 05004 reads the set partial region 3401 in the background image. In step S05012, the background extraction unit 05004 outputs the partial region 3401 to the transmission unit 06120. The output background images are collected in the image computing server 200 and used as texture of a background model. Positions of extraction of the background images 05002 in the camera adapters 120 are set in accordance with predetermined parameter values so that lack of texture information for a background model does not occur. 
Normally, requisite minimum of the extraction regions is set so that an amount of transmission data is reduced. Accordingly, a large transmission amount of background information may be efficiently reduced and the system may cope with high resolution").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Tamir to have configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image as taught by Handa in order to reduce the transmission amount (See Handa: Figs. 35A-E, and [0371], "Since the background information is large, the images may be transmitted after deleting the overlapping portions in terms of the transmission band restriction so that a transmission amount may be reduced"). Tamir teaches a method and system that may track and identify various acting entities and capture the full motion of these entities in a sports event by separating the input images into background images and foreground images, recognizing the objects in the foreground images, and providing the "on air" camera's field of view to the users, and Handa teaches a system and method that may generate the virtual viewpoint image by combining the foreground image and part of the background images. Therefore, it would have been obvious to one of ordinary skill in the art to modify Tamir by Handa to generate a virtual viewpoint image with parts of the background image. The motivation to modify Tamir by Handa is "Use of known technique to improve similar devices (methods, or products) in the same way".
Regarding claim 2, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir teaches that the image processing system according to claim 1,
wherein a plurality of imaging apparatuses included in a first imaging apparatus group which contains the first imaging apparatus and does not contain the second imaging apparatus are set up so as to be able to capture a region in the imaging region where the foreground object is placed, from a plurality of directions (See Tamir: Figs. 6A-B, and [0078], "FIG. 6a shows an automatic players/ball tracking and motion capture system 600 based on multiple (typically 2-3) pan/tilt/zoom robotic cameras 604 a ... n for automatic individual player identification. FIG. 6b shows a flow chart of a method of use. The system in FIG. 6a comprises in addition to the elements of system 200 an Identification Processing Unit (IDPU) 602 connected through a preferably Ethernet connection to system server 206 and operative to receive video streams from multiple robotic cameras 604"), and
wherein one or more imaging apparatuses included in a second imaging apparatus group which contains the second imaging apparatus and does not contain the first imaging apparatus are set up so as to be able to capture a region that is a background, in the imaging region (See Tamir: Fig. 8, and [0087], "FIG. 8 shows an embodiment of a system 800 of the present invention used to generate a "virtual camera flight"-type effect (very similar to the visual effects shown in the movie "The Matrix") for a sports event. The effect includes generation of a "virtual flight clip" (VFC). System 800 comprises a plurality of high-resolution fixed cameras 802a-n arranged in groups around a sports arena 804. Each group includes at least one camera. All cameras are connected to a high resolution video recorder 806. The cameras can capture any event in a game on the playing field from multiple directions in a very high spatial resolution (~1 cm). All video outputs of all the cameras are continuously recorded on recorder 806. A VFC processor 808 is then used to pick selective recorded "real" frames of various cameras, create intermediate synthesized frames, arrange all real and synthesized frames in a correct order and generate the virtual flight clip intended to mimic the effect in "The Matrix" movie as an instant replay in sports events. The new video clip is composed of the real frames taken from the neighboring cameras (either simultaneously, if we "freeze" the action, or at progressing time periods when we let the action move slowly) as well as many synthesized (interpolated) frames inserted between the real ones").
Regarding claim 3, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir teaches that the image processing system according to claim 1, wherein an imaging range of the second imaging apparatus is wider than an imaging range of the first imaging apparatus (See Tamir: Figs. 5A-B, and [0076], "FIG. 5b shows a flow chart of a method for individual player identification implemented by sub-system 505, using a manual ID provided by the operator with the aid of the robotic camera. The tracking system provides an alert that a tracked player is either "lost" (i.e. the player is not detected by any camera) or that his ID certainty is low in step 520. The latter may occur e.g. if the player is detected but his ID is in question due to a collision between two players. The robotic camera automatically locks on the predicted location of this player (i.e. the location where the player was supposed to be based on his motion history) and zooms in to provide a high magnification video stream in step 522. The operator identifies the "lost" player using the robotic camera's video stream (displayed on a monitor) and indicates the player's identity to the system in step 524. As a result, the system now knows the player's ID and can continue the accumulation of personal statistics for this player as well as performance of various related functions". Note that a zoomed-in robotic camera has a narrow field of view).
Regarding claim 4, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Wurmlin teaches that the image processing system according to claim 1, wherein the imaging region captured by the second imaging apparatus comprises a stadium containing at least a field and spectator seats (See Wurmlin: Figs. 11-12, and [0056], "In a preferred variant of the invention, the objects are categorised as belonging to one of at least two categories. The categories preferably are based on a statistical model such as a Gaussian mixture model and including at least two of first team, second team, first team goalkeeper, second team goalkeeper, ball and referee. The parameters incorporated by the statistical model preferably are the colour(s) of the objects. It is e.g. known that Team A is dressed with a first set of colours, Team B in a second set, the goalkeepers of the teams have different colours than both teams, and the referee is predominantly black or another color, and the background green, white and a variety of other colours (colour of grass, markings, goal posts and spectators). Thus, the image is segmented not only by separating objects from background, but the objects are classified into different sets. The statistical model is preferably generated from a still image from one camera, and then applied to the video streams of all cameras. The statistical model is generated by, for each category of objects, the user moving, by means of a pointing device, a reference mark along a path over a variety of points that belong to said category. The colours of the points on said path form a sample representing said category in the generation of the statistical model"; and [0025], "If required, they shall be referred to as "background objects". Spectators in the environment, although in motion, are not considered to be "moving objects" for the purpose of this application").
Regarding claim 5, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 4 as outlined above. Further, Tamir teaches that the image processing system according to claim 4, wherein a plurality of imaging apparatuses included in a first imaging apparatus group which contains the first imaging apparatus and does not contain the second imaging apparatus is set up so as to be able to image the foreground object in the field from a plurality of directions (See Tamir: Figs. 6A-B, and [0067], "In a first embodiment used for player assignment to teams and generation of a schematic template, cameras 202 are fixed cameras deployed together at a single physical location ("single location deployment") relative to the sports arena such that together they view the entire arena. Each camera covers one section of the playing field. Each covered section may be defined as the camera's field of view. The fields of view of any two cameras may overlap to some degree. In a second embodiment, the cameras are deployed in at least two  different locations ("multiple  location deployment")  so that each point in the sports arena is covered by at least one camera from each location. This allows calculation of the 3D locations of objects that are not confined to the flat playing field (like the ball in a soccer match) by means of triangulation. Preferably, in this second embodiment, the players are individually identified by an operator with the aid of an additional remotely controlled pan/tilt/zoom camera ("robotic camera"). The robotic camera is automatically aimed to the predicted location of a player "lost" by the system (i.e. that the system cannot identify any more) and provides a high magnification view of the player to the operator. In a third embodiment, robotic cameras are located in multiple locations (in addition to the fixed cameras that are used for objects tracking and motion capture). 
The robotic cameras are used to automatically lock on a "lost player", to zoom in and to provide high magnification views of the player from multiple directions").
Regarding claim 6, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 4 as outlined above. Further, Tamir and Wurmlin teach that the image processing system according to claim 4, wherein an imaging apparatus for imaging the field and an imaging apparatus for imaging the spectator seats are included in the second imaging apparatus group which contains the second imaging apparatus and does not contain the first imaging apparatus (See Tamir: Fig. 1, and [0029], "According to the present invention there is provided a system for real-time object localization, tracking and personal identification of players in a sports event comprising a plurality of cameras positioned at multiple locations relative to a sports playing field and operative to capture video of the playing field including objects located therein") and an imaging apparatus for imaging the spectator seats are included in the second imaging apparatus group (See Wurmlin: Figs. 7-9, and [0025], "Other key image elements are the background, which is essentially stationary, and which on the one hand may comprise the pitch or playing field, characteristic features such as lines and other marks on the pitch, walls and an environment (e.g. stadium) surrounding the pitch. If required, they shall be referred to as "background objects". Spectators in the environment, although in motion, are not considered to be "moving objects" for the purpose of this application").
Regarding claim 7, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 6 as outlined above. Further, Wurmlin teaches that the image processing system according to claim 6, wherein a plurality of part images includes a part image of the field that is generated based on the image captured by the imaging apparatus for imaging the field and a part image of the spectator seats that is generated based on the image captured by the imaging apparatus for imaging the spectator seats (See Wurmlin: Fig. 11, and [0070], "In a further preferred embodiment of the invention, information about the location of landmarks is used to guide the inpainting. Landmarks are characteristic, immobile background features, typically linearly extended features such as straight or circular lines on the playing field, whose nominal location is known. This approach preferably incorporates knowledge from a field model including the location and orientation of the lines on the playing field. Preferably, this is combined with the 3D information about the location of patches along the landmark"; and [0025], "If required, they shall be referred to as "background objects". Spectators in the environment, although in motion, are not considered to be "moving objects" for the purpose of this application").
Regarding claim 8, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir teaches that the image processing system according to claim 1, 
wherein the first generation device is connected to the first imaging apparatus (See Tamir: Figs. 5A-B, and [0076], "FIG. 5b shows a flow chart of a method for individual player identification implemented by sub-system 505, using a manual ID provided by the operator with the aid of the robotic camera. The tracking system provides an alert that a tracked player is either "lost" (i.e. the player is not detected by any camera) or that his ID certainty is low in step 520. The latter may occur e.g. if the player is detected but his ID is in question due to a collision between two players. The robotic camera automatically locks on the predicted location of this player (i.e. the location where the player was supposed to be based on his motion history) and zooms in to provide a high magnification video stream in step 522. The operator identifies the "lost" player using the robotic camera's video stream (displayed on a monitor) and indicates the player's identity to the system in step 524. As a result, the system now knows the player's ID and can continue the accumulation of personal statistics for this player as well as performance of various related functions"; and [0065], "IPU 204 communicates through an Ethernet or similar local area network (LAN) with a central server 206, which is operative to make "system level" decisions where information from more than a single camera is required, like decision on a "lost player", 3D localization and tracking, object history considerations, etc.; with a graphical overlay server 208 which is operative to generate a graphical display such as a top view of the playing field with player icons (also referred to herein as a "schematic template"); with a team/player statistics server 210 which is operative to calculate team or player statistical functions like speed profiles, or accumulated distances based on object location information; and with a plurality of other applications servers 212 which are operative to perform other applications as listed in the Summary below"), and
wherein the second generation device is connected to the second imaging apparatus (See Tamir: Figs. 2-3, and [0066], "An output of graphical overlay server 208 feeds a video signal to at least one broadcast station and is displayed on viewers' TV sets. Outputs of team/player statistics server 210 are fed to a web site or to a broadcast station". Background images are sent to the server through Ethernet, and this may be mapped to the claimed transmitting).
Regarding claim 9, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 8 as outlined above. Further, Tamir teaches that the image processing system according to claim 8, further comprising a foreground processing server configured to
generate shape data representing a shape of the foreground object based on a plurality of foreground images transmitted from a plurality of first generation devices respectively connected to a plurality of imaging apparatuses included in a first imaging apparatus group which contains the first imaging apparatus and does not contain the second imaging apparatus (See Tamir: Figs. 6A-B, and [0079], "In use, as shown in FIG. 6b, the method starts with step 620, which is essentially identical with step 520 above. Step 622 is similar to step 522, except that multiple robotic cameras (typically 2-3) are used instead of a single one. In step 624, the multiple video streams are fed into IDPU 602 and each stream is processed to identify a player by automatically recognizing his shirt's number or another unique pattern on his outfit. The assumption is that the number or unique pattern is exposed by at least one of the video streams, preferably originating from different viewpoints. The recognized player's ID is then conveyed to the system server (206) in step 626"), and
wherein the image generation device is configured to generate the virtual viewpoint image based on the generated shape data (See Tamir: Figs. 7A-B, and [0085], "In step 742, a dynamic graphical environment may be created at the user's computer. This environment is composed of 3D specific player models having temporal behaviors selected in step 740, composed onto a 3D graphical model of the stadium or onto the real playing field separated in step 732. In step 744, the user may select a static or dynamic viewpoint to watch the play. For example, he/she can decide that they want to watch the entire match from the eyes of a particular player. The generated 3D environment is then dynamically rendered in step 746 to display the event from the chosen viewpoint. This process is repeated for every video frame, leading to a generation of a 3D graphical representation of the real match in real time"; and Figs. 8-9, and [0089], "The process is schematically described in FIG. 9. Three symbolic representations of recorded frame sequences of 3 consecutive cameras, CAM.sub.i, CAM.sub.i+1 and CAM.sub.i+2 are shown as 902, 904 and 906, respectively. The VFC processor first receives a production requirement as to the temporal dynamics with which the play event is to be replayed. The VFC processor then calculates the identity of real frames that should be picked from consecutive real cameras (frames j, k, and m from cameras i, i+1 and i+2 respectively in this example) to create the sequences of intermediate synthesized frames, 908 and 910 respectively, to generate the virtual camera flight clip symbolically represented as 920").
Regarding claim 10, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 8 as outlined above. Further, Tamir teaches that the image processing system according to claim 8, further comprising a background processing server
to generate data for coloring the background region based on a plurality of part images transmitted from a plurality of second generation devices respectively connected to a plurality of imaging apparatuses included in a second imaging apparatus group which contains the second imaging apparatus and does not contain the first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "In one embodiment, system 200 is used to locate and track players in a team and assign each object to a particular team in real-time. The assignment is done without using any personal identification (ID). The process follows the steps shown in FIG. 3. The dynamic background of the playing field is calculated by IPU 204 in step 302. The dynamic background image is required in view of frequent lighting changes expected in the sports arena. It is achieved by means of median filter processing (or other appropriate methods) used to avoid the inclusion of moving objects in the background image being generated. The calculated background is subtracted from the video frame by IPU 204 to create a foreground image in step 304. Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods. The foreground/background separation step is followed by thresholding, binarization, morphological noise cleaning processes and connection analysis (connecting isolated pixels in the generated foreground image to clusters) to specify "blobs" representing foreground objects. This is performed by IPU 204 in step 306. Each segmented blob is analyzed in step 308 by IPU 204 to assign the respective object to an identity group. 
Exemplarily, in a soccer match there are 6 identity groups--first team, second team, referees, ball, first goalkeeper, second goalkeeper. The blob analysis is implemented by correlating either the vertical color and/or intensity profiles or just the blob's color content (preferably all attributes) with pre-defined templates representing the various identity teams. Another type of blob analysis is the assignment of a given blob to other blobs in previous frames and to blobs identified in neighboring cameras, using methods like block matching and optical flow. This analysis is especially needed in cases of players' collisions and/or occlusions when a "joint blob" of two or more players needs to be segmented into its "components", a.k.a. the individual players. The last step in the blob analysis is the determination of the object's location in the camera's field of view. This is done is step 310"), and
wherein the image generation device is configured to generate the virtual viewpoint image based on the generated data to the background region (See Tamir: Figs. 7A-B, and [0085], "In step 742, a dynamic graphical environment may be created at the user's computer. This environment is composed of 3D specific player models having temporal behaviors selected in step 740, composed onto a 3D graphical model of the stadium or onto the real playing field separated in step 732. In step 744, the user may select a static or dynamic viewpoint to watch the play. For example, he/she can decide that they want to watch the entire match from the eyes of a particular player. The generated 3D environment is then dynamically rendered in step 746 to display the event from the chosen viewpoint. This process is repeated for every video frame, leading to a generation of a 3D graphical representation of the real match in real time"; and Figs. 8-9, and [0089], "The process is schematically described in FIG. 9. Three symbolic representations of recorded frame sequences of 3 consecutive cameras, CAM.sub.i, CAM.sub.i+1 and CAM.sub.i+2 are shown as 902, 904 and 906, respectively. The VFC processor first receives a production requirement as to the temporal dynamics with which the play event is to be replayed. The VFC processor then calculates the identity of real frames that should be picked from consecutive real cameras (frames j, k, and m from cameras i, i+1 and i+2 respectively in this example) to create the sequences of intermediate synthesized frames, 908 and 910 respectively, to generate the virtual camera flight clip symbolically represented as 920").
Regarding claim 11, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir teaches that the image processing system according to claim 1, wherein the foreground object includes at least any of a player, a ball, and a referee (See Tamir: Fig. 1, and [0064], "The following description is focused on soccer as an exemplary sports event. FIG. 1 shows various entities (also referred to as "objects") that appear in an exemplary soccer game: home and visitor (or "first and second" or "A and B") goalkeepers and players, one or more referees and the ball. The teams are separated and identifiable on the basis of their outfits (also referred to herein as "jerseys" or "shirts")").
Regarding claim 12, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir, Wurmlin, Mizuno, and Handa teach that an image processing method (See Tamir: Figs. 6A-B, and [0078], "FIG. 6a shows an automatic players/ball tracking and motion capture system 600 based on multiple (typically 2-3) pan/tilt/zoom robotic cameras 604 a ... n for automatic individual player identification. FIG. 6b shows a flow chart of a method of use. The system in FIG. 6a comprises in addition to the elements of system 200 an Identification Processing Unit (IDPU) 602 connected through a preferably Ethernet connection to system server 206 and operative to receive video streams from multiple robotic cameras 604") comprising:
generating a foreground image including a foreground object based on an image of an imaging region captured by first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "The calculated background is subtracted from the video frame by IPU 204 to create a foreground image in step 304. Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods. The foreground/background separation step is followed by thresholding, binarization, morphological noise cleaning processes and connection analysis (connecting isolated pixels in the generated foreground image to clusters) to specify "blobs" representing foreground objects"), the foreground image being used for generating a virtual viewpoint image (See Tamir: Figs. 2 and 4, and [0072], "In step 408, the team colors and/or uniform textures are analyzed by the IPU based on the locations of each segmented object and their count. For example, the goalkeeper of team 1 is specified by (a) being a single object and (b) a location near goal 1. The color and intensity histograms, as well as their vertical distributions, are then stored into the IPU to be later used for the assignment step of blobs to teams") in a virtual viewpoint image to be generated (See Mizuno: Figs. 1-2, and [0029], "The database also holds an image of an object (a specific object) such as a player playing a game, and this image is held as a foreground image. 
The foreground image can be generated, by detecting an object from an image captured by the imaging apparatus 100 and separating a region representing this object"; and [0027], "The imaging apparatuses 100 are each, for example, a digital camera, and simultaneously perform image capturing based on a synchronization signal from an external synchronization apparatus (not illustrated). The image captured by each of the imaging apparatuses 100 is transmitted to an image generation apparatus 200, via a communication cable such as a local area network (LAN) cable. The communication cable is described using the LAN cable as an example, but may be a video transmission cable such as a DisplayPort cable and a High Definition Multimedia Interface (HDMI, registered trademark) cable. Images used in the present exemplary embodiment may each be an image captured using a still-image capturing function of the imaging apparatus 100, or an image captured using a moving-image capturing function of the imaging apparatus 100. The images will each be expressed below merely as an image or a captured image, without making a distinction in terms of whether the image is a still or moving image");
generating a part image corresponding to a part of an image of the imaging region captured by second imaging apparatus different from the first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "In one embodiment, system 200 is used to locate and track players in a team and assign each object to a particular team in real-time. The assignment is done without using any personal identification (ID). The process follows the steps shown in FIG. 3. The dynamic background of the playing field is calculated by IPU 204 in step 302. The dynamic background image is required in view of frequent lighting changes expected in the sports arena. It is achieved by means of median filter processing (or other appropriate methods) used to avoid the inclusion of moving objects in the background image being generated"), transmit the part image and control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Handa: Figs. 34-35, and [0371], "Next, the background extraction unit 05004 reads a portion of the background image 05002 and transmits the portion of the background image 05002 to the transmission unit 06120. In a case where a plurality of cameras 112 are installed so that the entire field may be subjected to imaging without a blind angle when an image of a game, such as a soccer game, is to be captured in the stadium or the like, large portions of background information of the cameras 112 overlap with one another. Since the background information is large, the images may be transmitted after deleting the overlapping portions in terms of the transmission band restriction so that a transmission amount may be reduced. A flow of this process will be described with reference to FIG. 35D. In step S05010, the background extraction unit 05004 sets a center portion of the background image as denoted by a partial region 3401 surrounded by a dotted line in FIG. 34C, for example. Specifically, the partial region 3401 indicates a background region which is transmitted by the camera 112 itself and other portions in the background region are transmitted by the others of the cameras 112. In step S05011, the background extraction unit 05004 reads the set partial region 3401 in the background image. In step S05012, the background extraction unit 05004 outputs the partial region 3401 to the transmission unit 06120. The output background images are collected in the image computing server 200 and used as texture of a background model. Positions of extraction of the background images 05002 in the camera adapters 120 are set in accordance with predetermined parameter values so that lack of texture information for a background model does not occur. Normally, requisite minimum of the extraction regions is set so that an amount of transmission data is reduced. Accordingly, a large transmission amount of background information may be efficiently reduced and the system may cope with high resolution"), the part image being used for determining a color of a background region in the virtual viewpoint image (See Tamir: Fig. 2, and [0069], "Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods") to be generated (See Mizuno: Figs. 1-2, and [0029], "The image generation apparatus 200 is, for example, a server apparatus. The image generation apparatus 200 is an example of an image processing apparatus including a database function and an image processing function. A database of the image generation apparatus 200 holds beforehand a captured image of the sports stadium in a state where no object is present, such as a state before start of a game. This captured image is held as a background image"; and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus"); and
generating a virtual viewpoint image according to a virtual viewpoint (See Wurmlin: Figs. 7 and 12, and [0077-0080], "In a preferred variant of the invention, a synthesized view is provided which shows the scene from a virtual viewpoint that is distinct from the positions of the real cameras. This includes the steps of: providing camera parameters of a virtual camera; determining a background image as seen by the virtual camera; determining a projection of each of the objects into the virtual camera and superimposing it on the background image") based on the generated foreground image and the generated part image (See Tamir: Figs. 7, and [0085], "In step 742, a dynamic graphical environment may be created at the user's computer. This environment is composed of 3D specific player models having temporal behaviors selected in step 740, composed onto a 3D graphical model of the stadium or onto the real playing field separated in step 732. In step 744, the user may select a static or dynamic viewpoint to watch the play. For example, he/she can decide that they want to watch the entire match from the eyes of a particular player. The generated 3D environment is then dynamically rendered in step 746 to display the event from the chosen viewpoint. This process is repeated for every video frame, leading to a generation of a 3D graphical representation of the real match in real time").
Regarding claim 13, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 12 as outlined above. Further, Tamir, Wurmlin, Mizuno, and Handa teach that a non-transitory computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to carry out an image processing method (See Wurmlin: [0107], "A computer program product for generating a 3D representation of a dynamically changing 3D scene according to the invention is loadable into an internal memory of a digital computer, and includes computer program code means to make, when the computer program code means is loaded in the computer, the computer execute the method according to the invention. In a preferred embodiment of the invention, the computer program product includes a computer readable medium, having the computer program code means recorded thereon") comprising:
generating a foreground image including a foreground object based on an image of an imaging region captured by first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "The calculated background is subtracted from the video frame by IPU 204 to create a foreground image in step 304. Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods. The foreground/background separation step is followed by thresholding, binarization, morphological noise cleaning processes and connection analysis (connecting isolated pixels in the generated foreground image to clusters) to specify "blobs" representing foreground objects"), the foreground image being used for generating a virtual viewpoint image (See Tamir: Figs. 2 and 4, and [0072], "In step 408, the team colors and/or uniform textures are analyzed by the IPU based on the locations of each segmented object and their count. For example, the goalkeeper of team 1 is specified by (a) being a single object and (b) a location near goal 1. The color and intensity histograms, as well as their vertical distributions, are then stored into the IPU to be later used for the assignment step of blobs to teams") in a virtual viewpoint image to be generated (See Mizuno: Figs. 1-2, and [0029], "The database also holds an image of an object (a specific object) such as a player playing a game, and this image is held as a foreground image. 
The foreground image can be generated, by detecting an object from an image captured by the imaging apparatus 100 and separating a region representing this object"; and [0027], "The imaging apparatuses 100 are each, for example, a digital camera, and simultaneously perform image capturing based on a synchronization signal from an external synchronization apparatus (not illustrated). The image captured by each of the imaging apparatuses 100 is transmitted to an image generation apparatus 200, via a communication cable such as a local area network (LAN) cable. The communication cable is described using the LAN cable as an example, but may be a video transmission cable such as a DisplayPort cable and a High Definition Multimedia Interface (HDMI, registered trademark) cable. Images used in the present exemplary embodiment may each be an image captured using a still-image capturing function of the imaging apparatus 100, or an image captured using a moving-image capturing function of the imaging apparatus 100. The images will each be expressed below merely as an image or a captured image, without making a distinction in terms of whether the image is a still or moving image");
generating a part image corresponding to a part of an image of the imaging region captured by second imaging apparatus different from the first imaging apparatus (See Tamir: Figs. 2-3, and [0069], "In one embodiment, system 200 is used to locate and track players in a team and assign each object to a particular team in real-time. The assignment is done without using any personal identification (ID). The process follows the steps shown in FIG. 3. The dynamic background of the playing field is calculated by IPU 204 in step 302. The dynamic background image is required in view of frequent lighting changes expected in the sports arena. It is achieved by means of median filter processing (or other appropriate methods) used to avoid the inclusion of moving objects in the background image being generated"), transmit the part image and control not to transmit an image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Handa: Figs. 34-35, and [0371], "Next, the background extraction unit 05004 reads a portion of the background image 05002 and transmits the portion of the background image 05002 to the transmission unit 06120. In a case where a plurality of cameras 112 are installed so that the entire field may be subjected to imaging without a blind angle when an image of a game, such as a soccer game, is to be captured in the stadium or the like, large portions of background information of the cameras 112 overlap with one another. Since the background information is large, the images may be transmitted after deleting the overlapping portions in terms of the transmission band restriction so that a transmission amount may be reduced. A flow of this process will be described with reference to FIG. 35D. In step S05010, the background extraction unit 05004 sets a center portion of the background image as denoted by a partial region 3401 surrounded by a dotted line in FIG. 34C, for example. Specifically, the partial region 3401 indicates a background region which is transmitted by the camera 112 itself and other portions in the background region are transmitted by the others of the cameras 112. In step S05011, the background extraction unit 05004 reads the set partial region 3401 in the background image. In step S05012, the background extraction unit 05004 outputs the partial region 3401 to the transmission unit 06120. The output background images are collected in the image computing server 200 and used as texture of a background model. Positions of extraction of the background images 05002 in the camera adapters 120 are set in accordance with predetermined parameter values so that lack of texture information for a background model does not occur. Normally, requisite minimum of the extraction regions is set so that an amount of transmission data is reduced. Accordingly, a large transmission amount of background information may be efficiently reduced and the system may cope with high resolution"), the part image being used for determining a color of a background region in the virtual viewpoint image (See Tamir: Fig. 2, and [0069], "Separation of the required foreground objects (players, ball, referees, etc.) from the background scene can be done using a chroma-key method for cases where the playing field has a more or less uniform color (like grass in a typical soccer field), by subtracting a dynamically updated "background image" from the live frame for the case of stationary cameras, or by a combination of both methods") to be generated (See Mizuno: Figs. 1-2, and [0029], "The image generation apparatus 200 is, for example, a server apparatus. The image generation apparatus 200 is an example of an image processing apparatus including a database function and an image processing function. A database of the image generation apparatus 200 holds beforehand a captured image of the sports stadium in a state where no object is present, such as a state before start of a game. This captured image is held as a background image"; and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus"); and
generating a virtual viewpoint image according to a virtual viewpoint (See Wurmlin: Figs. 7 and 12, and [0077-0080], "In a preferred variant of the invention, a synthesized view is provided which shows the scene from a virtual viewpoint that is distinct from the positions of the real cameras. This includes the steps of: providing camera parameters of a virtual camera; determining a background image as seen by the virtual camera; determining a projection of each of the objects into the virtual camera and superimposing it on the background image") based on the generated foreground image and the generated part image (See Tamir: Figs. 7, and [0085], "In step 742, a dynamic graphical environment may be created at the user's computer. This environment is composed of 3D specific player models having temporal behaviors selected in step 740, composed onto a 3D graphical model of the stadium or onto the real playing field separated in step 732. In step 744, the user may select a static or dynamic viewpoint to watch the play. For example, he/she can decide that they want to watch the entire match from the eyes of a particular player. The generated 3D environment is then dynamically rendered in step 746 to display the event from the chosen viewpoint. This process is repeated for every video frame, leading to a generation of a 3D graphical representation of the real match in real time").
Regarding claim 14, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Mizuno teaches that the image processing system according to claim 1, wherein the second generation device does not generate the image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Mizuno: Figs. 1-2, and [0029], "The image generation apparatus 200 is, for example, a server apparatus. The image generation apparatus 200 is an example of an image processing apparatus including a database function and an image processing function. A database of the image generation apparatus 200 holds beforehand a captured image of the sports stadium in a state where no object is present, such as a state before start of a game. This captured image is held as a background image").
Regarding claim 15, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Mizuno teaches that the image processing system according to claim 1, wherein the second generation device does not generate the image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Mizuno: Figs. 1-2, and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus").
Regarding claim 16, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Tamir teaches that the image processing system according to claim 1,
wherein the second generation device does not generate the image which corresponds to another part of the image of the imaging region captured by the second imaging apparatus and which is different from the part image (See Mizuno: Figs. 1-2, and [0036], "A user input unit 201 converts a transmission signal input from the terminal apparatus 300 via the LAN cable, into virtual viewpoint information. The user input unit 201 then outputs the virtual viewpoint information to a first virtual viewpoint image management unit 202"), and
wherein a plurality of second generation devices respectively connected to a plurality of imaging apparatuses included in the second imaging apparatus group which contains the second imaging apparatus and does not contain the first imaging apparatus is connected via a second daisy chain different from the first daisy chain (See Mizuno: Figs. 1-2, and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus").
Regarding claim 17, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Handa teaches that the image processing system according to claim 1, wherein the first generation device and the second generation device are connected via a daisy chain (See Handa: Fig. 1, and [0069], "First, an operation of transmitting 26 sets of images and sound of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of this embodiment, the sensor systems 110a to 110z are connected to one another by daisy chain").
Regarding claim 18, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Handa teaches that the image processing system according to claim 1,
wherein the first generation device is configured to transmit the foreground image at a first frame rate (See Handa: Figs. 1-4, and [0120], "Furthermore, the background image and the foreground image may be captured in different frame rates. For example, in a case where a frame rate of the background image is 1 fps, one background image is captured per one second, and therefore, it may be determined that all the data has been obtained in a state in which a background image does not exist in a period of time in which a background image is not obtained"), and
wherein the second generation device is configured to transmit the background image at a second frame rate lower than the first frame rate (See Handa: Figs. 1-4, and [0127], "In a case where frame rates of the foreground images and the background images obtained by the data input controller 02120 are different from each other, it is difficult for the imaging data file generation unit 02180 to associate the foreground images and the background images obtained at the same time point with each other before the outputting. Therefore, the imaging data file generation unit 02180 associates a foreground image and a background image having time information having the relationship with time information of the foreground image based on a predetermined rule with each other before the outputting").
Regarding claim 20, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 19 as outlined above. Further, Handa teaches that the image processing system according to claim 15, wherein the background image generated by the first generation device is not transmitted from the first generation device (See Handa: Figs. 1-4, and [0127], "In a case where frame rates of the foreground images and the background images obtained by the data input controller 02120 are different from each other, it is difficult for the imaging data file generation unit 02180 to associate the foreground images and the background images obtained at the same time point with each other before the outputting. Therefore, the imaging data file generation unit 02180 associates a foreground image and a background image having time information having the relationship with time information of the foreground image based on a predetermined rule with each other before the outputting. Here, the background image having time information having the relationship with time information of the foreground image based on a predetermined rule means a background image having time information most similar to the time information of the foreground image among the background images obtained by the imaging data file generation unit 02180, for example. In this way, by associating the foreground image with the background image based on the predetermined rule, even if the frame rates of the foreground image and the background image are different from each other, a virtual viewpoint image may be generated using the foreground image and the background image which are captured at the similar time points. Note that a method for associating the foreground image and the background image is not limited to the method described above.
For example, the background image having time information having the relationship with time information of the foreground image based on the predetermined rule may be a background image having time information closest to the time information of the foreground image among obtained background images having time information corresponding to time points before a time point of the foreground image. According to this method, the foreground images and the background images which are associated with each other may be output with less delay without waiting for an obtainment of a background images having a frame rate lower than those of the foreground images. The background image having the time information having the relationship with the time information of the foreground image based on the predetermined rule may be a background image having time information closest to the time information of the foreground image among obtained background images having time information corresponding to time points after the time point of the foreground image").
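For illustration only, the three association rules quoted from Handa [0127] above (closest in time, closest among earlier background images, closest among later background images) might be sketched as follows. This is a minimal sketch and not Handa's implementation: the function name `associate_background`, the rule labels, the use of plain floating-point timestamps in seconds, and the treatment of an equal timestamp as both "before" and "after" are all assumptions introduced here.

```python
from typing import Optional, Sequence


def associate_background(
    fg_time: float,
    bg_times: Sequence[float],
    rule: str = "nearest",
) -> Optional[float]:
    """Pick the background timestamp to pair with a foreground timestamp.

    Hypothetical helper; rule names are illustrative labels only:
      "nearest": background closest in time to the foreground
      "before":  closest background at or before the foreground time
      "after":   closest background at or after the foreground time
    Returns None when no background satisfies the rule.
    """
    if rule == "nearest":
        candidates = list(bg_times)
    elif rule == "before":
        # Assumption: a background with the same timestamp counts as "before".
        candidates = [t for t in bg_times if t <= fg_time]
    elif rule == "after":
        # Assumption: a background with the same timestamp counts as "after".
        candidates = [t for t in bg_times if t >= fg_time]
    else:
        raise ValueError(f"unknown rule: {rule}")

    if not candidates:
        return None
    # Smallest absolute time difference wins.
    return min(candidates, key=lambda t: abs(t - fg_time))


# Backgrounds captured at 1 fps, foreground frame captured at t = 1.4 s.
background_times = [0.0, 1.0, 2.0]
print(associate_background(1.4, background_times, "nearest"))
print(associate_background(1.4, background_times, "before"))
print(associate_background(1.4, background_times, "after"))
```

Under the "before" rule a foreground frame can be paired as soon as it arrives, which corresponds to the lower-delay behavior described in the quoted passage; the "after" rule requires waiting for the next background image.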
Regarding claim 21, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Wurmlin teaches that the image processing system according to claim 1, wherein the imaging region captured by the second imaging apparatus comprises spectator seats in a stadium (See Wurmlin: Fig. 11, and [0070], "In a further preferred embodiment of the invention, information about the location of landmarks is used to guide the inpainting. Landmarks are characteristic, immobile background features, typically linearly extended features such as straight or circular lines on the playing field, whose nominal location is known. This approach preferably incorporates knowledge from a field model including the location and orientation of the lines on the playing field. Preferably, this is combined with the 3D information about the location of patches along the landmark"; and [0025], “If required, they shall be referred to as "background objects". Spectators in the environment, although in motion, are not considered to be "moving objects" for the purpose of this application”).
Regarding claim 22, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Mizuno teaches that the image processing system according to claim 1, further comprising a third generation device configured to generate another part image corresponding to a part of an image of an imaging region captured by a third imaging apparatus different from the first imaging apparatus and the second imaging apparatus, configured to transmit said another part image, and configured to control not to transmit an image which corresponds to another part of the image of the imaging region captured by the third imaging apparatus and which is different from said another part image, said another part image being used for determining a color of a background region in the virtual viewpoint image to be generated, wherein the image generation device is configured to generate the virtual viewpoint image further based on the said another part image generated by the third generation device (See Mizuno: Fig. 7, and [0076], “In step S701, the Nth virtual viewpoint image management unit 601 generates N pieces of virtual viewpoint information by converting the first virtual viewpoint information. For example, a viewpoint (first other viewpoint) at a position behind the first virtual viewpoint and 10 m away therefrom is assumed to be second virtual viewpoint information. Further, a viewpoint (second other viewpoint) at a position on the left of the first virtual viewpoint and 10 m away therefrom is assumed to be third virtual viewpoint information. Subsequently, in step S702, the virtual viewpoint image generation unit 203 generates an N-number of virtual viewpoint images corresponding to the N pieces of virtual viewpoint information”; and Fig. 10, and [0026], “FIG. 10 is an example of an arrangement of the imaging apparatuses 100. The imaging apparatuses 100 are disposed so that a part or the whole of the sports stadium forms an imaging range”).
Regarding claim 24, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 22 as outlined above. Further, Handa teaches that the image processing system according to claim 22, further comprising a background processing server configured to generate data for coloring the background region based on the part image transmitted from the second generation device and said another part image transmitted from the third generation device, wherein the image generation device is configured to generate the virtual viewpoint image further based on the generated data for coloring the background region (See Handa: Fig. 4, and [0354], “After the data required for the generation of a file is buffered by the data synchronization unit 02130, various conversion processes including a process of developing RAW image data, correction of lens distortion, adjustment of colors and luminance values of images captured by the cameras, such as the foreground image and the background image, are performed (S02330)”).
Regarding claim 25, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Wurmlin and Handa teach that the image processing system according to claim 1, further comprising:
a foreground processing server configured to receive the foreground image transmitted from the first generation apparatus, and configured to generate shape data representing a shape of the foreground object based on the received foreground image (See Wurmlin: Fig. 1, and [0132], “In parallel, or following the calibration method 103, the tracking method 104 uses the digitized color texture data 121, camera calibration data of the actual and/or (depending on whether it is parallel or not) previous steps 131 and the extrapolated 3D object position 131 to determine the 2D position and shape 123 of all visible objects in each set of color texture data 121”); and
a background processing server configured to receive the part image transmitted from the second generation device, and configured to generate data for coloring the background region based on the received part image (See Handa: Fig. 4, and [0354], “After the data required for the generation of a file is buffered by the data synchronization unit 02130, various conversion processes including a process of developing RAW image data, correction of lens distortion, adjustment of colors and luminance values of images captured by the cameras, such as the foreground image and the background image, are performed (S02330)”).
Regarding claim 27, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 25 as outlined above. Further, Mizuno and Handa teach that the image processing system according to claim 25,
wherein the first generation apparatus transmits the foreground image via a first daisy chain (See Handa: Fig. 1, and [0069], "First, an operation of transmitting 26 sets of images and sound of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of this embodiment, the sensor systems 110a to 110z are connected to one another by daisy chain"), and
wherein the second generation device transmits the part image via a second daisy chain different from the first daisy chain (See Mizuno: Figs. 1-2, and [0058], "In the present exemplary embodiment, the foreground image detection unit 207 determines whether the foreground image is included in the first virtual viewpoint image, by executing the processing for detecting the foreground image (the specific object), for the first virtual viewpoint image. However, this example is not limitative. For example, the processing for detecting the foreground image may be executed in an apparatus different from the image generation apparatus 200. In this case, the image generation apparatus 200 acquires the result of the processing for detecting the foreground image, from this different apparatus").
Regarding claim 28, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 25 as outlined above. Further, Handa teaches that the image processing system according to claim 25,
wherein the first generation apparatus transmits the foreground image via a daisy chain (See Handa: Fig. 1, and [0069], "First, an operation of transmitting 26 sets of images and sound of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of this embodiment, the sensor systems 110a to 110z are connected to one another by daisy chain"), and
wherein the second generation device transmits the part image via the daisy chain via which the first generation apparatus transmits the foreground image (See Handa: Fig. 1, and [0069], "First, an operation of transmitting 26 sets of images and sound of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of this embodiment, the sensor systems 110a to 110z are connected to one another by daisy chain").
Regarding claim 30, Tamir, Wurmlin, Mizuno, and Handa teach all the features with respect to claim 1 as outlined above. Further, Handa teaches that the image processing system according to claim 1, wherein the foreground image transmitted by the first generation device comprises information for specifying a first transmission destination, and wherein the part image transmitted by the second generation device comprises information for specifying a second transmission destination different from the first transmission destination (See Handa: Fig. 2, and [0097], “The data routing processor 06122 determines routing destinations of data received by the data transmission/reception unit 06111 and data processed by the image processor 06130 using data stored in a data routing information storage unit 06125 to be described below. The data routing processor 06122 further has a function of transmitting data to a determined routing destination. The routing destination preferably corresponds to one of the camera adapters 120 which corresponds to one of the cameras 112 which focuses on the same gazing point in terms of image processing since the image frame correlation among the cameras 112 is high. Order of the camera adapters 120 which output the foreground images and the background images in a relay manner in the image processing system 100 is determined in accordance with determinations performed by the data routing processor 06122 of the plurality of camera adapters 120”).

Allowable Subject Matter
Claims 23, 26, and 29 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Dzhurinskiy, etc. (US 20150054823 A1) teaches that the stadium seats may be in the foreground or background images, and this reference may be used to map the insignificant term “the stadium seats” recited in some dependent claims.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/             Primary Examiner, Art Unit 2612