DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 2-11 are pending in this Office action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-11 are rejected under 35 U.S.C. 103 as being unpatentable over Jain et al. (US 5745126 A) in view of Utsugi et al. (US 20180204387 A1).
Regarding claim 2, Jain teaches that an information processing apparatus (See Jain: Fig. 1, and Col. 17 Lines 9-23, “The high level architecture for a MPI video system so functioning is shown in a first level block diagram in FIG. 1. A image at a certain perspective from each camera 10a, 10b, . . . 10n is converted to its associated camera scene in camera screen buffers CSB 11a, 11b, . . . 11n. Multiple camera scenes are then assimilated into the environment model 13 by computer process in the Environ. Model Builder 12. A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) comprising:
one or more memories (See Jain: Fig. 15, and Col. 12 Lines 16-24, “FIG. 15 is a pictorial representation of the distributed architecture of the GM-PPS portion of the MPI video system of the present invention wherein (i) a graphics and visualization workstation acts as the modeler, (ii) several workstations on the network act as slaves which process individual frames based on the master's request so as to (iii) physically store the processed frames either locally, in a nearby storage server, or, in the real-time case, as digitized information on a local or nearby frame-grabber”) storing instructions; and 
one or more processors executing the instructions (See Jain: Fig. 15, and Col. 31 Lines 42-47, “The central master computer and the remote slave computers communicate at a high symbolic level; minimal image information is exchanged. Hence only a very low network bandwidth is required for master-slave communication. The master-slave information exchange protocol is preferably as follows”) to:
transmit information for specifying data which corresponds to a time of a virtual viewpoint image to be generated and is used to generate the virtual viewpoint image to an apparatus which controls output of a plurality of items of data which is used to generate a virtual viewpoint image  (See Jain: Fig. 1, and Col. 17 Lines 16-23, “A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) and corresponds to a plurality of times (See Jain: Fig. 5, and Col. 22 Lines 15-24, “At any moment, there are several cameras that shoot the game. Automatic camera selection is a function that selects the best camera according to the preference of a user. Suppose a player is captured by three cameras and they produce three views shown in FIG. 5. In this case camera 2 is the best to see this player, for in camera 1 the player is out of the area while in camera 3 the player is too small. Different cameras provide focus on different objects. Depending on the current interest, an appropriate camera must be selected”);
obtain the data which corresponds to the time specified based on the transmitted information (See Jain: Fig. 1, and Col. 9 Lines 8-10, “The computer also receives from a prospective user/viewer of the scene a user/viewer-specified criterion relative to which criterion the user/viewer wishes to view the scene”) and is used to generate the virtual viewpoint image; and 
generate (See Jain: Fig. 1, and Col. 10 Lines 17-30, “Fourth, the user/viewer-specified criterion may be of a particular object in the scene. In this case the computer will combine the images from the multiple video cameras not only so as to generate a three-dimensional video model of the scene, but so as to generate a model in which objects in the scene are identified. The computer will subsequently produce, and the display will subsequently show, the particular image--whether real or virtual--appropriate to best show the selected object. Clearly this is a feedback loop: the location of an object in the scene serves to influence, in accordance with a user/viewer selection of the object, how the scene is shown. Clearly the same video scene could be, if desired, shown over and over, each time focusing view on a different selected object in the scene”) the virtual viewpoint image corresponding to the time in accordance with the obtained data.
However, Jain fails to explicitly disclose one or more memories storing instructions; that the obtained data is used to generate the virtual viewpoint image; and generating the virtual viewpoint image corresponding to the time in accordance with the obtained data.
Utsugi, however, teaches one or more memories storing instructions (See Utsugi: Figs. 3-4, and [0061], “The image generation device 302 includes an arithmetic processing unit (CPU) 401, which operates in accordance with a program stored in a main memory unit 404, the main memory unit 404, which stores a program to be executed by the arithmetic processing unit 401, an input interface 403, which receives a signal from an input device 308, an auxiliary storage unit 405, which stores data necessary for operation, a network interface 402, which controls communication to/from the network, and an image processing unit 406, which generates an image to be output from the output device 303”); 
and is used to generate the virtual viewpoint image (See Utsugi: Fig. 2, and [0139], “In the third embodiment, a description is given of a procedure of setting the position and angle of a virtual viewpoint for viewing the virtual space 200. In the third embodiment, a method referred to as a target camera, which defines the position and angle of the virtual viewpoint with three-dimensional coordinates U1 serving as a center (position of photographic subject) of the visual field and three-dimensional coordinates U2 serving as the position of the virtual viewpoint (camera)”); and 
generate the virtual viewpoint image corresponding to the time in accordance with the obtained data (See Utsugi: Fig. 2, and [0139], “A translation matrix T centered around U2 and a rotation matrix R with the direction from U1 to U2 being set to the line-of-sight direction (Z-axis) of the camera. The view matrix V is defined by a product V=RT of the matrices. Such a method of setting the viewpoint position is a method generally known as the target camera. In the following description, U1 is referred to as a point of focus, and U2 is referred to as a virtual viewpoint. Further, a distance |U2−U1| between U2 and U1 is referred to as a target camera distance d. A generally known method can be adopted as calculation processing of defining the viewpoint position U2 to create the view matrix V using the angle R, the target camera distance d, and the point of focus U1 as input”; and [0159], “In a fourth embodiment of this invention, a description is given of a method of operating a device for capturing an entire peripheral image existing in the real space 100 by the viewer interacting with a display marker on the three-dimensional map. In the fourth embodiment, only the configuration and processing different from those of the embodiments described above are described, and a description of the same configuration and processing is omitted”).
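For reference, the target camera construction quoted from Utsugi [0139] (view matrix V=RT built from the point of focus U1 and the virtual viewpoint U2) can be illustrated with a minimal sketch. The function names, the up vector, and the row-major 4x4 layout below are assumptions made only for illustration and do not appear in Utsugi; the sketch follows the quoted convention that the camera Z-axis points from U1 to U2, so that the point of focus lands on the negative Z-axis at the target camera distance d.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def target_camera_view(u1, u2, up=(0.0, 1.0, 0.0)):
    """Build V = R * T for point of focus u1 and virtual viewpoint u2.

    The up vector is an assumption; it must not be parallel to u2 - u1.
    """
    z = normalize(tuple(e - f for e, f in zip(u2, u1)))  # direction U1 -> U2
    x = normalize(cross(up, z))
    y = cross(z, x)
    # R: rows are the camera axes expressed in world coordinates.
    r = [[x[0], x[1], x[2], 0.0],
         [y[0], y[1], y[2], 0.0],
         [z[0], z[1], z[2], 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # T: translation that moves the virtual viewpoint U2 to the origin.
    t = [[1.0, 0.0, 0.0, -u2[0]],
         [0.0, 1.0, 0.0, -u2[1]],
         [0.0, 0.0, 1.0, -u2[2]],
         [0.0, 0.0, 0.0, 1.0]]
    return matmul(r, t)


def transform(m, p):
    """Apply a 4x4 matrix to a 3D point (homogeneous w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))
```

Under these assumptions, the virtual viewpoint U2 maps to the camera origin and the point of focus U1 maps to (0, 0, -d), where d = |U2−U1|.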
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Jain to have one or more memories storing instructions; to have the obtained data be used to generate the virtual viewpoint image; and to generate the virtual viewpoint image corresponding to the time in accordance with the obtained data, as taught by Utsugi, in order to display the customized image which the user wants to see (See Utsugi: Fig. 3, and [0157], “As described above, according to the third embodiment of this invention, the image generation device 302 generates image data for displaying at least one marker in the virtual three-dimensional space near the photographing position, the shape of the virtual screen or an entire peripheral image to be mapped to the virtual screen is changed when operation of the marker is detected, and thus it is possible to display an image that the user desires to view in accordance with operation by the user”). Jain teaches a method and system that may generate 3D models of the real world from images captured by multiple cameras and present 2D images to the user based on the user-selected spatial perspective on the scene, while Utsugi teaches a system and method that may arrange 3D objects in the real world, map them to a virtual 3D space, and present the virtual viewpoint images to the user interactively. Therefore, it would have been obvious to one of ordinary skill in the art to modify Jain by Utsugi to present virtual viewpoint images to users interactively. The motivation to modify Jain by Utsugi is “Use of known technique to improve similar devices (methods, or products) in the same way”.
Regarding claim 3, Jain and Utsugi teach all the features with respect to claim 2 as outlined above. Further, Jain and Utsugi teach that the information processing apparatus according to claim 2, wherein the obtained data includes a foreground image which includes an object and corresponds to the time specified based on the information (See Jain: Fig. 12, and Col. 28 Lines 57-64, “Two key aspects of the architecture diagrammed in FIG. 12 are the (i) static model and the (ii) dynamic model. The static model contains a priori information such as camera calibration parameters, look-up tables and obstacle information. The dynamic model contains task specific information like two dimensional and three dimensional maps, dynamic objects, states of objects in the scene (e.g., a particular human is mobile, or the robot vehicle immobile), etc.”) and a background image which does not include the object and corresponds to the time specified based on the information (See Utsugi: Fig. 2, and [0111], “At this time, this color information may be shaded depending on a specific light source condition or subjected to translucence processing of blending this color information with an already rendered background for rendering. Methods widely used in known technologies, for example, OpenGL, may be used as a method of shading or translucence processing. Further, the character strings 251 to 253 read from the database 306 are rendered at positions near the three-dimensional models 231 to 233. In the following, the rendered character strings are referred to as text captions”).
Regarding claim 4, Jain and Utsugi teach all the features with respect to claim 3 as outlined above. Further, Utsugi teaches that the information processing apparatus according to claim 3, wherein the background image corresponds to a time which is closest to the time specified based on the information  (See Utsugi: Fig. 2, and [0115], “At this time, this color information may be shaded depending on a specific light source condition or subjected to translucence processing of blending this color information with an already rendered background for rendering. Methods widely used in known technologies, for example, OpenGL, may be used as a method of shading or translucence processing”).
Regarding claim 5, Jain and Utsugi teach all the features with respect to claim 3 as outlined above. Further, Utsugi teaches that the information processing apparatus according to claim 3, wherein the background image corresponds to a time which is earlier than and closest to the time specified based on the information  (See Utsugi: Fig. 2, and [0111], “At this time, this color information may be shaded depending on a specific light source condition or subjected to translucence processing of blending this color information with an already rendered background for rendering. Methods widely used in known technologies, for example, OpenGL, may be used as a method of shading or translucence processing. Further, the character strings 251 to 253 read from the database 306 are rendered at positions near the three-dimensional models 231 to 233. In the following, the rendered character strings are referred to as text captions”).
Regarding claim 6, Jain and Utsugi teach all the features with respect to claim 3 as outlined above. Further, Jain teaches that the information processing apparatus according to claim 3, wherein the background image corresponds to a time which is later than and closest to the time specified based on the information (See Jain: Fig. 12, and Col. 28 Lines 57-64, “Two key aspects of the architecture diagrammed in FIG. 12 are the (i) static model and the (ii) dynamic model. The static model contains a priori information such as camera calibration parameters, look-up tables and obstacle information. The dynamic model contains task specific information like two dimensional and three dimensional maps, dynamic objects, states of objects in the scene (e.g., a particular human is mobile, or the robot vehicle immobile), etc.”. Note that the static object models may correspond to a background at a time later than the time at which the dynamic models are present).
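For reference, claims 4-6 differ only in how the time of the background image relates to the specified time: closest (claim 4), earlier than and closest (claim 5), and later than and closest (claim 6). The following is a minimal sketch of those three selection rules as worded in the claims. The function name, the sorted list of timestamps, the strict treatment of "earlier" and "later", and the tie-break toward the earlier time are all assumptions made for illustration; none of them is taken from the application, Jain, or Utsugi.

```python
from bisect import bisect_left, bisect_right


def pick_background_time(times, t, mode="closest"):
    """Pick a stored background time relative to the specified time t.

    times: timestamps sorted in ascending order (hypothetical input).
    mode:  "closest", "earlier" (strictly before t), or
           "later" (strictly after t).
    Returns the selected time, or None when no stored time qualifies.
    """
    if mode == "earlier":
        i = bisect_left(times, t)       # every times[:i] is < t
        return times[i - 1] if i > 0 else None
    if mode == "later":
        j = bisect_right(times, t)      # every times[j:] is > t
        return times[j] if j < len(times) else None
    if mode == "closest":
        i = bisect_left(times, t)
        candidates = times[max(i - 1, 0):i + 1]
        if not candidates:
            return None
        # Assumption: on a tie, prefer the earlier of the two times.
        return min(candidates, key=lambda c: (abs(c - t), c))
    raise ValueError("unknown mode: " + mode)
```

For example, with stored times 0, 10, 20, and 30 and a specified time of 12, the three rules select 10, 10, and 20, respectively.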
Regarding claim 7, Jain and Utsugi teach all the features with respect to claim 2 as outlined above. Further, Jain teaches that the information processing apparatus according to claim 2, wherein the obtained data includes a three-dimensional model of an object corresponding to the time specified based on the information (See Jain: Fig. 12, and Col. 29 Lines 6-15, “A good three dimensional model is required to recognize complex static and moving obstacles. At a basic level, the multi-perspective perception system must maintain information about the positions of all the significant static obstacles and dynamic objects in the environment. In addition, the system must extract information from both the two-dimensional static model as well as the three-dimensional dynamic model. As such, a representation must be chosen that (i) facilitates maintenance of object positional information as well as (ii) supporting more sophisticated questions about object behavior”. Note that the dynamic object models are time dependent).
Regarding claim 8, Jain and Utsugi teach all the features with respect to claim 2 as outlined above. Further, Jain teaches that the information processing apparatus according to claim 2, wherein the obtained data includes a background model corresponding to the time specified based on the information (See Jain: Fig. 12, and Col. 29 Lines 26-35, “When combined with information about the exact position and orientation of a camera, the a priori knowledge of the static environment is very rich source of information which has not previously received much attention. For each single view, the preferred system is able to compute the three dimensional position of each dynamic object detected by its motion segmentation component. To do so, the (i) a priori information about the scene and (ii) the camera calibration parameters are coupled with (iii) the assumption that all dynamic objects move on the ground surface”).
Regarding claim 9, Jain and Utsugi teach all the features with respect to claim 8 as outlined above. Further, Jain teaches that the information processing apparatus according to claim 8, wherein the background model corresponds to a time earlier than the time specified based on the information (See Jain: Fig. 12, and Col. 28 Lines 57-64, “Two key aspects of the architecture diagrammed in FIG. 12 are the (i) static model and the (ii) dynamic model. The static model contains a priori information such as camera calibration parameters, look-up tables and obstacle information. The dynamic model contains task specific information like two dimensional and three dimensional maps, dynamic objects, states of objects in the scene (e.g., a particular human is mobile, or the robot vehicle immobile), etc.”; and Col. 29 Lines 26-35, “When combined with information about the exact position and orientation of a camera, the a priori knowledge of the static environment is very rich source of information which has not previously received much attention. For each single view, the preferred system is able to compute the three dimensional position of each dynamic object detected by its motion segmentation component. To do so, the (i) a priori information about the scene and (ii) the camera calibration parameters are coupled with (iii) the assumption that all dynamic objects move on the ground surface”. Note that the static object models may correspond to a background at a time earlier than the time at which the dynamic models are present).
Regarding claim 10, Jain and Utsugi teach all the features with respect to claim 2 as outlined above. Further, Jain and Utsugi teach that an information processing method (See Jain: Fig. 1, and Col. 17 Lines 9-23, “The high level architecture for a MPI video system so functioning is shown in a first level block diagram in FIG. 1. A image at a certain perspective from each camera 10a, 10b, . . . 10n is converted to its associated camera scene in camera screen buffers CSB 11a, 11b, . . . 11n. Multiple camera scenes are then assimilated into the environment model 13 by computer process in the Environ. Model Builder 12. A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) comprising:
transmitting information for specifying data which corresponds to a time of a virtual viewpoint image to be generated and is used to generate the virtual viewpoint image to an apparatus which controls output of a plurality of items of data which is used to generate a virtual viewpoint image (See Jain: Fig. 1, and Col. 17 Lines 16-23, “A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) and corresponds to a plurality of times (See Jain: Fig. 5, and Col. 22 Lines 15-24, “At any moment, there are several cameras that shoot the game. Automatic camera selection is a function that selects the best camera according to the preference of a user. Suppose a player is captured by three cameras and they produce three views shown in FIG. 5. In this case camera 2 is the best to see this player, for in camera 1 the player is out of the area while in camera 3 the player is too small. Different cameras provide focus on different objects. Depending on the current interest, an appropriate camera must be selected”);
obtaining the data which corresponds to the time specified based on the transmitted information (See Jain: Fig. 1, and Col. 9 Lines 8-10, “The computer also receives from a prospective user/viewer of the scene a user/viewer-specified criterion relative to which criterion the user/viewer wishes to view the scene”) and is used to generate the virtual viewpoint image (See Utsugi: Fig. 2, and [0139], “In the third embodiment, a description is given of a procedure of setting the position and angle of a virtual viewpoint for viewing the virtual space 200. In the third embodiment, a method referred to as a target camera, which defines the position and angle of the virtual viewpoint with three-dimensional coordinates U1 serving as a center (position of photographic subject) of the visual field and three-dimensional coordinates U2 serving as the position of the virtual viewpoint (camera)”); and 
generating (See Jain: Fig. 1, and Col. 10 Lines 17-30, “Fourth, the user/viewer-specified criterion may be of a particular object in the scene. In this case the computer will combine the images from the multiple video cameras not only so as to generate a three-dimensional video model of the scene, but so as to generate a model in which objects in the scene are identified. The computer will subsequently produce, and the display will subsequently show, the particular image--whether real or virtual--appropriate to best show the selected object. Clearly this is a feedback loop: the location of an object in the scene serves to influence, in accordance with a user/viewer selection of the object, how the scene is shown. Clearly the same video scene could be, if desired, shown over and over, each time focusing view on a different selected object in the scene”) the virtual viewpoint image corresponding to the time in accordance with the obtained data (See Utsugi: Fig. 2, and [0139], “A translation matrix T centered around U2 and a rotation matrix R with the direction from U1 to U2 being set to the line-of-sight direction (Z-axis) of the camera. The view matrix V is defined by a product V=RT of the matrices. Such a method of setting the viewpoint position is a method generally known as the target camera. In the following description, U1 is referred to as a point of focus, and U2 is referred to as a virtual viewpoint. Further, a distance |U2−U1| between U2 and U1 is referred to as a target camera distance d. A generally known method can be adopted as calculation processing of defining the viewpoint position U2 to create the view matrix V using the angle R, the target camera distance d, and the point of focus U1 as input”; and [0159], “In a fourth embodiment of this invention, a description is given of a method of operating a device for capturing an entire peripheral image existing in the real space 100 by the viewer interacting with a display marker on the three-dimensional map. In the fourth embodiment, only the configuration and processing different from those of the embodiments described above are described, and a description of the same configuration and processing is omitted”).
Regarding claim 11, Jain and Utsugi teach all the features with respect to claim 2 as outlined above. Further, Jain and Utsugi teach that a non-transitory computer readable storage medium storing computer executable instructions for causing a computer to execute an information processing method (See Jain: Fig. 1, and Col. 17 Lines 9-23, “The high level architecture for a MPI video system so functioning is shown in a first level block diagram in FIG. 1. A image at a certain perspective from each camera 10a, 10b, . . . 10n is converted to its associated camera scene in camera screen buffers CSB 11a, 11b, . . . 11n. Multiple camera scenes are then assimilated into the environment model 13 by computer process in the Environ. Model Builder 12. A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) comprising: 
transmitting information for specifying data which corresponds to a time of a virtual viewpoint image to be generated and is used to generate the virtual viewpoint image to an apparatus which controls output of a plurality of items of data which is used to generate a virtual viewpoint image (See Jain: Fig. 1, and Col. 17 Lines 16-23, “A viewer 14 (shown in phantom line for not being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14”) and corresponds to a plurality of times (See Jain: Fig. 5, and Col. 22 Lines 15-24, “At any moment, there are several cameras that shoot the game. Automatic camera selection is a function that selects the best camera according to the preference of a user. Suppose a player is captured by three cameras and they produce three views shown in FIG. 5. In this case camera 2 is the best to see this player, for in camera 1 the player is out of the area while in camera 3 the player is too small. Different cameras provide focus on different objects. Depending on the current interest, an appropriate camera must be selected”); 
obtaining the data which corresponds to the time specified based on the transmitted information (See Jain: Fig. 1, and Col. 9 Lines 8-10, “The computer also receives from a prospective user/viewer of the scene a user/viewer-specified criterion relative to which criterion the user/viewer wishes to view the scene”) and is used to generate the virtual viewpoint image (See Utsugi: Fig. 2, and [0139], “In the third embodiment, a description is given of a procedure of setting the position and angle of a virtual viewpoint for viewing the virtual space 200. In the third embodiment, a method referred to as a target camera, which defines the position and angle of the virtual viewpoint with three-dimensional coordinates U1 serving as a center (position of photographic subject) of the visual field and three-dimensional coordinates U2 serving as the position of the virtual viewpoint (camera)”); and 
generating (See Jain: Fig. 1, and Col. 10 Lines 17-30, “Fourth, the user/viewer-specified criterion may be of a particular object in the scene. In this case the computer will combine the images from the multiple video cameras not only so as to generate a three-dimensional video model of the scene, but so as to generate a model in which objects in the scene are identified. The computer will subsequently produce, and the display will subsequently show, the particular image--whether real or virtual--appropriate to best show the selected object. Clearly this is a feedback loop: the location of an object in the scene serves to influence, in accordance with a user/viewer selection of the object, how the scene is shown. Clearly the same video scene could be, if desired, shown over and over, each time focusing view on a different selected object in the scene”) the virtual viewpoint image corresponding to the time in accordance with the obtained data (See Utsugi: Fig. 2, and [0139], “A translation matrix T centered around U2 and a rotation matrix R with the direction from U1 to U2 being set to the line-of-sight direction (Z-axis) of the camera. The view matrix V is defined by a product V=RT of the matrices. Such a method of setting the viewpoint position is a method generally known as the target camera. In the following description, U1 is referred to as a point of focus, and U2 is referred to as a virtual viewpoint. Further, a distance |U2−U1| between U2 and U1 is referred to as a target camera distance d. A generally known method can be adopted as calculation processing of defining the viewpoint position U2 to create the view matrix V using the angle R, the target camera distance d, and the point of focus U1 as input”; and [0159], “In a fourth embodiment of this invention, a description is given of a method of operating a device for capturing an entire peripheral image existing in the real space 100 by the viewer interacting with a display marker on the three-dimensional map. In the fourth embodiment, only the configuration and processing different from those of the embodiments described above are described, and a description of the same configuration and processing is omitted”).


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/Primary Examiner, Art Unit 2612