Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-6, and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over “Personal Viewpoint Navigation Based on Object Trajectory Distribution for Multi-View Videos” by Xueting Wang, et al. (hereinafter Wang) in view of U.S. Patent Application Publication 2019/0066733 A1 (hereinafter Somanath).
Regarding claim 1, the limitation “An information processing apparatus comprising: … an obtaining unit configured to obtain information for specifying a position of an object included in multi-viewpoint image data obtained by image capturing using a plurality of imaging apparatuses” is taught by Wang (Wang, e.g., abstract, sections 3-5, describes a system for generating personal viewpoint navigation based on trajectory data of the objects in a multi-view video, which includes a pre-processing step for obtaining the positions of players and the ball on the field, e.g. section 5.1, i.e. obtaining information specifying positions of objects in multi-viewpoint image data captured using a plurality of imaging apparatuses.)

The limitations “one or more memories storing instructions; and one or more processors executing the instructions … a generation unit configured to generate virtual viewpoint path data to generate virtual viewpoint image data by inputting the information obtained by the obtaining unit to an output unit which is a learned model learned from the virtual viewpoint path data to be training data and at least information specifying a position of an object to be input data corresponding to the virtual viewpoint path data and is configured to output virtual viewpoint data by receiving input of information for specifying a position of an object” are partially taught by Wang (As discussed above, Wang’s system uses a trained machine learning model to generate a personalized viewpoint path from a multi-view input dataset.  Further, Wang suggests, section 6.2.3, 
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang’s personal viewpoint navigation system to use processor(s) executing a stored program as taught by Somanath and to include Somanath’s space-time view interpolation technique to generate intermediate frames in time and space from the sequences generated by Wang’s machine learning models in order to improve performance by smoothing the generated sequences as suggested by Wang, section 6.2.3.  As noted above, smoothed video sequences correspond to virtual viewpoint image data generated on virtual viewpoint paths.
Regarding claim 2, the limitation “wherein the output unit outputs pieces of virtual viewpoint data corresponding to a plurality of sequential frames” is taught by Wang (Wang, e.g. section 4.3.2 teaches that for each frame, the best viewpoint is selected using a cut, i.e. a sequential subset of frames from the input video, with the frame at the center.  That is, for every frame of the entire video sequence, a best viewpoint is determined using one of the machine learning models.)
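For illustration only, the per-frame selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Wang's implementation: the names `centered_cut` and `best_viewpoints`, the `half_width` parameter, and the caller-supplied `score` function are all hypothetical, and the clamping of the cut at the ends of the video is an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def centered_cut(num_frames: int, center: int, half_width: int) -> range:
    """Indices of a cut centered on `center`, clamped to the video bounds (assumed behavior)."""
    start = max(0, center - half_width)
    stop = min(num_frames, center + half_width + 1)
    return range(start, stop)


def best_viewpoints(
    frames: Sequence[T],
    num_cameras: int,
    half_width: int,
    score: Callable[[Sequence[T], int], float],
) -> List[int]:
    """Pick one camera index per frame by scoring the cut around that frame.

    `score(cut, camera)` is a hypothetical stand-in for a learned model.
    """
    result = []
    for i in range(len(frames)):
        cut = [frames[j] for j in centered_cut(len(frames), i, half_width)]
        result.append(max(range(num_cameras), key=lambda cam: score(cut, cam)))
    return result


# Toy usage: frames hold a ball x-position; cameras sit at x = 0 and x = 10.
camera_x = [0.0, 10.0]


def toy_score(cut: Sequence[float], cam: int) -> float:
    mean_x = sum(cut) / len(cut)
    return -abs(mean_x - camera_x[cam])


path = best_viewpoints([0.0, 0.0, 0.0, 10.0, 10.0, 10.0], 2, 1, toy_score)
```

Every frame of the sequence receives a viewpoint, so the output covers a plurality of sequential frames.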
Regarding claim 4, the limitation “wherein the obtaining unit further obtains additional information for generating the virtual viewpoint path data, and the output unit outputs the virtual viewpoint path data based on the information and the additional information” is taught by Wang (Wang, e.g. section 4, paragraph 1, section 4.2, teaches that each object has an assigned type which is part of the input to the machine learning model for generating the virtual viewpoint path.  Further, Wang, section 4, paragraph 3, teaches that the trajectories are measured in a 2D field coordinate system, which is also part of the input to the machine learning model for generating the virtual viewpoint path.)
Regarding claim 5, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 4 above, i.e. as noted the information includes object type information, and field coordinate information.
Regarding claim 6, the limitation “wherein the obtaining unit further obtains a timecode for designating target time to generate the virtual viewpoint image data, and the generation unit generates the virtual viewpoint data based on the information of the frame corresponding to the timecode” is taught by Wang in view of Somanath (Wang, e.g. section 4.3.2 teaches that for each frame, the best viewpoint is selected using a cut, i.e. a sequential subset of frames from the input video, with the frame at the center.  That is, for every frame of the entire video sequence, a best viewpoint is determined using one of the machine learning models.  Further, Somanath, e.g. paragraphs 65-78, Fig 3A, teaches that a time t is associated with each frame, i.e. a timecode, and that the intermediate images are based on designating intermediate timecodes, i.e. as in paragraph 75, the intermediate timecode is specified as [t+(t+1)]/2, such that in the modified system, target timecodes between captured frame times are determined/obtained for generating the virtual viewpoint image data.)
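For illustration only, the intermediate timecode [t+(t+1)]/2 noted above can be computed for each pair of adjacent captured frames as follows. This is a minimal sketch, assuming captured frame times are given as a sorted list of numbers; the function name `midpoint_timecodes` is hypothetical and does not appear in Somanath.

```python
from typing import List, Sequence


def midpoint_timecodes(frame_times: Sequence[float]) -> List[float]:
    """Return the timecode halfway between each pair of adjacent captured frames."""
    return [(t0 + t1) / 2 for t0, t1 in zip(frame_times, frame_times[1:])]


# Captured frames at t = 0, 1, 2 give target timecodes between them.
targets = midpoint_timecodes([0, 1, 2])
```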
Regarding claim 9, the limitation “wherein the generation unit calculates an evaluation score evaluating the virtual viewpoint path according to the information when generating the virtual viewpoint data” is taught by Wang (Wang, section 5.4, teaches that the system can calculate an evaluation score measuring the concordance rate between the path generated by a machine learning model and the training data.)
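For illustration only, a concordance rate of the kind noted above can be sketched as the fraction of frames on which the generated path selects the same viewpoint as the reference path. The exact definition used in Wang, section 5.4, is not reproduced here; the function name `concordance_rate` and the equal-length requirement are assumptions.

```python
from typing import Sequence


def concordance_rate(generated: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames where the generated viewpoint matches the reference viewpoint."""
    if len(generated) != len(reference):
        raise ValueError("paths must cover the same number of frames")
    if not generated:
        raise ValueError("paths must not be empty")
    matches = sum(1 for g, r in zip(generated, reference) if g == r)
    return matches / len(generated)


# Three of four frames agree.
rate = concordance_rate([0, 0, 1, 1], [0, 1, 1, 1])
```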
Regarding claim 10, the limitation “wherein the output unit is an external apparatus connected via a network” is taught by Wang in view of Somanath (Somanath, e.g. paragraphs 35-37, teaches that the system may rely on a cloud based platform user interface, and communicate over a cloud network.  While not explicitly stated by Somanath, Official Notice is taken of the fact that it is old and well-known in the art of computer graphics to use a thin-client user interface with a remote server performing processing, analogous to Somanath’s cloud based platform user interface, in order to allow a client device lacking sufficient processing capability for a given application to provide the application to a user by offloading the processing to said server and/or to allow the server to provide the same functionality to multiple client devices.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang’s personal viewpoint navigation system, as modified to use processor(s) executing a stored program and to include Somanath’s space-time view interpolation technique, to use Somanath’s cloud based platform user interface as a thin-client interface for a cloud server performing processing because one of ordinary skill in the art would have known that it is conventional to use a thin-client/server system in order to allow a client device lacking sufficient processing capability for a given application to provide the application to a user by offloading the processing to said server and/or to allow the server to provide the same functionality to multiple client devices.  In the modified system, the cloud server accessed over the network would perform all the processing for Wang’s machine learning models, i.e. the claimed output unit generating the path data, allowing viewpoint path generation to be offloaded to the server from a client device lacking sufficient processing capability and/or allowing the server to provide viewpoint paths for multiple client devices.
Regarding claim 11, the limitation “wherein the generation unit includes the output unit” is taught by Wang in view of Somanath (Somanath teaches one computing embodiment would be a single computing device, e.g. paragraph 24, i.e. all the processing being performed by processors of a single computing device.  As noted in the claim 1 rejection, Wang does not discuss the details of the computing device used.)
Regarding claims 12 and 13, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 2 above.

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over “Personal Viewpoint Navigation Based on Object Trajectory Distribution for Multi-View Videos” by Xueting Wang, et al. (hereinafter Wang) in view of U.S. Patent Application Publication 2019/0066733 A1 (hereinafter Somanath) as applied to claim 1 above, and further in view of U.S. Patent 5,745,126 (hereinafter Jain).
Regarding claim 7, the limitation “wherein the information is three-dimensional model data representing a position of an object included in multi-viewpoint image data” is not explicitly taught by Wang (Wang, e.g. section 4, paragraph 3, section 5.1, teaches that the trajectories of the object positions are measured as 2D coordinate data rather than 3D coordinate data, but notes that trajectory reconstruction from video is not their focus.)  However, this limitation is taught by Jain (Jain, e.g. abstract, col 7, line 30 – col 9, line 16, describes a system of generating new viewpoints of objects within a multi-view video, which includes building an environment model including all potential objects of interest and their locations, e.g. col 17, line 51 – col 18, line 6, including in 3D, e.g. col 22, line 63 – col 24, line 56, and object trajectories/paths, e.g. figure 16.  It is further noted that Jain does not require the use of any laser tracking or manual labeling to extract player and ball positions from the multi-view video, in contrast to Wang, section 5.1.)
Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang’s personal viewpoint navigation system, as modified to use processor(s) executing a stored program and to include Somanath’s space-time view interpolation technique, to use Jain’s multi-view video object trajectory extraction technique to determine object trajectories without requiring laser tracking or manual labeling.  It is further noted that Jain’s extracted positions are 3D rather than 2D, which could improve the results of Wang’s machine learning models for sports where players or objects are not primarily on the ground plane as they are in Wang’s exemplary soccer videos, e.g. as in Jain’s example of American football, it is common for the ball to be thrown or kicked well above the ground plane, which is also common in sports such as baseball, basketball, volleyball, etc.  Finally, it is noted that while Wang does not address use of 3D coordinates, per se, it would only require modifying Wang’s machine learning models to use a 3D histogram (section 4.1.2) and perform 3D integration (section 4.2, paragraph 3), and while this may involve an increased processing requirement, as noted above, the results for sports which are not primarily on the ground plane could be substantially improved, thereby motivating one of ordinary skill in the art to modify Wang’s machine learning models to operate on Jain’s 3D object trajectories.
Regarding claim 8, the limitation “wherein the three-dimensional model data is three-dimensional model data of a foreground object” is taught by Wang in view of Jain (The objects for which Jain’s technique extracts trajectories are foreground objects, i.e. players and the ball, rather than the background objects such as the ground or surrounding static structures.  Similarly, Wang’s system only tracks foreground objects in the game, players and the ball.)

Allowable Subject Matter
Claim 3 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Dependent claim 3 requires that the virtual viewpoint path data is generated by interpolating between virtual viewpoint data output by the learned model for a plurality of non-sequential frames.  As discussed in the above claim 2 rejection, requiring output for sequential frames, Wang teaches that the machine learning model determines a best viewpoint for every frame of the video sequence, and Wang does not teach or otherwise suggest determining a viewpoint path for only a non-sequential subset of frames and interpolating between them.  As discussed in the claim 1 rejection above, Somanath teaches performing space-time view interpolation between frames of a multi-view video, but only discusses interpolating between sequential frames, i.e. for a time between a first frame at time t and a second frame at time t+1.  Further, the other cited prior art does not teach or suggest modifying the machine learning model to operate on a non-sequential subset of frames of the video and interpolate between the results thereof, such that the scope of the limitations of claim 3 taken together with the limitations of claim 1 is not anticipated or obvious in view of the cited prior art.
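For illustration only, the distinction drawn above can be sketched as follows: viewpoint data exists only for a non-sequential subset of frames, and the frames in between are filled by interpolation. This is a minimal sketch under stated assumptions and is not the applicant's or any cited reference's implementation: the name `interpolate_path`, the representation of a viewpoint as a 3D camera position, and the use of linear interpolation are all hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def interpolate_path(keyframes: Dict[int, Vec3], num_frames: int) -> List[Vec3]:
    """Fill a per-frame camera path from sparse keyframes by linear interpolation.

    Frames before the first or after the last keyframe hold the nearest keyframe value.
    """
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    indices = sorted(keyframes)
    path: List[Vec3] = []
    for frame in range(num_frames):
        if frame <= indices[0]:
            path.append(keyframes[indices[0]])
        elif frame >= indices[-1]:
            path.append(keyframes[indices[-1]])
        else:
            # Nearest keyframes at or before, and at or after, this frame.
            lo = max(i for i in indices if i <= frame)
            hi = min(i for i in indices if i >= frame)
            if lo == hi:
                path.append(keyframes[lo])
                continue
            w = (frame - lo) / (hi - lo)
            a, b = keyframes[lo], keyframes[hi]
            path.append((a[0] + w * (b[0] - a[0]),
                         a[1] + w * (b[1] - a[1]),
                         a[2] + w * (b[2] - a[2])))
    return path


# Viewpoints exist only for frames 0 and 4; frames 1-3 are interpolated.
camera_path = interpolate_path({0: (0.0, 0.0, 0.0), 4: (4.0, 8.0, 0.0)}, 6)
```

By contrast, interpolation between sequential frames only, i.e. between time t and time t+1, never skips a captured frame.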



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT BADER whose telephone number is (571)270-3335. The examiner can normally be reached 10-6 m-f.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT BADER/           Primary Examiner, Art Unit 2619