DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 1-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (U.S. Patent Application Publication 2020/0014961) in view of Ibrahim et al. (U.S. Patent Application Publication 2016/0101358) and further in view of Yee et al. (WO 2015/088719 A1).
Regarding claim 1, Ramaswamy et al. discloses a non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising: receiving first positional information of each of a plurality of players, the first positional information being identified based on first video information captured by cameras installed in a field where the plurality of players play a competition (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface)); acquiring second video information from a second camera that captures a video image of the competition (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); when accepting identification information of a specific player among the plurality of players, converting first positional information of the specific player when and after the identification information is accepted, to second positional information of the specific player in the second video information (Figs. 4 and 9A-9D; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0078] – in the embodiment of Figs. 9A-9D the center of the zoomed video 918 is selected based on the plurality of selected objects of interest – for example, a bounding box may be generated so as to encompass bounding boxes of all respective selected objects, and the zoomed video may be centered on the center of the generated bounding box – alternatively, the client may calculate a mean location from the locations of the respective zoomed images and use mean location as the center of the aggregate zoomed video 918 – the center of the aggregate zoomed video may be updated on a frame-by-frame basis); generating third video information on the specific player that is a partial area cut out from the second video information based on the second positional information of the specific player obtained by the conversion (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); and outputting the third video information (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  However, Ramaswamy et al. fails to explicitly disclose a plurality of cameras and the correspondence between the plurality of cameras and the specific player being tracked.
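For illustration only, the scale-and-crop step attributed above to paragraph [0050] of Ramaswamy et al. (a zoomed view generated as a partial area of a larger frame around a selected object) can be sketched as follows. The function name crop_window, the pixel-coordinate convention, and the clamping of the window to the frame edges are assumptions made for this sketch; they are not taken from the cited references or from the claims.

```python
def crop_window(center_x, center_y, crop_w, crop_h, frame_w, frame_h):
    """Return (left, top, right, bottom) of a crop_w x crop_h window.

    Hypothetical helper: the window is centered on (center_x, center_y)
    where possible and shifted so that it stays inside the frame.
    """
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop window is larger than the frame")
    # Top-left corner of a window centered on the requested position.
    left = int(round(center_x - crop_w / 2))
    top = int(round(center_y - crop_h / 2))
    # Shift the window back inside the frame if it crosses an edge.
    left = max(0, min(left, frame_w - crop_w))
    top = max(0, min(top, frame_h - crop_h))
    return left, top, left + crop_w, top + crop_h
```

Under these assumptions, a position near a frame edge still yields a full-size window, so the cut-out area keeps a constant size while following the selected object.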
Referring to the Ibrahim et al. reference, Ibrahim et al. discloses a non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising: a plurality of cameras, wherein one captures the field where the plurality of players plays a competition, and wherein a plurality of cameras captures plural pieces of partial video information from a plurality of cameras that capture video images of respective areas of the field (Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have used a plurality of cameras to track the players, as disclosed by Ibrahim et al., in the method disclosed by Ramaswamy et al. in order to track the players more efficiently and to be able to zoom in on players that are of interest to the user.  However, Ramaswamy et al. in view of Ibrahim et al. fails to explicitly disclose the correspondence between the plurality of cameras and the specific player being tracked.
Referring to the Yee et al. reference, Yee et al. discloses a non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising: the correspondence between the plurality of cameras and the specific player being tracked (Figs. 7A and 7B; paragraph [0004] – a focus point may be a coordinate pair in the video frame that is updated to follow the point of interest as it moves within the video frame – as such, a coordinate pair for a given frame of the video content may indicate a sub-frame within the given frame, such that the receiver can determine an appropriate area in each frame to zoom in on; paragraph [0063] – Figs. 7A and 7B illustrate a scenario in which an exemplary graphical user interface may be provided, which allows a user to select various viewing options from a GUI corresponding to different points of interest in a television program – specifically, Fig. 7A illustrates an exemplary GUI 710 for zooming in on different points of interest of a video stream; paragraph [0064] – the GUI 710 may have a first selection level 712 that is associated with metadata – for instance, the first selection level 712 may be associated with focal-point type metadata (e.g., cornerbacks, quarterback, running backs, coaches, band members, people in the stands); paragraph [0065] – the GUI 710 may have a second selection level 714 that may be associated with metadata – for instance, the second selection level 714 may be associated with focal-point metadata – in Fig. 7A, for example, the second selection level 714 is associated with focal-point metadata corresponding to points of interest (e.g., Player 1, Player 2); paragraph [0066] – in a further aspect, the GUI 710 may have a third selection level 716 that may be associated with metadata – for instance, the third selection level 716 may be associated with camera selection metadata – Fig. 7A illustrates different cameras placed around the field with reference numerals C1, C2, C8, C9, and C10 – in Fig. 7A, for example, the third selection level 716 is associated with camera selection metadata corresponding to different camera views of the event (e.g., camera 1, camera 2, camera 8, camera 9, and camera 10); paragraph [0067] – Fig. 7B illustrates the result of the three selection requests of Fig. 7A – as shown in Fig. 7B, Player 1 is centered or featured in a zoomed-in display from the view of camera 9; as can be seen from Fig. 7A, the positional information (metadata) of the specific player selected is used to determine which other cameras correspond to that position and specific player).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined the correspondence between the plurality of cameras and the specific player being tracked, as disclosed by Yee et al., in the method disclosed by Ramaswamy et al. in view of Ibrahim et al. in order to track the players more efficiently between camera views.
Regarding claim 2, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the third video information is a close-up video image of the specific player (Ramaswamy et al.: Fig. 4; paragraph [0049] – the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  
Regarding claim 3, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the second camera is a camera with a higher resolution than the first camera, and 49Attorney Docket No. 15195US01 the third video information cut out from the second video information of the second camera is information to be distributed to a terminal of a viewer of the competition (Ramaswamy et al.: Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  
Regarding claim 4, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the acquiring second video information includes: acquiring plural pieces of partial video information from a plurality of second cameras that capture video images of respective areas of the field, and generating the second video information from the plural pieces of partial video information (Ramaswamy et al.: Figs. 4 and 10; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0066] – in some embodiments, a client device may operate to construct zoomed streams by requesting selected spatial regions (e.g. slices) from a high-resolution video made available by a server – one such embodiment is illustrated in Fig. 10 – Fig. 10 illustrates a still from a high-resolution video (e.g., higher resolution than the original un-zoomed video displayed by the client device) – the high-resolution video in Fig. 10 is divided into 36 spatial regions (e.g. slices), numbered 1 through 36 – the content server provides metadata to the client indicating the location of the regions of interest within the high-resolution video – this location data may be provided as pixel coordinates of the regions of interest within the full video or as other parameters – a client device, in response to receiving a user selection of a region of interest, selects an appropriate slice or slices to retrieve from the high-resolution video; Ibrahim et al.: Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
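As a non-authoritative sketch of the slice selection summarized above from paragraph [0066] of Ramaswamy et al., the following shows one way a client could map the pixel coordinates of a region of interest to numbered slices. The 6 x 6 grid layout, the row-by-row numbering, and the function name slices_for_region are assumptions; the cited passage states only that the video is divided into 36 regions numbered 1 through 36.

```python
def slices_for_region(left, top, right, bottom, frame_w, frame_h, cols=6, rows=6):
    """Return the 1-based numbers of the grid slices that overlap a region.

    The region is given in pixel coordinates with right and bottom exclusive.
    Slices are assumed to be numbered row by row, left to right.
    """
    if not (0 <= left < right <= frame_w and 0 <= top < bottom <= frame_h):
        raise ValueError("region must be non-empty and inside the frame")
    slice_w = frame_w / cols
    slice_h = frame_h / rows
    # Grid columns and rows touched by the first and last pixel of the region.
    first_col = int(left // slice_w)
    last_col = int((right - 1) // slice_w)
    first_row = int(top // slice_h)
    last_row = int((bottom - 1) // slice_h)
    return [row * cols + col + 1
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For a 1920 x 1080 frame under these assumptions, a region that straddles the corner shared by the first four slices returns slices 1, 2, 7, and 8, which are the only slices the client would need to retrieve.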
Regarding claim 5, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claims 1 and 4 including that wherein the acquiring second video information further includes: correcting distortions of the plural pieces of partial video information, and generating the second video information from plural pieces of partial video information in which distortions are corrected (Ramaswamy et al.: Figs. 4 and 10; paragraph [0016] – scaling, alpha blending – image processing; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0066] – in some embodiments, a client device may operate to construct zoomed streams by requesting selected spatial regions (e.g. slices) from a high-resolution video made available by a server – one such embodiment is illustrated in Fig. 10 – Fig. 10 illustrates a still from a high-resolution video (e.g., higher resolution than the original un-zoomed video displayed by the client device) – the high-resolution video in Fig. 10 is divided into 36 spatial regions (e.g. slices), numbered 1 through 36 – the content server provides metadata to the client indicating the location of the regions of interest within the high-resolution video – this location data may be provided as pixel coordinates of the regions of interest within the full video or as other parameters – a client device, in response to receiving a user selection of a region of interest, selects an appropriate slice or slices to retrieve from the high-resolution video; Ibrahim et al.: Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
Regarding claim 6, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the specific player is a player related to an event that occurs in the competition (Ramaswamy et al.: in some embodiments, the user interface is operative to enable a user to map zoom buttons to selected objects of interest (e.g., the user’s favorite athletes); Ibrahim et al.: paragraph [0072] – game play can be determined by analyzing the activity of players within a zone of interest and selecting a gameplay scenario from a predetermined set of game play scenarios – the set of scenarios can include, for example, a faceoff, a breakaway, or player movement – a scenario can be selected based on a pattern of players or a sequence of events of players or groups of players in the zones of interest – once a scenario is selected, the PTZ of the broadcast camera is adjusted to provide the best view of game action according to the scenario).  
Regarding claim 7, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the converting includes calculating, every predetermined time period, average positional information by averaging plural pieces of second positional information included in a predetermined time period, wherein the generating includes generating different video information that is a partial area cut out from the second video information, in accordance with the average positional information (Ramaswamy et al.: Figs. 9A-9D; paragraph [0078] – in the embodiment of Figs. 9A-9D the center of the zoomed video 918 is selected based on the plurality of selected objects of interest – for example, a bounding box may be generated so as to encompass bounding boxes of all respective selected objects, and the zoomed video may be centered on the center of the generated bounding box – alternatively, the client may calculate a mean location from the locations of the respective zoomed images and use mean location as the center of the aggregate zoomed video 918 – the center of the aggregate zoomed video may be updated on a frame-by-frame basis).
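The two centering alternatives summarized above from paragraph [0078] of Ramaswamy et al. (the center of an enclosing bounding box, or a mean location), together with the per-period averaging recited in claim 7, can be illustrated with the following minimal sketch. All function names are hypothetical, and treating a "predetermined time period" as a fixed count of consecutive position samples is an assumption of this sketch, not a statement of what the claim or the reference requires.

```python
def bounding_box_center(boxes):
    """Center of the smallest box enclosing all (left, top, right, bottom) boxes."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[2] for b in boxes)
    bottom = max(b[3] for b in boxes)
    return ((left + right) / 2, (top + bottom) / 2)


def mean_location(points):
    """Arithmetic mean of a non-empty sequence of (x, y) positions."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def windowed_means(points, window):
    """Mean position of each consecutive, non-overlapping group of `window` samples.

    Assumption: one sample per frame, so `window` samples stand in for one
    predetermined time period; a final shorter group is averaged as-is.
    """
    return [mean_location(points[i:i + window])
            for i in range(0, len(points), window)]
```

The first two functions operate on several objects within one frame, whereas windowed_means averages one object's positions over time; the sketch keeps them separate so that the difference between the two kinds of averaging stays visible.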
Regarding claim 8, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein the first positional information is information indicating a three-dimensional position of each of the plurality of players in the field, and the second positional information is information indicating a two-dimensional position of each of the plurality of players in the second video information (Ramaswamy et al.: paragraph [0010] – the client also receives, over the network, information identifying at least one object of interest and a spatial position of the object of interest within the original video stream; Ibrahim et al.: paragraph [0049] – the output of the optical tracking module 206 consists of a 3-tuple of x and y position and size of blobs identified in the video – these can be referred to as the location or coordinates of the blobs – in other embodiments, the coordinates can include other parameters, such as a z position or orientation for example).
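For context on the claim 8 limitation, the conversion of a three-dimensional field position into a two-dimensional position within a video frame is commonly modeled as a pinhole-camera projection. The sketch below is offered only under that assumption: the pinhole model, the function name project_point, and every parameter are hypothetical, and nothing in the passages cited above states that either reference or the application uses this particular model.

```python
def project_point(point, rotation, translation, focal_px, principal_point):
    """Project a 3D field point to 2D pixel coordinates with a pinhole model.

    rotation is a 3x3 row-major matrix and translation a 3-vector that map
    field coordinates to camera coordinates; all values are hypothetical.
    """
    # Field coordinates -> camera coordinates: cam = R * point + t.
    cam = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
           for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then shift to the image's principal point.
    u = focal_px * cam[0] / cam[2] + principal_point[0]
    v = focal_px * cam[1] / cam[2] + principal_point[1]
    return (u, v)
```

With an identity rotation, a camera 10 units from the field plane, a focal length of 1000 pixels, and a principal point at (960, 540), the field point (1, 2, 0) maps to approximately pixel (1060, 740).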
Regarding claim 9, Ramaswamy et al. discloses a video image generation method executed by a computer, the video image generation method comprising: receiving first positional information of each of a plurality of players, the first positional information being identified based on first video information captured by cameras installed in a field where the plurality of players play a competition (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface)); acquiring second video information from a second camera that captures a video image of the competition (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); when accepting identification information of a specific player among the plurality of players, converting first positional information of the specific player when and after the identification information is accepted, to second positional information of the specific player in the second video information (Figs. 4 and 9A-9D; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0078] – in the embodiment of Figs. 9A-9D the center of the zoomed video 918 is selected based on the plurality of selected objects of interest – for example, a bounding box may be generated so as to encompass bounding boxes of all respective selected objects, and the zoomed video may be centered on the center of the generated bounding box – alternatively, the client may calculate a mean location from the locations of the respective zoomed images and use mean location as the center of the aggregate zoomed video 918 – the center of the aggregate zoomed video may be updated on a frame-by-frame basis); generating third video information on the specific player that is a partial area cut out from the second video information based on the second positional information of the specific player obtained by the conversion (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); and outputting the third video information (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  However, Ramaswamy et al. fails to explicitly disclose a plurality of cameras and the correspondence between the plurality of cameras and the specific player being tracked.
Referring to the Ibrahim et al. reference, Ibrahim et al. discloses a video image generation method executed by a computer, the video image generation method comprising: a plurality of cameras, wherein one captures the field where the plurality of players plays a competition, and wherein a plurality of cameras captures plural pieces of partial video information from a plurality of cameras that capture video images of respective areas of the field (Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have used a plurality of cameras to track the players, as disclosed by Ibrahim et al., in the method disclosed by Ramaswamy et al. in order to track the players more efficiently and to be able to zoom in on players that are of interest to the user.  However, Ramaswamy et al. in view of Ibrahim et al. fails to explicitly disclose the correspondence between the plurality of cameras and the specific player being tracked.
Referring to the Yee et al. reference, Yee et al. discloses a video image generation method executed by a computer, the video image generation method comprising: the correspondence between the plurality of cameras and the specific player being tracked (Figs. 7A and 7B; paragraph [0004] – a focus point may be a coordinate pair in the video frame that is updated to follow the point of interest as it moves within the video frame – as such, a coordinate pair for a given frame of the video content may indicate a sub-frame within the given frame, such that the receiver can determine an appropriate area in each frame to zoom in on; paragraph [0063] – Figs. 7A and 7B illustrate a scenario in which an exemplary graphical user interface may be provided, which allows a user to select various viewing options from a GUI corresponding to different points of interest in a television program – specifically, Fig. 7A illustrates an exemplary GUI 710 for zooming in on different points of interest of a video stream; paragraph [0064] – the GUI 710 may have a first selection level 712 that is associated with metadata – for instance, the first selection level 712 may be associated with focal-point type metadata (e.g., cornerbacks, quarterback, running backs, coaches, band members, people in the stands); paragraph [0065] – the GUI 710 may have a second selection level 714 that may be associated with metadata – for instance, the second selection level 714 may be associated with focal-point metadata – in Fig. 7A, for example, the second selection level 714 is associated with focal-point metadata corresponding to points of interest (e.g., Player 1, Player 2); paragraph [0066] – in a further aspect, the GUI 710 may have a third selection level 716 that may be associated with metadata – for instance, the third selection level 716 may be associated with camera selection metadata – Fig. 
7A illustrates different cameras placed around the field with reference numerals C1, C2, C8, C9, and C10 – in Fig. 7A, for example, the third selection level 716 is associated with camera selection metadata corresponding to different camera views of the event (e.g., camera 1, camera 2, camera 8, camera 9, and camera 10); paragraph [0067] – Fig. 7B illustrates the result of the three selection requests of Fig. 7A – as shown in Fig. 7B, Player 1 is centered or featured in a zoomed-in display from the view of camera 9; as can be seen from Fig. 7A, the positional information (metadata) of the specific player selected is used to determine which other cameras correspond to that position and specific player).
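For illustration only, the focus-point behavior cited from paragraph [0004] of Yee et al. (a coordinate pair that indicates a sub-frame to zoom in on) can be sketched as follows. This is a minimal sketch and not code from the reference: the function name sub_frame, the zoom parameter, and the clamping of the window to the frame edges are assumptions introduced here.

```python
def sub_frame(focus_x, focus_y, frame_w, frame_h, zoom=2.0):
    """Return (left, top, width, height) of a crop window around a focus point.

    Hypothetical helper: the window is 1/zoom of the frame in each dimension,
    centered on the focus point, and shifted as needed to stay inside the frame.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be at least 1.0")
    crop_w = int(frame_w / zoom)
    crop_h = int(frame_h / zoom)
    # Center the window on the focus point, then clamp it to the frame bounds.
    left = min(max(focus_x - crop_w // 2, 0), frame_w - crop_w)
    top = min(max(focus_y - crop_h // 2, 0), frame_h - crop_h)
    return left, top, crop_w, crop_h


# Recomputing the window for each frame lets the crop follow a moving focus point.
window = sub_frame(960, 540, 1920, 1080, zoom=2.0)
```

Under these assumptions, a focus point near a frame corner yields a window pinned to that corner rather than one extending outside the frame.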
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined the correspondence between the plurality of cameras and the specific player being tracked as disclosed by Yee et al. in the method disclosed by Ramaswamy et al. in view of Ibrahim et al. in order to more efficiently track the players between camera views.
Regarding claim 10, Ramaswamy et al. discloses a video image generation system comprising: 51Attorney Docket No. 15195US01 a first server that includes a first memory and a first processor coupled to the first memory (Figs. 3-5); and a second server that includes a second memory and a second processor coupled to the second memory (Figs. 3-5), wherein the first processor is configured to: acquire first video information from cameras installed in a field where a plurality of players play a competition, identify first positional information of each of the plurality of players, based on the first video information, and transmit first positional information of each of the plurality of players to the second server (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416), wherein the second processor is configured to: receive first positional information of each of the plurality of players from the first server, acquire second video information from a second camera that captures a video image of the competition (Fig. 4; paragraph [0049] – in the method of Fig. 
4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); when accepting identification information of a specific player among the plurality of players, convert first positional information of the specific player when and after the identification information is accepted, to second positional information of the specific player in the second video information (Figs. 4 and 9A-9D; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. 
from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0078] – in the embodiment of Figs. 9A-9D the center of the zoomed video 918 is selected based on the plurality of selected objects of interest – for example, a bounding box may be generated so as to encompass bounding boxes of all respective selected objects, and the zoomed video may be centered on the center of the generated bounding box – alternatively, the client may calculate a mean location from the locations of the respective zoomed images and use mean location as the center of the aggregate zoomed video 918 – the center of the aggregate zoomed video may be updated on a frame-by-frame basis); generate third video information on the specific player that is a partial area cut out from the second video information based on the second positional information of the specific player obtained by the conversion (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. 
from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416); and output the third video information (Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  However, Ramaswamy et al. fails to explicitly disclose a plurality of cameras and the correspondence between the plurality of cameras and the specific player being tracked.
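For illustration only, the two centering options cited from paragraph [0078] of Ramaswamy et al. (the center of a bounding box that encompasses all selected objects, or alternatively a mean location) can be sketched as follows. This is a minimal sketch and not code from the reference: the function names, the (left, top, right, bottom) box layout, and the use of each box's center as its "location" are assumptions introduced here.

```python
def union_center(boxes):
    """Center of the smallest box enclosing every (left, top, right, bottom) box."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[2] for b in boxes)
    bottom = max(b[3] for b in boxes)
    return ((left + right) / 2, (top + bottom) / 2)


def mean_center(boxes):
    """Mean of the individual box centers (assumed reading of 'mean location')."""
    centers = [((l + r) / 2, (t + b) / 2) for l, t, r, b in boxes]
    n = len(centers)
    return (sum(c[0] for c in centers) / n, sum(c[1] for c in centers) / n)


# Three hypothetical objects of interest in one frame; both centers would be
# recomputed for every frame to follow the objects.
selected = [(0, 0, 10, 10), (20, 0, 30, 10), (100, 40, 120, 60)]
enclosing = union_center(selected)
average = mean_center(selected)
```

The two options generally give different centers: the enclosing-box center keeps every selected object equally far from the edges, while the mean is pulled toward wherever most objects cluster.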
Referring to the Ibrahim et al. reference, Ibrahim et al. discloses a video image generation system comprising: a plurality of cameras, wherein one captures the field where the plurality of players plays a competition, and wherein a plurality of cameras captures plural pieces of partial video information from a plurality of cameras that capture video images of respective areas of the field (Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have used a plurality of cameras to track the players, as disclosed by Ibrahim et al., in the system disclosed by Ramaswamy et al. in order to track the players more efficiently and to be able to zoom in on players that are of interest to the user.  However, Ramaswamy et al. in view of Ibrahim et al. fails to explicitly disclose the correspondence between the plurality of cameras and the specific player being tracked.
Referring to the Yee et al. reference, Yee et al. discloses a video image generation system comprising: the correspondence between the plurality of cameras and the specific player being tracked (Figs. 7A and 7B; paragraph [0004] – a focus point may be a coordinate pair in the video frame that is updated to follow the point of interest as it moves within the video frame – as such, a coordinate pair for a given frame of the video content may indicate a sub-frame within the given frame, such that the receiver can determine an appropriate area in each frame to zoom in on; paragraph [0063] – Figs. 7A and 7B illustrate a scenario in which an exemplary graphical user interface may be provided, which allows a user to select various viewing options from a GUI corresponding to different points of interest in a television program – specifically, Fig. 7A illustrates an exemplary GUI 710 for zooming in on different points of interest of a video stream; paragraph [0064] – the GUI 710 may have a first selection level 712 that is associated with metadata – for instance, the first selection level 712 may be associated with focal-point type metadata (e.g., cornerbacks, quarterback, running backs, coaches, band members, people in the stands); paragraph [0065] – the GUI 710 may have a second selection level 714 that may be associated with metadata – for instance, the second selection level 714 may be associated with focal-point metadata – in Fig. 7A, for example, the second selection level 714 is associated with focal-point metadata corresponding to points of interest (e.g., Player 1, Player 2); paragraph [0066] – in a further aspect, the GUI 710 may have a third selection level 716 that may be associated with metadata – for instance, the third selection level 716 may be associated with camera selection metadata – Fig. 7A illustrates different cameras placed around the field with reference numerals C1, C2, C8, C9, and C10 – in Fig. 
7A, for example, the third selection level 716 is associated with camera selection metadata corresponding to different camera views of the event (e.g., camera 1, camera 2, camera 8, camera 9, and camera 10); paragraph [0067] – Fig. 7B illustrates the result of the three selection requests of Fig. 7A – as shown in Fig. 7B, Player 1 is centered or featured in a zoomed-in display from the view of camera 9; as can be seen from Fig. 7A, the positional information (metadata) of the specific player selected is used to determine which other cameras correspond to that position and specific player).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined the correspondence between the plurality of cameras and the specific player being tracked as disclosed by Yee et al. in the system disclosed by Ramaswamy et al. in view of Ibrahim et al. in order to more efficiently track the players between camera views.
Regarding claim 11, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 10 including that wherein the third video information is a close-up video image of the specific player (Ramaswamy et al.: Fig. 4; paragraph [0049] – the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).  
Regarding claim 12, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 10 including that wherein the second camera is a camera with a higher resolution than the first camera, and the third video information cut out from the second video information of the second camera is information to be distributed to a terminal of a viewer of the competition (Ramaswamy et al.: Fig. 4; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416).
Regarding claim 13, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 10 including that wherein the second processor is configured to acquire plural pieces of partial video information from a plurality of second cameras that capture video images of respective areas of the field, wherein the second processor is configured to generate the second video information from the plural pieces of partial video information (Ramaswamy et al.: Figs. 4 and 10; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0066] – in some embodiments, a client device may operate to construct zoomed streams by requesting selected spatial regions (e.g. slices) from a high-resolution video made available by a server – one such embodiment is illustrated in Fig. 10 – Fig. 10 illustrates a still from a high-resolution video (e.g., higher resolution than the original un-zoomed video displayed by the client device) – the high-resolution video in Fig. 10 is divided into 36 spatial regions (e.g. 
slices), numbered 1 through 36 – the content server provides metadata to the client indicating the location of the regions of interest within the high-resolution video – this location data may be provided as pixel coordinates of the regions of interest within the full video or as other parameters – a client device, in response to receiving a user selection of a region of interest, selects an appropriate slice or slices to retrieve from the high-resolution video; Ibrahim et al.: Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).  
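For illustration only, the slice selection cited from paragraph [0066] of Ramaswamy et al. (a high-resolution video divided into 36 numbered spatial regions, with the client retrieving the slice or slices covering a region of interest given as pixel coordinates) can be sketched as follows. This is a minimal sketch and not code from the reference: the 6 x 6 grid, the row-major numbering, the function name, and the exclusive right and bottom coordinates are assumptions introduced here, since the layout of Fig. 10 is not reproduced in this action.

```python
def slices_for_region(roi, frame_w, frame_h, cols=6, rows=6):
    """Return the 1-based slice numbers that overlap a region of interest.

    roi is (left, top, right, bottom) in pixels, with right and bottom exclusive.
    Slices are assumed to be numbered row by row, left to right, starting at 1.
    """
    left, top, right, bottom = roi
    slice_w = frame_w / cols
    slice_h = frame_h / rows
    # Grid columns and rows touched by the region, clamped to the grid.
    first_col = max(int(left // slice_w), 0)
    last_col = min(int((right - 1) // slice_w), cols - 1)
    first_row = max(int(top // slice_h), 0)
    last_row = min(int((bottom - 1) // slice_h), rows - 1)
    return [
        row * cols + col + 1
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A region straddling the corner shared by four slices of a 3840 x 2160 frame.
needed = slices_for_region((600, 300, 700, 400), 3840, 2160)
```

Under these assumptions, a region that lies entirely within one slice requires only that slice, while a region crossing a slice boundary requires every slice it touches.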
Regarding claim 14, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claims 10 and 13 including that wherein the second processor is further configured to: correct distortions of the plural pieces of partial video information, and generate the second video information from plural pieces of partial video information in which distortions are corrected (Ramaswamy et al.: Figs. 4 and 10; paragraph [0016] – scaling, alpha blending – image processing; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); paragraph [0050] – in response to the selection, the device scales and crops the un-zoomed video in step 408 to generate a low-resolution zoomed view – an example of a low-resolution zoomed view is illustrated in box 410 – in parallel, the client device in step 412 retrieves a high-resolution zoomed stream (e.g. from a content server over a network) – when a sufficient amount of the high-resolution zoomed stream has been received, the client performs a synchronized switchover in step 414 to display of the high-resolution zoomed stream – an example of a high-resolution zoomed view is illustrated in box 416; paragraph [0066] – in some embodiments, a client device may operate to construct zoomed streams by requesting selected spatial regions (e.g. slices) from a high-resolution video made available by a server – one such embodiment is illustrated in Fig. 10 – Fig. 10 illustrates a still from a high-resolution video (e.g., higher resolution than the original un-zoomed video displayed by the client device) – the high-resolution video in Fig. 10 is divided into 36 spatial regions (e.g. 
slices), numbered 1 through 36 – the content server provides metadata to the client indicating the location of the regions of interest within the high-resolution video – this location data may be provided as pixel coordinates of the regions of interest within the full video or as other parameters – a client device, in response to receiving a user selection of a region of interest, selects an appropriate slice or slices to retrieve from the high-resolution video; Ibrahim et al.: Fig. 1; paragraph [0047] – the tracking cameras 200 can be further configured so that they are each assigned to capture tracking video of a particular zone of interest – each of the tracking cameras 200’, 200’’, and 200’’’ are positioned to capture tracking video primarily in its assigned zone of interest – if the FOV of the cameras overlap, each camera can be individually calibrated so as to ignore motion within its field of view if that motion occurs outside the camera’s assigned zone of interest; paragraph [0051] – the first camera is a broadcast camera 106 which is a video camera positioned near the center of the field).
Regarding claim 15, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 10 including that wherein the first positional information is information indicating a three-dimensional position of each of the plurality of players in the field, and the second positional information is information indicating a two-dimensional position of each of the plurality of players in the second video information (Ramaswamy et al.: paragraph [0010] – the client also receives, over the network, information identifying at least one object of interest and a spatial position of the object of interest within the original video stream; Ibrahim et al.: paragraph [0049] – the output of the optical tracking module 206 consists of a 3-tuple of x and y position and size of blobs identified in the video – these can be referred to as the location or coordinates of the blobs – in other embodiments, the coordinates can include other parameters, such as a z position or orientation for example).
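For illustration only, the relationship between a three-dimensional position and a two-dimensional position within a video frame can be sketched with an ideal pinhole camera model. This is a minimal sketch under stated assumptions: none of the cited references is asserted to use this model, and the function name, the focal length in pixels, the principal point, and the premise that the point is already expressed in the camera's coordinate frame (with no lens distortion) are all introduced here.

```python
def project_point(point_3d, focal_px, principal_point):
    """Project a camera-frame 3D point (x, y, z) to 2D pixel coordinates.

    Assumes an ideal pinhole camera: z is the distance along the optical axis,
    focal_px is the focal length in pixels, principal_point is (cx, cy).
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    cx, cy = principal_point
    # Perspective division scales x and y by focal length over depth.
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return (u, v)


# A hypothetical player 10 units in front of the camera, slightly right and below center.
pixel = project_point((1.0, 0.5, 10.0), 1000.0, (960.0, 540.0))
```

A practical system would also need the transformation from field coordinates to the camera's coordinate frame, which this sketch deliberately leaves out.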
Regarding claim 16, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 10 including that wherein second positional information is bird’s-eye view information (Ramaswamy et al.: Figs. 2A, 4, and 10 – bird’s-eye view; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); Yee et al.: Fig. 7A – bird’s-eye view).
Regarding claim 17, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 1 including that wherein second positional information is bird’s-eye view information (Ramaswamy et al.: Figs. 2A, 4, and 10 – bird’s-eye view; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); Yee et al.: Fig. 7A – bird’s-eye view).
Regarding claim 18, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 9 including that wherein the first positional information is information indicating a three-dimensional position of each of the plurality of players in the field, and the second positional information is information indicating a two-dimensional position of each of the plurality of players in the second video information (Ramaswamy et al.: paragraph [0010] – the client also receives, over the network, information identifying at least one object of interest and a spatial position of the object of interest within the original video stream; Ibrahim et al.: paragraph [0049] – the output of the optical tracking module 206 consists of a 3-tuple of x and y position and size of blobs identified in the video – these can be referred to as the location or coordinates of the blobs – in other embodiments, the coordinates can include other parameters, such as a z position or orientation for example).
Regarding claim 19, Ramaswamy et al. in view of Ibrahim et al. in view of Yee et al. disclose all of the limitations as previously discussed with respect to claim 9 including that wherein second positional information is bird’s-eye view information (Ramaswamy et al.: Figs. 2A, 4, and 10 – bird’s-eye view; paragraph [0049] – in the method of Fig. 4, the client device in step 402 displays available objects of interest in an un-zoomed video - the device receives in step 404 a user selection of an object of interest (e.g. through a touch-screen interface); Yee et al.: Fig. 7A – bird’s-eye view).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEATHER R JONES whose telephone number is (571)272-7368. The examiner can normally be reached Mon. - Fri.: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached on (571)272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HEATHER R JONES/
Primary Examiner, Art Unit 2481
October 8, 2022