DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
Response to Arguments
3.	Applicant’s arguments with respect to the rejections of amended claims 1-20 under 35 U.S.C. 103 have been fully considered but are moot in view of the new grounds of rejection as described below. 
	
Response to Amendment
Claim Rejections - 35 USC § 103
4.	The text of those sections of Title 35, U.S. Code not included in this section can be found in a prior Office action.
5.	Claims 1, 9, 11, 17, and 19-20 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian et al. (US Publication 2013/0051767, hereinafter Soroushian) in view of Ojala (US Patent 11,272,237).
Regarding claim 1, Soroushian discloses a method comprising: 
(a) at a video encoding server, receiving an input video file comprising at least an input video stream (V0) having an input video resolution (R0) comprising an input width in pixels (W0) and an input length in pixels (L0) (Soroushian, para’s 0005-0028, fig. 5, receiving multimedia content having primary resolution comprising pixel width and pixel length); 
(b) at said video encoding server, generating from said input video stream (V0) a first generated video stream (V1), which is a downscaled and non-cropped version of an entire field-of-view of said input video stream (V0), wherein the first generated video stream (V1) has a first video resolution (R1) that is smaller than the input video resolution (R0), wherein the first video resolution (R1) has a width in pixels (W1) that is smaller than the input width in pixels (W0), wherein the first video resolution (R1) has a length in pixels (L1) that is smaller than the input length in pixels (L0) (Soroushian, para’s 0005-0028, 0057, fig. 5, source encoder generates a first generated video stream (V1) of a plurality of alternative streams having different resolution and same aspect ratio as the primary content; for example 1920X1080 resolution is downscaled to a non-cropped 1280X720 pixel resolution);
(c) at said video encoding server, generating from said input video stream (V0) a second generated video stream (V2) (Soroushian, para’s 0005-0028, 0057, fig. 5, source encoder generates a second generated video stream (V2) of the plurality of alternative streams having different resolution and same aspect ratio as the primary content);
(d) at said video encoding server, generating a streams manifest file, comprising at least: (i) a first pointer which points to a first storage address that stores the first generated video stream (V1), and also (ii) a second pointer which points to a second storage address that stores the second generated video stream (V2) (Soroushian, para. 0058, source encoding server generates a top level index, i.e., generates a manifest file containing a plurality of container files pointing to the plurality of alternative streams including first stream V1 and second stream V2. Alternative streams are streams that encode the same media content in different ways including but not limited to different maximum bitrates, resolutions and/or different frame rates, and the same aspect ratio as the aspect ratio of the source video. In addition to Soroushian’s disclosure above, it is noted that providing a manifest/index file comprising information on one or more video streams or video portions of such video streams, including references, e.g., URLs, to the one or more video streams or video portions of such video streams, is also well known in the art as evidenced by Okerman et al., WIPO Publication WO 2021213831, page 4, para. 11 and page 14, para. 53);
wherein said streams manifest file is provided by said video encoding server, enables a video playback unit to dynamically transition, during video playback and in response to a user command, from (i) playback of the first generated video stream (V1) that is a downscaled version of the entire field-of-view of the input video stream, to (ii) playback of the second video stream (V2) having a different bit rate, resolution, and/or frame rate (Soroushian, para’s 0058 and 0062, the top level index file can also be dynamically generated in response to a request for a specific piece of content by a playback device; the adaptive bitrate streaming system enables switching between different video streams encoded at different bitrates and resolutions depending on streaming conditions. In addition to Soroushian’s disclosure above, it is noted that dynamically switching, based on a request, from a first video content to a second video content described in a manifest file is also well known in the art as evidenced by Oyman, WIPO Publication WO 2018/044338, page 34, example 74, claim 1, based on a request, dynamically switch from a first video content to a second video content described in a manifest file).
 Soroushian does not explicitly disclose but Ojala discloses:
(c) at said video encoding server, generating from said input video stream (V0) the second generated video stream (V2), which is a non-downscaled cropped region of only a partial field-of-view of said input video stream (V0), wherein the second generated video stream (V2) has said first video resolution (R1) that is smaller than the input video resolution (R0); wherein the second video stream (V2) tracks an object-of-interest that is visually depicted within said partial field-of-view in said input video stream (V0) (Ojala, col. 8, lines 12-45, smart TV 207 may crop a target and a region of interest (or spatial portion) from the video stream and enlarge the image. The new cropped and zoomed video stream may be transmitted to other connected user devices. The individual users may select an object to track or specify a desired region of interest to show on user devices. The proximity of the viewers to user devices may create an effective zoom effect. That is, displaying a cropped image on a closer device creates a perceived zoom effect, even if the resolution is the same. However, the cropped video portion may have a lower resolution or higher resolution than the resolution of the video stream (e.g., the same resolution as the first generated video stream V1), depending on the display capabilities of user devices; col. 8, lines 24-45, the new cropped video stream is a non-scaled cropped region of only a partial field-of-view of, or a different viewpoint from, said video stream; the new cropped video stream tracks an object that appears in the partial field of view of the full video stream).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ojala’s features into Soroushian’s invention to enhance the user’s playback experience by providing portions of a full source video for tracking an object without loss of image quality.
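For illustration only, and forming no part of the disclosure of Soroushian or Ojala, the distinction relied upon above between a downscaled, non-cropped stream (V1) and a non-downscaled, cropped stream (V2) that share the same resolution (R1) may be sketched as follows. The names Resolution, downscale, crop_window and build_manifest, the 2/3 scale factor, and the example.invalid addresses are hypothetical assumptions chosen to match the 1920X1080 to 1280X720 example cited above.

```python
# Illustrative sketch only; every name, size and URL here is hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    width: int   # W, in pixels
    length: int  # L, in pixels (the claims' term for the vertical dimension)


def downscale(src: Resolution, num: int, den: int) -> Resolution:
    """V1: the entire field-of-view resampled onto a smaller pixel grid."""
    return Resolution(src.width * num // den, src.length * num // den)


def crop_window(src: Resolution, target: Resolution, cx: int, cy: int):
    """V2: a window of original pixels centred near (cx, cy).

    The window is clamped so it stays inside the source frame; no pixel is
    resampled. Returns (left, top, width, length).
    """
    left = min(max(cx - target.width // 2, 0), src.width - target.width)
    top = min(max(cy - target.length // 2, 0), src.length - target.length)
    return (left, top, target.width, target.length)


def build_manifest(v1_address: str, v2_address: str) -> dict:
    """A streams manifest holding one pointer per generated stream."""
    return {"V1": v1_address, "V2": v2_address}


r0 = Resolution(1920, 1080)          # input resolution R0
r1 = downscale(r0, 2, 3)             # first video resolution R1 (1280 x 720)
v2_box = crop_window(r0, r1, 1800, 100)
manifest = build_manifest("https://example.invalid/v1.mp4",
                          "https://example.invalid/v2.mp4")
```

Under these assumptions both streams are 1280 pixels wide and 720 pixels long, yet V1 depicts the whole scene while V2 depicts only the clamped window of original pixels.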

Regarding claim 9, Soroushian-Ojala discloses the method of claim 1, wherein the method comprises:
tracking a plurality of objects-of-interest within said input video stream (Ojala, fig. 5, tracking multiple objects 502a and 502b in the video);
generating a plurality of secondary video streams; wherein each one of the secondary video streams tracks a single object-of-interest that appears in the input video stream and that moves within the input video stream; wherein each one of the secondary video streams has an area, in pixels, that is smaller relative to the area in pixels of the input video stream (Ojala, fig. 13, col. 18, lines 55-64, generate a plurality of tailored video content; each tailored stream tracks an object appearing in the input video and has a cropped box smaller in pixel area than the input video).
	The motivation and obviousness arguments are the same as claim 1.

Regarding claim 11, Soroushian discloses a method comprising: 
(a) receiving at a video playback device, a streams manifest file of a video (Soroushian, para’s 0058-0060, playback devices use HTTP or another appropriate stateless protocol to request and receive a top-level index file and the container files);
wherein the streams manifest file comprises at least:
(i) a first pointer to a first storage address of a first video stream (V1) depicting a full field-of-view of a video scene; (ii) a second pointer to a second storage address of a second video stream (V2) (Soroushian, para’s 0005-0028, 0058, fig. 5, the top-level index includes a plurality of container files pointing to the plurality of alternative streams including first alternative stream V1 and second alternative stream V2; each alternative stream is encoded from the same media content in different ways including but not limited to different maximum bitrates, resolutions and/or different frame rates, and the same aspect ratio as  the aspect ratio of the source video; the top-level index obviously can include an encoded full-view video 1920X1080 pixel resolution and an encoded full-view downscaled version of the 1920X1080 pixel resolution);
(b) playing the first video stream (V1) on said video playback device (Soroushian, para. 0060, initiate playback of a video, the first alternative video stream V1);
(c) in response to a zoom-in command received at a particular time-point (T) during playback of the first video stream, transitioning from playing the first video stream (V1) on said video playback device to playing said second video stream (V2) on said video playback device from time-point T of said second video stream and onward (Soroushian, para’s 0058 and 0062, the top level index file can also be dynamically generated in response to a request for a specific piece of content by a playback device; the adaptive bitrate streaming system enables switching between different video streams depending on streaming conditions; switching between different video streams obviously can be performed at a certain time during playback. Dynamically switching, based on a request, i.e., a zoom-in command, from a first video content to a second video content described in a manifest file is well known in the art as evidenced by Oyman, WIPO Publication WO 2018/044338, page 34, example 74, claim 1, based on a request, dynamically switch from a first video content to a second video content described in a manifest file, and Pham et al., US Patent 10,897,637, col. 2-4, fig’s 4 through 7, claims 1 through 5, playback or present one or more synchronized content streams based on the modified manifest file and a current time provided by the user device such that the application can buffer presentation of a portion of the received content streams using the reference time information, the capture times, and the current time. The synchronization feature may be implemented to modify a user interface to present, in a synchronized fashion, one or more content streams for a single event that are synchronized where each content stream may be from a different content provider and represent a different point of view or perspective of the event; col. 15, lines 49-67, user may prefer to control when to switch between the plurality of content streams being presented synchronously. 
The synchronization feature described herein also includes providing a visual or audio indicator of a particular content stream within the user interface presenting the multiple content streams to indicate to a user information about a more relevant content stream as determined by the attributes. A user may choose to respond to the indicator by using an input/output device to interact with the particular content stream that has a visual indicator of relevancy or audio indicator of relevancy or choose not to respond as they are content with their current user interface presentation of the multiple content streams).
Soroushian does not explicitly disclose but Ojala discloses:
the second video stream (V2) depicting a non-downscaled cropped version of said video scene; wherein the first video stream and the second video stream have same video resolution measured in pixels (Ojala, col. 8, lines 12-45, smart TV 207 may crop a target and a region of interest (or spatial portion) from the video stream and enlarge the image. The new cropped and zoomed video stream may be transmitted to other connected user devices. The individual users may select an object to track or specify a desired region of interest to show on user devices. The proximity of the viewers to user devices may create an effective zoom effect. That is, displaying a cropped image on a closer device creates a perceived zoom effect, even if the resolution is the same; col. 8, lines 24-45, the new cropped video stream is a non-downscaled cropped region of only a partial field-of-view of said video stream, and having the same quality or resolution as the video stream). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ojala’s features into Soroushian’s invention to enhance the user’s playback experience by providing zoom-in portions of a full source video for tracking an object without loss of image quality. 
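For illustration only, and forming no part of the disclosure of any cited reference, the playback-side behavior recited in steps (a) through (c) of claim 11 may be sketched as follows. The Player class, its method names, and the example.invalid addresses are hypothetical assumptions; the sketch merely shows a device that holds a two-pointer manifest, plays V1, and on a zoom-in command at time-point T continues with V2 from the same time-point.

```python
# Illustrative sketch only; the class, methods and URLs are hypothetical.
from dataclasses import dataclass


@dataclass
class Player:
    manifest: dict            # stream name -> storage address
    current: str = "V1"       # stream currently being played
    position_s: float = 0.0   # playback position, in seconds

    def play(self, seconds: float) -> None:
        """Advance playback of the current stream."""
        self.position_s += seconds

    def on_zoom_in(self):
        """Switch to V2 and resume from the time-point T of the command."""
        t = self.position_s
        self.current = "V2"
        # The position is deliberately left unchanged, so V2 continues
        # from T onward rather than restarting at zero.
        return (self.manifest["V2"], t)


player = Player({"V1": "https://example.invalid/v1.mp4",
                 "V2": "https://example.invalid/v2.mp4"})
player.play(12.5)                 # step (b): play V1 up to T = 12.5 s
address, t = player.on_zoom_in()  # step (c): zoom-in command at T
```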

Regarding claim 17, Soroushian-Ojala discloses the method of claim 11, wherein the method comprises:
receiving at said video playback device, said streams manifest file which points to said first video stream and to a plurality of secondary video streams (Soroushian, para’s 0058-0060, fig. 5, playback devices use HTTP or another appropriate stateless protocol to request and receive a top-level index file and the container files via a network 102; the top-level index includes a plurality of container files pointing to the plurality of alternative streams including first stream V1 and second stream V2; the top-level index obviously can include the full field-of-view video and the full-view downscaled versions of 1920X1080 pixel resolution),
wherein each one of the secondary video streams tracks a single object-of-interest that appears in the first video stream and that moves within the first video stream (Ojala, fig. 5, each tailored video content can track one of the objects 502a and 502b that appears in the video);
wherein each one of the secondary video streams has an area, in pixels, that is smaller relative to the area in pixels of the first video stream (Ojala, fig. 13, col. 18, lines 55-64, generate a plurality of tailored video content; each tailored stream tracks an object appearing in the input video and has a cropped box smaller in pixel area than the input video).
The motivation and obviousness arguments are the same as claim 11.

Regarding claims 19 and 20, these claims comprise limitations substantially the same as claims 1 and 11, respectively; therefore, they are rejected for the same rationale. Soroushian-Ojala further discloses one or more hardware processors to execute code, operably associated with one or more memory units to store code (see Soroushian, para. 0030, processor and memory).
	
6.	Claims 1, 9, and 19 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian et al. (US Publication 2013/0051767, hereinafter Soroushian) in view of Wan et al. (US Publication 2010/0074341, hereinafter Wan), and further in view of Sundaresan et al. (US Publication 2019/0114804, hereinafter Sundaresan).
Regarding claim 1, Soroushian discloses a method comprising: 
(a) at a video encoding server, receiving an input video file comprising at least an input video stream (V0) having an input video resolution (R0) comprising an input width in pixels (W0) and an input length in pixels (L0) (Soroushian, para’s 0005-0028, fig. 5, receiving multimedia content having primary resolution comprising pixel width and pixel length); 
(b) at said video encoding server, generating from said input video stream (V0) a first generated video stream (V1), which is a downscaled and non-cropped version of an entire field-of-view of said input video stream (V0), wherein the first generated video stream (V1) has a first video resolution (R1) that is smaller than the input video resolution (R0), wherein the first video resolution (R1) has a width in pixels (W1) that is smaller than the input width in pixels (W0), wherein the first video resolution (R1) has a length in pixels (L1) that is smaller than the input length in pixels (L0) (Soroushian, para’s 0005-0028, 0057, fig. 5, source encoder generates a first generated video stream (V1) of a plurality of alternative streams having different resolution and same aspect ratio as the primary content; for example 1920X1080 resolution is downscaled to a non-cropped 1280X720 pixel resolution);
(c) at said video encoding server, generating from said input video stream (V0) a second generated video stream (V2) (Soroushian, para’s 0005-0028, 0057, fig. 5, source encoder generates a second generated video stream (V2) of the plurality of alternative streams having different resolution and same aspect ratio as the primary content);
(d) at said video encoding server, generating a streams manifest file, comprising at least: (i) a first pointer which points to a first storage address that stores the first generated video stream (V1), and also (ii) a second pointer which points to a second storage address that stores a second generated video stream (V2) (Soroushian, para’s 0004 and 0058, source encoding server generates a top level index pointing to a plurality of alternative streams, i.e., generates a manifest file containing a plurality of container files pointing to the plurality of alternative streams including first stream V1 and second stream V2. Alternative streams are streams that encode the same media content in different ways including but not limited to different maximum bitrates, resolutions and/or different frame rates, and the same aspect ratio corresponding to the aspect ratio of the source video. In addition to Soroushian’s disclosure above, it is noted that providing a manifest/index file comprising information on one or more video streams or video portions of such video streams, including references, e.g., URLs, to the one or more video streams or video portions of such video streams, is also well known in the art as evidenced by Okerman et al., WIPO Publication WO 2021213831, page 4, para. 11 and page 14, para. 53);
wherein said streams manifest file is provided by said video encoding server, enables a video playback unit to dynamically transition, during video playback and in response to a user command, from (i) playback of the first generated video stream (V1) that is a downscaled version of the entire field-of-view of the input video stream, to (ii) playback of the second video stream (V2) of the plurality of alternative streams having a different bit rate, resolution, and/or frame rate (Soroushian, para’s 0058 and 0062, the top level index file can also be dynamically generated in response to a request for a specific piece of content by a playback device; the adaptive bitrate streaming system enables switching between different video streams encoded at different bitrates and resolutions depending on streaming conditions. In addition to Soroushian’s disclosure above, it is noted that dynamically switching, based on a request, from a first video content to a second video content described in a manifest file is also well known in the art as evidenced by Oyman, WIPO Publication WO 2018/044338, page 34, example 74, claim 1, based on a request, dynamically switch from a first video content to a second video content described in a manifest file).
 Soroushian does not explicitly disclose:
(c) at said video encoding server, generating from said input video stream (V0) the second generated video stream (V2), which is a non-downscaled cropped region of only a partial field-of-view of said input video stream (V0), wherein the second generated video stream (V2) has said first video resolution (R1) that is smaller than the input video resolution (R0); wherein the second video stream (V2) tracks an object-of-interest that is visually depicted in said input video stream (V0);
Wan discloses: 
(c) at said video encoding server, generating from said input video stream (V0) the second generated video stream (V2), which is a non-downscaled cropped region of only a partial field-of-view of said input video stream (V0), wherein the second generated video stream (V2) has said first video resolution (R1) that is smaller than the input video resolution (R0) (Wan, para’s 0034-0035, 0043, 0150-0158, scalability pre-processor 210 may be operable to form HD resolution video content by cropping at least a portion of received HD resolution source video content. For example, the HD resolution video content may be formed by subtracting the cropped video content from the received HD resolution source video content; para. 0057, scalable encoder 114 may be enabled to crop the received video content to form multiple resolution video layers comprising a base video layer and one or more enhancement video layers having different resolutions (e.g., including the same resolution as the first video resolution R1). The formed HD resolution video content is a non-scaled cropped region of only a partial field-of-view of said HD resolution source video content. In addition to Wan’s disclosure, it is noted that generating a non-downscaled cropped video version from an original video source is also well known in the art as evidenced by Wang et al., US Patent 11,233,920, fig’s 2 and 3, generating a video stream which is a non-downscaled cropped version of an original video source).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate either Wan’s features or the well-known technique in the art into Soroushian’s invention to enhance the user’s playback experience by providing focused portions of a full source video without loss of image quality.
Soroushian-Wan discloses generating the second generated video stream (V2), which is a non-downscaled cropped region of only a partial field-of-view of said input video stream (V0) as described above, but does not explicitly disclose wherein the second video stream (V2) tracks an object-of-interest that is visually depicted within said partial field-of-view.
Sundaresan discloses wherein the second video stream (V2) tracks an object-of-interest that is visually depicted within said partial field-of-view (Sundaresan, para. 0106, a cropped region from an input video frame and around one or more ROIs can be used to track an object). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sundaresan’s features into Soroushian-Wan’s invention to enhance the user’s playback experience by providing focused portions of a full source video for tracking an object of interest.

Regarding claim 9, Soroushian-Wan-Sundaresan discloses the method of claim 1, wherein the method comprises:
tracking a plurality of objects-of-interest within said input video stream (Sundaresan, fig. 7a-7c, para. 0133,  SSD detector matches objects with default boxes of different aspect ratios (shown as dashed rectangles in FIG. 7B and FIG. 7C). Each element of the feature map has a number of default boxes associated with it);
generating a plurality of secondary video streams; wherein each one of the secondary video streams tracks a single object-of-interest that appears in the input video stream and that moves within the input video stream; wherein each one of the secondary video streams has an area, in pixels, that is smaller relative to the area in pixels of the input video stream (Wan, para. 0057, scalable encoder 114 may be enabled to crop the received video content to form multiple resolution video layers comprising a base video layer and one or more enhancement video layers having different resolutions (e.g., including the same resolution as the first video resolution R1). The formed HD resolution video content is a non-scaled cropped region of only a partial field-of-view of said HD resolution source video content; Sundaresan, fig. 7a-7c, para. 0133, SSD detector matches objects with default boxes of different aspect ratios (shown as dashed rectangles in FIG. 7B and FIG. 7C)).
The motivation and obviousness arguments are the same as claim 1.

Regarding claim 19, this claim comprises limitations substantially the same as claim 1; therefore, it is rejected for the same rationale. Soroushian-Wan-Sundaresan further discloses one or more hardware processors to execute code, operably associated with one or more memory units to store code (see Soroushian, para. 0030, processor and memory).
	
7.	Claims 11 and 20 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian et al. (US Publication 2013/0051767, hereinafter Soroushian) in view of Wan et al. (US Publication 2010/0074341, hereinafter Wan).
Regarding claim 11, Soroushian discloses a method comprising: 
(a) receiving at a video playback device, a streams manifest file of a video (Soroushian, para’s 0058-0060, playback devices use HTTP or another appropriate stateless protocol to request and receive a top-level index file and the container files);
wherein the streams manifest file comprises at least:
(i) a first pointer to a first storage address of a first video stream (V1) depicting a full field-of-view of a video scene; (ii) a second pointer to a second storage address of a second video stream (V2) (Soroushian, para’s 0005-0028, 0058, fig. 5, the top-level index includes a plurality of container files pointing to the plurality of alternative streams including first alternative stream V1 and second alternative stream V2; each alternative stream is encoded from the same media content in different ways including but not limited to different maximum bitrates, resolutions and/or different frame rates, and the same aspect ratio as  the aspect ratio of the source video; the top-level index obviously can include an encoded full-view video 1920X1080 pixel resolution and an encoded full-view downscaled version of the 1920X1080 pixel resolution);
(b) playing the first video stream (V1) on said video playback device (Soroushian, para. 0060, initiate playback of a video, the first alternative video stream V1);
(c) in response to a zoom-in command received at a particular time-point (T) during playback of the first video stream, transitioning from playing the first video stream (V1) on said video playback device to playing said second video stream (V2) on said video playback device from time-point T of said second video stream and onward (Soroushian, para’s 0058 and 0062, the top level index file can also be dynamically generated in response to a request for a specific piece of content by a playback device; the adaptive bitrate streaming system enables switching between different video streams depending on streaming conditions; switching between different video streams obviously can be performed at a certain time during playback. Dynamically switching, based on a request, i.e., a zoom-in command, from a first video content to a second video content described in a manifest file is well known in the art as evidenced by Oyman, WIPO Publication WO 2018/044338, page 34, example 74, claim 1, based on a request, dynamically switch from a first video content to a second video content described in a manifest file, and Pham et al., US Patent 10,897,637, col. 2-4, fig’s 4 through 7, claims 1 through 5, playback or present one or more synchronized content streams based on the modified manifest file and a current time provided by the user device such that the application can buffer presentation of a portion of the received content streams using the reference time information, the capture times, and the current time. The synchronization feature may be implemented to modify a user interface to present, in a synchronized fashion, one or more content streams for a single event that are synchronized where each content stream may be from a different content provider and represent a different point of view or perspective of the event; col. 15, lines 49-67, user may prefer to control when to switch between the plurality of content streams being presented synchronously. 
The synchronization feature described herein also includes providing a visual or audio indicator of a particular content stream within the user interface presenting the multiple content streams to indicate to a user information about a more relevant content stream as determined by the attributes. A user may choose to respond to the indicator by using an input/output device to interact with the particular content stream that has a visual indicator of relevancy or audio indicator of relevancy or choose not to respond as they are content with their current user interface presentation of the multiple content streams).
Soroushian does not explicitly disclose but Wan discloses:
the second video stream (V2) depicting a non-downscaled cropped version of said video scene; wherein the first video stream and the second video stream have same video resolution measured in pixels (Wan, para’s 0034-0035, 0043, 0150-0158, scalability pre-processor 210 may be operable to form HD resolution video content by cropping at least a portion of received HD resolution source video content. For example, the HD resolution video content may be formed by subtracting the cropped video content from the received HD resolution source video content; para. 0057, scalable encoder 114 may be enabled to crop the received video content to form multiple resolution video layers comprising a base video layer and one or more enhancement video layers having different resolutions (e.g., including the same resolution as the first video resolution R1). The formed HD resolution video content is a non-scaled cropped region of only a partial field-of-view of said HD resolution source video content. In addition to Wan’s disclosure, it is noted that generating a non-downscaled cropped video version from an original video source is also well known in the art as evidenced by Wang et al., US Patent 11,233,920, fig’s 2 and 3, generating a video stream which is a non-downscaled cropped version of an original video source). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wan’s features into Soroushian’s invention to enhance the user’s playback experience by providing zoom-in portions of a full source video for tracking an object without loss of image quality.

Regarding claim 20, this claim comprises limitations substantially the same as claim 11; therefore, it is rejected for the same rationale. Soroushian-Wan further discloses one or more hardware processors to execute code, operably associated with one or more memory units to store code (see Soroushian, para. 0030, processor and memory).

8.	Claims 2-7 and 12-15 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian-Ojala, as applied to claims 1 and 11 above, in view of Katz et al. (US Publication 2018/0082152, hereinafter Katz).
Regarding claim 2, Soroushian-Ojala discloses the method of claim 1.
Soroushian-Ojala does not explicitly disclose but Katz discloses:
wherein step (c) comprises:
performing a computer vision analysis of said input video stream (V0), and recognizing an object-of-interest that is visually depicted in said input video stream (V0) (Katz, para. 0109, identifying object using computer vision algorithm). 
tracking in-frame locations of said object-of-interest across multiple frames of said input video stream (V0) (Katz et al, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

Regarding claim 3, Soroushian-Ojala-Katz discloses the method of claim 2, wherein step (c) further comprises:
cropping original non-downscaled frames of said input video stream (V0), into cropped frames that are composed to form the second video stream (V2); wherein each cropped frame contains therein said object-of-interest (Ojala, fig. 13, col. 18, lines 55-64,  generate a tailored video content for tracking;  each frame of the video content can be cropped to track the object as known in the art; Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo), 
wherein at least two cropped frames are cropped at different in-frame locations of said input video stream (V0) (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo). 
The motivation and obviousness arguments are the same as claim 2.

Regarding claim 4, Soroushian-Ojala discloses the method of claim 1.
Soroushian-Ojala does not explicitly disclose but Katz discloses:
performing a computer vision analysis of said input video stream (V0), and recognizing at least a first object-of-interest and a second object-of-interest that are visually depicted in said input video stream (V0) (Katz, para. 0109, identifying objects using computer vision algorithm).
applying an object tracking algorithm to track the in-frame location of the first object-of-interest across frames of said input video stream (V0), and generating a first set of metadata indicating the in-frame location of the first object-of-interest across frames of said input video stream (V0) (Katz et al, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo “generate a first set of metadata”);
applying said object tracking algorithm to track the in-frame location of the second object-of-interest across frames of said input video stream (V0), and generating a second set of metadata indicating the in-frame location of the second object-of-interest across frames of said input video stream (V0) (Katz, para. 0104, identify a second logo or object within at least the first frame and then track an in-frame location of the second logo/object across a plurality of subsequent frames in which the second logo/object is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the second logo/object “generate a second set of metadata”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

Regarding claim 5, Soroushian-Ojala-Katz discloses the method of claim 4, comprising:
based on said first set of metadata, generating from said input video stream (V0) a first cropped non-downscaled video stream, which tracks the first object-of-interest (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo “generate a first set of metadata”; first cropped video stream can be generated based on the first set of metadata);
based on said second set of metadata, generating from said input video stream (V0) a second cropped non-downscaled video stream, which tracks the second object-of-interest (Katz et al, para. 0104, identify a second logo or object within at least the first frame and then track an in-frame location of the second logo/object across a plurality of subsequent frames in which the second logo/object is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the second logo/object “generate a second set of metadata”; second cropped video stream can be generated based on the second set of metadata).
The motivation and obviousness arguments are the same as claim 4.

Regarding claim 6, Soroushian-Ojala-Katz discloses the method of claim 5, comprising:
inserting to said streams manifest file at least: (i) a first pointer to a first storage address that stores the first cropped non-downscaled video stream which tracks the first object-of-interest, and (ii) a second pointer to a second storage address that stores the second cropped non-downscaled video stream which tracks the second object-of-interest (Soroushian, para. 0058, source encoding server generates a top level index, i.e., generates a manifest file containing a plurality of container files pointing to the plurality of alternative streams including first stream V1 and second stream V2. Alternative streams are streams that encode the same media content in different ways including but not limited to different maximum bitrates, resolutions and/or different frame rates, and the same aspect ratio corresponding to the aspect ratio of the source video. In addition to Soroushian’s disclosure above, it is noted that providing a manifest/index file comprising information on one or more video streams or video portions of such video streams, including references, e.g., URLs, to the one or more video streams or video portions of such video streams, is also well known in the art as evidenced by Okerman et al., WIPO Publication WO 2021213831, page 4, para. 11 and page 14, para. 53).
The motivation and obviousness arguments are the same as claim 4.

Regarding claim 7, Soroushian-Ojala-Katz discloses the method of claim 6, comprising:
in response to a first user-command, which indicates a request via an end-user device to perform a zoom-in operation on the first object-of-interest, providing to said end-user device the first cropped non-downscaled video stream which tracks the first object-of-interest (Ojala, fig’s 5 and 13, col. 18, lines 55-64,  generate a first tailored video content of a plurality of tailored video content;  each tailored stream tracks a “first” object appearing in the input video and has a cropped box smaller in area of pixels than the input video; Ojala, col. 8, lines 12-45, smart TV 207 may crop a target and a region of interest (or spatial portion) from the video stream and enlarge the image. The new cropped and zoomed video stream may be transmitted to other connected user devices.  The individual users may select an object to track or specify a desired region of interest to show on user devices. The proximity of the viewers to user devices may create an effective zoom effect. That is, displaying a cropped image on closer device creates a perceived zoom effect);
in response to a second user-command, which indicates a request via said end-user device to perform a zoom-in operation on the second object-of-interest, providing to said end-user device the second cropped non-downscaled video stream which tracks the second object-of-interest (Ojala, fig’s 5 and 13, col. 18, lines 55-64,  generate a second tailored video content of the plurality of tailored video content;  each tailored stream tracks a “second” object appearing in the input video and has a cropped box smaller in area of pixels than the input video; Ojala, col. 8, lines 12-45, smart TV 207 may crop a target and a region of interest (or spatial portion) from the video stream and enlarge the image. The new cropped and zoomed video stream may be transmitted to other connected user devices.  The individual users may select an object to track or specify a desired region of interest to show on user devices. The proximity of the viewers to user devices may create an effective zoom effect. That is, displaying a cropped image on closer device creates a perceived zoom effect).
The motivation and obviousness arguments are the same as claim 4.

Regarding claim 12, Soroushian-Ojala discloses the method of claim 11, comprising a manifest file as described above in claim 11.
Soroushian-Ojala does not disclose but Katz discloses:
parsing said streams manifest file at the video playback device, and extracting from said streams manifest file at least: a set of metadata indicating an in-frame location of said object-of-interest in at least one frame of the first video stream (V1) which depicts the full field-of-view of said video scene (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo; these visual marks can be extracted as metadata).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

Regarding claim 13, Soroushian-Ojala discloses the method of claim 11.
Soroushian-Ojala does not disclose but Katz discloses:
based on said set of metadata extracted from said streams manifest file, generating at the video playback device a visual marking which indicates to a user that said object-of-interest is zoomable; wherein the visual marking is generated and is displayed as an overlay element on top of the first video stream (V1) during playback of the first video stream (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

Regarding claim 14, Soroushian-Ojala discloses the method of claim 11, comprising:
at time-point T, transitioning from playing the first video stream (V1) on said video playback device to playing said second video stream (V2) on said video playback device from time-point T of said second video stream and onward, as described in claim 11 above.
Soroushian-Ojala does not disclose but Katz discloses:
monitoring user engagement with said overlay element, via one or more input units of the video playback device; and upon user engagement with said overlay element at time-point T, transitioning from playing the first video stream (V1) to playing said second video stream (V2) on said video playback device (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

Regarding claim 15, Soroushian-Ojala discloses the method of claim 11.
Soroushian-Ojala does not disclose but Katz discloses:
based on said set of metadata extracted from said streams manifest file, generating at the video playback device a textual indication which indicates to a user that describes said object-of-interest and that indicates to the user that said object-of-interest is zoomable (Katz, para. 0104, identify a sponsor logo within at least the first frame and then track an in-frame location of the sponsor logo across a plurality of subsequent frames in which the sponsor logo is depicted. The system may then augment image data in at least the plurality of subsequent frames to visually mark the sponsor logo). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Katz’s features into Soroushian-Ojala’s invention to effectively identify the cropped frames in the video by tracking the in-frame location.

9.	Claim 8 is rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian-Ojala, as applied to claim 1 above, in view of Lou et al. (US Publication 2014/0105576, hereinafter Lou).
Regarding claim 8, Soroushian-Ojala discloses the method of claim 1, comprising:
(I) for said input video stream (V0), generating a corresponding video or video-segment that corresponds to a downscaled video or video-segment depicting a full field-of-view of said video input stream (V0), to form said first video stream (V1) which is a downscaled version of said input video stream (V0), as described above in claim 1; 
(II) for said input video stream (V0), generating a corresponding video or video-segment that corresponds to a cropped non-downscaled video or video-segment depicting that visually tracks said first object-of-interest within said video input file (V0), to form said second video stream (V2) which is a cropped non-downscaled version of said input video stream (V0), as described above in claim 1.
Soroushian-Ojala does not explicitly disclose but Lou discloses segmenting said input video stream (V0) into a plurality of time-segments of equal length; and said input video stream (V0) comprises each of said time-segments (Lou, para. 0046, divide the video file into segments of a fixed size or segments of an equal time length).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lou’s features with Soroushian-Ojala’s invention, treating each timed video-segment as video content, to effectively generate different versions of the input video stream.

10.	Claims 10 and 18 are rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian-Ojala, as applied to claims 1 and 11 above, in view of Gao (US Publication 2020/0245011).
Regarding claim 10, Soroushian-Ojala discloses the method of claim 1, wherein the method comprises: tracking a plurality of objects-of-interest within said input video stream (Ojala, fig. 5, tracking multiple objects 502a and 502b in the video);
generating a plurality of secondary video streams, wherein each one of the secondary video streams tracks a single object-of-interest that appears in the input video stream and that moves within the input video stream (Ojala, fig. 13, col. 18, lines 55-64,  generate a plurality of tailored video content; each tailored stream tracks an object  appearing in the input video and has a cropped box smaller in area of pixels than the input video).
Soroushian-Ojala does not explicitly disclose but Gao discloses:
wherein the input video stream is a 4K video stream or an 8K video stream (Gao, para. 0054, the acquired ultra-high definition video has the following features: a resolution of 4K);
wherein each one of the secondary video streams has an area, in pixels, of either 480p or 720p or 1080p (Gao, para. 0236, adjusts the respectively acquired video with a resolution of 4K to a resolution of 1080P).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Gao’s features with Soroushian-Ojala’s invention to effectively implement the invention using high-quality video.

Regarding claim 18, Soroushian-Ojala discloses the method of claim 11, wherein the method comprises:
	receiving at said video playback device, said streams manifest file which points to said first video stream and to a plurality of secondary video streams as described in claim 11 above, wherein each one of the secondary video streams tracks a single object-of-interest that appears in the input video stream and that moves within the input video stream (Ojala, fig. 13, col. 18, lines 55-64,  generate a plurality of tailored video content; each tailored stream tracks an object  appearing in the input video and has a cropped box smaller in area of pixels than the input video).
	Soroushian-Ojala does not explicitly disclose but Gao discloses:
wherein the first video stream is a 4K video stream or an 8K video stream (Gao, para. 0054, the acquired ultra-high definition video has the following features: a resolution of 4K),
wherein each one of the secondary video streams is either 480p or 720p or 1080p (Gao, para. 0236, adjusts the respectively acquired video with a resolution of 4K to a resolution of 1080P). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Gao’s features with Soroushian-Ojala’s invention to effectively implement the invention using high-quality video.

11.	Claim 16 is rejected under AIA  35 U.S.C. 103 as being unpatentable over Soroushian-Ojala, as applied to claim 11 above, in view of Zhang et al. (US Patent 11,184,558, hereinafter Zhang).
Regarding claim 16, Soroushian-Ojala discloses the method of claim 11.
Soroushian-Ojala does not disclose but Zhang discloses:
between step (b) and step (c), generating and displaying on said video playback device a smooth transition effect, that emulates a smooth transition from (i) playback of the first video stream (V1), to (ii) playback of the second video stream (V2) (Zhang, col. 9, line 60 to col. 10, line 27, the reframing logic will contain the object by the cropping box and the position of the object in the cropped video will have a smooth transition across all frames between start and end frames).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang’s features into Soroushian-Ojala’s invention to enhance the user’s viewing experience.

Consideration of Reference/Prior Art
12.    For applicant’s benefit portions of the cited reference(s) have been cited to aid in the review of the rejection(s). While every attempt has been made to be thorough and consistent within the rejection it is noted that the PRIOR ART MUST BE CONSIDERED IN ITS ENTIRETY, INCLUDING DISCLOSURES THAT TEACH AWAY FROM THE CLAIMS. See MPEP 2141.02 VI.

Conclusion
13.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
   
14.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOI H TRAN whose telephone number is (571)270-5645. The examiner can normally be reached 8:00AM-5:00PM PST FIRST FRIDAY OF BIWEEK OFF.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI TRAN can be reached on 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LOI H TRAN/Primary Examiner, Art Unit 2484