DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/03/2022 has been entered. 
Response to Arguments
Applicant's arguments filed 01/03/2022 have been fully considered but they are moot in view of the new grounds of rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 6-7, 10-13, 16, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2017/0302719 A1 – hereinafter Chen) and Brown et al. (US 2015/0145782 A1 – hereinafter Brown).
Regarding claim 1, Chen discloses a method comprising: receiving a primary video stream (Fig. 1; [0057] – receiving a primary video stream from a video source 160, which is the normal video stream in one embodiment as shown in Fig. 5; in other embodiments, the tracking video stream is the primary video stream; in still other embodiments, the tracking video stream and the normal video stream are the same – see [0016]); receiving tracking information for a plurality of tracked objects in the primary video stream, the tracking information indicating a time-varying position of each of the tracked objects ([0021]; [0060]-[0063] – receiving object tracking information to track objects from frame to frame, or in one embodiment shown in Fig. 5, the tracking video stream can be the tracking information itself); displaying the primary video stream (Fig. 5 – displaying the normal video stream at step 504); and in response to a user input selecting (i) a spatial location in the primary video stream and (ii) a zoom factor ([0061]; [0107] – in response to a user input selecting (i) a spatial location in the primary video stream where the user touches the screen and (ii) a zoom factor, i.e., one determined based on the size of the selected ROI and the whole tracking region as further described in at least [0069]): identifying a selected tracked object based on the selected spatial location ([0061] – identifying the object of interest); based on the time-varying position and the selected zoom factor, cropping and upscaling the primary video stream to generate a first cropped and upscaled video stream that follows the selected tracked object at the selected zoom factor ([0100] – based on the time-varying position as described in at least [0062] and the selected zoom factor as described in [0069], cropping and upscaling the normal video stream to generate a zoomed-in video stream for playing); and displaying the first cropped and upscaled video stream ([0100] – playing the zoomed-in video stream until the full transition to high quality zoomed-in video is done); retrieving from a server a zoomed video stream corresponding to the selected tracked object ([0104]-[0105] – receiving a video stream containing selected ROIs, extracted from a target video stream, from a server); and switching from display of the first cropped and upscaled video stream to display of the zoomed video stream ([0100]; [0135] – transitioning playing of the zoomed-in regions of the normal video stream to playing of the zoomed-in regions of the target video stream after the zoomed-in regions of the target video stream are ready for rendering or display).
However, Chen does not disclose in response to a received user input, the received user input selecting (i) a spatial location in the primary video stream and the received user input selecting (ii) a zoom factor, wherein the received user input selects the zoom factor independently of the spatial location.
Brown discloses in response to a received user input, the received user input selecting (i) a spatial location in a primary image and the received user input selecting (ii) a zoom factor, wherein the received user input selects the zoom factor independently of the spatial location (Figs. 2A-2B, 3; [0030]-[0032]; [0039]; [0043] – a user input comprising (1) a starting point, i.e., point 31, and (2) a length corresponding to the trajectory drawn as part of the input; the starting point defines the spatial location around which the image is zoomed, with point 31 mapped to point 41 at the center of the screen as shown in Fig. 3; the zoom factor is independently determined as a function of the computed length in (2)).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Brown into the method taught by Chen to allow the user to zoom in at any point of the video stream and at any desired zoom factor, thus not limiting the user to fixed and unsatisfactory parameters.
Regarding claim 3, Chen also discloses the cropping is performed so as to keep the selected tracked object substantially centered in the first cropped and upscaled video stream as the selected tracked object moves in the primary video stream after selection of the selected tracked object ([0050]).
Regarding claim 6, Chen also discloses the tracking information is provided in-stream in the primary video stream ([0016]-[0017] – the primary video stream corresponds to the normal video stream, which is the same as the tracking video stream, which contains the tracking information, thus tracking information is provided in-stream in the primary video stream).
Regarding claim 7, Chen also discloses the tracking information is provided on a frame-by-frame basis ([0060]-[0063]).
Regarding claim 10, Chen also discloses displaying the first cropped and upscaled video stream comprises switching from display of the primary video stream to display of the first cropped and upscaled video stream ([0135] – playback switching from normal video stream, to zoomed-in regions during transition, and then to zoomed-in regions of target video stream).
Regarding claim 11, Chen also discloses displaying the first cropped and upscaled video stream comprises simultaneously displaying the primary video stream and the first cropped and upscaled video stream ([0072] – in Picture-in-Picture view).
Regarding claim 12, Chen also discloses the tracking information of each tracked object comprises coordinates of a bounding box of the respective tracked object ([0103]-[0104]).
Regarding claim 13, Chen also discloses the primary video stream includes a plurality of segments, and wherein metadata regarding tracked objects in each segment is provided on a segment-by-segment basis ([0060]-[0063] – each segment comprises at least a frame, thus the metadata are synchronized, frame by frame, with the images).
Claim 16 is rejected for the same reason as discussed in claim 1 above in view of Chen also disclosing a client device (Fig. 1) comprising: a network interface ([0053]-[0054]); a processor ([0053]); and a non-transitory computer-readable medium storing instructions operative, when executed on the processor, to cause the client device to perform the recited operations ([0011]; [0042]-[0043]).
Regarding claim 25, Chen also discloses the instructions are further operative, when executed by the processor, to cause the device to scale the zoomed video stream according to the selected zoom factor, wherein display of the zoomed video stream comprises displaying the scaled zoomed video stream ([0100]; [0135]).
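For illustration only, the crop-window computation implied by the cropping limitations discussed above for claims 1 and 3 can be sketched as follows. This is a minimal sketch and is not taken from Chen, Brown, or the application; the function name `crop_window`, its parameters, and the clamping of the window at the frame edges are assumptions made for the example.

```python
from typing import Tuple

# Hypothetical helper: (left, top, width, height) of the crop window.
CropRect = Tuple[float, float, float, float]


def crop_window(frame_w: float, frame_h: float,
                center_x: float, center_y: float,
                zoom: float) -> CropRect:
    """Return a crop window centered on a tracked object's position.

    The window is 1/zoom of the frame in each dimension, so upscaling it
    back to the frame size magnifies the content by `zoom`. Near the frame
    edges the window is shifted to stay inside the frame (an assumption),
    so the object is only substantially centered there.
    """
    if zoom < 1.0:
        raise ValueError("zoom factor must be at least 1.0")
    crop_w = frame_w / zoom
    crop_h = frame_h / zoom
    # Center on the object, then clamp so the window stays inside the frame.
    left = min(max(center_x - crop_w / 2.0, 0.0), frame_w - crop_w)
    top = min(max(center_y - crop_h / 2.0, 0.0), frame_h - crop_h)
    return left, top, crop_w, crop_h
```

Calling such a function once per frame with the tracked object's position for that frame would yield a window that follows the object at a fixed zoom factor; the upscaling step itself is left out of the sketch.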
Claims 2, 4, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Brown as applied to claims 1, 3, 6-7, 10-13, 16, and 25 above, and further in view of Min (US 2014/0059457 A1 – hereinafter Min).
Regarding claim 2, see the teachings of Chen and Brown as discussed in claim 1 above, in which Chen also discloses the cropping and upscaling is based on the time-varying position ([0060]-[0063]).
However, Chen and Brown do not explicitly disclose while the first cropped and upscaled video stream is displayed, receiving a subsequent user input to change the zoom factor to a subsequent zoom factor; and in response to the subsequent user input: based on the subsequent zoom factor, cropping and upscaling the primary video stream to generate a second cropped and upscaled video stream that follows the selected tracked object at the subsequent zoom factor; and displaying the second cropped and upscaled video stream. 
Min discloses while a first cropped and upscaled video stream is displayed, receiving a subsequent user input to change a zoom factor to a subsequent zoom factor (Figs. 5-6 – after zooming in the image 510 into 560, receiving a subsequent zoom factor as shown in image 560 of Fig. 6); and in response to the subsequent user input: based on the subsequent zoom factor, cropping and upscaling a primary video stream to generate a second cropped and upscaled video stream that follows a selected tracked object at the subsequent zoom factor (Figs. 5-6 – in response to the input 690, based on the subsequent zoom factor, cropping and upscaling the image to a second cropped and upscaled image 660); and displaying the second cropped and upscaled video stream (Figs. 5-6). 
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to further incorporate the teachings of Min above into the method taught by Chen and Brown to allow the user to continue zooming in on the video stream until a satisfactory level of enlargement is reached.
Regarding claim 4, see the teachings of Chen and Brown as discussed in claim 1 above. However, Chen and Brown do not explicitly disclose the user input is an expanding pinch gesture applied to a touch screen, the expanding pinch gesture having initial touch points and final touch points, the method further comprising: determining the selected spatial location based on the initial touch points; and determining the selected zoom factor based on a change in distance between touch points from the initial touch points to the final touch points.
Min discloses a user input is an expanding pinch gesture applied to a touch screen, the expanding pinch gesture having initial touch points and final touch points, the method further comprising: determining a selected spatial location based on the initial touch points (Fig. 2 – step 230; Fig. 3 – steps 310, 330); and determining a selected zoom factor based on a change in distance between touch points from the initial touch points to the final touch points ([0039]-[0040]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Min above into the method taught by Chen and Brown to allow the user to select a zoom factor conveniently following conventional pinching in and out techniques.
Claim 21 is rejected for the same reason as discussed in claim 2 above.
Claim 22 is rejected for the same reason as discussed in claim 4 above.
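For illustration only, the pinch-gesture limitations discussed above for claim 4 (a spatial location from the initial touch points, and a zoom factor from the change in distance between the initial and final touch points) can be sketched as follows. This is a minimal sketch and is not taken from Min or from the application; the name `pinch_to_zoom`, the use of the midpoint of the initial touch points as the selected location, and the use of a distance ratio as the zoom factor are all assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def pinch_to_zoom(initial: Tuple[Point, Point],
                  final: Tuple[Point, Point]) -> Tuple[Point, float]:
    """Derive (selected location, zoom factor) from an expanding pinch.

    Assumptions: the location is the midpoint of the two initial touch
    points, and the zoom factor is the ratio of the final finger distance
    to the initial finger distance.
    """
    (a0, b0), (a1, b1) = initial, final
    d0 = math.dist(a0, b0)  # distance between the initial touch points
    d1 = math.dist(a1, b1)  # distance between the final touch points
    if d0 == 0.0:
        raise ValueError("initial touch points must be distinct")
    location = ((a0[0] + b0[0]) / 2.0, (a0[1] + b0[1]) / 2.0)
    return location, d1 / d0
```

In this sketch the location depends only on the initial touch points and the zoom factor only on the change in finger distance, so the two values are determined separately from one gesture.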
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Brown as applied to claims 1, 3, 6-7, 10-13, 16, and 25 above, and further in view of Denoual et al. (US 2016/0182593 A1 – hereinafter Denoual).
Regarding claim 5, see the teachings of Chen and Brown as discussed in claim 1 above. However, Chen and Brown do not disclose receiving at least one manifest for the primary video stream, wherein the manifest identifies the plurality of tracked objects.
Denoual discloses receiving at least one manifest for a primary video stream, wherein the manifest identifies the plurality of tracked objects (Figs. 7, 8a-8b, 9a-9d; [0196] – receiving a manifest as shown in Figs. 7, 8a-8b, 9a-9d, for a primary video stream as further described at least at [0022], the manifest identifies the plurality of tracked objects as further described at least at [0039], [0062], [0083], etc. using manifest parameters).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Denoual into the method taught by Chen and Brown to enable handling of object appearance and disappearance over a video sequence effectively (Denoual: [0039]).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Brown as applied to claims 1, 3, 6-7, 10-13, 16, and 25 above, and further in view of Min and Herring et al. (US 2013/0167062 A1 – hereinafter Herring).
Regarding claim 9, see the teachings of Chen and Brown as discussed in claim 1 above.
However, Chen and Brown do not disclose identifying a selected tracked object based on the selected spatial location comprises: determining a midpoint of two touch points by the user on a touch screen; and identifying a tracked object having a bounding box that encloses the midpoint.
Min discloses identifying a selected tracked object based on the selected spatial location comprises: determining two touch points by the user on a touch screen (Fig. 3); and identifying a tracked object having a bounding box that encloses a midpoint (Fig. 5).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Min above into the method taught by Chen and Brown to allow the user to select an object conveniently following conventional pinching in and out techniques.
However, Chen, Brown, and Min do not disclose determining a midpoint of the two touch points.
Herring discloses determining a midpoint of the two touch points ([0012]; [0021]; [0022]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Herring into the method taught by Chen, Brown, and Min to successfully select small objects whose bounding boxes cannot enclose the two touch points.
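For illustration only, the selection step discussed above for claim 9 (determining a midpoint of two touch points and identifying a tracked object whose bounding box encloses that midpoint) can be sketched as follows. This is a minimal sketch and is not taken from Min, Herring, or the application; the names `midpoint` and `select_tracked_object`, the box layout (left, top, right, bottom), and the object identifiers are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: left, top, right, bottom


def midpoint(p: Point, q: Point) -> Point:
    """Midpoint of two touch points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def select_tracked_object(p: Point, q: Point,
                          boxes: Dict[str, Box]) -> Optional[str]:
    """Return the id of the first tracked object whose box encloses the midpoint.

    Returns None when no bounding box contains the midpoint.
    """
    mx, my = midpoint(p, q)
    for object_id, (left, top, right, bottom) in boxes.items():
        if left <= mx <= right and top <= my <= bottom:
            return object_id
    return None
```

With touch points at (100, 100) and (200, 100) and a 20-by-20 box centered at (150, 100), neither touch point lies inside the box, yet the midpoint does, so the small object is still selected.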
Claims 14-15 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Brown as applied to claims 1, 3, 6-7, 10-13, 16, and 25 above, and further in view of Marman et al. (US 2012/0062732 A1 – hereinafter Marman).
Regarding claim 14, see the teachings of Chen and Brown as discussed in claim 1 above. However, Chen and Brown do not disclose highlighting the selected tracked object in response to selection of the selected tracked object.
Marman discloses highlighting the selected tracked object in response to selection of the selected tracked object (Figs. 7-9; [0045]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Marman into the method taught by Chen and Brown so that the user can easily recognize his or her selection and make corrections if a wrong selection is made.
Regarding claim 15, see the teachings of Chen and Brown as discussed in claim 1 above, in which Chen also discloses the cropping and upscaling is based on the time-varying position of selected tracked objects, and wherein the first cropped and upscaled video stream follows the selected tracked objects at the selected zoom factor ([0060]-[0063]; [0069]; [0100]).
However, Chen and Brown do not explicitly disclose receiving additional user input selecting an additional spatial location in the primary video stream; and identifying an additional tracked object based on the additional spatial location, wherein the cropping and upscaling is based on the time-varying position of the additional tracked object, and wherein the first cropped and upscaled video stream follows the additional tracked object at the selected zoom factor.
Marman discloses receiving additional user input selecting an additional spatial location in the primary video stream ([0057]); and identifying an additional tracked object based on the additional spatial location ([0057]); wherein the cropping and upscaling is based on the time-varying position of the selected tracked object and of the additional tracked object, and wherein the first cropped and upscaled video stream follows the selected tracked object and the additional tracked object at a selected zoom factor ([0057]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Marman into the method taught by Chen and Brown so that the user can track and zoom in on additional objects at the same time.
Claim 23 is rejected for the same reason as discussed in claim 15 above.
Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Marman, Chen, Min, and Brown.
Regarding claim 26, Marman discloses a method comprising: receiving a primary video stream (Fig. 1; [0049] – receiving a primary video stream 410); receiving tracking information for a plurality of tracked objects in the primary video stream, the tracking information indicating a time-varying position of each of the tracked objects (Fig. 1; [0051] – receiving metadata, which is tracking information for a plurality of tracked objects as further described at least at [0033]-[0034]); displaying the primary video stream (Fig. 7; [0063] – displaying the primary video stream 700); and in response to a received user input, the received user input is associated with (ii) a zoom factor change ([0063] – a user selecting group zoom): identifying at least two selected tracked objects ([0063] – identifying two persons to be tracked); based on the time-varying position and the selected zoom factor change, cropping and upscaling the primary video stream to generate a first cropped and upscaled video stream that follows the selected tracked objects at the selected zoom factor change ([0063]; Fig. 7 – cropping and upscaling the primary video stream to generate a first cropped and upscaled video stream 710 that follows the selected tracked objects at the selected zoom factor change); displaying the first cropped and upscaled video stream (Fig. 7; [0063] – displaying the first cropped and upscaled video stream 710).
However, Marman does not disclose in response to a received user input, the received user input controlling selection of (i) a spatial location in the primary video stream and the received user input controlling selection of (ii) a zoom factor change: identifying at least two selected tracked objects based on the selected spatial location; retrieving from a server a zoomed video stream corresponding to the selected tracked objects; and switching from display of the first cropped and upscaled video stream to display of the zoomed video stream, wherein controlling selection of the zoom factor change is independent of controlling selection of the spatial location.
Chen discloses retrieving from a server a zoomed video stream corresponding to selected tracked objects ([0104]-[0105] – receiving a video stream containing selected ROIs, extracted from a target video stream, from a server); and switching from display of the first cropped and upscaled video stream to display of the zoomed video stream ([0100]; [0135] – transitioning playing of the zoomed-in regions of the normal video stream to playing of the zoomed-in regions of the target video stream after the zoomed-in regions of the target video stream are ready for rendering or display).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Chen into the method taught by Marman to enhance the quality of the zoomed-in images, e.g. with higher resolution (Chen: [0007]).
However, Marman and Chen do not disclose in response to a received user input, the received user input controlling selection of (i) a spatial location in the primary video stream and the received user input controlling selection of (ii) a zoom factor change: identifying at least two selected tracked objects based on the selected spatial location, wherein controlling selection of the zoom factor change is independent of controlling selection of the spatial location.
Min discloses in response to a received user input, the received user input controlling selection of (i) a spatial location in the primary video stream and associated with (ii) a zoom factor change: identifying at least two selected tracked objects based on the selected spatial location (Fig. 7; [0052]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Min into the method taught by Marman and Chen to allow the user to zoom in on only those objects that are of interest to the user, in contrast to Marman’s teaching that all objects in the entire screen are zoomed in on in response to a group zoom command, thus avoiding wasting system resources that would otherwise be used to zoom in on unintended objects.
However, Marman, Chen, and Min do not disclose in response to a received user input, the received user input controlling selection of (i) a spatial location in the primary video stream and the received user input controlling selection of (ii) a zoom factor change, wherein controlling selection of the zoom factor change is independent of controlling selection of the spatial location.
Brown discloses in response to a received user input, the received user input controlling selection of (i) a spatial location in the primary video stream and the received user input controlling selection of (ii) a zoom factor change, wherein controlling selection of the zoom factor change is independent of controlling selection of the spatial location (Figs. 2A-2B, 3; [0030]-[0032]; [0039]; [0043] – a user input comprising (1) a starting point, i.e., point 31, and (2) a length corresponding to the trajectory drawn as part of the input; the starting point defines the spatial location around which the image is zoomed, with point 31 mapped to point 41 at the center of the screen as shown in Fig. 3; the zoom factor is independently determined as a function of the computed length in (2)).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Brown into the method taught by Marman, Chen, and Min to allow the user to zoom in at any point of the video stream and at any desired zoom factor, thus not limiting the user to fixed and unsatisfactory parameters.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG Q DANG whose telephone number is (571)270-1116.  The examiner can normally be reached on IFT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Q Tran, can be reached on 571-272-7382.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/HUNG Q DANG/Primary Examiner, Art Unit 2484