DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This office action is a response to an application filed on 03/10/2021, in which claims 1-20 are pending and ready for examination.

Response to Amendment
Claims 1, 13, 17, and 19 are currently amended.

Response to Argument
Applicant's arguments filed 03/10/2021 have been fully considered but they are not persuasive.

With respect to claims 1, 13, 17, 19 rejected under 35 USC 102 and 103, the Applicant argues, see , that Liu does not teach the amended feature “placing a first search window … centered on the determined first location of the target object, wherein the size of the first search window is greater than the size of the determined bounding box”, “one window size … is larger than the size of the first search window to anticipate that the target object will increase in size ...”, “one window size … is smaller than the size of the first search window to anticipate that the target object will decrease in size …” by asserting that Fig 2B in Liu shows that the first scale/search window and the second scale/search window are clearly off-center from the bounding box.
Examiner cannot concur. The Applicant’s argument hinges on the assertion that the windows (e.g., the first and second windows) are not centered on the determined location of the target object, wherein the determined location is recited to be specified by the bounding box. Based on this recitation, the determined location may be any point within the bounding box. From Fig. 2B of Liu, it is clear that a first scale/search window and a second scale/search window are each centered on at least one point within a bounding box, and it is irrelevant whether the windows are off-center with respect to the bounding box. In addition, in Para. [0029], the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.
   
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 13-14, and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Liu (US 20190103026 A1).

Regarding claim 1, Liu discloses a computer-implemented method for performing real-time visual tracking of a target object captured in a video, the method comprising (Liu; Fig. 3A, Para. [0031]. A computer method/process is used to track a target object in a video in real time.): 
receiving a first video image of the video and a determined bounding box of the target object in the first video image, wherein the determined bounding box is a rectangular box which specifies a determined first location of the target object and determined dimensions of the target object (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Video images are continuously processed, wherein a first video image is received, including a determined bounding box of a target object, the box is a rectangular box representing a first location of the target object and dimensions of the target object.); 
receiving a second video image of the video following the first video image, wherein the location of the target object is unknown in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Video images are continuously processed, wherein a second video image is received, including a target object location to be determined.); 
placing a first search window in the first video image centered on the determined first location of the target object, wherein the size of the first search window is greater than the size of the determined bounding box, and separately placing multiple second search windows of multiple window sizes in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. A first search window/cropped portion is placed in a first video image centered on at least a first determined location of a target object within a bounding box, wherein the first search window is larger than the bounding box, and multiple second search windows/cropped portions of different sizes are placed in a second video image.), wherein each of the multiple second search windows is centered on a second location in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image.),
wherein the second location corresponds to the determined first location of the target object in the first video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image, wherein the second location corresponds to a first determined location.);
wherein at least one window size in the multiple window sizes is larger than the size of the first search window to anticipate that the target object will increase in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.); and
wherein at least one window size in the multiple window sizes is smaller than the size of the first search window to anticipate that the target object will decrease in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.);
computing a correlation map between a first image patch of the first video image within the first search window and a second image patch of the second video image within each of the multiple second search windows (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. A similarity/correlation is calculated between a first detected image portion within a first search window/cropped portion of a first video image and each second image portion within the second search windows/cropped portions of a second video image.); and
determining an updated location of the target object in the second video image based on the computed correlation maps (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. An updated location of a trajectory is determined in a second video image in accordance with an identified best similarity in the second video image.).
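For illustration only, the window placement recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Liu or from the application: the `Box` type, the `padding` factor, and the scale values are all hypothetical, and the "determined first location" is assumed here to be the center of the bounding box.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle given by its center and dimensions (hypothetical type)."""
    cx: float
    cy: float
    w: float
    h: float


def first_search_window(bbox: Box, padding: float = 2.0) -> Box:
    """Window centered on the bounding-box center and strictly larger than the box.

    The padding factor is an assumption; the claim only requires 'greater than'.
    """
    if padding <= 1.0:
        raise ValueError("padding must exceed 1 so the window is larger than the box")
    return Box(bbox.cx, bbox.cy, bbox.w * padding, bbox.h * padding)


def second_search_windows(first: Box, scales: Sequence[float] = (0.95, 1.0, 1.05)) -> List[Box]:
    """Windows for the next frame, all centered where the first window was centered.

    Scales below 1 anticipate the target shrinking, scales above 1 anticipate it
    growing, and a scale of 1 keeps the first window's size.
    """
    return [Box(first.cx, first.cy, first.w * s, first.h * s) for s in scales]
```

With a 40x20 bounding box at (100, 50) and the assumed padding of 2.0, the first search window is 80x40 at the same center, and the second search windows share that center while being smaller than, equal to, and larger than the first window.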

Regarding claim 2, Liu discloses the first search window and at least one search window in the multiple second search windows have the same horizontal and vertical dimensions (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion.).

Regarding claim 13, Liu discloses a computer-implemented method for performing real-time visual tracking of a target object captured in a video, the method comprising (Liu; Fig. 3A, Para. [0031]. A computer method/process is used to track a target object in a video in real time.): 
receiving a first video image of the video and a previously-determined first location of the target object in the first video image (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 44]. Video images are continuously processed, wherein a first video image is received, including a first determined location of a target object.); 
receiving a second video image of the video following the first video image, wherein the location of the target object is unknown in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Video images are continuously processed, wherein a second video image is received, including a target object location to be determined.); 
placing a first search window in the first video image centered on the determined first location of the target object, wherein the size of the first search window is greater than the size of the determined bounding box, and separately placing multiple second search windows of multiple different sizes in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. A first search window/cropped portion is placed in a first video image centered on at least a first determined location of a target object within a bounding box, wherein the first search window is larger than the bounding box, and multiple second search windows/cropped portions of different sizes are placed in a second video image.), wherein each of the multiple second search windows is centered on a second location in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image.),
wherein the second location corresponds to the determined first location of the target object in the first video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image, wherein the second location corresponds to a first determined location.);
wherein at least one window size in the multiple window sizes is larger than the size of the first search window to anticipate that the target object will increase in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.); and
wherein at least one window size in the multiple window sizes is smaller than the size of the first search window to anticipate that the target object will decrease in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.);
computing a set of correlation maps between a first image patch of the first video image within the first search window and each of the multiple second image patches of the second video image within the multiple second search windows (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. A similarity/correlation is calculated between a first detected image portion within a first search window/cropped portion of a first video image and each second image portion within the second search windows/cropped portions of a second video image.);
identifying a peak value in each correlation map of the set of the computed correlation maps (Liu; Fig. 2B, 5, 6B,C,Para. [0028-30, 38-40, 42-44-47]. The best similarity match is identified from a set of calculated similarities/correlations.); 
identifying the highest peak value in the set of peak values (Liu; Fig. 2B, 5, 6B,C,Para. [0028-30, 38-40, 42-44-47]. The best similarity is identified from a set of calculated similarities/correlations.); and 
determining an updated location of the target object in the second video image based on the location of the highest peak value in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. An updated location of a trajectory is determined in a second video image in accordance with an identified best similarity in the second video image.).
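For illustration only, the peak-selection steps recited in claim 13 (a peak per correlation map, then the highest peak across the set) can be sketched as below. This is a minimal sketch rather than Liu's disclosed procedure: correlation maps are assumed to be plain nested lists of floats, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

CorrelationMap = Sequence[Sequence[float]]


def peak(corr_map: CorrelationMap) -> Tuple[float, int, int]:
    """Return (value, row, col) of the largest entry in one correlation map."""
    candidates = (
        (value, r, c)
        for r, row in enumerate(corr_map)
        for c, value in enumerate(row)
    )
    # Compare on the value only, so ties resolve to the first position scanned.
    return max(candidates, key=lambda t: t[0])


def best_peak(corr_maps: Sequence[CorrelationMap]) -> Tuple[int, float, int, int]:
    """Return (map_index, value, row, col) of the highest peak over all maps."""
    peaks = [peak(m) for m in corr_maps]
    idx = max(range(len(peaks)), key=lambda i: peaks[i][0])
    value, r, c = peaks[idx]
    return idx, value, r, c
```

The returned map index identifies which second search window (and hence which window size) produced the best match; converting the peak's row and column back to image coordinates depends on that window's position and scale and is left out of this sketch.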

Regarding claim 14, Liu discloses the multiple second search windows include: one or more search windows having window sizes larger than the first search window (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion.);
one or more search windows having window sizes smaller than the first search window (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. The sizes of the search windows/cropped portions are expanded or shrunk such that at least one window/cropped portion size is smaller than the size of a first search window/cropped portion.); and
another search window having the same size as the first search window (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion size is larger than, smaller than, or the same as the size of a first search window/cropped portion.).

Claim 17 is directed to a system capable of performing real-time visual tracking of a target object captured in a video by a camera, the system comprising: one or more processors; a memory coupled to the one or more processors; wherein the memory stores instructions that, when executed by the one or more processors (Liu; Fig. 3A, Para. [0061-64].), cause the system to perform a sequence of processing steps corresponding to those claimed in claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 5, 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20190103026 A1), in view of Han (US Pub. 20050163341 A1).

Regarding claim 3, Liu discloses computing the correlation map between the first image patch within the first search window and the second image patch within the second search window includes: extracting a first feature map from the first image patch and a second feature map from the second image patch (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. A score of similarity/correlation between a first image portion of features of at least one target object within a first search window and a second image portion of features of at least one target object within a second search window is determined/computed.).
But Liu does not specifically disclose computing a two-dimensional (2D) Fast Fourier Transform (FFT) on the first and second extracted feature maps to generate Fourier representations of the first and second extracted feature maps; computing a cross-correlation between the Fourier representations of the first and second extracted feature maps; and converting the computed cross-correlation back to the spatial domain to obtain the correlation map.
However, Han teaches computing the correlation map between the first image patch within the first search window and the second image patch within the second search window includes: extracting a first feature map from the first image patch and a second feature map from the second image patch (Han; Fig. 2, Para. [0043, 44, 48], [0042-68]. A first search region of features is obtained from a first image, and a second search region of features is obtained from a second image); 
computing a two-dimensional (2D) Fast Fourier Transform (FFT) on the first and second extracted feature maps to generate Fourier representations of the first and second extracted feature maps (Han; Fig. 2, Para. [0043, 44, 48]. A 2D FFT of a first search region of features is obtained from a first image, and a 2D FFT of a second search region of features is obtained from a second image.);
computing a cross-correlation between the Fourier representations of the first and second extracted feature maps (Han; Fig. 2, Para. [0045]. A cross-correlation between a 2D FFT of a first search region of features and a 2D FFT of a second search region of features is calculated.); and
converting the computed cross-correlation back to the spatial domain to obtain the correlation map (Han; Fig. 2, Para. [0053, 68]. A 2D inverse FFT of a calculated cross-correlation is obtained.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adopt a target tracking approach, by incorporating Han’s teaching wherein an FFT-based correlation approach is employed for target tracking, for the motivation to provide a robust and accurate target tracking technique (Han; Abstract, Para. [0005].).
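For illustration only, the three steps recited in claim 3 (2D FFT of both feature maps, cross-correlation in the Fourier domain, inverse transform back to the spatial domain) can be sketched as below. This is a generic textbook frequency-domain correlation and is not asserted to be the specific procedure of Han or of the application. The sketch assumes single-channel, real-valued feature maps of equal size with power-of-two dimensions, and uses a small pure-Python radix-2 FFT so that it is self-contained; all function names are hypothetical.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT, unnormalized; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a: List[List[complex]], inverse: bool = False) -> List[List[complex]]:
    """2D transform: 1D FFT along every row, then along every column."""
    rows = [fft(list(r), inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def correlation_map(f1: List[List[float]], f2: List[List[float]]) -> List[List[float]]:
    """Circular cross-correlation of two equally sized real 2D feature maps.

    Entry (dr, dc) is the sum over (r, c) of f1[r][c] * f2[r + dr][c + dc],
    with indices wrapping around, so the peak marks the shift from f1 to f2.
    """
    F1, F2 = fft2(f1), fft2(f2)
    # Cross-correlation in the Fourier domain: conj(F1) times F2, element-wise.
    prod = [[a.conjugate() * b for a, b in zip(r1, r2)] for r1, r2 in zip(F1, F2)]
    n = len(f1) * len(f1[0])
    # Inverse transform back to the spatial domain, with 1/n normalization.
    return [[v.real / n for v in row] for row in fft2(prod, inverse=True)]
```

If the second map is the first map shifted by one row and two columns, the resulting correlation map peaks at row 1, column 2, which is the displacement a tracker would read off as the target's motion between frames.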

Regarding claim 5, modified Liu teaches extracting the first or the second feature map from the first or the second image patch includes: extracting a geometry-based feature map from the image patch (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. Geometry features of gradient and features of color are extracted from image patches/portions.); extracting a color-based feature map from the image patch (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. Geometry features of gradient and features of color are extracted from image patches/portions.); and concatenating the geometry-based feature map and the color-based feature map to obtain the first or second feature map (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. Geometry features of gradient and features of color are extracted from image patches, wherein the extracted features are concatenated together.).
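For illustration only, the concatenation step recited in claim 5 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Liu: each feature map is assumed to be a height x width grid whose cells hold a list of channel values, and the function name is hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def concatenate_feature_maps(geometry: FeatureMap, color: FeatureMap) -> FeatureMap:
    """Join two feature maps of the same height and width along the channel axis."""
    same_height = len(geometry) == len(color)
    same_width = all(len(g) == len(c) for g, c in zip(geometry, color))
    if not (same_height and same_width):
        raise ValueError("feature maps must share the same height and width")
    return [
        [list(g_px) + list(c_px) for g_px, c_px in zip(g_row, c_row)]
        for g_row, c_row in zip(geometry, color)
    ]
```

For example, a one-channel gradient map and a three-channel color map over the same patch yield a four-channel combined feature map.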

Regarding claim 16, Liu discloses computing a given correlation map between the first image patch within the first search window and the second image patch in the multiple second image patches within the multiple second search windows includes: extracting a first feature map from the first image patch and a second feature map from the second image patch (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. A score of similarity/correlation between a first image portion of features of at least one target object within a first search window and a second image portion of features of at least one target object within a second search window is determined/computed.).
But Liu does not specifically disclose computing a two-dimensional (2D) Fast Fourier Transform (FFT) on the first and second extracted feature maps to generate Fourier representations of the first and second extracted feature maps; computing a cross-correlation between the Fourier representations of the first and second extracted feature maps; and converting the computed cross-correlation back to the spatial domain to obtain the given correlation map.
However, Han teaches computing a given correlation map between the first image patch within the first search window and the second image patch in the multiple second image patches within the multiple second search windows includes: extracting a first feature map from the first image patch and a second feature map from the second image patch (Han; Fig. 2, Para. [0043, 44, 48], [0042-68]. A first search region of features is obtained from a first image, and a second search region of features is obtained from a second image); 
computing a two-dimensional (2D) Fast Fourier Transform (FFT) on the first and second extracted feature maps to generate Fourier representations of the first and second extracted feature maps (Han; Fig. 2, Para. [0043, 44, 48]. A 2D FFT of a first search region of features is obtained from a first image, and a 2D FFT of a second search region of features is obtained from a second image.);
computing a cross-correlation between the Fourier representations of the first and second extracted feature maps (Han; Fig. 2, Para. [0045]. A cross-correlation between a 2D FFT of a first search region of features and a 2D FFT of a second search region of features is calculated.); and
converting the computed cross-correlation back to the spatial domain to obtain the given correlation map (Han; Fig. 2, Para. [0053, 68]. A 2D inverse FFT of a calculated cross-correlation is obtained.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adopt a target tracking approach, by incorporating Han’s teaching wherein an FFT-based correlation approach is employed for target tracking, for the motivation to provide a robust and accurate target tracking technique (Han; Abstract, Para. [0005].).

Claim 18 is directed to a system capable of performing real-time visual tracking of a target object captured in a video by a camera, the system comprising: one or more processors; a memory coupled to the one or more processors; wherein the memory stores instructions that, when executed by the one or more processors (Liu; Fig. 3A, Para. [0061-64].), cause the system to perform a sequence of processing steps corresponding to those claimed in claim 3.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20190103026 A1) in view of Han (US Pub. 20050163341 A1), as applied to claim 3, and further in view of Mentese (US Pub. 20160171330 A1).

Regarding claim 4, Liu discloses computing the cross-correlation between the first and second extracted feature maps (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. A score of similarity/correlation between a first image portion of features of at least one target object within a first search window and a second image portion of features of at least one target object within a second search window is determined/computed.).
But Liu does not specifically disclose computing the cross-correlation between the Fourier representations of the first and second extracted feature maps further comprises: computing a first feature model for the first feature map by computing a Gaussian kernel auto-correlation of the Fourier representation of the first extracted feature map; computing a second feature model for the second feature map by computing a Gaussian kernel auto-correlation of the Fourier representation of the second extracted feature map; and computing the cross-correlation between the Fourier representations of the first and second extracted feature maps by computing element-wise products of the first feature model and the second feature model.
However, Mentese teaches computing the cross-correlation between the Fourier representations of the first and second extracted feature maps further comprises: computing a first feature model for the first feature map by computing a Gaussian kernel auto-correlation of the Fourier representation of the first extracted feature map (Mentese; Para. [0115-116]. A first model for samples with first features is obtained/computed based on a kernel with Gaussian auto-correlation.); 
computing a second feature model for the second feature map by computing a Gaussian kernel auto-correlation of the Fourier representation of the second extracted feature map (Mentese; Para. [0115-116]. A second model for samples with second features is obtained/computed based on a kernel with Gaussian auto-correlation.); and 
computing the cross-correlation between the Fourier representations of the first and second extracted feature maps by computing element-wise products of the first feature model and the second feature model (Mentese; Para. [0116-117]. A cross-correlation between a first model and a second model is obtained/computed.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to further modify the object tracking system/method of modified Liu to adopt a target tracking approach, by incorporating Mentese’s teaching wherein a Gaussian kernel-based correlation approach is employed for target tracking, for the motivation to provide automatic tracking of objects using a camera (Mentese; Para. [0001].).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20190103026 A1) in view of Tsougarakis (US Pat. 6901110 B1).

Regarding claim 6, Liu discloses prior to receiving the first video image, the method further comprises: receiving an earliest video image in a sequence of video frames of the video (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. An initial image of a sequence of images is received.); and performing an object detection operation on the earliest video image to generate an initial location and an initial bounding box for the target object in the earliest video image based on the user selected location (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. An initial location and object window/bounding box is obtained via object detection on an initial image.).
But Liu does not specifically disclose receiving a user selected location of the target object.
However, Tsougarakis teaches receiving a user selected location of the target object within the earliest video image (Tsougarakis; Fig. 12-13, Abstract, Col. 1, Ln. 52-58, Col. 4, Ln. 8-15. A user selection of location of a target object is received.); and
performing an object detection operation on the earliest video image to generate an initial location and an initial bounding box for the target object in the earliest video image based on the user selected location (Tsougarakis; Fig. 12-13, Abstract, Col. 1, Ln. 52-58, Col. 4, Ln. 8-15. An initial location and bounding box (see Fig. 12-13) is obtained via object detection on an initial image in accordance with a user selected location.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adopt an initialization approach, by incorporating Tsougarakis’ teaching wherein a user selection of the position of an object to be tracked is provided, for the motivation to enable object tracking in video sequences (Tsougarakis; Field of the Invention.).

Claims 7-8 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20190103026 A1) in view of Gaidon (US Pat. 9443320 B1).

Regarding claim 7, Liu discloses identifying the updated location of the target object in the second video image based on the computed correlation map includes: identifying a peak value in the computed correlation map (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. The best similarity match is identified from a set of calculated similarities/correlations.).
But it does not specifically disclose comparing the identified peak value with a first threshold value; and if the identified peak value is greater than or equal to the first threshold value, choosing the location of the peak value as the updated location of the target object in the second video image.
However, Gaidon teaches identifying the updated location of the target object in the second video image based on the computed correlation map includes: identifying a peak value in the computed correlation map (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. A score of similarity/correlation between a first image portion of at least one target object within a first search window and a second image portion of at least one target object within a second search window is identified/computed.);
comparing the identified peak value with a first threshold value (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. An identified value of score is compared with a first threshold.); and 
if the identified peak value is greater than or equal to the first threshold value, choosing the location of the peak value as the updated location of the target object in the second video image (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. For an identified value of score being greater than or equal to a first threshold, a location of a target object associated with the identified value is determined as a location of the target object of a current frame.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adopt a target tracking approach, by incorporating Gaidon’s teaching wherein a threshold value associated with a similarity measure is used in object tracking, for the motivation to provide a robust and accurate target tracking technique (Gaidon; Abstract.).
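For illustration only, the threshold comparison recited in claim 7 can be sketched as below. This is a minimal sketch, not Gaidon's disclosed procedure: the function name and the use of `None` to signal that the peak fell below the threshold are assumptions made for the example.

```python
from typing import Optional, Tuple


def updated_location(
    peak_value: float,
    peak_location: Tuple[int, int],
    threshold: float,
) -> Optional[Tuple[int, int]]:
    """Accept the peak location only when the peak meets the first threshold.

    Returns the peak location if peak_value >= threshold, otherwise None,
    which a caller could treat as the cue to fall back to a re-identification
    path such as the one recited in claim 8.
    """
    if peak_value >= threshold:
        return peak_location
    return None
```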

Regarding claim 8, modified Liu teaches if the identified peak value is less than the first threshold value, the method further comprises: receiving a third video image of the video following the second video image (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. For an identified value of score being less than a first threshold, a subsequent image of a third image is received.); 
receiving a predicted location of the target object in the third video image from a target motion estimation model, wherein the predicted location is in the vicinity of the first location (Gaidon; Col. 6, Ln. 7-25. A predicted region/location associated with a previous location is received/obtained based on an appearance model and a motion model, wherein the predicted region/location is near a previous location.);
searching for the target object locally based on the predicted location (Gaidon; Col. 6, Ln. 7-25. A predicted region/location associated with a previous location is received/obtained based on an appearance model and a motion model, wherein the predicted region/location is near a previous location, and wherein a target object is searched for locally based on the predicted region/location.); and
if the target object is re-identified locally near the predicted location, resuming using the determined location of the target object in the third video image to track the target object in a subsequent video image in the video (Gaidon; Col. 8, Ln. 17-24, Col. 13, Ln. 13 to Col. 14, Ln. 7. For a target object being reinitialized/re-identified near a predicted region/location, a determined location of a target object in a subsequent image of a third image is used to track the target object in images.).

Regarding claim 10, modified Liu teaches searching for the target object locally based on the predicted location includes: placing a third search window in the third video image centered on the predicted location of the target object (Gaidon; Col. 6, Ln. 7-25. A predicted region/location associated with a previous location is received/obtained based on an appearance model and a motion model, wherein the predicted region/location is nearby the previous location, and wherein a target object is searched locally based on the predicted region/location.); 
extracting a third feature map from a third image patch of the third video image within the third search window (Gaidon; Fig. 2, Col. 5, Ln. 39-50, Col. 6, Ln. 7-25, Col. 7, Ln. 65 to Col. 8, Ln. 10. Feature representations are extracted from object regions in a third image.); 
retrieving a set of stored feature maps computed at a set of previously-determined locations for the target object associated with a set of previously-processed video images (Gaidon; Fig. 2, Col. 5, Ln. 39-50, Col. 6, Ln. 7-25, Col. 7, Ln. 65 to Col. 8, Ln. 10. A set of feature representations are obtained, through learned model, for a target object.); 
computing a set of correlation maps between the third feature map and each of the set of stored feature maps (Gaidon; Col. 5, Ln. 39-50, Col. 6, Ln. 7-25, Col. 7, Ln. 65 to Col. 8, Ln. 10. A set of feature representations are obtained, through learned model, for a target object.); and 
attempting to re-identify the target object in the third video image based on the set of computed correlation maps (Gaidon; Fig. 2, Col. 8, Ln. 1-20. A target object is reinitialized/re-identified in a third image based on a similarity/correlation measure.).
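
For illustration only, computing a set of correlation maps between one feature map and several stored feature maps, as recited in claim 10, might look like the sketch below. It assumes single-channel feature maps held as nested lists and uses a plain, unnormalized sliding cross-correlation; neither the cited references nor the claims are asserted to use this particular formulation, and the function names are hypothetical.

```python
def correlation_map(search, template):
    """Slide `template` over `search` and sum the element-wise products.

    Both arguments are 2D nested lists; the output has one entry per
    position at which the template fits entirely inside the search map.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [
            sum(
                search[r + i][c + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            for c in range(sw - tw + 1)
        ]
        for r in range(sh - th + 1)
    ]


def correlation_maps(search, stored_templates):
    """Compute one correlation map per stored feature map (hypothetical helper)."""
    return [correlation_map(search, t) for t in stored_templates]
```

In practice a tracker would typically normalize the responses or work in the frequency domain; the direct form is used here only because it keeps each step visible.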

Regarding claim 11, modified Liu teaches attempting to re-identify the target object in the third video image based on the computed correlation maps includes: identifying a peak value in each correlation map of the set of the computed correlation maps (Gaidon; Fig. 2, Col. 8, Ln. 17-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. A score of similarity/correlation between a third image portion of at least one target object within a third image and corresponding image portions of previous images is identified/computed.); 
identifying the highest peak value in the set of peak values (Gaidon; Fig. 2, Col. 8, Ln. 17-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. A score of similarity/correlation between a first image portion of at least one target object within a first search window and a second image portion of at least one target object within a second search window is identified/computed.); 
comparing the identified highest peak value with a second threshold value (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. An identified value of score is compared with a first threshold.); and 
if the identified highest peak value is greater than the second threshold value, determining that the target object is re-identified in the third video image (Gaidon; Fig. 2, Col. 8, Ln. 26-34, Col. 13, Ln. 13 to Col. 14, Ln. 7. For an identified value of score being greater than or equal to a second threshold, a location of a target object associated with the identified value is determined as a location of the target object of a current frame.).
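
For illustration only, the peak selection and second-threshold comparison recited in claim 11 can be sketched as follows. The function name `reidentify`, the nested-list representation of each correlation map, and the strict "greater than" comparison (taken from the claim language) are assumptions for the example.

```python
def reidentify(correlation_maps, second_threshold):
    """Pick the highest peak across a set of correlation maps.

    Returns (re_identified, index_of_best_map, best_peak), where
    re_identified is True only if the best peak exceeds the threshold.
    """
    # One peak value per correlation map.
    peaks = [max(max(row) for row in cmap) for cmap in correlation_maps]
    # Index of the map whose peak is highest.
    best_index = max(range(len(peaks)), key=peaks.__getitem__)
    best_peak = peaks[best_index]
    return best_peak > second_threshold, best_index, best_peak
```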

Claims 9 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 20190103026 A1) in view of Gaidon (US Pat. 9443320 B1), as applied to claim 8, and further in view of Huang (US Pub. 20090110236 A1).

Regarding claim 9, Liu discloses a target motion estimation model (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. An object motion model is used.).
But it does not specifically disclose prior to receiving the predicted location, the method further comprises training the target motion estimation model using a set of previously-determined locations for the target object in the sequence of video frames.
However, Huang teaches prior to receiving the predicted location, the method further comprises training the target motion estimation model using a set of previously-determined locations for the target object in the sequence of video frames (Huang; Para. [0054, 57]. Before updating possible location of a target object, a target motion model is updated/trained using previous determined locations for the target object.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adapt a target tracking approach, by incorporating Huang’s teaching wherein a target motion model is used and trained with a Kalman filter, for the motivation to provide a system for object detecting and tracking (Huang; Field of the Invention.).

Regarding claim 12, Liu discloses a target motion estimation model (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. An object motion model is used.).
But it does not specifically disclose the target motion estimation model uses a trained Kalman filter to predict a current location of the target object.
However, Huang teaches the target motion estimation model uses a trained Kalman filter to predict a current location of the target object (Huang; Para. [0054, 57]. Before updating possible location of a target object, a target motion model is updated/trained using previous determined locations for the target object.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking system/method of Liu to adapt a target tracking approach, by incorporating Huang’s teaching wherein a target motion model is used and trained with a Kalman filter, for the motivation to provide a system for object detecting and tracking (Huang; Field of the Invention.).
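
For illustration only, a motion model that is updated with previously determined locations and then used to predict the next location (claims 9 and 12) can be sketched with a constant-velocity Kalman filter run independently on each image axis. This is a minimal sketch, not Huang's implementation: the class `AxisKalman`, the function `predict_next_location`, the noise parameters `q` and `r`, and the reading of "training" as running the predict/update recursion over past locations are all assumptions.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one axis; state is (position, velocity)."""

    def __init__(self, position, q=0.01, r=1.0):
        self.p, self.v = float(position), 0.0
        # 2x2 covariance entries; the velocity starts out highly uncertain.
        self.c00, self.c01, self.c10, self.c11 = 1.0, 0.0, 0.0, 1000.0
        self.q, self.r = q, r  # assumed process and measurement noise

    def predict(self, dt=1.0):
        """Advance the state by dt and return the predicted position."""
        self.p += self.v * dt
        c00 = self.c00 + dt * (self.c10 + self.c01) + dt * dt * self.c11 + self.q
        c01 = self.c01 + dt * self.c11
        c10 = self.c10 + dt * self.c11
        c11 = self.c11 + self.q
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11
        return self.p

    def update(self, measured):
        """Correct the state with a measured position."""
        s = self.c00 + self.r
        k0, k1 = self.c00 / s, self.c10 / s  # Kalman gain
        residual = measured - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        c00 = (1 - k0) * self.c00
        c01 = (1 - k0) * self.c01
        c10 = self.c10 - k1 * self.c00
        c11 = self.c11 - k1 * self.c01
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11


def predict_next_location(previous_locations):
    """Run the filter over past (x, y) locations, then predict the next one."""
    fx = AxisKalman(previous_locations[0][0])
    fy = AxisKalman(previous_locations[0][1])
    for x, y in previous_locations[1:]:
        fx.predict()
        fx.update(x)
        fy.predict()
        fy.update(y)
    return fx.predict(), fy.predict()
```

For a target moving at roughly constant velocity, the predicted location lands near the linear extrapolation of the past locations, which is the sense in which the predicted location is "in the vicinity of" earlier ones.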

Claim 15 is rejected, in the alternative, under 35 U.S.C. 103 as obvious over Liu (US 20190103026 A1).

Regarding claim 15, Liu discloses prior to computing the set of correlation maps, the method further comprises scaling each window size of the multiple second search windows to have different sizes (Liu; Fig. 2B, 5, 6B,C, Para. [0029-30, 38-40, 42-44-47]. The sizes of the search windows/cropped portions are scaled to be expanded, shrunk, or kept the same so as to have different sizes.) except for scaling each window size of the multiple second search windows to the same size as the first search window.
It would have been an obvious matter of design choice to scale/change to different window size wherein the design choice includes scaling each window size of the multiple second search windows to the same size as the first search window, since such a modification would have involved a mere change in the size of a component/feature of search windows. A change in size is generally recognized as being within the level of ordinary skill in the art (In re Rose, 105 USPQ 237 (CCPA 1955).).
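
For illustration only, scaling image patches taken from differently sized search windows to one common size, as discussed for claim 15, can be sketched with nearest-neighbor resampling. The function `resize_nearest` and the nested-list patch representation are assumptions for the example; no particular interpolation method is attributed to Liu or to the claims.

```python
def resize_nearest(patch, out_h, out_w):
    """Resize a 2D nested list to (out_h, out_w) by nearest-neighbor sampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        # Map each output pixel back to its source pixel by integer scaling.
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

Once every second-window patch has the same dimensions as the first-window patch, a single correlation routine can be applied to all of them without per-window special cases.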

Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Van Schoyck (US Pub. 20190068829 A1) in view of Gaidon (US Pat. 9443320 B1).

Regarding claim 19, Van Schoyck discloses an unmanned aerial vehicle (UAV) capable of performing real-time visual tracking of a moving object, the UAV comprising (Van Schoyck; Fig. 1, Para. [0003]. A UAV used for real time target tracking.): 
one or more processors (Van Schoyck; Fig. 2, Para. [0030]. A processor is used.); 
a memory coupled to the one or more processors (Van Schoyck; Fig. 2, Para. [0030]. A processor is connected to a memory.); 
a camera mounted on a gimbal and coupled to the one or more processors and the memory (Van Schoyck; Fig. 2, Para. [0030]. A camera on a gimbal is connected to a processor and a memory.), 
wherein the camera is configured to capture a video of the moving object (Van Schoyck; Fig. 2, Para. [0030]. A camera on a gimbal is connected to a processor and a memory, wherein the camera is used to capture video images.);
a visual tracking module (Van Schoyck; Para. [0066].).
But it does not specifically disclose a visual tracking module configured to: receive a first video image of the video and a determined bounding box of the target object in the first video image, wherein the determined bounding box is a rectangular box which specifies a determined first location of the target object and determined dimensions of the target object; receive a second video image of the video following the first video image from the camera, wherein the location of the target object is unknown in the second video image; place a first search window in the first video image at the determined first location of the target object, and separately placing multiple second search windows of multiple window sizes in the second video image, wherein each of the multiple second search windows is centered on a second location in the second video image, wherein the second location corresponds to the determined first location of the target object in the first video image: wherein at least one window size in the multiple window sizes is larger than the size of the first search window; and wherein at least one window size in the multiple window sizes is smaller than the size of the first search window; compute a correlation map between a first image patch of the first video image within the first search window and a second image patch of the second video image within each of the multiple second search window; and determine an updated location of the target object in the second video image based on the computed correlation maps.
However, Liu teaches receive a first video image of the video and a determined bounding box of the target object in the first video image, wherein the determined bounding box is a rectangular box which specifies a determined first location of the target object and determined dimensions of the target object (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Video images are continuously processed, wherein a first video image is received, including a determined bounding box of a target object, the box is a rectangular box representing a first location of the target object and dimensions of the target object.); 
receive a second video image of the video following the first video image from the camera, wherein the location of the target object is unknown in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Video images are continuously processed, wherein a second video image is received, including a target object location to be determined.); 
place a first search window in the first video image centered on the determined first location of the target object, wherein the size of the first search window is greater than the size of the determined bounding box, and separately placing multiple second search windows of multiple window sizes in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. A first search window/cropped portion is placed in a first video image centered on at least a first determined location of a target object within a bounding box, wherein the first search window is larger than the bounding box, and multiple second search windows/cropped portions of different sizes are placed in a second video image.), wherein each of the multiple second search windows is centered on a second location in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image.), 
wherein the second location corresponds to the determined first location of the target object in the first video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. Multiple second search windows/cropped portions are centered on a second location in a second video image, wherein a second location corresponding to a first determined location.): 
wherein at least one window size in the multiple window sizes is larger than the size of the first search window to anticipate that the target object will increase in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion is larger than, smaller than, or the same size as a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.); and 
wherein at least one window size in the multiple window sizes is smaller than the size of the first search window to anticipate that the target object will decrease in size in the second video image (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 44]. The sizes of the search windows/cropped portions are scaled such that at least one window/cropped portion is larger than, smaller than, or the same size as a first search window/cropped portion, wherein the bounding box varies in size as the detected object moves closer or farther, anticipating/correlating with the size of the target object, and wherein the scale/search window is smaller or larger for the corresponding bounding box, anticipating/correlating with the size of the target object.);
compute a correlation map between a first image patch of the first video image within the first search window and a second image patch of the second video image within each of the multiple second search window (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. A similarity/correlation is calculated between a first image portion within a first search window/cropped portion of a first video image and each second image portion within second search windows/cropped portions of a second video image.); and 
determine an updated location of the target object in the second video image based on the computed correlation maps (Liu; Fig. 2B, 5, 6B,C, Para. [0028-30, 38-40, 42-44-47]. An updated location of a trajectory is determined in a second video image in accordance with an identified best similarity in the second video image.).
Therefore, it would have been obvious to a person with ordinary skill in the pertinent art before the effective filing date of the claimed invention to modify the object tracking UAV system/method of Van Schoyck to adapt a target tracking approach, by incorporating Liu’s teaching wherein target object tracking for multiple objects is employed, for the motivation to track multiple objects of different categories in a video (Liu; Abstract.).
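
For illustration only, the window geometry recited in claim 19 (a first search window centered on the determined location and larger than the bounding box, plus several second search windows of larger and smaller sizes sharing one center) can be sketched as follows. The function names, the `padding` factor of 2.0, and the scale factors 0.5, 1.0, and 1.5 are hypothetical values chosen for the example and are not drawn from the claims or from Liu.

```python
def first_search_window(bbox, padding=2.0):
    """bbox = (cx, cy, w, h). The window keeps the center and enlarges the size."""
    cx, cy, w, h = bbox
    return (cx, cy, w * padding, h * padding)


def second_search_windows(first_window, scales=(0.5, 1.0, 1.5)):
    """Windows of several sizes, all centered where the first window was."""
    cx, cy, w, h = first_window
    return [(cx, cy, w * s, h * s) for s in scales]


def to_corners(window):
    """Convert (cx, cy, w, h) to (left, top, right, bottom) for cropping."""
    cx, cy, w, h = window
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

With these assumed values, at least one second window is smaller than the first window and at least one is larger, which mirrors the claim's two size limitations.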

Regarding claim 20, modified Van Schoyck teaches the visual tracking module is further configured to use the determined updated location of the target object to control the flight of the UAV and/or the gimbal (Van Schoyck; Para. [0066]. Tracking information together with other information is used to provide flight control for the UAV.).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hwang (US Pub. 20180046188 A1) teaches an unmanned aerial vehicle with an automatic tracking function for objects.
Dijkman (US Pub. 20170011281 A1) teaches a machine learning network for performing context-based object detection.

a head of a camera-generated image of a person.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALBERT KIR whose telephone number is (571)272-6245.  The examiner can normally be reached on Monday - Friday, 8:30am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jay Patel, can be reached at (571) 272-2988.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 


/ALBERT KIR/Primary Examiner, Art Unit 2485