DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 6, 10, 11, 13, 16, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20190130191 A1) in view of Wang et al. (IDS: US 20190205623 A1) and further in view of Liang et al. (US 10171738 B1).

Regarding claims 1, 11, and 20, Zhou et al. disclose an image processing method, applied to a terminal device; an image processing apparatus, comprising: a memory and a processor, the memory being configured to store a computer program, and the processor being configured to run the computer program to perform the following actions; and a non-transitory storage medium, storing a computer program, the computer program being configured to, when being run by a processor, cause the processor to perform: detecting a target object in a current video frame of a target video stream, to obtain a current detection region for the target object (tracking objects in one or more video frames. For example, a candidate bounding box for an object tracker can be obtained based on an application of an object detector to at least one key frame in the one or more video frames, the candidate bounding box being associated with one or more input attributes, abstract, object detector comprises a feature-based detector, [0033], detecting moving objects and by tracking moving objects, the video analytics can generate and display a bounding box around a valid object, [0063], a bounding box can be associated with a blob, [0067]); obtaining a historic detection region corresponding to the target object in a historic video frame of the target video stream (history of the output bounding box in previous frames, [0136]); adjusting the current detection region according to the historic detection region, to obtain a determined current detection region (a bounding box for a blob tracker in a current frame can be the bounding box of a previous blob in a previous frame for which the blob tracker was associated, updated information for the blob tracker can include the tracking information for the previous frame and also prediction of a location of the blob tracker in the next frame (which is the current frame in this example), [0068], output bounding boxes for key frames, [0124], predicting a target location of the output bounding box in a current frame, a target dimension of the output bounding box in the current frame, or other attributes of the output bounding box, based on a history of the output bounding box in previous frames, [0136]); performing key point positioning on the target object based on the determined current detection region, to obtain a first set of key points (a blob can include a contiguous group of pixels making up at least a portion of a foreground object in a video frame, video sequence, [0067], Tracking of blobs of the current frame A 202A can be performed once the updated blob trackers 310A are generated, [0073], classification of the pixel to either a foreground pixel or a background pixel, [0076], tracking key points, [0079], foreground mask can include a binary image containing the pixels making up the foreground objects (e.g., moving objects) in a scene, [0080], a machine learning method can determine that a current blob contains noise (e.g., foliage in a scene), [0090], high values will be present in the activation maps that represent high-level features of people (e.g., two legs are present, a face is present at the top of the object, two eyes are present at the top left and top right of the face, a nose is present in the middle of the face, a mouth is present at the bottom of the face, and/or other features common for a person), [0215]); obtaining a second set of key points corresponding to the target object in the historic video frame of the target video stream (blob trackers 310A that were updated based on the prior video frame A 202A, [0074]); and performing stabilization on locations of the key points in the first set according to locations of the key points in the second set, to obtain current locations of a set of key points of the target object in the current video frame (To improve the smoothness of the output bounding box, certain post-processing of the output bounding box can be performed before the output bounding box is used for object tracking, By post-processing the output bounding box based on the history of the output bounding box in previous frames, the changes in the location and/or dimensions of the output bounding box can become more aligned with the historical average, which can improve the smoothness (and reduce a degree of jitter) of the output bounding box across a set of video frames, [0136], perform selectively post-processing of a bounding box to improve the degree of smoothness of the bounding box across a set of video frames, before the bounding box is provided for tracking an object, [0138]).

Zhou et al. indicate obtaining a first and second set of key points, as Zhou et al. describe tracking foreground blobs, recognizing facial features, and doing detection processes in key frames, however, another reference is included herein to make this feature more explicit. Further, while Zhou et al. indicate a stabilization performed via the bounding boxes, as key points are not explicitly disclosed, the stabilization according to key points is not explicitly described. 

Wang et al. teach an image processing method, applied to a terminal device, and comprising: detecting a target object in a current video frame of a target video stream, to obtain a current detection region for the target object (obtaining, from a video stream, an image that currently needs to be processed as a current image frame, abstract, envelope box, [0034], face detection, [0045], detection coordinate box, [0046]); obtaining a historic detection region corresponding to the target object in a historic video frame of the target video stream (An envelope box of the coordinates of the facial key points in the previous image frame is calculated, to obtain a registration coordinate box. In some embodiments, the calculated envelop box is directly used as the registration coordinate box, [0030]); adjusting the current detection region according to the historic detection region, to obtain a determined current detection region ( a registration coordinate box of the previous image frame (that is, an envelope box of a face in the previous image frame) may be used as an envelope box of the same face in the current image frame, [0043]); performing key point positioning on the target object based on the determined current detection region, to obtain a first set of key points (“For example, the registration coordinate box may be specifically used as an envelope box of the facial key points in the current frame, to deduce the positions of the facial key points in the current frame, so as to obtain the coordinates of the facial key points in the current frame. 
In some embodiments, detection of facial key points of the current frame is confined within the registration coordinate box to reduce computation complexity and save computing resources”, [0042], facial key points, [0048]); obtaining a second set of key points corresponding to the target object in the historic video frame of the target video stream (facial key points in a previous image frame, abstract, facial key points in previous image frame, [0026]).

Zhou et al. and Wang et al. are in the same art of object tracking (Zhou et al., abstract; Wang et al., abstract). The combination of Wang et al. with Zhou et al. enables the finding of key points in particular. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the key points of Wang et al. with the invention of Zhou et al., as this was known at the time of filing and the combination would have predictable results. Wang et al. further indicate that detection time can be greatly reduced to improve processing efficiency, that resource consumption can be reduced, and that, by using face registration techniques to track face key points according to confidence level, excessive face detection calculations for certain frames can be avoided, which facilitates real-time calculation by a mobile terminal ([0111]), indicating an efficiency and accuracy improvement in conjunction with Zhou et al.

Zhou et al. teach performing stabilization on the bounding box locations, but do not make explicit performing stabilization on locations of the key points in the first set according to locations of the key points in the second set, to obtain current locations of a set of key points of the target object in the current video frame.

Liang et al. teach performing stabilization on locations of the key points in the first set according to locations of the key points in the second set, to obtain current locations of a set of key points of the target object in the current video frame (computing system determines a stabilized location of a facial feature in a frame of video accounting for its location in a previous frame, abstract, col. 2, lines 25-60, computing system determines a stabilized location of the facial feature, col. 7, lines 15-35, determining the stabilized location 182 of the facial feature accounts for the distance between a potential stabilized location and the actual location 184 of the facial feature (e.g., the mean of face landmarks), col. 7, lines 30-50, determine a stabilized location of the facial feature, col. 8, lines 1-10).

Zhou et al., Wang et al., and Liang et al. are in the same art of object tracking (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45). The combination of Liang et al. with Zhou et al. and Wang et al. enables the stabilization of key points in particular. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the stabilization of Liang et al. with the invention of Zhou et al. and Wang et al., as this was known at the time of filing and the combination would have predictable results. Liang et al. further indicate “The technology described herein can stabilize video to minimize unintentional movement of both the camera and one or more objects. As such, the need for mechanical stabilization may be reduced or eliminated, which can lower manufacturing expenses and the space required to house a camera in a recording device. Alternatively, stabilization may be enhanced if both mechanical and digital video stabilization are used. Another benefit is that a user of a recording device may not have to concentrate on stabilizing movement of the recording device or a subject of the video, and may focus on other aspects of the video-taking experience” (col. 2, lines 5-20), indicating an improvement to the user experience in conjunction with Zhou et al. and Wang et al.

Regarding claims 3 and 13, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Zhou et al. and Liang et al. further indicate the performing stabilization on locations of the key points in the first set according to locations of the key points in the second set, to obtain current locations of a set of key points of the target object in the current video frame comprises: performing coordinate smoothing on the locations of the key points in the first set according to the locations of the key points in the second set, to obtain the current locations of the set of key points of the target object in the current video frame (Zhou et al., the smoothness of an output bounding region can refer to a rate of change in one or more attributes of the output bounding region over a set of continuous frames. The one or more attributes may include, for example, a position of the output bounding region within the frames (e.g., represented by the pixel coordinates of the geometric center of the output bounding region within the frame), [0009], smooth foreground mask, [0075], pixel coordinates, [0132], smoothing bounding boxes, input attributes may include, for example, a location of the candidate bounding box in the current video frame (e.g., represented by pixel coordinates),[0138]; Liang et al.,  stabilized location of the person's face may be a location at which the person's face would be located if movement of the person's face were smoothed, col. 4, lines 35-50, stabilized center of the face can represent a location at which the face center would be located had the face moved smoothly from the previous frame to the current frame, stabilized location of the face center can involve an optimization process, col. 7, lines 30-50,  apply coordinates to the frame to represent locations of portions of the frame, col. 8, lines 35-50).
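
For illustration of the coordinate smoothing limitation as interpreted above, the following is a minimal sketch only. It is not taken from Zhou et al., Liang et al., or the application; the function name `smooth_keypoints`, the weight `alpha`, and the choice of a per-point weighted average are assumptions made here to show one way current key point coordinates could be smoothed against those of a historic frame.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one key point


def smooth_keypoints(current: List[Point],
                     previous: List[Point],
                     alpha: float = 0.5) -> List[Point]:
    """Blend each current key point with its counterpart from the historic frame.

    alpha is a hypothetical weight: 1.0 keeps the current location unchanged,
    0.0 keeps the historic location. The weighted average is an assumed form
    of smoothing, not one stated in the cited references.
    """
    if len(current) != len(previous):
        raise ValueError("key point sets must have the same length")
    return [
        (alpha * cx + (1.0 - alpha) * px, alpha * cy + (1.0 - alpha) * py)
        for (cx, cy), (px, py) in zip(current, previous)
    ]
```

In this sketch a smaller `alpha` weights the historic locations more heavily, which reduces frame-to-frame jitter at the cost of slower response to real motion.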

Regarding claims 6 and 16, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Zhou et al. and Wang et al. further indicate before the detecting a target object in a current video frame of a target video stream, to obtain a current detection region for the target object, the method further comprises: detecting a first video frame of the target video stream, to obtain a plurality of second candidate detection regions; using a second candidate detection region having a maximum confidence level in the plurality of second candidate detection regions as a detection region corresponding to the first video frame, and then using the detection region corresponding to the first video frame as a historic detection region of another video frame in the target video stream (Zhou, confidence score can be provided to indicate how certain it is that the predicted bounding box actually encloses an object, [0116], a detector bounding box 723 may be excluded from the final set of bounding boxes 726 if the confidence level of the detector bounding box is below a confidence threshold, [0128], confidence score for a bounding box and the class prediction are combined into a final score that indicates the probability that that bounding box contains a specific type of object, from the 2545 total bounding boxes that were generated, only the three bounding boxes shown in FIG. 29C were kept because they had the best final scores, [0220]; Wang, coordinates of the facial key points in the current frame, and the confidence level are used as reference for tracking facial key points in a next image frame, [0110], calculate an envelope box of the coordinates of the facial key points in the previous image frame when the confidence level is higher than the preset threshold, to obtain a registration coordinate box, [0127], coordinates of facial key points in a previous image frame and a confidence level, to deduce coordinates of facial key points in a current frame, [0146], calculating coordinates of facial key points in the current frame according to the coordinates of the facial key points in the previous image frame when the confidence level is higher than a preset threshold, [0158]).

Regarding claims 10 and 19, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Zhou et al., Wang et al., and Liang et al. indicate recognizing a part of the target object from the current video frame according to the current locations of the set of key points of the target object; performing adjustment on the recognized part of the target object; and displaying an image of the target object after the adjustment (Zhou et al., a final bounding region of a tracker can be displayed as tracking a tracked blob, [0008], Video analytics can further be used to perform various types of recognition functions, such as face detection and recognition, license plate recognition, object recognition (e.g., bags, logos, body marks, or the like), or other recognition functions. In some cases, video analytics can be trained to recognize certain objects. Another function that can be performed by video analytics includes providing demographics for customer metrics (e.g., customer counts, gender, age, amount of time spent, and other suitable metrics), [0063]; Wang et al., face recognition, abstract, multi-face recognition on the current frame according to the coordinates of the facial key points, [0055]; Liang et al., Stabilizing movement of a face in addition to movement of a camera can provide better stabilization results in various circumstances. For example, suppose that a user is taking a video with a front-facing camera of a smartphone (e.g., a “selfie”) while riding in a vehicle. The vehicle may cause both the camera and the user to bounce. A video stabilization mechanism that stabilizes only physical movement of the camera may actually be counterproductive in such a situation because the user's face may continue to bounce even if the camera location were stabilized, col. 1, line 65 – col. 2, line 10, video stabilization, video that appears on a display of the recording device while the recording device is recording video may be the stabilized video, col. 5, lines 5-25).

Claims 2, 5, 12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20190130191 A1) and Wang et al. (IDS: US 20190205623 A1) and Liang et al. (US 10171738 B1) as applied to claim 1 above, further in view of Kim et al. (“Probabilistic Ship Detection and Classification Using Deep Learning” June 2018).

Regarding claims 2 and 12, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Wang et al. further teach an intersection over union ([0103]), however, Zhou et al., Wang et al., and Liang et al. do not explicitly disclose the adjusting the current detection region according to the historic detection region, to obtain a determined current detection region comprises: determining an intersection over union between the historic detection region and the current detection region; using the historic detection region as the determined current detection region when the intersection over union is greater than a target threshold; and using the current detection region as the determined current detection region when the intersection over union is less than or equal to the target threshold.

Kim et al. teach adjusting the current detection region according to the historic detection region, to obtain a determined current detection region comprises: determining an intersection over union between the historic detection region and the current detection region; using the historic detection region as the determined current detection region when the intersection over union is greater than a target threshold; and using the current detection region as the determined current detection region when the intersection over union is less than or equal to the target threshold (when the Faster R-CNN returns R bounding boxes from a given image in the t-th frame, the bounding box with the largest IoU with the bounding box Bt-1 in the previous frame is used as the bounding box Bt in the current frame, p4, [image: media_image1.png, greyscale], p5).

Zhou et al., Wang et al., Liang et al., and Kim et al. are in the same art of object detection (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45; Kim et al., abstract). The combination of Kim et al. with Zhou et al., Wang et al., and Liang et al. enables the use of the IoU. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the IoU of Kim et al. with the invention of Zhou et al., Wang et al., and Liang et al., as this was known at the time of filing and the combination would have predictable results. Kim et al. further indicate that this will prevent missing an object such as a ship (p5), indicating an improvement to the object tracking performed by Zhou et al., Wang et al., and Liang et al.
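
For illustration of the intersection-over-union limitation of claims 2 and 12 as recited above, the following is a minimal sketch only. It follows the claim language rather than any code in Kim et al. or the application; the box layout `(x_min, y_min, x_max, y_max)`, the function names `iou` and `adjust_detection_region`, and the sample threshold are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adjust_detection_region(historic: Box, current: Box, threshold: float) -> Box:
    """Keep the historic region when overlap exceeds the threshold, else use the current one."""
    return historic if iou(historic, current) > threshold else current
```

With a hypothetical threshold of 0.5, a current box that has shifted by one pixel from a 10 by 10 historic box yields the historic box, while a non-overlapping current box is returned unchanged.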

Regarding claims 5 and 15, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Wang et al. further teach an intersection over union ([0103]); however, Zhou et al., Wang et al., and Liang et al. do not disclose detecting the current video frame, to obtain a plurality of first candidate detection regions; and determining a first candidate detection region having a maximum intersection over union with the historic detection region from the plurality of first candidate detection regions as the current detection region.

Kim et al. teach detecting the current video frame, to obtain a plurality of first candidate detection regions; and determining a first candidate detection region having a maximum intersection over union with the historic detection region from the plurality of first candidate detection regions as the current detection region (bounding box with the largest IoU with the bounding box Bt-1 in the previous frame is used as the bounding box Bt in the current frame, p4, select bounding box with largest IOU, p5).

Zhou et al., Wang et al., Liang et al., and Kim et al. are in the same art of object detection (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45; Kim et al., abstract). The combination of Kim et al. with Zhou et al., Wang et al., and Liang et al. enables the use of the IoU. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the IoU of Kim et al. with the invention of Zhou et al., Wang et al., and Liang et al., as this was known at the time of filing and the combination would have predictable results. Kim et al. further indicate that this will prevent missing an object such as a ship (p5), indicating an improvement to the object tracking performed by Zhou et al., Wang et al., and Liang et al.
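
For illustration of the candidate selection limitation of claims 5 and 15 as recited above, the following is a minimal sketch only. It mirrors the cited statement that the box with the largest IoU with the previous-frame box is used in the current frame, but it is not code from Kim et al.; the box layout and the function names `iou` and `select_current_region` are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_current_region(candidates: List[Box], historic: Box) -> Box:
    """Return the candidate region with maximum IoU against the historic region."""
    if not candidates:
        raise ValueError("at least one candidate detection region is required")
    return max(candidates, key=lambda box: iou(box, historic))
```

When several candidates tie, `max` returns the first one encountered; the claim language does not address ties, so that behavior is an artifact of this sketch.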

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20190130191 A1) and Wang et al. (IDS: US 20190205623 A1) and Liang et al. (US 10171738 B1) as applied to claim 1 above, further in view of Corocan et al. (US 20090303342 A1).

Regarding claims 7 and 17, Zhou et al., Wang et al., and Liang et al. disclose the method and apparatus according to claims 1 and 11. Zhou et al. imply performing key point positioning on the target object based on the determined current detection region, to obtain a first set of key points comprises: performing, when the target object in the current video frame is partially located in the determined current detection region, expanding on the determined current detection region by centering around a center of the determined current detection region, to obtain a target detection region; and obtaining the first set of key points according to a target image comprising the target object in the target detection region (a sudden enlargement of a bounding box, [0137], a change in size of a bounding box associated with the object tracker, a rate of change in a physical size of the object and/or a bounding box associated with the object (which may indicate a merging of the bounding boxes), [0139], rapid change in size compared with a historical output bounding box of the same object tracker in previous frames, [0140]); however, another reference is added to make this explicit.

Corocan et al. teach performing, when the target object in the current video frame is partially located in the determined current detection region, expanding on the determined current detection region by centering around a center of the determined current detection region, to obtain a target detection region; and obtaining the first set of key points according to a target image comprising the target object in the target detection region (Haar feature classifiers, [0008], [0011], [0041], “In a further example, where a person in a scene may be walking towards the camera, the person's face will grow from frame to frame. Thus, on each subsequent frame during the confirming step 406, the tracking algorithm after inspecting the history record for this face region may first employ the next largest size of face detector, then the current size. If the face is still not confirmed then additional filters such as a skin pixel filter will try to determine if the face has turned to an angle, or has perhaps grown more than one size, or moved more than was expected and is thus outside the original bounding box which can then be enlarged. However, in certain embodiments, the accurate determination of a tracked face region in one preview image frame is utilized to predict the location of the same face region in the following image frame typically within a 20% larger bounding box. The exact increase in size of the bounding box depends in part on the size of the face region, the frame rate of the preview stream, and/or the amount of camera motion caused by the user, and so on. Thus, the 20% figure is only illustrative and may be adjusted up or down”, [0138]) [Haar feature classifiers and face detection interpreted as key points, but key points also already disclosed by primary references as in the rejections above].
Zhou et al., Wang et al., Liang et al., and Corocan et al. are in the same art of object tracking (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45; Corocan et al., abstract). The combination of Corocan et al. with Zhou et al., Wang et al., and Liang et al. enables the use of an expanded bounding box. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the expanded box of Corocan et al. with the invention of Zhou et al., Wang et al., and Liang et al., as this was known at the time of filing and the combination would have predictable results. Corocan et al. further indicate that this will enable tracking when an object is changing size due to coming closer to or going further from the camera ([0138]), indicating a benefit in the object tracking performed by Zhou et al., Wang et al., and Liang et al., where it is likely the object being tracked will indeed change size for this reason.
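
For illustration of the region expansion limitation of claims 7 and 17 as recited above, the following is a minimal sketch only. It is not code from Corocan et al. or the application; the function name `expand_box`, the `scale` parameter, and the clamping to the frame boundary are assumptions. The test for whether the object is only partially inside the region is not modeled here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def expand_box(box: Box, scale: float, frame_w: float, frame_h: float) -> Box:
    """Enlarge a box about its own center by `scale`, clamped to the frame.

    `scale` is hypothetical; a value of 1.2 would correspond to the
    illustrative 20% larger bounding box mentioned in the cited [0138].
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0  # center stays fixed during expansion
    cy = (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(frame_w), cx + half_w),
        min(float(frame_h), cy + half_h),
    )
```

Clamping means a box near the frame edge is no longer centered on the original center after expansion; whether that is acceptable depends on how the downstream key point positioning treats the crop.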

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20190130191 A1) and Wang et al. (IDS: US 20190205623 A1) and Liang et al. (US 10171738 B1) and Corocan et al. (US 20090303342 A1) as applied to claims 7 and 17 above, further in view of Pavlakos et al. (IDS: “6-DoF Object Pose from Semantic Keypoints”, 2017).

Regarding claims 8 and 18, Zhou et al., Wang et al., Liang et al., and Corocan et al. disclose the method and apparatus according to claims 7 and 17. Zhou et al., Wang et al., Liang et al., and Corocan et al. do not explicitly disclose the obtaining the first set of key points according to a target image comprising the target object in the target detection region comprises: processing the target image to obtain a plurality of groups of confidence levels of the first set of key points, each group of confidence levels being used for predicting a location of one object key point in the first set of key points; constructing a target matrix by using the each group of confidence levels; determining first target coordinates according to a row and a column of a maximum confidence level in the each group of confidence levels in the corresponding target matrix; and determining the location of the one object key point in the first set of key points according to the first target coordinates.

Pavlakos et al. teach processing the target image to obtain a plurality of groups of confidence levels of the first set of key points, each group of confidence levels being used for predicting a location of one object key point in the first set of key points; constructing a target matrix by using the each group of confidence levels; determining first target coordinates according to a row and a column of a maximum confidence level in the each group of confidence levels in the corresponding target matrix; and determining the location of the one object key point in the first set of key points according to the first target coordinates ([image: media_image2.png, greyscale]) [maximum taught by Zhou and Wang above in claim 6].

Zhou et al., Wang et al., Liang et al., Corocan et al., and Pavlakos et al. are in the same art of object detection (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45; Corocan et al., abstract; Pavlakos et al., p2012). The combination of Pavlakos et al. with Zhou et al., Wang et al., Liang et al., and Corocan et al. enables the use of a matrix using confidence information. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the matrix of Pavlakos et al. with the invention of Zhou et al., Wang et al., Liang et al., and Corocan et al., as this was known at the time of filing and the combination would have predictable results. Pavlakos et al. further indicate that this will correct imprecisions due to occlusions and false detections in the background (p2013), indicating a way to increase the precision of the object tracking performed by Zhou et al., Wang et al., Liang et al., and Corocan et al.
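
For illustration of the confidence matrix limitation of claims 8 and 18 as recited above, the following is a minimal sketch only. It follows the claim language rather than any code in Pavlakos et al. or the application; the function names `build_target_matrix` and `first_target_coordinates`, the flat input list, and the row-major layout are assumptions.

```python
from typing import List, Sequence, Tuple


def build_target_matrix(confidences: Sequence[float], n_cols: int) -> List[List[float]]:
    """Arrange one group of confidence levels into a row-major matrix (assumed layout)."""
    if n_cols <= 0 or len(confidences) == 0 or len(confidences) % n_cols != 0:
        raise ValueError("confidences must be non-empty and fill whole rows")
    return [list(confidences[i:i + n_cols]) for i in range(0, len(confidences), n_cols)]


def first_target_coordinates(matrix: List[List[float]]) -> Tuple[int, int]:
    """Return (row, column) of the maximum confidence level in the matrix."""
    best = (0, 0)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value > matrix[best[0]][best[1]]:
                best = (r, c)
    return best
```

One such matrix would be built per key point, so a set of N key points implies N groups of confidence levels and N calls to `first_target_coordinates`.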

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20190130191 A1) and Wang et al. (IDS: US 20190205623 A1) and Liang et al. (US 10171738 B1) and Corocan et al. (US 20090303342 A1) and Pavlakos et al. (IDS: “6-DoF Object Pose from Semantic Keypoints”, 2017) as applied to claim 8 above, further in view of Barron et al. (US 9860441 B1).

Regarding claim 9, Zhou et al., Wang et al., Liang et al., and Corocan et al. and Pavlakos et al. disclose the method according to claim 8. Zhou et al., Wang et al., Liang et al., and Corocan et al. and Pavlakos et al. do not explicitly disclose the determining the location of the one object key point in the first set of key points according to the first target coordinates comprises: determining second target coordinates according to a row and a column of a second maximum confidence level in the each group of confidence levels in the target matrix; offsetting the first target coordinates toward the second target coordinates by a target distance; and determining, according to first target coordinates that are offset by the target distance, a location of the one object key point corresponding to the target matrix on the target object.

Barron et al. teach determining second target coordinates according to a row and a column of a second maximum confidence level in the each group of confidence levels in the target matrix; offsetting the first target coordinates toward the second target coordinates by a target distance; and determining, according to first target coordinates that are offset by the target distance, a location of the one object key point corresponding to the target matrix on the target object (For a plurality of m×n pixel tiles of the first captured image, the computing device may determine respective distance matrixes. The distance matrixes may represent respective fit confidences between the m×n pixel tiles and pluralities of target p×q pixel tiles in the second captured image. The computing device may approximate the distance matrixes with respective bivariate surfaces. The computing device may upsample the bivariate surfaces to obtain respective offsets for pixels in the plurality of m×n pixel tiles. The respective offsets, when applied to pixels in the plurality of m×n pixel tiles, may cause parts of the first captured image to estimate locations in the second captured image, abstract; In order to provide a compact representation of distance matrix D, a two-dimensional polynomial, such as a bivariate quadratic surface, can be fit at or near the entry in distance matrix D that has the minimum value of all entries in distance matrix D. If multiple minima exist, any one may be chosen. This quadratic surface may be useful in a variety of ways. Such a quadratic surface could be used to estimate the sub-pixel location of the minimum of distance matrix D, which is more accurate than simply taking the per-pixel location as the minimum for most motion-estimation tasks. Additionally, a quadratic approximation could also be used as a compact approximation to distance matrix D in a more sophisticated motion estimation algorithm, such as an optical flow algorithm. In optical flow algorithms, for example, the relative confidences of respective motion estimates are used to weigh these estimates. To clarify, distance matrix D may be viewed as an error surface that is to be approximated by a bivariate quadratic surface, where D (u, v) is the L2 distance between the tile T and image portion I when the tile T is offset (e.g., shifted) by (u, v) in the image portion I. This approximation should accurately model the shape of distance matrix D near a minimum, and it is acceptable for the approximation to be poor far from this minimum. In most cases, distance matrix D, as a whole, is poorly modeled with a single bivariate quadratic surface. But for the purposes herein, since the goal is to have a reasonably accurate fit near the minimum, less accurate fits away from the minimum are not problematic, col. 11, lines 25-55; The first column of FIG. 5 shows the three tiles, and the second column shows respective image portions. Each image portion may be searched for one or more matches of its associated tile. The third column shows the distance matrix D for each tile, calculated using Equation (6). The fourth column shows the bivariate quadratic fit to that distance matrix, around the minimum point of the distance matrix, and clipped to the maximum value of the distance matrix. The fifth column shows 3D visualizations of the fitted bivariate quadratic. In addition to representing a fit between a tile and its associated image portion, each bivariate quadratic surface fits also represent confidence measures of the fit. Where the surface has a small value on the z-axis (the vertical axis), the confidence of the fit is higher, and where the surface has a larger value on the z-axis, the confidence of the fit is lower, col. 14, lines 5-25).
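
For illustration only, and not as a characterization of the claims or of Barron et al., the steps recited in claim 9 (locating the maximum and second maximum confidence levels in a target matrix, then offsetting the first target coordinates toward the second by a target distance) might be sketched as follows. The function name refine_keypoint, the use of NumPy, and the 0.25-pixel default target distance are assumptions made for this sketch; none of them appear in the application or the cited references.

```python
import numpy as np

def refine_keypoint(target_matrix, target_distance=0.25):
    """Hypothetical sketch: shift the peak location toward the second-highest peak.

    target_matrix: 2-D array of confidence levels for one object key point.
    target_distance: assumed offset length in pixels (not taken from the record).
    Returns (row, col) as floats.
    """
    m = np.asarray(target_matrix, dtype=float)
    if m.ndim != 2 or m.size < 2:
        raise ValueError("target_matrix must be 2-D with at least two entries")

    # Flat indices sorted from highest to lowest confidence level.
    order = np.argsort(m, axis=None)[::-1]

    # First target coordinates: row and column of the maximum confidence level.
    first = np.array(np.unravel_index(order[0], m.shape), dtype=float)
    # Second target coordinates: row and column of the second maximum.
    second = np.array(np.unravel_index(order[1], m.shape), dtype=float)

    # Unit vector pointing from the first coordinates toward the second.
    direction = second - first
    direction /= np.linalg.norm(direction)

    # Offset the first coordinates toward the second by the target distance.
    refined = first + target_distance * direction
    return float(refined[0]), float(refined[1])
```

In this sketch, a matrix whose maximum lies at row 1, column 1 and whose second maximum lies at row 1, column 2 would yield a refined location one quarter of a pixel to the right of the maximum.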

Zhou et al., Wang et al., Liang et al., Corocan et al., Pavlakos et al., and Barron et al. are in the same art of object detection (Zhou et al., abstract; Wang et al., abstract; Liang et al., col. 3, lines 30-45; Corocan et al., abstract; Pavlakos et al., p2012; Barron et al., dt120). The combination of Barron et al. with Zhou et al., Wang et al., Liang et al., Corocan et al., and Pavlakos et al. enables the use of an offset. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the offset of Barron et al. with the invention of Zhou et al., Wang et al., Liang et al., Corocan et al., and Pavlakos et al., as this was known at the time of filing and the combination would have predictable results. Barron et al. also indicate that in this way the alignment procedure is made computationally efficient so that it can operate in real-time, or near-real-time, on various types of image capture devices (col. 1, lines 50-65), indicating a way to increase the speed of the processes performed by Zhou et al., Wang et al., Liang et al., Corocan et al., and Pavlakos et al.

Allowable Subject Matter
Claims 4 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Closest art: US 10110846 B2: “Frame rate conversion generally consists of two parts, namely, motion estimation and motion compensated frame interpolation. Referring to FIG. 1, an overview of a framework for frame rate conversion based upon sparse key points is illustrated. A series of input frames 20 are provided to a key point detection process 100. The key point detection process 100 determines a set of key points for each of the input frames 20, which may be coupled with a distribution control process. The results of the key point detection process 100 may be provided to a key point description extraction and matching process 200. The result of the key point description extraction and matching process 200 may be provided to a parametric motion model estimation and dense motion field computation process 300. The result of the parametric motion model estimation and dense motion field computation process 300 may be provided to a motion compensated frame interpolation process 400 to provide the resultant frame rate conversion. For each key point in a first frame, it is desirable to search for the “best” matching key point in a second frame, and vice versa. Hence, the goal is to establish a sufficient number of one-to-one key point correspondences between the two frames. In the case of frame rate conversion, the first and second frame may be adjacent frames, if desired, in a video sequence.” “After the normalization 338 the process may apply any suitable upscaling technique 340 to upscale the 2 dimensional histogram to be the same size as the frame for the dominant region and the object histogram(s), respectively. One exemplary upscaling technique 340 is a bilinear upscaling process to increase the spatial smoothness of the weight matrix. For example, the system may use feathering to increase a smooth transition between the foreground and the background. Increasing such smoothness reduces artifacts in the final frame interpolation. The final object and the dominant region weighting matrices may be referred to as W.sub.1 and W.sub.2, for reference purposes. An additional weighting matrix may be used for each additional object.”


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH, can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI/
Primary Examiner, Art Unit 2661