DETAILED ACTION
Response to Amendment
Claims 1, 2, 4-12, 14-18, and 20 are pending. Claims 1, 2, 4-12, 14-18, and 20 are amended directly or by dependency on an amended claim. 
Response to Arguments
Applicant’s arguments, see pages 7-11, filed August 4, 2022, with respect to the objection to claim 8, along with a detailed explanation of how the selecting process is performed, have been fully considered and are persuasive.  The objection to claim 8 has been withdrawn. 
Applicant’s arguments, see pages 11-17, filed August 4, 2022, with respect to the 35 USC 103 rejections of claims 1, 2, 4-12, 14-18, and 20, along with the accompanying amendments received on the same date, have been fully considered and are persuasive.  The arguments regarding which pixels are included and excluded demonstrate the differences between the applied prior art and the current application. The 35 USC 103 rejections of claims 1, 2, 4-12, 14-18, and 20 have been withdrawn. 
Allowable Subject Matter
Claims 1, 2, 4-12, 14-18, and 20 are allowed.
The following is an examiner’s statement of reasons for allowance: As indicated above, applicant has thoroughly mapped the differences between the claims, as amended, and the prior art. The following art is cited as relevant but is not sufficient, alone or in combination, to disclose, teach, or fairly suggest the subject matter of the independent claims:

US 20190286932 A1: After determining a score for each object location proposal, the object detection system filters out wrong object location proposals (214). In one or more embodiments, the object detection system filters out wrong object location proposals (214) based on the scores determined in step (212). For example, in at least one embodiment, the object detection system identifies a maximum score among the scores determined for the object location proposals and filters out object location proposals with scores lower than a threshold amount (e.g., 70%) of the maximum score. Additionally, the object detection system further filters the remaining object location proposals by filtering out object location proposals that are in the same position (e.g., object location proposals with an overlap>=0.8). The object detection system can further filter the remaining object location proposals by identifying any remaining object location proposals that are not covered well (e.g., overlap<=0.8) by the boundary box, and by identifying any remaining object location proposals that fail to cover at least one center box well.

US 20200193732 A1: Vehicle determination engine 310 may be operable to receive confidence score data 324 and bounding box coordinate data 326 corresponding to detected objects, and may filter out (e.g., remove from consideration) one or more of the objects. For example, vehicle determination engine 310 may determine whether a confidence value, identified by confidence score data 324 and indicating how likely a detected object within a corresponding bounding box identified by bounding box coordinate data 326 is a vehicle, is below a threshold (e.g., 60%). If the confidence value is below the threshold, vehicle determination engine 310 may remove that object from consideration. In some examples, if the area of the image defined by a bounding box coincides with an area of the image defined by another bounding box, vehicle determination engine 310 may filter out one of the objects. In some examples, vehicle determination engine 310 filters out the object associated with a lower confidence value. In some examples, vehicle determination engine 310 filters out an object if its corresponding confidence value is below a threshold and at least a portion of the image defined by its corresponding bounding box coincides with at least a portion of a bounding box corresponding to another object. Vehicle detection engine 308 may then provide filtered bounding box coordinate data 311, which identifies bounding boxes for remaining objects (e.g., objects not filtered out by vehicle determination engine 310), to new vehicle identification engine 312. New vehicle identification engine 312 may determine whether an object, identified as of a particular type (e.g., a vehicle), was identified as an object in a previous image. New vehicle identification engine 312 may determine whether, for example, the object (e.g., the vehicle) has moved. 
If new vehicle identification engine 312 determines the object has moved, new vehicle identification engine 312 may filter out the object (e.g., as identified by the more current image). New vehicle identification engine 312 may make the determination based on one or more of an amount of time that has elapsed between when the previous and current images of the objects were taken or received, and the bounding boxes, as identified by filtered bounding box coordinate data 311, associated with the previous and current identified objects. As discussed further below, database 116 stores vehicle data 314, which may include filtered bounding box coordinate data 311 and video image time data 319 associated with previously identified vehicles.

“CASCADE MASK GENERATION FRAMEWORK FOR FAST SMALL OBJECT DETECTION”: In this work, we propose a cascade mask generation framework to tackle this issue. The proposed framework takes in multi-scale images as input and processes them in ascending order of the scale. Each processing stage outputs object proposals as well as a region-of-interest (RoI) mask for the next stage. With RoI convolution, the masked regions can be excluded from the computation in the next stage. The procedure continues until the largest scale image is processed. Finally, the object proposals generated from multiple scales are classified by a post classifier.

US 10902291 B1: 1. A method for training an auto labeling device capable of performing automatic verification by using uncertainty scores of auto-labeled labels, comprising steps of: (a) a learning device performing (i) (i-1) a process of inputting or allowing the auto labeling device to input one or more first unlabeled training images into a feature pyramid network of the auto labeling device, to thereby allow the feature pyramid network to apply at least one convolution operation to each of the first unlabeled training images and thus to generate first pyramid feature maps for training with different resolutions for each of the first unlabeled training images, (i-2) a process of inputting or allowing the auto labeling device to input the first pyramid feature maps for training into an object detection network of the automatic labeling device, to thereby allow the object detection network to detect one or more first objects for training in each of the first pyramid feature maps for training and thus to generate each of first bounding boxes for training corresponding to each of the first objects for training, and (ii) (ii-1) a process of allowing or instructing the auto labeling device to allow an ROI (region of interest) pooling layer of the auto labeling device to apply at least one pooling operation to each of the first pyramid feature maps for training using the first bounding boxes for training, to thereby generate first pooled feature maps for training, and (ii-2) a process of inputting or allowing the auto labeling device to input the first pooled feature maps for training into a deconvolution network of the auto labeling device, to thereby allow the deconvolution network to apply at least one first deconvolution operation to the first pooled feature maps for training and thus to generate each of first segmentation masks for training corresponding to each of the first objects for training, and (iii) (iii-1) a process of training the object detection network and the feature pyramid network, using one or more first losses calculated by referring to the first bounding boxes for training and one or more bounding box ground truths of each of the first unlabeled training images, and (iii-2) a process of training the deconvolution network and the feature pyramid network, using one or more second losses calculated by referring to the first segmentation masks for training and one or more mask ground truths of each of the first unlabeled training images; and (b) the learning device performing (i) (i-1) a process of inputting or allowing the auto labeling device to input one or more second unlabeled training images into the feature pyramid network, to thereby allow the feature pyramid network to generate second pyramid feature maps for training with different resolutions, and (i-2) a process of inputting or allowing the auto labeling device to input the second pyramid feature maps for training into the object detection network, to thereby allow the object detection network to detect one or more second objects for training in each of the second pyramid feature maps for training and thus to generate each of second bounding boxes for training corresponding to each of the second objects for training, (ii) (ii-1) a process of instructing or allowing the auto labeling device to instruct the ROI pooling layer of the auto labeling device to apply at least one pooling operation to each of the second pyramid feature maps for training by using the second bounding boxes for training, to thereby generate each of second pooled feature maps for training, and (ii-2) a process of inputting or allowing the auto labeling device to input the second pooled feature maps for training into the deconvolution network, to thereby allow the deconvolution network to apply at least one first deconvolution operation to the second pooled feature maps for training and thus to generate each of second segmentation masks for training corresponding to each of the second objects for training, (ii-3) and at least one of (ii-3-a) a process of inputting or allowing the auto labeling device to input the second pooled feature maps for training into a first classifier of the auto labeling device, to thereby allow the first classifier to apply at least one second deconvolution operation and then at least one PDF (probability distribution function) operation to each of the second pooled feature maps for training and thus to generate first per-pixel class scores for training and each of first mask uncertainty scores for training, respectively corresponding to each of the second segmentation masks for training, and (ii-3-b) a process of inputting or allowing the auto labeling device to input the second pooled feature maps for training into a second classifier of the auto labeling device, to thereby allow the second classifier to (1) generate k copies of each of the second pooled feature maps for training, (2) randomly set at least one element in each of the k copies of each of the second pooled feature maps for training as 0 and thus generate randomly-zeroed k copies thereof, (3) apply at least one third deconvolution operation and then at least one sigmoid operation to the randomly-zeroed k copies of each of the second pooled feature maps for training, and thus (4) generate second per-pixel class scores for training and each of second mask uncertainty scores for training, respectively corresponding to each of the second segmentation masks for training, and (iii) one of (iii-1) a process of training the first classifier using one or more third losses calculated by referring to the first per-pixel class scores for training and the mask ground truths, and (iii-2) a process of training the second classifier using one or more fourth losses calculated by referring to the second per-pixel class scores for training and the mask ground truths.

US 20220036573 A1 [does not predate]: in the training process of the neural network, based on the respectively output first statistical value and second statistical value for each pixel, a probability that an object is covered by or obscures another object in the input image may be calculated for each pixel of the input image. For example, based on a probability of an object being covered by another object which is calculated for each pixel, a binary mask corresponding to each pixel or a weight mask having a value between “0” and “1” for each pixel may be generated. In this example, an accuracy of training may be increased by applying a generated mask to a loss of the neural network.

US 20210365732 A1: F represents the foreground confidence of the area covered by the template, t_m represents the binary mask of the template, N_f represents the number of foreground pixels included in the binary mask of the template, and N_b represents the number of background pixels included in the binary mask of the template. A portion of the region with high possibility to be foreground is located in the foreground region of the template while the template in which the background region and the foreground region overlap is suppressed. The foreground region can be better aligned by the guidance of the foreground confidence.

US 20180322371 A1: The method of the example further comprises for a given image filtering out S50 all bounding boxes outputted by S40 (i.e. all bounding boxes remaining after the two previous filters S30 and S40) which are associated to a confidence score corresponding to an object category not among the initial labels provided at S10 for the given image. Such a filtering out S50 considers that the initial labels substantially exhaustively indicate which object categories are instantiated in the images provided at S10, such that results inconsistent with this consideration are filtered out at S50. This proves particularly true when the initial labels stem from users adding weak annotations to images in order to create the initial dataset.

“Deeply Shape-guided Cascade for Instance Segmentation” [does not predate]: The key to a successful cascade architecture for precise instance segmentation is to fully leverage the relationship between bounding box detection and mask segmentation across multiple stages. Although modern instance segmentation cascades achieve leading performance, they mainly make use of a unidirectional relationship, i.e., mask segmentation can benefit from iteratively refined bounding box detection. In this paper, we investigate an alternative direction, i.e., how to take the advantage of precise mask segmentation for bounding box detection in a cascade architecture. We propose a Deeply Shape-guided Cascade (DSC) for instance segmentation, which iteratively imposes the shape guidances extracted from mask prediction at previous stage on bounding box detection at current stage. It forms a bi-directional relationship between the two tasks by introducing three key components: (1) Initial shape guidance: A mask-supervised Region Proposal Network (mPRN) with the ability to generate class-agnostic masks; (2) Explicit shape guidance: A mask-guided region-of-interest (RoI) feature extractor, which employs mask segmentation at previous stage to focus feature extraction at current stage within a region aligned well with the shape of the instance-of-interest rather than a rectangular RoI; (3) Implicit shape guidance: A feature fusion operation which feeds intermediate mask features at previous stage to the bounding box head at current stage. We replace the RPN with the mask-supervised RPN (mRPN), which is guided by both the box supervision Bg and the mask supervision Mg. Given the feature map F produced by a CNN backbone as the input, the mRPN produces not only a set of RoIs ℬ0 but also class-agnostic mask probability maps ℳ0 corresponding to these RoIs. In addition, it also outputs a set of intermediate mask feature maps ℱ0. 
Let B0 ∈ ℬ0 be an RoI; then M0 ∈ ℳ0 and F0 ∈ ℱ0 are its corresponding mask probability map and intermediate mask feature map, respectively. This component involves initial shape guidance, as the shape guidance is learned from the mask supervision and imposed on the early stage (proposal generation stage) of the cascade.

“CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection”: The feature vector li, which incorporates both caption semantics and visual cues, is further processed with two fully connected layers to produce candidate saliency probability and box regression. Then the bounding boxes whose class probabilities are larger than a fixed threshold θ_T would be chosen as salient candidates.

“Density Map Guided Object Detection in Aerial Images” [does not predate]: The density map intensity indicates the probability of object presence in one position. Therefore, at each window position, the sum of all (density) pixel intensities within the window is computed, which can be considered as the likelihood of objects in this window. Then, a density threshold is applied to filter out windows with low overall intensity values. The generated density mask indicates the presence of objects. We generate image crops based on the density mask. First, we select all the pixels whose corresponding density mask value is “1”. Second, we merge the eight-neighbor connected pixels into a large candidate region. Finally, we use the candidate region’s circumscribed rectangle to crop the original image. We filter out the crops whose resolution is below the density threshold.

“An Algorithm for Multiple Object Trajectory Tracking”: The observation is composed of the image itself, the foreground mask given by a background modelling method and the object detection map generated by an object detector. The image provides the object appearance information which helps to keep the tracking identity even with multi-object interaction and occlusion. A Gaussian-mixture based adaptive background modelling [14] is used to generate a binary foreground mask image. This mask image enables the likelihood computation to consider the multi-object configuration in its entirety. The detection map consists of pixel-wise object detection scores. Even the foreground mask likelihood computation can be fooled because the foreground pixels of the large object cannot be covered by the human.

“A Mask-RCNN Baseline for Probabilistic Object Detection”: The spatial quality, QS, is calculated based on the ground truth’s objects segmentation mask and the probabilistic bounding box outputted by the detector. Essentially, assigning higher probabilities to pixels that belong to the object as foreground improve the score, and including any background pixels outside of the ground truth bounding box hurt the score. The spatial quality measure can be reduced greatly by assigning high probabilities to pixels that are truly in the background.

“Probabilistic Object Detection: Definition and Evaluation” [does not predate]: We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ). Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, and foreground/background separation quality while explicitly penalising false positive and false negative detections.



Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661