DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 2-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 11,093,793. Although the claims at issue are not identical, they are not patentably distinct from each other because ‘793 teaches all of the claimed limitations of the present application, as shown below:
15/689,431 (present application)
US 11,093,793
2. A method comprising: 

processing, by one or more hardware processors, raw image data using a first trained neural network to produce a first initial set of region of interest (ROI) pairs, each ROI pair comprising a detected ROI for the raw image data and a detected region label classifying the detected ROI;

processing, by the one or more hardware processors, the raw image data using a second trained neural network to produce a second initial set of ROI pairs; 

generating, by the one or more hardware processors, a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs;

evaluating, by the one or more hardware processors, the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 

identifying, by the one or more hardware processors, first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 

sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs; 

receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs received from the labeling system; and 

generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
1. A method comprising: 

processing, by one or more hardware processors, raw image data using a first trained neural network to produce a first initial set of region of interest (ROI) pairs, each ROI pair comprising a detected ROI for the raw image data and a detected region label classifying the detected ROI; 

processing, by the one or more hardware processors, the raw image data using a second trained neural network, while the second trained neural network is set for a first precision, to produce a second initial set of ROI pairs; 

generating, by the one or more hardware processors, a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs; 

evaluating, by the one or more hardware processors, the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 

identifying, by the one or more hardware processors, first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 

processing, by the one or more hardware processors, the raw image data using the second trained neural network, while the second trained neural network is set for a second precision lower than the first precision, to produce a third initial set of ROI pairs; and 

generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the third initial set of ROI pairs.

6. The method of claim 1, wherein the generating the second intermediate set of ROI pairs based on the third initial set of ROI pairs comprises combining the third initial set of ROI pairs and a set of human-confirmed ROI pairs, the set of human-confirmed ROI pairs being provided by a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs.
3. The method of claim 2, wherein the labeling system is a crowd-sourced annotation system.
5. The method of claim 4, wherein the labeling system is a crowd-sourced annotation system.
4. The method of claim 2, wherein the raw image data comprises a plurality of raw images from at least one of a video data stream or a database.
15. The method of claim 1, wherein the raw image data comprises a plurality of raw images from at least one of a video data stream or a database.
5. The method of claim 2, wherein the raw image data comprises a plurality of raw images from a camera fixed at a location in a physical environment and having an angle of view of the physical environment, the method comprising: causing, by the one or more hardware processors, the second trained neural network to train based on the second intermediate set of ROI pairs, the second trained neural network being trained to process images generated by the camera.
8. The method of claim 6, wherein the processing the raw image data using the second trained neural network to produce the third initial set of ROI pairs comprises producing a second set of confidence levels for the third initial set of ROI pairs, the method further comprising: assigning, by the one or more hardware processors, the second set of confidence levels to the second intermediate set of ROI pairs; and identifying, by the one or more hardware processors, third and fourth subsets of ROI pairs, in the second intermediate set of ROI pairs, based on the second set of confidence levels, each ROI pair in the third subset of ROI pairs having a confidence level that does not satisfy a second reference confidence level criterion, each ROI pair in the fourth subset of ROI pairs having a confidence level that satisfies the second reference confidence level criterion, and the second reference confidence level criterion assisting in determining which regions of interest are easy for the second trained neural network to label and which regions of interest are hard for the second trained neural network to label.
6. The method of claim 2, wherein the identifying of the first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels comprises for each particular ROI pair in the first intermediate set of ROI pairs: determining whether a particular confidence level, in the set of confidence levels, corresponding to the particular ROI satisfies the first reference confidence level criterion; and including the particular ROI in the first subset of ROI pairs in response to the particular confidence level not satisfying the first reference confidence level criterion; and including the particular ROI in the second subset of ROI pairs in response to the particular confidence level satisfying the first reference confidence level criterion.
2. The method of claim 1, wherein identifying the first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels comprises for each particular ROI pair in the first intermediate set of ROI pairs: determining whether a particular confidence level, in the set of confidence levels, corresponding to the particular ROI satisfies the first reference confidence level criterion; and including the particular ROI in the first subset of ROI pairs in response to the particular confidence level not satisfying the first reference confidence level criterion; and including the particular ROI in the second subset of ROI pairs in response to the particular confidence level satisfying the first reference confidence level criterion.
7. The method of claim 2, wherein the combining of the first initial set of ROI pairs and the second initial set of ROI pairs comprises clustering the first initial set of ROI pairs and the second initial set of ROI pairs based at least on one of region size, region position, or region label.
3. The method of claim 1, wherein the combining the first initial set of ROI pairs and the second initial set of ROI pairs comprises clustering the first initial set of ROI pairs and the second initial set of ROI pairs based at least on one of region size, region position, and region label.
8. The method of claim 2, wherein to produce the second initial set of ROI pairs, the raw image data is processed using the second trained neural network while the second trained neural network is set for a first precision, the method comprising: processing, by the one or more hardware processors, the raw image data using the second trained neural network, while the second trained neural network is set for a second precision lower than the first precision, to produce a third initial set of ROI pairs, the generating the second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs comprises combining the third initial set of ROI pairs and the set of human-confirmed ROI pairs.
See claims 1 and 6 above.
9. The method of claim 8, wherein the combining of the third initial set of ROI pairs and the set of human-confirmed ROI pairs comprises clustering the third initial set of ROI pairs and the set of human-confirmed ROI pairs based at least on one of region size, region position, or region label.
7. The method of claim 6, wherein the combining the third initial set of ROI pairs and the set of human-confirmed ROI pairs comprises clustering the third initial set of ROI pairs and the set of human-confirmed ROI pairs based at least on one of region size, region position, and region label.
10. The method of claim 8, wherein the processing of the raw image data using the second trained neural network to produce the third initial set of ROI pairs comprises producing a second set of confidence levels for the third initial set of ROI pairs, the method comprising: assigning, by the one or more hardware processors, the second set of confidence levels to the second intermediate set of ROI pairs; and identifying, by the one or more hardware processors, third and fourth subsets of ROI pairs, in the second intermediate set of ROI pairs, based on the second set of confidence levels, each ROI pair in the third subset of ROI pairs having a confidence level that does not satisfy a second reference confidence level criterion, each ROI pair in the fourth subset of ROI pairs having a confidence level that satisfies the second reference confidence level criterion, and the second reference confidence level criterion assisting in determining which regions of interest are easy for the second trained neural network to label and which regions of interest are hard for the second trained neural network to label.
8. The method of claim 6, wherein the processing the raw image data using the second trained neural network to produce the third initial set of ROI pairs comprises producing a second set of confidence levels for the third initial set of ROI pairs, the method further comprising: assigning, by the one or more hardware processors, the second set of confidence levels to the second intermediate set of ROI pairs; and identifying, by the one or more hardware processors, third and fourth subsets of ROI pairs, in the second intermediate set of ROI pairs, based on the second set of confidence levels, each ROI pair in the third subset of ROI pairs having a confidence level that does not satisfy a second reference confidence level criterion, each ROI pair in the fourth subset of ROI pairs having a confidence level that satisfies the second reference confidence level criterion, and the second reference confidence level criterion assisting in determining which regions of interest are easy for the second trained neural network to label and which regions of interest are hard for the second trained neural network to label.
11. The method of claim 10, comprising: storing, by the one or more hardware processors, the third subset of ROI pairs to first training dataset; storing, by the one or more hardware processors, the fourth subset of ROI pairs to second training dataset; and causing, by the one or more hardware processors, the second trained neural network to train over the first training dataset and the second training dataset such that the second trained neural network trains over the first training dataset faster than over the second training dataset.
10. The method of claim 8, further comprising: storing, by the one or more hardware processors, the third subset of ROI pairs as a first training dataset; storing, by the one or more hardware processors, the fourth subset of ROI pairs as a second training dataset; and training, by the one or more hardware processors, the second trained neural network over the first training dataset and the second training dataset such that the second trained neural network trains over the first training dataset faster than over the second training dataset.
12. A system comprising:

a memory storing instructions; and 

one or more hardware processors communicatively coupled to the memory and configured by the instructions to perform operations comprising: 

processing raw image data using a first trained neural network to produce a first initial set of region of interest (ROI) pairs, each ROI pair comprising a detected ROI for the raw image data and a detected region label classifying the detected ROI;

processing the raw image data using a second trained neural network to produce a second initial set of ROI pairs; 

generating a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs;

evaluating the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 

identifying first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion;

sending the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs;

receiving a set of human-confirmed ROI pairs received from the labeling system; and

generating a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
16. A system comprising: 

a memory storing instructions; and 

one or more hardware processors communicatively coupled to the memory and configured by the instructions to perform operations comprising: 

processing raw image data using a first trained neural network to produce a first initial set of region of interest (ROI) pairs, each ROI pair comprising a detected ROI for the raw image data and a detected region label classifying the detected ROI; 

processing the raw image data using a second trained neural network, while the second trained neural network is set for a first precision, to produce a second initial set of ROI pairs; 

generating a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs; 

evaluating the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 

identifying first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 

processing the raw image data using the second trained neural network, while the second trained neural network is set for a second precision lower than the first precision, to produce a third initial set of ROI pairs; and 

generating a second intermediate set of ROI pairs based on the third initial set of ROI pairs.

18. The system of claim 16, wherein the generating the second intermediate set of ROI pairs based on the third initial set of ROI pairs comprises combining the third initial set of ROI pairs and a set of human-confirmed ROI pairs, the set of human-confirmed ROI pairs being provided by a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs.
13. The system of claim 12, wherein the labeling system is a crowd-sourced annotation system.
See claims 1 and 5 above.
14. The system of claim 12, wherein the raw image data comprises a plurality of raw images from at least one of a video data stream or a database.
See claims 1 and 15 above.
15. The system of claim 12, wherein the raw image data comprises a plurality of raw images from a camera fixed at a location in a physical environment and having an angle of view of the physical environment, the operations comprising: causing the second trained neural network to train based on the second intermediate set of ROI pairs, the second trained neural network being trained to process images generated by the camera.
See claims 1, 6 and 8 above.
16. The system of claim 12, wherein the identifying of the first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels comprises for each particular ROI pair in the first intermediate set of ROI pairs: determining whether a particular confidence level, in the set of confidence levels, corresponding to the particular ROI satisfies the first reference confidence level criterion; and including the particular ROI in the first subset of ROI pairs in response to the particular confidence level not satisfying the first reference confidence level criterion; and including the particular ROI in the second subset of ROI pairs in response to the particular confidence level satisfying the first reference confidence level criterion.
See claims 1-2 above.
17. The system of claim 12, wherein the combining of the first initial set of ROI pairs and the second initial set of ROI pairs comprises clustering the first initial set of ROI pairs and the second initial set of ROI pairs based at least on one of region size, region position, or region label.
See claims 1 and 3 above.
18. The system of claim 12, wherein to produce the second initial set of ROI pairs, the raw image data is processed using the second trained neural network while the second trained neural network is set for a first precision, the operations comprising: processing the raw image data using the second trained neural network, while the second trained neural network is set for a second precision lower than the first precision, to produce a third initial set of ROI pairs, the generating the second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs comprises combining the third initial set of ROI pairs and the set of human-confirmed ROI pairs.
See claims 1 and 6 above.
19. The system of claim 18, wherein the combining of the third initial set of ROI pairs and the set of human-confirmed ROI pairs comprises clustering the third initial set of ROI pairs and the set of human-confirmed ROI pairs based at least on one of region size, region position, or region label.
See claims 1 and 6-7 above.
20. The system of claim 18, wherein the processing of the raw image data using the second trained neural network to produce the third initial set of ROI pairs comprises producing a second set of confidence levels for the third initial set of ROI pairs, the operations comprising: assigning, by the one or more hardware processors, the second set of confidence levels to the second intermediate set of ROI pairs; and identifying, by the one or more hardware processors, third and fourth subsets of ROI pairs, in the second intermediate set of ROI pairs, based on the second set of confidence levels, each ROI pair in the third subset of ROI pairs having a confidence level that does not satisfy a second reference confidence level criterion, each ROI pair in the fourth subset of ROI pairs having a confidence level that satisfies the second reference confidence level criterion, and the second reference confidence level criterion assisting in determining which regions of interest are easy for the second trained neural network to label and which regions of interest are hard for the second trained neural network to label.
See claims 1, 6 and 8 above.
21. A non-transitory computer storage medium comprising instructions that, when executed by a hardware processor of a device, cause the device to perform operations comprising: 

processing raw image data using a first trained neural network to produce a first initial set of region of interest (ROI) pairs, each ROI pair comprising a detected ROI for the raw image data and a detected region label classifying the detected ROI;

processing the raw image data using a second trained neural network to produce a second initial set of ROI pairs; 

generating a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs; 

evaluating the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs;

identifying first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion;

sending the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs,  

receiving a set of human-confirmed ROI pairs received from the labeling system; and 

generating a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
Claim 20 teaches a non-transitory computer storage medium comprising instructions that, when executed by a hardware processor of a device, cause the device to perform operations comprising all the noted limitations except for:

sending the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs,

receiving a set of human-confirmed ROI pairs received from the labeling system; and 
generating a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.

Claims 1 and 6, as previously discussed, teach these limitations.

It can be understood that implementing the method of claims 1 and 6 via a non-transitory computer storage medium comprising instructions was well known to those of ordinary skill in the art.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-4, 6, 12-14, 16 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over El-Khamy (US 2018/0089505) in view of Sannen et al. (NPL: “An On-Line Interactive Self-adaptive Image Classification Network”).
For claim 2, El-Khamy teaches a method (Abstract) comprising:
processing, by one or more hardware processors ([0009]), raw image data (208, Figure 2) using a first trained neural network (first 202, Figure 2) to produce a first initial set of region of interest (ROI) pairs (“each…202 may further refine the box coordinates of the bounding box of the detected object”, [0048]), each ROI pair comprising a detected ROI for the raw image data (bounding box, [0048]) and a detected region label classifying the detected ROI (e.g., pedestrians, [0047]); 
processing, by the one or more hardware processors ([0009]), the raw image data using a second trained neural network (second 202, Figure 2) to produce a second initial set of ROI pairs (as understood by [0048] and Figure 2);  
generating, by the one or more hardware processors ([0009]), a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs (216, Figure 2 and [0049]);
El-Khamy does not distinctly disclose:
evaluating, by the one or more hardware processors, the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 
identifying, by the one or more hardware processors, first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 
sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs; 
receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs received from the labeling system; and 
generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
However, Sannen teaches a classification framework (Figure 1) which provides image labels to regions of interest (Figure 1) via an ensemble of expert classifiers (Figure 1 and §2). Sannen’s expert classifiers employ “a fully automatic incorporation of operator’s feedback into the classifier during online processing” (¶2 of pg. 172), wherein one classifier is set up for each expert and an ensemble method combines them to generate the final decision (§2).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to calculate a plurality of confidence scores with respect to El-Khamy’s first intermediate set of ROI pairs via an ensemble set of classifiers since an ensemble of classifiers can be more accurate than any of its individual members (§5, Sannen).
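For illustration only, the following minimal Python sketch shows one way an ensemble of classifiers could yield a label together with a confidence score for a single ROI. The combination rule (a simple majority vote), the function name ensemble_confidence, and the Classifier type are assumptions made for this sketch; they are not taken from El-Khamy or Sannen, and Sannen's actual ensemble method may differ.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical type: an expert classifier maps an ROI (any object) to a label.
Classifier = Callable[[object], str]


def ensemble_confidence(roi: object, experts: Sequence[Classifier]) -> Tuple[str, float]:
    """Return the majority label and the fraction of experts that voted for it.

    Majority voting is an assumed combination rule, used here only to
    illustrate how an ensemble can produce a confidence level.
    """
    if not experts:
        raise ValueError("at least one expert classifier is required")
    votes = Counter(expert(roi) for expert in experts)
    label, count = votes.most_common(1)[0]
    return label, count / len(experts)


if __name__ == "__main__":
    # Three toy experts standing in for trained classifiers.
    experts = [
        lambda roi: "pedestrian",
        lambda roi: "pedestrian",
        lambda roi: "cyclist",
    ]
    print(ensemble_confidence(None, experts))
```

In this sketch the confidence is simply the share of agreeing experts, so a unanimous ensemble reports 1.0 and a split ensemble reports a lower value.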
The combination of El-Khamy and Sannen teaches:
evaluating, by the one or more hardware processors ([0009]), the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs (outputs of Classifier A-Classifier C, Sannen); 
identifying, by the one or more hardware processors ([0009]), first (ROI pairs from the intermediate set of ROI pairs which have a different classification decision by the ensemble classifier, Figure 1 of Sannen) and second (ROI pairs from the intermediate set of ROI pairs affirmed by the ensemble classifier, Figure 1 of Sannen) subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels (associated with the classification decision by the ensemble classifier), each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair), and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair);
sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs (human experts, Figure 1 and §2);  
receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs from the labeling system (feedback, §2); and 
generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs (final good/bad decision, §2 and Figure 1 of Sannen).
For claim 3, El-Khamy as modified by Sannen teaches all of the limitations of claim 2 as cited above and Sannen further teaches:
the labeling system is a crowd-sourced annotation system (as understood by Figure 1).
For claim 4, El-Khamy as modified by Sannen teaches all of the limitations of claim 2 as cited above and El-Khamy further teaches:
the raw image data comprises a plurality of raw images from at least one of a video data stream or a database ([0005] and [0032]).
For claim 6, El-Khamy as modified by Sannen teaches all of the limitations of claim 2 as cited above and El-Khamy further teaches:
identifying the first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels comprises for each particular ROI pair in the first intermediate set of ROI pairs:
determining whether a particular confidence level, in the set of confidence levels, corresponding to the particular ROI satisfies the first reference confidence level criterion (whether the output of El-Khamy’s 216 matches the output of the ensemble set of classifiers); and 
including the particular ROI in the first subset of ROI pairs in response to the particular confidence level not satisfying the first reference confidence level criterion (as understood by the combination of references); and 
including the particular ROI in the second subset of ROI pairs in response to the particular confidence level satisfying the first reference confidence level criterion (as understood by the combination of references).
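For illustration only, the per-ROI determination recited above can be sketched as a simple partition. The RoiPair type, the function name partition_by_confidence, the numeric threshold of 0.8, and the "greater than or equal" form of the criterion are all assumptions for this example; the rejection itself maps the criterion to agreement between El-Khamy's 216 and the ensemble output rather than to a numeric threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RoiPair:
    """Hypothetical ROI pair: a bounding box (x1, y1, x2, y2) plus a region label."""
    box: Tuple[int, int, int, int]
    label: str


def partition_by_confidence(
    pairs: Sequence[RoiPair],
    confidences: Sequence[float],
    threshold: float = 0.8,  # assumed reference confidence level criterion
) -> Tuple[List[RoiPair], List[RoiPair]]:
    """Return (first_subset, second_subset).

    first_subset: pairs whose confidence does NOT satisfy the criterion.
    second_subset: pairs whose confidence satisfies the criterion.
    """
    if len(pairs) != len(confidences):
        raise ValueError("pairs and confidences must have the same length")
    first_subset: List[RoiPair] = []
    second_subset: List[RoiPair] = []
    for pair, confidence in zip(pairs, confidences):
        if confidence >= threshold:
            second_subset.append(pair)
        else:
            first_subset.append(pair)
    return first_subset, second_subset


pairs = [
    RoiPair((0, 0, 10, 10), "pedestrian"),
    RoiPair((5, 5, 20, 20), "car"),
    RoiPair((30, 30, 40, 40), "pedestrian"),
]
first_subset, second_subset = partition_by_confidence(pairs, [0.95, 0.40, 0.80])
```

In this sketch every ROI pair lands in exactly one subset, so the two subsets together reproduce the first intermediate set.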
For claim 12, El-Khamy teaches a system (Figures 1-3) comprising:
a memory storing instructions (130, Figure 1); and
one or more hardware processors (120, Figure 1) communicatively coupled to the memory and configured by the instructions to ([0031]-[0032]):
process raw image data (208, Figure 2) using a first trained neural network (first 202, Figure 2) to produce a first initial set of region of interest (ROI) pairs (“each…202 may further refine the box coordinates of the bounding box of the detected object”, [0048]), each ROI pair comprising a detected ROI for the raw image data (bounding box, [0048])  and a detected region label classifying the detected ROI (e.g., pedestrians, [0047]);
process the raw image data using a second trained neural network (second 202, Figure 2), while the second trained neural network is set for a first precision ([0042]), to produce a second initial set of ROI pairs (as understood by [0048] and Figure 2);
generate a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs (216, Figure 2 and [0049]);
El-Khamy does not distinctly disclose:
evaluating the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs;
identifying first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 
sending the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs; 
receiving a set of human-confirmed ROI pairs received from the labeling system; and 
generating a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
However, Sannen teaches a classification framework (Figure 1) which provides image labels to regions of interest (Figure 1) via an ensemble of expert classifiers (Figure 1 and §2). Sannen’s expert classifiers employ “a fully automatic incorporation of operator’s feedback into the classifier during online processing” (¶2 of pg. 172), wherein one classifier is set up for each expert and an ensemble method combines them to generate the final decision (§2).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to calculate a plurality of confidence scores with respect to El-Khamy’s first intermediate set of ROI pairs via an ensemble set of classifiers since an ensemble of classifiers can be more accurate than any of its individual members (§5, Sannen).
The combination of El-Khamy and Sannen teaches:
evaluating, by the one or more hardware processors ([0009]), the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs (outputs of Classifier A-Classifier C, Sannen); 
identifying, by the one or more hardware processors ([0009]), first (ROI pairs from the intermediate set of ROI pairs which have a different classification decision by the ensemble classifier, Figure 1 of Sannen) and second (ROI pairs from the intermediate set of ROI pairs affirmed by the ensemble classifier, Figure 1 of Sannen) subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels (associated with the classification decision by the ensemble classifier), each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair), and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair);
sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs (human experts, Figure 1 and §2);  
receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs from the labeling system (feedback, §2); and 
generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs (final good/bad decision, §2 and Figure 1 of Sannen).
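For illustration only, the sending, receiving, and generating steps recited above can be sketched as a single review-and-merge routine. The labeling system is modeled here as a hypothetical callable (labeler), and forming the second intermediate set as the union of the affirmed pairs and the human-confirmed pairs is an assumption of this example; neither detail is drawn from El-Khamy or Sannen.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical ROI pair: ((x1, y1, x2, y2), label).
Box = Tuple[int, int, int, int]
RoiPair = Tuple[Box, str]

# Hypothetical labeling system: a human confirms or modifies the label of one ROI.
Labeler = Callable[[Box, str], str]


def review_and_merge(
    low_confidence: Sequence[RoiPair],
    high_confidence: Sequence[RoiPair],
    labeler: Labeler,
) -> List[RoiPair]:
    """Send low-confidence pairs for human review, then merge the results.

    The merge rule (affirmed pairs followed by human-confirmed pairs) is an
    assumption made for illustration.
    """
    human_confirmed = [(box, labeler(box, label)) for box, label in low_confidence]
    return list(high_confidence) + human_confirmed


def toy_labeler(box: Box, label: str) -> str:
    """Stand-in for a human reviewer: corrects 'dog' to 'pedestrian', confirms the rest."""
    return "pedestrian" if label == "dog" else label


second_intermediate = review_and_merge(
    low_confidence=[((5, 5, 20, 20), "dog"), ((1, 1, 4, 4), "car")],
    high_confidence=[((0, 0, 10, 10), "pedestrian")],
    labeler=toy_labeler,
)
```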
For claim 13, El-Khamy as modified by Sannen teaches all of the limitations of claim 12 as cited above and Sannen further teaches:
the labeling system is a crowd-sourced annotation system (as understood by Figure 1).
For claim 14, El-Khamy as modified by Sannen teaches all of the limitations of claim 12 as cited above and El-Khamy further teaches:
the raw image data comprises a plurality of raw images from at least one of a video data stream or a database ([0005] and [0032]).
For claim 16, El-Khamy as modified by Sannen teaches all of the limitations of claim 12 as cited above and El-Khamy further teaches:
identifying the first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels comprises for each particular ROI pair in the first intermediate set of ROI pairs:
determining whether a particular confidence level, in the set of confidence levels, corresponding to the particular ROI satisfies the first reference confidence level criterion (whether the output of El-Khamy’s 216 matches the output of the ensemble set of classifiers); and 
including the particular ROI in the first subset of ROI pairs in response to the particular confidence level not satisfying the first reference confidence level criterion (as understood by the combination of references); and 
including the particular ROI in the second subset of ROI pairs in response to the particular confidence level satisfying the first reference confidence level criterion (as understood by the combination of references).
For claim 21, El-Khamy teaches a non-transitory computer storage medium comprising instructions that, when executed by a hardware processor of a device, cause the device to perform operations ([0031]-[0032]) comprising:
processing raw image data (208, Figure 2) using a first trained neural network (first 202, Figure 2) to produce a first initial set of region of interest (ROI) pairs (“each…202 may further refine the box coordinates of the bounding box of the detected object”, [0048]), each ROI pair comprising a detected ROI for the raw image data (bounding box, [0048]) and a detected region label classifying the detected ROI (e.g., pedestrians, [0047]);
processing the raw image data using a second trained neural network (second 202, Figure 2), while the second trained neural network is set for a first precision ([0042]), to produce a second initial set of ROI pairs (as understood by [0048] and Figure 2);
generating a first intermediate set of ROI pairs by combining the first initial set of ROI pairs and the second initial set of ROI pairs (216, Figure 2 and [0049]);
El-Khamy does not distinctly disclose:
evaluating, by the one or more hardware processors, the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs; 
identifying, by the one or more hardware processors, first and second subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels, each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion, and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion; 
sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs; 
receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs received from the labeling system; and 
generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs.
However, Sannen teaches a classification framework (Figure 1) which provides image labels to regions of interest (Figure 1) via an ensemble of expert classifiers (Figure 1 and §2). Sannen’s expert classifiers employ “a fully automatic incorporation of operator’s feedback into the classifier during online processing” (¶2 of pg. 172), wherein one classifier is set up for each expert and an ensemble method combines them to generate the final decision (§2).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to calculate a plurality of confidence scores with respect to El-Khamy’s first intermediate set of ROI pairs via an ensemble set of classifiers since an ensemble of classifiers can be more accurate than any of its individual members (§5, Sannen).
The combination of El-Khamy and Sannen teaches:
evaluating, by the one or more hardware processors ([0009]), the first intermediate set of ROI pairs using a set of expert classifiers to produce a set of confidence levels for the first intermediate set of ROI pairs (outputs of Classifier A-Classifier C, Sannen); 
identifying, by the one or more hardware processors ([0009]), first (ROI pairs from the intermediate set of ROI pairs which have a different classification decision by the ensemble classifier, Figure 1 of Sannen) and second (ROI pairs from the intermediate set of ROI pairs affirmed by the ensemble classifier, Figure 1 of Sannen) subsets of ROI pairs, in the first intermediate set of ROI pairs, based on the set of confidence levels (associated with the classification decision by the ensemble classifier), each ROI pair in the first subset of ROI pairs having a confidence level that does not satisfy a first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair), and each ROI pair in the second subset of ROI pairs having a confidence level that satisfies the first reference confidence level criterion (criterion required for the ensemble classifier to affirm an ROI pair);
sending, by the one or more hardware processors, the first subset of ROI pairs to a labeling system that uses a human individual to confirm or modify a particular detected region label of a particular ROI pair in the first subset of ROI pairs (human experts, Figure 1 and §2);  
receiving, by the one or more hardware processors, a set of human-confirmed ROI pairs from the labeling system (feedback, §2); and 
generating, by the one or more hardware processors, a second intermediate set of ROI pairs based on the set of human-confirmed ROI pairs (final good/bad decision, §2 and Figure 1 of Sannen).
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over El-Khamy in view of Sannen and Stanitsas et al (US 2018/0165809).
For claims 7 and 17, El-Khamy as modified by Sannen teaches all of the limitations of claims 2 and 12, respectively, as cited above but does not distinctly disclose:
the combining of the first initial set of ROI pairs and the second initial set of ROI pairs comprises clustering the first initial set of ROI pairs and the second initial set of ROI pairs based at least on one of region size, region position, or region label.
However, Stanitsas teaches that “the outputs generated by the fully-connected layers (assuming Alexnet or VGG-net) of the CNN can be viewed as embedding the original high-dimensional data into a low-dimensional feature space--this embedding can be found to have a clustering effect on the data samples”, ([0183]).  It is noted that El-Khamy’s 204 “may use deep dilated convolutions” ([0042]).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to implement El-Khamy’s 200 such that each respective 202 has an associated 204 to cluster the respective outputs of 202 based on region labeling ([0042]) in order to “further determine a soft confidence score on the primary object detections” ([0042], El-Khamy).
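For illustration only, the clustering limitation of claims 7 and 17 (combining two sets of ROI pairs by region position and region label) can be sketched with a greedy grouping on label equality and bounding-box overlap. The intersection-over-union measure, the 0.5 threshold, and the function names are assumptions for this example; this is not the embedding-based clustering effect described by Stanitsas, nor the operation of El-Khamy's 204.

```python
from typing import List, Sequence, Tuple

# Hypothetical ROI pair: ((x1, y1, x2, y2), label).
Box = Tuple[int, int, int, int]
RoiPair = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cluster_roi_pairs(
    first: Sequence[RoiPair],
    second: Sequence[RoiPair],
    iou_threshold: float = 0.5,  # assumed overlap threshold
) -> List[List[RoiPair]]:
    """Greedily group ROI pairs that share a label and overlap the cluster's first box."""
    clusters: List[List[RoiPair]] = []
    for box, label in list(first) + list(second):
        for cluster in clusters:
            rep_box, rep_label = cluster[0]
            if label == rep_label and iou(box, rep_box) >= iou_threshold:
                cluster.append((box, label))
                break
        else:
            # No existing cluster matched, so this pair starts a new cluster.
            clusters.append([(box, label)])
    return clusters


clusters = cluster_roi_pairs(
    first=[((0, 0, 10, 10), "pedestrian")],
    second=[((1, 1, 11, 11), "pedestrian"), ((50, 50, 60, 60), "car")],
)
```

Here the two overlapping "pedestrian" boxes fall into one cluster and the distant "car" box forms its own cluster.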
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL CALRISSIAN PUENTES whose telephone number is (571)270-5070. The examiner can normally be reached M-F 9-6:30 (flex).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Menatoallah Yousseff, can be reached at 571-270-3684. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL C PUENTES/
Primary Examiner, Art Unit 2849