DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-20 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Pre-Appeal Brief Request filed by Applicant on December 27, 2021, which was filed in this application in conjunction with an appeal to the Patent Trial and Appeal Board. In response to the decision, prosecution on the application has been re-opened and the finality of the previous office action has been withdrawn. Applicant’s submission of Remarks filed on December 27, 2021 has been entered.  
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	
Examiner's Responses to Applicant's Remarks
Applicants' amendments filed on December 27, 2021 have been fully considered. The amendments overcome the following rejections set forth in the office action mailed on August 26, 2021.
a.	Applicant's arguments regarding the teachings of Fatteh are persuasive, and the rejection of Claims 1-20 under 35 U.S.C. 103(a) as being unpatentable over Zhang et al. (US PGPub US 2010/0321513), hereby referred to as “Zhang”, in view of Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung”, further in view of Kefi-Fatteh, Takoua, et al., "Human face detection improvement using incremental learning based on low variance directions," Signal, Image and Video Processing 13.8 (published May 2019): 1503-1510, hereby referred to as “Fatteh”, is hereby withdrawn.
Applicant's arguments with respect to claims 1-20 have been considered but are moot in view of the new grounds of rejection, presented below. 
6.	Applicant's arguments, see “Pre-Appeal Brief Request”, filed December 27, 2021, with respect to the teachings of Fatteh have been fully considered and are persuasive. Therefore, the rejection of Claims 1-20 under 35 U.S.C. 103(a) as being unpatentable over Zhang et al. (US PGPub US 2010/0321513), hereby referred to as “Zhang”, in view of Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung”, further in view of Fatteh has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made and presented below.

Applicants' arguments filed on December 22, 2021 regarding the teachings of Zhang and Jung have been fully considered but they are not persuasive. The Examiner has thoroughly reviewed Applicants' arguments but maintains that the cited references reasonably and properly meet the claimed limitations.

Applicant states that Applicant “reserves comment regarding the treating of the claims under 35 U.S.C. 112 sixth paragraph”.
With respect to Applicant's traversal of the interpretation of the claims under 35 U.S.C. 112, sixth paragraph, for reciting the term “component”, Applicant is directed to MPEP 2181 A, which specifically recites the following: “The following is a list of non-structural generic placeholders that may invoke 35 U.S.C. 112(f): "mechanism for," "module for," "device for," "unit for," "component for," "element for," "member for," "apparatus for," "machine for," or "system for." Welker Bearing Co., v. PHD, Inc., 550 F.3d 1090, 1096, 89 USPQ2d 1289, 1293-94 (Fed. Cir. 2008); Mass. Inst. of Tech. v. Abacus Software, 462 F.3d 1344, 1354, 80 USPQ2d 1225, 1228 (Fed. Cir. 2006); Personalized Media, 161 F.3d at 704, 48 USPQ2d at 1886–87; Mas-Hamilton Group v. LaGard, Inc., 156 F.3d 1206, 1214-1215, 48 

With respect to the double patenting rejection, Applicant's traversal of the rejection of Claims 1-20 on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Application No. 16/706,623 does not overcome the rejection. It is not clear at present that either set of claims is in condition for allowance, so the provisional rejection is maintained. Although the claims at issue are not identical, they are not patentably distinct from each other because they are both directed towards using variance-based image analysis of vehicular environments for vehicular control, as was previously presented in the office action. Applicant's argument relying on the additional feature in the instant application of “an indication of a high variance region in the image data represented by a bounding box” is incorrect. The recitation of additional features in the instant application narrows the scope of the claims; if these claims were placed into condition for allowance, the narrower claims would anticipate the currently filed claims in co-pending Application No. 16/706,623 and as such would be subject to a terminal disclaimer. Applicant is encouraged to amend the claims so that the two sets of claims are directed towards patentably distinct scopes, or to file a terminal disclaimer to overcome the double patenting rejections of record.

Applicant argues that Examiner made a “factual error to allege that Zhang teaches “determining ... an indication of a low variance region,” and that “Zhang describes the exact opposite [because] “Zhang discusses first selecting visually stand-out blocks by comparing block’s variance, then determining whether a distribution of such stand-out blocks is compact”, and then further states that 
Examiner respectfully disagrees. Applicant is reminded that the Examiner is entitled to give the claim language its broadest reasonable interpretation. Applicant mischaracterizes the teachings of Zhang. Zhang teaches “determining, based at least partly on the indication of the low variance region, an indication of a high variance region in the image data represented by a bounding box” (Zhang: [0033] “In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized.”). Applicant is further directed to paragraphs [0032]-[0037] and Figure 5 of Zhang, which describe and depict the block variance-based object of interest isolation procedure: a bounding box is initialized in step 400 using the candidate high variance block centroids (paragraphs [0033]-[0034]) to generate a convex shape (paragraph [0036]), as depicted in Figure 5 and described in paragraph [0037]. Figure 5 clearly illustrates a block-based boundary, wherein a series of darker high-variance blocks outline a series of low-variance sub-blocks. Thus, in accordance with Figure 5, Zhang teaches embodiments wherein low-variance regions are sub-regions of the high-variance regions.
Accordingly, the Examiner considers these teachings of Zhang to correspond to Applicant's claimed “determining, based at least partly on the indication of the low variance region, an indication of a high variance region in the image data represented by a bounding box, the low variance region being a sub-region of the high variance region.”


[Image media_image1.png: greyscale figure; content not reproduced in this text]


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Application No. 16/706,623. Although the claims at issue are not identical, they are not patentably distinct from each other because they are both directed towards using variance-based image analysis of vehicular environments for vehicular control. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

35 U.S.C. § 112 Sixth Paragraph - Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

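The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.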

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “component” and “unit” in claims 1-17 and 19.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Priority
This application repeats a substantial portion of prior Application No. 16/457,524, filed June 28, 2019, and adds disclosure not presented in the prior application. Because this application names the inventor or at least one joint inventor named in the prior application, it constitutes a continuation-in-part of the prior application. In reviewing the subject matter of the claimed invention, it appears that the claims are directed towards subject matter that is largely supported only in the newly added disclosure. As a result, the claimed invention in this application is being examined with the filing date of the instant application, December 6, 2019, as its effective filing date. Should applicant desire to claim the benefit of the filing date of the prior application, attention is directed to 35 U.S.C. 120, 37 CFR 1.78, and MPEP § 211 et seq.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Zhang et al. (US PGPub US 2010/0321513), hereby referred to as “Zhang”, in view of Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung”. 

Consider Claim 1. 
Zhang teaches: 
A method comprising: (Zhang: [0052], Figure 8, abstract, Content adaptive detection of images having stand-out objects involves block variance-based detection and determining if an object includes a stand-out object. The images with a stand-out object are further processed to isolate an object of interest. The images without a detected stand-out object are further processed with a transition map-based detection method which includes generating a transition map. If an object portrait is determined from the transition map, then the image is further processed to isolate the object of interest.)
receiving, from an image capturing device, image data representing an environment; (Zhang: [0055]-[0056], [0008] In another aspect, a camera comprises a lens, a sensor configured for acquiring an input image through the lens, a memory for storing an application, the application configured for processing the input image using a block variance-based detection module, if the input image includes a standout object, [0019])
inputting at least a portion of the image data into a machine learned model; (Zhang: [0019] FIG. 1 illustrates a flowchart of an overall architecture of the method of detecting images. An input image is processed by a block variance-based detection module, in the step 100. [0056] To utilize the content adaptive detection, a user acquires an image such as by a digital camera, and then while the image is acquired or after the image is acquired, the image is able to be processed using the content adaptive detection method. In some embodiments, the camera automatically implements the content adaptive detection, and in some embodiments, a user manually selects to implement the content adaptive detection.)
determining, by the machine learned model, an indication of a low variance region associated with the image data; (Zhang: [0019] In the block variance-based detection module, the visually stand-out blocks are selected by comparing each block's variance with a content adaptive threshold. [0022]-[0023] After the block variance values are obtained in the step 200, the mean block variance value of the high variance blocks is calculated, in the step 202, for example, using an equation that applies a thresholding operation to the mean variance value.)
determining, based at least partly on the indication of the low variance region, an indication of a high variance region in the image data represented by a bounding box; (Zhang: Examiner Note: the mean variance and thresholding operations lead to the determination of three types of blocks: blocks that fail to meet the threshold (low), candidate high variance blocks (candidate high), and stand-out criterion blocks (high). [0023]-[0024] In step 202, a thresholding operation is applied to identify blocks that have a variance above a certain value to qualify them as candidate high variance blocks. [0023] If this mean value is larger than a threshold (e.g. 1600), the mean value is set as 1600. This mean value is then utilized to determine if a block is a candidate high variance block or not. If the block variance is larger than the mean value, it is selected as high variance block. Otherwise, it is not selected. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary.)
the low variance region being a sub-region of the high variance region; (Zhang: [0024] After the above candidate high variance blocks selection in the step 202, the high variance blocks are analyzed if their distribution is able to satisfy the stand-out criterion check, in the step 204. The check process is further illustrated in the FIG. 3. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Zhang further describes object of interest isolation in paragraphs [0032]-[0037]: a bounding box is initialized in step 400 using the candidate high variance block centroids (paragraphs [0033]-[0034]) to generate a convex shape (paragraph [0036]), as depicted in Figure 5 and described in paragraph [0037]. Figure 5, which depicts the block variance-based object of interest isolation procedure, clearly illustrates a block-based boundary, wherein a series of darker high-variance blocks outline a series of low-variance sub-blocks. Thus, in accordance with Figure 5, Zhang teaches embodiments wherein low-variance regions are sub-regions of the high-variance regions.)

[Image media_image1.png: greyscale figure; content not reproduced in this text]

providing the bounding box to at least one of a planning component or a prediction component for object detection (Zhang: [0039] The block transition map-based object portrait detection scheme is illustrated in FIG. 6. In the step 600, a block transition map is extracted. The block transition map is calculated by the following procedure. [0051] A bounding box generated by the center around growing process is initialized. For each row of object blocks within the bounding box, the leftmost object block and rightmost object block are found, and all of the blocks between them are denoted as an object block. For each column of blocks, the top object block and bottom object block are found, and all of the blocks between them are denoted as object blocks. The resulted convex set is used as the object of interest isolation result)
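For illustration only, the block selection, bounding box initialization, and row/column fill quoted above from Zhang ([0023], [0033], [0051]) can be sketched as follows. This sketch is not taken from Zhang or from the application. The function names, the representation of the image as a grid of per-block variances, the use of the mean over all blocks as the threshold, the reading of "half of an image width plus a six block width" as (width / 2) + 6 blocks, the rounding and clamping choices, and running the column pass on the row-filled result are all assumptions.

```python
# Illustrative sketch only. Function names, the block-grid layout, and the
# rounding and clamping choices are assumptions, not taken from Zhang.

def select_candidate_blocks(variances, cap=1600.0):
    """Return (row, col) of blocks whose variance exceeds the capped mean variance."""
    values = [v for row in variances for v in row]
    threshold = min(sum(values) / len(values), cap)  # mean variance, capped (e.g. 1600)
    return [(r, c)
            for r, row in enumerate(variances)
            for c, v in enumerate(row)
            if v > threshold]


def init_bounding_box(candidates, grid_w, grid_h, margin=3):
    """Return (left, top, width, height) in block units, centred on the candidate centroid."""
    if grid_w > grid_h:
        width = grid_w // 2 + 6   # half the image width plus six blocks
        height = width
    else:
        width = grid_h // 2       # half the image height
        height = width + 12       # plus twelve blocks
    centre_r = sum(r for r, _ in candidates) / len(candidates)
    centre_c = sum(c for _, c in candidates) / len(candidates)
    left = round(centre_c - width / 2)
    top = round(centre_r - height / 2)
    # Shift the box so it keeps `margin` blocks from the image boundary.
    # If the box is too large to do so on both sides, the top/left margin wins.
    left = min(max(left, margin), max(margin, grid_w - margin - width))
    top = min(max(top, margin), max(margin, grid_h - margin - height))
    return left, top, width, height


def fill_object_region(object_blocks):
    """Mark every block between the outermost object blocks of each row, then each column."""
    filled = set(object_blocks)
    row_span = {}
    for r, c in object_blocks:
        lo, hi = row_span.get(r, (c, c))
        row_span[r] = (min(lo, c), max(hi, c))
    for r, (lo, hi) in row_span.items():
        filled.update((r, c) for c in range(lo, hi + 1))
    col_span = {}
    for r, c in filled:
        lo, hi = col_span.get(c, (r, r))
        col_span[c] = (min(lo, r), max(hi, r))
    for c, (lo, hi) in col_span.items():
        filled.update((r, c) for r in range(lo, hi + 1))
    return filled
```

Under these assumptions, a ring of high-variance boundary blocks passed to `fill_object_region` returns a region that also contains the blocks enclosed by the ring, whatever their variance.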
Zhang does not teach: 
image capturing device associated with a vehicle 
a component of the vehicle for determining a trajectory along which the vehicle is to travel; 
and controlling the vehicle based at least partly on the trajectory
Jung teaches: 
A method comprising: (Jung: abstract, A computer implemented method for detecting the presence of one or more pedestrians in the vicinity of the vehicle is disclosed. Imagery of a scene is received from at least one image capturing device. A depth map is derived from the imagery. A plurality of pedestrian candidate regions of interest (ROIs) is detected from the depth map by matching each of the plurality of ROIs with a 3D human shape model. At least a portion of the candidate ROIs is classified by employing a cascade of classifiers tuned for a plurality of depth bands and trained on a filtered representation of data within the portion of candidate ROIs to determine whether at least one pedestrian is proximal to the vehicle.)
receiving, from an image capturing device associated with a vehicle, image data representing an environment; (Jung: [0042]-[0047], [0043] FIG. 1 depicts a vehicle 100 that is equipped with an exemplary digital processing system 110 configured to acquire a plurality of images and detect the presence of one or more pedestrians 102 in a scene 104 in the vicinity of the vehicle 100, according to an embodiment of the present invention.)
inputting at least a portion of the image data into a machine learned model; (Jung: [0048], FIG. 3 is a block diagram illustrating exemplary software modules that execute the steps of a method for detecting a pedestrian in the vicinity of the vehicle, according to an embodiment of the present invention. Referring now to FIGS. 1-3, in block S1, at least one image of the scene is received by one or more image capturing devices 106 from the vehicle 100. In block S2, at least one stereo depth map is derived from the at least one image. In a preferred embodiment, disparities are generated at a plurality of pyramid resolutions, preferably three, D_i, i = 1, ..., 3, with D_0 being the resolution of the input image.)
determining, by the machine learned model, an indication of a low variance region associated with the image data; (Jung: [0056] To further classify the patches, in step 618, a representation from the range map is created called a vertical support (VS) histogram. More particularly, a discrete 2D grid of the world X-coordinates and the world disparities is defined. Each point from the range map which satisfies a given distance range and a given height range is projected to a cell on the grid and its height recorded. For each bin, the variance of heights of all the points projected in the bin is computed. This provides a 2D histogram in X-d coordinates which measures the support at a given world location from any visible structure above it.)
determining, based at least partly on the indication of the low variance region, an indication of a high variance region in the image data represented by a bounding box; (Jung: [0051] FIG. 4 depicts exemplary steps executed by the pedestrian detector (PD) module 400 in greater detail, according to an embodiment of the present invention. In the PD module 400, template matching is conducted using a 3D pedestrian shape template applied to a plurality (e.g., three) disjoint range bands in front of the vehicle 100. The 3D shape size is a predetermined function of the actual range from the image capturing devices 106. [0052] As mentioned above, in step 402, depth maps are obtained at separate image resolutions, D_i, i = 1, ..., 3.)
providing the bounding box to at least one of a planning component or a prediction component of the vehicle for determining a trajectory along which the vehicle is to travel; (Jung: Figure 4, [0053] FIGS. 5A-4D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.) 
and controlling the vehicle based at least partly on the trajectory. (Jung: [0046] Portions of a processed video/audio data stream 130 may be stored temporarily in the computer readable medium 128 for later output to an on-board monitor 132, to an onboard automatic collision avoidance system 134, or to a network 136, such as the Internet. [0071]-[0073], [0071] FIGS. 12A-12C depict system performance based on different criteria. System performance was analyzed in terms of different distance intervals, which permit gauging the effectiveness of the system from an application point of view: low latency and high accuracy detection at short distances as well as distant target detection of potential threats of collisions. [0073] Performance was further analyzed in terms of another criteria that determines effectiveness for collision avoidance purposes.)
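For illustration only, the vertical support (VS) histogram quoted above from Jung [0056] can be sketched as follows. This sketch is not taken from Jung or from the application. The function name, the (world_x, height, disparity) point format, the bin sizes, and the use of a disparity range in place of the "given distance range" are assumptions.

```python
import math
import statistics
from collections import defaultdict

# Illustrative sketch only. The point format, bin sizes, and the use of
# disparity as the distance filter are assumptions, not taken from Jung.

def vertical_support_histogram(points, x_bin, d_bin, height_range, disparity_range):
    """Map each (x-bin, disparity-bin) cell to the variance of the heights projected into it.

    points: iterable of (world_x, height, disparity) tuples from a range map.
    """
    h_min, h_max = height_range
    d_min, d_max = disparity_range
    cells = defaultdict(list)
    for x, height, disparity in points:
        # Keep only points inside the given height and disparity ranges.
        if h_min <= height <= h_max and d_min <= disparity <= d_max:
            cell = (math.floor(x / x_bin), math.floor(disparity / d_bin))
            cells[cell].append(height)
    # Population variance of the recorded heights in each occupied cell.
    return {cell: statistics.pvariance(heights) for cell, heights in cells.items()}
```

In this sketch, a cell holding points at several different heights receives a larger value than a cell holding points at a single height, consistent with the quoted description of the histogram as measuring the support at a given world location from visible structure above it.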
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to leverage Zhang’s content adaptive image analysis for object detection 

2. The combination of Zhang and Jung teaches: The method as claim 1 recites, wherein determining the indication of the high variance region comprises: performing a statistical analysis of the indication of the low variance region and an associated portion of the image data that is proximate to the low variance region; and determining, based at least partly on the statistical analysis and the associated portion of the image data that is proximate to the low variance region, a position and extents of the high variance region. (Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0039] The block transition map-based object portrait detection scheme is illustrated in FIG. 6. In the step 600, a block transition map is extracted. The block transition map is calculated by the following procedure. [0040] For each 8x8 luminance (Y) block and 4x4 chrominance (CbCr) block, its intensity and pseudo-color saturation value are calculated, and the difference with its eight neighbor blocks are computed by the following equation (quantized intensity and pseudo-color saturation value are used from the previous calculation). Jung: [0063] FIG. 10 shows examples of foreground masks imposed on pedestrians and negative patches. Alternating sets of three images 1002, 1004, 1006 display the original image, a foreground mask generated from local part templates, and the resulting edge filtering, respectively. The right column 1008 shows the results for negative data. Note that local contour parts can capture global body contours at various poses from its combinations. From FIG. 10, a person skilled in the art would appreciate that for pedestrian images, the method of FIG. 8A refines ROI positions in addition to matching local body parts and can enhance underlying body contours. [0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase,)

3. The combination of Zhang and Jung teaches: The method as claim 1 recites, wherein the machine learned model comprises a first machine learned model and the indication of the low variance region comprises a first output, and wherein determining the indication of the high variance region comprises: inputting the first output into a second machine learned model; and receiving, from the second machine learned model, the bounding box, wherein the second (Jung: [0063] FIG. 10 shows examples of foreground masks imposed on pedestrians and negative patches. Alternating sets of three images 1002, 1004, 1006 display the original image, a foreground mask generated from local part templates, and the resulting edge filtering, respectively. The right column 1008 shows the results for negative data. Note that local contour parts can capture global body contours at various poses from its combinations. [0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. [0070] FIG. 11B displays ROC curves for the first level Contour+HOG classifier 1104 and higher level HOG-based classifiers 1106 evaluated for high resolution image examples. The Contour+HOG classifier shows more robust and stable performance over HOG classifiers alone in terms of detection and false positive rejections. Zhang: [0038]-[0051], Block Transition Map Based Object Portrait Detection, Figure 6)

4. The combination of Zhang and Jung teaches: The method as claim 1 recites, wherein the machine learned model comprises a first machine learned model and the indication of the low variance region comprises a first output, and wherein determining the indication of the high variance region comprises: determining, based at least partly on analyzing the image data using a second machine learned model, a second output indicating at least one feature of the high variance region; and inputting the first output and the second output into a third machine learned model, wherein the third machine learned model is trained to determine high variance regions in image data. (Jung: [0063] FIG. 10 shows examples of foreground masks imposed on pedestrians and negative patches. Alternating sets of three images 1002, 1004, 1006 display the original image, a foreground mask generated from local part templates, and the resulting edge filtering, respectively. The right column 1008 shows the results for negative data. Note that local contour parts can capture global body contours at various poses from its combinations. [0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. [0070] FIG. 11B displays ROC curves for the first level Contour+HOG classifier 1104 and higher level HOG-based classifiers 1106 evaluated for high resolution image examples. The Contour+HOG classifier shows more robust and stable performance over HOG classifiers alone in terms of detection and false positive rejections. Zhang: [0038]-[0051], Block Transition Map Based Object Portrait Detection, Figure 6)

5. The combination of Zhang and Jung teaches: The method as claim 1 recites, further comprising: receiving, based at least partly on the image data, a plurality of classifications of an object identified in the image data and a plurality of confidence scores, an individual confidence score corresponding to an individual classification; reducing, based at least partly on the indication of the low variance region, a threshold associated with a classification of the plurality of classifications, wherein the classification corresponds to a high variation region; determining that a confidence score associated with the classification meets or exceeds the threshold; and determining the indication of the high variance region based at least partly on determining that the confidence score meets or exceeds the threshold. (Jung: [0061] Here, CtrsubROI, MFG and ITemplCont denotes the center of a local sub-ROI, a foreground mask, and a binary contour template, CtrTempl (i;Ich) is the center of chamfer matching score with the ith kernel image, respectively. [0062] In step 806, a foreground mask is composed from contour template matching. More particularly, from the contour templates, the foreground mask is composed by overlapping binary local templates at each detected position that is weighted by matching scores. The foreground mask is used as a filter to suppress noisy background features prior to a classification step. In step 808, an HOG-based classifier is applied given the refined sub-ROIs and the foreground mask. More particularly, HOG feature descriptors are computed by employing the refined sub-ROI boxes, where gradient values are enhanced by the weighted foreground mask. [0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase, ... Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038]-[0051], Block Transition Map Based Object Portrait Detection, Figure 6)

Consider Claims 6 and 16. 
Zhang teaches: 
6. A system comprising: one or more processors; and computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to (Zhang: [0052], Figure 8, abstract, Content adaptive detection of images having stand-out objects involves block variance-based detection and determining if an object includes a stand-out object. The images with a stand-out object are further processed to isolate an object of interest. The images without a detected stand-out object are further processed with a transition map-based detection method which includes generating a transition map. If an object portrait is determined from the transition map, then the image is further processed to isolate the object of interest.)
6. receiving sensor data representing an environment; / 16. receiving, sensor data representing an environment within which an object is located; (Zhang: [0055]-[0056], [0008] In another aspect, a camera comprises a lens, a sensor configured for acquiring an input image through the lens, a memory for storing an application, the application configured for processing the input image using a block variance-based detection module, if the input image includes a standout object, [0019])
16. inputting, into a machine learned model, at least a portion of the sensor data; (Zhang: [0019] FIG. 1 illustrates a flowchart of an overall architecture of the method of detecting images. An input image is processed by a block variance-based detection module, in the step 100. [0056] To utilize the content adaptive detection, a user acquires an image such as by a digital camera, and then while the image is acquired or after the image is acquired, the image is able to be processed using the content adaptive detection method. In some embodiments, the camera automatically implements the content adaptive detection, and in some embodiments, a user manually selects to implement the content adaptive detection.)
6. determining an indication of a low variance region associated with the sensor data; / 16. determining, by the machine learned model, an indication of a low variance region associated with the sensor data; (Zhang: [0019] In the block variance-based detection module, the visually stand-out blocks are selected by comparing each block's variance with a content adaptive threshold. [0022]-[0023] After the block variance values are obtained in the step 200, the mean block variance value of the high variance blocks is calculated, in the step 202, for example, using an equation that applies a thresholding operation to the mean variance value. [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture.)
6. determining an indication of a high variance region associated with the sensor data based at least in part on the indication of the low variance region; / 16. and determining an indication of a high variance region based at least in part on a portion of the sensor data associated with the low variance region. (Zhang: Examiner Note: the mean variance and thresholding operations lead to the determination of three types of blocks: blocks that fail to meet the threshold (low), candidate high variance blocks (candidate high) and stand-out criterion blocks (high). [0023]-[0024] In step 202, a thresholding operation is applied to identify blocks that have a mean variance above a certain value to qualify them as candidate high variance blocks. [0023] If this mean value is larger than a threshold (e.g. 1600), the mean value is set as 1600. This mean value is then utilized to determine if a block is a candidate high variance block or not. If the block variance is larger than the mean value, it is selected as high variance block. Otherwise, it is not selected. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary.)
6. the low variance region being a sub-region of the high variance region; / 16. the low variance region being a sub-region of the high variance region; (Zhang: [0024] After the above candidate high variance blocks selection in the step 202, the high variance blocks are analyzed if their distribution is able to satisfy the stand-out criterion check, in the step 204. The check process is further illustrated in the FIG. 3. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Zhang further describes object of interest isolation in paragraphs [0032]-[0037]. Applicant is further directed to the block variance-based object of interest isolation, in which a bounding box is initialized in step 400 using the candidate high variance block centroids (paragraphs [0033]-[0034]) to generate a convex shape (paragraph [0036]). Figure 5, described in paragraph [0037], depicts the block variance-based object of interest isolation procedure and clearly illustrates a block-based boundary, wherein a series of darker high-variance blocks outline a series of low-variance sub-blocks. Thus, in accordance with Figure 5, Zhang teaches embodiments wherein low-variance regions are sub-regions of the high-variance regions.)

[Image: media_image1.png]

providing the bounding box to at least one of a planning component or a prediction component for object detection (Zhang: [0039] The block transition map-based object portrait detection scheme is illustrated in FIG. 6. In the step 600, a block transition map is extracted. The block transition map is calculated by the following procedure. [0051] A bounding box generated by the center around growing process is initialized. For each row of object blocks within the bounding box, the leftmost object block and rightmost object block are found, and all of the blocks between them are denoted as an object block. For each column of blocks, the top object block and bottom object block are found, and all of the blocks between them are denoted as object blocks. The resulted convex set is used as the object of interest isolation result)
Zhang does not teach: 
6. and controlling the system based on at least one of the sensor data or the indication of the high variance region. 
16. receiving, from a sensor associated with a vehicle, sensor data representing an environment within which the vehicle is located; 
Jung teaches: 
6. A system comprising: one or more processors; and computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: / 16. One or more computer-readable media that, when executed by one or more processors, cause the one or more processors to perform operations comprising: (Jung: abstract, A computer implemented method for detecting the presence of one or more pedestrians in the vicinity of the vehicle is disclosed. Imagery of a scene is received from at least one image capturing device. A depth map is derived from the imagery. A plurality of pedestrian candidate regions of interest (ROIs) is detected from the depth map by matching each of the plurality of ROIs with a 3D human shape model. At least a portion of the candidate ROIs is classified by employing a cascade of classifiers tuned for a plurality of depth bands and trained on a filtered representation of data within the portion of candidate ROIs to determine whether at least one pedestrian is proximal to the vehicle.)
6. receiving sensor data representing an environment; / 16. receiving, from a sensor associated with a vehicle, sensor data representing an environment within which the vehicle is located; (Jung: [0042]-[0047], [0043] FIG. 1 depicts a vehicle 100 that is equipped with an exemplary digital processing system 110 configured to acquire a plurality of images and detect the presence of one or more pedestrians 102 in a scene 104 in the vicinity of the vehicle 100, according to an embodiment of the present invention.)
16. inputting, into a machine learned model, at least a portion of the sensor data; (Jung: [0048], FIG. 3 is a block diagram illustrating exemplary software modules that execute the steps of a method for detecting a pedestrian in the vicinity of the vehicle, according to an embodiment of the present invention. Referring now to FIGS. 1-3, in block S1, at least one image of the scene is received by one or more image capturing devices 106 from the vehicle 100. In block S2, at least one stereo depth map is derived from the at least one image. In a preferred embodiment, disparities are generated at a plurality of pyramid resolutions, preferably three, Di, i=1, ..., 3, with D0 being the resolution of the input image.)
6. determining an indication of a low variance region associated with the sensor data; / 16. determining, by the machine learned model, an indication of a low variance region associated with the sensor data; (Jung: [0056] To further classify the patches, in step 618, a representation from the range map is created called a vertical support (VS) histogram. More particularly, a discrete 2D grid of the world X-coordinates and the world disparities is defined. Each point from the range map which satisfies a given distance range and a given height range is projected to a cell on the grid and its height recorded. For each bin, the variance of heights of all the points projected in the bin is computed. This provides a 2D histogram in X-d coordinates which measures the support at a given world location from any visible structure above it.)
6. determining an indication of a high variance region associated with the sensor data based at least in part on the indication of the low variance region; / 16. and determining an indication of a high variance region based at least in part on a portion of the sensor data associated with the low variance region. (Jung: [0051] FIG. 4 depicts exemplary steps executed by the pedestrian detector (PD) module 400 in greater detail, according to an embodiment of the present invention. In the PD module 400, template matching is conducted using a 3D pedestrian shape template applied to a plurality (e.g., three) disjoint range bands in front of the vehicle 100. The 3D shape size is a predetermined function of the actual range from the image capturing devices 106. [0052] As mentioned above, in step 402, depth maps are obtained at separate image resolutions, Di, i=1, ..., 3.)
6. and controlling the system based on at least one of the sensor data or the indication of the high variance region. (Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list. [0046] Portions of a processed video/audio data stream 130 may be stored temporarily in the computer readable medium 128 for later output to an on-board monitor 132, to an onboard automatic collision avoidance system 134, or to a network 136, such as the Internet. [0071]-[0073], [0071] FIGS. 12A-12C depict system performance based on different criteria. System performance was analyzed in terms of different distance intervals, which permit gauging the effectiveness of the system from an application point of view: low latency and high accuracy detection at short distances as well as distant target detection of potential threats of collisions. [0073] Performance was further analyzed in terms of another criteria that determines effectiveness for collision avoidance purposes.)
16. receiving, from a sensor associated with a vehicle, sensor data representing an environment within which the vehicle is located; (Jung: [0046] Portions of a processed video/audio data stream 130 may be stored temporarily in the computer readable medium 128 for later output to an on-board monitor 132, to an onboard automatic collision avoidance system 134, or to a network 136, such as the Internet. [0071]-[0073], [0071] FIGS. 12A-12C depict system performance based on different criteria. System performance was analyzed in terms of different distance intervals, which permit gauging the effectiveness of the system from an application point of view: low latency and high accuracy detection at short distances as well as distant target detection of potential threats of collisions. [0073] Performance was further analyzed in terms of another criteria that determines effectiveness for collision avoidance purposes.)
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to leverage Zhang’s content adaptive image analysis for object detection and apply it to Jung’s real-time pedestrian detection. The determination of obviousness is predicated upon the following findings: One skilled in the art would have been motivated to modify Zhang in this manner in order to leverage the content adaptive algorithm to improve the overall accuracy of the real-time pedestrian detection of Jung.
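For illustration only, the candidate high variance block selection that Zhang describes at paragraphs [0019] and [0022]-[0023] can be sketched as follows. This is a minimal sketch and not Zhang's implementation: the function name, the 8x8 block size, the grayscale NumPy input, and the reading that the mean is taken over all blocks are assumptions. Only the cap of 1600 and the rule that a block is selected when its variance exceeds the capped mean come from the quoted paragraphs.

```python
import numpy as np


def select_high_variance_blocks(image, block=8, cap=1600.0):
    """Return (variances, mask) for a 2D grayscale image.

    Hypothetical helper: block size and the use of the mean over all
    blocks are assumptions, not taken from Zhang.
    """
    h, w = image.shape
    rows, cols = h // block, w // block
    # Crop to a whole number of blocks and view as (rows, block, cols, block).
    tiles = image[: rows * block, : cols * block].astype(np.float64)
    tiles = tiles.reshape(rows, block, cols, block)
    # Per-block pixel variance.
    variances = tiles.var(axis=(1, 3))
    # Content adaptive threshold: mean block variance, capped (e.g. at 1600).
    threshold = min(variances.mean(), cap)
    # A block is a candidate high variance block if it exceeds the threshold.
    return variances, variances > threshold
```

Blocks left unselected by the mask correspond to the blocks that fail to meet the threshold, which the Examiner Note in the mapping above labels as low.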

Consider Claims 7 and 19. 
The combination of Zhang and Jung teaches: 
7. The system as claim 6 recites, the operations further comprising: determining an additional indication of a high variance region associated with the sensor data; and controlling the system further based at least in part on the additional indication of the high variance region. / 19. The one or more computer-readable media as claim 16 recites, the operations further comprising: providing the indication of the high variance region to at least one of a planning component or a prediction component for determining a trajectory along which the vehicle is to travel; and controlling the vehicle based at least partly on the trajectory. (Jung: [0046] Portions of a processed video/audio data stream 130 may be stored temporarily in the computer readable medium 128 for later output to an on-board monitor 132, to an onboard automatic collision avoidance system 134, or to a network 136, such as the Internet. [0071]-[0073], [0071] FIGS. 12A-12C depict system performance based on different criteria. System performance was analyzed in terms of different distance intervals, which permit gauging the effectiveness of the system from an application point of view: low latency and high accuracy detection at short distances as well as distant target detection of potential threats of collisions. [0073] Performance was further analyzed in terms of another criteria that determines effectiveness for collision avoidance purposes.)

Consider Claims 8 and 17. 
The combination of Zhang and Jung teaches: 
8. The system as claim 7 recites, the operations further comprising: determining, based at least in part on the indication of the high variance region and the additional indication of the high variance region, a combined indication of the high variance region; and controlling the system further based at least in part on the combined indication of the high variance region. / 17. The one or more computer-readable media as claim 16 recites, the operations further comprising determining the indication of the high variance region based at least in part on the indication of the low variance region associated with the portion of the sensor data. (Zhang: [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.)
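For illustration only, the bounding box initialization quoted from Zhang paragraph [0033] can be sketched as follows. The function name, pixel units, the 8-pixel block size, integer division, and the clamping used to keep the box three blocks from the image boundary are assumptions. Zhang as quoted does not state what happens when the box is too large to satisfy the margin on both sides; this sketch simply pins the box to the left or top margin in that case.

```python
def init_bounding_box(img_w, img_h, centroid, block=8, margin_blocks=3):
    """Return (left, top, width, height) in pixels.

    Hypothetical helper following the sizing rule quoted from Zhang [0033];
    the shifting strategy below is an assumption.
    """
    if img_w > img_h:
        # Half the image width plus six block widths; square box.
        box_w = img_w // 2 + 6 * block
        box_h = box_w
    else:
        # Half the image height; height is the width plus twelve block widths.
        box_w = img_h // 2
        box_h = box_w + 12 * block
    # Center the box on the candidate high variance block centroid.
    cx, cy = centroid
    left = cx - box_w // 2
    top = cy - box_h // 2
    # Shift the box so it stays at least three blocks from the boundary.
    margin = margin_blocks * block
    left = max(margin, min(left, img_w - margin - box_w))
    top = max(margin, min(top, img_h - margin - box_h))
    return left, top, box_w, box_h
```

For a 640x480 image with a centroid near the right edge, for example at (600, 240), the box is shifted left so that its right edge sits three blocks inside the image.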

9. The system as claim 6 recites, wherein the indication of the low variance region is one or more of represented in the sensor data or derived from the sensor data. (Zhang: [0019] In the block variance-based detection module, the visually stand-out blocks are selected by comparing each block's variance with a content adaptive threshold. The distribution compactness of visual stand-out blocks is extracted. If the distribution compactness is compact enough, the image has a very obvious stand-out object. In the step 102, if the image has a stand-out object (e.g. is an object portrait), then the processing directly jumps to an object of interest isolation module to conduct block variance-based object of interest isolation. In the step 102, if the detection result is not good enough (e.g. distribution compactness not compact enough), processing continues to a transition map-based detection module. In the step 104, in the transition map-based detection module, a transition map is generated based on a block difference between each block with its neighbor blocks (e.g. eight neighbor blocks). [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture. Jung: [0056] To further classify the patches, in step 618, a representation from the range map is created called a vertical support (VS) histogram. More particularly, a discrete 2D grid of the world X-coordinates and the world disparities is defined. Each point from the range map which satisfies a given distance range and a given height range is projected to a cell on the grid and its height recorded. For each bin, the variance of heights of all the points projected in the bin is computed. This provides a 2D histogram in X-d coordinates which measures the support at a given world location from any visible structure above it.)

Consider Claims 10 and 20. 
The combination of Zhang and Jung teaches: 
10. The system as claim 6 recites, wherein: the determining the indication of the low variance region is based at least in part on analyzing at least a portion of the sensor data using a neural network, the low variance region comprises a representation of a face in the sensor data, and the high variance region comprises a representation of a pedestrian in the sensor data. / 20. The one or more computer-readable media as claim 16 recites, wherein: the sensor data is image data; the low variance region is associated with at least one of a (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase, ... [0079] The input ROI 1502 to the multi-layer convolutional network 1500 may be preprocessed before propagation through the network 1500, according to an embodiment of the present invention. In a preferred embodiment, the input ROI 1502 may comprise an 80x40 pixel block. Contrast normalization is applied to the input ROI 1502. Each pixel's intensity is divided by the standard deviation of the surrounding neighborhood pixels (e.g., a 7x7 pixel neighborhood). This preprocessing step increases contrast in low-contrast regions and decreases contrast in high-contrast regions. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

11. The combination of Zhang and Jung teaches: The system as claim 10 recites, wherein the neural network is trained based at least in part on utilizing another neural network to project features associated with input data into an image space to generate reconstructed input data and enforcing consistency between the input data and the reconstructed input data using a loss function. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase, ... [0079] The input ROI 1502 to the multi-layer convolutional network 1500 may be preprocessed before propagation through the network 1500, according to an embodiment of the present invention. In a preferred embodiment, the input ROI 1502 may comprise an 80x40 pixel block. Contrast normalization is applied to the input ROI 1502. Each pixel's intensity is divided by the standard deviation of the surrounding neighborhood pixels (e.g., a 7x7 pixel neighborhood). This preprocessing step increases contrast in low-contrast regions and decreases contrast in high-contrast regions. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

Consider Claims 12 and 18. 
The combination of Zhang and Jung teaches: 
12. The system as claim 6 recites, wherein determining the indication of the high variance region comprises: inputting the indication of a low variance region into a portion of a machine learned model trained to detect high variance regions in sensor data; and analyzing the indication of the low variance region by the portion of the machine learned model. / 18. The one or more computer-readable media as claim 16 recites, the operations further comprising determining the indication of the high variance region based at least in part on (Zhang: [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.)

13. The combination of Zhang and Jung teaches: The system as claim 6 recites, the operations further comprising: receiving a plurality of classifications of an object identified in the sensor data and a plurality of confidence scores, an individual confidence score corresponding to an individual classification; reducing a threshold associated with a classification of the plurality of classifications associated with the high variance region; determining that a confidence score associated with the classification meets or exceeds the threshold; and determining the indication of the high variance region based at least partly on determining that the confidence score meets or exceeds the threshold. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

14. The combination of Zhang and Jung teaches: The system as claim 13 recites, wherein reducing the threshold associated with the classification is based at least in part on determining the indication of the low variance region. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])
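
For illustration only, the threshold logic recited in claims 13 and 14 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application, Zhang, or Jung: the function name `meets_reduced_threshold`, the dictionary inputs, and the `reduction` amount are all hypothetical.

```python
def meets_reduced_threshold(scores, thresholds, classification,
                            low_variance_indicated, reduction=0.25):
    """Check one classification's confidence score against its threshold.

    scores and thresholds map a classification label to a float.
    The reduction amount is an assumed value; the claims do not state one.
    """
    threshold = thresholds[classification]
    # Claim 14: the threshold is reduced based at least in part on
    # an indication of a low variance region.
    if low_variance_indicated:
        threshold = max(0.0, threshold - reduction)
    # Claim 13: the confidence score must meet or exceed the threshold.
    return scores[classification] >= threshold


# Hypothetical inputs for a single detected object.
scores = {"pedestrian": 0.45, "vehicle": 0.90}
thresholds = {"pedestrian": 0.60, "vehicle": 0.60}
without_reduction = meets_reduced_threshold(scores, thresholds, "pedestrian", False)
with_reduction = meets_reduced_threshold(scores, thresholds, "pedestrian", True)
```

With these assumed values, the pedestrian score of 0.45 falls below the unreduced threshold of 0.60 but meets the reduced threshold of 0.35.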

15. The combination of Zhang and Jung teaches: The system as claim 6 recites, wherein the indication of the high variance region comprises a bounding box. (Zhang: [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.)
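
For illustration only, the bounding box initialization quoted from Zhang [0033] can be sketched as below. This is a minimal sketch, not Zhang's implementation. The function name `init_bounding_box`, the 8-pixel block size, the integer rounding, and the clamping used to perform the shift are assumptions; the quoted paragraph gives only the sizing rule, the centroid centering, and the 3-block minimum distance. The sketch also assumes the box fits inside the image once the margin is applied.

```python
def init_bounding_box(image_w, image_h, centroid, block=8, margin_blocks=3):
    """Return (left, top, width, height) in pixels.

    block is an assumed block size in pixels; Zhang [0033] as quoted
    states sizes in blocks without giving the pixel size of a block.
    """
    if image_w > image_h:
        # Width is half the image width plus six blocks; height equals width.
        box_w = image_w // 2 + 6 * block
        box_h = box_w
    else:
        # Width is half the image height; height is width plus twelve blocks.
        box_w = image_h // 2
        box_h = box_w + 12 * block

    # Center the box on the candidate high variance block centroid.
    cx, cy = centroid
    left = cx - box_w // 2
    top = cy - box_h // 2

    # If the box crosses the image boundary, shift it so that it keeps a
    # minimum distance of margin_blocks blocks from the boundary.
    margin = margin_blocks * block
    over = (left < 0 or top < 0
            or left + box_w > image_w or top + box_h > image_h)
    if over:
        left = min(max(left, margin), image_w - margin - box_w)
        top = min(max(top, margin), image_h - margin - box_h)
    return left, top, box_w, box_h


# Hypothetical 640x480 image with centroids near the center and near the right edge.
centered = init_bounding_box(640, 480, (320, 240))
shifted = init_bounding_box(640, 480, (600, 240))
portrait = init_bounding_box(480, 640, (240, 320))
```

In the second call the box would extend past the right edge of the image, so it is shifted left until its right edge sits three assumed blocks (24 pixels) inside the boundary.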

Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 
Bect et al., US PGPub US 2009/0143987, METHOD AND SYSTEM FOR PREDICTING THE IMPACT BETWEEN A VEHICLE AND A PEDESTRIAN
Bhaskara et al., US PGPub US 2020/0309957, IDENTIFYING AND/OR REMOVING FALSE POSITIVE DETECTIONS FROM LIDAR SENSOR OUTPUT
Nagaoka et al., US 7130448 B2, DEVICE FOR MONITORING AROUND A VEHICLE

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.





2662
/Tahmina Ansari/

March 1, 2021
/TAHMINA N ANSARI/Primary Examiner, Art Unit 2662