Notice of Pre-AIA or AIA Status
Claims 1-20 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Pre-Appeal Brief Request filed by Applicant on December 27, 2021, which was filed in this application in conjunction with an appeal to the Patent Trial and Appeal Board. In response to the decision, prosecution on the application has been re-opened and the finality of the previous office action has been withdrawn. Applicant’s submission of Remarks filed on December 27, 2021 has been entered.  
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	
Examiner's Responses to Applicant's Remarks
Applicants' amendments filed on December 27, 2021 have been fully considered. The amendments overcome the following rejections set forth in the office action mailed on August 26, 2021.
a.	Applicant's arguments regarding the teachings of Fatteh are persuasive, and the rejection of Claims 1-20 under 35 U.S.C. 103(a) as being unpatentable over Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung”, in view of Kefi-Fatteh, Takoua, et al., "Human face detection improvement using incremental learning based on low variance directions," Signal, Image and Video Processing 13.8 (published May 2019): 1503-1510, hereby referred to as “Fatteh”, is hereby withdrawn.
Applicant's arguments with respect to claims 1-20 have been considered but are moot in view of the new grounds of rejection, presented below. 
6.	Applicant's arguments, see “Pre-Appeal Brief Request”, filed December 27, 2021, with respect to the teachings of Fatteh have been fully considered and are persuasive. Therefore, the rejection of Claims 1-20 under 35 U.S.C. 103(a) as being unpatentable over Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung” in view of Kefi-Fatteh has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made and presented below.
Applicants' arguments filed on December 22, 2021 regarding the teachings of Jung have been fully considered but they are not persuasive. The Examiner has thoroughly reviewed Applicants' arguments but firmly believes that the cited reference reasonably and properly meets the claimed limitation.
With respect to the double patenting rejection of claims, applicant’s traversal of the rejection of Claims 1-20 on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Application No. 16/706,608, is rendered moot in view of the new grounds of rejection presented below.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Application No. 16/706,608 in view of Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung”. Although the claims at issue are not identical, they are not patentably distinct from each other because they are both directed towards using variance-based image analysis of vehicular environments for vehicular control using a trained machine learned model, with the only distinction being the use of annotated data and using the annotated data to identify differences to (Jung: [0050]). One of ordinary skill in the art at the time of filing could take these improved classifier and scene labeling features of Jung, and leverage them in the teachings of the co-pending application No. 16/706,608  to improve the overall training of the machine learned model to be more accurate in assessing pedestrians in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of the co-pending application, while the teachings of Jung continue to perform the same function of applying scene labeling to more accurately identify regions of interest. It is for the above mentioned reasons that the Examiner has come to this conclusion of obviousness with respect to the co-pending applications. Note: This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Priority
This application repeats a substantial portion of prior Application No. 16/457,524, filed June 28, 2019, and adds disclosure not presented in the prior application. Because this application names the inventor or at least one joint inventor named in the prior application, it constitutes a continuation-in-part of the prior application. In reviewing the subject matter of the claimed invention, it appears that the claims are directed towards subject matter that is largely supported in the newly added disclosure, and as a result, the overall claimed invention in this application is being examined with the priority date et seq.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Jung et al. (US PGPub US 2011/0255741), hereby referred to as “Jung” in view of Zhang et al. (US PGPub US 2010/0321513), hereby referred to as “Zhang”.  

Consider Claims 1, 8 and 15. 
Jung teaches: 
1. A method comprising: / 8. A system comprising: one or more processors; and one or more non-transitory computer-readable media that, when executed by the one or more processors, cause the system to perform operations comprising: / 15. One or more non-transitory computer-readable media that, when executed by one or more processors, cause the one or more processors to perform operations comprising: (Jung: abstract, A computer implemented method for detecting the presence of one or more pedestrians in the vicinity of the vehicle is disclosed. Imagery of a scene is received from at least one image capturing device. A depth map is derived from the imagery. A plurality of pedestrian candidate regions of interest (ROIs) is detected from the depth map by matching each of the plurality of ROIs with a 3D human shape model. At least a portion of the candidate ROIs is classified by employing a cascade of classifiers tuned for a plurality of depth bands and trained on a filtered representation of data within the portion of candidate ROIs to determine whether at least one pedestrian is proximal to the vehicle.)
1. receiving data associated with environments of vehicles; / 8. receiving sensor data from a sensor associated with environments of vehicles; / 15. determining, based on sensor data received from a sensor associated with a vehicle, (Jung: [0042]-[0047], [0043] FIG. 1 depicts a vehicle 100 that is equipped with an exemplary digital processing system 110 configured to acquire a plurality of images and detect the presence of one or more pedestrians 102 in a scene 104 in the vicinity of the vehicle 100, according to an embodiment of the present invention.)
1. determining annotated data based at least in part on the data, wherein the annotated data comprises an annotated high variance region in the data and an annotated low variance region in the data; / 8. annotating, as annotated data and based on the sensor data, an annotated low variance region associated with the sensor data; / 15. annotated data, wherein the annotated data comprises one or more of an annotated low variance region or an annotated high variance region; (Jung: [0050] In block S4, a structure classification (SC) module employs a combined image derived from the pyramid of depth images, D0+D1+D2, to classify image regions into several broad categories such as tall vertical structures, overhanging structures, ground, and poles and to remove pedestrian candidate regions having a significant overlap. These image regions classified as non-pedestrians are provided with scene labels 142. In block S5, the scene labels 142 are fused with the pedestrian candidate regions to produce a pruned set of pedestrian regions-of-interest (ROIs). In block S6, a pedestrian classification (PC) module takes in the list of pedestrian ROIs and confirms valid pedestrian detections 144 by using a cascade of classifiers tuned for several depth bands and trained on a combination of pedestrian contour and gradient features. [0056] To further classify the patches, in step 618, a representation from the range map is created called a vertical support (VS) histogram. More particularly, a discrete 2D grid of the world X-coordinates and the world disparities is defined. Each point from the range map which satisfies a given distance range and a given height range is projected to a cell on the grid and its height recorded. For each bin, the variance of heights of all the points projected in the bin is computed. This provides a 2D histogram in X-d coordinates which measures the support at a given world location from any visible structure above it.)
1. inputting the data into a model; / 8. training a model based at least in part on the annotated data and the sensor data to generate a trained model, / 15. inputting the sensor data into a model; (Jung: [0048], FIG. 3 is a block diagram illustrating exemplary software modules that execute the steps of a method for detecting a pedestrian in the vicinity of the vehicle, according to an embodiment of the present invention. Referring now to FIGS. 1-3, in block S1, at least one image of the scene is received by one or more image capturing devices 106 from the vehicle 100. In block S2, at least one stereo depth map is derived from the at least one image. In a preferred embodiment, disparities are generated at a plurality of pyramid resolutions, preferably three (Di, i=1, ..., 3), with D0 being the resolution of the input image.)
1. determining, by the model, an output comprising a high variance output and a low variance output; / 8. the trained model configured to output an indication of a low variance region and an indication of a high variance region based at least in part on an input; / 15. (Jung: [0051] FIG. 4 depicts exemplary steps executed by the pedestrian detector (PD) module 400 in greater detail, according to an embodiment of the present invention. In the PD module 400, template matching is conducted using a 3D pedestrian shape template applied to a plurality (e.g., three) disjoint range bands in front of the vehicle 100. The 3D shape size is a predetermined function of the actual range from the image capturing devices 106. [0052] As mentioned above, in step 402, depth maps are obtained at separate image resolutions, Di, i=1, ..., 3.)
1. altering parameters of the model based at least in part on the difference; / 15. altering one or more parameters associated with the model based at least in part on the difference; (Jung: [0050] These image regions classified as non-pedestrians are provided with scene labels 142. In block S5, the scene labels 142 are fused with the pedestrian candidate regions to produce a pruned set of pedestrian regions-of-interest (ROIs). In block S6, a pedestrian classification (PC) module takes in the list of pedestrian ROIs and confirms valid pedestrian detections 144 by using a cascade of classifiers tuned for several depth bands and trained on a combination of pedestrian contour and gradient features. [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections in FIGS. 5A-5D.)
1. and transmitting the model to a vehicle configured to be controlled by another output of the model. / 8. and transmitting the trained model to a vehicle configured to be controlled by another output of the model. / 15. and transmitting the model to a vehicle configured to be controlled by another output of the model. (Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list. Jung: [0046] Portions of a processed video/audio data stream 130 may be stored temporarily in the computer readable medium 128 for later output to an on-board monitor 132, to an onboard automatic collision avoidance system 134, or to a network 136, such as the Internet. [0071]-[0073], [0071] FIGS. 12A-12C depict system performance based on different criteria. System performance was analyzed in terms of different distance intervals, which permit gauging the effectiveness of the system from an application point of view: low latency and high accuracy detection at short distances as well as distant target detection of potential threats of collisions. [0073] Performance was further analyzed in terms of another criteria that determines effectiveness for collision avoidance purposes.)
Jung does not teach:
determining, by the model, an output comprising a low variance output including a first feature detection and a high variance output including a second feature detection based on the first feature detection
Zhang teaches: 
1. A method comprising: / 8. A system comprising: one or more processors; and one or more non-transitory computer-readable media that, when executed by the one or more processors, cause the system to perform operations comprising: / 15. One or more non-transitory computer-readable media that, when executed by one or more processors, cause the one or more processors to perform operations comprising: (Zhang: [0052], Figure 8, abstract, Content adaptive detection of images having stand-out objects involves block variance-based detection and determining if an object includes a stand-out object. The images with a stand-out object are further processed to isolate an object of interest. The images without a detected stand-out object are further processed with a transition map-based detection method which includes generating a transition map. If an object portrait is determined from the transition map, then the image is further processed to isolate the object of interest.)
1. receiving data associated with an environment; / 8. receiving sensor data from a sensor associated with an environment; / 15. determining, based on sensor data received from a sensor,  (Zhang: [0055]-[0056], [0008] In another aspect, a camera comprises a lens, a sensor configured for acquiring an input image through the lens, a memory for storing an application, the application configured for processing the input image using a block variance-based detection module, if the input image includes a standout object, [0019])
1. determining based on the sensor data, which comprises a high variance region in the data and a low variance region in the data; / 8. determining based on the sensor data, a low variance region associated with the sensor data; / 15. data, which comprises one or more of a low variance region or a high variance region; (Zhang: [0019] In the block variance-based detection module, the visually stand-out blocks are selected by comparing each block's variance with a content adaptive threshold. The distribution compactness of visual stand-out blocks is extracted. If the distribution compactness is compact enough, the image has a very obvious stand-out object. In the step 102, if the image has a stand-out object (e.g. is an object portrait), then the processing directly jumps to an object of interest isolation module to conduct block variance-based object of interest isolation. In the step 102, if the detection result is not good enough (e.g. distribution compactness not compact enough), processing continues to a transition map-based detection module. In the step 104, in the transition map-based detection module, a transition map is generated based on a block difference between each block with its neighbor blocks (e.g. eight neighbor blocks). [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture)
1. inputting the data into a model; / 8. training a model based at least in part on the annotated data and the sensor data to generate a trained model, / 15. inputting the sensor data into a model; (Zhang: [0019] FIG. 1 illustrates a flowchart of an overall architecture of the method of detecting images. An input image is processed by a block variance-based detection module, in the step 100. [0056] To utilize the content adaptive detection, a user acquires an image such as by a digital camera, and then while the image is acquired or after the image is acquired, the image is able to be processed using the content adaptive detection method. In some embodiments, the camera automatically implements the content adaptive detection, and in some embodiments, a user manually selects to implement the content adaptive detection.)
1. determining, by the model, an output comprising(Zhang: Examiner Note: the mean variance and thresholding operations lead to the determination of 3 types of blocks, blocks that fail to meet the threshold (low), candidate high variance blocks (candidate high) and stand-out criterion blocks (high), [0023]-[0024] In step 202, a thresholding operation is applied to identify blocks that have a mean variance above a certain value to qualify them as candidate high variance blocks  [0023] If this mean value is larger than a threshold (e.g. 1600), the mean value is set as 1600. This mean value is then utilized to determine if a block is a candidate high variance block or not. If the block variance is larger than the mean value, it is selected as high variance block. Otherwise, it is not selected. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary.)
1. A low variance output including a first feature detection and a high variance output including a second feature detection based on the first feature detection / 8. a low variance region including a first feature detection and an indication of a high variance region including a second feature detection based at least in part on the first feature detection / 15. a low variance output including a first feature detection and a high variance output including a second feature detection based on the first feature detection (Zhang: Examiner Note: the mean variance and thresholding operations lead to the determination of 3 types of blocks, blocks that fail to meet the threshold (low), candidate high variance blocks (candidate high) and stand-out criterion blocks (high), the series of steps which are based on each other [0023]-[0024] [0024] After the above candidate high variance blocks selection in the step 202, the high variance blocks are analyzed if their distribution is able to satisfy the stand-out criterion check, in the step 204. The check process is further illustrated in the FIG. 3. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Zhang further describes object of interest isolation in paragraphs [0032]-[0037], and applicant is further directed to the block variance-based object of interest isolation, which initializes a bounding box in step 400 using the candidate high variance block centroids (paragraphs [0033]-[0034]) to generate a convex shape (paragraph [0036]). Figure 5, described in paragraph [0037], depicts the block variance-based object of interest isolation procedure and clearly illustrates a block-based boundary, wherein a series of darker high-variance blocks outline a series of low-variance sub-blocks. Thus, in accordance with Figure 5, Zhang teaches embodiments wherein low-variance regions are sub-regions of the high-variance regions.)

[Image: media_image1.png]

1. determining a difference between the output and the variance data; / 15. determining a difference between the variance data and the output;(Zhang: [0041], the present system and method implement a number of novel techniques, including: (1) the detection and classification of clutter objects in roadside scenarios such as buildings, trees, and poles by employing dense stereo depth maps to substantially lower false alarms rates; (2) multiple classifiers adapted to a plurality of ranges of distances to increase detection accuracy; and (3) a combination of template matching with 2D human shape contour fragments employed for localization along with the use of standard histogram of oriented gradient (HOG) descriptors for guiding a driver's focus of attention and for computational efficiency while maintaining accuracy.)
1. altering parameters of the model based at least in part on the difference; / 15. altering one or more parameters associated with the model based at least in part on the difference; (Zhang: [0039] The block transition map-based object portrait detection scheme is illustrated in FIG. 6. In the step 600, a block transition map is extracted. The block transition map is calculated by the following procedure. [0051] A bounding box generated by the center around growing process is initialized. For each row of object blocks within the bounding box, the leftmost object block and rightmost object block are found, and all of the blocks between them are denoted as an object block. For each column of blocks, the top object block and bottom object block are found, and all of the blocks between them are denoted as object blocks. The resulted convex set is used as the object of interest isolation result)
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to leverage Zhang’s content adaptive image analysis for object detection and apply it to Jung’s real-time pedestrian detection. The determination of obviousness is predicated upon the following findings: One skilled in the art would have been motivated to leverage the content adaptive algorithm of Zhang in order to improve the overall accuracy of Jung’s real-time pedestrian detection in the field of vehicular imaging. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Jung, while the teaching of Zhang continues to perform the same function.
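The block variance thresholding quoted above from Zhang [0023] can be sketched as follows. This is a minimal illustration only, not Zhang's implementation: the function names, the list-of-lists image format, and the reading of "mean value" as the mean of all block variances are assumptions made for the example. Only the cap of 1600 and the "variance larger than the mean value" test come from the quoted passage.

```python
from statistics import mean, pvariance


def block_variances(image, block):
    """Split a 2D list of pixel intensities into block x block tiles and
    return a 2D list holding the population variance of each tile.
    (Hypothetical helper; the tiling scheme is an assumption.)"""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows - block + 1, block):
        row_out = []
        for c in range(0, cols - block + 1, block):
            tile = [image[r + i][c + j]
                    for i in range(block) for j in range(block)]
            row_out.append(pvariance(tile))
        result.append(row_out)
    return result


def candidate_high_variance(variances, cap=1600.0):
    """Flag blocks whose variance exceeds the mean block variance,
    with the mean capped at `cap` as in the quoted example value."""
    flat = [v for row in variances for v in row]
    threshold = min(mean(flat), cap)
    return [[v > threshold for v in row] for row in variances]


if __name__ == "__main__":
    image = [
        [10, 10, 0, 100],
        [10, 10, 100, 0],
        [10, 10, 10, 10],
        [10, 10, 10, 10],
    ]
    variances = block_variances(image, 2)
    print(candidate_high_variance(variances))
```

In this sketch, blocks that fail the test correspond to what the Examiner Note calls low variance blocks, and flagged blocks correspond to candidate high variance blocks. The later stand-out criterion check of step 204 is not modeled.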

Consider Claims 2, 14 and 16. 
The combination of Jung and Zhang teaches: 
2. The method as claim 1 recites, wherein the annotated low variance region is determined from one or more statistical models./ 14. The system as claim 8 recites, wherein the annotated low variance region is determined from one or more statistical models based at least in part on one or more of entropy, pixel intensity, or aspect ratios associated with indications of low variance regions. / 16. The one or more non-transitory computer-readable media as claim 15 recites, wherein determining the annotated data comprises determining the annotated low variance region based at least in part on a statistical model associated with one or more of the sensor data or an intermediary output of the model based at least in part on the sensor data.(Zhang: [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. 
Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list. [0079] The input ROI 1502 to the multi-layer convolutional network 1500 may be preprocessed before propagation through the network 1500, according to an embodiment of the present invention. In a preferred embodiment, the input ROI 1502 may comprise an 80x40 pixel block. Contrast normalization is applied to the input ROI 1502. Each pixel's intensity is divided by the standard deviation of the surrounding neighborhood pixels (e.g., a 7x7 pixel neighborhood). This preprocessing step increases contrast in low-contrast regions and decreases contrast in high-contrast regions.)

Consider Claims 3 and 10. 
The combination of Jung and Zhang teaches: 
3. The method as claim 1 recites, wherein the low variance region is determined based at least in part on a feature associated with the data. / 10. The system as claim 9 recites, wherein the annotated low variance region is associated with a feature of the sensor data determined by the model. (Zhang: [0019] In the block variance-based detection module, the visually stand-out blocks are selected by comparing each block's variance with a content adaptive threshold. The distribution compactness of visual stand-out blocks is extracted. If the distribution compactness is compact enough, the image has a very obvious stand-out object. In the step 102, if the image has a stand-out object (e.g. is an object portrait), then the processing directly jumps to an object of interest isolation module to conduct block variance-based object of interest isolation. In the step 102, if the detection result is not good enough (e.g. distribution compactness not compact enough), processing continues to a transition map-based detection module. In the step 104, in the transition map-based detection module, a transition map is generated based on a block difference between each block with its neighbor blocks (e.g. eight neighbor blocks). [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture. Jung: [0050] In block S4, a structure classification (SC) module employs a combined image derived from the pyramid of depth images, D0+D1+D2, to classify image regions into several broad categories such as tall vertical structures, overhanging structures, ground, and poles and to remove pedestrian candidate regions having a significant overlap. These image regions classified as non-pedestrians are provided with scene labels 142. In block S5, the scene labels 142 are fused with the pedestrian candidate regions to produce a pruned set of pedestrian regions-of-interest (ROIs). In block S6, a pedestrian classification (PC) module takes in the list of pedestrian ROIs and confirms valid pedestrian detections 144 by using a cascade of classifiers tuned for several depth bands and trained on a combination of pedestrian contour and gradient features. [0056] To further classify the patches, in step 618, a representation from the range map is created called a vertical support (VS) histogram.)

Consider Claims 4 and 11. 
The combination of Jung and Zhang teaches: 
4. The method as claim 3 recites, the method further comprising: inputting the feature into an additional model; receiving, from the additional model, a reconstructed output; and determining a loss based on a difference between the reconstructed output and the data, wherein altering the one or more parameters is further based at least in part on the loss. / 11. The system as claim 10 recites, the operations further comprising: mapping the feature to a reconstructed input; and determining, as a loss, a difference between the sensor data and the reconstructed input, wherein training the model is further based at least in part on the loss. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

Consider Claim 18. The combination of Jung and Zhang teaches: The one or more non-transitory computer-readable media as claim 17 recites, wherein altering the one or more parameters is further based at least in part on the second difference. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

Consider Claims 5 and 19. 
The combination of Jung and Zhang teaches: 
5. The method as claim 1 recites, wherein: the model is a neural network, and the high variance output is based on the low variance output, the method further comprising determining an additional high variance output, and further wherein altering the one or more parameters comprises training the model end-to-end based at least in part on the low variance output, the high variance output, and the additional high variance output. / 19. The (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. [0079] The input ROI 1502 to the multi-layer convolutional network 1500 may be preprocessed before propagation through the network 1500, according to an embodiment of the present invention. In a preferred embodiment, the input ROI 1502 may comprise an 80x40 pixel block. Contrast normalization is applied to the input ROI 1502. Each pixel's intensity is divided by the standard deviation of the surrounding neighborhood pixels (e.g., a 7x7 pixel neighborhood). This preprocessing step increases contrast in low-contrast regions and decreases contrast in high-contrast regions. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

Consider Claims 6 and 20. 
The combination of Jung and Zhang teaches: 
6. The method as claim 1 recites, wherein the low variance output comprises head detection and the high variance output comprises a pedestrian detection. / 20. The one or more non-transitory computer-readable media as claim 16 recites, wherein the low variance output comprises head detection and the high variance output comprises a pedestrian detection. (Zhang: [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.)

Consider Claims 7 and 9. 
The combination of Jung and Zhang teaches: 
7. The method as claim 1 recites, wherein the data comprises image data, a batch of image data, or an image space. / 9. The system as claim 8 recites, wherein the sensor data comprises at least one of image data, a batch of image data, or an image space. (Zhang: [0055]-[0056], [0008] In another aspect, a camera comprises a lens, a sensor configured for acquiring an input image through the lens, a memory for storing an application, the application configured for processing the input image using a block variance-based detection module, if the input image includes a standout object, [0019]; Jung: [0042]-[0047], [0043] FIG. 1 depicts a vehicle 100 that is equipped with an exemplary digital processing system 110 configured to acquire a plurality of images and detect the presence of one or more pedestrians 102 in a scene 104 in the vicinity of the vehicle 100, according to an embodiment of the present invention.)

Consider Claim 12. 
The combination of Jung and Zhang teaches: 12. The system as claim 11 recites, wherein mapping the feature to reconstructed input comprises: inputting the feature into an additional model; and receiving, from the additional model, the reconstructed input. (Zhang: [0055]-[0056], [0008] In another aspect, a camera comprises a lens, a sensor configured for acquiring an input image through the lens, a memory for storing an application, the application configured for processing the input image using a block variance-based detection module, if the input image includes a standout object, [0019]; Jung: [0042]-[0047], [0043] FIG. 1 depicts a vehicle 100 that is equipped with an exemplary digital processing system 110 configured to acquire a plurality of images and detect the presence of one or more pedestrians 102 in a scene 104 in the vicinity of the vehicle 100, according to an embodiment of the present invention. Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. [0079] The input ROI 1502 to the multi-layer convolutional network 1500 may be preprocessed before propagation through the network 1500, according to an embodiment of the present invention. In a preferred embodiment, the input ROI 1502 may comprise an 80x40 pixel block. Contrast normalization is applied to the input ROI 1502. Each pixel's intensity is divided by the standard deviation of the surrounding neighborhood pixels (e.g., a 7x7 pixel neighborhood). This preprocessing step increases contrast in low-contrast regions and decreases contrast in high-contrast regions.)

Consider Claim 13. 
The combination of Jung and Zhang teaches: 13. The system as claim 11 recites, wherein the indication of the high variance region is based at least in part on the indication of the low variance region, and wherein training the model comprises training the model from end to end. (Jung: [0063]-[0065] For candidate ROIs (pedestrians) located at greater distances beyond a predetermined threshold, a cascade of HOG based classifiers is employed. HOG-based classifiers have been proven to be effective for relatively low-resolution images when body contours are distinguishable from the background. Each HOG classifier is trained separately for each resolution band. For this purpose, in the training phase. Zhang: [0031] The centroid around variance of the upper half image and lower half image are then compared to the image adaptive threshold VARTH, in the steps 308 and 312, respectively. If any of them is less than the threshold, a confidence value is set to a value (e.g. 3) and the process jumps to the object of interest isolation module. Otherwise, the processing is continued to a transition map-based detection. [0038])

Consider Claim 17. 
The combination of Jung and Zhang teaches: 
17. The one or more non-transitory computer-readable media as claim 15 recites, the operations further comprising: inputting at least a portion of the sensor data into the model; receiving, as a set of features, an intermediate output of the model; inputting the set of features into one or more of an additional model or a portion of the model; receiving, from the one or more of additional model or portion of the model, a reconstructed input; and determining a second difference between the reconstructed output and the portion of the sensor data, wherein determining the annotated data comprises determining, using a statistical model, the low variance region associated with the set of features. (Zhang: [0027] During the above process, the location centroid of the candidate high variance blocks in the upper half of the image is also extracted if the total number of candidate high variance blocks in the upper half of the image is larger than a threshold which is one fourth of the number of blocks in one row of an image. A similar procedure is also applied to a lower half picture. [0033] In the step 400, a bounding box (e.g. rectangular, circular, spherical, square, triangular or another shape) is initialized. Initializing includes setting the bounding box width as half of an image width plus a six block width and a bounding box height equal to the bounding box width if an image width is larger than an image height, and setting the bounding box width as half of the image height and the bounding box height equal to the bounding box width plus twelve block width if the image width is less than or equal to the image height. Initializing also includes using the candidate high variance block centroid as the bounding box center to draw the bounding box. If the bounding box is over the image boundary, the bounding box is shifted in the image such that the bounding box has a minimum 3 blocks distance from the image boundary. Jung: Figure 4, [0053] FIGS. 5A-5D are visual depictions of an example of pedestrian ROI refinement, according to an embodiment of the present invention. Depth map based detected ROIs are further refined by examining a combination of depth and edge features of two individual pedestrian detections. In step 413, a new pedestrian ROI is initialized at each detected peak, which is refined first horizontally and then vertically to obtain a more centered and tightly fitting bounding box about a candidate pedestrian. This involves employing vertical and horizontal projections, respectively, of binarized disparity maps (similar to using the edge pixels above) followed by detection of peak and valley locations in the computed projections. After this refinement, in step 414, any resulting overlapping detections are again removed from the detection list.)
Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 
Bect et al., US PGPub 2009/0143987, METHOD AND SYSTEM FOR PREDICTING THE IMPACT BETWEEN A VEHICLE AND A PEDESTRIAN
Bhaskara et al., US PGPub 2020/0309957, IDENTIFYING AND/OR REMOVING FALSE POSITIVE DETECTIONS FROM LIDAR SENSOR OUTPUT
Nagaoka et al., US 7130448 B2, DEVICE FOR MONITORING AROUND A VEHICLE

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.



2662
/Tahmina Ansari/

March 2, 2022
/TAHMINA N ANSARI/Primary Examiner, Art Unit 2662