DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-26, 29-34, and 37-40 are rejected under 35 U.S.C. 103 as being unpatentable over Hotson (PGPUB: 20180211403) in view of Wang (PGPUB: 20190164290), in view of YAO (PGPUB: 20180018524), and further in view of MITTAL (PGPUB: 20160379094).

Regarding claims 21, 29, and 37, Hotson teaches a computer-implemented method of autonomous vehicle operation, the computer-implemented method comprising: 
receiving sensor data comprising one or more images (see Fig. 1, paragraph 21, the sensor systems/devices 106-110 and 114 may be used to obtain real-time sensor data so that the automated driving/assistance system 102 can assist a driver or drive a vehicle in real-time); 
(see Fig. 4, paragraphs 32, 33, and 34, The feature maps may then be used for object detection or classification; the first stage 402 shows the input of Image 0 for the generation of one or more feature maps 408; the feature maps may be generated using one or more neural networks; recurrent neural network (RNN) state 0-0 is the resulting prediction for object 0 at the sub-region 410);
determining in the second stage of the multiple stage classification and based on the input to the second stage, one or more second stage characteristics of the sensor data based in part on a second machine-learned model (see Fig. 4, paragraphs 32, 33, and 34, The feature maps may then be used for object detection or classification; the first stage 402 shows the input of Image 0 for the generation of one or more feature maps 408; the feature maps may be generated using one or more neural networks; RNN state 0-1 is the resulting prediction for object 1 at the sub-region 410); and 
generating an object output based on the second stage characteristics, wherein the object output describes detection of one or more objects in the sensor data (see Fig. 5, paragraph 39, the detection component 104 may include or use a recurrent connection in a neural network. The detection component 104 determines 506 an output for the second sensor frame indicating a presence of an object or feature based on the output for the first sensor frame).  
Hotson does not expressly teach determining one or more first stage 
Wang teaches that first stage 151 includes two fully convolutional layers followed by a max pooling layer, second stage 152 includes two fully convolutional layers followed by a max pooling layer, third stage 153 includes three fully convolutional layers followed by a max pooling layer, fourth stage 154 includes three fully convolutional layers followed by a max pooling layer, and fifth stage 155 includes three fully convolutional layers followed by a max pooling layer. Network stages 151-155 may include other optional layers (e.g., rectified linear units and/or local response normalization) and network stages 151-155 any suitable fully convolutional layers and/or stages. Furthermore, multi-stage fully convolutional network 105 and semantic image segmentation system 100 may be characterized as a fully convolutional network (see Fig. 1, paragraph 27); ground truth objectness labels 411 include labels 401 corresponding to non-object, not an object, background or the like and labels 402 corresponding to object, object of interest, foreground, or the like. As shown, ground truth objectness labels 411 provide a binary map of training image 211 indicating on a pixel-by-pixel basis whether the pixel corresponds to an object to be labeled or not (see Fig. 4, paragraph 47).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hotson by Wang for providing multi-stage fully convolutional network 105 and semantic image segmentation system 100 may be characterized as a fully convolutional network, as determining one or more first stage 
However, the combination does not expressly teach the second stage characteristics.
Yao teaches that after fine tuning based on inputs from HMSHNM 102, transfer learning guided FCN 101 is used as the pedestrian-specific FCN model 113 to perform feature detection and provide a heat map consisting of probability scores for pedestrian detection for each received input image (see Fig. 1, paragraph 20).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Yao, which provides a model to perform feature detection and provide a heat map consisting of probability scores for pedestrian detection for each received input image, as the second stage characteristics. Therefore, combining prior art elements according to known methods and techniques would yield predictable results.
However, the combination does not expressly teach excluding one or more areas within the one or more images.
MITTAL teaches that the classification platform 115 may determine the ROI based, at least in part, on one or more parameters associated with the image, and that a ROI within an image may be extracted, for example, by excluding lower and upper portions of the image, where the excluded portions may include features or objects that may be of no/low interest (e.g., sky, car blur, etc.) to certain applications or consumers (see Fig. 4, paragraph 71).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by MITTAL, which provides that a ROI within an image may be extracted, for example, by excluding lower and upper portions of the image, where the excluded portions may include features or objects that may be of no/low interest (e.g., sky, car blur, etc.) to certain applications or consumers, as excluding one or more areas within the one or more images. Therefore, combining prior art elements according to known methods and techniques, such as excluding lower and upper portions of the image or features or objects that may be of no/low interest, would yield predictable results.

Regarding claims 22, 30, and 38, the combination teaches wherein the one or more first stage characteristics of the sensor data determined in the first stage of the multiple stage classifications describes a likelihood that the portion of the sensor data that is excluded from the input to the second stage of the multiple stage classification contains objects (see Wang, Fig. 4, paragraph 47, ground truth objectness labels 411 include labels 401 corresponding to non-object, not an object, background or the like and labels 402 corresponding to object, object of interest, foreground, or the like. As shown, ground truth objectness labels 411 provide a binary map of training image 211 indicating on a pixel-by-pixel basis whether the pixel corresponds to an object to be labeled or not).
  
Regarding claims 23, 31, 32, and 39, the combination teaches further comprising generating, by the computing system, in the first stage, a heat map associated with the sensor data, the heat map describing a probability of an object being contained within a respective area of the plurality of areas of the sensor data (see Yao, Fig. 1, paragraph 34, the fine-tuning on pedestrian dataset (as described below) is performed to create FCN 211. Thus, while CNN 201 is able to be fed a portion of an image, namely image 200, FCN 211 is able to be fed any sized image, such as image 210 which contains image 200, and produce pedestrian probabilities 212 (e.g., probability scores) to create a heat map 213 (where color is used to represent different levels of probability scores)).

Regarding claims 24 and 40, the combination teaches wherein excluding the portion of the sensor data from the input to the second stage of the multiple stage classification based on the one or more first stage characteristics of the sensor data comprises excluding the portion of the sensor data (see Wang, paragraph 19, receive an input image (e.g., from a memory or other image or video source) and provide a semantic image segmentation of the input (e.g., the semantic image segmentation including pixel-level category labels for pixels of the input image) and/or an objectness image segmentation of the input (e.g., the objectness image segmentation including pixel-level object or non-object labels for pixels of the input image)) based on the heat map (see Yao, Fig. 6, paragraph 54, the detection results output from the pedestrian-specific FCN model for each image in the set of images comprises heat maps. In one embodiment, the detection results output from the pedestrian-specific FCN model for each image in the set of images is a plurality of probability scores for a plurality of positions in each image).  

Regarding claims 25 and 33, the combination teaches wherein the portion of the sensor data that is excluded from the input to the second stage of the multiple stage classification is associated with one or more background portions of the sensor data (see Wang, Figs. 1 and 2, paragraphs 19 and 45, learning a C-class objectness mask and classifying C-class (e.g., C being the number of total classes) semantic labels. Such a divide and conquer strategy may efficiently remove the interference of image or video frame backgrounds).  

Regarding claims 26 and 34, the combination teaches wherein the input to the second stage of the multiple stage classification is associated with one or more foreground portions of the sensor data (see Wang, Figs. 1-4, paragraphs 47 and 56, ground truth objectness labels 411 include labels 401 corresponding to non-object, not an object, background or the like and labels 402 corresponding to object, object of interest, foreground, or the like. As shown, ground truth objectness labels 411 provide a binary map of training image 211 indicating on a pixel-by-pixel basis whether the pixel corresponds to an object to be labeled or not; classification module 168 may generate output objectness labels 122 based on the resultant objectness scores from convolutional layer 167 and/or the resultant semantic scores from convolutional layer 162).  


Claims 27-28 and 35-36 are rejected under 35 U.S.C. 103 as being unpatentable over Hotson (PGPUB: 20180211403) in view of Wang (PGPUB: 20190164290), in view of YAO (PGPUB: 20180018524), in view of MITTAL (PGPUB: 20160379094), and further in view of VIDAL (PGPUB: 20150363660).

Regarding claims 27 and 35, the combination teaches further comprising:
generating, by the computing system, in the first stage and based in part on the sensor data, visual descriptor output associated with the sensor data (see Wang, Fig. 8, paragraph 92, Graphics subsystem 815 may perform processing of images such as still images, graphics, or video for display. Graphics subsystem 815 may be a graphics processing unit (GPU), a visual processing unit (VPU), or an image processing unit, for example. In some examples, graphics subsystem 815 may perform scanned image rendering as discussed herein), 
wherein the one or more first stage characteristics are determined based in part on the visual descriptor output (see Wang, Fig. 1, paragraph 28, if 100 filters are applied at first stage 151 with max pooling that reduces the resolution by one half in each dimension, feature maps 112 may include 100 feature maps each having a resolution of 1/2N×1/2M. In an embodiment, feature maps 112 have a resolution of 160×160. Feature maps 112 and any other feature maps discussed herein may be characterized as a feature map, a set of feature maps, a response map, a set of response maps).  
However, the combination does not expressly teach that the visual descriptor output comprises color hue information, color saturation information, brightness information, or histogram of oriented gradients information.
VIDAL teaches that with the region of interest identified, the region is subsequently represented in terms of a plurality of visual characteristics, such as color histograms in different color spaces like CieLab, Luv, or HSV, histograms of oriented gradients, Haar wavelets, shape context and other standard descriptors. The features are typically indexed so that similar images can be found efficiently (see Fig. 1, paragraph 83).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by VIDAL, which provides that the region is subsequently represented in terms of a plurality of visual characteristics, such as color histograms in different color spaces like CieLab, Luv, or HSV, histograms of oriented gradients, Haar wavelets, shape context and other standard descriptors, as the visual descriptor output comprising color hue information, color saturation information, brightness information, or histogram of oriented gradients information. Therefore, the combination of the teaching, suggestion, or motivation in the prior art would have led 

Regarding claims 28 and 36, the combination teaches wherein excluding the portion of the sensor data from the input to the second stage of the multiple stage classification based on the one or more first stage characteristics of the sensor data comprises excluding the portion of the sensor data based on the visual descriptor output associated with the sensor data (see VIDAL, Fig. 9, paragraph 93, the query image 902 shows a model against a gray background wearing a long dress with a floral pattern. By applying the process of FIG. 6 to the query at query time, the server processor 202 utilizing the classifier 500 extracts visual descriptors only from the dress, not from any other areas of the image (e.g. the model's face, the gray background). Because the segmentation method was also applied to each of the catalog images, the claimed system 1000 is able to retrieve images that have a different layout from the query (e.g. an image 906 depicting only a dress with no model or background, and an image 908 depicting a differently looking model against a structured background)). 


Response to Arguments
Applicant’s arguments with respect to claim(s) 21, 29, and 37 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN JIA whose telephone number is (571)270-5536.  The examiner can normally be reached on 9:00 am-7:30 pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Bella, can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/XIN JIA/Primary Examiner, Art Unit 2667