Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
The abstract of the disclosure is objected to because the abstract exceeds 150 words.  Correction is required.  See MPEP § 608.01(b).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 11-14, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu (US 2019/0051056 A1) in view of Kirillov (Alexander Kirillov et al.: InstanceCut: from Edges to Instances with MultiCut. CoRR abs/1611.08272 (2016)).



Regarding claim 11, Chiu teaches a method for evaluating an optical appearance of surroundings of a vehicle, the method comprising: capturing, by a camera of the vehicle, a captured image of the surroundings of the vehicle; Camera 101 may be integrated into a vehicle, as indicated in paragraph [0025]: “The 2D video frame captured from the ground vehicle may be georegistered within a 2D rendered image rendered from 3D reference data.”

extracting, by a decoder of the vehicle, features from the captured image; See paragraph [0032], lines 1-7: “In some examples, semantic segmentation unit 106 uses a SegNet encoder decoder network to conduct semantic segmentation for each of 2D frames 102 and label each pixel for the input video sequences. In one example, the encoder-decoder network comprises 4 layers for both the encoder and the decoder, 7x7 convolutional layers, and 64 features per layer.”

performing, by a first analysis device of the vehicle, a first analysis of the captured image by detecting one or more objects of an object class as surfaces in the captured image, wherein the first analysis is implemented as a semantic segmentation, See all of paragraph [0032]: “In some examples, semantic segmentation unit 106 uses a SegNet encoder decoder network to conduct semantic segmentation for each of 2D frames 102 and label each pixel for the input video sequences. In one example, the encoder-decoder network comprises 4 layers for both the encoder and the decoder, 7x7 convolutional layers, and 64 features per layer. In one example, semantic segmentation unit 106 processes each 2D frame 102 to label each pixel of the frame with a different semantic classification label that indicates, for each pixel, that the pixel is a pixel within an image of an object corresponding to the classification label. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. A pixel of a 2D frame 102 that is labeled with the classification label Tree is a pixel within an image of a tree within the 2D frame. In one example, the SegNet architecture may be used because of its available trained models for urban environments.”

However, Chiu does not teach wherein the one or more objects overlap or adjoin each other in the captured image, and wherein a result of the first analysis is a first analysis result;
performing, by a second analysis device of the vehicle, a second analysis of the captured image by detecting edges of the one or more objects, wherein a result of the second analysis is a second analysis result; and combining, by the decoder, the first analysis result and the second analysis result to form an output image by excluding the edges from the surfaces of the one or more objects, wherein the one or more objects of the object class emerge from the surfaces and are visibly separated from each other by the edges; wherein the first analysis and the second analysis are carried out independently of each other.

Kirillov teaches wherein the one or more objects overlap or adjoin each other in the captured image, and wherein a result of the first analysis is a first analysis result; See Figure 1, in which the cars overlap each other in the cityscape image used for semantic segmentation.

performing, by a second analysis device of the vehicle, a second analysis of the captured image by detecting edges of the one or more objects, wherein a result of the second analysis is a second analysis result; Section 3.3 teaches an instance-aware edge detection to detect edges in the image.

and combining, by the decoder, the first analysis result and the second analysis result to form an output image by excluding the edges from the surfaces of the one or more objects, wherein the one or more objects of the object class emerge from the surfaces and are visibly separated from each other by the edges; wherein the first analysis and the second analysis are carried out independently of each other.
Figure 2 explicitly outlines processing the semantic segmentation and the edge image data in parallel and combining them in order to approximate instance segmentation.
[Image: media_image1.png]


Chiu and Kirillov are combinable because they are directed to semantic segmentation of street view images.
It would have been obvious to a person of ordinary skill in the art at the time of filing to incorporate Kirillov’s InstanceCut method of instance segmentation with Chiu’s semantic segmentation image because doing so would improve the performance of the instance segmentation and result in faster object detection.
Therefore, it would have been obvious to combine Chiu with Kirillov to obtain the invention as specified in claim 11.
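
For illustration only, the combining limitation discussed above (excluding the detected edges from the semantic surfaces so that adjoining objects of one class become separated) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the MultiCut formulation of Kirillov nor the implementation of either reference: the function name exclude_edges and the toy masks are hypothetical, and both masks are assumed to be binary and of equal size.

```python
def exclude_edges(class_mask, edge_mask):
    """Keep the pixels that belong to the object class and are not edge pixels."""
    return [
        [bool(c) and not bool(e) for c, e in zip(class_row, edge_row)]
        for class_row, edge_row in zip(class_mask, edge_mask)
    ]


# First analysis result (hypothetical): two adjoining objects of the same
# class appear as a single surface, 1 = pixel labeled with the class.
class_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

# Second analysis result (hypothetical), computed independently: the
# boundary between the two objects is marked as an edge, 1 = edge pixel.
edge_mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Output image: the single surface is split into two surfaces by the edge.
separated = exclude_edges(class_mask, edge_mask)
```

In this toy case the middle column is removed from the surface, so the remaining class pixels form two regions that no longer touch.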

Regarding claim 12, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches wherein the performing the first analysis and the performing the second analysis include using a common deep neural network. See section 3.2, the second to last paragraph in particular: “In our experiments, we employ two publicly available pre-trained FCNs: Dilation10 [52] and LRR-4x [20]. These networks have been trained by the respective authors and we can also use them as provided, without any fine-tuning. Note, that we also use the CNN-CRF frameworks [56, 10] with dense CRF”

Regarding claim 13, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches wherein the performing the first analysis includes using first training data, wherein the first training data have class classifications. See the last paragraph of the Introduction section: “In our approach, we only train classifiers for semantic segmentation and instance-edge detection, and not directly any classifier for dealing with global properties of an instance”

Regarding claim 14, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches wherein the performing the second analysis includes using second training data, wherein the second training data have contour classifications. Examiner note: paragraph [0040] of the instant application states “Contour classifications or contour labels mean data that can be used as stored training data for the second analysis.” The second analysis is that of detecting an edge. Thus, contour classifications must be data that has been used to train the edge detector. See the 3rd paragraph of section 3.3: “In our work the instance-aware edge detection outputs a probability for each pixel, whether it touches a boundary. This problem is more challenging than canonical edge detection, since it requires to reason about contours and semantics jointly, distinguishing the true objects’ boundaries and other not relevant edges, e.g. inside the object or in the background. Below (see Fig. 3), we describe a new network architecture for this task that utilizes the idea of the intermediate FCN features concatenation. As a base for our network we use an FCN that is trained for semantic segmentation on the dataset that we want to use for object boundary prediction. In our experiments we use a pre-trained Dilation10 [52] model”



Regarding claim 16, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches wherein the performing the first analysis and the performing the second analysis are carried out temporally in parallel. See Figure 2: prior to the image partition, the semantic segmentation and the instance-aware edge detection happen in parallel.
[Image: media_image1.png]


Regarding claim 17, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches wherein the output image comprises a closed surface or a plurality of closed surfaces, and each closed surface represents one of the one or more objects of the object class. See Figure 1: (c) is the output image, which contains a plurality of closed surfaces.
[Image: media_image2.png]


Regarding claim 18, Chiu and Kirillov teach the method of claim 11. Kirillov further teaches assigning a unique identification for each of the closed surface. See Figure 1: image (c) represents the output, and the instances within each class have their own unique color within their closed surface.
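
For illustration only, assigning a unique identification to each closed surface can be sketched as a connected-component labeling of the separated surface mask. This is a minimal sketch under stated assumptions and is not taken from Kirillov, which is cited above only for the uniquely colored instances of Figure 1(c): the function name label_closed_surfaces, the toy mask, and the choice of 4-connectivity are all hypothetical.

```python
from collections import deque


def label_closed_surfaces(mask):
    """Give every 4-connected region of truthy pixels its own integer ID.

    Returns (labels, count), where labels has the same shape as mask,
    0 marks background or edge pixels, and IDs run from 1 to count.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background pixel, or already part of a surface
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one surface
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical separated mask: two surfaces divided by an excluded edge column.
surface_mask = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
labels, surface_count = label_closed_surfaces(surface_mask)
```

Each integer ID plays the role that a unique color plays in a rendered output image.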

Regarding claim 20, Chiu and Kirillov teach a vehicle comprising a camera for providing a captured image of surroundings of a vehicle, a first analysis device, a second analysis device, and a decoder, wherein the vehicle is configured to: capture, by the camera, the captured image of the surroundings of the vehicle; Camera 101 may be integrated into a vehicle, as indicated in paragraph [0025]: “The 2D video frame captured from the ground vehicle may be georegistered within a 2D rendered image rendered from 3D reference data.”


extract, by the decoder, features from the captured image; See paragraph [0032], lines 1-7: “In some examples, semantic segmentation unit 106 uses a SegNet encoder decoder network to conduct semantic segmentation for each of 2D frames 102 and label each pixel for the input video sequences. In one example, the encoder-decoder network comprises 4 layers for both the encoder and the decoder, 7x7 convolutional layers, and 64 features per layer.”



perform, by the first analysis device, a first analysis of the captured image by detecting one or more objects of an object class as surfaces in the captured image, wherein the first analysis is implemented as a semantic segmentation, wherein the one or more objects overlap or adjoin each other in the captured image, and wherein a result of the first analysis is a first analysis result; See all of paragraph [0032]: “In some examples, semantic segmentation unit 106 uses a SegNet encoder decoder network to conduct semantic segmentation for each of 2D frames 102 and label each pixel for the input video sequences. In one example, the encoder-decoder network comprises 4 layers for both the encoder and the decoder, 7x7 convolutional layers, and 64 features per layer. In one example, semantic segmentation unit 106 processes each 2D frame 102 to label each pixel of the frame with a different semantic classification label that indicates, for each pixel, that the pixel is a pixel within an image of an object corresponding to the classification label. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. A pixel of a 2D frame 102 that is labeled with the classification label Tree is a pixel within an image of a tree within the 2D frame. In one example, the SegNet architecture may be used because of its available trained models for urban environments.”

However, Chiu does not teach perform, by the second analysis device, a second analysis of the captured image by detecting edges of the one or more objects, wherein a result of the second analysis is a second analysis result; and combine, by the decoder, the first analysis result and the second analysis result to form an output image by excluding the edges from the surfaces of the one or more objects, wherein the one or more objects of the object class emerge from the surfaces and are visibly separated from each other by the edges; wherein the first analysis and the second analysis are carried out independently of each other.

Kirillov teaches perform, by the second analysis device, a second analysis of the captured image by detecting edges of the one or more objects, wherein a result of the second analysis is a second analysis result; Section 3.3 teaches an instance-aware edge detection to detect edges in the image.

and combine, by the decoder, the first analysis result and the second analysis result to form an output image by excluding the edges from the surfaces of the one or more objects, wherein the one or more objects of the object class emerge from the surfaces and are visibly separated from each other by the edges; wherein the first analysis and the second analysis are carried out independently of each other.
Figure 2 explicitly outlines processing the semantic segmentation and the edge image data in parallel and combining them in order to approximate instance segmentation.

[Image: media_image1.png]







Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Chiu and Kirillov, further in view of Yamazaki (US 20060050955 A1).

Regarding claim 15, Chiu and Kirillov teach the method of claim 11. However, they do not teach wherein the decoder is divided into two threads, and comprises the first analysis device and the second analysis device.

However, Yamazaki teaches the decoder is divided into two threads, and comprises the first analysis device and the second analysis device. See paragraph [0065]: “the instructions or microcodes of the two threads are stored simultaneously within the buffers 222, 232, and 242 of the instruction fetch unit 220, the instruction decoder 230, and the execution unit 240.” Further see Figure 2. The decoder contains a first thread and a second thread within.
Modified Chiu and Yamazaki are combinable because they are directed broadly to processing color images.
It would have been obvious to a person of ordinary skill in the art at the time of filing to incorporate a method of dividing the first analysis and second analysis into two threads, as doing so would allow both of them to be carried out temporally in parallel.
The suggestion/motivation for doing so is that doing so would increase the efficiency of the instance segmentation by allowing the edge detection and semantic segmentation to run in parallel to create the instance segmentation image.
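
For illustration only, running the two analyses as two independent threads over the same captured image can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Yamazaki's hardware threads or the networks of Chiu or Kirillov: semantic_segmentation and edge_detection are hypothetical stand-ins (a threshold and a neighbor comparison), and the sketch shows only the structure of independent execution, not a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def semantic_segmentation(image):
    """Hypothetical stand-in: label every nonzero pixel as the object class."""
    return [[1 if pixel > 0 else 0 for pixel in row] for row in image]


def edge_detection(image):
    """Hypothetical stand-in: mark a pixel as an edge if it differs from
    its right-hand neighbor."""
    return [
        [
            1 if x + 1 < len(row) and row[x] != row[x + 1] else 0
            for x in range(len(row))
        ]
        for row in image
    ]


# Hypothetical captured image: two adjoining objects with different intensities.
image = [
    [3, 3, 7, 7],
    [3, 3, 7, 7],
]

# Each analysis is submitted to its own worker thread and reads only the
# shared input image, so neither analysis depends on the other's result.
with ThreadPoolExecutor(max_workers=2) as pool:
    first_future = pool.submit(semantic_segmentation, image)
    second_future = pool.submit(edge_detection, image)
    first_result = first_future.result()
    second_result = second_future.result()
```

Because the two workers share no intermediate state, their results can be combined afterwards in either order.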



Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Chiu and Kirillov, and in further view of Sharma (US 10,664,722 B1).
Regarding claim 19, Chiu and Kirillov teach the method of claim 17. However, they do not teach further comprising: assigning a fingerprint for each closed surface; and comparing the fingerprint to previously recorded image data.
However, Sharma teaches assigning a fingerprint for each closed surface; and comparing the fingerprint to previously recorded image data. See column 58, lines 50-61: “Yet other fingerprinting techniques are variously known as Bag of Features, or Bag of Words methods. Such methods extract local features from patches of an image (e.g., SIFT points), and automatically cluster the features into N groups (e.g., 168 groups) - each corresponding to a prototypical local feature. A vector of occurrence counts of each of the groups (i.e., a histogram) is then determined, and serves as a reference signature for the image. To determine if a query image matches the reference image, local features are again extracted from patches of the image, and assigned to one of the earlier-defined N-groups (e.g., based on a distance measure from the corresponding prototypical local features).”
Modified Chiu and Sharma are combinable because they are directed to methods of object recognition involving semantic segmentation.
It would have been obvious to a person of ordinary skill in the art at the time of filing to incorporate a method of fingerprinting objects recognized in an image using the method taught by Sharma into modified Chiu, as doing so would allow one to track an object across frames based on previous frames.
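
For illustration only, the Bag of Features fingerprinting described in the quoted passage of Sharma (assigning local features to prototypical features, counting occurrences into a histogram, and comparing a query histogram against a reference) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the two-dimensional toy features, and the three prototypes are hypothetical, and the histogram-intersection score is an assumed comparison measure, since the quoted passage does not name one.

```python
def nearest_prototype(feature, prototypes):
    """Index of the prototype closest to the feature (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(
        range(len(prototypes)),
        key=lambda i: squared_distance(feature, prototypes[i]),
    )


def fingerprint(features, prototypes):
    """Histogram of occurrence counts, one bin per prototypical local feature."""
    histogram = [0] * len(prototypes)
    for feature in features:
        histogram[nearest_prototype(feature, prototypes)] += 1
    return histogram


def similarity(hist_a, hist_b):
    """Histogram intersection normalized to [0, 1] (assumed comparison measure)."""
    total = max(sum(hist_a), sum(hist_b))
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(hist_a, hist_b)) / total


# Hypothetical prototypes (N = 3) and local features of two closed surfaces.
prototypes = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
reference_features = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (5.2, 4.9)]
query_features = [(0.0, 0.2), (1.1, 1.0), (4.8, 5.1), (5.0, 5.0)]

reference_print = fingerprint(reference_features, prototypes)  # [1, 2, 1]
query_print = fingerprint(query_features, prototypes)          # [1, 1, 2]
score = similarity(reference_print, query_print)               # 0.75
```

A decision threshold on the score would still have to be chosen; none is given in the quoted passage.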
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KALEB TESSEMA whose telephone number is (571)272-2696.  The examiner can normally be reached on Monday-Thursday and Alternate Fridays 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/KALEB TESSEMA/Examiner, Art Unit 2667 

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667