DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 7, 8, 9, 10, 14, 15, 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mei et al. (US Patent No. 7890512 B2) in view of Agosta et al. (US Pub No. 20140355879 A1). 

Regarding Claim 1,
semantic features in images are obtained for training)

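generating a plurality of annotation regional proposals for a batch of images; (Mei, Abstract, discloses images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image; semantic features in images are annotated)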

computing a confidence metric indicating a degree of agreement between the plurality of annotation regional proposals; (Mei, Abstract, discloses images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image; a probability (confidence) is determined for the degree of similarity within a cluster of images)




Agosta discloses providing the batch of images to a first manual annotator and a second manual annotator to generate a first manual annotation set and a second manual annotation set and determining a first confidence score associated with the first manual annotator related to the first manual annotation set and a second confidence score associated with the second manual annotator related to the second manual annotation set; (Agosta, [0012], discloses each classifier can be trained by a set of images that are manually labeled by their pixel types and scene characteristics. The first classifier may reduce the image to a manageably small amount of data that has adequate rich features to serve as input to the second classifier. The second classifier may classify the scene in several dependent dimensions that correspond to a set of scene variables by characterizing the scene using the most likely combination of labels associated with the scene variables. The system may generate a list of labels (e.g., one label from each scene variable category) as output when analyzing a video sequence. For example, the second classifier can be implemented as a probabilistic model where a set of scene variable nodes associated with the labels can be designated as output nodes. The list of labels can have a value indicating the uncertainty or confidence of the labels attached to the image; first and second classifiers are manually labeled by scene characteristics, and labels (annotations) are given values indicating confidence (or uncertainty) of labels attached to images) and

assessing a preferred of the first manual annotator and the second manual annotator by comparing the first confidence score and the second confidence score. (Agosta, [0012], [0146], Figure 7A-7B, discloses each classifier can be trained, for example, by a set of images that are manually labeled by their pixel types and scene characteristics. The first classifier may reduce the image to a manageably small amount of data that has adequate rich features to serve as input to the second classifier. The second classifier may classify the scene in several dependent dimensions that correspond to a set of scene variables by characterizing the scene using the most likely combination of labels associated with the scene variables. The system may generate a list of labels (e.g., one label from each scene variable category) as output when analyzing a video sequence. For example, the second classifier can be implemented as a probabilistic model where a set of scene variable nodes associated with the labels can be designated as output nodes. The list of labels can have a value indicating the uncertainty or confidence of the labels attached to the image; A comparison between Table 1 and Table 4 indicates that on average the precisions and recalls for labels of the scene variable surroundings outperform the precisions and recalls for labels of the scene variable road obstacles. This is because the classification of the scene variable Figure 7A discloses tables side by side comparisons of confidence values of object recognition in image data sets)

Mei discloses the claimed invention except for the manual annotation of images, determining confidence values for the annotated images, and comparing the confidence values. Agosta teaches that it is known to manually label images, determine confidence values for the labeled images, and compare the confidence values in order to improve the quality of object detection in image data sets. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to improve semantic clustering of image data sets by further labeling the similar sets of data manually and assigning confidence values to the annotated data, as taught by Agosta, in order to improve the quality of object detection in images by comparing the confidence values of each image data set.

Regarding Claim 2, 
The combination of Mei and Agosta further discloses wherein the plurality of parallel semantic segmentation models comprises a plurality of parallel echo state 

Regarding Claim 3, 
The combination of Mei and Agosta further discloses wherein the initial annotated dataset comprises fewer than 100 annotated images. (Agosta, [0012], discloses each classifier can be trained by a set of images that are manually labeled by their pixel types and scene characteristics. The first classifier may reduce the image to a manageably small amount of data that has adequate rich features to serve as input to the second classifier. The second classifier may classify the scene in several dependent dimensions that correspond to a set of scene variables by characterizing the scene using the most likely combination of labels associated with the scene variables. The system may generate a list of labels (e.g., one label from each scene variable category) as output when analyzing a video sequence. For example, the second classifier can be implemented as a probabilistic model where a set of scene variable nodes associated with the labels can be designated as output nodes. The list of labels can have a value indicating the uncertainty or confidence of the labels attached to the image; a smaller set of image data is annotated to reduce computational complexity). Additionally, the rationale and motivation to combine the references Mei and Agosta as applied in claim 1 apply to this claim.



Regarding Claim 7,
The combination of Mei and Agosta further discloses wherein the method is used in training an autonomous driving/advanced driver assistance system of a vehicle. (Agosta, [0009], discloses the scene classification technology disclosed herein is capable of classifying a scene type by analyzing an image stream from a moving platform where multiple, simultaneous classifications are generated for the image. The scene classification may capture the gist of the current view over multiple (e.g., two or more) dimensions. The scene classification technology also includes novel systems and methods for predicting characteristics of the scene. The scene classification output has numerous beneficial uses for advising and assisting a driver as discussed below in more detail; scene classification for driving assistance is determined). Additionally, the rationale and motivation to combine the references Mei and Agosta as applied in claim 1 apply to this claim.

Claims 8-10 and 14 recite a computer program with program instructions corresponding to the method steps recited in Claims 1-3 and 7, respectively. Therefore, the recited instructions of computer program Claims 8-10 and 14 are mapped to the proposed combination in the same manner as the corresponding steps of Claims 1-3 and 7, respectively. Additionally, the rationale and motivation to combine the Mei and Agosta references presented in the rejection of Claim 1 apply to these claims.

Furthermore, the combination of Mei and Agosta discloses a non-transitory computer readable medium stored in a memory and executed by a processor (Agosta discloses the system 100 can include one or more moving platforms 135).

Claims 15-17 recite a system with elements corresponding to the method steps recited in Claims 1-3, respectively. Therefore, the recited elements of system Claims 15-17 are mapped to the proposed combination in the same manner as the corresponding steps of Claims 1-3, respectively. Additionally, the rationale and motivation to combine the Mei and Agosta references presented in the rejection of Claim 1 apply to these claims.
e system 100 can include one or more moving platforms 135). 

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mei as modified by Agosta, and further in view of Guttman et al. (US Pub No. 20190294999 A1). The teachings of Mei and Agosta have been discussed previously. 

Regarding Claim 5, 

Guttman discloses wherein the first confidence score and the second confidence score are each determined by finding pairs of IOUs and F1s for each manual annotated image of each manual annotation set. (Guttmann, [0221], discloses the machine learning algorithm of Step 1320 may be trained using the first plurality of training examples of Step 1320 and the first group of values of hyper parameters obtained by Step 1310 to generate a first inference model, for example as described above. Further, in some examples, the first inference model and a plurality of testing examples may be used to generate a first plurality of outputs corresponding to the testing examples. Further, in some examples, the first plurality of outputs and a plurality of desired results may be compared to determine the first result obtained by Step 1320. For example, the first result obtained by Step 1320 may be based on a function of one or more of numbers and/or ratios of false positives and/or false negatives and/or true positives and/or true negatives, accuracy, precision, F score, confusion matrix, other statistics about classification errors, sum of absolute errors, sum of square errors, other statistics about regression errors, intersection over union scores, segmentation error scores, clustering error scores; For example, a similarity between two values may be calculated as a monotonically decreasing function of a distance between the two values, as a function of a correlation between the two values, and so forth. In another example, a similarity between two regions may be calculated using the Jaccard index of the two IOUs and Dice are determined of images)
The combination of Mei and Agosta discloses the claimed invention except for determining the first and second confidence scores by finding pairs of IOUs and F1s. Guttman teaches that it is known to determine IOUs and F1s to determine similarity confidence scores in image matching techniques. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to improve semantic clustering and matching of image data sets by further using specific IOUs and F1s, as taught by Guttman, in order to improve the quality of object detection in images by accurate determination of intersection over union and Dice values.
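For reference, the two region-overlap measures relied on in the rejection of Claim 5 can be illustrated with a minimal Python sketch. It assumes each annotated region is represented as a set of (row, column) pixel coordinates. The function names iou, dice and pairwise_scores are hypothetical and are not taken from Guttman or from the application.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union (Jaccard index) of two pixel sets."""
    union = a | b
    if not union:
        # Assumption: two empty regions are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def dice(a, b):
    """Dice (F1) score of two pixel sets."""
    total = len(a) + len(b)
    if total == 0:
        # Assumption: two empty regions are treated as identical.
        return 1.0
    return 2 * len(a & b) / total


def pairwise_scores(regions):
    """Return (IOU, Dice) for every unordered pair of regions."""
    return [(iou(a, b), dice(a, b)) for a, b in combinations(regions, 2)]
```

Under this representation, two 2x2 regions that share one column of two pixels give an IOU of 2/6 and a Dice score of 4/8.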


Claims 12 and 19 recite a computer program and a system with program instructions and elements corresponding to the method steps recited in Claim 5. Therefore, the recited instructions of the computer program and elements of the system of Claims 12 and 19 are mapped to the proposed combination in the same manner as the corresponding steps of Claim 5. Additionally, the rationale and motivation to combine the Mei, Agosta and Guttman references presented in the rejection of Claim 5 apply to these claims.


Allowable Subject Matter
Claims 4, 6, 11, 13, 18 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.



Reasons for Allowance

Claims 4, 11 and 18: Claims 4, 11 and 18 recite limitations – “wherein the confidence metric is computed with the plurality of annotation regional proposals as inputs by computing intersection-over-union (IOU) and Dice (F1) scores for regional proposal pairs and computing a confidence (confidp) comprising a mean over a variance, where a denominator is a standard deviation between paired IOU or F1 scores”, which, in combination with the features of the independent claims, are not disclosed by the cited prior art references. The specific features of computing IOU and Dice scores and a confidence comprising a mean over a variance, whose denominator is a standard deviation between paired IOU or F1 scores, are not disclosed in combination by the cited prior art references. Therefore, Claims 4, 11 and 18 are objected to as containing allowable subject matter.
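As an illustration only, the recited confidence can be read as the mean of the paired IOU (or F1) scores divided by their standard deviation. The Python sketch below follows that reading. The function name confidence, the use of the population standard deviation, and the small eps term that avoids division by zero are assumptions made for the example and are not recited in the claims.

```python
from statistics import mean, pstdev


def confidence(pair_scores, eps=1e-9):
    """Mean of the paired scores divided by their standard deviation.

    pair_scores: IOU or F1 values, one per pair of regional proposals.
    eps: hypothetical guard against a zero standard deviation.
    """
    if not pair_scores:
        raise ValueError("at least one paired score is required")
    return mean(pair_scores) / (pstdev(pair_scores) + eps)
```

Under this reading, proposals that agree closely with one another (a high mean and a low spread of paired scores) yield a larger confidence value than proposals that disagree.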

Claims 6, 13 and 20: Claims 6, 13 and 20 recite limitations – “wherein assessing the preferred of the first manual annotator and the second manual annotator comprises: determining whether the first confidence score and the second confidence score are below a predetermined threshold and declaring a quality assessment automation failure and providing the first manual annotation set and the second manual annotation set to a master manual annotator for analysis if determined that the first confidence score and the second confidence score are below the predetermined threshold; and determining whether the first confidence score and the second confidence score are different to a predetermined degree and declaring a quality assessment automation success and selecting the preferred of the first manual annotator and the second manual annotator 
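The two determinations recited in Claims 6, 13 and 20 can be sketched as a short decision routine. This is an illustrative reading only. The function name assess_annotators, the parameters threshold and min_gap, the choice of the higher score as the preferred annotator, and the "undetermined" outcome for cases the limitation does not address are all assumptions made for the example.

```python
def assess_annotators(score_1, score_2, threshold, min_gap):
    """Return (outcome, selection) for two annotator confidence scores."""
    # Both scores below the threshold: automation failure, and both
    # annotation sets go to the master manual annotator.
    if score_1 < threshold and score_2 < threshold:
        return ("failure", "master")
    # Scores differ by at least the predetermined degree: automation
    # success. Assumption: the higher score marks the preferred annotator.
    if abs(score_1 - score_2) >= min_gap:
        return ("success", "first" if score_1 > score_2 else "second")
    # Assumption: any remaining case is left undecided in this sketch.
    return ("undetermined", None)
```

For example, assess_annotators(0.2, 0.3, threshold=0.5, min_gap=0.1) returns ("failure", "master") because both scores fall below the threshold.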


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20180136000 A1
US 11048979 B1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL whose telephone number is (571)270-5872. The examiner can normally be reached M-F: 10am - 8pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/Pinalben Patel/Examiner, Art Unit 2661