DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/03/2021 has been entered.
Allowable Subject Matter
Claims 8 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant’s arguments, see pg. 10, filed 09/03/2021, with respect to 35 U.S.C. § 101 have been fully considered and are persuasive. The § 101 rejection set forth in the Office Action of 06/04/2021 has been withdrawn.
Applicant’s arguments with respect to claims 1 and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 7, 9, 11-12, 17, 19, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan, Sudheendra, et al., "Large-scale live active learning: Training object detectors with crawled data and crowds." International Journal of Computer Vision 108.1-2 (2014) (“Vijayanarasimhan”) in view of Long, Chengjiang, et al., "Active visual recognition with expertise estimation in crowdsourcing." Proceedings of the IEEE International Conference on Computer Vision. 2013 (“Chengjiang”) and in view of Russakovsky, Olga, et al., "Best of both worlds: human-machine collaboration for object annotation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015 (“Russakovsky”) and further in view of Welinder, Peter, et al., "The multidimensional wisdom of crowds." Advances in Neural Information Processing Systems 23 (2010) (“Welinder”).
Regarding claim 1, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky teaches a data annotation server system, comprising: a processor (Vijayanarasimhan, pg. 112, sec. 4.6, “Our times are based on a dual-core 2.8 GHz CPU…”); and a memory (Vijayanarasimhan, pg. 104, sec. 4, “We fix $N_\rho = 500$ and $\epsilon' = 0.01$ for the hash table”), wherein the processor: provides at least one subset of source data to at least two annotator devices, wherein each subset of source data includes multiple pieces of data (Vijayanarasimhan, pg. 104, sec. 3.5, fig. 6, “Training our detector entails learning the linear SVM weights in Eq. 2 to distinguish windows that contain the object of interest from all others.” Note: It is being interpreted that the detector represents at least one annotator device); obtains a set of annotation data for each of the pieces of data in the at least one subset of a source data from the at least two annotator devices (Vijayanarasimhan, pgs. 99-100, fig. 1, fig. 2(c), “we propose an object model consisting of a root window $r$, multiple part windows $\{p_1, \ldots, p_P\}$ that overlap the root, and context windows $\{c_1, \ldots, c_C\}$ surrounding it… Let $O = [r, p_1, \ldots, p_P, c_1, \ldots, c_C]$ denote a candidate object configuration within an image, and let $\phi(W)$ denote the sparse feature encoding for local image descriptors extracted from a given sub-window $W$…[t]he detector scores a candidate configuration as a simple linear sum: where $w$ denotes the learned classifier weights, which we obtain with SVM training.” Note: It is being interpreted that $O$ represents a set of annotation data); classifies each piece of data in the source data using a machine classifier (Vijayanarasimhan, pg. 103, sec. 3.3, fig. 4, “We search only those [unlabeled] examples, i.e., we compute $|w^T \phi(O_i)| = |f(O_i)|$ for each one, and rank them in order of increasing value. At this point, each unlabeled image is…” Note: It is being interpreted that the scores associated with each unlabeled image represent the classification of the source data).
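For illustration only, the ranking step quoted above (computing $|w^T \phi(O_i)| = |f(O_i)|$ for each unlabeled example and ordering the examples by increasing value) may be sketched as follows. The sketch is not taken from Vijayanarasimhan: the function names, the weight vector, and the feature values are hypothetical, and plain Python lists stand in for the sparse feature encodings described in the reference.

```python
from typing import List, Sequence, Tuple


def score(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear detector score f(O) = w^T phi(O) (hypothetical helper)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def rank_by_margin(
    w: Sequence[float], features: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (index, |f(O_i)|) pairs sorted by increasing |f(O_i)|.

    Examples with the smallest absolute score lie closest to the
    decision boundary and therefore appear first.
    """
    margins = [(i, abs(score(w, phi))) for i, phi in enumerate(features)]
    return sorted(margins, key=lambda pair: pair[1])


# Assumed toy values, chosen only to make the arithmetic easy to follow.
w = [1.0, -2.0, 0.5]
unlabeled = [
    [1.0, 0.0, 0.0],  # f = 1.0
    [2.0, 1.0, 0.0],  # f = 0.0
    [0.0, 1.0, 1.0],  # f = -1.5
]

ranking = rank_by_margin(w, unlabeled)
print(ranking)
```

Under these assumed values, the second example has a score of exactly zero and is ranked first, followed by the first and third examples.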
Vijayanarasimhan does not teach: generates annotator skill data describing the skill of each annotator device based on the set of annotation data and the classification of the source data from the machine classifier; generates a predicted label and a confidence level for each piece of data in the source data based on the annotator skill data, the set of annotation data, and the classification of the source data from the machine classifier; and trains the machine classifier using the annotator skill data, the set of annotation data, the predicted label, and the confidence level.  
However, Chengjiang teaches: generates annotator skill data describing the skill of each annotator device based on the set of annotation data (Chengjiang, pgs. 3002-3003, “The conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ is also modeled as a flipping noise model, i.e., $p(t_{ij} \mid y_i, \varepsilon_j) = \varepsilon_j \Theta(y_i t_{ij}) + (1 - \varepsilon_j) \Theta(-y_i t_{ij})$. Intuitively, with probability $1 - \varepsilon_j$, $t_{ij}$ will be a flipped version of $y_i$. Therefore, the larger $\varepsilon_j$ is, the higher the probability that $t_{ij}$ will agree with the true label $y_i$, and vice versa…To online estimate the quality of both the labels and labelers, we need to online estimate the parameters $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$ which represent the overall label quality and label quality of each labeler.” Chengjiang teaches $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$ which represent the overall label quality and label quality of each labeler (i.e. generates annotator skill data describing the skill of each annotator device) $p(t_{ij} \mid y_i, \varepsilon_j)$ (i.e. based on the set of annotation data)) and the classification of the source data from the machine classifier (Chengjiang, pg. 3003, “Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds.” Chengjiang teaches Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds (i.e. the classification of the source data from the machine classifier));
generates a predicted label and a confidence level for each piece of data in the source data based on the annotator skill data, the set of annotation data (Chengjiang, pgs. 3002-3003, “To predict the label $y_u$ of a $x_u$ we need to solve the following Bayesian inference problem $p(y_u \mid x_u, D_L)$…The conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ is also modeled as a flipping noise model, i.e., $p(t_{ij} \mid y_i, \varepsilon_j) = \varepsilon_j \Theta(y_i t_{ij}) + (1 - \varepsilon_j) \Theta(-y_i t_{ij})$. Intuitively, with probability $1 - \varepsilon_j$, $t_{ij}$ will be a flipped version of $y_i$. Therefore, the larger $\varepsilon_j$ is, the higher the probability that $t_{ij}$ will agree with the true label $y_i$, and vice versa…To online estimate the quality of both the labels and labelers, we need to online estimate the parameters $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$ which represent the overall label quality and label quality of each labeler.” Chengjiang teaches To predict the label $y_u$ of a $x_u$ we need to solve the following Bayesian inference problem $p(y_u \mid x_u, D_L)$ (i.e. generates a predicted label and a confidence level for each piece of data in the source data)
                             
                            
                                
                                    ε
                                
                                →
                            
                            =
                            {
                            
                                
                                    ξ
                                
                                
                                    g
                                
                            
                            ,
                             
                            
                                
                                    
                                        
                                            
                                                
                                                    ε
                                                
                                                
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                            }
                        
                     which represent the overall label quality and label quality of each labeler (i.e. based on the annotator skill data) The conditional likelihood probability                         
                            p
                            (
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                            |
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                            ,
                             
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            )
                        
(i.e. the set of annotation data)), the classification of the source data from the machine classifier (Chengjiang, pg. 3003, “Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds.” Chengjiang teaches Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds (i.e. the classification of the source data from the machine classifier)), and the difficulty of that piece of source data (Chengjiang, pg. 3003, “For active sample selection, a criterion that can readily be adopted is the entropy $H(y_u) = -\sum_{y_u \in \{1,-1\}} p(y_u \mid x_u, D_L) \log p(y_u \mid x_u, D_L)$ of the $y_u$ on unlabeled data $x_u$. We select the most uncertain unlabeled example to be labeled, i.e., $x_u^* = \arg\max_{x_u \in X_u} H(y_u)$.” Chengjiang teaches We select the most uncertain unlabeled example to be labeled, i.e., $x_u^* = \arg\max_{x_u \in X_u} H(y_u)$ (i.e. the difficulty of that piece of source data));
and trains the machine classifier using the annotator skill data, the set of annotation data, the predicted label, and the confidence level (Chengjiang, pgs. 3002-3003, “To predict the label $y_u$ of a $x_u$ we need to solve the following Bayesian inference problem $p(y_u \mid x_u, D_L)$…The conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ is also modeled as a flipping noise model, i.e., $p(t_{ij} \mid y_i, \varepsilon_j) = \varepsilon_j \Theta(y_i t_{ij}) + (1 - \varepsilon_j) \Theta(-y_i t_{ij})$.
Intuitively, with probability $1 - \varepsilon_j$, $t_{ij}$ will be a flipped version of $y_i$. Therefore, the larger $\varepsilon_j$ is, the higher the probability that $t_{ij}$ will agree with the true label $y_i$, and vice versa…To online estimate the quality of both the labels and labelers, we need to online estimate the parameters $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$
which represent the overall label quality and label quality of each labeler.” Chengjiang teaches To predict the label $y_u$ of a $x_u$ we need to solve the following Bayesian inference problem $p(y_u \mid x_u, D_L)$ (i.e. predicted label and confidence level) $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$ which represent the overall label quality and label quality of each labeler (i.e. annotator skill data) The conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ (i.e. the set of annotation data)).
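For illustration only, the flipping noise model and the entropy-based selection criterion quoted from Chengjiang above can be sketched as follows. This is a minimal sketch and not Chengjiang's implementation: the function names, the reading of Θ as a unit step function, and the example probabilities are assumptions made here and do not appear in the cited reference.

```python
import math


def theta(z):
    # Assumption: Theta is read as a unit step, 1 for z > 0 and 0 otherwise.
    return 1.0 if z > 0 else 0.0


def flip_likelihood(t_ij, y_i, eps_j):
    # p(t_ij | y_i, eps_j) = eps_j * Theta(y_i * t_ij) + (1 - eps_j) * Theta(-y_i * t_ij)
    # Labels are assumed to take values in {1, -1}.
    return eps_j * theta(y_i * t_ij) + (1.0 - eps_j) * theta(-y_i * t_ij)


def entropy(p_pos):
    # H(y_u) = -sum over y_u in {1, -1} of p * log p,
    # where p_pos stands for p(y_u = 1 | x_u, D_L).
    h = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0.0:  # treat 0 * log 0 as 0
            h -= p * math.log(p)
    return h


def select_most_uncertain(p_pos_by_sample):
    # x_u* = argmax over unlabeled samples of H(y_u).
    # p_pos_by_sample is a hypothetical mapping: sample id -> p(y_u = 1 | x_u, D_L).
    return max(p_pos_by_sample, key=lambda k: entropy(p_pos_by_sample[k]))


# Hypothetical predictive probabilities for three unlabeled samples.
candidates = {"a": 0.95, "b": 0.55, "c": 0.20}
most_uncertain = select_most_uncertain(candidates)
```

Under these assumptions, a labeler with a larger ε_j is more likely to agree with the true label, and the sample whose predictive probability is closest to 0.5 is selected as the most uncertain.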
Accordingly, one of ordinary skill in the art would modify Vijayanarasimhan’s system in view of Chengjiang to teach: generates annotator skill data describing the skill of each annotator device based on the set of annotation data and the classification of the source data from the machine classifier; generates a predicted label and a confidence level for each piece of data in the source data based on the annotator skill data, the set of annotation data, and the classification of the source i.e., a group of noisy labelers). These two different statistics are modeled hierarchically with two levels of flip models. Expectation propagation…is used for approximate Bayesian inference of the posterior of the latent classification function. A generalized Expectation Maximization (GEM) algorithm is conducted to estimate both quantities. The resulting classifier is more resilient to label noises, adapting to the expertise of labelers.”). 
Vijayanarasimhan does not teach: and if the confidence level for a certain piece of data in the source data is greater than or equal to a certain threshold, accept the predicted label as an estimated ground truth for that piece of data.
However, Russakovsky teaches: and if the confidence level for a certain piece of data in the source data is greater than or equal to a certain threshold, accept the predicted label as an estimated ground truth for that piece of data (Russakovsky, pg. 2123, sec. 4.1, “If both target utility U* and precision P* are requested, the system samples detections from Y into Y in decreasing order of probability while $E[\mathrm{Precision}(Y)] \geq P^*$... Since expected utility increases with every additional detection, this will correspond to the highest utility set Y under precision constraint P*. If $E[\mathrm{Utility}(Y)] \geq U^*$ the constraints are satisfied. If not, we continue the labeling system.”).
Accordingly, one of ordinary skill in the art would modify Vijayanarasimhan’s system in view of Russakovsky to teach: and if the confidence level for a certain piece of data in the source data is greater than or equal to a certain threshold, accept the predicted label as an estimated ground truth for that piece of data. The motivation to do so would be to develop a data labeling scheme based on user provided constraints (Russakovsky, pg. 2122, sec. 3, “We present a policy for efficiently and accurately detecting objects in a given image. The input to the system is an image to annotate and a set of annotation constraints… On one end of the spectrum the requester can set the maximum budget to zero, and obtain the best automatic annotation of the image. On the other end she can set an infinite budget but specify 100% desired precision and 17 annotated objects per image, which will produce a policy for detailed annotation similar to that of the SUN dataset.”). 
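For illustration only, the threshold-based acceptance discussed above can be sketched in Python. This is a minimal sketch and not Russakovsky's implementation: the function names are hypothetical, and it assumes that the expected precision of a set of detections is the mean of their per-detection probabilities, an estimator that the quoted passage does not spell out.

```python
from typing import List


def accept_label(confidence: float, threshold: float) -> bool:
    """Accept a predicted label as estimated ground truth when its
    confidence is greater than or equal to the threshold."""
    return confidence >= threshold


def select_by_expected_precision(probs: List[float],
                                 target_precision: float) -> List[int]:
    """Add detections in decreasing order of probability while the expected
    precision of the accepted set stays at or above target_precision.

    Assumption (not from the reference): expected precision is the mean
    probability of the accepted detections.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    accepted: List[int] = []
    total = 0.0
    for i in order:
        # Stop at the first detection that would pull the mean below target;
        # later detections have lower probability and cannot raise it again.
        if (total + probs[i]) / (len(accepted) + 1) < target_precision:
            break
        accepted.append(i)
        total += probs[i]
    return accepted
```

Under these assumptions, a higher requested precision yields a smaller accepted set, and items left out would remain candidates for further labeling.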
Vijayanarasimhan does not teach: estimates the difficulty of each piece of source data of the source data based on the set of annotation data and the classification of the source data from the machine classifier. 
However, Welinder teaches: estimates the difficulty of each piece of source data of the source data based on the set of annotation data and the classification of the source data from the machine classifier (Welinder, pgs. 3-5, “We assume that each annotator assigns the label $l_{ij}$ according to a linear classifier. The classifier is parameterized by a direction $\hat{w}_j$ of a decision plane and a bias $\hat{\tau}_j$. The label $l_{ij}$ is deterministically chosen, i.e. $l_{ij} = I(\langle \hat{w}_j, y_{ij} \rangle \geq \hat{\tau}_j)$, where $I(\cdot)$ is the indicator function. It is possible to integrate out $y_{ij}$ and put $l_{ij}$ in direct dependence on $x_i$, $p(l_{ij} = 1 \mid x_i, \sigma_j, \hat{\tau}_j) = \Phi\left(\frac{\langle \hat{w}_j, x_i \rangle - \hat{\tau}_j}{\sigma_j}\right)$ where $\Phi(\cdot)$ is the cumulative standard normal distribution, a sigmoidal-shaped function… Image difficulty is represented in the model by the value of $x_i$ (see Figure 2b). If there is a particular ground truth decision plane, $(w', \tau')$, images $I_i$ with $x_i$ close to the plane will be more difficult for annotators to label.” Welinder teaches that image difficulty is represented in the model by the value of $x_i$ (see Figure 2b). If there is a particular ground truth decision plane, $(w', \tau')$, images $I_i$ with $x_i$ close to the plane will be more difficult for annotators to label (i.e., estimates the difficulty of each piece of source data of the source data based on the set of annotation data). Welinder further assumes that each annotator assigns the label $l_{ij}$ according to a linear classifier, $p(l_{ij} = 1 \mid x_i, \sigma_j, \hat{\tau}_j) = \Phi\left(\frac{\langle \hat{w}_j, x_i \rangle - \hat{\tau}_j}{\sigma_j}\right)$, where $\Phi(\cdot)$ is the cumulative standard normal distribution, a sigmoidal-shaped function (i.e., the classification of the source data from the machine classifier)). 
Accordingly, one of ordinary skill in the art would modify Vijayanarasimhan’s system in view of Welinder to teach: estimates the difficulty of each piece of source data of the source data based on the set of annotation data and the classification of the source data from the machine classifier. The motivation to do so would be to incorporate the use of machine learning aspects/parameters when analyzing the human annotation process (Welinder, pg. 8, “We have proposed a Bayesian generative probabilistic model for the annotation process. Given only binary labels of images from many different annotators, it is possible to infer not only the underlying class (or value) of the image, but also parameters such as image difficulty and annotator competence and bias… Besides estimating ground truth classes from binary labels, our model provides information that is valuable for defining loss functions and for training classifiers. For example, the image parameters estimated by our model could be taken into account for weighing different training examples, or, more generally, it could be used for a softer definition of ground truth.”).  
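For illustration only, the annotator label model quoted from Welinder above, in which the probability of a positive label is the standard normal CDF of the scaled signed distance to the annotator's decision plane, can be sketched in Python. This is a minimal sketch under stated assumptions, not Welinder's code: the function names are hypothetical, and vectors are represented as plain Python sequences.

```python
import math
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def label_probability(w_hat: Sequence[float], x: Sequence[float],
                      tau_hat: float, sigma: float) -> float:
    """p(l_ij = 1 | x_i, sigma_j, tau_hat_j)
       = Phi((<w_hat_j, x_i> - tau_hat_j) / sigma_j)."""
    dot = sum(wi * xi for wi, xi in zip(w_hat, x))
    return std_normal_cdf((dot - tau_hat) / sigma)
```

In this sketch, an image whose $x_i$ lies on the decision plane yields a probability of 0.5, which corresponds to the quoted observation that images close to the plane are harder for annotators to label.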
Regarding claim 2, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 1, wherein the generates active learning data for at least one subset of source data, where the active learning data comprises instructions for annotating each piece of data in the at least one subset of source data (Vijayanarasimhan, pg. 103, fig. 4, sec. 3.3, “The function sign($u^T a$) returns 1 if $u^T a \geq 0$, and 0 otherwise, and u and v are sampled from a standard multivariate Gaussian, u, v ∼ N(0, I)… We use these functions to hash the crawled data into the table. Then, at each iteration of the active learning loop, we hash the current classifier as a query, and directly retrieve examples closest to its decision boundary… At this point, each unlabeled image is associated with the score of its jumping window with the smallest margin criterion value. Finally, the system issues a label request for the top T images under this ranking.” Note: It is being interpreted that the top T ranked images represent active learning data and the system issuing a label request represents instructions for annotating the subset of source data); and provides the active learning data to the at least two annotator devices (Vijayanarasimhan, pg. 103, fig. 5, sec. 3.4, “To automatically obtain annotations on the actively selected examples, our system posts jobs on Mechanical Turk, where it can pay workers to provide labels. The system gathers the images containing the most uncertain bounding boxes, and the annotators are instructed to use a rectangle drawing tool to outline the object of interest with a bounding box.”).  
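For illustration only, the sign function and the top-T selection described in the passage quoted from Vijayanarasimhan can be sketched in Python. This is a minimal sketch, not the reference's system: the function names are hypothetical, and the ranking below is a brute-force scan by absolute margin, swapped in for the hash-table lookup that the reference describes.

```python
import random
from typing import List, Sequence


def sign_bit(u: Sequence[float], a: Sequence[float]) -> int:
    """Return 1 if u^T a >= 0, and 0 otherwise."""
    return 1 if sum(ui * ai for ui, ai in zip(u, a)) >= 0 else 0


def sample_gaussian_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw a vector from a standard multivariate Gaussian N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def top_t_by_margin(w: Sequence[float], bias: float,
                    examples: Sequence[Sequence[float]], t: int) -> List[int]:
    """Indices of the t examples closest to the decision boundary w.x + b = 0.

    Assumption: closeness is measured by |w.x + b|, computed by brute force.
    """
    margins = [abs(sum(wi * xi for wi, xi in zip(w, x)) + bias)
               for x in examples]
    return sorted(range(len(examples)), key=lambda i: margins[i])[:t]
```

Under these assumptions, the indices returned by `top_t_by_margin` would correspond to the images for which a label request is issued.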
Regarding claim 7, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 1, wherein the annotation data for each of the piece of data comprises a location of the annotation within the piece of data (Vijayanarasimhan, pg. 100, sec. 3.1.1, fig. 1, fig. 3, “Each window type (r, p_i, or c_j) uses these features to create its encoding φ(·). The root window provides a global summary of the object appearance…[s]imilarly, each part window summarizes the local features within it, discarding their positions; however, the location of each part is defined relative to the current root, ).
Regarding claim 9, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 1, wherein the annotation data comprises a bounding box identifying the location of the annotation within each of the pieces of source data (Vijayanarasimhan, pg. 98, sec. 1, “Rather than fill the data pool with some canned dataset, the system itself gathers possibly relevant images via keyword search (we use Flickr). It repeatedly surveys the data…and generates tasks on an online crowd-sourcing service (Mechanical Turk) to get the corresponding bounding box annotations”).  
Regarding claim 21, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 1, wherein the processor further: 
if the confidence level of a certain piece of data is less than the threshold: provides another at least one subset of source data to another at least one annotator device (Russakovsky, pg. 2123, sec. 4.1, “If both target utility U* and precision P* are requested, the system samples detections from Y into Y in decreasing order of probability while $E[\mathrm{Precision}(Y)] \geq P^*$... Since expected utility increases with every additional detection, this will correspond to the highest utility set Y under precision constraint P*. If $E[\mathrm{Utility}(Y)] \geq U^*$ the constraints are satisfied. If not, we continue the labeling system.”); 
obtains additional annotation data from the other at least one annotator device; classifies each piece of data in the source data using the machine classifier(Vijayanarasimhan, pg. 104, sec. 3.5, fig. 6, “To recap, the main loop consists of using the current classifier to generate candidate jumping windows, storing all candidates in a hash table, querying the hash table using ); 
generates other annotator skill data describing the skill of the other at least one annotator device based on the additional annotation data (Chengjiang, pgs. 3002-3003, “The conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ is also modeled as a flipping noise model, i.e.,                         
                             
                            p
                            
                                
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                                
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            ε
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            =
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            Θ
                            
                                
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                            
                            +
                            
                                
                                    1
                                    -
                                    
                                        
                                            ε
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            Θ
                            
                                
                                    -
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                            
                            .
                             
                        
                    Intuitively, with probability                        
                            1
                            -
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                        
                    ,                         
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                        
                     will be a flipped version of                         
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                        
                    . Therefore, the larger                         
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                        
                     is, the higher the probability that                         
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                        
                     will agree with the true label                         
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                        
                    , and vice versa…To online estimate the quality of both the labels and labelers, we need to online estimate the parameters                         
                            
                                
                                    ε
                                
                                →
                            
                            =
                            {
                            
                                
                                    ξ
                                
                                
                                    g
                                
                            
                            ,
                             
                            
                                
                                    
                                        
                                            
                                                
                                                    ε
                                                
                                                
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                            }
                        
                     which represent the overall label quality and label quality of each labeler.” Chengjiang teaches                         
                            
                                
                                    ε
                                
                                →
                            
                            =
                            {
                            
                                
                                    ξ
                                
                                
                                    g
                                
                            
                            ,
                             
                            
                                
                                    
                                        
                                            
                                                
                                                    ε
                                                
                                                
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                            }
                        
                     which represent the overall label quality and label quality of each labeler (i.e. generates annotator skill data describing the skill of the other at least one annotator device)                         
                            p
                            (
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                            |
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                            ,
                             
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            )
                        
                     (i.e. based on the additional annotation data)), the classification of the source data from the machine classifier (Chengjiang, pg. 3003, “Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds.” Chengjiang teaches Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds (i.e. the classification of the source data from the machine classifier)), and the annotation data from the at least one annotator device(Chengjiang, pgs. 3001-3002, “We denote                         
                            
                                
                                    t
                                
                                
                                    i
                                
                            
                            =
                            
                                
                                    
                                        
                                            
                                                
                                                    t
                                                
                                                
                                                    i
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                        
                     as the set of labels from the M labelers… The conditional likelihood probability                         
                            p
                            (
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                            |
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                            ,
                             
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            )
                        
                     is also modeled as a flipping noise model, i.e.,                         
                             
                            p
                            
                                
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                                
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            ε
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            =
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            Θ
                            
                                
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                            
                            +
                            
                                
                                    1
                                    -
                                    
                                        
                                            ε
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            Θ
                            
                                
                                    -
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                            
                            .
                             
                        
                    Intuitively, with probability                        
                            1
                            -
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                        
                    ,                         
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                        
                     will be a flipped version of                         
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                        
                    . Therefore, the                         
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                        
                     is, the higher the probability that                         
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                        
                     will agree with the true label                         
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                        
                    , and vice versa.” Chengjiang teaches The conditional likelihood probability                         
                            p
                            (
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                            |
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                            ,
                             
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            )
                        
                     (i.e. the annotation data) We denote                         
                            
                                
                                    t
                                
                                
                                    i
                                
                            
                            =
                            
                                
                                    
                                        
                                            
                                                
                                                    t
                                                
                                                
                                                    i
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                        
                     as the set of labels from the M labelers (i.e. from the at least one annotator device)); 
generates another predicted label and another confidence label for each piece of data in the source data based on the other annotator skill data, the additional annotation data, the annotation data from the at least one annotator device, the annotator skill data of the at least one annotator device(Chengjiang, pgs. 3001-3003, “We denote                         
                            
                                
                                    t
                                
                                
                                    i
                                
                            
                            =
                            
                                
                                    
                                        
                                            
                                                
                                                    t
                                                
                                                
                                                    i
                                                    j
                                                
                                            
                                        
                                    
                                
                                
                                    j
                                    =
                                    1
                                
                                
                                    M
                                
                            
                        
                     as the set of labels from the M labelers…To predict the label                         
                            
                                
                                    y
                                
                                
                                    u
                                
                            
                        
                     of a                         
                            
                                
                                    x
                                
                                
                                    u
                                
                            
                        
                     we need to solve the following Bayesian inference problem                         
                            p
                            (
                            
                                
                                    y
                                
                                
                                    u
                                
                            
                            |
                            
                                
                                    x
                                
                                
                                    u
                                
                            
                            ,
                             
                            
                                
                                    D
                                
                                
                                    L
                                
                            
                            )
                        
                    …The conditional likelihood probability                         
                            p
                            (
                            
                                
                                    t
                                
                                
                                    i
                                    j
                                
                            
                            |
                            
                                
                                    y
                                
                                
                                    i
                                
                            
                            ,
                             
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
                            )
                        
                     is also modeled as a flipping noise model, i.e.,                         
                             
                            p
                            
                                
                                    
                                        
                                            t
                                        
                                        
                                            i
                                            j
                                        
                                    
                                
                                
                                    
                                        
                                            y
                                        
                                        
                                            i
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            ε
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            =
                            
                                
                                    ε
                                
                                
                                    j
                                
                            
$\Theta(y_i t_{ij}) + (1 - \varepsilon_j)\,\Theta(-y_i t_{ij})$.
Intuitively, with probability $1 - \varepsilon_j$, $t_{ij}$ will be a flipped version of $y_i$. Therefore, the larger $\varepsilon_j$ is, the higher the probability that $t_{ij}$ will agree with the true label $y_i$, and vice versa…To online estimate the quality of both the labels and labelers, we need to online estimate the parameters $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$ which represent the overall label quality and label quality of each labeler.” Chengjiang teaches: To predict the label $y_u$ of a $x_u$ we need to solve the following Bayesian inference problem $p(y_u \mid x_u, D_L)$ (i.e. generates a predicted label and a confidence level for each piece of data in the source data); $\vec{\varepsilon} = \{\xi_g, \{\varepsilon_j\}_{j=1}^{M}\}$, which represent the overall label quality and label quality of each labeler (i.e. based on the other annotator skill data); the conditional likelihood probability $p(t_{ij} \mid y_i, \varepsilon_j)$ (i.e. the additional annotation data); we denote $t_i = \{t_{ij}\}_{j=1}^{M}$ as the set of labels from the M labelers (i.e. the annotation data from the at least one annotator device); $\{\varepsilon_j\}_{j=1}^{M}$ (i.e. the annotator skill data of the at least one annotator device)), and the classification of the source data from the machine classifier (Chengjiang, pg. 3003, “Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds.” Chengjiang teaches Figure 1: Graphical model of the proposed Gaussian process classifier, with multiple noisy labels from the crowds (i.e. the classification of the source data from the machine classifier)); and
if the confidence level for a certain piece of data is greater than or equal to a certain threshold, accepts the other predicted label as an estimated ground truth for that piece of data (Russakovsky, pg. 2123, sec. 4.1, If both target utility U* and precision P* are requested, the system samples detections from Y into Y in decreasing order of probability while $E[\mathrm{Precision}(Y)] \geq P^*$...Since expected utility increases with every additional detection, this will correspond to the highest utility set Y under precision constraint P*.  If $E[\mathrm{Utility}(Y)] \geq U^*$ the constraints are satisfied. If not, we continue the labeling system).
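For illustration only, the sampling rule quoted from Russakovsky can be sketched as follows. The sketch assumes that the expected precision of a set of detections is the mean of their probabilities and that utility is the expected number of correct detections; neither definition is given in the passage quoted above, and the function names are hypothetical.

```python
from typing import List, Tuple


def select_under_precision(probs: List[float],
                           target_precision: float) -> Tuple[List[int], float]:
    """Add detections in decreasing order of probability while the expected
    precision (assumed: mean probability of the chosen set) stays >= target."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in order:
        # Expected precision of the set if detection i were added.
        if (total + probs[i]) / (len(chosen) + 1) < target_precision:
            break
        chosen.append(i)
        total += probs[i]
    # Assumed utility: expected number of correct detections in the set.
    return chosen, total


def constraints_satisfied(probs: List[float],
                          target_precision: float,
                          target_utility: float) -> bool:
    """True if the highest-utility set under the precision constraint
    also reaches the requested utility; otherwise labeling would continue."""
    _, utility = select_under_precision(probs, target_precision)
    return utility >= target_utility
```

Because the detections are visited in decreasing order of probability, the running mean can only fall, so stopping at the first violation yields the largest set that still meets the precision target.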
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Vijayanarasimhan with the above teachings of Russakovsky and Chengjiang for the same rationale stated at Claim 1.
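As a non-authoritative illustration of the labeler noise model quoted from Chengjiang above, $p(t_{ij} \mid y_i, \varepsilon_j) = \varepsilon_j\,\Theta(y_i t_{ij}) + (1 - \varepsilon_j)\,\Theta(-y_i t_{ij})$, the sketch below combines several noisy labels into a posterior and accepts a label only when the confidence meets a threshold. It assumes labels in {+1, -1}, labelers that are independent given the true label, and a fixed prior in place of the Gaussian process classifier; these simplifications and all function names are assumptions, not Chengjiang's inference procedure.

```python
from typing import Optional, Sequence, Tuple


def label_likelihood(t: int, y: int, eps: float) -> float:
    """p(t | y, eps): eps when the labeler agrees with y, 1 - eps when flipped."""
    return eps if t * y > 0 else 1.0 - eps


def posterior_positive(labels: Sequence[int],
                       eps: Sequence[float],
                       prior_pos: float = 0.5) -> float:
    """Posterior p(y = +1 | labels), assuming independent labelers and
    0 < eps_j < 1 so that the normalizer is never zero."""
    like_pos, like_neg = prior_pos, 1.0 - prior_pos
    for t, e in zip(labels, eps):
        like_pos *= label_likelihood(t, +1, e)
        like_neg *= label_likelihood(t, -1, e)
    return like_pos / (like_pos + like_neg)


def accept_if_confident(labels: Sequence[int],
                        eps: Sequence[float],
                        threshold: float,
                        prior_pos: float = 0.5) -> Tuple[Optional[int], float]:
    """Return (label, confidence); label is None when confidence < threshold."""
    p_pos = posterior_positive(labels, eps, prior_pos)
    label, conf = (1, p_pos) if p_pos >= 0.5 else (-1, 1.0 - p_pos)
    return (label if conf >= threshold else None), conf
```

In this sketch a larger $\varepsilon_j$ gives labeler j more weight, which mirrors the quoted statement that a larger $\varepsilon_j$ means a higher probability of agreeing with the true label.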
Referring to independent claim 11, it is rejected on the same basis as independent claim 1 since they are analogous claims.
Referring to dependent claims 12, 17, 19, and 22, they are rejected on the same basis as dependent claims 2, 7, 9, and 21 since they are analogous claims.
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan, Sudheendra et al., "Large-scale live active learning: Training object detectors with crawled data and crowds." International journal of computer vision 108.1-2 (2014)(“Vijayanarasimhan”) in view of Long, Chengjiang, et al. "Active visual recognition with expertise estimation in crowdsourcing." Proceedings of the IEEE International Conference on Computer Vision. 2013(“Chengjiang”) and in view of Russakovsky, Olga et al. "Best of both worlds: human-machine collaboration for object annotation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015(“Russakovsky”) and in view of Welinder, Peter, et al. "The multidimensional wisdom of crowds." Advances in neural information processing systems 23 (2010)(“Welinder”) and in view of Niu, Xiao-Xiao et al., "A novel hybrid CNN–SVM classifier for recognizing handwritten digits." Pattern Recognition 45.4 (2012)(“Niu”) and further in view of Platt, John. "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods." Advances in large margin classifiers 10.3 (1999)(“Platt”).
Regarding claim 5, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 1, wherein the machine classifier comprises a linear support vector machine (SVM) classifying features (Vijayanarasimhan, pg. 104, sec. 4, “We use dense SIFT at three scales (16, 24, 32 pixels) with grid spacing of 4 pixels, for 30K features per image… [w]e use the fast linear SVM code….”).
Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder does not teach: identified using a convolutional neural network.
However, Niu teaches identified using a convolutional neural network (Niu, pg. 1320, sec. 2.2, fig. 2, “A Convolutional Neural Network… is a multi-layer neural network with a deep supervised learning architecture…[t]he feature extractor contains feature map layers and retrieves discriminating features from the raw images.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Vijayanarasimhan’s method in view of Chengjiang and in view of Russakovsky and in view of Welinder and further in view of Niu to teach identified using a convolutional neural network. The motivation to do so would be to have features that come directly from the image pixels and are automatically extracted by the CNN in the hybrid CNN+SVM architecture (Niu, pg. 1320, sec. 2.2, “A Convolutional Neural Network is a multi-layer neural network with a deep supervised learning architecture that can be viewed as the composition of two parts [with] an automatic feature extractor…[t]he feature extractor contains feature map layers and retrieves discriminating features from the raw images....” & see also pg. 1325 sec. 5, “[T]he salient features can be automatically extracted by the hybrid model, while the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features which is a laborious and time-consuming task.”).
Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder does not teach followed by probability calibration using Platt scaling.
However, Platt teaches followed by probability calibration using Platt scaling (Platt, pg. 4, “The parameters A and B are found by minimizing the negative log likelihood of the training data, which is a cross-entropy error function [as seen in (11)] where $p_i = \frac{1}{1 + \exp(A f_i + B)}$.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Vijayanarasimhan’s method in view of Chengjiang and in view of Russakovsky and in view of Welinder and further in view of Platt to teach: followed by probability calibration using Platt scaling. The motivation to do so would be to calculate a posterior probability for an SVM while still maintaining the SVM’s sparseness (Platt, pg. 9, sec. 4, “The SVM + sigmoid combination preserves the sparseness of the SVM….”).
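By way of a hedged sketch, the fitting step quoted from Platt (finding A and B by minimizing the negative log likelihood with $p_i = 1/(1 + \exp(A f_i + B))$) could look like the following. Plain gradient descent and raw 0/1 targets are used here for brevity; the optimizer, learning rate, step count, and function names are assumptions and are not taken from Platt.

```python
import math
from typing import Sequence, Tuple


def platt_probability(f: float, A: float, B: float) -> float:
    """p = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * f + B
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores: Sequence[float],
              targets: Sequence[int],
              lr: float = 0.1,
              steps: int = 2000) -> Tuple[float, float]:
    """Fit A and B by gradient descent on the mean cross-entropy.

    scores:  uncalibrated SVM outputs f_i
    targets: 1 for positive examples, 0 for negative examples
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            p = platt_probability(f, A, B)
            # d(NLL)/dz = t - p for z = A*f + B under this parameterization.
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B
```

For scores where larger values indicate the positive class, the fitted A comes out negative, so the calibrated probability rises with the SVM score.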
Regarding claim 6, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and in view of Welinder and in view of Niu and further in view of Platt teaches the data annotation the machine classifier estimates the label data for each piece of source data by calculating the confidence in the set of annotation data for the piece of source data with a probability estimate (Platt, pg. 3, sec. 2.1, detailing the probability: $P(y = 1 \mid f) = \frac{1}{1 + \exp(Af + B)}$, where $P(y = 1 \mid f)$ is the class-conditional probability for a positive label given data f, which in other words forms a posterior probability estimate) $p(y_i \mid x_i, \theta) = \sigma(\gamma\,\theta \cdot \phi(x_i))$ where $\phi(x_i)$ is a CNN feature vector, $\theta$ is a learned SVM weight vector (Niu, pg. 1319, sec. 2.1.1, detailing that one constraint of the two-class soft margin SVM is the following: $y_i(\vec{w}^{T}\phi(\vec{x}_i) + b)$, where $\vec{w}$ is an m-dimensional weight vector, and $\phi(\vec{x}_i)$ is a function that maps the training data $x_i$ to a higher dimensional feature space), $\gamma$ is probability calibration scalar from Platt scaling, and $\sigma()$ is the sigmoid function (Platt, pg. 4, detailing the following equation: $p_i = \frac{1}{1 + \exp(A f_i + B)}$, where $p_i$ is probability calibration from Platt scaling and $\frac{1}{1 + \exp(A f_i + B)}$ is the sigmoid function).
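For reference, one way to relate the Platt expression to the claimed form is worked out below. It assumes the SVM output is the bias-free linear score $f = \theta \cdot \phi(x_i)$, that $B = 0$, and that $\gamma = -A$; these identifications are illustrative assumptions and are not stated in the quoted passages.

```latex
\begin{align*}
P(y = 1 \mid f) &= \frac{1}{1 + \exp(Af + B)} = \sigma\bigl(-(Af + B)\bigr),
  \qquad \sigma(z) = \frac{1}{1 + \exp(-z)}, \\
\text{with } f = \theta \cdot \phi(x_i),\; B = 0,\; \gamma = -A: \quad
P(y_i = 1 \mid x_i, \theta) &= \sigma\bigl(\gamma\,\theta \cdot \phi(x_i)\bigr).
\end{align*}
```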
Referring to dependent claims 15-16, they are rejected on the same basis as dependent claims 5-6 since they are analogous claims.
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan, Sudheendra et al., "Large-scale live active learning: Training object detectors with crawled data and crowds." International journal of computer vision 108.1-2 (2014)(“Vijayanarasimhan”) in view of Long, Chengjiang, et al. "Active visual recognition with expertise estimation in crowdsourcing." Proceedings of the IEEE International Conference on Computer Vision. 2013(“Chengjiang”) and in view of Russakovsky, Olga et al. "Best of both worlds: human-machine collaboration for object annotation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015(“Russakovsky”) and in view of Welinder, Peter, et al. "The multidimensional wisdom of crowds." Advances in neural information processing systems 23 (2010)(“Welinder”) and further in view of … 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012(“Yao”).
Regarding claim 10, Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder teaches the data annotation server system of claim 9, wherein generating the confidence level for each piece of data in the source data comprises calculating the risk associated with the annotation data associated with each piece of data in the source data (Chengjiang, pg. 3003, sec. 4, fig. 1, “For active sample selection, a criterion that can readily be adopted is the entropy $H(y_u) = -\sum_{y_u \in \{1,-1\}} p(y_u \mid x_u, D_L) \log p(y_u \mid x_u, D_L)$ of the predicted label $y_u$ on unlabeled data $x_u$.” Note: It is being interpreted that the $H(y_u)$ represents the risk associated with the annotation data associated with each piece of data in the source data).
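For illustration only, the entropy criterion quoted from Chengjiang can be sketched as follows. This is a minimal sketch, not code from any cited reference: the function name `binary_label_entropy` and the parameter `p_positive` (standing in for $p(y_u = 1 \mid x_u, D_L)$) are hypothetical, and the natural logarithm is an assumption, since the quoted passage does not state the base.

```python
import math


def binary_label_entropy(p_positive: float) -> float:
    """Entropy H(y_u) of a binary label y_u in {1, -1}.

    p_positive is the assumed predicted probability p(y_u = 1 | x_u, D_L);
    the probability of y_u = -1 is then 1 - p_positive.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must lie in [0, 1]")
    entropy = 0.0
    for p in (p_positive, 1.0 - p_positive):
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return entropy


# Hypothetical predicted probabilities for three unlabeled samples.
predictions = {"sample_a": 0.95, "sample_b": 0.55, "sample_c": 0.20}
most_uncertain = max(predictions, key=lambda k: binary_label_entropy(predictions[k]))
```

Under this reading, a sample whose predicted probability is closest to 0.5 has the highest entropy and is the one treated as carrying the greatest risk.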
Vijayanarasimhan in view of Chengjiang and in view of Russakovsky and further in view of Welinder does not teach: by calculating when a pair of bounding boxes match by calculating if their area of intersection over union is at least 50%.
However, Yao teaches by calculating when a pair of bounding boxes match by calculating if their area of intersection over union is at least 50% (Yao, pg. 3246, sec. 5, “For all experiments, we count a detection as a true positive if the intersection-union ratio of the detection and ground truth bounding box is greater than 0.5.”).
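For illustration only, the intersection-over-union test can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Yao or from the application: the `Box` layout `(x_min, y_min, x_max, y_max)` and the function names are hypothetical. The comparison uses "at least" 0.5 to follow the claim language, whereas the quoted passage from Yao uses "greater than 0.5".

```python
from typing import Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) with x_max >= x_min, y_max >= y_min.
Box = Tuple[float, float, float, float]


def intersection_over_union(a: Box, b: Box) -> float:
    """Return the area of intersection divided by the area of union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def boxes_match(a: Box, b: Box, threshold: float = 0.5) -> bool:
    """Treat two boxes as matching when their IoU is at least the threshold."""
    return intersection_over_union(a, b) >= threshold
```

For example, the boxes `(0, 0, 2, 2)` and `(0, 0, 2, 1)` overlap in an area of 2 out of a union of 4, so their ratio is exactly 0.5 and they match under the "at least 50%" reading but not under a strict "greater than 0.5" reading.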
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Vijayanarasimhan’s method in view of Chengjiang and in view of Russakovsky and in view of Welinder and further in view of Yao to teach: further calculates the risk associated with a plurality of annotations for a piece of source data by calculating when a pair of bounding boxes match by calculating if their area of intersection over union is at least 50%. The motivation to do so would be to have a system that is able to detect  
Referring to dependent claim 20, it is rejected on the same basis as dependent claim 10 since they are analogous claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kovashka, Adriana, et al. "Crowdsourcing in computer vision." arXiv preprint arXiv:1611.02145 (2016) (details various crowdsourcing methods and attributes to label for machine learning tasks). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-6:30PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
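If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.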
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129