DETAILED ACTION
This action is in response to applicant’s amendment regarding Application No. 16/418,232, filed May 21, 2019.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed October 4, 2021 has been entered. Examiner acknowledges receipt of Amendments to Application 16/418,232, which include: Amendments to the Claims pp.2-8, and Remarks pp.8-11 (containing applicant’s amendments). 
Regarding applicant’s Remarks on p.8, examiner acknowledges that Claims 1, 14-19, 21-23, 25-32, and 34 have been amended. Claims 1, 14-23, and 25-34 remain pending in the application.
Regarding applicant’s Remarks on p.8, examiner acknowledges that applicant’s Amendments to Claims 21, 25, 26, and 31 have overcome each and every claim objection previously set forth in the Non-Final Office Action mailed July 2, 2021; therefore, the respective claim objections are withdrawn.
Regarding applicant’s Remarks on p.8, examiner acknowledges that applicant’s Amendments to the Claims have resolved the indefiniteness and lack of antecedent basis issues identified in Claims 1, 14-15, 21-22, 25-28, and 34; therefore, the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed July 2, 2021 are withdrawn.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/418,232, which include: Remarks pp.8-11 (containing applicant’s arguments). 
Regarding applicant’s Remarks on p.9 for Claims 1, 14-23, and 25-34 on the grounds of nonstatutory double patenting over Claims 1, 13-21, and 23-31 of U.S. Patent No. 10,339,468, examiner has considered applicant’s arguments but finds them not persuasive. While certain terms in the instant application have been replaced with synonyms in applicant’s amended claims, under the broadest reasonable interpretation the synonyms still convey the same context and scope as the corresponding claims of the patent, and therefore the nonstatutory double patenting rejection is maintained and not withdrawn. Examiner also notes that the additional claim limitation “a statistical distribution of the candidate training data” introduced in applicant’s amended Claims 1, 14, and 27 maps to an existing dependent claim of the patent. The updated claim mappings, with additional clarifications reflecting applicant’s amended claims, are provided in the sections indicated below.
Applicant's arguments regarding examiner’s 35 U.S.C. §103 rejections for Claims 1, 14-16, 19-22, 25-29, and 32-34 over Qi et al. (U.S. PGPUB 2009/0125461) in view of Esponda et al. (U.S. PGPUB 2014/0279745), and Claims 17-18, 23, and 30-31 over Qi in view of Esponda, in further view of Lin et al. (U.S. PGPUB 2012/0284213) have been fully considered but they are not persuasive. 
Regarding applicant’s Remarks on pp.9-10:
“Claim 1 recites, in part:
wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that retraining a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance.
See Claim 1.
Qi discloses that "a sample-label pair is selected for labeling responsive to at least one error parameter." See Qi, paragraph [0049]. For example, with Qi, "[s]ample-label pair selector 202 analyzes the set of training samples 104 and selects a sample-label pair 212 (of FIG. 2) responsive to at least one error parameter." See Qi, paragraph [0092]. 
In other words, Qi merely discloses selection of a sample-label pair for labeling by an oracle (e.g., oracle 110) using an error parameter associated with the sample-label pair. However, Qi does not teach or suggest "wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance," as recited in amended claim 1.” 
Examiner has considered this argument but finds the argument to be not persuasive. Qi teaches in paragraphs [0055]-[0056] that the identified multi-label samples forming the set of training samples are based on a marginal sample distribution $P(x)$: “For an example embodiment, the ASL learning requests label annotations on the basis of sample-label pairs, which once incorporated into the training set, are expected to result in the lowest generalization error. A Multi-Labeled Bayesian Error Bound is derived with a selected sample-label pair under a multi-label setting, and ASL accordingly selects the optimal pairs to minimize this bound. … For each sample $x$, it has $m$ labels $y_i$ ($1 \le i \le m$) … Let $P(y \mid x)$ be the unknown conditional distribution over the samples, where $y = \{0,1\}^m$ is the complete label vector and $P(x)$ is the marginal sample distribution.” The notation $P(x)$ used in Qi further indicates that this marginal sample distribution is a probability distribution (interpreted as “a statistical distribution”). Hence, Qi does teach the new claim limitation recited in the amended claims: “wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …”. The corresponding claim mappings according to the applicant’s amended claims are provided in the sections indicated below.
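For reference, the quantities in the passage quoted from Qi are related by the standard product and sum rules of probability. The identities below are general probability facts, not language quoted from Qi, and are included only to illustrate why $P(x)$ is a distribution defined over the samples themselves:

```latex
P(x, y) = P(y \mid x)\,P(x), \qquad
P(x) = \sum_{y \in \{0,1\}^m} P(x, y)
```

That is, $P(x)$ is obtained by summing the joint distribution over all $2^m$ possible label vectors $y$, which leaves a probability distribution over the samples $x$ alone.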
Regarding applicant’s Remarks on p.10:
“Esponda does not cure the deficiencies of Qi. Esponda discloses "[a] dynamic classifier for performing binary classification of instance data using oracles that predict accuracy of predictions made by corresponding models." See Esponda, Abstract. However, Esponda does not teach or suggest "wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance," as recited in amended claim 1.
Thus, it is respectfully submitted that no combination of Qi and Esponda teaches or suggests the presently claimed subject matter. Moreover, it is respectfully submitted that a skilled artisan would not combine this cited art as proposed by the Office Action given the divergent subject matter disclosed therein.
Further, it is respectfully submitted that Lin does not cure the above identified defects of Qi and Esponda. Lin merely discloses that "[a] richness score is assigned to each of the data samples in the set and to the retained data samples that indicates how information rich a data sample is for determining accuracy of the trained predictive model." See Lin, Abstract.”
Examiner has considered this argument but finds the argument to be not persuasive. The teachings of Esponda and Lin are applied to the relevant claim limitations identified in the Non-Final Office Action mailed July 2, 2021, which did not include the newly amended claim limitation found in amended Claims 1, 14, and 27: “wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …”. Furthermore, as indicated in the response to the above argument, Qi does teach this limitation (“… a statistical distribution of the candidate training data”). Hence, the corresponding claim mappings according to the applicant’s amended claims are provided in the sections indicated below.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

Claims 1, 14-23, and 25-34 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 13-21, and 23-31 of U.S. Patent No. 10,339,468 (Johnston, David Alan; Jeffery, Shawn Ryan; and Polychronopoulos, Vasileios (Groupon, Inc.)). Note that the original amendment/correction marks from the instant application have been removed for easier reading and comparison against the issued patent. The bolded text in Claims 1, 14, and 27 of the instant application and the issued patent indicates a merged claim limitation between the instant application (1 limitation) and the issued patent (3 limitations) that express the same scope. Although the claims at issue are not identical, they are not patentably distinct from each other because the issued patent discloses all of the features and limitations in the instant application, thereby making the claims in the instant application obvious over the issued patent. While certain terms in the instant application have been replaced with synonyms in applicant’s amended claims, under the broadest reasonable interpretation the synonyms still convey the same context and scope as the corresponding claims of the issued patent.
Instant Application 16/418,232
U.S. Patent 10,339,468
Applicant: Groupon, Inc.
Applicant: Groupon, Inc.
Inventors: David Alan Johnston, Shawn Ryan Jeffery, Vasileios Polychronopoulos
Inventors: David Alan Johnston, Shawn Ryan Jeffery, Vasileios Polychronopoulos
Filed: May 21, 2019
Filed: October 20, 2015


Claim 1
Claim 1
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising:

A computer-implemented method for adaptively improving the performance of a current predictive model by curating training data used to derive the current predictive model, the method comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data, 
selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances 


As identified in an earlier claim limitation, “the set of labeled data instances” are selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the process of curating described in this claim. Hence this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making these terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;



generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and 
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim limitation). Furthermore, applying “first” and “second” qualifier terms to two distinct performances (a candidate model performance and a current model performance) does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 1
Claim 3
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising:

The method of claim 1,


By virtue of dependency, Claim 3 includes all limitations from Claim 1; see above Claim 1 mapping.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled data instances (as recited in Claim 1: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”).
Hence, this claim limitation is functionally equivalent to the claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination step indicated in this claim limitation is part of selecting the set of labeled data instances (as recited in Claim 1: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 1, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.


Claim 14
Claim 13
(Currently Amended) A computer program product, 
A computer program product, 
stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:
stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data, 

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and 
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances 


As identified in an earlier claim limitation, “the set of labeled data instances” are selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the process of curating described in this claim. Hence this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making these terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim limitation). Furthermore, applying “first” and “second” qualifier terms to two distinct performances (a candidate model performance and a current model performance) does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 14
Claim 15
(Currently Amended) A computer program product,

The computer program product of claim 13,


By virtue of dependency, Claim 15 includes all limitations from Claim 13; see above Claim 13 mapping.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled data instances (as recited in Claim 14: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”).
Hence, this claim limitation is functionally equivalent to the claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination step indicated in this claim limitation is part of selecting the set of labeled data instances (as recited in Claim 13: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 13, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.


Claim 15
Claim 14
(Currently Amended) The computer program product of claim 14, 
The computer program product of claim 13,
wherein the labeled data reservoir comprises data 
wherein the labeled data reservoir includes data that have been collected continuously over time from input data being processed by the current predictive model.


Claim 16
Claim 13
(Currently Amended) The computer program product of claim 14,

By virtue of dependency, Claim 16 includes all limitations from Claim 14; refer to above Claim 14 mapping.
A computer program product, 

stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 
wherein each labeled data instance from the set of labeled data instances is associated with a true label representing the data instance.
wherein each labeled data instance is associated with a true label representing the instance, and 

wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;

generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and

instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 17
Claim 15
(Currently Amended) The computer program product of claim 16

By virtue of dependency, Claim 17 includes all limitations from Claims 14 and 16; refer to above Claims 14 and 16 mappings.
The computer program product of claim 13,


By virtue of dependency, Claim 15 includes all limitations from Claim 13; see above Claim 13 mapping.

wherein the determination is further based at least in part on a updated training data set.


The term “a distribution” is now incorporated into independent Claim 14 and further limited to indicate “a statistical distribution”; see above claim mapping for Claim 14.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


Claim 18
Claim 16
(Currently Amended) The computer program product of claim 17,
The computer program product of claim 15,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs,
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of each other in the context of these claims, as they all identify and originate from a pool of (candidate/possible) training data.
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 19
Claim 17
(Currently Amended) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data comprises: identifying and removing outlier instances.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of each other in the context of these claims, as they all identify and originate from a pool of (candidate/possible) training data.
wherein generating the candidate training data comprises: identifying and removing outlier instances.


Claim 20
Claim 18
(Previously Presented) The computer program product of claim 19, 
The computer program product of claim 17,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.


Claim 21
Claim 19
(Currently Amended) The computer program product of claim 14, 
The computer program product of claim 13,
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources, and 
wherein the labeled data reservoir includes labeled data instances that are received from multiple sources, and
wherein selecting a labeled data instance from the set of labeled data instances comprises:
wherein selecting a labeled data instance from the set of labeled data instances comprises:

comparing a source of the labeled data instance with a pre-determined source; and
selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source.


The terms “in an instance in which” and “in response to” are synonyms since both identify a particular point in time.
selecting the labeled data instance in an instance in which the source of the labeled data instance matches the pre-determined source.


Claim 22
Claim 20
(Currently Amended) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data is based on a greedy algorithm, the generating comprising:
wherein generating at least one candidate training data set is based on a greedy algorithm, the generating comprising:
generating a first at least a portion of the candidate training data; and

generating a first candidate training data set by adding a first subset of the labeled data instances to the training data; and
generating a second candidate training data set by adding a second subset of the labeled data instances to the first 


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of each other in the context of these claims. As recited in Claim 14, the generation of training sets is performed using the labeled data reservoir comprising the pool of candidate training data; therefore, the “first training data set” still represents the “first candidate training data set”.
generating a second candidate training data set by adding a second subset of the labeled data instances to the first candidate training data set.


Claim 23
Claim 21
(Currently Amended) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data is based on a non-greedy algorithm, the generating comprising:
wherein generating at least one candidate training data set is based on a non-greedy algorithm, the generating comprising:
replacing at least a portion of the candidate training data with a subset of the labeled data instances.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of each other in the context of these claims. As recited in Claim 14, the generation of training sets is performed using the labeled data reservoir comprising the pool of candidate training data; therefore, the “training data” and “candidate training data” are functionally equivalent.
replacing the training data with a subset of the labeled data instances.


Claim 25
Claim 23
(Currently Amended) The computer program product of claim 14, wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations comprising:


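The claim limitation “the instructions, when executed on the one or more computers further cause the one or more computers to perform operations” is already established from independent Claim 14 (and established in corresponding Claim 22 of the issued patent, which traces back to independent Claim 13 in the issued patent), and therefore does not add any further distinction to the claim.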
The computer program product of claim 22,
calculating a cross-validation between the first performance of the candidate model second performance of the current predictive model 


Applying “first” and “second” qualifier terms to a candidate model performance and a current model performance does not further differentiate or limit the already distinct performances of the two models.
wherein generating the assessment comprises calculating a cross-validation between the candidate model performance and the current model performance.


Claim 26
Claim 24
(Currently Amended) The computer program product of claim 14,
The computer program product of claim 22,
wherein the candidate model is a first candidate model, and


The term “the candidate model is a first candidate model” is a narrower form of “there are multiple candidate models”. Furthermore, as identified in the following claim limitation, a second candidate model is required to generate an assessment. Hence, this claim taken as a whole anticipates the claim “wherein there are multiple candidate models” from the issued patent.
wherein there are multiple candidate models, and 
wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations comprising:
generating [[the]] an assessment for the first performance of the first candidate model[[s]] and the second performance of the second candidate model


The claim limitation “the instructions, when executed on the one or more computers further cause the one or more computers to perform operations” is already established from independent Claim 14 (and established in corresponding Claim 22 of the issued patent, which traces back to independent Claim 13 in the issued patent), and therefore does not add any further distinction to the claim.

Applying “first” and “second” qualifier terms to two different candidate models (representing multiple candidate models) does not further differentiate or limit the already distinct performances of the two models.
Generating the assessment between two candidate models in parallel is a narrower form of generating the assessment for multiple candidate models in parallel, and hence this claim anticipates the claim “wherein generating the assessment for each of the multiple candidate models is implemented in parallel” from the issued patent.
wherein generating the assessment for each of the multiple candidate models is implemented in parallel.


Claim 27
Claim 25
(Currently Amended) A system, comprising:
A system, comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data, 

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data,

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance;


As identified in an earlier claim limitation, “the set of labeled data instances” is selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the curating process described in this claim. Hence, this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making the terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim limitation). Furthermore, applying “first” and “second” qualifier terms to two distinct performances does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model

in an instance in which the candidate model performance is improved from the current model performance.


Claim 27
Claim 27
(Currently Amended) A system, comprising:

The system of claim 25


By virtue of dependency, Claim 27 includes all limitations from Claim 25; see above Claim 25 mapping.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled data instances (as recited in Claim 27: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”). Hence, this claim limitation is functionally equivalent to the claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination recited in this claim limitation is the determination on which selecting the set of labeled data instances is based (as recited in Claim 25: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 25, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.


Claim 28
Claim 26
(Currently Amended) The system of claim 27, 
The system of claim 25,
wherein the labeled data reservoir comprises data 
wherein the labeled data reservoir includes data that have been collected continuously over time from input data being processed by the current predictive model.


Claim 29
Claim 25
(Currently Amended) The system of claim 27,


By virtue of dependency, Claim 29 includes all limitations from Claim 27; refer to the above Claim 27 mapping.
A system, comprising:

one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data,

wherein the set of labeled data instances are not included in the training data, 
wherein each labeled data instance from the set of labeled data instances is associated with a true label representing the data instance.
wherein each labeled data instance is associated with a true label representing the instance, and

wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;

generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and

instantiating the candidate training data set and the candidate model

in an instance in which the candidate model performance is improved from the current model performance.


Claim 30
Claim 27
(Currently Amended) The system of claim 29, 


By virtue of dependency, Claim 30 includes all limitations from Claims 27 and 29; refer to the above Claims 27 and 29 mappings.
The system of claim 25,


By virtue of dependency, Claim 27 includes all limitations from Claim 25; see above Claim 25 mapping.

wherein the determination is based at least in part on updated training data.


The term “a distribution” is now incorporated into independent Claim 27 and further limited to indicate “a statistical distribution”; see above claim mapping for Claim 27.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


Claim 31
Claim 28
(Currently Amended) The system of claim 30, 
The system of claim 27,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data.
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 32
Claim 29
(Currently Amended) The system of claim 27, 
The system of claim 25,
wherein generating the updated training data comprises: identifying and removing outlier instances.
wherein generating the candidate training data comprises: identifying and removing outlier instances.


Claim 33
Claim 30
(Previously Presented) The system of claim 32, 
The system of claim 29,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and 
wherein selecting the set of labeled data instances from the labeled data reservoir comprises identifying and removing outlier instances in one predictive category.
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.


Claim 34
Claim 31
(Currently Amended) The system of claim 27, 
The system of claim 25,
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources, and 
wherein the labeled data reservoir includes labeled data instances that are received from multiple sources, and 

wherein selecting a labeled data instance from the set of labeled data instances comprises:
wherein selecting a labeled data instance from the set of labeled data instances comprises:
selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source.


The terms “in an instance in which” and “in response to” are synonyms since both identify a particular point in time.
selecting the labeled data instance in an instance in which the source of the labeled data instance matches the pre-determined source.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 14-16, 19-22, 25-29, and 32-34 are rejected under 35 U.S.C. 103 as being unpatentable over Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred to as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred to as Esponda].
Regarding amended Claim 1, Qi teaches
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising: 
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data (Qi Figure 2, elements 104, 106(x), 108(x): examiner’s note: Qi teaches a system selecting samples 106(x) and associated labels 108(x) (collectively comprising the pool of candidate training data) from training samples set 104 (representing a labeled data reservoir); this is also reflected in the flow diagram shown in Figure 4, element 406 (Qi paragraph [0049]) and in Figure 1, element 114 (Qi paragraph [0033] for the single-label case) (Qi paragraphs [0037]-[0038]: “…each respective sample 106(x) is associated with multiple labels 108(x). Training samples set 104 may include any number of samples 106, each of which may have any number of associated labels 108. … sample-label pair selector 202 selects at arrow 204 a sample 106(x) and an associated label 108(x) to jointly form a sample-label pair 212 for labeling by oracle 110…”).), 
wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance (Examiner’s note: Qi teaches a sample-label pair selector selecting a sample-label pair to minimize an expected error (Qi paragraph [0092]: “In an example embodiment, classifier 112 is to classify objects in accordance with multiple labels that are also associated with samples of a set of training samples 104 (of FIGS. 1 and 2). Sample-label pair selector 202 analyzes the set of training samples 104 and selects a sample-label pair 212 (of FIG. 2) responsive to at least one error parameter.”). Qi further teaches that during training, an error parameter (described as a classification error or generalization error) is determined, where minimizing the classification error or generalization error maximizes the expected predictive capability of the classifier (leading to increased model accuracy and hence improved model performance; Qi paragraphs [0017] and [0033]); this is reflected in the flow diagram taught in Qi Figure 4, element 406 and Qi paragraph [0049] (Qi paragraph [0038]: “… This sample-label pair selection may be made responsive to an error parameter, such as a generalization or classification error parameter. For instance, it may be made responsive to an error bound. By way of example, a sample-label pair may be selected responsive to a Bayesian classification error bound for a multi-label scenario. More specifically, a sample-label pair may be selected so as to reduce, if not minimize, an expected Bayesian error.”). 
Although Qi Figure 1 is in the context of the single label case described in Qi paragraphs [0031]-[0033], the same criterion used here is also applicable to the multi-label case described in Qi Figure 2 and Qi Figure 4, where “a convergence of the expected/estimated error performance” over multiple iteration instances of training will lead to improved model performance (Qi paragraph [0033]: “The process can thus include sample selection 114, oracle labeling 116/118, training sample set updating 120, and classifier updating 122. The process may be iterated until a desired criterion is reached. This criterion may be, for example … a convergence of expected/estimated error performance …”). Qi further teaches that the multi-label samples selected to form the set of updated training samples are selected based on a marginal sample distribution $P(x)$ (Qi paragraphs [0055]-[0056]: “For an example embodiment, the ASL learning requests label annotations on the basis of sample-label pairs, which once incorporated into the training set, are expected to result in the lowest generalization error. A Multi-Labeled Bayesian Error Bound is derived with a selected sample-label pair under a multi-label setting, and ASL accordingly selects the optimal pairs to minimize this bound. … For each sample $x$, it has $m$ labels $y_i$ ($1 \le i \le m$) … Let $P(y|x)$ be the unknown conditional distribution over the samples, where $y = \{0,1\}^m$ is the complete label vector and $P(x)$ is the marginal sample distribution.”), where this marginal sample distribution is a probability distribution (interpreted as “a statistical distribution”), thus making this process of selecting a set of labeled data instances correspond to “wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance”.); 
generating a … model using the updated training data that comprises at least the set of labeled data instances (Qi Figure 2, elements 104, 102, 112, 210: examiner’s note: Qi teaches updating a single classifier using selected training sample-label pairs (with the selected sample-label pairs representing “the updated training data set that comprises at least the set of labeled data instances”) from the training samples set 104, where the process of updating the classifier is shown in Qi Figure 2, elements 104, 204, 202, 206, 208, 210 and Qi paragraphs [0036]-[0039], and reflects training a classifier model using the selected sample-label pair data (and hence represents a generation of a model based on the selected sample-label pairs); this is also reflected in the multi-label flow diagram taught in Qi Figure 4, elements 404, 406, 408, 410, 412, 414 and Qi paragraphs [0047]-[0052] (Qi paragraphs [0038]-[0039]: “ … sample-label pair selection 202 selects at arrow 204 a sample 106(x) and an associated label 108(x) to jointly form a sample-label pair 212 for labeling by oracle 110. … At arrow 208, oracle 110 returns an indication of relevance 214 … This indicated relevancy labeling 214 is incorporated into the set of training samples 104 to update it. With the updated training samples set 104, active learning classifier trainer 102 updates classifier 112 at arrow 210.”) as well as in the single label case taught in Qi Figure 1, elements 114, 116, 118, 102, 122 and Qi paragraphs [0031]-[0033].); 
in response to determining that a first performance of the … model is improved … , instantiating the updated training data set and the … model (Qi Figure 2, elements 102, 210, 112; Figure 4, elements 416, 418: examiner’s note: As indicated earlier, for the multi-label case described in Qi Figure 2 and Qi Figure 4, “a convergence of the expected/estimated error performance” over multiple iteration instances of training will lead to improved model performance (Qi paragraph [0033]: “The process can thus include sample selection 114, oracle labeling 116/118, training sample set updating 120, and classifier updating 122. The process may be iterated until a desired criterion is reached. This criterion may be, for example … a convergence of expected/estimated error performance …”). After a stopping criterion is reached, the model is considered trained, and both the model and the returned labeled samples in the set of training samples (Qi paragraph [0039]; corresponding to the “updated training data set”) are considered instantiated at this point (Qi paragraph [0052]: “At block 416, it is determined if additional classifier training is to be performed. For example, this determination may be made with reference to one or more criteria. … If no more training is to be performed … then at block 418, the final classifier is produced. Classifier 112 may then be used to label new objects.”); this is also reflected in the single label case (Qi Figure 1, elements 120, 112; paragraph [0034]: “After training, classifier 112 may be given an input sample object from a target data set. In response, classifier 112 outputs one or more predicted labeled concepts in accordance with its trained classifying algorithm…”).).
However, Qi does not teach
generating a candidate model … that comprises at least the set of labeled data instances;
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model.
Esponda teaches
generating a candidate model … that comprises at least the set of labeled data instances (Esponda Figure 1B, elements 114, 120: examiner’s note: Esponda teaches a dynamic classifier that is trained during a training phase using training data received as input data (instance data with labels; Esponda paragraphs [0032] and [0054]). In the first level, the dynamic classifier contains multiple data models M1..Mn (each representing a candidate model), and the second level uses model selection 132 and integrator 128 to process a subset of the model outputs (predictions) along with the training data to generate one or more predictions (hence representing the generation of a candidate model among multiple data models) (Esponda paragraphs [0041]-[0044]: “…the dynamic classifier 114 is comprised of three levels. The first level includes multiple data models M1 through Mn (hereinafter collectively referred to as "data models M"). Data models M receive input data 120 and generates model outputs MO1 through MOn (hereinafter collectively referred to as "model outputs MO"). Each of the model outputs MO represents a prediction made by each of the data models MO1 through MOn based on input data 120. … The second level of the dynamic classifier 114 receives and process a subset of the model outputs MO along with instance data to generate one or more intermediate predictions using two or more modules using different algorithms. … Model selection 132 and integrator 128 generate first intermediate prediction 133 and second intermediate prediction 129, respectively.”).);
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model (Esponda Figure 8, elements 820, 824: examiner’s note: Esponda teaches, in the context of generating a first intermediate prediction, selecting (instantiating) a final model after choosing, from among M1..Mn, the model with the highest model output (prediction) as a first model and the model with the lowest model output (prediction) as a second model, and using the corresponding model oracles to generate confidence scores (reflecting prediction accuracy, which is a form of performance metric) based on their model outputs (Esponda paragraphs [0030], [0049], [0051]). The model with the highest confidence score (reflecting the highest accuracy of a prediction) is selected (instantiated) as the final model for generating the prediction, and the selection of this final model among multiple models corresponds to instantiating the updated model (Esponda paragraphs [0071]-[0072]: “A final model is then selected 820 from the first and second models based on the first and second confidence values. Specifically, one of the first and second models having their corresponding oracles produce a higher confidence is selected as the final model. … The model output of the final model is then sent 824 out as first intermediate prediction 133 from model selector 132.”). The confidence scores indicating prediction accuracy are generated by the oracles and are represented by class probabilities (Esponda paragraph [0050]), and the selected final model also serves as the current predictive model for the next iteration instance (Esponda paragraph [0051]: “In non-binary classification, the output selector 210 may simply choose a model output of a model predicted by oracles to be the most accurate as first intermediate prediction 133 without using the Min-Max function. In some embodiments, the output selector 210 may generate a default value as first intermediate prediction 133 if the confidence values of the all oracles are below a certain level.”).).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the process of using oracles in the single classifier model method of Qi and extend it by implementing an ensemble model method where multiple oracles are associated with multiple models of Esponda as a way to improve the predictive ability of a model used in a system. The motivation to combine is taught in Esponda, as ensemble methods involving multiple models obtain better predictive performance than updating a single model. Both Qi and Esponda use oracles to help identify training data that result in improved predictions (the oracle in Qi uses label relevancy identifiers while the oracles in Esponda use confidence scores, which are similar as they both identify relevant or related training data to improve predictability of a model), and thus it would have been obvious to a person having ordinary skill in the art to also implement the oracles from Esponda to generate confidence scores to facilitate a more qualitative comparison between a current predictive model and a candidate model, such that the final candidate model represents an improvement of the predictive performance, thus making the system more efficient and accurate in terms of making predictions (Esponda paragraph [0006]: “An ensemble method uses multiple distinct models to obtain better predictive performance than could be obtained from any of the individual models.” and Esponda paragraph [0030]: “Embodiments relate to a dynamic classifier for performing classification of an action or event associated with instance data using oracles that predict accuracy of predictions made by corresponding models. An oracle corresponding to a model is trained to generate a confidence value that represents accuracy of a prediction made by the model. Based on the confidence value and predictions, one of multiple models is selected and its prediction is used as an intermediate prediction. … By using the confidence value for each model and for each instance data, a more accurate prediction can be made.”).
Regarding amended Claim 14, Qi teaches
(Currently Amended) A computer program product, stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations ([Qi Figure 7, elements 700, 702, 706, 708, 710; paragraph [0035]: processor-executable instructions] [Qi paragraphs [0096]; [0098]; [0100]-[0101]: computer device with processors and storage media containing instructions.]) comprising: 
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
generating a … model using the updated training data that comprises at least the set of labeled data instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
in response to determining that a first performance of the … model is improved … , instantiating the updated training data set and the … model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).  
However, Qi does not teach
generating a candidate model … that comprises at least the set of labeled data instances;
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model.
Esponda teaches
generating a candidate model … that comprises at least the set of labeled data instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).
Both Qi and Esponda are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the process of using oracles in the single classifier model method of Qi and extend it by implementing an ensemble model method where multiple oracles are associated with multiple models of Esponda as a way to improve the predictive ability of a model used in a system. The motivation to combine is taught in Esponda, as ensemble methods involving multiple models obtain better predictive performance than updating a single model. Both Qi and Esponda use oracles to help identify training data that result in improved predictions (the oracle in Qi uses label relevancy identifiers while the oracles in Esponda use confidence scores, which are similar as they both identify relevant or related training data to improve predictability of a model), and thus it would have been obvious to a person having ordinary skill in the art to also implement the oracles from Esponda to generate confidence scores to facilitate a more qualitative comparison between a current predictive model and a candidate model, such that the final candidate model represents an improvement of the predictive performance, thus making the system more efficient and accurate in terms of making predictions (Esponda paragraph [0006]: “An ensemble method uses multiple distinct models to obtain better predictive performance than could be obtained from any of the individual models.” and Esponda paragraph [0030]: “Embodiments relate to a dynamic classifier for performing classification of an action or event associated with instance data using oracles that predict accuracy of predictions made by corresponding models. An oracle corresponding to a model is trained to generate a confidence value that represents accuracy of a prediction made by the model. Based on the confidence value and predictions, one of multiple models is selected and its prediction is used as an intermediate prediction. … By using the confidence value for each model and for each instance data, a more accurate prediction can be made.”).
Regarding amended Claim 15, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim 14, 
wherein the labeled data reservoir comprises data (Examiner’s note: Qi teaches that samples 106 collected for labeling can cover a range of items, and can include a large number of samples, which are assumed to be gathered over time, thus representing data collected over time (Qi paragraph [0030]: “Samples 106 may correspond to text items, images, videos, biological data, combinations thereof, or any other type of data set. Although a single sample 106 that is associated with two labels 108a and 108b is explicitly shown, there may be many (e.g., dozens, hundreds, thousands, or more of) such samples 106. Also, each sample 106 may be associated with any number of labels 108.”).).  
Regarding amended Claim 16, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim 14, 
from the set of labeled data instances is associated with a true label representing the data instance (Qi Figure 1, elements 106, 108, 110, 116, 118: examiner’s note: Qi teaches an oracle providing labeling for selected labeled data instances, along with a relevancy tag indicating the relevancy for the label that was associated with the labeled data instance; a similar oracle is also indicated for the multi-label case (Qi paragraph [0040] and Qi Figure 2, elements 106(x), 108(x), 110, 206, 208) (Qi paragraph [0032]: “Oracle 110 may also be termed a teacher, an annotator, and so forth. Oracle 110 is typically a human or a group of humans that is capable of labeling each sample. The labeling may indicate, for example, a relevancy of label 108 to its associated sample 106. If two labeling categories are permitted for each label concept, the relevancies may be positive/negative, relevant/not relevant, related/not related, and so forth. … Oracle 110 provides or inputs the labeled relevancy at arrow 118 to active learning classifier trainer 102.”). Qi further teaches a true label ys (Qi paragraph [0061]: “…true label of the selected sample ys …”), where ys is indicated in Qi paragraph [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance.).  
Regarding amended Claim 19, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim 14, 
wherein generating the updated training data comprises identifying and removing outlier instances (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi paragraphs [0044]-[0045]: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept ("P"), as a negative concept ("N"), as an unlabeled concept ("?"), or it may be selected for labeling of the concept ("S"). As indicated by the ellipses (" ... ") in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present. … The illustrated example labeling states for matrices 302B and 302A are as follows. For the before ASL matrix 302B, sample X1 has three associated labels that are: ?, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: ?, ?, and ?. Sample Xn has three associated labels that are: P, ?, and P. For the after ASL matrix 302A, sample X1 has three associated labels that are: S, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: S, ?, and S. Sample Xn has three associated labels that are: P, ?, and P. Thus, example ASL procedure 300 has selected three sample-label pairs for labeling. These three sample-label pairs include one with sample X1 and two with sample Xj.”).).  
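For illustration only, the labeling states quoted above from Qi Figure 3 can be encoded and queried as in the following minimal Python sketch. The dictionary encoding and the function names (selected_pairs, unselected_samples) are hypothetical and are not taken from Qi. The sketch reproduces the after-ASL matrix 302A, recovers the three selected sample-label pairs, and lists the samples that received no selection, which the examiner's note above treats as the instances not carried forward for further training.

```python
# After-ASL labeling states from Qi Figure 3 (matrix 302A), as quoted above.
# "P" = positive concept, "N" = negative concept, "?" = unlabeled concept,
# "S" = selected for labeling. The dict encoding itself is hypothetical.
AFTER_ASL = {
    "X1": ["S", "?", "P"],
    "Xi": ["?", "P", "N"],
    "Xj": ["S", "?", "S"],
    "Xn": ["P", "?", "P"],
}


def selected_pairs(matrix):
    """Return (sample, label index) pairs whose state is "S"."""
    return [
        (sample, idx)
        for sample, states in matrix.items()
        for idx, state in enumerate(states)
        if state == "S"
    ]


def unselected_samples(matrix):
    """Return samples for which no label was selected for labeling."""
    return [sample for sample, states in matrix.items() if "S" not in states]


print(selected_pairs(AFTER_ASL))      # [('X1', 0), ('Xj', 0), ('Xj', 2)]
print(unselected_samples(AFTER_ASL))  # ['Xi', 'Xn']
```

The count of selected pairs (three: one for X1, two for Xj) matches the quoted statement that ASL procedure 300 "has selected three sample-label pairs for labeling."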
Regarding previously presented Claim 20, Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 19, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (Qi paragraph [0034]: “classifier 112 outputs one or more predicted labeled concepts in accordance with its trained classifying algorithm. Classifier 112 may employ any classifying algorithm.”), and 
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi paragraphs [0044]-[0045]: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept ("P"), as a negative concept ("N"), as an unlabeled concept ("?"), or it may be selected for labeling of the concept ("S"). As indicated by the ellipses (" ... ") in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present. … The illustrated example labeling states for matrices 302B and 302A are as follows. For the before ASL matrix 302B, sample X1 has three associated labels that are: ?, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: ?, ?, and ?. Sample Xn has three associated labels that are: P, ?, and P. For the after ASL matrix 302A, sample X1 has three associated labels that are: S, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: S, ?, and S. Sample Xn has three associated labels that are: P, ?, and P. Thus, example ASL procedure 300 has selected three sample-label pairs for labeling. These three sample-label pairs include one with sample X1 and two with sample Xj.”).).  
Regarding amended Claim 21, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim 14, 
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources (Qi paragraph [0030]: Samples 106 collected for labeling can cover a range of items from multiple sources (“Samples 106 may correspond to text items, images, videos, biological data, combinations thereof, or any other type of data set. Although a single sample 106 that is associated with two labels 108a and 108b is explicitly shown, there may be many (e.g., dozens, hundreds, thousands, or more of) such samples 106. Also, each sample 106 may be associated with any number of labels 108.”).), and 
wherein selecting a labeled data instance from the set of labeled data instances comprises: selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source (Examiner’s note: Qi teaches a true label ys (Qi paragraph [0061]: “…true label of the selected sample ys …”), where ys is indicated in Qi paragraph [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance. As indicated earlier, Qi teaches an oracle for the multi-label case (Qi paragraph [0040] and Qi Figure 2, elements 106(x), 108(x), 110, 206, 208). As part of the selection of a labeled data instance (sample) for training, oracle 110 provides a label relevancy identifier for a label 108 chosen with a sample 106 from the set of training samples 104, with the relevancy indicating categories such as: positive/negative, relevant/not relevant, related/not related, etc., where the label relevancy identification represents a condition where the pre-determined source (label 108) matches the sample source (true label provided by the oracle) (Qi paragraph [0032]: “Oracle 110 may also be termed a teacher, an annotator, and so forth. Oracle 110 is typically a human or a group of humans that is capable of labeling each sample. The labeling may indicate, for example, a relevancy of label 108 to its associated sample 106. If two labeling categories are permitted for each label concept, the relevancies may be positive/negative, relevant/not relevant, related/not related, and so forth. … Oracle 110 provides or inputs the labeled relevancy at arrow 118 to active learning classifier trainer 102.”).).  
Regarding amended Claim 22, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim 14, 
wherein generating the updated training data [[set]] is based on a greedy algorithm, the generating comprising: 
generating a first at least a portion of the candidate training data (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: Under its broadest reasonable interpretation, the term “greedy algorithm” is interpreted as an algorithm or a series of steps that is exhaustive or thorough in nature to achieve its goal. Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category), thus representing an exhaustive search corresponding to a greedy algorithm (Qi paragraphs [0044]-[0045]: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept ("P"), as a negative concept ("N"), as an unlabeled concept ("?"), or it may be selected for labeling of the concept ("S"). As indicated by the ellipses (" ... ") in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present. … The illustrated example labeling states for matrices 302B and 302A are as follows. For the before ASL matrix 302B, sample X1 has three associated labels that are: ?, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: ?, ?, and ?. Sample Xn has three associated labels that are: P, ?, and P. For the after ASL matrix 302A, sample X1 has three associated labels that are: S, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: S, ?, and S. Sample Xn has three associated labels that are: P, ?, and P. Thus, example ASL procedure 300 has selected three sample-label pairs for labeling. These three sample-label pairs include one with sample X1 and two with sample Xj.”). In this case, the sample X1 with one associated label represents a first (candidate) training data set containing a first subset of labeled data instances (e.g., sample X1 with one associated label) added to the first (candidate) training data set.); and 
generating a second candidate training data set by adding a second subset of the labeled data instances to the first (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: As indicated earlier, Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi paragraphs [0044]-[0045]: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept ("P"), as a negative concept ("N"), as an unlabeled concept ("?"), or it may be selected for labeling of the concept ("S"). As indicated by the ellipses (" ... ") in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present. … The illustrated example labeling states for matrices 302B and 302A are as follows. For the before ASL matrix 302B, sample X1 has three associated labels that are: ?, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: ?, ?, and ?. Sample Xn has three associated labels that are: P, ?, and P. For the after ASL matrix 302A, sample X1 has three associated labels that are: S, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: S, ?, and S. Sample Xn has three associated labels that are: P, ?, and P. Thus, example ASL procedure 300 has selected three sample-label pairs for labeling. These three sample-label pairs include one with sample X1 and two with sample Xj.”). In this case, the sample Xj with two associated labels represents a second candidate training data set containing a second subset of labeled data instances (e.g., sample Xj with a first associated label; sample Xj with a second associated label) added to the first (candidate) training data set.).  
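As a minimal sketch of the incremental construction described in the examiner's notes for Claim 22 (a first candidate training data set built from the X1 selection, and a second candidate training data set built by adding the two Xj selections to the first), the following Python function may help. The (sample, label index) tuple encoding, the empty starting set, and the name build_candidate_sets are hypothetical assumptions for illustration; the sketch models only the set construction and does not implement Qi's sample-label selection criterion or any acceptance test.

```python
from typing import List, Tuple

# Hypothetical encoding: a labeled data instance is a (sample id, label index) pair.
Pair = Tuple[str, int]


def build_candidate_sets(
    base: List[Pair], first_subset: List[Pair], second_subset: List[Pair]
) -> Tuple[List[Pair], List[Pair]]:
    """Build the first candidate set (base plus first subset), then the second
    candidate set (first candidate set plus second subset), skipping duplicates."""
    first = base + [p for p in first_subset if p not in base]
    second = first + [p for p in second_subset if p not in first]
    return first, second


first, second = build_candidate_sets([], [("X1", 0)], [("Xj", 0), ("Xj", 2)])
print(first)   # [('X1', 0)]
print(second)  # [('X1', 0), ('Xj', 0), ('Xj', 2)]
```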
Regarding amended Claim 25, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim [[24]] 14, 
the instructions, when executed on one or more computers cause the one or more computers to perform operations (This claim limitation is similar in scope to a corresponding claim limitation in Claim 14, and hence is rejected under similar rationale.) comprising: 
calculating a cross-validation between the candidate model performance and the current predictive model performance (Esponda Figure 8, elements 820, 824: examiner’s note: In the context of generating a first intermediate prediction, Esponda teaches selecting (instantiating) a final model after choosing a model from among M1..Mn with the highest model output (prediction) as a first model, and a model from among M1..Mn with the lowest model output (prediction) as a second model, and using the corresponding model oracles to generate confidence scores (reflecting prediction accuracy) based on their model outputs (Esponda paragraphs [0030]; [0048]-[0049]; [0051]), where the model with the highest confidence score is selected as the final model (Esponda paragraphs [0071]-[0072]: “A final model is then selected 820 from the first and second models based on the first and second confidence values. Specifically, one of the first and second models having their corresponding oracles produce a higher confidence is selected as the final model. … The model output of the final model is then sent 824 out as first intermediate prediction 133 from model selector 132.”), where the confidence scores indicating prediction accuracy are generated by the oracles and are represented by class probabilities (Esponda paragraph [0050]), and where the selected final model also serves as the current predictive model for the next iteration (Esponda paragraph [0051]: “In non-binary classification, the output selector 210 may simply choose a model output of a model predicted by oracles to be the most accurate as first intermediate prediction 133 without using the Min-Max function. In some embodiments, the output selector 210 may generate a default value as first intermediate prediction 133 if the confidence values of the all oracles are below a certain level.”). Hence, the comparison of confidence scores is interpreted as a cross-validation between the candidate model performance and the current predictive model performance, calculated through the process of generating the intermediate prediction (i.e., through calculation of confidence scores).).  
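The min-max selection attributed to Esponda above (pick the highest-output and lowest-output models, then keep whichever one its oracle is more confident in) can be sketched as follows. This is a simplified illustration under stated assumptions, not Esponda's implementation: the models and oracles are hypothetical callables returning floats, and ties in oracle confidence are resolved in favor of the higher-output model, a choice the quoted passages do not address.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: a model maps an instance to an output score, and its
# oracle maps the same instance to a confidence that the model's output is accurate.
Model = Callable[[Sequence[float]], float]
Oracle = Callable[[Sequence[float]], float]


def select_final_model(
    models: List[Model], oracles: List[Oracle], x: Sequence[float]
) -> Tuple[int, float]:
    """Return (index, output) of the final model chosen by min-max selection."""
    outputs = [m(x) for m in models]
    first = max(range(len(models)), key=outputs.__getitem__)   # highest output
    second = min(range(len(models)), key=outputs.__getitem__)  # lowest output
    # Keep the candidate whose oracle reports the higher confidence (tie -> first).
    final = first if oracles[first](x) >= oracles[second](x) else second
    return final, outputs[final]


models = [lambda x: 0.9, lambda x: 0.5, lambda x: 0.2]
oracles = [lambda x: 0.6, lambda x: 0.99, lambda x: 0.8]
print(select_final_model(models, oracles, [0.0]))  # (2, 0.2)
```

Note that the middle model is never considered in this variant, even though its oracle is the most confident; Esponda's alternative of computing confidence values for all models (paragraph [0074], cited under Claim 26) would select it instead.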
Regarding amended Claim 26, Qi in view of Esponda teaches
(Currently Amended) The computer program product of claim [[24]] 14, 
wherein the candidate model[[s]] is a first candidate model (Esponda Figure 1B, elements 114, M1..Mn; Figure 2, elements M1..Mn; paragraph [0042]: “…the dynamic classifier 114 is comprised of three levels. The first level includes multiple data models M1 through Mn (hereinafter collectively referred to as "data models M")”, where one of the multiple data models corresponds to a first candidate model.), and 
wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations (This claim limitation is similar in scope to a corresponding claim limitation in Claim 14, and hence is rejected under similar rationale.) comprising: 
generating [[the]] an assessment for first performance of the first candidate model[[s]] and the second performance of the second candidate model (Examiner’s note: Esponda teaches the process of performing intermediate predictions may be performed in parallel (Esponda paragraph [0067]: “ … although the process in FIG. 7 is illustrated as generating first intermediate prediction 133 before generating second intermediate prediction 129, second intermediate prediction 129 may be generated before first intermediate prediction 133 or both intermediate predictions 129, 133 may be generated in parallel. … In other embodiments, more than two intermediate predictions may be generated by one or more additional modules in the second level of dynamic classifier 114 to generate final prediction 152.”). For the case of two models among the multiple data models (selected as a first candidate model and a second candidate model), this may involve computing confidence scores for both models, as indicated in Esponda paragraph [0074]: “Also, instead of generating the confidence values for only the first and second models, the confidence values for all models may be computed. Then, a model with the highest confidence value may be selected as the final model.”, thus indicating that the same assessment is done for both models in parallel.).  
Regarding amended Claim 27, Qi teaches
(Currently Amended) A system, comprising: 
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations ([Qi Figure 7, elements 700, 702, 706, 708, 710; paragraph [0035]: processor-executable instructions] [Qi paragraphs [0096]; [0098]; [0100]-[0101]: computer device with processors and storage media containing instructions.]) comprising: 
selecting a set of labeled data instances from a labeled data reservoir, the labeled data reservoir comprising a pool of candidate training data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training a current predictive model with updated training data that comprises at least the set of labeled data instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
generating a … model using the updated training data that comprises at least the set of labeled data instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
in response to determining that a first performance of the … model is improved … , instantiating the updated training data set and the … model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).  
However, Qi does not teach
generating a candidate model … that comprises at least the set of labeled data instances;
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model.
Esponda teaches
generating a candidate model … that comprises at least the set of labeled data instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).
Both Qi and Esponda are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the process of using oracles in the single classifier model method of Qi and extend it by implementing an ensemble model method where multiple oracles are associated with multiple models of Esponda as a way to improve the predictive ability of a model used in a system. The motivation to combine is taught in Esponda, as ensemble methods involving multiple models obtain better predictive performance than updating a single model. Both Qi and Esponda use oracles to help identify training data that result in improved predictions (the oracle in Qi uses label relevancy identifiers while the oracles in Esponda use confidence scores, which are similar as they both identify relevant or related training data to improve predictability of a model), and thus it would have been obvious to a person having ordinary skill in the art to also implement the oracles from Esponda to generate confidence scores to facilitate a more qualitative comparison between a current predictive model and a candidate model, such that the final candidate model represents an improvement of the predictive performance, thus making the system more efficient and accurate in terms of making predictions (Esponda paragraph [0006]: “An ensemble method uses multiple distinct models to obtain better predictive performance than could be obtained from any of the individual models.” and Esponda paragraph [0030]: “Embodiments relate to a dynamic classifier for performing classification of an action or event associated with instance data using oracles that predict accuracy of predictions made by corresponding models. An oracle corresponding to a model is trained to generate a confidence value that represents accuracy of a prediction made by the model. Based on the confidence value and predictions, one of multiple models is selected and its prediction is used as an intermediate prediction. … By using the confidence value for each model and for each instance data, a more accurate prediction can be made.”).
Regarding amended Claim 28, Qi in view of Esponda teaches
(Currently Amended) The system of claim 27, 
wherein the labeled data reservoir comprises data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 15, and hence is rejected under similar rationale.).  
Regarding amended Claim 29, Qi in view of Esponda teaches
(Currently Amended) The system of claim 27, 
from the set of labeled data instances is associated with a true label representing the data instance (This claim limitation is similar in scope to a corresponding claim limitation in Claim 16, and hence is rejected under similar rationale.).  
Regarding amended Claim 32, Qi in view of Esponda teaches
(Currently Amended) The system of claim 27, 
wherein generating the updated training data comprises identifying and removing outlier instances (This claim limitation is similar in scope to a corresponding claim limitation in Claim 19, and hence is rejected under similar rationale.).  
Regarding previously presented Claim 33, Qi in view of Esponda teaches
(Previously Presented) The system of claim 32, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (This claim limitation is similar in scope to a corresponding claim limitation in Claim 20, and hence is rejected under similar rationale.), and 
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category (This claim limitation is similar in scope to a corresponding claim limitation in Claim 20, and hence is rejected under similar rationale.).  
Regarding amended Claim 34, Qi in view of Esponda teaches
(Currently Amended) The system of claim 27, 
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources (This claim limitation is similar in scope to a corresponding claim limitation in Claim 21, and hence is rejected under similar rationale.), and 
wherein selecting a labeled data instance from the set of labeled data instances comprises: selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source (This claim limitation is similar in scope to a corresponding claim limitation in Claim 21, and hence is rejected under similar rationale.).  
Claims 17-18, 23, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred to as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred to as Esponda] as applied to claims 14, 16, and 29; in further view of Lin et al., U.S. PGPUB 2012/0284213, published 11/8/2012 [hereafter referred to as Lin].
Regarding amended Claim 17, Qi in view of Esponda as applied to Claim 16 teaches
(Currently Amended) The computer program product of claim 16.
However, Qi in view of Esponda does not teach  
wherein the determination is further based at least in part on a quality of the training data
Lin teaches
wherein the determination is further based at least in part on a quality of the training data (Examiner’s note: Under its broadest reasonable interpretation, the term “quality of the training data” is interpreted as representing a property of the sample data that provides the most information for training a predictive model, such as a richness score (indicated in Lin). Lin teaches implementing a richness score (Lin paragraphs [0003], [0130]) indicating how information-rich a particular data sample is in comparison to other retained data samples for purposes of testing the accuracy of a trained predictive model, where the richness score is calculated based on the number of nearby different samples and the total number of nearby samples (Lin paragraphs [0138]-[0142]), where the richness score represents a measure of quality of the training data as it provides an indication of similarity between different samples with different prediction outputs (Lin paragraph [0131]: “… if multiple data samples are clustered together, e.g., exhibit a high degree of similarity in features, then a small number of the data samples in the cluster can be given a relatively higher richness score then the rest in the cluster, which can be given a relatively low richness score on account of their redundancy. By comparison, a data sample whose nearest neighbor (when comparing features) is a data sample having a different output value is considered an information-rich data sample.”), thus introducing diversity of samples into a training set to improve the accuracy of the model. The richness score is used to decide on retaining the data samples (Lin paragraph [0134]: “…there is value in retaining a portion of these data samples, however, retaining all of them is of less value on account of the redundancy between the data samples.”), since samples that provide less value for training (and hence contribute less to the quality of the training data) will not be retained.).
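The richness score described above can be approximated by the following minimal Python sketch. The exact formula in Lin paragraphs [0138]-[0142] is not reproduced here; the sketch assumes, hypothetically, that richness is the fraction of a sample's k nearest neighbors (by Euclidean distance) whose output value differs from the sample's own, which reflects the quoted statement that a sample whose nearest neighbor has a different output value is information-rich. The function name richness_scores and the parameter k are illustrative assumptions.

```python
import math
from typing import List, Sequence


def richness_scores(
    features: List[Sequence[float]], outputs: List[int], k: int = 3
) -> List[float]:
    """Hypothetical richness score: the fraction of each sample's k nearest
    neighbors (Euclidean distance) whose output value differs from its own."""
    n = len(features)
    if n < 2:
        return [0.0] * n
    k = min(k, n - 1)
    scores = []
    for i in range(n):
        # Sort the other samples by distance to sample i and keep the k closest.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        different = sum(1 for j in neighbors if outputs[j] != outputs[i])
        scores.append(different / k)
    return scores


print(richness_scores([[0.0], [1.0], [2.0], [2.5]], [0, 0, 0, 1], k=1))
# [0.0, 0.0, 1.0, 1.0]
```

With k=1, the two samples on either side of the output boundary score 1.0, while the clustered samples score 0.0; under a retention budget such as the one Lin describes, the low-scoring, redundant samples would be the first candidates for discarding.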
Qi in view of Esponda and Lin are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the oracles of Qi in view of Esponda and incorporate a mechanism to calculate the information richness score of Lin as a way to tag and identify different data samples that share similarities. The motivation to combine is taught in Lin, as a way to identify different data samples with enough similarities to provide diversity to the training, thereby improving the accuracy of the predictive model. A secondary motivation is also taught in Lin, since through the use of richness scores, a further decision can be made to retain in memory/storage only those data samples that are considered informative to making model predictions, thereby also improving storage efficiency and performance of the system by reducing the storage requirements for the system while maximizing the predictive accuracy of the model at the same time (Lin paragraph [0136]: “Having data samples that have borderline features values can therefore be informative when testing the accuracy of a trained predictive model in making predictions, particularly in borderline cases. Accordingly, these data samples are considered information-rich and are assigned relatively high richness scores.” and Lin paragraph [0151]: “Training data may be scored for richness so that the amount of training data retained in the training data repository 214 can be kept to a desired volume, e.g., on account of memory space restrictions. The training-richness-score can be also used to optimize what training samples are used to train and retrain models, so as to provide optimally trained predictive models in view of the input data expected to be received going forward with prediction requests. As is already described above, the test data set is used to test the accuracy of a trained predictive model before the data samples included in the test data set are used to train (or retrain) the trained predictive model.”).
Regarding amended Claim 18, Qi in view of Esponda, in further view of Lin teaches
 The computer program product of claim 17, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (Qi paragraph [0034]: “… classifier 112 outputs one or more predicted labeled concepts in accordance with its trained classifying algorithm. Classifier 112 may employ any classifying algorithm.”), 
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs (Qi Figure 1, elements 106, 108, 110, 116, 118: examiner’s note: Qi teaches an oracle for the multi-label case (Qi paragraph [0040] and Qi Figure 2, elements 106(x), 108(x), 110, 206, 208) (Qi paragraph [0032]: “Oracle 110 may also be termed a teacher, an annotator, and so forth. Oracle 110 is typically a human or a group of humans that is capable of labeling each sample. The labeling may indicate, for example, a relevancy of label 108 to its associated sample 106. If two labeling categories are permitted for each label concept, the relevancies may be positive/negative, relevant/not relevant, related/not related, and so forth. … Oracle 110 provides or inputs the labeled relevancy at arrow 118 to active learning classifier trainer 102.”). Qi further teaches a true label ys (Qi paragraph [0061]: “…true label of the selected sample ys …”), where ys is indicated in Qi paragraph [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance.), and 
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data (Examiner’s note: Under its broadest reasonable interpretation, the term “class balance” is interpreted as a condition in which samples are selected to minimize a generalization error, as indicated in Qi. Qi teaches selecting appropriate samples as well as labels to minimize the generalization error, which is interpreted as avoiding overfitting the classifier with the same data (Qi paragraph [0020]: “Thus, for active learning in multi-label settings, not only can the samples be appropriately selected for labeling, but the label set to be manually annotated by an oracle for a particular selected sample may also be appropriately selected. Selecting labels for annotation from among multiple potential labels may be pertinent because the varying contribution levels of different labels to the minimization of the generalization error may be different due to the existence of label correlations.”). Qi further teaches selecting, from the set of training samples (corresponding to the “pool of candidate training data”), different samples and corresponding labels that make different contributions to minimizing the expected classification error (i.e., to avoiding overfitting), thus maintaining a class balance (Qi paragraph [0042]: “In contrast to traditional binary active learning approaches that select the most informative samples for annotation, ASL embodiments as described herein jointly select both the samples and the labels. Different labels of a certain sample have different contributions to minimizing the expected classification error of the to-be-trained classifier. Thus, annotating a well-selected portion of the labels may provide sufficient information for learning the classifier.”).).  
Regarding amended Claim 23, Qi in view of Esponda as applied to Claim 14 teaches
(Currently Amended) The computer program product of claim 14.
 However, Qi in view of Esponda does not teach 
wherein generating the updated training data [[set]] is based on a non-greedy algorithm, the generating comprising: replacing at least a portion of the candidate training data with a subset of the labeled data instances.  
Lin teaches
wherein generating the updated training data [[set]] is based on a non-greedy algorithm, the generating comprising: replacing at least a portion of the candidate training data with a subset of the labeled data instances (Examiner’s note: Under its broadest reasonable interpretation, the term “non-greedy algorithm” is interpreted as an algorithm or a  As indicated earlier, Lin teaches implementing a richness score (Lin paragraph [0130]) indicating how information-rich a particular data sample is in comparison to other retained data samples for purposes of testing the accuracy of a trained predictive model, where the richness score is used as a threshold to decide on retaining the data samples (Lin paragraph [0134]: “there is value in retaining a portion of these data samples, however, retaining all of them is of less value on account of the redundancy between the data samples. … In another example, a data sample is selected to be assigned a richness score of 0 (i.e., to effectively remove the data sample from the test data) based on whether removal of the data sample will increase the overall score of the data samples.”). In this context, the removal of samples from the updated training data (set) using a richness score representing a threshold is interpreted as a form of non-greedy replacement of the candidate training data.).  
Both Qi in view of Esponda and Lin are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the oracles of Qi in view of Esponda and incorporate the mechanism of Lin for calculating an information richness score as a way to tag and identify different data samples that share similarities. The motivation to combine is taught in Lin, as a way to identify different data samples with enough similarities to provide diversity to the training, thereby improving the accuracy of the predictive model. A secondary motivation is also taught in Lin: through the use of richness scores, a further decision can be made to retain in memory/storage only those data samples that are considered informative for making model predictions, thereby also improving the storage efficiency and performance of the system by reducing its storage requirements while at the same time maximizing the predictive accuracy of the model (Lin paragraph [0136]: “Having data samples that have borderline features values can therefore be informative when testing the accuracy of a trained predictive model in making predictions, particularly in borderline cases. Accordingly, these data samples are considered information-rich and are assigned relatively high richness scores.” and Lin paragraph [0151]: “Training data may be scored for richness so that the amount of training data retained in the training data repository 214 can be kept to a desired volume, e.g., on account of memory space restrictions. The training-richness-score can be also used to optimize what training samples are used to train and retrain models, so as to provide optimally trained predictive models in view of the input data expected to be received going forward with prediction requests. As is already described above, the test data set is used to test the accuracy of a trained predictive model before the data samples included in the test data set are used to train (or retrain) the trained predictive model.”).
Regarding amended Claim 30, Qi in view of Esponda as applied to Claim 29 teaches
(Currently Amended) The system of claim 29.
However, Qi in view of Esponda does not teach 
wherein the determination is further based at least in part on a 
Lin teaches
wherein the determination is further based at least in part on a  (This claim limitation is similar in scope to a corresponding claim limitation in Claim 17, and hence is rejected under similar rationale.).  
Both Qi in view of Esponda and Lin are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the oracles of Qi in view of Esponda and incorporate the mechanism of Lin for calculating an information richness score as a way to tag and identify different data samples that share similarities. The motivation to combine is taught in Lin, as a way to identify different data samples with enough similarities to provide diversity to the training, thereby improving the accuracy of the predictive model. A secondary motivation is also taught in Lin: through the use of richness scores, a further decision can be made to retain in memory/storage only those data samples that are considered informative for making model predictions, thereby also improving the storage efficiency and performance of the system by reducing its storage requirements while at the same time maximizing the predictive accuracy of the model (Lin paragraph [0136]: “Having data samples that have borderline features values can therefore be informative when testing the accuracy of a trained predictive model in making predictions, particularly in borderline cases. Accordingly, these data samples are considered information-rich and are assigned relatively high richness scores.” and Lin paragraph [0151]: “Training data may be scored for richness so that the amount of training data retained in the training data repository 214 can be kept to a desired volume, e.g., on account of memory space restrictions. The training-richness-score can be also used to optimize what training samples are used to train and retrain models, so as to provide optimally trained predictive models in view of the input data expected to be received going forward with prediction requests. As is already described above, the test data set is used to test the accuracy of a trained predictive model before the data samples included in the test data set are used to train (or retrain) the trained predictive model.”).
Regarding amended Claim 31, Qi in view of Esponda, in further view of Lin teaches
(Currently Amended) The system of claim 30, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (This claim limitation is similar in scope to a corresponding claim limitation in Claim 18, and hence is rejected under similar rationale.), 
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs (This claim limitation is similar in scope to a corresponding claim limitation in Claim 18, and hence is rejected under similar rationale.), and 
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 18, and hence is rejected under similar rationale.).  

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121