DETAILED ACTION
The applicant’s request for continued examination regarding application number 16/418,232, filed May 21, 2019 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on April 1, 2022 has been entered.

Response to Amendments
The amendment filed April 1, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/418,232, which include: Amendments to the Claims and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Claims 1, 14, and 27 have been amended, with Claims 2-13, 24, and 35-39 previously cancelled. Claims 1, 14-23, and 25-34 remain pending in the application. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/418,232, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant’s Remarks for Claims 1, 14-23, and 25-34 on the grounds of nonstatutory double patenting over Claims 1, 13-21, and 23-31 of U.S. Patent No. 10,339,468, Examiner acknowledges Applicant’s arguments, has considered them, and finds them not persuasive. While certain terms in the instant application are now replaced with synonyms as part of Applicant’s amended claims, under their broadest reasonable interpretation these synonyms still convey the same context and scope as the corresponding claims from the issued patent, and therefore the nonstatutory double patenting rejection remains applicable and is not withdrawn. Examiner points out that the amended limitations recited in the independent claims: “… selecting a set of labeled data instances that are stored in a labeled data reservoir with a first label and also labeled by a current predictive model with a second label different than the first label, the labeled data reservoir comprising a pool of candidate training data for the current predictive model …” have corresponding mappings to existing dependent claims identified in the issued patent. These updated mappings, with additional clarifications according to Applicant’s amended claims, are provided in the sections indicated below.
Regarding Applicant's Remarks for Claims 1, 14-16, 19-22, 25-29, and 32-34 under 35 U.S.C. 103 as being unpatentable over Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred to as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred to as Esponda]; and for Claims 17-18, 23, and 30-31 under 35 U.S.C. 103 as being unpatentable over Qi in view of Esponda, in further view of Lin et al., U.S. PGPUB 2012/0284213, published 11/8/2012 [hereafter referred to as Lin], Examiner acknowledges Applicant’s arguments, has considered them, and finds them not persuasive. Examiner notes that the majority of Applicant’s arguments are directed to the newly amended limitations in the independent claims, which have not been previously entered, and thus necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
Regarding Applicant’s Remarks:
“Qi discloses that "a sample-label pair is selected for labeling responsive to at least one error parameter." See Qi, paragraph [0049]. For example, with Qi, "[s]ample-label pair selector 202 analyzes the set of training samples 104 and selects a sample-label pair 212 (of FIG. 2) responsive to at least one error parameter." See Qi, paragraph [0092]. 
In other words, Qi merely discloses selection of a sample-label pair for labeling by an oracle (e.g., oracle 110) using an error parameter associated with the sample-label pair. However, Qi does not teach or suggest "selecting a set of labeled data instances that are stored in a labeled data reservoir with a first label and also labeled by a current predictive model with a second label different than the first label, the labeled data reservoir comprising a pool of candidate training data for the current predictive model, wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training the current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance," as recited in amended claim 1.” 
Examiner has considered this argument but finds the argument to be not persuasive. Applicant’s assertion that the Qi reference only teaches selection of a sample-label pair for labeling by an oracle using an error parameter does not take into account the supporting paragraphs and figures taught in the Qi reference. Examiner points out that the Qi reference teaches multi-label active learning (Qi Abstract), with Qi Figure 3 and [0044] teaching identifying different label categories in an active sampling and labeling example: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept (“P”), as a negative concept (“N”) … As indicated by the ellipses (“…”) in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present.”. Examiner points out that Applicant’s assertion is based on analyzing Qi [0049] in isolation, by indicating that the selection of a sample-label pair is based on an error parameter as discussed in Qi [0092] (and shown in Qi Figure 6). However, Examiner further points out that Qi [0049] is part of the larger discussion of the flow diagram in Qi Figure 4, where Qi [0050]-[0051] additionally teaches further details of the sample-label pair selection and labeling process, where the labeling involves the generation and assignment of a relevancy indication that corresponds to an assignment of a positive or negative label to an associated sample-label pair as part of the active learning classifier training: “… At block 408, at least one selected sample label pair is submitted to an oracle … At block 410, a relevancy indication for the selected sample-label pair may be received from the oracle … a positive or a negative indication of the conceptual relevancy 214 of label 108(3)c to sample 106(3) may be received from oracle 110. 
… At block 412, the current set of training samples is updated with the received relevancy indication … active learning classifier trainer 102 may add the positive/negative relevancy indication 214 at label 108(3)c of associated sample 106(3) in set of training samples 104. At block 414, the classifier is updated.” Qi [0052] further teaches that this sample-label pair selection and labeling process is repeated until one or more criteria are met indicating that no additional training is to be performed. Qi Figure 2 and [0039]-[0040] further teach that both active learning classifier trainer 102 and the training samples set 104 are updated during this sample-label pair selection and labeling process (where the updated training samples are stored in the labeled data reservoir, and the active learning classifier represents the current predictive model being trained/learned): “After sample-label pair selector 202 has selected a sample-label pair 212, active learning classifier trainer 102 submits the selected sample-label pair 212 to oracle 110 at arrow 206 for labeling. At arrow 208, oracle 110 returns an indication of relevance 214 of the submitted label 108 to its associated sample 106. This indicated relevancy labeling 214 is incorporated into the set of training samples 104 to update it. With the updated training samples set 104, active learning classifier trainer 102 updates classifier 112 at arrow 210 … The selected sample-label pair 212 is submitted to oracle 110 at arrow 206. Active learning classifier trainer 102 requests that oracle 110 indicate the relevance of label 108(2)b to its associated sample 106(2). After relevancy indication 214 is returned from oracle 110 at arrow 208, active learning classifier trainer 102 can update the set of training samples 104. 
With the updated training samples set 104, active learning classifier trainer 102 can update classifier 112 at arrow 210.” Hence, Applicant’s assertion that the Qi reference only teaches selection of a sample-label pair for labeling by an oracle and not other aspects of the recited claim limitation is not persuasive, and the existing prior art rejection is maintained.
Examiner notes that the remainder of Applicant’s arguments is directed to applying Applicant’s newly amended limitations to the Esponda and Lin references. Examiner points out that the Esponda and Lin references teach other limitations identified in the independent and dependent claims as indicated in the Final Office Action mailed December 1, 2021, and hence those arguments are found to be not persuasive, and the existing prior art rejection is maintained.
As noted above, Applicant’s amended claim limitations necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to the Applicant’s amended claims are provided in the relevant sections indicated below.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 14-23, and 25-34 are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 1, 3, 4, 13-21, and 23-31 of U.S. Patent No. 10,339,468 (Johnston, David Alan; Jeffery, Shawn Ryan; and Polychronopoulos, Vasileios (Groupon, Inc.)). Note that the original amendment/correction marks from the instant application have been removed for easier reading and comparison against the issued patent. The bolded text between the instant application and the issued patent in Claims 1, 14, and 27 indicates a merged claim limitation between the instant application (1 limitation) and the issued patent (3 limitations) that expresses the same scope. Although the claims at issue are not identical, they are not patentably distinct from each other because the issued patent discloses all of the features and limitations in the instant application, thereby making the claims in the instant application obvious over the issued patent. While certain terms in the instant application are now replaced with synonyms as part of Applicant’s amended claims, under their broadest reasonable interpretation these synonyms still convey the same context and scope as the corresponding claims from the issued patent.
Instant Application 16/418,232 | U.S. Patent 10,339,468
Applicant: Groupon, Inc. | Applicant: Groupon, Inc.
Inventors: David Alan Johnston, Shawn Ryan Jeffery, Vasileios Polychronopoulos | Inventors: David Alan Johnston, Shawn Ryan Jeffery, Vasileios Polychronopoulos
Filed: May 21, 2019 | Filed: October 20, 2015


Claim 1
Claim 1
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising:

A computer-implemented method for adaptively improving the performance of a current predictive model by curating training data used to derive the current predictive model, the method comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances that are stored in a labeled data reservoir… the labeled data reservoir comprising a pool of candidate training data for the current predictive model, 


The terms “a set of labeled data instances that are stored in a labeled data reservoir” and “a set of labeled data instances from a labeled data reservoir” are functionally equivalent to each other, since the phrase “that are stored in” is used to describe the location or source of the data instances (which is the labeled data reservoir), which is equivalent to the usage of the function word “from” in the corresponding claim from the issued patent.

Furthermore, the terms “candidate training data” and “possible training data” represent synonyms of each other in the context of these claims, and hence these claims are functionally equivalent.

The term “for the current predictive model” describing the pool of candidate training data merely emphasizes that the pool of candidate training data is used by the predictive model, which is already established as the source of the labeled data instances being selected and updated, and used for training the predictive model, and hence this term does not add any additional restrictions to the pool of candidate training data. 
selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training the current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance;


As identified in an earlier claim limitation, “the set of labeled data instances” are selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the process of curating described in this claim. Hence this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making these terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;



generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and 
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim). Furthermore, applying “first” and “second” qualifier terms to two distinct performances (a candidate model performance and a current model performance) does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 1
Claim 3
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising:

The method of claim 1,


By virtue of dependency, Claim 3 includes all limitations from independent Claim 1; see the Claim 1 mapping above.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled instances (as recited in Claim 1: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”).
Hence this claim limitation is functionally equivalent to the corresponding claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination step indicated in this claim limitation is based on selecting the set of labeled data instances (as recited in Claim 1: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 1, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.




Claim 1
Claim 4
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising:

The method of claim 3,


By virtue of dependency, Claim 4 includes all limitations from Claim 3, where Claim 3 traces back to independent Claim 1.

… a set of labeled data instances … with a first label and also labeled by a current predictive model with a second label different than the first label …


Under its broadest reasonable interpretation, using a current predictive model to generate a first and second label different than the first label for a set of labeled data instances is interpreted as a predictive model performing a labeling process to produce different labels for a set of input data instances. Hence this claim limitation is functionally equivalent to the corresponding claim limitation “wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs” from the issued patent.
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 


The limitation broadly recites that the current model is a classifier that performs category predictions on input data, such that these category predictions indicate where each input data instance belongs (i.e., how it is classified). A person having ordinary skill in the art will understand that the term “classifier” describing the current model broadly indicates that the current model performs predictions, and that each of the category predictions generated by the current “predictive” model represents a different label or class for the input data. Hence it would be obvious to a person having ordinary skill in the art that this classifier that performs category predictions on input data corresponds to a predictive model performing a labeling process to produce different labels for a set of input data instances.

wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and 

wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 14
Claim 13
(Currently Amended) A computer program product, 
A computer program product, 
stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:
stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances that are stored in a labeled data reservoir … the labeled data reservoir comprising a pool of candidate training data for the current predictive model, 


The terms “a set of labeled data instances that are stored in a labeled data reservoir” and “a set of labeled data instances from a labeled data reservoir” are functionally equivalent to each other, since the phrase “that are stored in” is used to describe the location or source of the data instances (which is the labeled data reservoir), which is equivalent to the usage of the function word “from” in the corresponding claim from the issued patent.

Furthermore, the terms “candidate training data” and “possible training data” represent synonyms of each other in the context of these claims, and hence these claims are functionally equivalent.

The term “for the current predictive model” describing the pool of candidate training data merely emphasizes that the pool of candidate training data is used by the predictive model, which is already established as the source of the labeled data instances being selected and updated, and used for training the predictive model, and hence this term does not add any additional restrictions to the pool of candidate training data.
selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and 
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training the current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance;


As identified in an earlier claim limitation, “the set of labeled data instances” are selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the process of curating described in this claim. Hence this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making these terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim). Furthermore, applying “first” and “second” qualifier terms to two distinct performances (a candidate model performance and a current model performance) does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 14
Claim 15
(Currently Amended) A computer program product, …

The computer program product of claim 13,


By virtue of dependency, Claim 15 includes all limitations from independent Claim 13; see the Claim 13 mapping above.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled instances (as recited in Claim 14: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”).
Hence this claim limitation is functionally equivalent to the claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination step indicated in this claim limitation is based on selecting the set of labeled data instances (as recited in Claim 13: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 13, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.


Claim 14
Claim 16
(Currently Amended) A computer program product, …

The computer program product of claim 15,


By virtue of dependency, Claim 16 includes all limitations from Claim 15, where Claim 15 traces back to independent Claim 13.

… a set of labeled data instances … with a first label and also labeled by a current predictive model with a second label different than the first label …


Under its broadest reasonable interpretation, using a current predictive model to generate a first and second label different than the first label for a set of labeled data instances is interpreted as a predictive model performing a labeling process to produce different labels for a set of input data instances. Hence this claim limitation is functionally equivalent to the corresponding claim limitation “wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs” from the issued patent.
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs,


The limitation broadly recites that the current model is a classifier that performs category predictions on input data, such that these category predictions indicate where each input data instance belongs (i.e., how it is classified). A person having ordinary skill in the art will understand that the term “classifier” describing the current model broadly indicates that the current model performs predictions, and that each of the category predictions generated by the current “predictive” model represents a different label or class for the input data. Hence it would be obvious to a person having ordinary skill in the art that this classifier that performs category predictions on input data corresponds to a predictive model performing a labeling process to produce different labels for a set of input data instances.

wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and 

wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 15
Claim 14
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 13,
wherein the labeled data reservoir comprises data collected over time from input data being processed by the current predictive model.


Under its broadest reasonable interpretation, this limitation recites data that has been collected over time, which is functionally equivalent to data that has been collected continuously over a period of time.
wherein the labeled data reservoir includes data that have been collected continuously over time from input data being processed by the current predictive model.


Under its broadest reasonable interpretation, the term “continuously” broadly recites and emphasizes the time aspect of the data collection, which is recited as being performed over a period of time.


Claim 16
Claim 13
(Previously Presented) The computer program product of claim 14,

By virtue of dependency, Claim 16 includes all limitations from independent Claim 14; refer to above Claim 14 mapping.
A computer program product, 

stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data, 

wherein the set of labeled data instances are not included in the training data, 
wherein each labeled data instance from the set of labeled data instances is associated with a true label representing the data instance.
wherein each labeled data instance is associated with a true label representing the instance, and 

wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;

generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and

instantiating the candidate training data set and the candidate model 

in an instance in which the candidate model performance is improved from the current model performance.


Claim 17
Claim 15
(Previously Presented) The computer program product of claim 16,

By virtue of dependency, Claim 17 includes all limitations from Claims 14 and 16; refer to above Claims 14 and 16 mapping. 
The computer program product of claim 13,


By virtue of dependency, Claim 15 includes all limitations from independent Claim 13; refer to above Claim 13 mapping.

wherein the determination is further based at least in part on a quality of the updated training data set.


The term “a distribution” is now incorporated into independent Claim 14 and further limited to indicate “a statistical distribution”; see above claim mapping for Claim 14.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


Claim 18
Claim 16
(Previously Presented) The computer program product of claim 17,
The computer program product of claim 15,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs,
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of one another in the context of these claims, as they all identify and originate from a pool of (candidate/possible) training data.
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 19
Claim 17
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data comprises: identifying and removing outlier instances.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of one another in the context of these claims, as they all identify and originate from a pool of (candidate/possible) training data.
wherein generating the candidate training data comprises: identifying and removing outlier instances.


Claim 20
Claim 18
(Previously Presented) The computer program product of claim 19, 
The computer program product of claim 17,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.


Claim 21
Claim 19
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 13,
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources, and 
wherein the labeled data reservoir includes labeled data instances that are received from multiple sources, and
wherein selecting a labeled data instance from the set of labeled data instances comprises:
wherein selecting a labeled data instance from the set of labeled data instances comprises:

comparing a source of the labeled data instance with a pre-determined source; and
selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source.


The terms “in an instance in which” and “in response to” are synonyms since both identify a particular point in time.
selecting the labeled data instance in an instance in which the source of the labeled data instance matches the pre-determined source.


Claim 22
Claim 20
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data is based on a greedy algorithm, the generating comprising:
wherein generating at least one candidate training data set is based on a greedy algorithm, the generating comprising:
generating a first training data set by adding a first subset of the labeled data instances to at least a portion of the candidate training data; and

generating a first candidate training data set by adding a first subset of the labeled data instances to the training data; and
generating a second candidate training data set by adding a second subset of the labeled data instances to the first training data set.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of one another in the context of these claims. As recited in Claim 14, the generation of training sets is performed using the labeled data reservoir comprising the pool of candidate training data; therefore the “first training data set” still represents the “first candidate training data set”.
generating a second candidate training data set by adding a second subset of the labeled data instances to the first candidate training data set.
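For illustration only, the greedy generation recited in the limitations above (each candidate training data set is built by adding a further subset of labeled data instances to the previous set) can be sketched as follows. This is a minimal sketch and not the implementation of either the instant application or the issued patent; the function name `greedy_candidate_sets` and the list-based data representation are hypothetical.

```python
from typing import List, Sequence


def greedy_candidate_sets(training_data: Sequence, subsets: Sequence[Sequence]) -> List[list]:
    """Build candidate training sets cumulatively (hypothetical helper).

    The first candidate set is the training data plus the first subset;
    each later candidate set is the previous candidate set plus the next subset.
    """
    candidates = []
    current = list(training_data)
    for subset in subsets:
        # Grow the previous set; earlier additions are never removed.
        current = current + list(subset)
        candidates.append(current)
    return candidates


if __name__ == "__main__":
    # Toy data: integers stand in for labeled data instances.
    for candidate in greedy_candidate_sets([1, 2], [[3], [4, 5]]):
        print(candidate)
```

Under this sketch, the second candidate set always contains the first, which is the sense in which the generation is cumulative.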


Claim 23
Claim 21
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 13,
wherein generating the updated training data is based on a non-greedy algorithm, the generating comprising:
wherein generating at least one candidate training data set is based on a non-greedy algorithm, the generating comprising:
replacing at least a portion of the candidate training data with a subset of the labeled data instances.


As established earlier, “updated training data”, “possible training data”, and “candidate training data” are synonyms of one another in the context of these claims. As recited in Claim 14, the generation of training sets is performed using the labeled data reservoir comprising the pool of candidate training data; therefore the “training data” and “candidate training data” are functionally equivalent.
replacing the training data with a subset of the labeled data instances.


Claim 25
Claim 23
(Previously Presented) The computer program product of claim 14, wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations comprising:


The claim limitation “the instructions, when executed on the one or more computers further cause the one or more computers to perform operations” is already established from independent Claim 14 (and established in corresponding Claim 22 of the issued patent, which traces back to independent Claim 13 in the issued patent), and therefore does not add any further distinction to the claim.
The computer program product of claim 22,
calculating a cross-validation between the first performance of the candidate model and the second performance of the current predictive model.


Applying “first” and “second” qualifier terms to a candidate model performance and a current model performance does not further differentiate or limit the already distinct performances of the two models.
wherein generating the assessment comprises calculating a cross-validation between the candidate model performance and the current model performance.


Claim 26
Claim 24
(Previously Presented) The computer program product of claim 14, 
The computer program product of claim 22,
wherein the candidate model is a first candidate model, and


The term “the candidate model is a first candidate model” is a narrower form of “there are multiple candidate models”. Furthermore, as identified in the following claim limitation, a second candidate model is required to generate an assessment. Hence, this claim taken as a whole anticipates the claim “wherein there are multiple candidate models” from the issued patent.
wherein there are multiple candidate models, and 
wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations comprising:
generating an assessment for first performance of the first candidate model and the second performance of the second candidate model in parallel.


The claim limitation “the instructions, when executed on the one or more computers further cause the one or more computers to perform operations” is already established from independent Claim 14 (and established in corresponding Claim 22 of the issued patent, which traces back to independent Claim 13 in the issued patent), and therefore does not add any further distinction to the claim.

Applying “first” and “second” qualifier terms to two different candidate models (representing multiple candidate models) does not further differentiate or limit the already distinct performances of the two models.
Generating the assessment between two candidate models in parallel is a narrower form of generating the assessment for multiple candidate models in parallel, and hence this claim anticipates the claim “wherein generating the assessment for each of the multiple candidate models is implemented in parallel” from the issued patent.
wherein generating the assessment for each of the multiple candidate models is implemented in parallel.


Claim 27
Claim 25
(Currently Amended) A system, comprising:
A system, comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;
selecting a set of labeled data instances that are stored in a labeled data reservoir … the labeled data reservoir comprising a pool of candidate training data for the current predictive model, 


The terms “a set of labeled data instances that are stored in a labeled data reservoir” and “a set of labeled data instances from a labeled data reservoir” are functionally equivalent to each other, since the phrase “that are stored in” is used to describe the location or source of the data instances (which is the labeled data reservoir), which is equivalent to the usage of the function word “from” in the corresponding claim from the issued patent.

Furthermore, the terms “candidate training data” and “possible training data” represent synonyms of each other in the context of these claims, and hence these claims are functionally equivalent.

The term “for the current predictive model” describing the pool of candidate training data merely emphasizes that the pool of candidate training data is used by the predictive model, which is already established as the source of the labeled data instances being selected and updated, and used for training the predictive model, and hence this term does not add any additional restrictions to the pool of candidate training data. 
selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data,

wherein the set of labeled data instances are not included in the training data, 

wherein each labeled data instance is associated with a true label representing the instance, and
wherein selecting the set of labeled data instances is based at least in part on … 
… a determination that re-training the current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance;


As identified in an earlier claim limitation, “the set of labeled data instances” are selected from a labeled data reservoir that is a pool of candidate training data, and this pool is updated by the process of curating described in this claim. Hence this set of labeled data instances represents “updated training data”, as well as a pool of “possible” or “candidate” training data (thereby making these terms “possible training data”, “candidate training data”, and “updated training data” synonyms of each other in the context of these claims).
wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;
generating a candidate model using the updated training data that comprises at least the set of labeled data instances;


The claim limitation reciting “generating a candidate model …” is a form of “deriving a candidate model …”, where the candidate training data set is based on the received training data and a set of labeled data instances.
generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and
in response to determining that a first performance of the candidate model is improved from a second performance of the current predictive model,
 
instantiating the updated training data set and the candidate model.


The terms “in an instance in which” and “in response to determining that” are synonyms since both identify a particular point in time (i.e., after the determination step recited in an earlier claim). Furthermore, applying “first” and “second” qualifier terms to two distinct performances (a candidate model performance and a current model performance) does not further differentiate or limit the already distinct performances of the two models.
instantiating the candidate training data set and the candidate model

in an instance in which the candidate model performance is improved from the current model performance.


Claim 27
Claim 27
(Currently Amended) A system, comprising: …

The system of claim 25


By virtue of dependency, Claim 27 includes all limitations from independent Claim 25; refer to above Claim 25 mapping.

… wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data …


The term “a statistical distribution” is a narrower form of “a distribution”, and hence anticipates the term “a distribution” from the issued patent.

The determination step is part of selecting the set of labeled instances (as recited in Claim 27: “wherein selecting the set of labeled data instances is based at least in part on … a determination…”).
Hence this claim limitation is functionally equivalent to the claim limitation “wherein the determination is based at least in part on analyzing the distribution …” from the issued patent.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


The determination step indicated in this claim limitation is based on selecting the set of labeled data instances (as recited in Claim 25: “wherein selecting the set of labeled data instances is based on a determination…”), such that when the claim is taken as a whole with Claim 25, the “determination” term can be replaced by its functional equivalent: “selecting the set of labeled data instances is based at least in part on analyzing the distribution …”.


Claim 27
Claim 28
(Currently Amended) A system, comprising: …

The system of claim 27,


By virtue of dependency, Claim 28 includes all limitations from Claim 27, where Claim 27 traces back to independent Claim 25.

… a set of labeled data instances … with a first label and also labeled by a current predictive model with a second label different than the first label …


Under its broadest reasonable interpretation, using a current predictive model to generate a first and second label different than the first label for a set of labeled data instances is interpreted as a predictive model performing a labeling process to produce different labels for a set of input data instances. Hence this claim limitation is functionally equivalent to the corresponding claim limitation “wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs” from the issued patent.
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs,


The recited limitation broadly recites that the current model is a classifier that performs category predictions on input data, such that these category predictions indicate the category to which each input data instance belongs (i.e., is classified). A person having ordinary skill in the art will understand that the term “classifier” describing the current model broadly indicates that the current model performs predictions, and that the category predictions generated by the current “predictive” model represent different labels or classes for the input data. Hence it would be obvious to a person having ordinary skill in the art that this classifier that performs category predictions on input data corresponds to a predictive model performing a labeling process to produce different labels for a set of input data instances.

wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and 

wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 28
Claim 26
(Previously Presented) The system of claim 27, 
The system of claim 25,
wherein the labeled data reservoir comprises data collected over time from input data being processed by the current predictive model.


Under its broadest reasonable interpretation, this limitation recites data that has been collected over time, which is functionally equivalent to data that has been collected continuously over a period of time.
wherein the labeled data reservoir includes data that have been collected continuously over time from input data being processed by the current predictive model.


Under its broadest reasonable interpretation, the term “continuously” broadly recites and emphasizes the time aspect of the data collection, which is recited as being performed over a period of time.


Claim 29
Claim 25
(Previously Presented) The system of claim 27,


By virtue of dependency, Claim 29 includes all limitations from independent Claim 27; refer to above Claim 27 mapping.
A system, comprising:

one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

receiving the training data and the current predictive model derived using the training data;

selecting a set of labeled data instances from a labeled data reservoir, wherein the labeled data reservoir includes a pool of possible training data,

wherein the set of labeled data instances are not included in the training data, 
wherein each labeled data instance from the set of labeled data instances is associated with a true label representing the data instance.
wherein each labeled data instance is associated with a true label representing the instance, and

wherein selecting the set of labeled data instances is based on a determination that re-training the model with updated training data likely will result in improved model performance;

generating at least one candidate training data set by updating the training data using the set of labeled data instances;
deriving a candidate model using the candidate training data set;

generating, by a training data manager component, an assessment of whether the candidate model performance is improved from the current model performance; and

instantiating the candidate training data set and the candidate model

in an instance in which the candidate model performance is improved from the current model performance.


Claim 30
Claim 27
(Previously Presented) The system of claim 29, 


By virtue of dependency, Claim 30 includes all limitations from Claims 27 and 29; refer to above Claims 27 and 29 mapping.
The system of claim 25,


By virtue of dependency, Claim 27 includes all limitations from independent Claim 25; refer to above Claim 25 mapping.

wherein the determination is based at least in part on quality of the updated training data.


The term “a distribution” is now incorporated into independent Claim 27 and further limited to indicate “a statistical distribution”; see above claim mapping for Claim 27.
wherein the determination is based at least in part on analyzing the distribution and quality of the training data.


Claim 31
Claim 28
(Previously Presented) The system of claim 30, 
The system of claim 27,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, 
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs, and
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data.
wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the training data.


Claim 32
Claim 29
(Previously Presented) The system of claim 27, 
The system of claim 25,
wherein generating the updated training data comprises: identifying and removing outlier instances.
wherein generating the candidate training data comprises: identifying and removing outlier instances.


Claim 33
Claim 30
(Previously Presented) The system of claim 32, 
The system of claim 29,
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and
wherein the current model is a classifier predicting to which of a set of predictive categories an input data instance belongs, and 
wherein selecting the set of labeled data instances from the labeled data reservoir comprises identifying and removing outlier instances in one predictive category.
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category.


Claim 34
Claim 31
(Previously Presented) The system of claim 27, 
The system of claim 25,
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources, and 
wherein the labeled data reservoir includes labeled data instances that are received from multiple sources, and 

wherein selecting a labeled data instance from the set of labeled data instances comprises:
wherein selecting a labeled data instance from the set of labeled data instances comprises:
selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source.


The terms “in an instance in which” and “in response to” are synonyms since both identify a particular point in time.
selecting the labeled data instance in an instance in which the source of the labeled data instance matches the pre-determined source.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 14-16, 19-22, 25-29, and 32-34 are rejected under 35 U.S.C. 103 as being unpatentable over Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred to as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred to as Esponda].
Regarding amended Claim 1, 
Qi teaches
(Currently Amended) A computer-implemented method for adaptively improving the performance of a current predictive model, the method comprising: 
selecting a set of labeled data instances that are stored in a labeled data reservoir with a first label and also labeled by a current predictive model with a second label different than the first label, the labeled data reservoir comprising a pool of candidate training data for the current predictive model (Examiner’s note: Qi teaches a multi-label active learning method for selecting samples for labeling a set of training samples, where the labeling involves the generation and assignment of a relevancy indication that corresponds to an assignment of a positive or negative label to an associated sample-label pair as part of the active learning classifier training, and where these positive or negative labels represent different labels assigned to each associated sample-label pair by the current predictive model. Qi further teaches that this labeling process is repeated until no further training is to be performed based on one or more criteria, such that both the active learning classifier trainer and the set of training samples are updated, where the updated training samples are stored in the labeled data reservoir, and the active learning classifier represents the current predictive model being trained/learned (Qi Figure 2, elements 104, 106(x), 108(x), and [0036]-[0040]: “FIG. 2 is a block diagram that illustrates an example scenario 200 for multi-label active learning in which sample-label pairs 212 are selected for labeling … Training samples set 104 may include any number of samples 106, each of which may have any number of associated labels 108. 
… sample-label pair selector 202 selects at arrow 204 a sample 106(x) and an associated label 108(x) to jointly form a sample-label pair 212 for labeling by oracle 110 … a sample-label pair may be selected responsive to a Bayesian classification error bound for a multi-label scenario … After sample-label pair selector 202 has selected a sample-label pair 212, active learning classifier trainer 102 submits the selected sample-label pair 212 to oracle 110 at arrow 206 for labeling … oracle 110 returns an indication of relevance 214 of the submitted label 108 to its associated sample 106. … After relevancy indication 214 is returned from oracle 110 at arrow 208, active learning classifier trainer 102 can update the set of training samples 104. With the updated training samples set 104, active learning classifier trainer 102 can update classifier 112 at arrow 210.”; Figure 3, [0044]: “… each label may be categorized or labeled as a positive concept (“P”), as a negative concept (“N”) … more samples 106 and labels 108 than those that are explicitly illustrated may be present …”; and Figure 4, [0049]-[0052]: “… At block 404, a current set of training samples is analyzed … At block 406, a sample-label pair is selected for labeling … At block 408, at least one selected sample label pair is submitted to an oracle … At block 410, a relevancy indication for the selected sample-label pair may be received from the oracle … a positive or a negative indication of the conceptual relevancy 214 of label 108(3)c to sample 106(3) may be received from oracle 110. … At block 412, the current set of training samples is updated with the received relevancy indication … active learning classifier trainer 102 may add the positive/negative relevancy indication 214 at label 108(3)c of associated sample 106(3) in set of training samples 104. At block 414, the classifier is updated. … At block 416, it is determined if additional classifier training is to be performed. 
… If more training is to be performed … the method of flow diagram 400 continues at block 404 …”).), 
wherein selecting the set of labeled data instances is based at least in part on a statistical distribution of the candidate training data and a determination that re-training the current predictive model with updated training data that comprises at least the set of labeled data instances will result in improved model performance (Examiner’s note: Qi teaches selecting a sample-label pair to minimize an expected error parameter for the classifier, where the selected multi-label samples being updated with labels are based on a marginal sample distribution P(x), with this marginal sample distribution representing “a statistical distribution” (Qi [0092]: “… classifier 112 is to classify objects in accordance with multiple labels that are also associated with samples of a set of training samples 104 (of FIGS. 1 and 2). Sample-label pair selector 202 analyzes the set of training samples 104 and selects a sample-label pair 212 (of FIG. 2) responsive to at least one error parameter.”; and [0055]-[0056]: “… the ASL learning requests label annotations on the basis of sample-label pairs, which once incorporated into the training set, are expected to result in the lowest generalization error. A Multi-Labeled Bayesian Error Bound is derived with a selected sample-label pair under a multi-label setting, and ASL accordingly selects the optimal pairs to minimize this bound. … For each sample x, it has m labels $y_i$ (1≤i≤m) … Let P(y|x) be the unknown conditional distribution over the samples, where y=$\{0,1\}^m$ is the complete label vector and P(x) is the marginal sample distribution.”). Qi further teaches that during training, an error parameter (described as a classification error or generalization error) is determined, where minimizing the classification error or generalization error maximizes the expected predictive capability of the classifier, thus leading to increased model accuracy and hence improving model performance (Qi [0017]; [0038]: “… This sample-label pair selection may be made responsive to an error parameter … a sample-label pair may be selected responsive to a Bayesian classification error bound for a multi-label scenario. More specifically, a sample-label pair may be selected so as to reduce, if not minimize, an expected Bayesian error.”; Figure 4 and [0049]). Although Qi [0033] is in the context of the single label case, the same criterion is also applicable for the multi-label case, where “a convergence of the expected/estimated error performance” over multiple iteration instances of training will lead to improved model performance (Qi [0033]: “… The process can thus include sample selection 114, oracle labeling 116/118, training sample set updating 120, and classifier updating 122. The process may be iterated until a desired criterion is reached. This criterion may be, for example … a convergence of expected/estimated error performance …”; and [0052]).); 
generating a … model using the updated training data that comprises at least the set of labeled data instances (Examiner’s note: As indicated earlier, Qi teaches a multi-label active learning method for selecting samples for labeling a set of training samples, where the labeling involves the generation and assignment of a relevancy indication that corresponds to an assignment of a positive or negative label to an associated sample-label pair as part of the active learning classifier training, where the iterative training of this active learning classifier using updated sample-label pairs from the set of training samples represents the generation of the model using the updated training data (Qi Figure 2, elements 104, 106(x), 108(x), and [0036]-[0040]; Figure 3, [0044]; and Figure 4, [0049]-[0052]).); 
in response to determining that a first performance of the … model is improved … instantiating the updated training data set and the … model (Examiner’s note: As indicated earlier, Qi teaches that during training, an error parameter (described as a classification error or generalization error) is determined, where minimizing the classification error or generalization error maximizes the expected predictive capability of the classifier, thus leading to increased model accuracy and hence improving model performance (Qi [0017]; [0038]; Figure 4 and [0049]). After a certain stopping criterion is reached, the model is considered trained, and along with the updated labeled samples returned to the set of training samples, both the model and the updated labeled samples (corresponding to the “updated training data set”) are considered instantiated at this point (Qi [0033], [0052]: “… At block 416, it is determined if additional classifier training is to be performed. … this determination may be made with reference to one or more criteria. … If no more training is to be performed … then at block 418, the final classifier is produced. Classifier 112 may then be used to label new objects.”).).  
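For illustration only, the iterative flow of Qi Figure 4 (blocks 404-418) cited above can be sketched as a short loop. The sketch is not taken from Qi: the names select_pair, active_learning, oracle, expected_error, train, and max_rounds are hypothetical, the error estimate is supplied by the caller rather than derived from the Bayesian error bound, and a fixed round count stands in for the stopping criteria of block 416.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[int, int]                 # (sample index, label index)
Labels = Dict[Pair, Optional[bool]]    # True/False = oracle relevancy, None = not yet labeled


def select_pair(labels: Labels,
                expected_error: Callable[[Pair], float]) -> Optional[Pair]:
    """Blocks 404-406: pick the unlabeled sample-label pair with the lowest
    expected error (the error estimate itself is an assumed callable)."""
    unlabeled = [pair for pair, relevancy in labels.items() if relevancy is None]
    if not unlabeled:
        return None
    return min(unlabeled, key=expected_error)


def active_learning(labels: Labels,
                    oracle: Callable[[Pair], bool],
                    expected_error: Callable[[Pair], float],
                    train: Callable[[Labels], object],
                    max_rounds: int):
    """Run the select / query / update loop and return (classifier, labels)."""
    labels = dict(labels)                  # work on a copy of the training set
    classifier = train(labels)
    for _ in range(max_rounds):            # block 416 (assumed: fixed round budget)
        pair = select_pair(labels, expected_error)
        if pair is None:                   # nothing left to label
            break
        labels[pair] = oracle(pair)        # blocks 408-410: relevancy indication
        classifier = train(labels)         # blocks 412-414: update set and classifier
    return classifier, labels              # block 418: final classifier
```

In this sketch the training set and the classifier are both updated in every round, which mirrors the order of blocks 412 and 414; any real selection criterion would replace the caller-supplied expected_error.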
However, Qi does not teach
… generating a candidate model … that comprises at least the set of labeled data instances …
… determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model.
Esponda teaches
… generating a candidate model … that comprises at least the set of labeled data instances (Esponda Figure 1B, elements 114, 120: examiner’s note: Esponda teaches training a dynamic classifier using training data received as input data (instance data with labels, Esponda [0032], [0054]) during a training phase, where in the first level, the dynamic classifier contains multiple data models M1..Mn (each one representing a candidate model), and the second level uses model selection 132 and integrator 128 to process a subset of the model outputs (predictions) along with the training data to generate one or more predictions (hence representing the generation of a candidate model among multiple data models) (Esponda [0041]-[0044]: “…the dynamic classifier 114 is comprised of three levels. The first level includes multiple data models M1 through Mn (hereinafter collectively referred to as "data models M"). Data models M receive input data 120 and generates model outputs MO1 through MOn (hereinafter collectively referred to as "model outputs MO"). Each of the model outputs MO represents a prediction made by each of the data models MO1 through MOn based on input data 120. … The second level of the dynamic classifier 114 receives and process a subset of the model outputs MO along with instance data to generate one or more intermediate predictions using two or more modules using different algorithms. … Model selection 132 and integrator 128 generate first intermediate prediction 133 and second intermediate prediction 129, respectively.”).) …
… determining that a first performance of the candidate model is improved from a second performance of the current predictive model, instantiating … the updated model (Esponda Figure 8, elements 820, 824: examiner’s note: In the context of generating a first intermediate prediction, Esponda teaches selecting (instantiating) a final model after choosing a model from among M1..Mn with the highest model output (prediction) as a first model, and a model from among M1..Mn with the lowest model output (prediction) as a second model, and using the corresponding model oracles to generate confidence scores (reflecting prediction accuracy, which is a form of performance metric) based on their model outputs (Esponda [0030], [0049], [0051]), where the model with the highest confidence score (reflecting the highest accuracy of a prediction) is selected (instantiated) as the final model for generating the prediction, where the selection of this final model among multiple models corresponds to instantiating the updated model (Esponda [0071]-[0072]: “A final model is then selected 820 from the first and second models based on the first and second confidence values. Specifically, one of the first and second models having their corresponding oracles produce a higher confidence is selected as the final model. … The model output of the final model is then sent 824 out as first intermediate prediction 133 from model selector 132.”), and where the confidence scores indicating prediction accuracy are generated by the oracles and are represented by class probabilities (Esponda [0050]), which also serves as the current predictive model for the next iteration instance (Esponda [0051]: “In non-binary classification, the output selector 210 may simply choose a model output of a model predicted by oracles to be the most accurate as first intermediate prediction 133 without using the Min-Max function. 
In some embodiments, the output selector 210 may generate a default value as first intermediate prediction 133 if the confidence values of the all oracles are below a certain level.”).).
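For illustration only, the selection described in the cited passages of Esponda ([0051], [0071]-[0072]) can be sketched as follows: the models with the highest and lowest outputs are taken as the first and second models, the one whose oracle reports the higher confidence supplies the intermediate prediction, and a default is returned when every confidence falls below a level. The function name select_final_output, its parameters, and the tie-breaking rule (ties go to the first model) are assumptions made for this sketch and do not appear in Esponda.

```python
from typing import Optional, Sequence


def select_final_output(model_outputs: Sequence[float],
                        oracle_confidences: Sequence[float],
                        min_confidence: float = 0.0,
                        default: Optional[float] = None) -> Optional[float]:
    """Return the output of the model whose oracle is more confident,
    choosing between the highest-output and lowest-output models."""
    if not model_outputs or len(model_outputs) != len(oracle_confidences):
        raise ValueError("need one confidence value per model output")

    # Default value when all oracle confidences are below the given level.
    if max(oracle_confidences) < min_confidence:
        return default

    indices = range(len(model_outputs))
    first = max(indices, key=lambda i: model_outputs[i])    # highest model output
    second = min(indices, key=lambda i: model_outputs[i])   # lowest model output

    # Assumed tie-break: prefer the first model when confidences are equal.
    if oracle_confidences[first] >= oracle_confidences[second]:
        final = first
    else:
        final = second
    return model_outputs[final]
```

Note that the third model in a call such as select_final_output([0.2, 0.9, 0.5], [0.8, 0.6, 0.99]) is never a candidate even though its oracle is the most confident, because only the highest-output and lowest-output models are compared in this variant.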
Both Qi and Esponda are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the process of using oracles in the single classifier model method of Qi and extend it by implementing an ensemble model method where multiple oracles are associated with multiple models of Esponda as a way to improve the predictive ability of a model used in a system. The motivation to combine is taught in Esponda, as ensemble methods involving multiple models obtain better predictive performance versus updating a single model. Both Qi and Esponda use oracles to help identify training data that result in improved predictions (the oracle in Qi uses label relevancy identifiers while the oracles in Esponda use confidence scores, which are similar as they both identify relevant or related training data to improve predictability of a model), and thus it would have been obvious to a person having ordinary skill in the art to also implement the oracles from Esponda to generate confidence scores to facilitate a more qualitative comparison between a current predictive model and a candidate model, such that the final candidate model represents an improvement of the predictive performance, thus making the system more efficient and accurate in terms of making predictions (Esponda [0006]: “An ensemble method uses multiple distinct models to obtain better predictive performance than could be obtained from any of the individual models.”; and [0030]: “Embodiments relate to a dynamic classifier for performing classification of an action or event associated with instance data using oracles that predict accuracy of predictions made by corresponding models. An oracle corresponding to a model is trained to generate a confidence value that represents accuracy of a prediction made by the model. Based on the confidence value and predictions, one of multiple models is selected and its prediction is used as an intermediate prediction. 
… By using the confidence value for each model and for each instance data, a more accurate prediction can be made.”).
Regarding amended Claim 14, 
Claim 14 recites a computer program product, stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Qi and Esponda as indicated in Claim 1. In addition, Qi teaches a computer device with processors and processor-accessible storage media containing computer instructions, with the media including removable and non-removable media corresponding to a computer program product (Qi Figure 7, elements 700, 702, 706, 708, 710; [0035]; and [0096], [0098], [0100]-[0101]).
Regarding previously presented Claim 15, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein the labeled data reservoir comprises data collected over time from input data being processed by the current predictive model (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites that the labeled data reservoir contains data collected over time. Qi teaches that samples 106 collected for labeling can cover a range of items, and can include a large number of samples from different sources and of different types. A person having ordinary skill in the art would understand that these samples representing data items of various types are collected over a period of time (Qi [0030]: “Samples 106 may correspond to text items, images, videos, biological data, combinations thereof, or any other type of data set. Although a single sample 106 that is associated with two labels 108a and 108b is explicitly shown, there may be many (e.g., dozens, hundreds, thousands, or more of) such samples 106. Also, each sample 106 may be associated with any number of labels 108.”).).  
Regarding previously presented Claim 16, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein each labeled data instance from the set of labeled data instances is associated with a true label representing the data instance (Qi Figure 1, elements 106, 108, 110, 116, 118: examiner’s note: Qi teaches an oracle providing labeling for selected labeled data instances, along with a relevancy tag indicating the relevancy for the label that was associated with the labeled data instance; a similar oracle is also indicated for the multi-label case (Qi [0040], Figure 2, elements 106(x), 108(x), 110, 206, 208) (Qi [0032]: “Oracle 110 may also be termed a teacher, an annotator, and so forth. Oracle 110 is typically a human or a group of humans that is capable of labeling each sample. The labeling may indicate, for example, a relevancy of label 108 to its associated sample 106. If two labeling categories are permitted for each label concept, the relevancies may be positive/negative, relevant/not relevant, related/not related, and so forth. … Oracle 110 provides or inputs the labeled relevancy at arrow 118 to active learning classifier trainer 102.”). Qi further teaches a true label ys (Qi [0061]: “…true label of the selected sample ys …”), where ys is indicated in [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance.).  
Regarding previously presented Claim 19, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein generating the updated training data comprises identifying and removing outlier instances (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi [0044]-[0045]: “… As indicated by legend 308, each label may be categorized or labeled as a positive concept ("P"), as a negative concept ("N"), as an unlabeled concept ("?"), or it may be selected for labeling of the concept ("S"). As indicated by the ellipses (" ... ") in each matrix 302, more samples 106 and labels 108 than those that are explicitly illustrated may be present. … The illustrated example labeling states for matrices 302B and 302A are as follows. For the before ASL matrix 302B, sample X1 has three associated labels that are: ?, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: ?, ?, and ?. Sample Xn has three associated labels that are: P, ?, and P. For the after ASL matrix 302A, sample X1 has three associated labels that are: S, ?, and P. Sample Xi has three associated labels that are: ?, P, and N. Sample Xj has three associated labels that are: S, ?, and S. Sample Xn has three associated labels that are: P, ?, and P. 
Thus, example ASL procedure 300 has selected three sample-label pairs for labeling. These three sample-label pairs include one with sample X1 and two with sample Xj.”).).  
Regarding previously presented Claim 20, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 19, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (Qi [0034]: “classifier 112 outputs one or more predicted labeled concepts in accordance with its trained classifying algorithm. Classifier 112 may employ any classifying algorithm.”), and 
wherein selecting the set of labeled data instances from the labeled data reservoir comprises: identifying and removing outlier instances in one predictive category (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: As indicated earlier, Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi [0044]-[0045]).).  
Regarding previously presented Claim 21, 
 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein the labeled data reservoir comprises labeled data instances that are received from multiple sources (Examiner’s note: As indicated earlier, Qi teaches the collected training samples for labeling can cover a range of items (text, images, videos, biological data, etc.), where these range of items represent data from multiple sources (Qi [0030]).), and 
wherein selecting a labeled data instance from the set of labeled data instances comprises: selecting the labeled data instance in response to a source of the labeled data instance matching with a pre-determined source (Examiner’s note: As indicated earlier, Qi teaches a true label ys (Qi [0061]: “…true label of the selected sample ys …”), where ys is indicated in [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance. As indicated earlier, Qi additionally teaches an oracle for the multi-label case (Qi [0040] and Figure 2, elements 106(x), 108(x), 110, 206, 208). As part of the selection of a labeled data instance (sample) for training, oracle 110 provides a label relevancy identifier for a label 108 chosen with a sample 106 from the set of training samples 104, with the relevancy indicating categories such as: positive/negative, relevant/not relevant, related/not related, etc., where the label relevancy identification represents a condition where the pre-determined source (label 108) matches the sample source (true label provided by the oracle) (Qi [0032]: “Oracle 110 may also be termed a teacher, an annotator, and so forth. Oracle 110 is typically a human or a group of humans that is capable of labeling each sample. The labeling may indicate, for example, a relevancy of label 108 to its associated sample 106. If two labeling categories are permitted for each label concept, the relevancies may be positive/negative, relevant/not relevant, related/not related, and so forth. … Oracle 110 provides or inputs the labeled relevancy at arrow 118 to active learning classifier trainer 102.”).).  
Regarding previously presented Claim 22, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein generating the updated training data is based on a greedy algorithm, the generating comprising: 
generating a first training data set by adding a first subset of the labeled data instances to at least a portion of the candidate training data (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: Under its broadest reasonable interpretation, the term “greedy algorithm” broadly recites an algorithm or a series of steps that is exhaustive or thorough in nature to achieve its goal. As indicated earlier, Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category), thus representing an exhaustive search corresponding to a greedy algorithm (Qi [0044]-[0045]). In this case, the sample X1 with one associated label represents a first (candidate) training data set containing a first subset of labeled data instances (e.g., sample X1 with one associated label) added to the first (candidate) training data set.); and 
generating a second candidate training data set by adding a second subset of the labeled data instances to the first training data set (Qi Figure 3, elements X1 and Xj (before ASL and after ASL): examiner’s note: As indicated earlier, Qi teaches performing active sampling and labeling (ASL) on data instances to select the most informative samples and labels for labeling, such that the process (e.g., oracle) categorizes multiple labels with label relevancy identifier (as positive concept, negative concept, unlabeled concept, or selected/not selected), with those samples and associated labels being selected for labeling (e.g., sample X1 and one label, sample Xj and two labels) representing the concept of identifying data instances used for further training, with the other samples and labels that were not selected for further training representing the concept of identifying and removing outlier instances according to their corresponding labels (which are a form of a predictive category) (Qi [0044]-[0045]). In this case, the sample Xj with two associated labels represents a second candidate training data set containing a second subset of labeled data instances (e.g., sample Xj with a first associated label; sample Xj with a second associated label) added to the first (candidate) training data set.).  
Regarding previously presented Claim 25, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, the instructions, when executed on one or more computers cause the one or more computers to perform operations (This claim limitation is similar in scope to a corresponding claim limitation in Claim 14, and hence is rejected under similar rationale.) comprising: 
calculating a cross-validation between the candidate model performance and the current predictive model performance (Esponda Figure 8, elements 820, 824: examiner’s note: In the context of generating a first intermediate prediction, Esponda teaches selecting (instantiating) a final model after choosing a model from among M1..Mn with the highest model output (prediction) as a first model, and a model from among M1..Mn with the lowest model output (prediction) as a second model, and using the corresponding model oracles to generate confidence scores (reflecting prediction accuracy) based on their model outputs (Esponda [0030]; [0048]-[0049]; [0051]), where the model with the highest confidence score (reflecting the highest accuracy of a prediction) is selected (instantiated) as the final model for generating the prediction (Esponda [0071]-[0072]), and where the confidence scores indicating prediction accuracy are generated by the oracles and are represented by class probabilities (Esponda [0050]), which also serves as the current predictive model for the next iteration (Esponda [0051]). Hence, the comparison of confidence scores, along with the process of generating the intermediate prediction (through calculation of confidence scores), is interpreted as a cross-validation between the candidate model performance and the current predictive model performance.).  
Regarding previously presented Claim 26, 
Qi in view of Esponda teaches
(Previously Presented) The computer program product of claim 14, 
wherein the candidate model is a first candidate model (Esponda Figure 1B, elements 114, M1..Mn; Figure 2, elements M1..Mn; [0042]: “…the dynamic classifier 114 is comprised of three levels. The first level includes multiple data models M1 through Mn (hereinafter collectively referred to as "data models M")”, where one of the multiple data models corresponds to a first candidate model.), and 
wherein the instructions, when executed on the one or more computers, further cause the one or more computers to perform operations (This claim limitation is similar in scope to a corresponding claim limitation in Claim 14, and hence is rejected under similar rationale.) comprising: 
generating an assessment for first performance of the first candidate model and the second performance of the second candidate model in parallel (Examiner’s note: Esponda teaches the process of performing intermediate predictions may be performed in parallel (Esponda [0067]: “ … although the process in FIG. 7 is illustrated as generating first intermediate prediction 133 before generating second intermediate prediction 129, second intermediate prediction 129 may be generated before first intermediate prediction 133 or both intermediate predictions 129, 133 may be generated in parallel. … In other embodiments, more than two intermediate predictions may be generated by one or more additional modules in the second level of dynamic classifier 114 to generate final prediction 152.”). For the case of two models among the multiple data models (selected as a first candidate model and a second candidate model), this may involve computing confidence scores for both models, as indicated in Esponda [0074]: “Also, instead of generating the confidence values for only the first and second models, the confidence values for all models may be computed. Then, a model with the highest confidence value may be selected as the final model.”, thus indicating that the same assessment is done for both models in parallel.).  
Regarding amended Claim 27, 
Claim 27 recites a system, comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Qi and Esponda as indicated in Claim 1. In addition, Qi teaches a computer device with processors and processor-accessible storage media containing computer instructions, with the computer device containing processor-accessible storage media corresponding to a computer system as recited by the claims (Qi Figure 7, elements 700, 702, 706, 708, 710; [0035]; and [0096], [0098], [0100]-[0101]).
Regarding previously presented Claim 28, 
Claim 28 recites the system of claim 27, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 15, and hence is rejected under similar rationale provided by Qi in view of Esponda as indicated in Claim 15, in view of rejections from Claim 27.
Regarding previously presented Claim 29, 
Claim 29 recites the system of claim 27, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 16, and hence is rejected under similar rationale provided by Qi in view of Esponda as indicated in Claim 16, in view of rejections from Claim 27.
Regarding previously presented Claim 32, 
Claim 32 recites the system of claim 27, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 19, and hence is rejected under similar rationale provided by Qi in view of Esponda as indicated in Claim 19, in view of rejections from Claim 27.
Regarding previously presented Claim 33, 
Claim 33 recites the system of claim 32, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 20, and hence is rejected under similar rationale provided by Qi in view of Esponda as indicated in Claim 20, in view of rejections from Claim 32.
Regarding previously presented Claim 34, 
Claim 34 recites the system of claim 27, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 21, and hence is rejected under similar rationale provided by Qi in view of Esponda as indicated in Claim 21, in view of rejections from Claim 27.
Claims 17, 23, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred to as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred to as Esponda] as applied to Claims 14, 16, and 29; in further view of Lin et al., U.S. PGPUB 2012/0284213, published 11/8/2012 [hereafter referred to as Lin].
Regarding previously presented Claim 17, 
Qi in view of Esponda as applied to Claim 16 teaches
(Previously Presented) The computer program product of claim 16.
However, Qi in view of Esponda does not teach  
wherein the determination is further based at least in part on a quality of the training data.
Lin teaches
wherein the determination is further based at least in part on a quality of the training data (Examiner’s note: Under its broadest reasonable interpretation, the term “quality of the training data” broadly recites a property of the sample data that provides the most information for training a predictive model. Lin teaches implementing a richness score (Lin [0003], [0130]) indicating how information-rich a particular data sample is in comparison to other retained data samples for purposes of testing the accuracy of a trained predictive model, where the richness score is calculated based on the number of nearby different samples and the total number of nearby samples (Lin [0138]-[0142]), where the richness score represents a measure of quality of the training data as it provides an indication of similarity between different samples with different prediction outputs (Lin [0131]: “… if multiple data samples are clustered together, e.g., exhibit a high degree of similarity in features, then a small number of the data samples in the cluster can be given a relatively higher richness score then the rest in the cluster, which can be given a relatively low richness score on account of their redundancy. By comparison, a data sample whose nearest neighbor (when comparing features) is a data sample having a different output value is considered an information-rich data sample.”), thus introducing diversity of samples into a training set to improve the accuracy of the model. The richness score is used to decide on retaining the data samples (Lin [0134]: “…there is value in retaining a portion of these data samples, however, retaining all of them is of less value on account of the redundancy between the data samples.”), since samples that do not provide as much value for training (and hence do not contribute to the quality of the training) will not be retained.).
Both Qi in view of Esponda and Lin are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the oracles of Qi in view of Esponda and incorporate the mechanism of Lin for calculating an information richness score as a way to tag and identify different data samples that share similarities. The motivation to combine is taught in Lin, as a way to identify different data samples with enough similarities to provide diversity to the training, thereby improving the accuracy of the predictive model. A secondary motivation is also taught in Lin: through the use of richness scores, a decision can further be made to retain in memory/storage only those data samples that are considered informative to making model predictions, thereby also improving storage efficiency and performance of the system by reducing its storage requirements while maximizing the predictive accuracy of the model (Lin [0136]: “Having data samples that have borderline features values can therefore be informative when testing the accuracy of a trained predictive model in making predictions, particularly in borderline cases. Accordingly, these data samples are considered information-rich and are assigned relatively high richness scores.”; and [0151]: “Training data may be scored for richness so that the amount of training data retained in the training data repository 214 can be kept to a desired volume, e.g., on account of memory space restrictions. The training-richness-score can be also used to optimize what training samples are used to train and retrain models, so as to provide optimally trained predictive models in view of the input data expected to be received going forward with prediction requests. As is already described above, the test data set is used to test the accuracy of a trained predictive model before the data samples included in the test data set are used to train (or retrain) the trained predictive model.”).
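For illustration only, the kind of richness score characterized above (a score based on the number of nearby samples with a different output and the total number of nearby samples) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Lin: the function name richness_scores, the use of Euclidean distance, the choice of k nearest neighbors as the "nearby" samples, and the simple ratio are all hypothetical choices made for the example.

```python
from math import dist


def richness_scores(samples, k=3):
    """Return one score in [0, 1] per sample.

    samples: list of (features, label) pairs, where features is a tuple of numbers.
    The score is an ASSUMED formula for illustration: the fraction of the k
    nearest neighbors whose label differs from the sample's own label.
    """
    scores = []
    for i, (x_i, y_i) in enumerate(samples):
        # Order every other sample by Euclidean distance to sample i.
        others = sorted(
            (s for j, s in enumerate(samples) if j != i),
            key=lambda s: dist(x_i, s[0]),
        )
        neighbors = others[:k]
        # Count the nearby samples that carry a different output value.
        differing = sum(1 for _, y in neighbors if y != y_i)
        scores.append(differing / len(neighbors) if neighbors else 0.0)
    return scores
```

Under these assumptions, a sample inside a tight cluster of identically labeled samples scores 0 (redundant), while a sample whose nearest neighbors carry a different label scores close to 1 (information-rich), which matches the qualitative behavior quoted from Lin [0131].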
Regarding previously presented Claim 23, 
Qi in view of Esponda as applied to Claim 14 teaches
(Previously Presented) The computer program product of claim 14.
However, Qi in view of Esponda does not teach
wherein generating the updated training data is based on a non-greedy algorithm, the generating comprising: replacing at least a portion of the candidate training data with a subset of the labeled data instances.  
Lin teaches
wherein generating the updated training data is based on a non-greedy algorithm, the generating comprising: replacing at least a portion of the candidate training data with a subset of the labeled data instances (Examiner’s note: Under its broadest reasonable interpretation, the term “non-greedy algorithm” broadly recites an algorithm or a series of steps that is performed with minimal effort (i.e., not requiring an exhaustive search) to achieve its goal (which in the context of the claims, the goal is to update training data by performing a replacement of some of the candidate training data with a subset of labeled data instances). As indicated earlier, Lin teaches implementing a richness score (Lin [0130]) indicating how information-rich a particular data sample is in comparison to other retained data samples for purposes of testing the accuracy of a trained predictive model, where the richness score is used as a threshold to decide on retaining the data samples (Lin [0134]: “there is value in retaining a portion of these data samples, however, retaining all of them is of less value on account of the redundancy between the data samples. … In another example, a data sample is selected to be assigned a richness score of 0 (i.e., to effectively remove the data sample from the test data) based on whether removal of the data sample will increase the overall score of the data samples.”). In this context, the removal of samples from the updated training data (set) using a richness score representing a threshold is interpreted as a form of non-greedy replacement of the candidate training data.).  
Both Qi in view of Esponda and Lin are analogous art since they both teach improving predictions in machine learning models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the oracles of Qi in view of Esponda and incorporate the mechanism of Lin for calculating an information richness score as a way to tag and identify different data samples that share similarities. The motivation to combine is taught in Lin, as provided in the prior art claim mapping of Claim 17 recited above.
Regarding previously presented Claim 30, 
Claim 30 recites the system of claim 29, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 17, and hence is rejected under similar rationale and motivations provided by Qi in view of Esponda and Lin as indicated in Claim 17, in view of the rejection of Claim 29.
Claims 18 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over 
Qi et al., U.S. PGPUB 2009/0125461, published 5/14/2009 [hereafter referred as Qi] in view of Esponda et al., U.S. PGPUB 2014/0279745, published 9/18/2014 [hereafter referred as Esponda], in further view of Lin et al., U.S. PGPUB 2012/0284213, published 11/8/2012 [hereafter referred as Lin] as applied to Claims 17 and 30; in even further view of Attenberg et al., "Chapter 6: Class Imbalance and Active Learning", In Imbalanced Learning: Foundations, Algorithms, and Applications, published 2013 [hereafter referred as Attenberg].
Regarding previously presented Claim 18, 
Qi in view of Esponda, in further view of Lin as applied to Claim 17 teaches
(Previously Presented) The computer program product of claim 17, 
wherein the current predictive model is a classifier predicting to which of a set of predictive categories an input data instance belongs (Qi [0034]: “… classifier 112 outputs one or more predicted labeled concepts in accordance with its trained classifying algorithm. Classifier 112 may employ any classifying algorithm.”), 
wherein a true label associated with a labeled data instance identifies the predictive category to which the labeled data instance belongs (Qi Figure 1, elements 106, 108, 110, 116, 118: examiner’s note: As indicated earlier, Qi teaches an oracle for the multi-label active labeling method (Qi [0032], [0040] and Figure 2, elements 106(x), 108(x), 110, 206, 208). Qi further teaches a true label ys (Qi [0061]: “…true label of the selected sample ys …”), where ys is indicated in [0058]: “tentatively selected to be requested for labeling (but not yet annotated by the oracle)…”, thus representing the predictive category for the labeled data instance.) …
While Qi in view of Esponda, in further view of Lin teaches selection of appropriate samples and labels according to different contributions to minimize the generalization error (which is interpreted as a way to avoid overfitting the classifier with the same data, Qi [0020], [0042]), Qi in view of Esponda, in further view of Lin does not explicitly teach
… wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data …
Attenberg teaches
… wherein selecting the set of labeled data instances from the labeled data reservoir is based at least in part on maintaining a class balance within the pool of candidate training data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites techniques or steps for selecting data instances in order to maintain a class balance for the candidate training samples. Attenberg teaches using a distance metric for pool-based active learning to select and label the most informative examples from datasets exhibiting moderate class imbalance in order to produce a more balanced class distribution, where those examples that lie closest to the hyperplane (i.e., closest to the margin or class decision boundary) contain the most information. Hence, a selection strategy that uses a distance metric to select the most informative examples closest to a class decision boundary in order to produce a more balanced class distribution corresponds to a process or technique for selecting labeled data instances based at least in part on maintaining a class balance for the selected training samples (representing the pool of candidate training data) (Attenberg p.104 Figure 6.1 and p.104 1st paragraph: “… the remainder of this chapter focuses on the latter of these two scenarios, the pool-based setting …”; p.106 Section 6.2.3: “… AL presents itself as an effective strategy for dealing with moderate class imbalance even without any special considerations for the skewed class distribution …”; and pp.110-111 Section 6.3 and Figure 6.3: “… From a traditional perspective, the active learner … aims to make a clever choice to select the most informative example to obtain its label … AL can still be leveraged to obtain the informative examples through training sets … the examples close to the hyperplane are the ones that yield the most information to the learner … the most commonly used AL strategy in SVMs is to check the distance of each unlabeled example to the hyperplane and focus on the examples that lie closest to the hyperplane … As shown in the figure, the imbalance ratio of the classes within the margin is much smaller than the class imbalance ratio of the entire dataset. Therefore, any selection strategy that focuses on the examples in the margin most likely ends up with a more balanced class distribution …”). Attenberg further teaches additional sample selection techniques such as density-sensitive active learning and skew-specialized active learning for training sample sets exhibiting more extreme class imbalances, and hence these additional selection techniques also represent additional processes or techniques for selecting labeled data instances based at least in part on maintaining a class balance for the selected training samples (Attenberg pp.106-110 Section 6.2.3: “… These skew-specialized AL techniques incorporate an innate preference for the minority class, leading to more balanced training sets and better predictive performance in imbalanced settings …”; and Sections 6.2.3.1 and 6.2.3.2).) …
Both Qi in view of Esponda, in further view of Lin and Attenberg are analogous art since they both teach pool-based active learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the active learning sample selection techniques taught in Qi in view of Esponda, in further view of Lin and enhance them to incorporate the distance metric, density-sensitive, and skew-specialized active learning sample selection techniques taught in Attenberg as a way to improve the class balance in training sample sets. The motivation to combine is taught in Attenberg, since selecting the most informative examples for labeling helps to maintain overall class balance for the training samples, which in turn minimizes the adverse effects of imbalanced data on a model’s generalization performance and focuses the computational resources on the selection of the most informative examples nearest a model’s decision boundary to improve the model’s performance around this decision boundary, leading to improvements in the model’s overall performance and accuracy (Attenberg p.101 Abstract; p.102 Section 6.1 2nd-3rd paragraphs: “… the goal here is to ensure that the budget is not predominantly expended on getting the labels of the majority class instances, and to make sure that the set of instances to be labeled have comparable number of minority class instances as well … The role of AL in this case is to reduce, and potentially eliminate, any adverse effects that the class imbalance can have on the model’s generalization performance … we would like to employ AL to select informative examples both from the majority and minority classes for labeling, subject to the constraints of a given budget …”; and p.104 2nd paragraph – p.105 1st paragraph: “… The most common techniques in AL have focused on selecting examples from a so-called region of uncertainty, the area nearest to the current model’s predictive decision boundary … Incorporating labeled examples from this region may improve the model’s performance along this boundary, leading to gains in overall accuracy.”).
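For illustration only, the margin-based selection strategy characterized above (checking the distance of each example to the hyperplane and focusing on the examples that lie closest to it) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Attenberg: the function names, the fixed linear hyperplane given by hypothetical parameters w and b, and the use of a majority-to-minority count as the imbalance ratio are all choices made for the example. The labels are used here only to measure the class balance of the selected subset.

```python
from collections import Counter


def signed_distance(w, b, x):
    """Signed distance of point x to the hyperplane w.x + b = 0."""
    norm = sum(wi * wi for wi in w) ** 0.5
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_near_hyperplane(pool, w, b, n):
    """Return the n (features, label) items of the pool closest to the hyperplane."""
    return sorted(pool, key=lambda item: abs(signed_distance(w, b, item[0])))[:n]


def imbalance_ratio(items):
    """Majority class count divided by minority class count (inf if one class is absent)."""
    counts = Counter(label for _, label in items)
    if len(counts) < 2:
        return float("inf")
    return max(counts.values()) / min(counts.values())
```

With a pool in which the majority class dominates far from the boundary, the subset returned by select_near_hyperplane would be expected to show a lower imbalance ratio than the full pool, which is the effect the quoted passage attributes to selection within the margin.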
Regarding previously presented Claim 31, 
Claim 31 recites the system of claim 30, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 18, and hence is rejected under similar rationale and motivations provided by Qi in view of Esponda, in further view of Lin and Attenberg as indicated in Claim 18, in view of the rejection of Claim 30.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121