Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments filed July 29th, 2021, in which Claims 1 and 11 are amended. No claims are cancelled or added. The amendments have been entered. Claims 1-20 are currently pending.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hughes et al. (US 20200202171 A1, hereinafter Hughes) in view of Breckenridge et al. (US 8595154 B2, hereinafter Breckenridge).

Regarding claim 1,
Hughes discloses a method comprising: storing, at a machine learning server computer, a plurality of machine learning training datasets (Hughes Fig. 1 Elements 100 & Annotation Server, 100, (i.e. machine learning (ML) server computer), Annotated Data, 104, (i.e. machine learning (ML) training datasets)); 
displaying, through a graphical user interface, a plurality of selectable options, each selectable option of the plurality of selectable options identifying a machine learning training dataset to be used for training a machine learning system among the plurality of machine learning training datasets (Hughes [0155] recites “During model building the annotation server 202 will guide the user through a series of steps in an automated fashion. In some embodiments, the user will designate certain annotation sets to be used for training a machine learning and others to be used for testing the quality of a machine learning model.” Additionally, Hughes fig. 13 and [0207] recites “While only two annotations are shown, other numbers of annotations may be provided for single-class or multi-class classifiers. A selectable dataset button 1314 facilitates adding additional datasets to be annotated. The datasets may be selectable from the set of datasets maintained though FIG. 10.” Examiner interprets user designated annotation sets (i.e. machine learning training dataset) used for training a machine learning model (i.e. machine learning system) and annotations able to provide for single-class or multi-class classifiers while guided by annotation server (i.e. machine learning server computer). Selectable datasets maintained through fig. 10 (i.e. plurality of machine learning datasets) and fig.13 (i.e. graphical user interface));
receiving, at the machine learning server computer, a combination of particular input dataset and a selection of a particular selectable option identifying a particular machine learning training dataset among the plurality of machine learning training datasets (Hughes [0145] recites “The annotation server 202 may receive the unannotated data 102 over a network 208 from an annotation client 206 for storage in the database 204. The annotation server 202 interacts with the annotation client 206 through one or more graphical user interfaces to the facilitate generation of the annotated data 104. Upon sufficient annotation of the unannotated data 102, as specified by one or more annotation training criteria (e.g., 20 annotations for each class), the annotation server 202 is configured to generate one or more intermediate models.” Annotation server (i.e. ML server computer) and unannotated data (i.e. particular input dataset). Additionally, Hughes [0155] recites “During model building the annotation server 202 will guide the user through a series of steps in an automated fashion. In some embodiments, the user will designate certain annotation sets to be used for training a machine learning and others to be used for testing the quality of a machine learning model.” User designated annotation sets for machine learning (i.e. selecting a particular ML training dataset), where receiving both is receiving a combination), the particular machine learning training dataset being for use in training the machine learning system, and the particular input dataset being to be computed by the machine learning system after the machine learning system is trained using the particular machine learning training dataset (Hughes [0155], “During model building the annotation server 202 will guide the user through a series of steps in an automated fashion. In some embodiments, the user will designate certain annotation sets to be used for training a machine learning and others to be used for testing the quality of a machine learning model.”);
training the particular machine learning system using the particular machine learning training dataset (Hughes [0156]-[0157] recites, in part, “Given training data and test data and a model type (e.g. text classifier, image classifier, semantic role labeling), the…” Training the given model type (i.e. particular ML system) with given training data (i.e. ML training dataset));
However, Hughes does not explicitly teach each machine learning training dataset of the plurality of machine learning training datasets comprising input data and output data, each machine learning training dataset of the plurality of machine learning training datasets being associated with a corresponding problem to be computed by a machine learning system; a machine learning system that computes an output corresponding to a particular problem associated with the identified machine learning training dataset for a given input; using the particular input dataset as input into the particular machine learning system, computing a particular output dataset.
Although Hughes teaches machine learning training datasets; displaying, through a graphical user interface, a plurality of selectable options, each selectable option of the plurality of selectable options identifying a machine learning training dataset to be used for training a machine learning system with the identified machine learning dataset among the plurality of machine learning training datasets, Hughes does not explicitly teach a machine learning system that computes an output corresponding to a particular problem associated with the identified machine learning training dataset for a given input. 
On the other hand, Breckenridge teaches each machine learning training dataset of the plurality of machine learning training datasets comprising input data and output data,
each machine learning training dataset of the plurality of machine learning training datasets being associated with a corresponding problem to be computed by a machine learning system (Breckenridge Pg. 15, Col. 6, Ln. 47-54 recites “The process 400 and system 200 can be used in various different applications. Some examples include (without limitation) making predictions relating to customer sentiment, transaction risk, species identification, message routing, diagnostics, churn prediction, legal docket classification, suspicious activity, work roster assignment, inappropriate content, product recommendation, political bias, uplift marketing, e-mail filtering and career counseling.” Customer sentiment, transaction risk, species identification, etc. (i.e. corresponding problem to be computed) and predictive model (i.e. machine learning system));
a machine learning system that computes an output corresponding to a particular problem associated with the identified machine learning training dataset for a given input (Breckenridge Pg. 15, Col. 6, Ln. 47-65 recites “The process 400 and system 200 can be used in various different applications. Some examples include (without limitation) making predictions relating to customer sentiment, transaction risk, species identification, … In this example, the client computing system 202 provides a web-based online shopping service. The training data includes multiple records, where each record provides the online shopping transaction history for a particular customer. The record for a customer includes the dates the customer made a purchase and identifies the item or items purchased on each date. The client computing system 202 is interested in predicting a next purchase of a customer based on the customer's online shopping…” Customer’s next purchase prediction (i.e. output), customer sentiment, transaction risk, species identification, customer online shopping, etc. (i.e. corresponding problem to be computed), predictive model (i.e. machine learning system), training data with multiple records of online shopping transaction history of customers (i.e. identified machine learning dataset), and record for a customer (i.e. given input));
using the particular input dataset as input into the particular machine learning system, computing a particular output dataset (Breckenridge Pg. 14-15, Col. 4-5, Ln. 65-67 & 1 recites “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output 114.” Selected model (i.e. particular machine learning system), input data (i.e. particular input dataset as input) and predictive output (i.e. particular output dataset)).
Hughes and Breckenridge are both directed to machine learning and training models. In view of the teachings of Breckenridge, it would have been obvious to one of ordinary skill in the art to apply the teachings of Breckenridge to Hughes before the effective filing date of the claimed invention in order to accommodate training various types of predictive models and generating associated predictive outputs with large amounts of training data thereby improving Hughes (cf. Breckenridge Pg. 12, Col. 1 Ln. 19-36, in part, recites the following: 
“Predictive analytics generally refers to techniques for extracting information from data to build a model that can predict an output from a given input. Predicting an output can include predicting future trends or behavior patterns, or performing sentiment analysis, to name a few examples. Various types of predictive models can be used to analyze data and generate predictive outputs. Typically, a predictive model is trained with training data that includes input data and output data that mirror the form of input data that will be entered into the predictive model and the desired predictive output, respectively. The amount of training data that may be required to train a predictive model can be large, e.g., in the order of gigabytes or terabytes. The number of different types of predictive models available is extensive, and different models behave differently depending on the type of input data. Additionally, a particular type of predictive model can be made to behave…”).
Regarding claim 2,
The Hughes/Breckenridge Combination teaches the method of claim 1, further comprising: storing, at the machine learning server computer, a confidence score threshold value (Hughes fig. 7 elements 714 & 716 and [0196]-[0197] recites, in part, “The priority queues 608 shown in the example of FIG. 7 include a priority queue 704 for samples with a high confidence prediction to be annotated with “Class A”… As discussed above, the sampling score may be the confidence score or a value otherwise derived by the prediction vector… [0197] If the sampling score is below a threshold value 716 for a given priority queue 608, then the priority queue 608 may discard 720 the prediction… In some embodiments, if the sampling score is not greater than any of the sampling scores of previously stored predictions, then the prediction is discarded.” Previously stored predictions with scores (storing confidence score values) and fig. 7 element 716 (i.e. confidence score threshold value));
the particular output dataset comprising, for each of a plurality of data items in the particular output dataset, an output confidence score (Hughes fig. 7 elements 702, 704, 706, 714 and [0168] & [0196] recites, in part, “At 404, a prediction set is generated by a model 406 predicting an annotation for samples in the set of training candidates or a subset thereof… The model 406 also provides a prediction vector score for each prediction. [0196] As a prediction 702 is streamed through the sampling storage writer 606, the prediction is provided to the plurality of priority queues 608. The priority queues 608 shown in the example of FIG. 7 include a priority queue 704 for samples with a high confidence prediction to be annotated with “Class A”, a priority queue 706 for samples with a high confidence prediction to be annotated with “Class B”, a priority queue 708 for samples with a high entropy (e.g., maintained in order of highest Shannon entropy), and a priority queue 710 for minimum margin samples. The samples are arranged in the priority queues in an order of increasing sampling score 714. As discussed above, the sampling score may be the confidence score or a value otherwise derived by the…” Figure 7 element 714 (i.e. confidence scores), prediction vector fig. 7 element 702 (i.e. particular output dataset), and fig. 7 elements 704 & 706 (i.e. data items with confidence scores));
determining that a subset of the plurality of data items in the particular output dataset comprise confidence scores below the confidence score threshold value (Hughes fig. 7 element 716 and [0197] Lines 1-5 recites “If the sampling score is below a threshold value 716 for a given priority queue 608, then the priority queue 608 may discard 720 the prediction.” Discarding prediction 720 (i.e. subset of data items) from prediction vector (i.e. particular output dataset) if sampling score (i.e. confidence score) is below a threshold value (i.e. determining subset of data items comprise confidence scores below threshold value));
identifying a subset of the particular input dataset that corresponds to the subset of the plurality of data items in the particular output dataset (Hughes [0168]-[0169] recites, in part, “At 402, the unannotated set of training candidates is received. Each data element in the set of training candidates is referred to as a sample of the unannotated data 102… [0168] At 408, the prediction set is evaluated based on the prediction vector of the predictions and a determination is made as to whether to request annotations of one or more of the samples. To facilitate rapid and focused training of the model through the annotation process, a sampled prediction set is generated by sampling the prediction set...” Sampling the prediction set based on the prediction vectors (i.e. particular output dataset) and each data element in set of training candidates referred to as samples (i.e. subset of particular input dataset)); 
training a second machine learning system using a second machine learning training dataset of the plurality of machine learning training datasets (Breckenridge Pg. 21, Col. 18, Ln. 32-34 and Col. 18, Ln. 43-45 recites, in part, “A new set of trained predictive models is generated using the updated training data and using training functions that are obtained from the training function repository 216 (Box 708)… A second trained predictive model can be selected to which access is provided to the client computing system 202 (Box 712).”); 
using the subset of the particular input dataset as input into the second machine learning system, computing a second output dataset (Breckenridge Pg. 19-20, Col. 14-15, Ln. 67 & 1-7);
replacing one or more data items in the particular output dataset with one or more corresponding items in the second output dataset (Hughes [0197] recites “However, if the given priority queue 608 is full, then the sampling score is compared against one or more of the sampling scores of prior saved predictions in the priority queue 608. In some embodiments, if the sampling score is not greater than any of the sampling scores of previously stored predictions, then the prediction is discarded. Otherwise, the prediction is saved in the priority queue 608 at a location in accordance with its priority score and a lowest scoring prediction is removed from the priority queue 608.”).
Please see motivation for claim 1 above.
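For illustration only, the sequence of steps recited in claim 2 (threshold comparison, identification of the corresponding inputs, computation by a second system, and replacement) can be sketched as follows. The sketch is not taken from Hughes or Breckenridge and forms no part of the rejection; the function name `refine_with_second_model`, the callable `second_model`, and the `(label, confidence)` tuple layout are hypothetical and chosen here only for readability.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout for one output item: (output label, confidence score).
Prediction = Tuple[str, float]


def refine_with_second_model(
    inputs: Sequence[str],
    first_outputs: Sequence[Prediction],
    second_model: Callable[[str], Prediction],
    threshold: float,
) -> List[Prediction]:
    """Re-run low-confidence items through a second model and replace them."""
    if len(inputs) != len(first_outputs):
        raise ValueError("inputs and first_outputs must align one-to-one")

    refined = list(first_outputs)
    # Determine which output items have confidence below the stored threshold.
    low_idx = [i for i, (_, score) in enumerate(first_outputs) if score < threshold]
    for i in low_idx:
        # The corresponding subset of the input dataset goes to the second model,
        # and its output replaces the original low-confidence item.
        refined[i] = second_model(inputs[i])
    return refined
```

In this sketch every low-confidence item is replaced unconditionally; the further condition of claim 3 (replacing only when the second output exceeds the threshold) is not modeled.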

Regarding Claim 4.
The Hughes/Breckenridge Combination teaches the method of claim 2, further comprising: in response to determining that the subset of the plurality of data items comprise confidence scores below the confidence score threshold value (Hughes [0197] recites “As a new prediction is received, each of the priority queues 608 evaluate the sampling score for the new prediction. If the sampling score is below a threshold value 716 for a given priority queue 608, then the priority queue 608 may discard 720 the prediction.”), 
displaying on the graphical user interface the plurality of selectable options (Hughes Fig. 18 Elements 1814 and [0214] & [0216] recites, in part, “The graphical user interface 1800 provides for minimizing the cognitive load and expertise required to train a machine learning model. [0216] Feedback 1814 is provided to the user on the quality and quantity of data…”);
receiving a selection of a second selectable option corresponding to the second machine learning training dataset (Hughes [0154] and [0155] recites “In some implementations, multiple models may be built and compared using common measures of quality against the annotated test set. During model building the annotation server 202 will guide the user through a series of steps in an automated fashion”. Examiner interprets annotation server guiding the user through a series of steps using the GUI.) and,
in response, training the second machine learning system using the second machine learning training dataset (Breckenridge Pg. 21, Col. 18, Ln. 32-34 and Col. 18, Ln. 43-45 recites, in part, “A new set of trained predictive models is generated using the updated training data and using training functions that are obtained from the training function repository 216 (Box 708)… A second trained predictive model can be selected to which access is provided to the client computing system 202 (Box 712).”).
Please see motivation for claim 1 above.

Regarding Claim 5.
The Hughes/Breckenridge combination teaches the method of claim 2 wherein the second machine learning training dataset comprises a combination of two or more machine learning training datasets of the plurality of machine learning training datasets (Breckenridge Pg. 21, Col. 17, Ln. 53-60 recites “Updated training data is generated (Box 706)…”).
Please see motivation for claim 1 above.

Regarding Claim 6.
The Hughes/Breckenridge combination teaches the method of claim 5 wherein the combination of two or more machine learning training datasets comprises the particular machine learning training dataset (Breckenridge Pg. 21, Col. 18, Ln. 22-26 recites “For illustrative purposes, in one example the updated training data can be generated by combining the training data in the training data queue together with the training data already stored in the training data repository 216 (e.g., the initial training data).”).
Please see motivation for claim 1 above.

Regarding Claim 7.
The Hughes/Breckenridge combination teaches the method of claim 2, further comprising: displaying, through the graphical user interface, a plurality of selectable category options for the particular input dataset (Hughes Fig. 11 (1100, 1102) & Fig. 12 (1200) and [0204]-[0205] recites, in part, “One or more annotation sets 104, such as annotated or unannotated sets of training candidates or sets of test data are provided in a list of selectable annotation sets 1102 that have been generated from the unannotated data 102. [0205] FIG. 12 illustrates an example of various categories of sets of annotations, in accordance with an example embodiment of the disclosed technology. For example, the annotation sets may be categorized in…” Fig. 12 elements 1202-1208 (i.e. categories of annotation sets) for list of selectable annotation sets which include unannotated sets of training candidates (i.e. input dataset));
wherein the machine learning server computer stores data associating the particular machine learning training dataset with a particular category identified by a particular selectable category option of the plurality of selectable category options (Hughes [0145] recites “The annotation server 202 is in communication with a database 204 that is configured to store the information stack 100 therein… The annotation server 202 interacts with the annotation client 206 through one or more graphical user interfaces to the facilitate generation of the annotated data 104.” Examiner interprets the information stack depicted in figure 1 which includes annotated data (i.e. ML training dataset) stored in the annotation server (i.e. ML server computer) and interacted with figures 11 and 12 (GUIs).); 
receiving a selection of a particular selectable category option (Hughes fig. 11 element 1108 and [0204]-[0205] recites, in part, “The graphical user interface 1100 allows a user to manage a multiplicity of annotation sets. One or more annotation sets 104, such as annotated or unannotated sets of training candidates or sets of test data are provided in a list of selectable annotation sets 1102 that have been generated from the unannotated data 102… [0205] For example, for the emotions category 1202, a list of annotation sets includes affection, agitation, anger, complaint, happiness, sadness, solidarity, and worry. Other emotions are contemplated by this disclosure.” Examiner interprets the selected annotation set of complaint, figure 11 element 1108, included in the emotion category 1202); 
in response to determining that the subset of the plurality of data items comprise confidence scores below the confidence score threshold value (Hughes [0199] recites “At 816, a determination is made by the annotation process 800 of whether a confidence score of the most uncertain predictions exceeds a threshold confidence score.”), 
identifying the second machine learning training dataset based, at least in part, on the selection of the particular selectable category option and the data associating the particular machine learning training dataset with the particular category (Hughes [0199] recites “If not, the annotation process 800 loops back to request annotations of additional samples of the test set of data at 810.”).
Please see motivation for claim 1 above.

Regarding Claim 8.
The Hughes/Breckenridge combination teaches the method of claim 1, further comprising: using the machine learning server computer, training a second machine learning system using a second machine learning training dataset (Breckenridge Pg. 21, Col. 18, Ln. 32-34 and Col. 18, Ln. 43-45 recites, in part, “A new set of trained predictive models is generated using the updated training data and using training functions that are obtained from the training function repository 216 (Box 708)… A second trained predictive model can be selected to which access is provided to the client computing system 202 (Box 712).”); 
using the particular input dataset as input into the second machine learning system, computing a second output dataset (Breckenridge Pg. 14, Col. 3-4, Ln. 67 & 1-8 recites “As a particular client entity's training data changes over time, the client entity can be provided access to a trained predictive model that has been trained with training data reflective of the changes. As such, the repository of trained predictive models from which a predictive model can be selected…”);
determining that an accuracy of the second output dataset is higher than an accuracy of the particular output dataset (Breckenridge Pg. 16, Col. 8, Ln. 59-65 recites “In some implementations, the effectiveness of each trained predictive model is estimated by performing cross-validation to generate a cross-validation score that is indicative of the accuracy of the trained predictive model, i.e., the number of exact matches of output data predicted by the trained model when compared to the output data included in the test sub-sample.”); 
storing default machine learning data associating the second machine learning system with the particular machine learning training dataset (Hughes [0142] recites “The unannotated data 102 may be provided from pre-existing data stores or also include live-streams of unannotated data of any desired format. In some implementations, the unannotated data 102 may include directories of files and can include graphical formats of data. Other sources of electronic data may be used.” Pre-existing data is interpreted as default machine learning data.); 
receiving a second input dataset and a selection of the particular selectable option identifying the particular machine learning training dataset (Breckenridge Pg. 14-15, Col. 4-5, Ln. 63-67 & 1-3 recites “The client computing system 104a can transmit prediction requests 108a over the network. The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output 114. The predictive output 114 can be provided to the client computing system 104a, for example, over the network 102.”); 
based on the default machine learning data, selecting the second machine learning system for the second input dataset (Hughes [0156] recites “The specific algorithm has in most cases been predetermined for the type of model and the amount of training data.”).
Please see motivation for claim 1 above.
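For illustration only, the accuracy comparison and default-selection steps recited in claim 8 can be sketched as follows. Accuracy is computed here as the fraction of exact matches against expected outputs, loosely following the cross-validation description quoted from Breckenridge above; the function names, the string identifiers, and the `defaults` dictionary are hypothetical and appear in neither reference.

```python
from typing import Dict, Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of exact matches between predicted and expected outputs."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("predicted and expected must be non-empty and equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def record_default_if_better(
    defaults: Dict[str, str],
    dataset_id: str,
    second_model_id: str,
    first_out: Sequence[str],
    second_out: Sequence[str],
    expected: Sequence[str],
) -> bool:
    """Associate the second model with the training dataset if it is more accurate."""
    if accuracy(second_out, expected) > accuracy(first_out, expected):
        defaults[dataset_id] = second_model_id  # hypothetical "default machine learning data"
        return True
    return False


def select_model(defaults: Dict[str, str], dataset_id: str, fallback_model_id: str) -> str:
    """Pick the stored default for a later input dataset, else fall back."""
    return defaults.get(dataset_id, fallback_model_id)
```

A later request that selects the same training dataset would then call `select_model` with that dataset's identifier and receive the second model's identifier whenever a default was recorded.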

Regarding Claim 9.
The Hughes/Breckenridge combination teaches the method of claim 1, the particular machine learning system comprising a particular machine learning type and one or more first machine learning parameters (Hughes [0157] recites ‘Finally, a "baseline model" is trained using default hyperparameters selected for a given model type and algorithm.’ Model type (i.e. particular machine learning type) and hyperparameters (i.e. one or more machine learning parameters)), and the method further comprising: 
using the machine learning server computer, training a second machine learning system using the particular machine learning training dataset (Hughes [0145] recites “Upon sufficient annotation of the unannotated data 102, as specified by one or more annotation training criteria (e.g., 20 annotations for each class), the annotation server 202 is configured to generate one or more intermediate models”. Intermediate model is interpreted as machine learning system.); 
wherein the second machine learning system comprises the particular machine learning type and one or more second machine learning parameters that are different than the one or more first machine learning parameters (Hughes [0154] recites “At 324, a contender model is generated as a result of the model building 320. At 328, reporting on the generated model may be presented. In some implementations, multiple models may be built and…” Examiner interprets the contender model being an intermediate model (i.e. second machine learning system));
using the particular input dataset as input into the second machine learning system, computing a second output dataset (Hughes [0153]-[0154] recites, in part, “At 320, a machine learning model is built using the cleansed annotated training set and annotated test set. In some instances, a shared model 322 may supplied to inform the model building 320… [0154] At 324, a contender model is generated as a result of the model building 320. At 328, reporting on the generated model may be presented. In some implementations, multiple models may be built and compared using common measures of quality against the annotated test set.” Contender model (i.e. second machine learning system) and cleansed annotated training set (i.e. second output dataset)); 
determining that an accuracy of the second output dataset is higher than an accuracy of the particular output dataset (Breckenridge Pg. 16, Col. 7, Ln. 59-65 recites “In some implementations, the effectiveness of each trained predictive model is estimated by performing cross-validation to generate a cross-validation score that is indicative of the accuracy of the trained predictive model, i.e., the number of exact matches of output data predicted by the trained model when compared to the output data included in the test sub-sample.” Additionally, Breckenridge Pg. 17, Col. 9, Ln. 30-40 recites “In the example implementation described, the criterion is the accuracy of the trained model and is estimated using a cross-validation score. Based on the scores, a trained predictive model is selected (Step 408). In some implementations, the trained models are ranked based on the value of their respective scores, and the top ranking trained model is chosen as the selected predictive model. Although the selected predictive model was trained during the evaluation stage described above, training at that stage may have involved…” Accuracy scores of each trained model ranked (i.e. accuracy of outputs));
storing default machine learning data associating the one or more second machine learning parameters with the particular machine learning training dataset (Hughes [0160] recites “Algorithm and candidate hyperparameter selection is performed using any number of methods: random selection, grid search, or Bayesian estimation methods (e.g. a Tree of Parzen Estimators). In each run of model training, the parameters necessary to re-create the experiment and the results of the experiments are stored in a database. These parameters may include random seeds, algorithm selection, loss function, hyperparameters, dataset splits, dataset hashes (e.g., a measure across the dataset to determine whether any change has occurred), and class weights. The store results may include both baselines as well as iterations performed during hyperparameter optimization.” Hyperparameters (i.e. machine learning parameters) and parameters including dataset splits and hashes (i.e. default machine learning data)); 
receiving a second input dataset and a selection of the particular selectable option identifying the particular machine learning training dataset (Hughes [0156] recites “Given training data and test data and a model type (e.g. text classifier, image classifier, semantic role labeling), the annotation server 202 selects an appropriate algorithm and loss function to use to establish a baseline.” Given training data and test data (i.e. receiving input dataset)); 
based on the default machine learning data, selecting the one or more second machine learning parameters for the second input dataset (Hughes [0156]-[0157] recites “The specific algorithm has in most cases been predetermined for the type of model and the amount of training data... [0157] Finally, a "baseline model" is trained using default hyperparameters selected for a given model type and algorithm.” Predetermined algorithm (i.e. default machine learning data) and hyperparameters selected (i.e. machine learning parameters)).
Please see motivation for claim 1 above.

Regarding Claim 10.
The Hughes/Breckenridge combination teaches the method of claim 1, further comprising, in response to computing the particular output dataset, deleting the particular machine learning system from the machine learning server computer (Breckenridge Pg. 17, Col. 10, Ln. 18-22 recites, in part, “In response, the input data is input to the trained predictive model 218 and a predictive output generated by the trained model (Step 416). The predictive output is provided.” Additionally, Pg. 20, Col. 16, Ln. 42-47 recites, in part, “…the effectiveness of each retrained predictive model can be compared to the effectiveness of the updateable trained predictive model from which it was derived, and the most effective of the two models stored in the repository 215 and the other discarded.” Discarding least effective model (i.e. deleting machine learning system)). 
Please see motivation for claim 1 above.
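For illustration only, the compare-and-discard step quoted from Breckenridge above can be sketched as follows. The function `keep_most_effective`, the dictionary-based repository, and the accuracy figures are hypothetical and are not taken from the reference; the sketch assumes effectiveness is a single numeric score.

```python
def keep_most_effective(repository, model_id, original, retrained, score):
    """Store whichever model scores higher and return the discarded one."""
    if score(retrained) > score(original):
        kept, discarded = retrained, original
    else:
        kept, discarded = original, retrained
    repository[model_id] = kept  # only the more effective model remains stored
    return discarded


# Hypothetical models represented as plain dictionaries.
repository = {"sentiment": {"name": "v1", "accuracy": 0.81}}
original = repository["sentiment"]
retrained = {"name": "v2", "accuracy": 0.86}
discarded = keep_most_effective(
    repository, "sentiment", original, retrained, score=lambda m: m["accuracy"]
)
```

After the call, only the more effective model remains in the repository, and the other is no longer stored there.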

Regarding claims 11-12 and 14-20,
Claims 11-12 and 14-20 are directed to a computer system with one or more processors and memory having stored instructions performing substantially identical methods to those recited in claims 1-2 and 4-10, respectively. Therefore, the rejections of claims 1-2 and 4-10 apply equally here.
In addition, Hughes discloses the additional limitations of a computer system with one or more processors and memory having stored instructions (Hughes figure 26 elements .


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hughes in view of Breckenridge and in further view of Dirac et al. (US 10102480 B2, hereinafter Dirac).

Regarding claim 3,
The Hughes/Breckenridge Combination teaches the method of claim 2, further comprising determining that the one or more corresponding items in the second output dataset comprise confidence scores above the confidence score threshold value and (Hughes Fig. 7 (702, 714 & 716) and [0196] recites “The priority queues 608 shown in the example of FIG. 7 include a priority queue 704 for samples with a high confidence prediction to be annotated with "Class A", a priority queue 706 for samples with a high confidence prediction to be annotated with "Class B", a priority queue 708 for samples with a high entropy (e.g., maintained in order of highest Shannon entropy), and a priority queue 710 for minimum margin samples.” Prediction set with vectors (i.e. output dataset) and priority queues of samples (i.e. items)), 
in response, performing the replacing one or more data items in the particular output dataset with the one or more corresponding items in the second output dataset. 
Dirac teaches in response, performing the replacing one or more data items in the particular output dataset with the one or more corresponding items in the second output dataset (Dirac (87) and (91) recites “In one embodiment, the recipe management components of the MLS may examine the set of input data variables, and/or the outputs of the transformations indicated in a recipe, automatically identify groups of variables or outputs that may have a higher predictive capability than others, and provide an indication of such groups to the client… In the example output section 1210, a number of transformations are applied to input data variables, groups of variables, intermediate variables defined in earlier sections of the recipe, or the output of an artifact identified in the dependencies section. The transformed data is provided as input to a different model identified as “model1”.” Identify outputs that may have higher predictive capability and transformations applied to output (i.e. replacing data items in output dataset)).
The Hughes/Breckenridge Combination and Dirac are directed to machine learning. In view of the teachings of Dirac, it would have been obvious to one of ordinary skill in the art to apply the teachings of Dirac to Hughes as modified by Breckenridge before the effective filing date of the claimed invention in order to capture key relationships among different variables represented in large datasets and to transform and extract representative subsets of data, thereby improving the Hughes/Breckenridge Combination (cf. Dirac (3), which recites, in part, the following: 
“The quality of the results obtained from machine learning algorithms may depend on how well the empirical data used for training the models captures key relationships among different variables represented in the data, and on how effectively and efficiently these relationships can be identified. Depending on the 
).
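For illustration only, the priority queues quoted from Hughes [0196] in the rejection of claim 3 (high-confidence samples per class, high-entropy samples, and minimum-margin samples) can be sketched as follows. The function names, the sample identifiers, and the threshold of 0.8 are hypothetical; the sketch assumes each prediction is a list of at least two class probabilities.

```python
import heapq
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the two most probable classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def build_queues(predictions, threshold):
    high_conf = {}  # class index -> heap ordered by highest confidence first
    entropy_q = []  # heap ordered by highest entropy first
    margin_q = []   # heap ordered by smallest margin first
    for sample_id, probs in predictions.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > threshold:  # confidence score above the threshold
            heapq.heappush(high_conf.setdefault(best, []), (-probs[best], sample_id))
        heapq.heappush(entropy_q, (-shannon_entropy(probs), sample_id))
        heapq.heappush(margin_q, (margin(probs), sample_id))
    return high_conf, entropy_q, margin_q


predictions = {"s1": [0.95, 0.05], "s2": [0.5, 0.5], "s3": [0.1, 0.9]}
high_conf, entropy_q, margin_q = build_queues(predictions, threshold=0.8)
```

Here the evenly split sample `s2` heads both the entropy and margin queues, while `s1` and `s3` fall into the high-confidence queues for their respective classes.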
	
Regarding claim 13,
Claim 13 is directed to a computer system with one or more processors and memory having stored instructions performing substantially identical methods to those recited in claim 3. Therefore, the rejection of claim 3 applies equally here.
In addition, Hughes discloses the additional limitations of a computer system with one or more processors and memory having stored instructions (Hughes figure 26 elements 2600, 2630 & 2620 and [0234] recites “In an example implementation, the processing unit 2620 may execute program code stored in the system memory 2630. For example, the bus may carry data to the system memory 2630, from which the processing unit 2620 receives and executes instructions. The data received by the system memory 2630 may optionally be stored on the removable storage 2640 or the non-removable storage 2650 before or after execution by the processing unit 2620.”).

Response to Arguments
Applicant’s arguments filed July 29th, 2021, are addressed below.
Applicant’s arguments regarding the Claim Objection of Claim 11 have been fully considered, and due to amendment to the claim, are persuasive.
Applicant’s arguments regarding the 35 U.S.C. 103 rejections of independent Claims 1 and 11 have been fully considered, but are unpersuasive.
Specifically, applicant argues (last paragraph of pg. 11 of the response) that element 1314 only selects datasets to be annotated (not to be included in a set of datasets to be used in training a particular model) and thus does not teach displaying, through a graphical user interface, a plurality of selectable options, each selectable option of the plurality of selectable options identifying a machine learning training dataset to be used for training a machine learning system that computes an output corresponding to a particular problem associated with the identified machine learning training dataset for a given input among the plurality of machine learning training datasets.  However, the claim language reciting displaying (and later, selection of a particular selectable option) does not require that the displaying and selection be for the purpose of selecting the datasets that will be used for training.  The claim language only requires displaying and selecting datasets, wherein the displayed and selected datasets have the characteristic of being for use in training the machine learning system.  Datasets that are selected to be annotated for use in training the model (and possibly later selected by another operation that actually designates their use in training) clearly lie within the claim scope.
Applicant finally asserts without argument that Hughes does not disclose “receiving a combination of an input dataset for evaluation, and a selection of training dataset to use for training before executing the evaluation.”  Hughes, [0155], clearly describes receiving the particular input dataset (the cross-validation dataset) and Hughes also describes receiving a selection of a particular selectable option identifying a particular machine learning training dataset as described above.  Receiving both falls within the broadest reasonable interpretation of receiving a combination of the two.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:  Most of the newly cited art shows a selectable option in a graphical user interface for selecting a dataset to be used in training a machine learning model.  Additionally, Chan et al., “PredictionIO” (Fig. 2) shows an option for selecting a combination of training data and cross-validation data.

Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN M SMITH whose telephone number is (469)295-9104.  The examiner can normally be reached on Monday - Friday, 8:30am - 5pm Central.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/BRIAN M SMITH/Primary Examiner, Art Unit 2122