DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/03/2022 has been entered.

Response to Amendment
The amendment filed 03/03/2022 has been entered. Claims 1, 3-10 and 12-19 remain pending in the application. 

Response to Arguments
Applicant’s arguments, filed 03/03/2022, with respect to the rejections of claims 1 and 10 under 35 U.S.C. 103 have been fully considered and are persuasive in light of the amendments. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made over Mann et al. (US Patent 8,473,431) in view of Srinivasa et al. (US Pub. 2018/0174042) in view of Simard et al. (US Pub. 2015/0019460) in view of Duggan et al. (US Pub. 2017/0178020) and further in view of Campos (US Patent 7,092,941).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 8-10, 12-13 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Mann et al. (US Patent 8,473,431) in view of Srinivasa et al. (US Pub. 2018/0174042) in view of Simard et al. (US Pub. 2015/0019460) in view of Duggan et al. (US Pub. 2017/0178020) and further in view of Campos (US Patent 7,092,941).
As per claim 1, Mann teaches a method comprising: 
storing, at a first server computer, one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”], each of the machine learning training datasets comprising input data and verified output data [Col. 5, lines 52-55, “The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output”]; 
receiving, at the first server computer [Fig. 1, the server system front end 110], a particular input dataset [Fig. 1, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”] and a request to run a machine learning system with the particular input dataset [Col. 2, lines 1-2, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received”; Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output”];
sending, from the first server computer [Fig. 1, the server system front end 110] to a second server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112] separate from the first server computer, the particular input dataset, a particular machine learning training dataset of the one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a second server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more particular configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used”]; 
using the second server computer, processing the particular input dataset with a particular machine learning system comprising [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
by the second server computer, configuring the particular machine learning system using the one or more particular configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used”]; 
by the second server computer, training the particular machine learning system using the particular machine learning training dataset [Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
by the second server computer, using the particular input dataset as input into the particular machine learning system, computing a particular output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
by the second server computer, sending the particular output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a, for example, over the network 102”]; 
sending, from the first server computer [Fig. 1, the server system front end 110] to a third server computer separate from the first server computer and the second server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a third server)], the particular input dataset, the second machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (third computer/server) in the data center 112 can run software that uses the training data (106b, which is different from the training data 106a) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a third server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and the specific configuration file for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
using the third server computer, re-processing the input dataset, with a second machine learning system comprising [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including third computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
by the third server computer, configuring the second machine learning system using the specific configuration file [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
by the third server computer, training the second machine learning system using the second machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data (training data 106b) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
by the third server computer, using the input dataset as input into the second machine learning system, computing a second output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; and 
by the third server computer, sending the second output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a (or the client computing system 104b), for example, over the network 102”].   
Mann does not teach
determining that, for a subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value; 
in response to determining that, for the subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value: 
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value, one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular [ATTORNEY DOCKET PATENT APPLICATION 088813.013315/721,578 3 of 21] machine learning training dataset, a type of machine learning system, and parameters for a machine learning system; 
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
sending, from the first server computer to a third server computer separate from the first server computer and the second server computer, the particular input dataset, a second machine learning training dataset of the one or more machine learning training datasets (emphasis added); 
using the third server computer, re-processing the particular input dataset based on the further subset of the subset of outputs, with a second machine learning system (emphasis added);
by the third server computer, using the particular input dataset as input into the second machine learning system, computing a second output dataset based on the further subset of the subset of outputs (emphasis added);
Srinivasa teaches
determining that, for a subset of outputs [Fig. 8, paragraph 0054, set of excluded samples 815] of the particular output dataset [Fig. 8, paragraph 0054, “a set of correctly classified samples and excluded samples is determined”], a respective confidence score of each output of the subset of outputs is below a confidence score threshold value [Figs 8, 10, paragraph 0054, “the sequence 800 depicts a set of training data 805 being provided as input to a first instance of a neural network 810. This first instance of the neural network 810 will operate to produce a classification or label for the various characteristics of the training data … The first instance of the neural network 810 will be trained on the complete set of the training data 805. After the training, a set of correctly classified samples and excluded samples is determined … the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value”]; 
in response to determining that, for the subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value [paragraph 0063, “If the classification score does not exceed the threshold, then the data sample … is further processed by a second neural network”]: 
sending, from the first server computer to a third server computer separate from the first server computer and the second server computer, the particular input dataset, a second machine learning training dataset of the one or more machine learning training datasets [paragraph 0055, “the first set of excluded samples 815 is provided to a second instance of the neural network 820 … the second instance of the neural network 820 will be trained from another set of correctly classified samples”];
using the third server computer, re-processing the particular input dataset based on the subset of the subset of outputs, with a second machine learning system [paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
by the third server computer, using the particular input dataset as input into the second machine learning system, computing a second output dataset based on the subset of the subset of outputs [Fig. 8, paragraph 0055, “the second instance of the neural network 820 will be trained from another set of correctly classified samples … while the network will remain untrained for one or more excluded samples 825 … The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value. The second threshold value may be lower than the first threshold value, to allow additional classifications to be attempted”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of determining that a respective confidence score of the output is below a threshold value, sending the particular input dataset and a second machine learning training dataset to a third server computer, and re-processing the particular input dataset using the third server computer, as taught by Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
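For illustration only, the cascading behavior relied upon from Srinivasa (paragraphs 0054-0055 and 0063) can be sketched as follows. The sketch is not taken from any cited reference: the function names, the toy scoring models and the threshold values are hypothetical, and a "model" is assumed to be any callable returning a label and a confidence score between 0 and 1.

```python
from typing import Callable, List, Tuple

# Assumption: a model is any callable mapping a sample to (label, confidence).
Model = Callable[[float], Tuple[str, float]]


def split_by_confidence(samples: List[float], model: Model, threshold: float):
    """Return (accepted, excluded).

    accepted: (sample, label) pairs whose confidence meets the threshold.
    excluded: samples whose confidence falls below the threshold.
    """
    accepted, excluded = [], []
    for sample in samples:
        label, confidence = model(sample)
        if confidence >= threshold:
            accepted.append((sample, label))
        else:
            excluded.append(sample)
    return accepted, excluded


def cascade(samples: List[float], stages: List[Tuple[Model, float]]):
    """Pass samples through successive (model, threshold) stages.

    Only the samples excluded by one stage are handed to the next stage.
    Returns the accepted (sample, label) pairs and any samples left over.
    """
    results = []
    remaining = list(samples)
    for model, threshold in stages:
        accepted, remaining = split_by_confidence(remaining, model, threshold)
        results.extend(accepted)
        if not remaining:
            break
    return results, remaining


# Hypothetical toy models: confidence grows with distance from 0.5.
def first_model(x: float) -> Tuple[str, float]:
    return ("pos" if x >= 0.5 else "neg", min(1.0, abs(x - 0.5) * 2))


def second_model(x: float) -> Tuple[str, float]:
    return ("pos" if x >= 0.5 else "neg", min(1.0, abs(x - 0.5) * 8))


# The second threshold is lower than the first, mirroring paragraph 0055.
results, remaining = cascade(
    [0.1, 0.45, 0.9, 0.6],
    [(first_model, 0.7), (second_model, 0.6)],
)
```

In this sketch the second stage sees only the samples that the first stage excluded, which is the feature the rejection maps to re-processing by a separate machine learning system.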
Mann and Srinivasa do not teach
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value, one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system; 
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
re-processing the particular input dataset based on the further subset of the subset of outputs (emphasis added);
computing a second output dataset based on the further subset of the subset of outputs (emphasis added);
Simard teaches
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value [paragraph 0044, “range of data item scores selected to optimize the precision of the classifier may be within a range of scores having a probability of greater than 0.5 … The range of data item scores selected to optimize the recall of the classifier may be within a range of scores having a probability of less than 0.5”; paragraph 0045, “a method of interactively labeling training data for machine learning … A classifier is trained based at least on the data items that were identified as belonging to the particular class of data items … the classifier scores each data item with a probability of being a positive example of the particular class of data items. From the first set of data items, a second set of one or more data items is selected based on the scoring … where the selected one or more data items have scores that lie within a distribution around a probability of 0.75 on a scale of zero to one … where the selected one or more data items have scores that lie within a distribution around a probability of 0.25 (less than the threshold score 0.5) … The second set of one or more data items is presented on a user interface”; It can be seen that the outputs from the classifier with scores less than the threshold (0.5) are provided to the user via a user interface];
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs [paragraph 0045, “The second set of one or more data items is presented on a user interface. Via the user interface, one or more user-provided labels are received that identify one or more of the data items in the second set as positive or negative examples of the particular class of data items”; It can be seen that the user selects and labels a subset of the received outputs, and the system, via the user interface, receives the one or more user-provided labels]; 
re-processing the particular input dataset based on the further subset of the subset of outputs [paragraph 0045, “The classifier is retrained based on the one or more user-provided labels”];
computing a second output dataset based on the further subset of the subset of outputs [paragraph 0045, “upon retraining the classifier, the first set of data items is scored with the retrained classifier”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of re-processing the particular input dataset based on receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs, as taught by Simard. Doing so would help improve the precision and the recall of the classifier (Simard, 0046).
Mann, Srinivasa and Simard do not teach
providing an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system;
receiving user input that is responsive to the interface, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
Duggan teaches
providing an interface [abstract, “A machine provides a system and interface to allow domain experts and other users to develop, deploy, and iterate on analytical models. The system facilitates building, deploying, and/or training analytical models, by, e.g., exposing analytical model configuration parameters to a user while abstracting model building and model deployment activities”];
receiving user input that is responsive to the interface, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets [Fig. 6, paragraphs 0065-0066, “accept a designation input specifying training data with which to train the analytical model … accept a user training input configured to cause training of the analytical model with the user-designated training data by the model builder circuitry 102 to create the trained analytical model”], a selection of a type of machine learning system [paragraphs 0060-0061, “provide a user interface to a user … interact with the user to receive a command to create a new analytical model. The user interface circuitry 114 may be configured to accept a user selection of an analytical model algorithm type for the analytical model … a user may select via the analytical model control user interface a particular base analytical model algorithm type that is available or provided by a particular compute engine 118 (e.g., linear regression, logistic regression, generalized linear models, neural network, or other analytical model types) for the new analytical model”], and a selection of parameters for a machine learning system [Fig. 6, paragraph 0063, “accept a user alteration input that is applicable to an analytical model parameter for the new or existing analytical model (610). For example, the analytical model control user interface may receive from the user an alteration of a configuration parameter for the analytical model”]; 
identifying a specific configuration file based on the selected type of machine learning system [paragraph 0063, “The applicable analytical model configuration parameters may vary with each different base analytical model algorithm type”];
updating the specific configuration file to include the selected parameters [paragraph 0115, “adjust the parameters of the analytical model and re-test during the process of creating an accurate analytical model for a particular application or dataset”]; 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of receiving from a user a selection of a second machine learning training dataset, a selection of a type of machine learning system, and a selection of parameters for a machine learning system, as taught by Duggan. Doing so would help iteratively develop, test, and implement analytical models (Duggan, 0021).
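For illustration only, the claimed steps of identifying a specific configuration file based on the selected type of machine learning system and updating it to include the selected parameters can be sketched as follows. The sketch is not drawn from Duggan or from the application: the configuration layout, the model type names, the parameter names and both function names are hypothetical.

```python
import copy
import json

# Hypothetical default configurations, one per selectable model type.
DEFAULT_CONFIGS = {
    "linear_regression": {
        "model_type": "linear_regression",
        "params": {"l2_penalty": 0.0},
    },
    "neural_network": {
        "model_type": "neural_network",
        "params": {"hidden_units": 16, "epochs": 10},
    },
}


def identify_config(model_type: str) -> dict:
    """Look up the configuration that corresponds to the selected model type."""
    if model_type not in DEFAULT_CONFIGS:
        raise ValueError(f"no configuration for model type: {model_type}")
    # Copy so that later updates do not alter the stored defaults.
    return copy.deepcopy(DEFAULT_CONFIGS[model_type])


def update_config(config: dict, selected_params: dict) -> dict:
    """Return a copy of the configuration with the user-selected parameters applied."""
    updated = copy.deepcopy(config)
    updated["params"].update(selected_params)
    return updated


# Example: the user selects a neural network and changes the epoch count.
config = update_config(identify_config("neural_network"), {"epochs": 25})
config_file_text = json.dumps(config, sort_keys=True)
```

The serialized text stands in for the configuration file that would accompany the input dataset and the training dataset when they are sent to another server.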
Mann, Srinivasa, Simard and Duggan do not explicitly teach
an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system (emphasis added);
Campos teaches
an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system [Fig. 5, Col. 4, lines 59-67 – Col. 6, lines 1-47 disclose a diagram of model building which comprises training data 406, model algorithms 414 (type of machine learning) and training parameters 518; since Campos teaches the training data 406, and Srinivasa teaches in paragraph 0055 that a different type of model will be trained with a different set of training data, the combination of Mann (as modified) and Campos reads on the limitation of differing training datasets];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include an interface indicating one or more additional machine learning training datasets, a type of machine learning system, and parameters for a machine learning system, as taught by Campos. Doing so would help perform clustering-based data mining to improve performance in model building (Campos, abstract).

As per claim 3, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann further teaches
sending, from the first server computer [Fig. 1, the server system front end 110] to a fourth server computer separate from the first server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
the one or more particular configuration files comprising one or more particular machine learning parameters [multiple different hyper-parameter configurations] and the one or more third configuration files comprising one or more third machine learning parameters that are different from the one or more particular machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters that differ from those of the other configurations; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”];
using the fourth server computer, processing the input dataset with a third machine learning system comprising [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
by the fourth server computer, configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters that differ from those of the other configurations; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)];
training the third machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the input dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
determining, at the first server computer, that the third output dataset is more accurate than the particular output dataset [Col. 8, lines 9-15, “each trained model is assigned a score that represents the effectiveness of the trained model … the criterion is the accuracy of the trained model and is estimated using a cross-validation score. Based on the scores, a trained predictive model is selected”]; and 
in response to determining, storing data identifying the one or more third machine learning parameters as default parameters for the particular machine learning system [Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models; Col. 8, lines 28-34, a trained model (i.e., "fully trained" model) is thereby generated for use in generating predictive output, e.g., trained model 218. The trained model 218 can be stored by the predictive modeling server system 206. That is, the trained model 218 can reside and execute in a data center that is remote from the client computing system; It can be seen that storing the trained model which has parameters that form the model after training for use as storing default parameters].  
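For illustration only, the kind of selection relied upon above (training one model per hyper-parameter configuration, scoring each by cross-validation, and storing the parameters of the better-scoring configuration as defaults) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Mann or from the application: the one-dimensional ridge model, the function names, the configuration labels "particular" and "third", and the sample data are all hypothetical.

```python
from statistics import mean

def fit_ridge_1d(xs, ys, penalty):
    """Closed-form 1-D ridge regression through the origin (hypothetical stand-in model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def cross_val_score(xs, ys, penalty, folds=3):
    """Negated mean squared error averaged over the folds, so a higher score is better."""
    pairs = list(zip(xs, ys))
    scores = []
    for k in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != k]
        held_out = [p for i, p in enumerate(pairs) if i % folds == k]
        w = fit_ridge_1d([x for x, _ in train], [y for _, y in train], penalty)
        scores.append(-mean((y - w * x) ** 2 for x, y in held_out))
    return mean(scores)

def select_default_config(xs, ys, configs):
    """Score every configuration and return the name of the best one plus all scores."""
    scored = {name: cross_val_score(xs, ys, cfg["penalty"]) for name, cfg in configs.items()}
    return max(scored, key=scored.get), scored

# Hypothetical training data and two hypothetical hyper-parameter configurations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
configs = {"particular": {"penalty": 10.0}, "third": {"penalty": 0.0}}

default_name, scores = select_default_config(xs, ys, configs)
# Store the parameters of the better-scoring configuration as the defaults.
default_store = {"default_parameters": configs[default_name]}
```

In this sketch the stored record holds only the hyper-parameters of the winning configuration, whereas the cited passage of Mann describes storing the trained model itself.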
Mann does not teach
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset (emphasis added);
using the fourth server computer, processing the particular input dataset with a third machine learning system (emphasis added);
using the particular input dataset as input into the third machine learning system, computing a third output dataset (emphasis added).
Srinivasa teaches
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset [paragraph 0055, “The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; Fig. 8 shows the samples with a prediction score that is less than a threshold are sent to the next neural network (fourth server, Net 3 …) for further processing];
using the fourth server computer, processing the particular input dataset with a third machine learning system [Fig. 8, paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
using the particular input dataset as input into the third machine learning system, computing a third output dataset [Fig. 8, paragraph 0056, “The sequence 800 further depicts the cascaded training of other additional instances of the neural network. This cascading training is performed on cascading subsets of training data, until N instances of the network are produced, with neural network N-1 830 producing a final set of one or more excluded samples 835”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system and computing a third output dataset, as taught by Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data and repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
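For illustration only, the cascading arrangement cited from Srinivasa (samples whose prediction score falls below a threshold are excluded and passed to the next network instance) can be sketched as follows. This is a minimal sketch under stated assumptions: the function `cascade`, the stand-in predictors `net_1` and `net_2`, the thresholds, and the sample values are hypothetical and do not appear in Srinivasa.

```python
def cascade(samples, stages, thresholds):
    """Pass samples through successive stages; a stage keeps a sample only when
    its confidence meets that stage's threshold, otherwise the sample is excluded
    and handed to the next stage."""
    accepted = []
    remaining = list(samples)
    for stage_index, (predict, threshold) in enumerate(zip(stages, thresholds)):
        excluded = []
        for sample in remaining:
            label, confidence = predict(sample)
            if confidence >= threshold:
                accepted.append((sample, label, stage_index))
            else:
                excluded.append(sample)
        remaining = excluded
    return accepted, remaining

# Hypothetical stand-ins for two network instances: each returns (label, confidence).
def net_1(x):
    return ("pos" if x > 0 else "neg", min(1.0, abs(x) / 2))

def net_2(x):
    return ("pos" if x > 0 else "neg", min(1.0, abs(x)))

accepted, still_excluded = cascade([-3, -1.5, 0.2, 1.2, 4], [net_1, net_2], [0.9, 0.9])
```

Here the first stage accepts -3 and 4, the second stage accepts -1.5 and 1.2, and 0.2 remains excluded after both stages.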

As per claim 4, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 3.
Mann further teaches
receiving, at the first server computer, a third input dataset and a request to run a machine learning system with the third input dataset [Fig. 1, Col. 8, lines 49-51, “The predictive modeling server system 206 receives the input data and prediction request from the client computing system 202”]; 
sending, from the first server computer [Fig. 1, the server system front end 110] to a fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112], the third input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fifth computer/server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more fourth configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a fourth configuration)], the one or more fourth configuration files comprising the one or more third machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters that differ from those of the other configurations; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”];
using the fifth server computer, processing the third input dataset with a fourth machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the fourth machine learning system using the one or more fourth configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the fourth model/machine learning system) is configured with a different hyper-parameter configuration (the fourth hyper-parameter configuration in this case)];
training the fourth machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the third input dataset as input into the fourth machine learning system, computing a fourth output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; and 
sending the fourth output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110].  

As per claim 8, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann further teaches
receiving, at the first server computer [Fig. 1, the server system front end 110], a third input dataset [Fig. 1, element 108a-c, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a (104c in this case) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”] and a request to run a machine learning system with the third input dataset [Fig. 1, Col. 2, lines 1-2, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received”; Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output”];
sending, from the first server computer to a fourth server computer, the third dataset, a third machine learning training dataset of the one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
sending, from the first server computer [Fig. 1, the server system front end 110] to a fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112], the third dataset, the second machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fifth computer/server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and the one or more second configuration files [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a fourth configuration)];
using the fourth server computer, processing the third dataset with a third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters that differ from those of the other configurations; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)];
training the third machine learning system using the third machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the third dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”];
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
while the fourth server computer is processing the first subset of the third dataset, using the fifth server computer, processing the third dataset with the second machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fifth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]:  
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
training the third machine learning system using the third machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data (training data 106b) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the third dataset as input into the third machine learning system, computing a fourth output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; and 
sending the fourth output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a (or the client computing system 104b), for example, over the network 102”].  
Mann does not teach
sending, from the first server computer to a fourth server computer, a first subset of the third dataset (emphasis added);
sending, from the first server computer to a fifth server computer, a second subset of the third dataset (emphasis added);
using the fourth server computer, processing the first subset of the third dataset (emphasis added);
using the first subset of the third dataset as input into the third machine learning system (emphasis added);
processing the second subset of the third dataset with the third machine learning system (emphasis added);
using the second subset of the third dataset as input into the third machine learning system (emphasis added).
Srinivasa teaches
sending, from the first server computer to a fourth server computer, a first subset of the third dataset [paragraph 0054, “the sequence 800 depicts a set of training data 805 being provided as input to a first instance of a neural network 810. This first instance of the neural network 810 will operate to produce a classification or label for the various characteristics of the training data … The first instance of the neural network 810 will be trained on the complete set of the training data 805. After the training, a set of correctly classified samples and excluded samples is determined … the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value”; paragraph 0063, “If the classification score does not exceed the threshold, then the data sample … is further processed by a second neural network”; Fig. 8 shows the samples 815 are sent to NET 2, where samples 815 is a subset (first subset) of the training data 805];
sending, from the first server computer to a fifth server computer, a second subset of the third dataset [Fig. 8 shows the samples 825 are sent to NET N-1, where samples 825 is a subset (second subset) of the training data 805];
using the fourth server computer, processing the first subset of the third dataset [Fig. 8 shows NET 2 processes samples 815 and generates samples 825];
using the first subset of the third dataset as input into the third machine learning system [Fig. 8 shows the samples 815 are sent to NET 2 for processing]; 
processing the second subset of the third dataset with the third machine learning system [Fig. 8 shows NET N-1 processes samples 825 and generates samples 835];
using the second subset of the third dataset as input into the third machine learning system [Fig. 8 shows the samples 825 are sent to NET N-1 for processing].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system, and processing the second subset of the third dataset with the third machine learning system of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.

As per claim 9, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann further teaches
in response to sending the particular output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110 (first server computer)], storing the particular machine learning system on a separate server computer [Col. 4, lines 24-39, “One or more computers in the data center 112 can run software that uses the training data to estimate the effectiveness of multiple types of predictive models and make a selection of a trained predictive model to be used for data received from the particular client computing system 104a … The selected trained model executing in the data center 112”; Since the first server computer (the server system front end 110) is located remotely from the data center 112 (which comprises one or more computers), and the selected trained model (machine learning system) is located in the data center 112, the machine learning system is stored on a separate server computer].  

As per claim 10, Mann teaches a computer system comprising: 
a first server computer [Fig. 1, the server system front end 110] comprising: 
one or more first processors [Col. 12, lines 35-42, “These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device”]; 
first memory storing first instructions which, when executed by the one or more first processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]: 
storing one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”], each of the machine learning training datasets comprising input data and verified output data [Col. 5, lines 52-55, “The training data can be in any convenient form that is understood by the modeling server system 206 to define a set of records, where each record includes an input and a corresponding desired output”]; 
receiving a particular input dataset [Fig. 1, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”] and a request to run a machine learning system with the particular input dataset [Col. 2, lines 1-2, Input data, data identifying the first trained predictive model, and a request for a predictive output can be received; Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output”]; 
sending to a second server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112] separate from the first server computer, the particular input dataset, a particular machine learning training dataset of the one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a second server) in the data center 112 receives the input data and training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more particular configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used”]; 
a second server computer comprising [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a second server)]: 
one or more second processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”]; 
second memory storing second instructions which, when executed by the one or more second processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]: 
by the second server computer, receiving, from the first server computer, the particular input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a second server) in the data center 112 receives the input data and training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and the one or more particular configuration files [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used”]; 
processing the particular input dataset with a particular machine learning system comprising [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]:
by the second server computer, configuring the particular machine learning system using the one or more particular configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used”]; 
by the second server computer, training the particular machine learning system using the particular machine learning training dataset [Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
by the second server computer, using the particular input dataset as input into the particular machine learning system, computing a particular output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
by the second server computer, sending the particular output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a, for example, over the network 102”]; 
sending, from the first server computer [Fig. 1, the server system front end 110] to a third server computer separate from the first server computer and the second server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a third server)], the particular input dataset, the second machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (third computer/server) in the data center 112 can run software that uses the training data (106b, which is different from the training data 106a) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a third server) in the data center 112 receives the input data and training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and the specific configuration file for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
wherein the computer system further comprises the third server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a third server)], the third server computer comprising: 
one or more third processors [Col. 9, lines 58-67 to Col. 19, lines 1-4, "The components of ... the predictive modeling system 206 can be implemented in multiple computers distributed over a network, such as a server farm, in one or more locations ...”; Col. 12, lines 43-57, "These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor ... "machine readable medium" "computer-readable medium" refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal"]; 
third memory storing third instructions which, when executed by the one or more third processors, cause performance of [Col. 9, lines 58-66, “predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions ... stored in a computer readable medium”]: 
while the second server computer is processing the particular input dataset, re-processing the input dataset with a second machine learning system comprising [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including third computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
by the third server computer, configuring the second machine learning system using the specific configuration file [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
by the third server computer, training the second machine learning system using the second machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data (training data 106b) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
by the third server computer, using the input dataset as input into the second machine learning system, computing a second output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; and 
by the third server computer, sending the second output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a (or the client computing system 104b), for example, over the network 102”].   
Mann does not teach
wherein the first instructions, when executed by the one or more first processors further cause performance of: 
determining that, for a subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value; 
in response to determining that, for the subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value: 
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value, one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system; 
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
sending, from the first server computer to a third server computer separate from the first server computer and the second server computer, the particular input dataset, a second machine learning training dataset of the one or more machine learning training datasets (emphasis added); 
using the third server computer, re-processing the particular input dataset based on the further subset of the subset of outputs, with a second machine learning system (emphasis added);
by the third server computer, using the particular input dataset as input into the second machine learning system, computing a second output dataset based on the further subset of the subset of outputs (emphasis added);
Srinivasa teaches
determining that, for a subset of outputs [Fig. 8, paragraph 0054, set of excluded samples 815] of the particular output dataset [Fig. 8, paragraph 0054, “a set of correctly classified samples and excluded samples is determined”], a respective confidence score of each output of the subset of outputs is below a confidence score threshold value [Figs 8, 10, paragraph 0054, “the sequence 800 depicts a set of training data 805 being provided as input to a first instance of a neural network 810. This first instance of the neural network 810 will operate to produce a classification or label for the various characteristics of the training data … The first instance of the neural network 810 will be trained on the complete set of the training data 805. After the training, a set of correctly classified samples and excluded samples is determined … the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value”]; 
in response to determining that, for the subset of outputs of the particular output dataset, a respective confidence score of each output of the subset of outputs is below a confidence score threshold value [paragraph 0063, “If the classification score does not exceed the threshold, then the data sample … is further processed by a second neural network”]: 
sending, from the first server computer to a third server computer separate from the first server computer and the second server computer, the particular input dataset, a second machine learning training dataset of the one or more machine learning training datasets [paragraph 0055, “the first set of excluded samples 815 is provided to a second instance of the neural network 820 … the second instance of the neural network 820 will be trained from another set of correctly classified samples”];
using the third server computer, re-processing the particular input dataset based on the subset of the subset of outputs, with a second machine learning system [paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
by the third server computer, using the particular input dataset as input into the second machine learning system, computing a second output dataset based on the subset of the subset of outputs [Fig. 8, paragraph 0055, “the second instance of the neural network 820 will be trained from another set of correctly classified samples … while the network will remain untrained for one or more excluded samples 825 … The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value. The second threshold value may be lower than the first threshold value, to allow additional classifications to be attempted”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of determining that a respective confidence score of the output is below a threshold value, sending to a third server computer the particular input dataset and a second machine learning training dataset, and, using the third server computer, re-processing the particular input dataset of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
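For illustration only, the cascading arrangement relied on above (Srinivasa, paragraphs 0054-0055 and 0063) can be sketched as follows: samples whose prediction score does not exceed a stage's threshold are excluded and passed to the next classifier instance. This sketch is not taken from Mann or Srinivasa; the function name `cascade`, the toy classifiers, and the threshold values are hypothetical assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier type: maps one sample to (label, confidence score in [0, 1]).
Classifier = Callable[[float], Tuple[str, float]]


def cascade(
    samples: Sequence[float],
    stages: Sequence[Classifier],
    thresholds: Sequence[float],
) -> Tuple[List[Tuple[float, str, int]], List[float]]:
    """Pass samples through successive classifier instances.

    A sample is accepted at the first stage whose score exceeds that
    stage's threshold; otherwise it is excluded and handed to the next stage.
    Returns (accepted, still_excluded).
    """
    accepted: List[Tuple[float, str, int]] = []
    pending = list(samples)
    for stage_idx, (clf, thr) in enumerate(zip(stages, thresholds)):
        excluded = []
        for s in pending:
            label, score = clf(s)
            if score > thr:
                accepted.append((s, label, stage_idx))
            else:
                excluded.append(s)  # low confidence: route to the next instance
        pending = excluded
    return accepted, pending


# Toy stand-ins for the first and second neural network instances (assumed).
def net1(x: float) -> Tuple[str, float]:
    return ("pos" if x > 0 else "neg", min(1.0, abs(x)))


def net2(x: float) -> Tuple[str, float]:
    return ("pos" if x > 0 else "neg", min(1.0, 2 * abs(x)))


# The second threshold is lower than the first, as paragraph 0055 permits.
accepted, pending = cascade([0.9, -0.8, 0.3, -0.05], [net1, net2], [0.7, 0.5])
```

In this toy run, two samples are accepted by the first instance, one by the second, and one remains excluded after both stages.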
Mann and Srinivasa do not teach
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value, one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system; 
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
re-processing the particular input dataset based on the further subset of the subset of outputs (emphasis added);
computing a second output dataset based on the further subset of the subset of outputs (emphasis added);
Simard teaches
providing an interface indicating each output of the subset of outputs is below the confidence score threshold value [paragraph 0044, “range of data item scores selected to optimize the precision of the classifier may be within a range of scores having a probability of greater than 0.5 … The range of data item scores selected to optimize the recall of the classifier may be within a range of scores having a probability of less than 0.5”; paragraph 0045, “a method of interactively labeling training data for machine learning … A classifier is trained based at least on the data items that were identified as belonging to the particular class of data items … the classifier scores each data item with a probability of being a positive example of the particular class of data items. From the first set of data items, a second set of one or more data items is selected based on the scoring … where the selected one or more data items have scores that lie within a distribution around a probability of 0.75 on a scale of zero to one … where the selected one or more data items have scores that lie within a distribution around a probability of 0.25 (less than the threshold score .5) … The second set of one or more data items is presented on a user interface”; It can be seen that the outputs from the classifier with the scores less than the threshold (.5) are provided to the user via a user interface]; 
receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs [paragraph 0045, “The second set of one or more data items is presented on a user interface. Via the user interface, one or more user-provided labels are received that identify one or more of the data items in the second set as positive or negative examples of the particular class of data items”; It can be seen that the user selects and labels a subset of the received outputs, and the system, via the user interface, receives the one or more user-provided labels]; 
re-processing the particular input dataset based on the further subset of the subset of outputs [paragraph 0045, “The classifier is retrained based on the one or more user-provided labels”];
computing a second output dataset based on the further subset of the subset of outputs [paragraph 0045, “upon retraining the classifier, the first set of data items is scored with the retrained classifier”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of re-processing the particular input dataset based on receiving user input that is responsive to the interface, the user input indicating a selection of a further subset of the subset of outputs of Simard. Doing so would help improve the precision and the recall of the classifier (Simard, 0046).
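For illustration only, the score-based selection quoted from Simard, paragraph 0045 (choosing data items whose scores lie near a target probability such as 0.25 and presenting them for user labeling) can be sketched as below. The sketch is not taken from Simard; the function name `select_for_labeling` and the fixed-width band standing in for "a distribution around" the target probability are hypothetical assumptions.

```python
from typing import List, Sequence


def select_for_labeling(scores: Sequence[float], center: float, width: float) -> List[int]:
    """Return indices of items whose classifier score lies within
    `width` of `center`; these would be shown to the user for labeling.
    """
    return [i for i, p in enumerate(scores) if abs(p - center) <= width]


# Hypothetical classifier scores (probability of being a positive example).
scores = [0.9, 0.28, 0.22, 0.6, 0.05]

# Items scored near 0.25, i.e. below the 0.5 threshold discussed above.
to_label = select_for_labeling(scores, center=0.25, width=0.1)
```

The labels a user supplies for the selected items would then be used to retrain the classifier and rescore the data items, as the cited paragraph describes.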
Mann, Srinivasa and Simard do not teach
providing an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system;
receiving user input that is responsive to the interface, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets, a selection of a type of machine learning system, and a selection of parameters for a machine learning system; 
identifying a specific configuration file based on the selected type of machine learning system; 
updating the specific configuration file to include the selected parameters; 
Duggan teaches
providing an interface [abstract, “A machine provides a system and interface to allow domain experts and other users to develop, deploy, and iterate on analytical models. The system facilitates building, deploying, and/or training analytical models, by, e.g., exposing analytical model configuration parameters to a user while abstracting model building and model deployment activities”];
receiving user input that is responsive to the interface, a selection of a second machine learning training dataset from the one or more additional machine learning training datasets [Fig. 6, paragraphs 0065-0066, “accept a designation input specifying training data with which to train the analytical model … accept a user training input configured to cause training of the analytical model with the user-designated training data by the model builder circuitry 102 to create the trained analytical model”], a selection of a type of machine learning system [paragraphs 0060-0061, “provide a user interface to a user … interact with the user to receive a command to create a new analytical model. The user interface circuitry 114 may be configured to accept a user selection of an analytical model algorithm type for the analytical model … a user may select via the analytical model control user interface a particular base analytical model algorithm type that is available or provided by a particular compute engine 118 (e.g., linear regression, logistic regression, generalized linear models, neural network, or other analytical model types) for the new analytical model”], and a selection of parameters for a machine learning system [Fig. 6, paragraph 0063, “accept a user alteration input that is applicable to an analytical model parameter for the new or existing analytical model (610). For example, the analytical model control user interface may receive from the user an alteration of a configuration parameter for the analytical model”];
identifying a specific configuration file based on the selected type of machine learning system [paragraph 0063, “The applicable analytical model configuration parameters may vary with each different base analytical model algorithm type”];
updating the specific configuration file to include the selected parameters [paragraph 0115, “adjust the parameters of the analytical model and re-test during the process of creating an accurate analytical model for a particular application or dataset”]; 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of receiving from a user a selection of a second machine learning training dataset, a selection of a type of machine learning system, and a selection of parameters for a machine learning system of Duggan. Doing so would help iteratively develop, test, and implement analytical models (Duggan, 0021).
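For illustration only, the two configuration-file limitations mapped to Duggan above (identifying a configuration file from the selected model type, then updating it with the selected parameters) can be sketched as follows. This is a hedged sketch, not Duggan's disclosed implementation: the JSON file layout, the `CONFIG_BY_TYPE` mapping, and both function names are assumptions introduced for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping from a selected model type to its configuration file.
CONFIG_BY_TYPE = {
    "linear_regression": "linear_regression.json",
    "neural_network": "neural_network.json",
}

def identify_config(config_dir, model_type):
    """Identify the configuration file that corresponds to the selected type."""
    return Path(config_dir) / CONFIG_BY_TYPE[model_type]

def update_config(path, selected_params):
    """Merge the user-selected parameters into the file and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config.update(selected_params)
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return config

# Usage: a user selects the neural network type and a new learning rate.
with tempfile.TemporaryDirectory() as config_dir:
    path = identify_config(config_dir, "neural_network")
    path.write_text(json.dumps({"layers": 2, "learning_rate": 0.1}))
    updated = update_config(path, {"learning_rate": 0.01})
```

Keeping one file per model type means the parameters exposed to the user can differ by type, which is consistent with the Duggan passage quoted above.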
Mann, Srinivasa, Simard and Duggan do not explicitly teach
an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system (emphasis added);
Campos teaches
an interface indicating … one or more additional machine learning training datasets, each of the additional machine learning training datasets differing from the particular machine learning training dataset, a type of machine learning system, and parameters for a machine learning system [Fig. 5 and Col. 4, line 59 through Col. 6, line 47 disclose a model-building diagram comprising training data 406, model algorithms 414 (type of machine learning) and training parameters 518; since Campos teaches the training data 406, and Srinivasa teaches in paragraph 0055 that different types of model are trained with different sets of training data, the combination of Mann (as modified) and Campos reads on the limitation of differing training datasets];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include an interface indicating one or more additional machine learning training datasets, a type of machine learning system, and parameters for a machine learning system of Campos. Doing so would help perform clustering-based data mining to improve performance in model building (Campos, abstract).

As per claim 12, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 10.
Mann further teaches
sending, from the first server computer [Fig. 1, the server system front end 110] to a fourth server computer separate from the first server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
the one or more particular configuration files comprising one or more particular machine learning parameters [multiple different hyper-parameter configurations] and the one or more third configuration files comprising one or more third machine learning parameters that are different from the one or more particular machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters different from those of other configurations; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”];
wherein the computer system further comprises the fourth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the fourth server computer comprising: 
one or more fourth processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”]; 
fourth memory storing fourth instructions which, when executed by the one or more fourth processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]:
processing the input dataset with a third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters different from those of other configurations; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)];
training the third machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the input dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
wherein the first instructions, when executed by the one or more first processors, further cause performance of:
determining that the third output dataset is more accurate than the particular output dataset [Col. 8, lines 9-15, “each trained model is assigned a score that represents the effectiveness of the trained model … the criterion is the accuracy of the trained model and is estimated using a cross-validation score. Based on the scores, a trained predictive model is selected”]; 
in response to determining, storing data identifying the one or more third machine learning parameters as default parameters for the particular machine learning system [Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models”; Col. 8, lines 28-34, “a trained model (i.e., "fully trained" model) is thereby generated for use in generating predictive output, e.g., trained model 218. The trained model 218 can be stored by the predictive modeling server system 206. That is, the trained model 218 can reside and execute in a data center that is remote from the client computing system”; It can be seen that storing the trained model, whose parameters form the model after training, for later use corresponds to storing default parameters].
Mann does not teach
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset (emphasis added);
processing the particular input dataset with a third machine learning system (emphasis added);
using the particular input dataset as input into the third machine learning system, computing a third output dataset (emphasis added).
Srinivasa teaches
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset [paragraph 0055, “The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; Fig. 8 shows the samples with a prediction score that is less than a threshold are sent to the next neural network (fourth server, Net 3 …) for further processing];
processing the particular input dataset with a third machine learning system [Fig. 8, paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
using the particular input dataset as input into the third machine learning system, computing a third output dataset [Fig. 8, paragraph 0056, “The sequence 800 further depicts the cascaded training of other additional instances of the neural network. This cascading training is performed on cascading subsets of training data, until N instances of the network are produced, with neural network N-1 830 producing a final set of one or more excluded samples 835”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system, computing a third output dataset of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
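For illustration only, the selection logic mapped to Mann in claim 12 (train one model per hyper-parameter configuration, score each by cross-validation accuracy, and store the parameters of the most accurate model as defaults) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Mann's implementation: a toy k-nearest-neighbor classifier with leave-one-out cross-validation is swapped in for Mann's training functions, and the data, the configurations, and all function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (labels are 0 or 1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for one hyper-parameter value."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

def select_defaults(data, configs):
    """Score every configuration and return the most accurate one with its score."""
    scored = [(loo_accuracy(data, c["k"]), c) for c in configs]
    best_score, best = max(scored, key=lambda t: t[0])
    return dict(best), best_score

# Hypothetical training data (feature, label) and two candidate configurations.
DATA = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
CONFIGS = [{"k": 1}, {"k": 3}]

# The winning configuration is what would be stored as the default parameters.
defaults, accuracy = select_defaults(DATA, CONFIGS)
```

Any estimator and scoring criterion could stand in for the toy classifier here; the point of the sketch is only the compare-then-store step.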

As per claim 13, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 12.
Mann further teaches
wherein the first instructions, when executed by the one or more first processors further cause performance of: 
receiving, at the first server computer, a third input dataset and a request to run a machine learning system with the third input dataset [Fig. 1, Col. 8, lines 49-51, “The predictive modeling server system 206 receives the input data and prediction request from the client computing system 202”]; 
sending, from the first server computer [Fig. 1, the server system front end 110] to a fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112], the third input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fifth computer/server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more fourth configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a fourth configuration)], the one or more fourth configuration files comprising the one or more third machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having parameters different from those of other configurations; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”];
wherein the computer system further comprises the fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fifth server)], the fifth server computer comprising: 
one or more fifth processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”]; 
fifth memory storing fifth instructions which, when executed by the one or more fifth processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]:
processing the third input dataset with a fourth machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the fourth machine learning system using the one or more fourth configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the fourth model/machine learning system) is configured with a different hyper-parameter configuration (the fourth hyper-parameter configuration in this case)];
training the fourth machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”];
using the third input dataset as input into the fourth machine learning system, computing a fourth output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”];
sending the fourth output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110].  

As per claim 17, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 10.
Mann further teaches
wherein the first instructions, when executed by the one or more first processors, further cause performance of: 
receiving a third input dataset [Fig. 1, element 108a-c, Col. 4, lines 20-24, “allows training data 106a to be uploaded from the client computing system 104a (104c in this case) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112”] and a request to run a machine learning system with the third input dataset [Fig. 1, Col. 2, lines 1-2, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received”; Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output, and generates the predictive output”];
sending, from the first server computer to a fourth server computer, the third dataset, a third machine learning training dataset of the one or more machine learning training datasets [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building the machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
sending, from the first server computer [Fig. 1, the server system front end 110] to a fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112], the third dataset, the third machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fifth computer/server) in the data center 112 receives the input data and the training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and the one or more third configuration files [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a fourth configuration)];
wherein the computer system further comprises the fourth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the fourth server computer comprising: 
one or more fourth processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”]; 
fourth memory storing fourth instructions which, when executed by the one or more fourth processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]:
processing the first subset of the third dataset with a third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having different parameters compared to another configuration; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)]; 
training the third machine learning system using the third machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
using the third dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”];
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
wherein the computer system further comprises the fifth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fifth server)], the fifth server computer comprising: 
one or more fifth processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”];
fifth memory storing fifth instructions which, when executed by the one or more fifth processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]:
while the fourth server computer is processing the first subset of the third dataset, processing the second subset of the third dataset with the third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fifth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]:  
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”]; 
training the third machine learning system using the third machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data (training data 106b) to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
using the third dataset as input into the third machine learning system, computing a fourth output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; and 
sending the fourth output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110; Col. 4, lines 41-43, “The predictive output 114 can be provided to the client computing system 104a (or the client computing system 104b), for example, over the network 102”].  
Mann does not teach
sending, from the first server computer to a fourth server computer, a first subset of the third dataset (emphasis added);
sending, from the first server computer to a fifth server computer, a second subset of the third dataset (emphasis added);
using the fourth server computer, processing the first subset of the third dataset (emphasis added);
using the first subset of the third dataset as input into the third machine learning system (emphasis added);
processing the second subset of the third dataset with the third machine learning system (emphasis added);
using the second subset of the third dataset as input into the third machine learning system (emphasis added).
Srinivasa teaches
sending, from the first server computer to a fourth server computer, a first subset of the third dataset [paragraph 0054, “the sequence 800 depicts a set of training data 805 being provided as input to a first instance of a neural network 810. This first instance of the neural network 810 will operate to produce a classification or label for the various characteristics of the training data … The first instance of the neural network 810 will be trained on the complete set of the training data 805. After the training, a set of correctly classified samples and excluded samples is determined … the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value”; paragraph 0063, “If the classification score does not exceed the threshold, then the data sample … is further processed by a second neural network”; Fig. 8 shows the samples 815 are sent to NET 2, where samples 815 is a subset (first subset) of the training data 805];
sending, from the first server computer to a fifth server computer, a second subset of the third dataset [Fig. 8 shows the samples 825 are sent to NET N-1, where samples 825 is a subset (second subset) of the training data 805];
using the fourth server computer, processing the first subset of the third dataset [Fig. 8 shows NET 2 processes samples 815 and generates samples 825];
using the first subset of the third dataset as input into the third machine learning system [Fig. 8 shows the samples 815 are sent to NET 2 for processing]; 
processing the second subset of the third dataset with the third machine learning system [Fig. 8 shows NET N-1 processes samples 825 and generates samples 835];
using the second subset of the third dataset as input into the third machine learning system [Fig. 8 shows the samples 825 are sent to NET N-1 for processing].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system, and processing the second subset of the third dataset with the third machine learning system of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
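For illustration only, the cascading classification relied upon from Srinivasa (Fig. 8, paragraphs 0054-0056, 0063) can be sketched as follows. The function name `cascade`, the `Classifier` type, and the use of a single shared threshold are assumptions made for this sketch; Srinivasa describes a first and a second threshold value and trained neural network instances, which are replaced here by plain callables.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one trained network instance:
# it maps a sample to a (label, confidence score) pair.
Classifier = Callable[[float], Tuple[str, float]]


def cascade(
    samples: Sequence[float],
    stages: Sequence[Classifier],
    threshold: float,
) -> Tuple[List[Tuple[float, str, int]], List[float]]:
    """Pass low-confidence samples from one stage to the next.

    Returns the accepted (sample, label, stage index) triples and the
    samples that no stage classified at or above the threshold.
    """
    accepted: List[Tuple[float, str, int]] = []
    remaining = list(samples)
    for stage_index, classify in enumerate(stages):
        excluded: List[float] = []
        for sample in remaining:
            label, confidence = classify(sample)
            if confidence >= threshold:
                accepted.append((sample, label, stage_index))
            else:
                # Below the threshold: set aside for the next instance.
                excluded.append(sample)
        remaining = excluded
        if not remaining:
            break
    return accepted, remaining
```

In this sketch each later stage receives only the subset excluded by the stage before it, which corresponds to the examiner's reading of samples 815 and 825 as subsets of the training data 805.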

As per claim 18, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 10.
Mann further teaches
in response to sending the particular output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110 (first server computer)], storing the particular machine learning system on a separate server computer [Col. 4, lines 24-39, “One or more computers in the data center 112 can run software that uses the training data to estimate the effectiveness of multiple types of predictive models and make a selection of a trained predictive model to be used for data received from the particular client computing system 104a … The selected trained model executing in the data center 112”; Since the first server computer (the server system front end 110) is located remotely from the data center 112 (which comprises one or more computers), and the selected trained model (machine learning system) is located in the data center 112, the machine learning system is stored on a separate server computer].  

Claims 5-6 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Mann et al. in view of Srinivasa et al. in view of Simard et al. in view of Duggan et al. in view of Campos and further in view of Hughes et al. (US Pub. 2020/0202171).
As per claim 5, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann further teaches
sending, from the first server computer [Fig. 1, the server system front end 110] to a fourth server computer separate from the first server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. 
In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
the one or more particular configuration files comprising one or more particular machine learning parameters [multiple different hyper-parameter configurations] and the one or more third configuration files comprising one or more third machine learning parameters that are different from the one or more particular machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having different parameters compared to another configuration; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”]; 
using the fourth server computer, processing the input dataset with a third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]: 
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration having different parameters compared to another configuration; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)]; 
training the third machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
using the input dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
determining, at the first server computer, that the third output dataset is more accurate than the particular output dataset [Col. 8, lines 9-15, “each trained model is assigned a score that represents the effectiveness of the trained model … the criterion is the accuracy of the trained model and is estimated using a cross-validation score. Based on the scores, a trained predictive model is selected”]; and 
Mann does not teach
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset (emphasis added);
using the fourth server computer, processing the particular input dataset with a third machine learning system (emphasis added);
using the particular input dataset as input into the third machine learning system, computing a third output dataset (emphasis added).
in response to determining, storing the third output dataset and deleting the particular output dataset.  
Srinivasa teaches
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset [paragraph 0055, “The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; Fig. 8 shows the samples with a prediction score that is less than a threshold are sent to the next neural network (fourth server, Net 3 …) for further processing];
using the fourth server computer, processing the particular input dataset with a third machine learning system [Fig. 8, paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
using the particular input dataset as input into the third machine learning system, computing a third output dataset [Fig. 8, paragraph 0056, “The sequence 800 further depicts the cascaded training of other additional instances of the neural network. This cascading training is performed on cascading subsets of training data, until N instances of the network are produced, with neural network N-1 830 producing a final set of one or more excluded samples 835”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system, computing a third output dataset of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
Mann, Srinivasa, Simard, Duggan and Campos do not teach
in response to determining, storing the third output dataset and deleting the particular output dataset.  
Hughes teaches
in response to determining, storing the third output dataset and deleting the particular output dataset [paragraph 0197, “As a new prediction (new/third output) is received, each of the priority queues 608 evaluate the sampling score for the new prediction … then the sampling score is compared against one or more of the sampling scores of prior saved predictions (first/particular output) in the priority queue … if the sampling score is not greater than any of the sampling scores of previously stored predictions, then the prediction is discarded. Otherwise, the prediction is saved in the priority queue … and a lowest scoring prediction is removed from the priority queue”; Since Mann teaches assigning a score to each trained model that represents the effectiveness of the trained model, and Hughes teaches removing the stored output/prediction with the lower score, the combination of Mann and Hughes reads on the claim limitation].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of storing the third output dataset and deleting the particular output dataset of Hughes into the method for training a predictive model of Mann. Doing so would help store only the top results/predictions for samples according to the sampling algorithm (Hughes, 0191).
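For illustration only, the store-and-discard behavior quoted from Hughes, paragraph 0197, can be sketched with a bounded min-heap. The function name `offer`, the `capacity` parameter, and the representation of a prediction as a string are assumptions made for this sketch and are not taken from Hughes.

```python
import heapq
from typing import List, Tuple


def offer(
    queue: List[Tuple[float, str]],
    capacity: int,
    score: float,
    prediction: str,
) -> bool:
    """Offer a new prediction to a bounded priority queue.

    The queue is a min-heap, so queue[0] is the lowest scoring stored
    prediction. Returns True if the new prediction was saved.
    """
    entry = (score, prediction)
    if len(queue) < capacity:
        heapq.heappush(queue, entry)
        return True
    if score <= queue[0][0]:
        # Not greater than any stored score: discard the new prediction.
        return False
    # Save the new prediction and remove the lowest scoring stored one.
    heapq.heapreplace(queue, entry)
    return True
```

Under these assumptions, a higher scoring new output displaces the lowest scoring stored output, which is the storing and deleting behavior the rejection maps to the claim limitation.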

As per claim 6, Mann, Srinivasa, Simard, Duggan, Campos and Hughes teach the method of claim 5.
Mann teaches 
storing at the first server computer [Col. 6, lines 36-37, “The predictive modeling server system 206 includes a repository”];
Hughes teaches 
storing the confidence score threshold value [paragraph 0197, “Different priority queues may use different threshold values”];
Srinivasa further teaches
determining that a number of data items in the third output dataset with confidence scores above the confidence score threshold value exceeds a number of data items in the particular output dataset with confidence scores above the confidence score threshold value [Figs. 8, 15, paragraphs 54-56, “the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value … the first set of excluded samples 815 is provided to a second instance of the neural network 820 … The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; since the training samples 815 having scores lower than the threshold are sent to the next neural network for further processing, and the samples 825, 835 … (samples having scores lower than the threshold) are generated, where paragraph 0077, “A prediction score (e.g., a classification confidence score) is then produced and evaluated from the first network instance (operation 1520). If the prediction score is below a threshold value, then the evaluation processes (operations 1510, 1520) are repeated with a second instance of a neural network (operation 1530), until the prediction score meets or exceeds the threshold value”; therefore, it can be seen that the number of samples having scores lower than the threshold at the first neural network is greater than the number of samples having scores lower than the threshold at the third neural network].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of determining that a number of data items in the third output dataset with confidence scores above the confidence score threshold value exceeds a number of data items in the particular output dataset with confidence scores above the confidence score threshold value of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
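For illustration only, the comparison recited in claim 6 (counting data items whose confidence scores are above a stored threshold in each output dataset) can be sketched as follows. The function names `count_above` and `third_exceeds_particular`, and the representation of an output dataset as a list of confidence scores, are assumptions made for this sketch and do not appear in the cited references.

```python
from typing import Sequence


def count_above(scores: Sequence[float], threshold: float) -> int:
    """Count the data items whose confidence score is above the threshold."""
    return sum(1 for score in scores if score > threshold)


def third_exceeds_particular(
    particular_scores: Sequence[float],
    third_scores: Sequence[float],
    threshold: float,
) -> bool:
    """Return True when the third output dataset has more data items
    above the threshold than the particular output dataset."""
    return count_above(third_scores, threshold) > count_above(
        particular_scores, threshold
    )
```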

As per claim 14, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 12.
Mann further teaches
wherein the first instructions, when executed by the one or more first processors further cause performance of: 
sending, from the first server computer [Fig. 1, the server system front end 110] to a fourth server computer separate from the first server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the input dataset, the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “allows training data 106a to be uploaded from the client computing system 104a (training data 106b to be uploaded from the client computing system 104b, etc.) to the predictive modeling server system 109 over the network 102. The server system front end 110 (first server) can receive, store and manage large volumes of data using the data center 112 … when handling large volumes of training data and/or input data, the processes can be scaled across multiple computers at the data center 112. The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers (fourth computer/server) in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”; It can be seen that one of the computers (a fourth server) in the data center 112 receives the input data and training data from the system front end 110 and uses the training data to select a predictive model, which processes the input data to generate an output], and one or more third configuration files for building a machine learning system [Col. 2, lines 1-11, “Input data, data identifying the first trained predictive model, and a request for a predictive output can be received. 
In response, the predictive output can be generated using the first predictive model and the input data … The multiple training functions can include two or more training functions for training predictive models of a same type, where each predictive model is trained with a different hyper-parameter configuration”; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; It can be seen that each trained predictive model is generated based on different hyper-parameter configurations (including a third configuration)];
the one or more particular configuration files comprising one or more particular machine learning parameters [multiple different hyper-parameter configurations] and the one or more third configuration files comprising one or more third machine learning parameters that are different from the one or more particular machine learning parameters [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration that has different parameters compared to the other configurations; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model … in the present example, where the type of predictive model is a linear regression model, changes to an l penalty generate different sets of parameters … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”]; 
wherein the computer system further comprises the fourth server computer [Fig. 1, Col. 4, lines 20-49, one or more computers at the data center 112 (including a fourth server)], the fourth server computer comprising: 
one or more fourth processors [Col. 12, lines 35-42, “These various implementations may include ... at least one programmable processor”]; 
fourth memory storing fourth instructions which, when executed by the one or more fourth processors, cause performance of [Col. 9, lines 58-66, “Components of the ... predictive modeling system 206 ... can be realized by instructions that upon execution cause one or more computers to carry out the operations described above. Such instructions can comprise, for example, interpreted instructions … stored in a computer readable medium”]:
processing the input dataset with a third machine learning system by [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 (including fourth computer) can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]:
configuring the third machine learning system using the one or more third configuration files [Col. 2, lines 1-11, “each predictive model is trained with a different hyper-parameter configuration”, where a different hyper-parameter configuration can be interpreted as a configuration that has different parameters compared to the other configurations; Col. 6, lines 53-67, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … A training function is applied to the training data to generate a set of parameters. These parameters form the trained predictive model. For example, to train (or estimate) a Naive Bayes model, the method of maximum likelihood can be used … if the type of predictive model is a linear regression model, more than one different training function for a linear regression model can be used … to generate more than one trained predictive model”; Col. 6, lines 53-67 to Col. 7, lines 1-15, “For a given training function, multiple different hyper-parameter configurations can be applied to the training function, again generating multiple different trained predictive models … a predictive model can be trained with different features, again generating different trained models … Considering the many different types of predictive models that are available, and then that each type of predictive model may have multiple training functions and that multiple hyper-parameter configurations and selected features may be used for each of the multiple training functions”; It can be seen that each predictive model (including the third model/machine learning system) is configured with a different hyper-parameter configuration (the third hyper-parameter configuration in this case)]; 
training the third machine learning system using the particular machine learning training dataset [Fig. 1, Col. 4, lines 20-49, “The predictive modeling server system 109 can automatically provision and allocate the required resources, using one or more computers … One or more computers in the data center 112 can run software that uses the training data to … make a selection of a trained predictive model to be used for data received from the particular client computing system … The selected model can be trained and the trained model made available to users”]; 
using the input dataset as input into the third machine learning system, computing a third output dataset [Fig. 1, Col. 4, lines 38-41, “The selected trained model executing in the data center 112 receives the prediction request, input data and request for a predictive output and generates the predictive output”]; 
sending the third output dataset to the first server computer [Fig. 1 shows the predictive output 114 is sent to the server system front end 110];  
wherein the first instructions, when executed by the one or more first processors further cause performance of: 
determining, at the first server computer, that the third output dataset is more accurate than the particular output dataset [Col. 8, lines 9-15, “each trained model is assigned a score that represents the effectiveness of the trained model … the criterion is the accuracy of the trained model and is estimated using a cross-validation score. Based on the scores, a trained predictive model is selected”];
Mann does not teach
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset (emphasis added);
processing the particular input dataset with a third machine learning system (emphasis added);
using the particular input dataset as input into the third machine learning system, computing a third output dataset (emphasis added).
in response to determining, storing the third output dataset and deleting the particular output dataset.  
Srinivasa teaches
sending, from the first server computer to a fourth server computer separate from the first server computer, the particular input dataset [paragraph 0055, “The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; Fig. 8 shows the samples with a prediction score that is less than a threshold are sent to the next neural network (fourth server, Net 3 …) for further processing];
processing the particular input dataset with a third machine learning system [Fig. 8, paragraph 0073, “The excluded training samples, which were unable to achieve a satisfactory classification from training in the first instance of the neural network, are then set aside and identified for subsequent training. This subsequent training process is depicted as including the repeating of an evaluation of the excluded training samples in a new instance of the neural network”];
using the particular input dataset as input into the third machine learning system, computing a third output dataset [Fig. 8, paragraph 0056, “The sequence 800 further depicts the cascaded training of other additional instances of the neural network. This cascading training is performed on cascading subsets of training data, until N instances of the network are produced, with neural network N-1 830 producing a final set of one or more excluded samples 835”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of using the particular input dataset as input into the third machine learning system, computing a third output dataset of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.
Mann, Srinivasa, Simard, Duggan and Campos do not teach
in response to determining, storing the third output dataset and deleting the particular output dataset.  
Hughes teaches
in response to determining, storing the third output dataset and deleting the particular output dataset [paragraph 0197, “As a new prediction (new/third output) is received, each of the priority queues 608 evaluate the sampling score for the new prediction … then the sampling score is compared against one or more of the sampling scores of prior saved predictions (first/particular output) in the priority queue … if the sampling score is not greater than any of the sampling scores of previously stored predictions, then the prediction is discarded. Otherwise, the prediction is saved in the priority queue … and a lowest scoring prediction is removed from the priority queue”; Since Mann teaches assigning to each trained model a score that represents the effectiveness of the trained model, and Hughes teaches removing the stored output/prediction with the lower score, the combination of Mann and Hughes reads on the claim limitation].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of storing the third output dataset and deleting the particular output dataset of Hughes into the method for training a predictive model of Mann. Doing so would help store only the top results/predictions for samples according to the sampling algorithm (Hughes, 0191).

As per claim 15, Mann, Srinivasa, Simard, Duggan, Campos and Hughes teach the computer system of claim 14.
Mann teaches 
storing at the first server computer [Col. 6, lines 36-37, “The predictive modeling server system 206 includes a repository”];
Hughes teaches 
storing the confidence score threshold value [paragraph 0197, “Different priority queues may use different threshold values”];
Srinivasa further teaches
determining that a number of data items in the third output dataset with confidence scores above the confidence score threshold value exceeds a number of data items in the particular output dataset with confidence scores above the confidence score threshold value [Figs. 8, 15, paragraphs 54-56, “the first set of excluded samples 815 include one or more training samples that were predicted by the neural network, but with a prediction score (e.g., confidence level) that is lower than a first threshold value … the first set of excluded samples 815 is provided to a second instance of the neural network 820 … The one or more excluded samples 825 again include mispredicted samples, or samples with a prediction score (e.g., confidence level) that is lower than a second threshold value”; since the training samples 815, which have scores lower than the threshold, are sent to the next neural network for further processing, and the samples 825, 835 … (samples which have scores lower than the threshold) are generated, where paragraph 0077 states, “A prediction score (e.g., a classification confidence score) is then produced and evaluated from the first network instance (operation 1520). If the prediction score is below a threshold value, then the evaluation processes (operations 1510, 1520) are repeated with a second instance of a neural network (operation 1530), until the prediction score meets or exceeds the threshold value”; therefore, it can be seen that the number of samples that have scores lower than the threshold at the first neural network is greater than the number of samples that have scores lower than the threshold at the third neural network].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of determining that a number of data items in the third output dataset with confidence scores above the confidence score threshold value exceeds a number of data items in the particular output dataset with confidence scores above the confidence score threshold value of Srinivasa. Doing so would help enhance supervised learning and operational procedures for a neural network by performing a cascading pattern classification on training data, repeating the classification among a plurality of instances of the neural network until a classification approach is generated for all input data.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Mann et al. in view of Srinivasa et al. in view of Simard et al. in view of Duggan et al. in view of Campos and further in view of Lin et al. (US Patent 8,370,280).
As per claim 7, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann, Srinivasa, Simard, Duggan and Campos do not teach
determining, at the first server computer, a size of the particular machine learning system; 
determining, at the first server computer, one or more capabilities of the second server computer and one or more capabilities of a third server computer;
based, at least in part, on the size of the particular machine learning system, determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system; and 
in response to determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, selecting the second server computer for running the particular machine learning system.  
Lin teaches
determining, at the first server computer, a size of the particular machine learning system [Col. 6, lines 38-57, “The servers execute computer programs that implement model implementations 208, an implementation selector 210, and model executors 212. The model executors 212 can use the implementation selector 210 to select model implementations 208 to execute based on various factors. A given predictive model (e.g., a support vector machine) can have a number of different possible predictive model implementations. In some implementations, predetermined predictive model implementations can be provided. For example, there can be small, medium and/or large implementations. A small predictive model implementation uses the resources of a single server, a medium predictive model implementation has a parallelized implementation (e.g., a map-reduce predictive model implementation) that uses the resources of N servers, and a large implementation has a parallelized implementation that uses the resources of P servers, where P>N. In some examples, P and N can be varied dynamically based on the available resources of the system 200 (e.g., the number of a servers that are available to execute a portion of the model implementation) and other factors”; determining the model implementation based on factors such as resource constraints and usage (i.e., size)]; 
determining, at the first server computer, one or more capabilities of the second server computer and one or more capabilities of a third server computer [Col. 7, lines 42-48, “In some implementations, the user's remaining account balance determines which model implementations (e.g., small, medium and large) are selected based an estimate of what the user will be charged for the usage. That is, the largest model implementation possible is selected that is not estimated to result in a negative account balance based on system 200 usage”; determining model implementations based on estimates can be interpreted as determining based on capabilities];
based, at least in part, on the size of the particular machine learning system, determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system [Col. 7, lines 36-42, “In the former case, the user can pay for a level or grade of service which determines the size of model implementations that are available to them. In the latter case, the user is charged for the system 200 resources they consume so that if the user (or the system 200) selects larger model implementations, the user will be charged accordingly”; determining which models to implement based on service, resources and usage can be interpreted as determining capabilities based on size]; and 
in response to determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, selecting the second server computer for running the particular machine learning system [Col. 7, lines 46-48, “the largest model implementation possible is selected that is not estimated to result in a negative account balance based on system 200 usage”; selecting the model implementation based on usage capabilities can be interpreted as selecting the server(s) to implement the model].   
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, and selecting the second server computer for running the particular machine learning system of Lin into the method for training a predictive model of Mann. Doing so would help provide the faster model implementation (Lin, Col. 7, lines 28-29).

As per claim 17, Mann, Srinivasa, Simard, Duggan and Campos teach the computer system of claim 10.
Mann, Srinivasa, Simard, Duggan and Campos do not teach
the first instructions, when executed by the one or more first processors, further cause performance of: 
determining, at the first server computer, a size of the particular machine learning system; 
determining, at the first server computer, one or more capabilities of the second server computer and one or more capabilities of a third server computer; 
based, at least in part, on the size of the particular machine learning system, determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system; and 
in response to determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, selecting the second server computer for running the particular machine learning system.   
Lin teaches
determining, at the first server computer, a size of the particular machine learning system [Col. 6, lines 38-57, “The servers execute computer programs that implement model implementations 208, an implementation selector 210, and model executors 212. The model executors 212 can use the implementation selector 210 to select model implementations 208 to execute based on various factors. A given predictive model (e.g., a support vector machine) can have a number of different possible predictive model implementations. In some implementations, predetermined predictive model implementations can be provided. For example, there can be small, medium and/or large implementations. A small predictive model implementation uses the resources of a single server, a medium predictive model implementation has a parallelized implementation (e.g., a map-reduce predictive model implementation) that uses the resources of N servers, and a large implementation has a parallelized implementation that uses the resources of P servers, where P>N. In some examples, P and N can be varied dynamically based on the available resources of the system 200 (e.g., the number of a servers that are available to execute a portion of the model implementation) and other factors”; determining the model implementation based on factors such as resource constraints and usage (i.e., size)]; 
determining, at the first server computer, one or more capabilities of the second server computer and one or more capabilities of a third server computer [Col. 7, lines 42-48, “In some implementations, the user's remaining account balance determines which model implementations (e.g., small, medium and large) are selected based an estimate of what the user will be charged for the usage. That is, the largest model implementation possible is selected that is not estimated to result in a negative account balance based on system 200 usage”; determining model implementations based on estimates can be interpreted as determining based on capabilities];
based, at least in part, on the size of the particular machine learning system, determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system [Col. 7, lines 36-42, “In the former case, the user can pay for a level or grade of service which determines the size of model implementations that are available to them. In the latter case, the user is charged for the system 200 resources they consume so that if the user (or the system 200) selects larger model implementations, the user will be charged accordingly”; determining which models to implement based on service, resources and usage can be interpreted as determining capabilities based on size]; and 
in response to determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, selecting the second server computer for running the particular machine learning system [Col. 7, lines 46-48, “the largest model implementation possible is selected that is not estimated to result in a negative account balance based on system 200 usage”; selecting the model implementation based on usage capabilities can be interpreted as selecting the server(s) to implement the model].   
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of determining that the second server computer is capable of running the particular machine learning system and that the third server computer is not capable of running the particular machine learning system, and selecting the second server computer for running the particular machine learning system of Lin into the method for training a predictive model of Mann. Doing so would help provide the faster model implementation (Lin, Col. 7, lines 28-29).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Mann et al. in view of Srinivasa et al. in view of Simard et al. in view of Duggan et al. in view of Campos and further in view of Zou et al. (US Patent 8,266,078).
As per claim 19, Mann, Srinivasa, Simard, Duggan and Campos teach the method of claim 1.
Mann, Srinivasa, Simard, Duggan and Campos do not explicitly teach
providing the interface indicating one or more combinations of the additional machine learning training datasets, and 
wherein receiving user input indicating the selection of the second machine learning training dataset from the one or more additional machine learning training datasets further comprises receiving user input indicating the selection of a particular combination of the one or more combinations of the additional machine learning training datasets.
Zou teaches 
providing the interface indicating one or more combinations of the additional machine learning training datasets [claim 8, “selecting one or more of the training data sets from a display of the previously created training data set”; examiner interprets the training data sets as a combination of training data], and 
wherein receiving user input indicating the selection of the second machine learning training dataset from the one or more additional machine learning training datasets further comprises receiving user input indicating the selection of a particular combination of the one or more combinations of the additional machine learning training datasets [claim 8, “selecting one or more of the training data sets (combination of training data) from a display of the previously created training data set”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method for training a predictive model of Mann to include the process of providing the interface indicating one or more combinations of the additional machine learning training datasets, and receiving user input indicating the selection of a particular combination of the one or more combinations of the additional machine learning training datasets of Zou. Doing so would help create the model based on the training parameters and evaluate the recognition model (Zou, abstract).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Bourhani et al. (US Pub. 2017/0316114) describes a computing design system that facilitates the creation and deployment of complex data and mathematical models.
Sainani et al. (US Pub. 2019/0034767) describes using preprocessed data to train a machine learning model that can subsequently be used to predict data.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103. The examiner can normally be reached M-F, 8 AM-5 PM, (CT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T. N./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128