DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed 12/28/2021 has been entered. Claims 1-5, 7-12, 16-21, 45-46, 51-56 and 61-64 remain pending in the application. 

Response to Arguments
Applicant’s arguments, filed 12/28/2021, with respect to the rejections of claims 1 and 45 under 35 U.S.C. 103 have been fully considered and are persuasive in view of the amendments. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Bazrafkan et al. (US Pub. 2018/0211164) in view of Chaudhari et al. (US Pub. 2018/0005111) in view of Rosswog et al. (US Patent 9,514,414) and further in view of Li et al. (US Pub. 2018/0240257).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
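
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
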
Claims 1-2, 19-20, 45-46, 52, 62 and 64 are rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. (US Pub. 2018/0211164) in view of Chaudhari et al. (US Pub. 2018/0005111) in view of Rosswog et al. (US Patent 9,514,414) and further in view of Li et al. (US Pub. 2018/0240257).
As per claim 1, Bazrafkan teaches:
A machine learning (ML) computer system for training a student ML system [Fig. 1, network B] iteratively through machine learning on training data to perform a machine learning task [paragraph 0020, “FIG. 1 shows an augmenting network A operating on an image set I for training a target network B”; paragraph 0029, “the dataset comprises images of male subjects, Network B is designed to perform gender classification and so it produces a single output (LB) indicating the likelihood of an input image containing for example, a male image”; paragraph 0054, “a batch of data X(T) is given to the network and these are used to train instances of network A in parallel and to subsequently train instances of network B in parallel … half of the batch for network B comes from the outputs of the network A processing its samples from the batch and the other half comes from the original database”; abstract, “Training a target neural network comprises providing a first batch of samples of a given class to respective instances of a generative neural network … A second batch of samples is provided to the target neural network”; claim 1, “providing a first batch of samples of a given class to respective instances of a generative neural network … providing a second batch of samples to said target neural network … determining a second loss function … updating the parameters for said target neural network … for said target neural network … repeating steps”], wherein: 
the student ML system [Fig. 1, network B] comprises an input layer, an output layer, and at least a first inner layer between the input and output layers [paragraph 0020, “training a target network B”; paragraph 0051, “Network B is a typical deep neural network with two convolutional layers”; It can be understood that in a convolutional neural network, the hidden layers include layers that perform convolutions; Fig. 4 shows the target network B includes the input, hidden and output layers]; 
each layer comprises at least one node, such that the first inner layer comprises at least a first node [Fig. 4 shows the inner/hidden layer comprises at least a first node (for example the top node/neuron in the figure)]; 
the first node of the first inner layer outputs an activation value [Fig. 4 shows the first node of the inner layer outputs the values indicating a likelihood of the input image being male or female] for each set of input values to the first node of the first inner layer [Fig. 4, paragraph 0051, one output would represent a likelihood of the input image being male, with the other representing a likelihood of the input image being female. In this case, the targets for these outputs could be 1 and 0 with 1 for male, so causing one of the network B output neurons to fire, and with 0 for female, so causing the other neuron to fire; Fig. 1, paragraph 0029, “Network B is designed to perform gender classification and so it produces a single output (LB) indicating the likelihood of an input image containing for example, a male image”; paragraph 0034, “The loss function LB for network B can for example be a categorical cross-entropy”];
Examiner notes that paragraph 0081 of the specification of the present application recites “If the system is doing classification into a finite set of categories, then the control flow 15 proceeds to block 614, which classifies the input data … The output of block 614 is either a score for each possible classification category or simply an indication of the best matching category, which is equivalent to a score of 1 for the chosen category and 0 for everything else. Each category is associated with a node … and the corresponding score is the activation value for the node”.
[paragraphs 0053-0054, “when training a neural network, a batch of data X(T) is given to the network and these are used to train instances of network A in parallel and to subsequently train instances of network B in parallel, the instances of network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T) … the total loss … is fed back to network A for the subsequent batch … the network parameters are updated based on the loss function(s) … The parameters for network B are updated based on the loss function for network B”];
wherein the machine learning computer system comprises: 
a learning coach ML system [Fig. 1, paragraph 0032, “Network A is a neural network”] that is in communication with the student ML system [Fig. 1], wherein the learning coach ML system trained through machine learning to determine and implement an enhancement to the student ML system [paragraph 0028, “an augmenting network (network A) to learn the best sample blending/generation for the specific problem in hand. This network A is placed before a target network designed for the problem (network B), so that network A augments a dataset, I1 ... Ik, so that it can provide augmented samples i.e. samples other than those of the dataset I1 ... Ik, to network B. Thus, network A learns the best data augmentation to increase the training accuracy of network B”], and wherein the learning coach ML system comprises an input layer and an output layer [Figs. 1, 3-4], and wherein the learning coach ML system determines the enhancement to the student ML system based on input to the learning coach ML system from the student ML system [paragraph 0036, “The overall loss function error used for training network A is f(LA,LB)”; where, paragraph 0034, “the loss function LB for network B”], wherein the input to the learning coach ML system comprises observations about an internal state of the student ML system during training of the student ML system, wherein the observations about the internal state of the student ML system comprise values related to the learned parameters on the first inner layer of [Fig. 4 shows the first node of the inner layer outputs the values indicating a likelihood of the input image being male or female; paragraph 0036, “The overall loss function error used for training network A is f(LA,LB) … In one embodiment the overall loss function is αLA+βLB where α=0.3 and β=0.7”; where, paragraph 0034, “the loss function LB for network B can for example be a categorical cross-entropy between the outputs of network B and the target values for the classifier for a batch”; paragraph 0028, “network A learns the best data augmentation to increase the training accuracy of network B”; It can be seen that the network A (learning coach) is trained based in part on the loss function LB of network B (input to network A)]; and 
“…” generating a set of additional training data for the student ML system [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”], wherein the learning coach ML system trained, through machine learning, to determine the enhancement to the student ML system that is implemented based on the observations about the first node on the first inner layer of the student ML system from a training of the student ML system with the set of additional training data generated by the reference system [paragraph 0038, “the loss function error back propagates from network B to network A. This tunes network A to generate the best augmentations for network B that can be produced by network A”; where, abstract, “A second loss function (loss function LB) is determined for the target neural network by comparing outputs of instances of the target neural network to one or more targets for the neural network”; paragraph 0036, “The overall loss function error used for training network A is f(LA,LB) … In one embodiment the overall loss function is αLA+βLB where α=0.3 and β=0.7”; where, paragraph 0034, “the loss function LB for network B can for example be a categorical cross-entropy between the outputs of network B and the target values for the classifier for a batch”; paragraph 0029, “an instance of Network A accepts at least one sample, N>=1, from a batch of samples of the same class in the dataset … generates Out1, a new sample in the same class so that this new sample reduces the loss function for network B”; paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A”].  
Bazrafkan does not explicitly teach 
the activation value is determined based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer; 
following iterations of the training of the student ML system, the learned parameters for the first node of the first inner layer are updated (emphasis added);
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system (emphasis added);
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system;
the observations about the internal state of the student ML system comprise values related to the learned parameters for the first node on the first inner layer of the student ML system;
a reference system for generating a set of additional training data;
Chaudhari teaches
the activation value [an activation result] is determined based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function corresponding to each neuron node in the first hidden layer may receive as inputs the initial scale parameter from the initial input layer, the bias parameter, the set of training inputs, and a set of weights … The activation function may be executed on the second linear combination and the initial scale parameter to generate an activation result … a respective activation result may be determined for each neuron node in the first hidden layer”; paragraph 0025, “provide the technical effect of improved classification accuracy for a neural network. This technical effect is achieved as a result of the technical feature of learning a scale parameter of an activation function by continually updating a weight applied to the scale parameter”; Fig. 3, paragraph 0031, “execute an activation function on the linear combination obtained at block 304 and the scale parameter to generate an activation result … the bias parameter and the scale parameter may each have respective weights applied thereto that may be learned through iterative execution of the method 300. Thus, an activation function of the form shown in FIG. 1 allows for the scale parameter to be learned in addition to the bias parameter and the other interconnection weights”]; 
following iterations of the training, the learned parameters for the first node of the first inner layer are updated [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function corresponding to each neuron node in the first hidden layer may receive as inputs the initial scale parameter from the initial input layer, the bias parameter, the set of training inputs, and a set of weights”; claim 3, “updating the respective set of weights to be applied during a next iteration of the training based at least in part on the difference between the actual target output and the classifier output”];
the observations about the internal state comprise values related to the learned parameters [weights] for the first node on the first inner layer [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function may be executed on the second linear combination and the initial scale parameter to generate an activation result … a respective activation result may be determined for each neuron node in the first hidden layer”; paragraph 0019, “the set of activation results provided to the final output layer may be output by the DNN as a set of classifier outputs. Each such classifier output may be compared to an actual target output associated with a corresponding training input to determine an amount of deviation between the classifier output and the actual target output. Training the DNN may include using these deviations between classifier outputs and actual target outputs to determine the optimal set of weights in the DNN that minimizes a cost/error function representing the error between the set of classifier outputs and the actual target outputs”; paragraph 0020, “weights of the DNN are updated to obtain an optimal set of interconnection weights that minimizes a cost/error function”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include determining the activation value based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer, as taught by Chaudhari. Doing so would help generate a set of classifier outputs using the activation function and the scale parameter, or provide the activation results as input to a next iteration of the training (Chaudhari, paragraph 0003).
Bazrafkan and Chaudhari do not teach
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system;
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system;
a reference system for generating a set of additional training data;
Rosswog teaches 
a reference system [Fig. 2, seed set generators 230] for generating a set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include a reference system for generating a set of additional training data, as taught by Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.
Bazrafkan, Chaudhari and Rosswog do not teach
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system;
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system;
Li teaches
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system [abstract, “The computer system generates an image based on a generator neural network and a loss neural network. The generator neural network outputs the synthesized image based on a noise vector and the style image and is trained based on style features generated from the loss neural network”; paragraph 0004, “a first "generator" neural network is trained using a second pre-trained "loss" neural network”; paragraph 0017, “The loss neural network is pre-trained to generate features that identify the styles in a given image, referred to herein as style features. Pre-trained refers to the loss neural network being already trained prior to the training of the generator neural network”; Examiner interprets the loss neural network as the learning coach ML system and the generator neural network as the student ML system, where the loss neural network/learning coach ML system is already trained prior to the training of the generator neural network/student ML system];
the input to the learning coach ML system [Fig. 2, the loss neural network] from the student ML system [Fig. 2, the generator neural network] is input to the input layer of the learning coach ML system [Fig. 2 shows that output 212 from the generator neural network/student ML system is input to the input layer of the loss neural network/learning coach ML system; paragraph 0047, “the generator neural network 210 outputs an image, referred to herein as an intermediate output 212, based on the input noise vector. The intermediate output 212 then becomes an input to the loss neural network 240. In turn, the loss neural network 240 outputs style features of the intermediate output 212, referred to herein as intermediate style features 242”; Since Bazrafkan teaches in Fig. 1 the network A/learning coach ML system for training a target network B/student ML system, and teaches in paragraph 0036 that output from network B is used to train network A, while Li teaches in Fig. 2 that the loss neural network/learning coach ML system is already trained prior to the training of the generator neural network/student ML system, and that input from the generator neural network/student ML system to the loss neural network/learning coach ML system is input to the input layer of the loss neural network/learning coach ML system, the combination of Bazrafkan and Li reads on the claim limitations];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the learning coach ML system having been trained through machine learning, and the input to the learning coach ML system from the student ML system being input to the input layer of the learning coach ML system, as taught by Li. Doing so would help iteratively update the parameters of the generator neural network to minimize the losses (Li, paragraph 0004).

As per claim 2, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Bazrafkan teaches 
generating a set of additional training data for the student ML system [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”];
Rosswog further teaches
the reference system [Fig. 2, seed set generators 230] comprises at least one classifier for classifying input data to generate classified data as the set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; It can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include a reference system comprising at least one classifier for classifying input data to generate classified data as the set of additional training data, as taught by Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.

As per claim 19, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Bazrafkan further teaches
the student ML system [Fig. 1, paragraph 0029, “the dataset comprises images of male subjects, Network B is designed to perform gender classification and so it produces a single output (LB) indicating the likelihood of an input image containing for example, a male image”] has a different objective than the learning coach ML system [paragraph 0028, “network A learns the best data augmentation to increase the training accuracy of network B”].  

As per claim 20, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Bazrafkan further teaches
the enhancement comprises one or more revised hyperparameters [generating new sample] for the student ML system that improve learning by the student ML system [paragraph 0029, “an instance of Network A accepts at least one sample, N>=1, from a batch of samples of the same class in the dataset … generates Out1, a new sample in the same class so that this new sample reduces the loss function for network B”].  

As per claim 45, Bazrafkan teaches a computerized method of improving operation of a student ML system [paragraph 0002, “method of training a neural network”; paragraph 0028, “increase the training accuracy of network B”], the method comprising: 
[paragraph 0020, “FIG. 1 shows an augmenting network A operating on an image set I for training a target network B”; paragraph 0029, “the dataset comprises images of male subjects, Network B is designed to perform gender classification and so it produces a single output (LB) indicating the likelihood of an input image containing for example, a male image”; paragraph 0054, “a batch of data X(T) is given to the network and these are used to train instances of network A in parallel and to subsequently train instances of network B in parallel … half of the batch for network B comes from the outputs of the network A processing its samples from the batch and the other half comes from the original database”; abstract, “Training a target neural network comprises providing a first batch of samples of a given class to respective instances of a generative neural network … A second batch of samples is provided to the target neural network”; claim 1, “providing a first batch of samples of a given class to respective instances of a generative neural network … providing a second batch of samples to said target neural network … determining a second loss function … updating the parameters for said target neural network … for said target neural network … repeating steps”], wherein: 
the student ML system [Fig. 1, network B] comprises an input layer, an output layer, and at least a first inner layer between the input and output layers [paragraph 0020, “training a target network B”; paragraph 0051, “Network B is a typical deep neural network with two convolutional layers”; It can be understood that in a convolutional neural network, the hidden layers include layers that perform convolutions; Fig. 4 shows the target network B includes the input, hidden and output layers]; 
each layer comprises at least one node, such that the first inner layer comprises at least a first node [Fig. 4 shows the inner/hidden layer comprises at least a first node (for example the top node/neuron in the figure)]; 
the first node of the first inner layer outputs an activation value [Fig. 4 shows the first node of the inner layer outputs the values indicating a likelihood of the input image being male or female] for each set of input values to the first node of the first inner layer [Fig. 4, paragraph 0051, one output would represent a likelihood of the input image being male, with the other representing a likelihood of the input image being female. In this case, the targets for these outputs could be 1 and 0 with 1 for male, so causing one of the network B output neurons to fire, and with 0 for female, so causing the other neuron to fire; Fig. 1, paragraph 0029, “Network B is designed to perform gender classification and so it produces a single output (LB) indicating the likelihood of an input image containing for example, a male image”; paragraph 0034, “The loss function LB for network B can for example be a categorical cross-entropy”];
Examiner notes that paragraph 0081 of the specification of the present application recites “If the system is doing classification into a finite set of categories, then the control flow 15 proceeds to block 614, which classifies the input data … The output of block 614 is either a score for each possible classification category or simply an indication of the best matching category, which is equivalent to a score of 1 for the chosen category and 0 for everything else. Each category is associated with a node … and the corresponding score is the activation value for the node”.
the initial training of the student ML system comprises, following iterations of the training of the student ML system, updating the learned parameters of the first inner layer [paragraphs 0053-0054, “when training a neural network, a batch of data X(T) is given to the network and these are used to train instances of network A in parallel and to subsequently train instances of network B in parallel, the instances of network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T) … the total loss … is fed back to network A for the subsequent batch … the network parameters are updated based on the loss function(s) … The parameters for network B are updated based on the loss function for network B”]; 
[paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”];
following the initial training of the student ML system: 
training the student ML system on the set of additional training data generated “…” [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A”]; 
receiving, by a learning coach ML system [Fig. 1, paragraph 0032, “Network A is a neural network”], from the student ML system, observations about an internal state of the student ML system as the student ML system is being trained on the set of training data [Fig. 4 shows the first node of the inner layer outputs the values indicating a likelihood of the input image being male or female; paragraph 0036, “The overall loss function error used for training network A is f(LA,LB)”], wherein:
the learning coach ML system trained through machine learning to determine and implement an enhancement to the student ML system [paragraph 0028, “an augmenting network (network A) to learn the best sample blending/generation for the specific problem in hand. This network A is placed before a target network designed for the problem (network B), so that network A augments a dataset, I1 ... Ik, so that it can provide augmented samples i.e. samples other than those of the dataset I1 ... Ik, to network B. Thus, network A learns the best data augmentation to increase the training accuracy of network B”]; 
the learning coach ML system comprises an input layer and an output layer [Figs. 1, 3-4]; 
the learning coach ML system determines the enhancement to the student ML system based on input to the learning coach ML system from the student ML system [paragraph 0036, “The overall loss function error used for training network A is f(LA,LB)”; where, paragraph 0034, “the loss function LB for network B”];
[Fig. 4 shows the first node of the inner layer outputs the values indicating a likelihood of the input image being male or female; paragraph 0036, “The overall loss function error used for training network A is f(LA,LB) … In one embodiment the overall loss function is αLA+βLB where α=0.3 and β=0.7”; where, paragraph 0034, “the loss function LB for network B can for example be a categorical cross-entropy between the outputs of network B and the target values for the classifier for a batch”; paragraph 0028, “network A learns the best data augmentation to increase the training accuracy of network B”; It can be seen that the network A (learning coach) is trained based in part on the loss function LB of network B (input to network A)]; and
determining and implementing, by the learning coach ML system, the enhancement to the student ML system [optimizing the loss function for network B] based on the observations about the first node of the first inner layer of the student ML system from training on the set of additional training data to improve operation of the student ML system [paragraph 0038, “the loss function error back propagates from network B to network A. This tunes network A to generate the best augmentations for network B that can be produced by network A”; where, abstract, “A second loss function (loss function LB) is determined for the target neural network by comparing outputs of instances of the target neural network to one or more targets for the neural network”; paragraph 0036, “The overall loss function error used for training network A is f(LA,LB) … In one embodiment the overall loss function is αLA+βLB where α=0.3 and β=0.7”; where, paragraph 0034, “the loss function LB for network B can for example be a categorical cross-entropy between the outputs of network B and the target values for the classifier for a batch”; paragraph 0029, “an instance of Network A accepts at least one sample, N>=1, from a batch of samples of the same class in the dataset … generates Out1, a new sample in the same class so that this new sample reduces the loss function for network B”; paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A”].  
Bazrafkan does not explicitly teach
the activation value is determined based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer; 
following iterations of the training of the student ML system, updating the learned parameters for the first node of the first inner layer;
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system (emphasis added);
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system; 
the observations about the internal state of the student ML system comprise values related to the learned parameters for the first node on the first inner layer of the student ML system;
a reference system for generating a set of additional training data, wherein the reference system comprises a computer;
Chaudhari teaches
the activation value [an activation result] is determined based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function corresponding to each neuron node in the first hidden layer may receive as inputs the initial scale parameter from the initial input layer, the bias parameter, the set of training inputs, and a set of weights … The activation function may be executed on the second linear combination and the initial scale parameter to generate an activation result … a respective activation result may be determined for each neuron node in the first hidden layer”; paragraph 0025, “provide the technical effect of improved classification accuracy for a neural network. This technical effect is achieved as a result of the technical feature of learning a scale parameter of an activation function by continually updating a weight applied to the scale parameter”; Fig. 3, paragraph 0031, “execute an activation function on the linear combination obtained at block 304 and the scale parameter to generate an activation result … the bias parameter and the scale parameter may each have respective weights applied thereto that may be learned through iterative execution of the method 300. Thus, an activation function of the form shown in FIG. 1 allows for the scale parameter to be learned in addition to the bias parameter and the other interconnection weights”]; 
following iterations of the training of the student ML system, updating the learned parameters for the first node of the first inner layer [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function corresponding to each neuron node in the first hidden layer may receive as inputs the initial scale parameter from the initial input layer, the bias parameter, the set of training inputs, and a set of weights”; claim 3, “updating the respective set of weights to be applied during a next iteration of the training based at least in part on the difference between the actual target output and the classifier output”];
the observations about the internal state comprise values related to the learned parameters [weights] for the first node on the first inner layer [paragraph 0014, “Each neuron node in the first hidden layer may be associated with a corresponding activation function … The activation function may be executed on the second linear combination and the initial scale parameter to generate an activation result … a respective activation result may be determined for each neuron node in the first hidden layer”; paragraph 0019, “the set of activation results provided to the final output layer may be output by the DNN as a set of classifier outputs. Each such classifier output may be compared to an actual target output associated with a corresponding training input to determine an amount of deviation between the classifier output and the actual target output. Training the DNN may include using these deviations between classifier outputs and actual target outputs to determine the optimal set of weights in the DNN that minimizes a cost/error function representing the error between the set of classifier outputs and the actual target outputs”; paragraph 0020, “weights of the DNN are updated to obtain an optimal set of interconnection weights that minimizes a cost/error function”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the process of determining the activation value based on an activation function for the first node of the first inner layer and based on learned parameters for the first node of the first inner layer of Chaudhari. Doing so would help generate a set of classifier outputs of the classifier using the activation function and the scale parameter, or provide the activation results as input to a next iteration of the training (Chaudhari, 0003).
Bazrafkan and Chaudhari do not teach
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system;
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system;
a reference system for generating a set of additional training data, wherein the reference system comprises a computer;
Rosswog teaches
a reference system [Fig. 2, seed set generators 230] for generating a set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”], wherein the reference system comprises a computer [Col. 13, lines 66-67 – Col. 14, line 1, “Seed set generator 230, document categorizer 250 … may be implemented as a hardware modules”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include a reference system for generating a set of additional training data of Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.
Bazrafkan, Chaudhari and Rosswog do not teach
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system;
the input to the learning coach ML system from the student ML system is input to the input layer of the learning coach ML system;
Li teaches
the learning coach ML system has been trained through machine learning to determine and implement an enhancement to the student ML system [abstract, “The computer system generates an image based on a generator neural network and a loss neural network. The generator neural network outputs the synthesized image based on a noise vector and the style image and is trained based on style features generated from the loss neural network”; paragraph 0004, “a first "generator" neural network is trained using a second pre-trained "loss" neural network”; paragraph 0017, “The loss neural network is pre-trained to generate features that identify the styles in a given image, referred to herein as style features. Pre-trained refers to the loss neural network being already trained prior to the training of the generator neural network”; Examiner interprets the loss neural network as the learning coach ML system and the generator neural network as the student ML system, wherein the loss neural network/learning coach ML system is already trained prior to the training of the generator neural network/student ML system];
the input to the learning coach ML system [Fig. 2, the loss neural network] from the student ML system [Fig. 2, the generator neural network] is input to the input layer of the learning coach ML system [Fig. 2 shows that output 212 from the generator neural network/student ML system is provided as the input to the input layer of the loss neural network/learning coach ML system; paragraph 0047, “the generator neural network 210 outputs an image, referred to herein as an intermediate output 212, based on the input noise vector. The intermediate output 212 then becomes an input to the loss neural network 240. In turn, the loss neural network 240 outputs style features of the intermediate output 212, referred to herein as intermediate style features 242”; Bazrafkan teaches in Fig. 1 network A/learning coach ML system for training target network B/student ML system, and teaches in paragraph 0036 that output from network B is used to train network A, while Li teaches in Fig. 2 that the loss neural network/learning coach ML system is already trained prior to the training of the generator neural network/student ML system, and that the input from the generator neural network/student ML system to the loss neural network/learning coach ML system is input to the input layer of the loss neural network/learning coach ML system; therefore, the combination of Bazrafkan and Li reads on the claim limitations];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the learning coach ML system that has been trained through machine learning, and the input from the student ML system to the input layer of the learning coach ML system, of Li. Doing so would help iteratively update the parameters of the generator neural network to minimize the losses (Li, 0004).

As per claim 46, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 45.
Bazrafkan teaches 
generating a set of additional training data for the student ML system [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”];
Rosswog further teaches
the reference system [Fig. 2, seed set generators 230] comprises at least one classifier for classifying input data to generate classified data as the set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; it can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set];
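It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include a reference system comprising at least one classifier for classifying input data to generate classified data as the set of additional training data of Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.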

As per claim 52, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 45.
Rosswog further teaches
transmitting, by a learning experimentation system [Fig. 2, document categorizer 250] that is in communication with the reference system [Fig. 2], a control parameter to the reference system [Col. 12, lines 37-66, “document categorizer 250 may include a performance tracker 254 that tracks one or more metrics associated with the performance of document categorizer 250's categorizations. The metrics may include the number of electronic documents categorized in each category (e.g., relevant and not relevant), the confidence modifiers of all the categorized electronic documents … document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed (control parameter: increasing the training data) to retrain machine learning algorithm 252 to improve its categorization performance”], wherein the control parameter controls generation of the set of additional training data by the reference system [Col. 12, line 67 – Col. 13, lines 1-2, “seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”], wherein the learning experimentation system comprises a computer [Col. 13, lines 66-67 – Col. 14, line 1, “Seed set generator 230, document categorizer 250 … may be implemented as a hardware modules”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the process of transmitting a control parameter to the reference system, wherein the control parameter controls generation of the set of additional training data by the reference system of Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.

As per claim 62, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 20.
Bazrafkan further teaches
the one or more revised hyperparameters comprise a revised learning regularization hyperparameter for the student ML system [paragraph 0003, “a problem in the field of deep learning is that there is simply not enough quality labelled data to train neural networks”; paragraph 0010, “technique for addressing this problem is called augmentation. Augmentation is the process of supplementing a training dataset, with similar data created from the information in that dataset. The use of augmentation in deep learning is ubiquitous, and when dealing with images, this can include the application of rotation, translation, blurring and other modifications to existing labelled images in order to improve the training of a target network. Augmentation thus serves as a type of regularization, reducing the chance of overfitting by adding information to the training dataset for a target network”].  

As per claim 64, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 20.
Chaudhari further teaches
the student ML system comprises multiple connection weights between nodes of the student ML system, including a first connection weight and a second connection weight [paragraph 0015, “An activation result generated for any given neuron node in the first hidden layer may differ from the activation result generated for any other neuron node in the first hidden layer depending on the interconnection weights between the nodes in the initial input layer and the neuron nodes in the first hidden layer. For example, a first node in the initial input layer representative of a first training input may be connected to a first neuron node in the first hidden layer by a first weight, but may be connected to a second neuron node in the first hidden layer by a second different weight. Thus, the set of weights that is applied to the set of training inputs may differ from one neuron node in the first hidden layer to the next neuron node in the first hidden layer”]; and
the one or more revised hyperparameters comprise a first value for a first hyperparameter for the first connection weight of the student ML system and a second value for the revised hyperparameter for the second connection weight of the student ML system [paragraph 0015, “a first node in the initial input layer representative of a first training input may be connected to a first neuron node in the first hidden layer by a first weight, but may be connected to a second neuron node in the first hidden layer by a second different weight”;  paragraph 0020, “weights of the DNN are updated to obtain an optimal set of interconnection weights that minimizes a cost/error function representing a total error between a set of classifier outputs and a set of actual target outputs for a given set of training inputs”;  paragraph 0022, “during the weight update phase, the delta determined for a first node may be multiplied by the input activation result received from a second node connected by an interconnection weight with the first node to obtain a gradient associated with the interconnection weight. A ratio of the gradient may then be determined and subtracted from the interconnection weight connecting the first and second nodes to obtain a new weight between the nodes”; It can be seen that the new weight values are obtained for the nodes (weights updating) and the nodes (from input and first hidden layers) are connected by different weights (first and second weights)].
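It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the one or more revised hyperparameters comprising a first value for a first hyperparameter for the first connection weight of the student ML system and a second value for the revised hyperparameter for the second connection weight of the student ML system of Chaudhari.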
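For illustration only, the weight-update step quoted above from Chaudhari paragraph 0022 can be sketched as follows. This minimal Python sketch assumes that the "ratio" of the gradient corresponds to a scalar learning rate; the function name and all numeric values are hypothetical. It shows that two connections leaving the same input node, carrying a first and a second weight, each receive a separately computed new value.

```python
def update_interconnection_weight(weight: float, delta: float,
                                  input_activation: float, ratio: float) -> float:
    """One weight update per Chaudhari para. 0022 (learning-rate reading assumed).

    delta: error term determined for the receiving (first) node.
    input_activation: activation result received from the sending (second) node.
    ratio: assumed learning-rate fraction of the gradient to subtract.
    """
    gradient = delta * input_activation  # gradient associated with this interconnection weight
    return weight - ratio * gradient     # subtract a ratio of the gradient from the weight


# Hypothetical: one input node (activation 1.0) connected to two hidden nodes
# by a first weight and a second, different weight, each with its own delta.
first_weight = update_interconnection_weight(0.5, delta=0.1, input_activation=1.0, ratio=0.1)
second_weight = update_interconnection_weight(-0.2, delta=0.4, input_activation=1.0, ratio=0.1)
print(first_weight, second_weight)
```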

Claims 3-5, 8-9, 11-12, 53 and 55-56 are rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Paquet et al. (US Pub. 2012/0158620).
As per claim 3, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 2.
Bazrafkan (as modified) teaches
the at least one classifier of the reference system [Rosswog, abstract, “a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252”; it can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set].
Bazrafkan, Chaudhari, Rosswog and Li do not teach
the at least one classifier of the reference system comprises a ML system (emphasis added).
Paquet teaches
the at least one classifier of the reference system comprises a ML system [Fig. 2, paragraph 0023, an automated classifier 32, such as an artificial neural network, a Bayesian classifier algorithm, etc.].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the at least one classifier comprising a ML system of Paquet. Doing so would help classify untrained patterns (ANN).

As per claim 4, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 3.
Bazrafkan teaches
the student ML system [Fig. 1, paragraph 0020, “training a target network B”; abstract, “Training a target neural network”; It can be seen that network B is a neural network].  
Bazrafkan (as modified) teaches
the reference system [Rosswog, Fig. 2, seed set generators 230] comprises at least one classifier [Rosswog, Fig. 2, Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; It can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set], and the at least one classifier of the reference system comprises a ML system [Paquet, Fig. 2, paragraphs 0023, 0038, an automated classifier 32, such as a Bayesian classifier algorithm … a genetic algorithm … and other "machine learning" techniques];

As per claim 5, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 3.
Bazrafkan teaches
the student ML system [Fig. 1, paragraph 0020, “training a target network B”; abstract, “Training a target neural network”; It can be seen that network B is a neural network].  
Rosswog teaches
the reference system [Fig. 2, seed set generators 230] comprises at least one classifier [Fig. 2, Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; it can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set];
Paquet further teaches
the at least one classifier of the reference system has an identical ML structure as the student ML system [paragraph 0023, “an automated classifier 32, such as an artificial neural network (student ML system) … the automated classifier 32 is developed using a training set 34 … The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; as recited in claims 1-2, “a computerized reference system for generating a set of training data … the reference system comprises at least one classifier”; thus the “another automated classifier 32” that generated the training set is the classifier included in the reference system, and the “another automated classifier 32” has an identical ML structure as the ML system “an automated classifier 32”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the at least one classifier of the reference system having an identical ML structure as the student ML system of Paquet. Doing so would help classify untrained patterns (ANN).

As per claim 8, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 3.
Rosswog further teaches
a learning experimentation system [Fig. 2, document categorizer 250] that is in communication with the reference system [Fig. 2], wherein the learning experimentation system comprises a computer [Col. 13, lines 66-67 – Col. 14, line 1, “Seed set generator 230, document categorizer 250 … may be implemented as a hardware modules”], and wherein the learning experimentation system transmits a control parameter to the reference system [Col. 12, lines 37-66, “document categorizer 250 may include a performance tracker 254 that tracks one or more metrics associated with the performance of document categorizer 250's categorizations. The metrics may include the number of electronic documents categorized in each category (e.g., relevant and not relevant), the confidence modifiers of all the categorized electronic documents … document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed (control parameter: increasing the training data) to retrain machine learning algorithm 252 to improve its categorization performance”], wherein the control parameter controls generation of the set of additional training data by the reference system [Col. 12, line 67 – Col. 13, lines 1-2, “seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the control parameter that controls generation of the set of additional training data by the reference system of Rosswog. Doing so would help train or retrain a network with the additional training data generated by the reference system to improve the performance of the categorizer.

As per claim 9, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 8.
Paquet further teaches
the learning experimentation system [Fig. 6, paragraph 0032 discloses a system comprising a device 82, CPU 84, etc.] controls the reference system [Fig. 6, paragraph 0032, “The exemplary system 86 may also comprise a training set generating component 90”] such that the student ML system is trained to imitate the reference system [paragraph 0023, “an automated classifier 32, such as an artificial neural network (student ML system) … the automated classifier 32 is developed using a training set 34 … (wherein) The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; as recited in claims 1-2, “a computerized reference system for generating a set of training data … the reference system comprises at least one classifier”; thus the “another automated classifier 32” that generated the training set is the classifier included in the reference system, wherein the training set is generated based in part on comparing the classification confidence of each content item with the confidence threshold (a control parameter created by the device 82), sending the content items having a classification confidence less than the threshold to the human classifiers, and creating a new training set based on the results of the human classification. Since Paquet discloses a system that includes at least a device 82 (the learning experimentation system) that controls the reference system (the exemplary system 86, which comprises a training set generating component 90) for generating the training set 34 based on the provided confidence threshold 54 (prepared by device 82), and both the student ML system (comprising an automated classifier 32) and the reference system (comprising another automated classifier 32) perform classification of the content items based on the generated training set 34 (two identical classifiers 32 are trained using a training set 34), Paquet teaches the claim limitation];
Paragraph 0118 of the specification of the Application recites “the learning experimentation system 61 can specify architectures (e.g., layers, nodes, connection weights) such that it is possible for the student learning system 11 to exactly duplicate the classification done by the reference system 51. The architecture of the student learning system 11 may be a copy of the architecture of the reference system 51”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the learning experimentation system that controls the reference system such that the student ML system is trained to imitate the reference system of Paquet. Doing so would help classify content items with an acceptable classification confidence and accuracy (Paquet, 0029).

As per claim 11, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 2.
Rosswog further teaches
generate classified data as the set of additional training data [Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
Bazrafkan, Chaudhari, Rosswog and Li do not teach
the reference system: comprises two or more classifiers for classifying the input data to generate classified data; and 
randomly selects the classified data from the two or more classifiers as the set of additional training data for the student ML system.
Paquet teaches
the reference system: comprises two or more classifiers for classifying the input data to generate classified data [paragraph 0023, “the automated classifier 32 is developed using a training set 34 … The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; paragraph 0006, “Content items having a low classification confidence may be selected and provided to human classifiers, who may identify one or more categories that are associated with the content item. These human-selected classifications of content items may therefore be utilized as a new training set”]; and 
randomly selects the classified data from the two or more classifiers as the set of additional training data for the student ML system [paragraph 0006, “These human-selected classifications of content items may therefore be utilized as a new training set to retrain the automated classifier in order to achieve an accurate classification of the difficult-to-classify content items”; Figs. 2 and 4, paragraph 0030, “The automated classifier 32 may be invoked to perform a classification 18 of content items 14 of a content set 12 (such as the three content items 14 identified in this exemplary scenario 50 as "A", "B", and "C"), and each classification 18 may result in an identified association with one or more categories 16, and also a classification confidence 52 (e.g., computed as a probability between 0.00, indicating no confidence, and 1.00, indicating absolute confidence). An embodiment of these techniques may compare the classification confidence 52 of each classification 18 with a classification confidence threshold 54 (e.g., a 0.50 probability) that distinguishes acceptably confident classifications 18 from unacceptably confident classifications 18. For example, the content item 14 identified as "B" may be classified with a classification confidence 52 of 0.96 that well exceeds a defined classification confidence threshold 54 of 0.50, while the content items 14 identified as "A" and "C" may be classified with unacceptably low classification confidences 52 of 0.24 and 0.03. Accordingly, an embodiment of these techniques may select these content items 14 for inclusion in a supplemental training set 34, and may provide this training set 34 to a human classifier 20 for classification 18. After the human classifier 20 identifies one or more categories 16 associated with each content item 14, these associations may be used in a supplemental training 36 in order to improve the proficiency of the automated classifier 32 in classifying these types of content items 14. (The supplemental training 36 … include … the content items 14 from the initial training set 34, and/or from previously generated supplemental training sets 34.) In this manner, the supplementally trained automated classifier 32 may therefore exhibit a wider range of acceptably accurate classifications 18”; it can be seen that classification is performed on the content items and each classification is assigned a confidence score (probability); any content item whose classification confidence is below a classification confidence threshold 54 defined by the device 82 is selected and sent to a human classifier so that a category associated with the content item is identified, and a new training set is generated that includes the content items from the initial training set. For example, when the automated classifier 32 performs classification on the content items, the content item identified as "B", which is classified with a classification confidence of 0.96 that well exceeds a defined classification confidence threshold of 0.50, is selected from the automated classifier 32, while the content items 14 identified as "A" and "C", which are classified with unacceptably low classification confidences of 0.24 and 0.03, are sent to the human classifier to identify the classes associated with those content items; the classified data from the human classifier are then combined with the previously selected classified data from the automated classifier 32 to generate a new training set].
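For illustration only, the confidence-threshold selection relied on above (Paquet, paragraph 0030) can be sketched as follows. This minimal Python sketch uses the scenario 50 confidences (0.24, 0.96, 0.03) and the 0.50 threshold from paragraph 0030; the class and function names, the category names and the human labels are hypothetical, treating a confidence exactly equal to the threshold as acceptable is an assumption, and the final merge step reflects the Examiner's reading set out above rather than code disclosed by Paquet. The split itself is a deterministic, threshold-based selection.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Classification:
    item_id: str
    category: str
    confidence: float  # probability between 0.00 and 1.00


def split_by_confidence(results: List[Classification],
                        threshold: float = 0.50) -> Tuple[List[Classification], List[Classification]]:
    """Separate acceptably confident results from those routed to a human classifier."""
    accepted = [r for r in results if r.confidence >= threshold]  # ">=" is an assumption
    for_human = [r for r in results if r.confidence < threshold]
    return accepted, for_human


def build_training_set(accepted: List[Classification],
                       human_labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Combine automated labels kept above threshold with human-assigned labels."""
    pairs = [(r.item_id, r.category) for r in accepted]
    pairs.extend(sorted(human_labels.items()))
    return pairs


# Scenario 50 confidences from para. 0030; category names are hypothetical.
results = [
    Classification("A", "sports", 0.24),
    Classification("B", "news", 0.96),
    Classification("C", "sports", 0.03),
]
accepted, for_human = split_by_confidence(results)
human_labels = {"A": "finance", "C": "travel"}  # hypothetical human classifications
training_set = build_training_set(accepted, human_labels)
print([r.item_id for r in for_human], training_set)
```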
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include two or more classifiers for classifying the input data to generate classified data, and to randomly select the classified data from the two or more classifiers as the set of additional training data, as taught by Paquet. Doing so would help classify content items with acceptable classification confidence and accuracy (Paquet, 0029).

As per claim 12, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 11.
Rosswog teaches
a computerized learning experimentation system that is in communication with the reference system [Fig. 2];
generate classified data as the set of additional training data [Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
Paquet further teaches
the learning experimentation system provides a tunable control parameter [paragraph 0032, “a classification confidence threshold 54 defined by the device 82”] to the reference system [Fig. 6, paragraph 0032, “The exemplary system 86 may also comprise a training set generating component 90”] that controls a probability at which the reference system randomly selects the classified data from each of the two or more classifiers to be the set of additional training data [paragraph 0023, “the automated classifier 32 is developed using a training set 34 … The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; Figs 2 and 4, paragraph 0030, “The automated classifier 32 may be invoked to perform a classification 18 of content items 14 of a content set 12 (such as the three content items 14 identified in this exemplary scenario 50 as "A", "B", and "C"), and each classification 18 may result in an identified association with one or more categories 16, and also a classification confidence 52 (e.g., computed as a probability between 0.00, indicating no confidence, and 1.00, indicating absolute confidence). An embodiment of these techniques may compare the classification confidence 52 of each classification 18 with a classification confidence threshold 54 (e.g., a 0.50 probability) that distinguishes acceptably confident classifications 18 from unacceptably confident classifications 18. For example, the content item 14 identified as "B" may be classified with a classification confidence 52 of 0.96 that well exceeds a defined classification confidence threshold 54 of 0.50, while the content items 14 identified as "A" and "C" may be classified with unacceptably low classification confidences 52 of 0.24 and 0.03. 
Accordingly, an embodiment of these techniques may select these content items 14 for inclusion in a supplemental training set 34, and may provide this training set 34 to a human classifier 20 for classification 18. After the human classifier 20 identifies one or more categories 16 associated with each content item 14, these associations may be used in a supplemental training 36 in order to improve the proficiency of the automated classifier 32 in classifying these types of content items 14. (The supplemental training 36 … include … the content items 14 from the initial training set 34, and/or from previously generated supplemental training sets 34.) In this manner, the supplementally trained automated classifier 32 may therefore exhibit a wider range of acceptably accurate classifications 18”; It can be seen that classification is performed on the content items and each classification is assigned a confidence score (probability); any content item whose classification confidence is below the classification confidence threshold 54 defined by the device 82 is selected and sent to a human classifier, which identifies a category associated with the content item, and a new training set is generated that includes the content items from the initial training set. For example, when the automated classifier 32 classifies the content items, the content item identified as "B", which is classified with a classification confidence of 0.96 that well exceeds the defined classification confidence threshold of 0.50, is selected from the automated classifier 32, while the content items 14 identified as "A" and "C", which are classified with unacceptably low classification confidences of 0.24 and 0.03, are sent to the human classifier to identify the classes associated with those content items; the classified data from the human classifier are then combined with the previously selected classified data from the automated classifier 32 to generate a new training set. In this example, because the new training set contains three classified items, the probability of the system selecting classified data from the automated classifier 32 is 1/3 (item B), and the probability of selecting classified data from the human classifier is 1 – 1/3 = 2/3].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include providing a tunable control parameter to the reference system that controls a probability at which the reference system randomly selects the classified data from each of the two or more classifiers to be the set of additional training data, as taught by Paquet. Doing so would help classify content items with acceptable classification confidence and accuracy (Paquet, 0029).
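For illustration only, the confidence-threshold routing read on Paquet (paragraph 0030), and the 1/3 versus 2/3 proportions derived from it, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `route_by_confidence` and `automated_share` and the dictionary layout are hypothetical and are not taken from Paquet or from the application, and treating an item exactly at the threshold as acceptable is likewise an assumption. The sketch computes the proportions deterministically; it does not itself perform random selection.

```python
from fractions import Fraction


def route_by_confidence(confidences, threshold):
    """Split items into an automatically labeled group and a human-review group.

    Hypothetical helper: an item whose confidence is at or above the
    threshold keeps its automated label (treating a tie as acceptable is an
    assumption); every other item is routed to a human classifier.
    """
    automated = [item for item, c in confidences.items() if c >= threshold]
    to_human = [item for item, c in confidences.items() if c < threshold]
    return automated, to_human


def automated_share(confidences, threshold):
    """Fraction of the combined training set that comes from the automated classifier."""
    automated, to_human = route_by_confidence(confidences, threshold)
    return Fraction(len(automated), len(automated) + len(to_human))


# Confidences from exemplary scenario 50 (Paquet, paragraph 0030).
confidences = {"A": 0.24, "B": 0.96, "C": 0.03}
automated, to_human = route_by_confidence(confidences, threshold=0.50)

p_automated = automated_share(confidences, threshold=0.50)
p_human = 1 - p_automated
print(automated, to_human, p_automated, p_human)

# Moving the threshold (the tunable control parameter) changes the proportions.
for t in (0.02, 0.50, 0.99):
    print(t, automated_share(confidences, t))
```

With the 0.50 threshold, only item "B" stays with the automated classifier, giving the 1/3 and 2/3 proportions; lowering the threshold to 0.02 keeps all three items with the automated classifier, and raising it to 0.99 keeps none.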

As per claim 53, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 52.
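Bazrafkan, Chaudhari, Rosswog and Li do not teach
the learning experimentation system controls the reference system such that the student ML system is trained to imitate the reference system.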
Paquet teaches
the learning experimentation system [Fig. 6, paragraph 0032 discloses a system comprising a device 82, CPU 84, etc.] controls the reference system [Fig. 6, paragraph 0032, “The exemplary system 86 may also comprise a training set generating component 90”] such that the student ML system is trained to imitate the reference system [paragraph 0023, “an automated classifier 32, such as an artificial neural network (student ML system) … the automated classifier 32 is developed using a training set 34 … (wherein) The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; As recited in claims 1-2, “a computerized reference system for generating a set of training data … the reference system comprises at least one classifier”; thus, the “another automated classifier 32” that generated the training set is the classifier included in the reference system, where the training set is generated based in part on comparing the classification confidence of each content item with the confidence threshold (a control parameter created by the device 82), sending the content items having a classification confidence below the threshold to the human classifiers, and creating a new training set based on the results of the human classification. Since Paquet discloses a system that includes at least a device 82 (the learning experimentation system) that controls the reference system (the exemplary system 86, which comprises a training set generating component 90) for generating the training set 34 based on the provided confidence threshold 54 (prepared by device 82), and both the student ML system (comprising an automated classifier 32) and the reference system (comprising another automated classifier 32) perform classification of the content items based on the generated training set 34 (two identical classifiers 32 are trained using a training set 34), Paquet teaches the claim limitation];
“the learning experimentation system 61 can specify architectures (e.g., layers, nodes, connection weights) such that it is possible for the student learning system 11 to exactly duplicate the classification done by the reference system 51. The architecture of the student learning system 11 may be a copy of the architecture of the reference system 51”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan so that the learning experimentation system controls the reference system such that the student ML system is trained to imitate the reference system, as taught by Paquet. Doing so would help classify content items with acceptable classification confidence and accuracy (Paquet, 0029).

As per claim 55, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 46.
Rosswog further teaches
generate classified data as the set of additional training data [Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
Bazrafkan, Chaudhari, Rosswog and Li do not teach
the reference system comprises two or more classifiers for classifying the input data to generate classified data; and 
the method further comprises randomly selecting, by the reference system, the classified data from the two or more classifiers as the set of additional training data for the student ML system.  
Paquet teaches
the reference system comprises two or more classifiers for classifying the input data to generate classified data [paragraph 0023, “the automated classifier 32 is developed using a training set 34 … The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; paragraph 0006, “Content items having a low classification confidence may be selected and provided to human classifiers, who may identify one or more categories that are associated with the content item. These human-selected classifications of content items may therefore be utilized as a new training set”]; and
randomly selecting, by the reference system, the classified data from the two or more classifiers as the set of additional training data for the student ML system [paragraph 0006, “These human-selected classifications of content items may therefore be utilized as a new training set to retrain the automated classifier in order to achieve an accurate classification of the difficult-to-classify content items”; Figs 2 and 4, paragraph 0030, “The automated classifier 32 may be invoked to perform a classification 18 of content items 14 of a content set 12 (such as the three content items 14 identified in this exemplary scenario 50 as "A", "B", and "C"), and each classification 18 may result in an identified association with one or more categories 16, and also a classification confidence 52 (e.g., computed as a probability between 0.00, indicating no confidence, and 1.00, indicating absolute confidence). An embodiment of these techniques may compare the classification confidence 52 of each classification 18 with a classification confidence threshold 54 (e.g., a 0.50 probability) that distinguishes acceptably confident classifications 18 from unacceptably confident classifications 18. For example, the content item 14 identified as "B" may be classified with a classification confidence 52 of 0.96 that well exceeds a defined classification confidence threshold 54 of 0.50, while the content items 14 identified as "A" and "C" may be classified with unacceptably low classification confidences 52 of 0.24 and 0.03. Accordingly, an embodiment of these techniques may select these content items 14 for inclusion in a supplemental training set 34, and may provide this training set 34 to a human classifier 20 for classification 18. 
After the human classifier 20 identifies one or more categories 16 associated with each content item 14, these associations may be used in a supplemental training 36 in order to improve the proficiency of the automated classifier 32 in classifying these types of content items 14. (The supplemental training 36 … include … the content items 14 from the initial training set 34, and/or from previously generated supplemental training sets 34.) In this manner, the supplementally trained automated classifier 32 may therefore exhibit a wider range of acceptably accurate classifications 18”; It can be seen that classification is performed on the content items and each classification is assigned a confidence score (probability); any content item whose classification confidence is below the classification confidence threshold 54 defined by the device 82 is selected and sent to a human classifier, which identifies a category associated with the content item, and a new training set is generated that includes the content items from the initial training set. For example, when the automated classifier 32 classifies the content items, the content item identified as "B", which is classified with a classification confidence of 0.96 that well exceeds the defined classification confidence threshold of 0.50, is selected from the automated classifier 32, while the content items 14 identified as "A" and "C", which are classified with unacceptably low classification confidences of 0.24 and 0.03, are sent to the human classifier to identify the classes associated with those content items; the classified data from the human classifier are then combined with the previously selected classified data from the automated classifier 32 to generate a new training set].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include two or more classifiers for classifying the input data to generate classified data, and to randomly select the classified data from the two or more classifiers as the set of additional training data, as taught by Paquet. Doing so would help classify content items with acceptable classification confidence and accuracy (Paquet, 0029).

As per claim 56, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the method of claim 55.
Rosswog teaches
a computerized learning experimentation system that is in communication with the reference system [Fig. 2];
generate classified data as the set of additional training data [Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
Paquet further teaches
providing, by a computerized learning experimentation system, a tunable control parameter [paragraph 0032, “a classification confidence threshold 54 defined by the device 82”] to the reference system [Fig. 6, paragraph 0032, “The exemplary system 86 may also comprise a training set generating component 90”] that controls a probability at which the reference system randomly selects the classified data from each of the two or more classifiers to be the set of additional training data [paragraph 0023, “the automated classifier 32 is developed using a training set 34 … The training set 34 may be generated … by utilizing a sample content set 12 prepared by another automated classifier 32”; Figs 2 and 4, paragraph 0030, “The automated classifier 32 may be invoked to perform a classification 18 of content items 14 of a content set 12 (such as the three content items 14 identified in this exemplary scenario 50 as "A", "B", and "C"), and each classification 18 may result in an identified association with one or more categories 16, and also a classification confidence 52 (e.g., computed as a probability between 0.00, indicating no confidence, and 1.00, indicating absolute confidence). An embodiment of these techniques may compare the classification confidence 52 of each classification 18 with a classification confidence threshold 54 (e.g., a 0.50 probability) that distinguishes acceptably confident classifications 18 from unacceptably confident classifications 18. For example, the content item 14 identified as "B" may be classified with a classification confidence 52 of 0.96 that well exceeds a defined classification confidence threshold 54 of 0.50, while the content items 14 identified as "A" and "C" may be classified with unacceptably low classification confidences 52 of 0.24 and 0.03. 
Accordingly, an embodiment of these techniques may select these content items 14 for inclusion in a supplemental training set 34, and may provide this training set 34 to a human classifier 20 for classification 18. After the human classifier 20 identifies one or more categories 16 associated with each content item 14, these associations may be used in a supplemental training 36 in order to improve the proficiency of the automated classifier 32 in classifying these types of content items 14. (The supplemental training 36 … include … the content items 14 from the initial training set 34, and/or from previously generated supplemental training sets 34.) In this manner, the supplementally trained automated classifier 32 may therefore exhibit a wider range of acceptably accurate classifications 18”; It can be seen that classification is performed on the content items and each classification is assigned a confidence score (probability); any content item whose classification confidence is below the classification confidence threshold 54 defined by the device 82 is selected and sent to a human classifier, which identifies a category associated with the content item, and a new training set is generated that includes the content items from the initial training set. For example, when the automated classifier 32 classifies the content items, the content item identified as "B", which is classified with a classification confidence of 0.96 that well exceeds the defined classification confidence threshold of 0.50, is selected from the automated classifier 32, while the content items 14 identified as "A" and "C", which are classified with unacceptably low classification confidences of 0.24 and 0.03, are sent to the human classifier to identify the classes associated with those content items; the classified data from the human classifier are then combined with the previously selected classified data from the automated classifier 32 to generate a new training set. In this example, because the new training set contains three classified items, the probability of the system selecting classified data from the automated classifier 32 is 1/3 (item B), and the probability of selecting classified data from the human classifier is 1 – 1/3 = 2/3].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include providing a tunable control parameter to the reference system that controls a probability at which the reference system randomly selects the classified data from each of the two or more classifiers to be the set of additional training data, as taught by Paquet. Doing so would help classify content items with acceptable classification confidence and accuracy (Paquet, 0029).

	Claims 7, 51 and 63 are rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Kulkarni et al. (US Pub. 2003/0191728).
As per claim 7, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 2.
Bazrafkan teaches 
generating a set of additional training data for the student ML system [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”];
Rosswog teaches
the reference system [Fig. 2, seed set generators 230] comprises at least one classifier for classifying input data to generate classified data as the set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; It can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set];
Bazrafkan, Chaudhari, Rosswog and Li do not teach
adds noise to the classified data to generate the set of additional training data.  
Kulkarni teaches 
adds noise to the classified data to generate the set of additional training data [abstract, “improving the prediction accuracy and generalization performance of artificial neural network models in presence of input-output example data … a specific amount of Gaussian noise is added to each input/output variable in the example set and the enlarged sample data set created thereby is used as the training set for constructing the artificial neural network model, the amount of noise to be added is specific to an input/output variable and its optimal value is determined using a stochastic search and optimization technique, namely, genetic algorithms, the network trained on the noise-superimposed enlarged training set shows significant improvements in its prediction accuracy and generalization performance”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include adding noise to the classified data to generate the set of additional training data, as taught by Kulkarni. Doing so would help improve the prediction accuracy and generalization performance of artificial neural network models (Kulkarni, abstract).

As per claim 51, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 46.
Bazrafkan teaches 
generating a set of additional training data for the student ML system [paragraph 0054, “network B being fed (at least partially) with augmented samples generated by the instances of network A from processing the samples of batch X(T)”];
Rosswog teaches
the reference system [Fig. 2, seed set generators 230] comprises at least one classifier for classifying input data to generate classified data as the set of additional training data [abstract, “identifying and categorizing electronic documents through machine learning … a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm”; Col. 9, lines 22-35, “one or more seed set generators 230 that may generate seed sets of electronic documents”; Col. 12, lines 37-67 – Col. 13, lines 1-2, “document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed to retrain machine learning algorithm 252 … seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”; It can be seen that the seed set generator 230 comprises a categorizer that categorizes the documents to generate the seed set];
Bazrafkan, Chaudhari, Rosswog and Li do not teach
adds noise to the classified data to generate the set of additional training data.  
Kulkarni teaches 
adds noise to the classified data to generate the set of additional training data [abstract, “improving the prediction accuracy and generalization performance of artificial neural network models in presence of input-output example data … a specific amount of Gaussian noise is added to each input/output variable in the example set and the enlarged sample data set created thereby is used as the training set for constructing the artificial neural network model, the amount of noise to be added is specific to an input/output variable and its optimal value is determined using a stochastic search and optimization technique, namely, genetic algorithms, the network trained on the noise-superimposed enlarged training set shows significant improvements in its prediction accuracy and generalization performance”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include adding noise to the classified data to generate the set of additional training data, as taught by Kulkarni. Doing so would help improve the prediction accuracy and generalization performance of artificial neural network models (Kulkarni, abstract).
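For illustration only, the noise-superimposed enlargement of the example set relied on in Kulkarni (abstract) can be sketched as follows. This is a minimal sketch assuming caller-supplied, per-variable noise standard deviations; Kulkarni determines the noise amounts with a genetic algorithm, which is not reproduced here, and the names `enlarge_with_noise`, `sigmas` and `copies` are hypothetical.

```python
import random


def enlarge_with_noise(examples, sigmas, copies, seed=0):
    """Return the original examples plus `copies` noisy replicas of each.

    examples: list of tuples of floats (input and output variables together).
    sigmas:   per-variable Gaussian noise standard deviations (hypothetical
              fixed values; Kulkarni optimizes them with a genetic algorithm).
    copies:   number of noisy replicas generated per original example.
    """
    if any(len(x) != len(sigmas) for x in examples):
        raise ValueError("each example needs one sigma per variable")
    rng = random.Random(seed)  # seeded so the enlarged set is reproducible
    enlarged = list(examples)  # keep the noise-free originals
    for _ in range(copies):
        for x in examples:
            # Superimpose zero-mean Gaussian noise, variable by variable.
            noisy = tuple(v + rng.gauss(0.0, s) for v, s in zip(x, sigmas))
            enlarged.append(noisy)
    return enlarged


data = [(1.0, 2.0, 0.5), (3.0, 4.0, 1.5)]
augmented = enlarge_with_noise(data, sigmas=(0.1, 0.1, 0.05), copies=4)
print(len(augmented))
```

Keeping the noise amount per variable mirrors Kulkarni's statement that the amount of noise "is specific to an input/output variable"; the enlarged list would then serve as the training set.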

As per claim 63, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 20.
Bazrafkan, Chaudhari, Rosswog and Li do not teach
the one or more revised hyperparameters comprise a revised momentum hyperparameter for the student ML system.  
Kulkarni teaches
the one or more revised hyperparameters comprise a revised momentum hyperparameter [paragraph 0061, “The objective of network training is to minimize the RMSE with respect to the test set … To achieve this objective, it is necessary to optimize the number of hidden layers, the number of nodes in each hidden layer, and training algorithm specific parameters, for example, the learning rate and momentum coefficient in the EBP algorithm”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan so that the one or more revised hyperparameters comprise a revised momentum hyperparameter, as taught by Kulkarni. Doing so would help minimize the RMSE with respect to the test set (Kulkarni, 0061).
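For illustration only, the role of the momentum coefficient that Kulkarni (paragraph 0061) lists among the EBP training parameters can be sketched with a generic heavy-ball gradient-descent update on a one-dimensional quadratic. This is an assumed textbook form, not Kulkarni's implementation; the function name `train_with_momentum` and the parameters `lr` (learning rate) and `momentum` (momentum coefficient) are hypothetical.

```python
def train_with_momentum(grad, w0, lr, momentum, steps):
    """Gradient descent with a momentum term (heavy-ball form).

    velocity <- momentum * velocity - lr * grad(w)
    w        <- w + velocity
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w


def grad(w):
    # Gradient of the illustrative loss f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


# Revising the momentum hyperparameter changes the optimization trajectory;
# all three settings here converge to the minimizer w = 3.
for mu in (0.0, 0.5, 0.9):
    print(mu, train_with_momentum(grad, w0=0.0, lr=0.1, momentum=mu, steps=500))
```

Setting `momentum` to 0.0 reduces the update to plain gradient descent, which is why the momentum coefficient is treated as a separately tunable training parameter.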

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. in view of Paquet et al. and further in view of Xiong et al. (US Pub. 2017/0024642).
As per claim 10, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 8.
Bazrafkan, Chaudhari, Rosswog, Li and Paquet do not explicitly teach
the learning experimentation system: 
is in communication with the learning coach ML system; and 
further determines a cost function for the learning coach ML system based on observations from the reference system and the student ML system; and 
the learning coach ML system uses the cost function in learning the enhancement for the student ML system;
Xiong teaches
the learning experimentation system: 
is in communication with the learning coach ML system [paragraph 0017, “The system and method provided herein can be used to improve the performance of dropout training by adjusting the variance of predictions introduced by dropout during training, and can be referred to for convenience as variance-adjustable dropout”; paragraph 0020, “a feedforward neural network training system comprising an extra hyper-parameter that controls the variance of the ensemble predictors and generalizes ensemble learning. The hyper-parameter can be smoothly adjusted to vary the behaviour of the method from a single model learning to a family of ensemble learning comprising a plurality of interacting models. This technique can be applied to ensemble learning with various cost functions, structures and parameter sharing”]; and 
further determines a cost function for the learning coach ML system based on observations from the reference system and the student ML system [paragraphs 0031-0032, “during the training stage, a plurality of training cases are presented to the neural network in order to train the neural network … Each training case is then processed by the neural network, one or a mini-batch at a time … For each such training case, the switch may reconfigure the neural network”; paragraph 0042, “Dropout training, when implemented using suitable parameters, can be considered as improving generalization performance by introducing a distribution of networks with different structures during training. Networks from this distribution produce different predictions f(x, m, w) for the same x and w”; paragraphs 0045-0046, “the system and method provided herein enable the variance of f(x, m, w) to be adjusted during training so that the regularization strength may be better tuned and better performance can be achieved … A new random predictor f~(x, m, w) is provided, which is the variance-adjusted version of f(x, m, w), and an adjusted cost function that is based on the new predictor is provided, as follows: … (equation 6)”]; and 
the learning coach ML system uses the cost function in learning the enhancement for the student ML system [paragraph 0020, “a feedforward neural network training system comprising an extra hyper-parameter that controls the variance of the ensemble predictors and generalizes ensemble learning …. This technique can be applied to ensemble learning with various cost functions, structures and parameter sharing”; wherein, paragraph 0017, “improve the performance of dropout training by adjusting the variance of predictions introduced by dropout during training”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include determining a cost function for the learning coach ML system, with the learning coach ML system using the cost function in learning the enhancement for the student ML system, as taught by Xiong. Doing so would help control the performance of the machine learning system.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. in view of Paquet et al. and further in view of Kulkarni et al. (US Pub. 2003/0191728).
As per claim 16, Bazrafkan, Chaudhari, Rosswog, Li and Paquet teach the ML computer system of claim 11.
Bazrafkan, Chaudhari, Rosswog, Li and Paquet do not teach
the reference system adds noise to output of each of the two or more classifiers prior to randomly selecting the classified data from two or more classifiers to be the set of additional training data.  
Kulkarni teaches 
the reference system adds noise to output of each of the two or more classifiers prior to randomly selecting the classified data from two or more classifiers to be the set of additional training data [paragraph 0004, “training with noise-added data improves the classification ability of the multilayer perceptron (MLP) networks”; paragraph 0049, “the artificial neural networks used to perform nonlinear modeling and classification, are trained using the noise-superimposed enlarged input-output data set, where the optimal amount of noise to be added to each input/output variable in the example set, has been determined using a stochastic optimization formalism known as genetic algorithms”; abstract, “improving the prediction accuracy and generalization performance of artificial neural network models in presence of input-output example data … a specific amount of Gaussian noise is added to each input/output variable in the example set and the enlarged sample data set created thereby is used as the training set for constructing the artificial neural network model, the amount of noise to be added is specific to an input/output variable and its optimal value is determined using a stochastic search and optimization technique, namely, genetic algorithms, the network trained on the noise-superimposed enlarged training set shows significant improvements in its prediction accuracy and generalization performance”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the process of adding noise to output of each of the two or more classifiers of Kulkarni. Doing so would help improve the prediction accuracy and generalization performance of artificial neural network models (Kulkarni, abstract).

Claims 17 and 54 are rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Xiong et al. (US Pub. 2017/0024642).
As per claim 17, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Rosswog teaches
a computerized learning experimentation system [Fig. 2, document categorizer 250] that is in communication with the reference system [Fig. 2]; 
[Col. 12, lines 37-66, “document categorizer 250 may include a performance tracker 254 that tracks one or more metrics associated with the performance of document categorizer 250's categorizations. The metrics may include the number of electronic documents categorized in each category (e.g., relevant and not relevant), the confidence modifiers of all the categorized electronic documents … document categorizer 250 may send an indication to seed set generator 230 that a second or subsequent seed set of electronic document classifications is needed (control parameter) to retrain machine learning algorithm 252 to improve its categorization performance”]; 
the control parameter controls generation of the set of additional training data by the reference system [Col. 12, line 67 – Col. 13, lines 1-2, “seed set generator 230 may generate additional seed sets based on the metrics tracked by performance tracker 254”];
Bazrafkan, Chaudhari, Rosswog and Li do not explicitly teach
the reference system comprises an ensemble of multiple ML ensemble members; 
the control parameter comprises combining weights for combining output from the multiple ML ensemble members of the reference system.  
Xiong teaches
the reference system comprises an ensemble of multiple ML ensemble members [Fig. 1, paragraphs 0016 - 0027, “A system and method for addressing overfitting in a neural network … a stochastic gradient descent process may be applied for training the neural network on mini-batches of training cases processed using a dropout neural network training process … One approach to addressing overfitting is referred to as dropout, which selectively disables a randomly (or pseudorandomly) selected subset of hidden units and/or input units in the neural network … Dropout training of a single neural network has the effect of creating an exponentially large ensemble of neural networks with different structures, but with shared parameter … Methods of ensemble learning implementing dropout training for deep neural networks are described … different neural networks in the plurality of neural networks differ only in that during the forward pass, feature detectors are selectively disabled randomly, pseudorandomly or using a fixed or predetermined pattern, in the fashion of the Dropout procedure, and the selection of feature detectors to be deactivated is not the same in different neural networks … a feedforward neural network (100) having a plurality of layers (102) is shown. Each layer comprises one or more feature detectors (104), each of which may be associated with activation functions and weights for each parameter input to the respective feature detector … A set of switches (108) are linked to at least a subset of the feature detectors. Each switch is operable to selectively disable its respective feature detector in the neural network to which it is linked, with a learned or preconfigured probability. A random or pseudorandom number generator (110) may be linked to the switch to provide the switch with a random or pseudorandom number value that enables the switch to selectively disable each linked feature detector”]; 
the control parameter comprises combining weights for combining output from the multiple ML ensemble members of the reference system [paragraph 0033, “Once the training set has been learned by the neural network, the switch may enable all feature detectors and normalize their outgoing weights (204). Normalization comprises reducing the outgoing weights of each feature detector or input by multiplying them by the probability that the feature detector or input was not disabled”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the control parameter comprising combining weights for combining output from the multiple ML ensemble members of the reference system of Xiong. Doing so would help control the performance of the machine learning system.

As per claim 54, Bazrafkan, Chaudhari, Rosswog and Li teach the method of claim 52.
Bazrafkan, Chaudhari, Rosswog and Li do not explicitly teach
determining, by the learning experimentation system, a cost function for the learning coach ML system based on observations from the reference system and the student ML system, wherein the learning coach ML system uses the cost function in learning the enhancement for the student ML system.  
Xiong teaches
determining, by the learning experimentation system, a cost function for the learning coach ML system based on observations from the reference system and the student ML system [paragraphs 0031-0032, “during the training stage, a plurality of training cases are presented to the neural network in order to train the neural network … Each training case is then processed by the neural network, one or a mini-batch at a time … For each such training case, the switch may reconfigure the neural network”; paragraph 0042, “Dropout training, when implemented using suitable parameters, can be considered as improving generalization performance by introducing a distribution of networks with different structures during training. Networks from this distribution produce different predictions f(x, m, w) for the same x and w”; paragraphs 0045-0046, “the system and method provided herein enable the variance of f(x, m, w) to be adjusted during training so that the regularization strength may be better tuned and better performance can be achieved … A new random predictor f̃(x, m, w) is provided, which is the variance-adjusted version of f(x, m, w), and an adjusted cost function that is based on the new predictor is provided, as follows: … (equation 6)”], wherein the learning coach ML system uses the cost function in learning the enhancement for the student ML system [paragraph 0020, “a feedforward neural network training system comprising an extra hyper-parameter that controls the variance of the ensemble predictors and generalizes ensemble learning … This technique can be applied to ensemble learning with various cost functions, structures and parameter sharing”; paragraph 0017, “improve the performance of dropout training by adjusting the variance of predictions introduced by dropout during training”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the process of determining a cost function for the learning coach ML system based on observations from the reference system and the student ML system of Xiong. Doing so would help control the performance of the machine learning system.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Talathi et al. (US Pub. 2016/0224903).
As per claim 18, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Bazrafkan, Chaudhari, Rosswog and Li do not teach
the learning coach ML system comprises a pattern recognition system that recognizes patterns of learning performance of a ML system.  
Talathi teaches 
the learning coach ML system comprises a pattern recognition system that recognizes patterns of learning performance of a ML system [paragraph 0074, “Neural networks may be designed with a variety of connectivity patterns … recognizing patterns that span more than one of the input data chunks that are delivered to the neural network in a sequence”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the pattern recognition system that recognizes patterns of learning performance of an ML system of Talathi.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Xiao et al. (US Pub. 2004/0059695).
As per claim 21, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 1.
Bazrafkan, Chaudhari, Rosswog and Li do not explicitly teach
the enhancement comprises a structural change to the student ML system.  
Xiao teaches
the enhancement comprises a structural change to the student ML system [paragraph 0150, “If in block 806 it is determined that performance of neural network is not satisfactory, then in order to try to improve the performance by adding additional processing nodes, the process 800 continues with block 808 in which the number of processing nodes is incremented. The topology of the type shown in FIG. 1 (i.e., a feed-forward sequence of processing nodes) is preferably maintained when incrementing the number of processing nodes. In block 810 the neural network formed in the preceding block 808 by incrementing the number of nodes is trained until the aforementioned stopping condition is met. Next, in block 812 it is ascertained whether or not the performance of the augmented neural network that was formed in block 808 is satisfactory. If the performance is now found to be satisfactory then the process 800 halts”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of training a neural network of Bazrafkan to include the structural change to the student ML system of Xiao.

Claim 61 is rejected under 35 U.S.C. 103 as being unpatentable over Bazrafkan et al. in view of Chaudhari et al. in view of Rosswog et al. in view of Li et al. and further in view of Aslan et al. (US Pub. 2017/0132528).
As per claim 61, Bazrafkan, Chaudhari, Rosswog and Li teach the ML computer system of claim 20.
Bazrafkan, Chaudhari, Rosswog and Li do not explicitly teach
the one or more revised hyperparameters comprise a revised learning rate hyperparameter for the student ML system.  
Aslan teaches
the one or more revised hyperparameters comprise a revised learning rate hyperparameter [paragraph 0043, “a scheduling module can initiate training of the second (student) machine learning model 102 at a slow learning rate, and gradually increase the learning rate of the second model 102 as training progresses”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have included the one or more revised hyperparameters comprising a revised learning rate hyperparameter of Aslan into the method of training a neural network of Bazrafkan. Doing so would help control the learning rate of any machine learning model for computational efficiency (Aslan, paragraph 0043).



Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Li et al. (US Pub. 2016/0078339) describes a method for generating a DNN classifier by "learning" a "student" DNN model from a larger more accurate "teacher" DNN model.
Mims (US Patent 7,062,476) describes a student neural network that is capable of receiving a series of tutoring inputs from one or more teacher networks to generate a student network output that is similar to the output of the one or more teacher networks.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103. The examiner can normally be reached M-F, 8 AM-5 PM, (CT).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T. N./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128