Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2021-06-24 has been entered.  Claims 1-20 remain pending in the application.  Applicant’s amendments to the claims overcome each and every objection previously set forth in the Non-Final office action mailed 2021-04-14.
Response to Arguments
Applicant's arguments in response to the rejections under 35 USC 101 have been fully considered and are persuasive.  Examiner has determined that the invention falls within “Improvements to the Functioning of a Computer or To Any Other Technology or Technical Field” as stated in MPEP 2106.05(a), because rather than broadly reciting “training a model” with no details, Applicant has detailed a specific method of training that yields a technical improvement, namely faster convergence of a machine learning model.  Thus, the claims do not amount to an attempt to monopolize a judicial exception, nor do they simply “apply” an exception to a field of use.
Applicant's arguments in response to the rejections under 35 USC 103 have been fully considered but are not persuasive.  Applicant argues that McDermott merely teaches modifying a loss function by a distance factor, as opposed to teaching a loss function comprising a prediction function, wherein the prediction function is modified by a distance factor.  Examiner respectfully disagrees.  Examiner points out that one may be tempted to consider

    [Equation image: media_image1.png]

to be the “loss function”.  However, this is not the “loss function”, but rather the “residual” as stated in [0089].  The “residual” is actually a first derivative of the loss function, as the Instant Specification uses the method of Li and Friedman, and Friedman demonstrates this:

    [Equation image: media_image2.png]

Where L is the loss function.
Friedman shows the Loss function to be:

    [Equation image: media_image3.png]

And rearranges this expression to

    [Equation image: media_image4.png]

Which is analogous to the “prediction function” as named in the Instant Specification.  Examiner points out that it is apparent that if one were to modify Loss function L with a distance factor, one could also be considered to be modifying the prediction function F by a distance factor.
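For illustration only, the statement that the “residual” is a first derivative of the loss function can be checked numerically.  The following minimal Python sketch assumes the multi-class loss L = -sum_k y_k log p_k(x) with p_k obtained from the scores F_k by a softmax; the function names and sample values are hypothetical and are not taken from Li, Friedman, or the Instant Specification.

```python
import math

def softmax(scores):
    """Map raw scores F_k to probabilities p_k = exp(F_k) / sum_l exp(F_l)."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(y_onehot, scores):
    """Assumed multi-class loss L = -sum_k y_k * log p_k(x)."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs))

def numeric_negative_gradient(y_onehot, scores, eps=1e-6):
    """Central finite-difference estimate of -dL/dF_k for every class k."""
    grads = []
    for k in range(len(scores)):
        up = list(scores)
        up[k] += eps
        down = list(scores)
        down[k] -= eps
        slope = (log_loss(y_onehot, up) - log_loss(y_onehot, down)) / (2 * eps)
        grads.append(-slope)
    return grads

# Hypothetical example: three classes, true class is index 1.
y = [0.0, 1.0, 0.0]
F = [0.2, -0.5, 1.3]

residual = [yk - pk for yk, pk in zip(y, softmax(F))]  # y_k - p_k(x)
neg_grad = numeric_negative_gradient(y, F)             # -dL/dF_k
```

Under these assumptions the two lists agree to within finite-difference error, which is the sense in which the residual y_k - p_k(x) is the negative first derivative of the loss with respect to F_k.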
Claim Rejections - 35 USC § 103
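The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.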
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li (“A gentle introduction to gradient boosting”) in view of Friedman (“Greedy Function Approximation:  A Gradient Boosting Machine”) and McDermott et al. (“Prototype-based MCE/GPD training for word spotting and connected word recognition”; hereinafter McDermott).
As per Claim 1, Li teaches a method for training a classification model, the method comprising (Li, Page 25, discloses training a model:  “We are improving the predictions of training data”.  Li, Page 48, discloses applying the technique for a classification model: “Gradient Boosting for Classification” in the title, and “Multi-class classification” in the slide text.)
obtaining, by a device comprising a memory storing instructions and a processor in communication with the memory, a training sample, the training sample comprising a training parameter and a true classification corresponding to the training parameter (Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 49, discloses a true classification, showing for a chosen example “Label = G”.  Li implies the use of a computer device, which comprises memory and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation.)
performing, by the device, classification training on an initial classification model by using the training parameter, to obtain a predicted classification (Li, Page 25, discloses training a model to obtain a prediction:  “We are improving the predictions of training data”.  Li, Page 48, discloses applying the technique for a classification model: “Gradient Boosting for Classification” in the title, and “Multi-class classification” in the slide text.  Li, Page 48, discloses obtaining a predicted classification:  “Recognize the given hand written capital letter”.  Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 70, indicates an initial model, disclosing “Start with initial models”).
determining, by the device, a residual between the true classification and the predicted classification according to a gradient loss function of the initial classification model (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.)
and modifying, by the device, the initial classification model according to the residual to obtain a final classification model (Li, Page 71, discloses “calculate negative gradients”, which are residuals.  Li, Page 71, also discloses modifying the initial classification model according to the residual:  “fit a regression tree hA to the negative gradients” and “FA := FA + rhoA*hA”.  Li, Page 71, also discloses a final classification model, as they state “iterate until converge”.  Upon convergence, the updates stop and one has the final classification model.)
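As a purely illustrative sketch of the procedure mapped to Li, Page 71 (calculate negative gradients, fit a regressor to them, update “FA := FA + rhoA*hA”, iterate), the following Python code runs a small multi-class boosting loop.  Several simplifications are assumptions made here and are not found in Li: a one-dimensional feature, a least-squares regression stump in place of a regression tree, a fixed step size rho, and a made-up toy data set.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_stump(xs, targets):
    """Least-squares regression stump on a 1-D feature.

    Returns (threshold, left_value, right_value); a stand-in for the
    regression tree h fitted to the negative gradients.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2 for x, r in zip(xs, targets))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def boost(xs, labels, n_classes, rounds=50, rho=0.5):
    """Return per-sample scores F[i][k] after `rounds` boosting iterations."""
    F = [[0.0] * n_classes for _ in xs]  # initial model: all scores zero
    for _ in range(rounds):
        P = [softmax(row) for row in F]  # predicted class probabilities
        for k in range(n_classes):
            # residual = negative gradient = y_ik - p_k(x_i)
            resid = [(1.0 if lab == k else 0.0) - P[i][k]
                     for i, lab in enumerate(labels)]
            t, lv, rv = fit_stump(xs, resid)
            for i, x in enumerate(xs):
                F[i][k] += rho * (lv if x <= t else rv)  # F_k := F_k + rho * h_k
    return F

# Hypothetical toy data: three classes laid out along one feature axis.
xs = [0.1, 0.4, 0.5, 1.1, 1.3, 1.6, 2.2, 2.5, 2.9]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

F = boost(xs, labels, n_classes=3)
predicted = [max(range(3), key=lambda k: row[k]) for row in F]
```

Each pass of the outer loop corresponds to one iteration in which the current model is modified according to the residuals; the model held after the last pass plays the role of the final classification model.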
However, Li does not explicitly teach the gradient loss function being based on a prediction function.
Friedman teaches the gradient loss function being based on a prediction function (First note that Li, Page 55, discloses under “Loss Function for each data point”:  “Step 3 Calculate the difference between the true probability distribution and the predicted probability distribution.  Here we use KL divergence.”
Examiner notes that KL divergence is defined as:

    [Equation image: media_image5.png]

Which is equivalent to:

    [Equation image: media_image6.png]

Friedman, Section 4.6 “Multi class logistic regression and classification”, teaches:
“Here we develop a gradient descent boosting algorithm for the K-class problem.  The loss function is”

    [Equation image: media_image7.png]

When considering the K-class problem (such as Li’s character recognition which is a 26-class problem), it is the case that the true probability distribution is a vector of several 0’s and one 1.  Therefore, when applied to the Wikipedia DKL equation, one of ordinary skill in the art can see that the second term of the equation in this case is always zero, because either p(x) itself is 0 or p(x) is 1, which makes log p(x) equal to 0.  So, the product p(x)*log p(x) is always 0.  Then one arrives at Friedman’s loss function which is the first term of the KL equation.  Therefore, Friedman is also using KL divergence in the loss function for the K-class problem.  It is therefore established that Li (“Here we use KL divergence”) is simply using Friedman’s method (and Li cites Friedman on Page 80).
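The reduction described above can be illustrated with a short Python check.  This is a minimal sketch that assumes the standard definition D_KL(p || q) = sum_x p(x) log(p(x)/q(x)) with the usual convention that 0 * log 0 is taken as 0; the function names and the sample distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)); terms with p(x) = 0 are skipped."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def cross_entropy(p, q):
    """First term of the expanded KL expression: -sum_x p(x) * log q(x)."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# True distribution for a 4-class problem: several 0's and one 1.
true_dist = [0.0, 0.0, 1.0, 0.0]
# Hypothetical predicted distribution.
pred_dist = [0.1, 0.2, 0.6, 0.1]

kl = kl_divergence(true_dist, pred_dist)
ce = cross_entropy(true_dist, pred_dist)
```

With a true distribution of this form, the term sum_x p(x) log p(x) contributes nothing, so `kl` and `ce` coincide and both equal -log of the probability assigned to the true class.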
Friedman continues:  “where yk = 1(class = k) ∈ {0, 1}, and pk(x) = Pr(yk = 1 | x).  Following FHT00, we use the symmetric multiple logistic transform”

    [Equation image: media_image8.png]

“or equivalently”

    [Equation image: media_image9.png]

“Substituting (30) into (28) and taking first derivatives one has”

    [Equation image: media_image10.png]

Here, the equations correspond to the Instant Specification, where Fk(x) is called a “prediction function”.  One can see from Friedman that the prediction function F is actually a rearrangement of the loss function as seen below:

    [Equation image: media_image11.png]


    [Equation image: media_image12.png]

Thus, Friedman discloses the gradient loss function being based on a prediction function.
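As an illustration of the relationship between the class probabilities and the function F discussed above, the following Python sketch assumes the symmetric multiple logistic transform in its commonly published form, F_k(x) = log p_k(x) - (1/K) sum_l log p_l(x), together with its inverse p_k(x) = exp(F_k(x)) / sum_l exp(F_l(x)).  The function names and the sample probabilities are hypothetical.

```python
import math

def scores_from_probs(probs):
    """F_k = log p_k - (1/K) * sum_l log p_l (assumed form of the transform)."""
    logs = [math.log(p) for p in probs]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]

def probs_from_scores(scores):
    """Inverse mapping: p_k = exp(F_k) / sum_l exp(F_l)."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class probabilities for a 3-class problem.
p = [0.7, 0.2, 0.1]

F = scores_from_probs(p)        # prediction-function values
p_back = probs_from_scores(F)   # probabilities recovered from F
```

Under these assumptions the scores sum to zero and the round trip recovers the original probabilities, so a loss written in terms of p_k(x) can equally be written in terms of F_k(x).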
Li and Friedman are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the gradient boosting of Li, with the function of Friedman. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve accuracy of the classification model even when using suboptimal data (Friedman, Abstract:  “Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data.”)
However, the combination of Li and Friedman does not teach the gradient loss function being based on a modification prediction function which is modified from a prediction function according to a distance factor, the distance factor representing a distance between a first category and a second category, the first category being a category to which the predicted classification belongs, and the second category being a category to which the true classification belongs.
McDermott teaches the gradient loss function [being based on a modification prediction function] which is modified [from a prediction function] according to a distance factor, the distance factor representing a distance between a first category and a second category, the first category being a category to which the predicted classification belongs, and the second category being a category to which the true classification belongs (McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.”  Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”.  McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”.  McDermott, pg 294 3rd column, also discloses:  “Multiplying this expression, for each incorrect category a, by the inter-category symbolic distance djk, and summing over all incorrect categories thus gives an aggregate, weighted symbolic distance between the correct category k and all other categories. Multiplying this aggregate distance by the usual loss L() gives the new loss L2().”  Here, McDermott discloses scaling the Loss function to create a new Loss function, by multiplying by a distance between the correct category and incorrect categories.  Thus, McDermott discloses the gradient loss function which is modified according to a distance factor.)
Recall that Friedman teaches the gradient loss function being based on a prediction function, and that the prediction function F is a rearrangement of the Loss function L:

    [Equation image: media_image11.png]


    [Equation image: media_image12.png]

Thus, modifying a prediction function according to a distance factor would also be modifying a loss function according to a distance factor.  Thus, the combination of Li, Friedman, and McDermott teaches the gradient loss function being based on a modification prediction function which is modified from a prediction function according to a distance factor, the distance factor representing a distance between a first category and a second category, the first category being a category to which the predicted classification belongs, and the second category being a category to which the true classification belongs.
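For illustration only, scaling a classification loss by an inter-category distance can be sketched in Python as follows.  This is a simplified stand-in and not McDermott's MCE/GPD formulation: it assumes a softmax log loss, takes the distance between the true category and the single highest-scoring category rather than an aggregate over all incorrect categories, and uses a made-up distance matrix.  All names are hypothetical.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(true_k, scores):
    """Usual loss: -log of the probability assigned to the true category."""
    return -math.log(softmax(scores)[true_k])

def distance_weighted_loss(true_k, scores, dist):
    """Scale the usual loss by the distance between true and predicted category."""
    pred_k = max(range(len(scores)), key=lambda k: scores[k])
    return dist[true_k][pred_k] * log_loss(true_k, scores)

# Hypothetical symmetric inter-category distances (zero on the diagonal).
DIST = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]

# True category is 0 in both cases; only the wrongly predicted category differs.
scores_near = [0.1, 1.0, 0.0]  # predicted category 1, distance 1 from category 0
scores_far = [0.1, 0.0, 1.0]   # predicted category 2, distance 3 from category 0

loss_near = distance_weighted_loss(0, scores_near, DIST)
loss_far = distance_weighted_loss(0, scores_far, DIST)
```

In this sketch both cases have the same unweighted loss, and the distance factor alone makes the farther misclassification three times as costly.  Note that with a zero diagonal a correct prediction contributes zero loss in this simplified form.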
Li, Friedman, and McDermott are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the gradient boosting of Li and Friedman, with the distance-based loss function of McDermott. The modification would have been obvious because one of ordinary skill in the art would be motivated to accelerate training by using more informative loss values (McDermott, Sec 3.2 Para 2:  “In MCE/GPD so far, whenever a category is misrecognized as another category, the (ideal) loss is considered to be 1. In a grammar-constrained task, where the categories are taken to be the strings allowed by the grammar, any difference, however slight, between the correct string and incorrect strings will be treated as a loss of 1; a very large 

As per Claim 2, the combination of Li, Friedman, and McDermott teaches the method according to claim 1 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
classifying, by the device, the to-be-classified data by using the final classification model;  (Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 3, the combination of Li, Friedman, and McDermott teaches the method according to claim 1 as shown above, as well as wherein: the performing classification training on an initial classification model by using the training parameter, to obtain the predicted classification comprises: performing, by the device, iterative calculation on the initial classification model by using the training parameter, to obtain a predicted classification generated by a classification model used in each iteration (Li, Page 48, discloses obtaining a training sample, as Li discloses a “Data Set” with “20000 data points”.  Li, Page 48, also discloses “16 features”, being incorporated on Page 49 into one “feature vector”.  In machine learning training, a weight, or parameter, must be applied to this feature, as it is this which is trained.  Li, Page 71, discloses starting with an initial classification model and performing an iterative calculation:  “Start with initial models…iterate until converge”.  Li discloses obtaining a predicted classification in each iteration in Pages 58-66, where Li shows the probability distribution at each iteration.)
the determining the residual between the true classification and the predicted classification according to the gradient loss function of the initial classification model comprises: determining, by the device, according to the gradient loss function of the initial classification model, a residual between the true classification and the predicted classification generated in each iteration (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.)
and the modifying the initial classification model according to the residual to obtain the final classification model comprises: modifying, by the device according to a residual determined in the Mth iteration, a classification model used in the Mth iteration to obtain a classification model used in the (M+1)th iteration, and obtaining, by the device, the final classification model by means of at least one iterative modification, the classification model used in the Mth iteration being obtained by modifying a classification model used in the (M-1)th iteration according to a residual determined in the (M-1)th iteration, and M being a positive integer greater than 1.  (Li, Page 71, discloses “calculate negative gradients”, which are residuals.  Li, Page 71, also discloses modifying the initial classification model according to the residual:  “fit a regression tree hA to the negative gradients” and “FA := FA + rhoA*hA”.  Here, the assignment operator “:=” means that model F is updated for the next iteration M+1, based on the previous model F from the previous iteration M and the residual h also from the previous iteration M.  Li, Page 71, also discloses a final classification model, as they state “iterate until converge”.  Upon convergence, the updates stop and one has the final classification model. If the final model is considered iteration M, then it was created with the same assignment operator “:=”, with previous model F and residual h from previous iteration M-1).

As per Claim 4, the combination of Li, Friedman, and McDermott teaches the method according to claim 3 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
classifying, by the device, the to-be-classified data by using the final classification model;  (Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 5, the combination of Li, Friedman, and McDermott teaches the method according to claim 3 as shown above.  Li also teaches wherein the determining, according to the gradient loss function of the initial classification model, the residual between the true classification and the predicted classification generated in each iteration comprises: determining, by the device, according to the training parameter and the true classification, a residual between the true classification and the predicted classification in each iteration (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.  Li, Page 71, discloses that this is done in each iteration:  “iterate until converge”.)
However, Li does not teach the distance factor representing the difference between the category to which the true classification belongs and a category to which a predicted classification in each iteration belongs.
McDermott teaches the distance factor representing the difference between the category to which the true classification belongs and a category to which a predicted classification in each iteration belongs. (McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.” Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”. McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”).

As per Claim 6, the combination of Li, Friedman, and McDermott teaches the method according to claim 5.  Li teaches determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual).
However, Li does not teach wherein the determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification in each iteration comprises: determining, by the device, a residual between a predicted classification generated in the kth iteration and the true classification by using the following formulas:

    [Equation image: media_image13.png]

x is the training parameter, i is a positive integer greater than 1, Yk is the true classification, Yik is the residual between the predicted classification generated in the kth iteration and the true classification, Pk(x) is a prediction probability function of the kth iteration, Fk(x) is a prediction function of the kth iteration, Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs, F'k(xi) is a modification prediction function of the kth iteration, F'l(xi) is a modification prediction function of the lth iteration, and a value of l ranges from 1 to K, wherein K is a quantity of classes of the true classification.
Friedman teaches wherein the determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification in each iteration comprises: determining, by the device, a residual between a predicted classification generated in the kth iteration and the true classification by using the following formulas:

    [Equation image: media_image14.png]

x is the training parameter, i is a positive integer greater than 1, Yk is the true classification, Yik is the residual between the predicted classification generated in the kth iteration and the true classification, Pk(x) is a prediction probability function of the kth iteration, Fk(x) is a prediction function of the kth iteration, Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs, F'k(xi) is a modification prediction function of the kth iteration, F'l(xi) is a modification prediction function of the lth iteration, and a value of l ranges from 1 to K, wherein K is a quantity of classes of the true classification.
(First note that Li, Page 55, discloses under “Loss Function for each data point”:  “Step 3 Calculate the difference between the true probability distribution and the predicted probability distribution.  Here we use KL divergence.”
Examiner notes that KL divergence is defined as:

    [Equation image: media_image5.png]

Which is equivalent to:

    [Equation image: media_image6.png]

Friedman, Section 4.6 “Multi class logistic regression and classification”, teaches:
“Here we develop a gradient descent boosting algorithm for the K-class problem.  The loss function is”

    [Equation image: media_image15.png]

When considering the K-class problem (such as Li’s character recognition which is a 26-class problem), it is the case that the true probability distribution is a vector of several 0’s and one 1.  Therefore, when applied to the Wikipedia DKL equation, one of ordinary skill in the art can see that the second term of the equation in this case is always zero, because either p(x) itself is 0 or p(x) is 1, which makes log p(x) equal to 0.  So, the product p(x)*log p(x) is always 0.  Then one arrives at Friedman’s loss function which is the first term of the KL equation.  Therefore, Friedman is also using KL divergence in the loss function for the K-class problem.  It is therefore established that Li (“Here we use KL divergence”) is simply using Friedman’s method (and Li cites Friedman on Page 80).
Li, Page 71, discloses “Calculate negative gradients”, and shows this to be equivalent to the residual:  “YA (xi) – PA(xi)”.  Friedman provides more detail on this residual:

    [Equation image: media_image16.png]

Here, the residuals calculated using KL divergence of Li are shown to result in the claimed equations (without the claimed distance factor Dyk).
The combination of Li and Friedman thus far fails to teach Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs

    [Equation image: media_image17.png]

McDermott teaches the distance factor representing the difference between the category to which the true classification belongs and the category to which the predicted classification in each iteration belongs

    [Equation image: media_image18.png]

(McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.” Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”. McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”). 
McDermott, pg 294 3rd column, also discloses:  “Multiplying this expression, for each incorrect category a, by the inter-category symbolic distance djk, and summing over all incorrect categories thus gives an aggregate, weighted symbolic distance between the correct category k and all other categories. Multiplying this aggregate distance by the usual loss L() gives the new loss L2().”  Here, McDermott discloses scaling the Loss function to create a new Loss function, by multiplying by a distance between the correct category and incorrect categories.  Applying a scale factor Dyk to the loss function of Li and Friedman:
 
    [Equation image: media_image19.png]

Results in the claimed equation

    [Equation image: media_image18.png]
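Because the formulas referenced above survive here only as images, the following Python sketch is offered strictly as an illustration of how a distance factor applied to a prediction function could change the residuals.  It assumes a multiplicative form, in which a modified score equals Dyk times the original score Fk before the softmax is applied; that form, the factor values, and all function names are assumptions made for this sketch and are not taken from the claims or the cited references.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def modified_residuals(true_k, scores, factor):
    """Residuals y_k - p_k, with p_k from a softmax over factor[true_k][k] * F_k."""
    modified = [factor[true_k][k] * s for k, s in enumerate(scores)]
    probs = softmax(modified)
    return [(1.0 if k == true_k else 0.0) - p for k, p in enumerate(probs)]

K = 3
# Hypothetical distance factor: 1 for the true class, growing with index distance.
FACTOR = [[1.0 + abs(y - k) for k in range(K)] for y in range(K)]
# Factor of 1 everywhere reproduces the unmodified residuals.
PLAIN = [[1.0] * K for _ in range(K)]

scores = [0.5, 0.5, 0.5]  # identical raw scores for every class; true class is 0
plain = modified_residuals(0, scores, PLAIN)
weighted = modified_residuals(0, scores, FACTOR)
```

With identical raw scores, the unmodified residuals treat both wrong classes alike, whereas under the assumed factor the class farther from the true class receives the more negative residual and is therefore corrected more strongly in the next update.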

As per Claim 7, the combination of Li, McDermott, and Friedman teaches the method according to claim 6 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
classifying, by the device, the to-be-classified data by using the final classification model;  (Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 8, Claim 8 is an apparatus claim corresponding to method Claim 1.  The difference is that it recites a memory and a processor.  (Li implies the use of a computer device, which comprises memory and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer.)  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 9, Claim 9 is an apparatus claim corresponding to method Claim 2.  The difference is that it recites a memory and a processor.  Claim 9 is rejected for the same reasons as Claim 2.

As per Claim 10, Claim 10 is an apparatus claim corresponding to method Claim 3.  The difference is that it recites a memory and a processor.  Claim 10 is rejected for the same reasons as Claim 3.

As per Claim 11, Claim 11 is an apparatus claim corresponding to method Claim 4.  The difference is that it recites a memory and a processor.  Claim 11 is rejected for the same reasons as Claim 4.

As per Claim 12, Claim 12 is an apparatus claim corresponding to method Claim 5.  The difference is that it recites a memory and a processor.  Claim 12 is rejected for the same reasons as Claim 5.

As per Claim 13, Claim 13 is an apparatus claim corresponding to method Claim 6.  The difference is that it recites a memory and a processor.  Claim 13 is rejected for the same reasons as Claim 6.

As per Claim 14, Claim 14 is an apparatus claim corresponding to method Claim 7.  The difference is that it recites a memory and a processor.  Claim 14 is rejected for the same reasons as Claim 7.

As per Claim 15, Claim 15 is a non-transitory computer readable storage medium claim corresponding to method Claim 1.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Li implies the use of a computer device, which comprises a non-transitory computer readable storage medium and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer, and the code must be saved on a non-transitory computer readable storage medium.)  Claim 15 is rejected for the same reasons as Claim 1.

As per Claim 16, Claim 16 is a non-transitory computer readable storage medium claim corresponding to method Claim 2.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 16 is rejected for the same reasons as Claim 2.

As per Claim 17, Claim 17 is a non-transitory computer readable storage medium claim corresponding to method Claim 3.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 17 is rejected for the same reasons as Claim 3.

As per Claim 18, Claim 18 is a non-transitory computer readable storage medium claim corresponding to method Claim 5.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 18 is rejected for the same reasons as Claim 5.

As per Claim 19, Claim 19 is a non-transitory computer readable storage medium claim corresponding to method Claim 6.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 19 is rejected for the same reasons as Claim 6.

As per Claim 20, Claim 20 is a non-transitory computer readable storage medium claim corresponding to method Claim 7.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 20 is rejected for the same reasons as Claim 7.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.A.S./Examiner, Art Unit 2126
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126