Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2019-02-27 and 2021-03-09 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Objections
Claim 3 is objected to because of the following informalities:  “by the device according” should be changed to add a comma to read “by the device, according” in Lines 9 and 14.  Appropriate correction is required.
Claim 5 is objected to because of the following informality:  “, by the device according” should be changed to add a comma to read “by the device, according” in Line 4.  Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis:

Step 2 Analysis:
Based on the claims being determined to be within one of the four categories (Step 1), it must be determined if the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, and abstract idea). In this case the claims fall within the judicial exception of an abstract idea, specifically, “Mental Processes (processes that can be performed in the human mind, or by a human using a pen and paper)”.
Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1, 8, and 15:
“performing…classification training” (mental process)
“determining…a residual…according to a gradient loss function” (mental process)
“modifying…the initial classification model according to the residual” (mental process)
Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional elements in claims 1, 8, and 15 “device”, “memory”, “processor”, and “non-transitory computer readable storage medium” correspond to mere instructions to implement an abstract idea or other exception on a computer. The additional limitation “obtaining…a training sample” amounts to insignificant extra-solution activity (necessary data gathering and outputting, see MPEP 2106.05(g)(3)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
Claims 1, 8, and 15 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of claims 1, 8, and 15 “device”, “memory”, “processor”, and “non-transitory computer readable storage medium” correspond to mere instructions to implement an abstract idea or other exception on a computer. The additional limitation “obtaining data” amounts to a well understood, routine, and conventional activity (Storing and retrieving information in memory, see MPEP 2106.05(d)(II)(iv)). The claims are directed to a judicial exception.
Dependent claim(s) 2-7, 9-14, and 16-20 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitation(s) fail(s) to establish that the claim(s) is/are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claims 2, 9, and 16 recite the same limitations as Claims 1, 8, and 15, further performing “classifying…the data” (mental process); additional elements “receiving…data” and “outputting…result” amount to merely “applying” the concept in a computer environment; the additional limitations do not amount to significantly more, as they are well-understood, routine, and conventional activity (Storing and retrieving information in memory, see MPEP 2106.05(d)(II)(iv)).
Claims 3, 10, and 17 recite the same limitations as Claims 1, 8, and 15, further specifying “performing…iterative calculation”, “determining…according to the gradient loss function…a residual”, “modifying…according to a residual”, and “obtaining…by means of at least one iterative modification” (mental process).
Claims 4 and 11 recite the same limitations as Claims 3 and 10, further performing “classifying…the data” (mental process); additional elements “receiving…data” and “outputting…result” amount to merely “applying” the concept in a computer environment; the additional limitations do not amount to significantly more, as they are well-understood, routine, and conventional activity (Storing and retrieving information in memory, see MPEP 2106.05(d)(II)(iv)).
Claims 5, 12, and 18 recite the same limitations as Claims 3, 10, and 17, further specifying “determining…a residual” (mental process).
Claims 6, 13, and 19 recite the same limitations as Claims 5, 12, and 18, further specifying “determining…a residual” based on mathematical formulas (mental process / mathematical formula).
Claims 7, 14, and 20 recite the same limitations as Claims 6, 13, and 19, further performing “classifying…the data” (mental process); additional elements “receiving…data” and “outputting…result” amount to merely “applying” the concept in a computer environment; the additional limitations do not amount to significantly more, as they are well-understood, routine, and conventional activity (Storing and retrieving information in memory, see MPEP 2106.05(d)(II)(iv)).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Li (“A gentle introduction to gradient boosting”) in view of McDermott et al. (“Prototype-based MCE/GPD training for word spotting and connected word recognition”; hereinafter McDermott).
As per Claim 1, Li teaches a method for training a classification model, the method comprising (Li, Page 25, discloses training a model:  “We are improving the predictions of training data”.  Li, Page 48, discloses applying the technique for a classification model: “Gradient Boosting for Classification” in the title, and “Multi-class classification” in the slide text.)
obtaining, by a device comprising a memory storing instructions and a processor in communication with the memory, a training sample, the training sample comprising a training parameter and a true classification corresponding to the training parameter (Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 49, discloses a true classification, showing for a chosen example “Label = G”.  Li implies the use of a computer device, which comprises memory and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation.)
performing, by the device, classification training on an initial classification model by using the training parameter, to obtain a predicted classification (Li, Page 25, discloses training a model to obtain a prediction:  “We are improving the predictions of training data”.  Li, Page 48, discloses applying the technique for a classification model: “Gradient Boosting for Classification” in the title, and “Multi-class classification” in the slide text.  Li, Page 48, discloses obtaining a predicted classification:  “Recognize the given hand written capital letter”.  Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 70, indicates an initial model, disclosing “Start with initial models”).
determining, by the device, a residual between the true classification and the predicted classification according to a gradient loss function of the initial classification model (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.)
and modifying, by the device, the initial classification model according to the residual to obtain a final classification model (Li, Page 71, discloses “calculate negative gradients”, which are residuals.  Li, Page 71, also discloses modifying the initial classification model according to the residual:  “fit a regression tree hA to the negative gradients” and “FA := FA + rhoA*hA”.  Li, Page 71, also discloses a final classification model, as they state “iterate until converge”.  Upon convergence, the updates stop and one has the final classification model.)
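For illustration only, the relationship relied upon above (the negative gradient of the loss equals the true value minus the predicted value) can be checked numerically. The following is a minimal sketch, not code from Li; it assumes a softmax over per-class scores and a cross-entropy loss, and the function names and the three-class example values are hypothetical.

```python
import math

def softmax(scores):
    """Turn per-class scores F_k(x) into probabilities P_k(x)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, scores):
    """Loss for one data point: L = -sum_k Y_k * log P_k."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def residuals(one_hot, scores):
    """Residual per class: true indicator Y_k minus predicted probability P_k."""
    return [y - p for y, p in zip(one_hot, softmax(scores))]

def numeric_negative_gradient(one_hot, scores, eps=1e-6):
    """Central-difference estimate of -dL/dF_k, used only as a cross-check."""
    grads = []
    for k in range(len(scores)):
        up = list(scores)
        up[k] += eps
        down = list(scores)
        down[k] -= eps
        slope = (cross_entropy(one_hot, up) - cross_entropy(one_hot, down)) / (2 * eps)
        grads.append(-slope)
    return grads

# Hypothetical example: three classes, class 1 is the true classification,
# and the initial model scores every class equally.
true_label = [0.0, 1.0, 0.0]
initial_scores = [0.0, 0.0, 0.0]
res = residuals(true_label, initial_scores)
check = numeric_negative_gradient(true_label, initial_scores)
```

Under these assumptions the two lists agree to within the finite-difference error, which is the sense in which the negative gradient and the residual coincide.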
However, Li does not teach the gradient loss function comprising a distance factor representing a distance between a first category and a second category, the first category being a category to which the predicted classification belongs, and the second category being a category to which the true classification belongs.
McDermott teaches the gradient loss function comprising a distance factor representing a distance between a first category and a second category, the first category being a category to which the predicted classification belongs, and the second category being a category to which the true classification belongs. (McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.” Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”.  McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”).

It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the gradient boosting of Li, with the distance-based loss function of McDermott. The modification would have been obvious because one of ordinary skill in the art would be motivated to accelerate training by using more informative loss values (McDermott, Sec 3.2 Para 2:  “In MCE/GPD so far, whenever a category is misrecognized as another category, the (ideal) loss is considered to be 1. In a grammar-constrained task, where the categories are taken to be the strings allowed by the grammar, any difference, however slight, between the correct string and incorrect strings will be treated as a loss of 1; a very large difference between correct and incorrect strings will be counted in the same way, as a loss of 1. Thus, the above approach to word spotting in continuous speech is optimizing correct string recognition. It may be desirable to consider more revealing error counts when comparing correct and incorrect strings. For instance, a word spotting error count is a practical way of evaluating the performance of a system attempting to recognize continuous speech. The following extension of the MCE/GPD loss allows one to incorporate word spotting error and other symbolic distances between categories into the MCE/GPD training framework. Training will then heavily penalize the recognition of strings whose symbolic contents are very different from the correct string.”)
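The contrast drawn in the quoted McDermott passage, between counting every misrecognition as a loss of 1 and using a more revealing error count, can be sketched as follows. This is an illustrative example only: the word-level edit distance is an assumed stand-in for McDermott's symbolic distance, and the example strings and function names are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance counted in words (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # drop a reference word
                               current[j - 1] + 1,       # add a hypothesis word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]

def zero_one_loss(reference, hypothesis):
    """Loss of 1 for any difference, however slight; 0 only for an exact match."""
    return 0 if reference == hypothesis else 1

# Hypothetical strings: one near miss and one very different string.
correct = "call the office tomorrow morning"
near_miss = "call the office tomorrow evening"
far_miss = "play some music"

near_distance = word_edit_distance(correct, near_miss)
far_distance = word_edit_distance(correct, far_miss)
```

Both incorrect strings receive the same 0/1 loss, while the distance-based count separates the slight error from the large one, which is the kind of additional information the quoted passage describes.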

As per Claim 2, the combination of Li and McDermott teaches the method according to claim 1 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
classifying, by the device, the to-be-classified data by using the final classification model;  (Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 3, the combination of Li and McDermott teaches the method according to claim 1 as shown above, as well as wherein: the performing classification training on an initial classification model by using the training parameter, to obtain the predicted classification comprises: performing, by the device, iterative calculation on the initial classification model by using the training parameter, to obtain a predicted classification generated by a classification model used in each iteration (Li, Page 48, discloses obtaining a training sample, as Li discloses a “Data Set” with “20000 data points”.  Li, Page 48, also discloses “16 features”, being incorporated on Page 49 into one “feature vector”.  In machine learning training, a weight, or parameter, must be applied to this feature, as it is this which is trained.  Li, Page 71, discloses starting with an initial classification model and performing an iterative calculation:  “Start with initial models…iterate until converge”.  Li discloses obtaining a predicted classification in each iteration in Pages 58-66, where Li shows the probability distribution at each iteration.)
(Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.)
and the modifying the initial classification model according to the residual to obtain the final classification model comprises: modifying, by the device according to a residual determined in the Mth iteration, a classification model used in the Mth iteration to obtain a classification model used in the (M+1)th iteration, and obtaining, by the device, the final classification model by means of at least one iterative modification, the classification model used in the Mth iteration being obtained by modifying a classification model used in the (M-1)th iteration according to a residual determined in the (M-1)th iteration, and M being a positive integer greater than 1.  (Li, Page 71, discloses “calculate negative gradients”, which are residuals.  Li, Page 71, also discloses modifying the initial classification model according to the residual:  “fit a regression tree hA to the negative gradients” and “FA := FA + rhoA*hA”.  Here, the assignment operator “:=” means that model F is updated for the next iteration M+1, based on the previous model F from the previous iteration M and the residual h also from the previous iteration M.  Li, Page 71, also discloses a final classification model, as they state “iterate until converge”.  Upon convergence, the updates stop and one has the final classification model. If the final model is considered iteration M, then it was created with the same assignment operator “:=”, with previous model F and residual h from previous iteration M-1).
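The iterative update discussed above (“FA := FA + rhoA*hA”, with the model of iteration M+1 built from the model and residual of iteration M) can be sketched as follows. This is a simplified illustration under stated assumptions, not Li's implementation: a one-split regressor stands in for the regression tree, rho is a fixed constant, a fixed number of rounds replaces “iterate until converge”, and the six-point data set and all names are hypothetical.

```python
import math

def softmax(scores):
    """Convert per-class scores F_k(x) into probabilities P_k(x)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_stump(xs, targets):
    """Least-squares one-split regressor (assumes at least two distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, labels, num_classes, rounds, rho=1.0):
    """Return a scoring function built by repeatedly fitting stumps to residuals."""
    models = [[] for _ in range(num_classes)]  # models[k] holds the stumps summed into F_k

    def score(x):
        return [rho * sum(h(x) for h in models[k]) for k in range(num_classes)]

    for _ in range(rounds):
        # Predictions of the model used in this iteration, computed before any update.
        probs = [softmax(score(x)) for x in xs]
        for k in range(num_classes):
            residual = [(1.0 if y == k else 0.0) - p[k] for y, p in zip(labels, probs)]
            models[k].append(fit_stump(xs, residual))  # F_k := F_k + rho * h_k
    return score

def predict(score, x):
    """Pick the class with the highest score."""
    scores = score(x)
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical one-feature data set with three classes.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 1, 1, 2, 2]
score = boost(xs, labels, num_classes=3, rounds=2)
predicted = [predict(score, x) for x in xs]
```

Each round appends one stump per class, so the model used in a given round is the previous round's model plus the stumps fitted to the previous round's residuals.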

As per Claim 4, the combination of Li and McDermott teaches the method according to claim 3 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
classifying, by the device, the to-be-classified data by using the final classification model;  (Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 5, the combination of Li and McDermott teaches the method according to claim 3 as shown above.  Li also teaches wherein the determining, according to the gradient loss function of the initial classification model, the residual between the true classification and (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 69, discloses a training parameter:  “a matrix of parameters to optimize”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual.  Li, Page 71, discloses that this is done in each iteration:  “iterate until converge”.)
However, Li does not teach and the distance factor representing the difference between the category to which the true classification belongs and a category to which a predicted classification in each iteration belongs.
McDermott teaches and the distance factor representing the difference between the category to which the true classification belongs and a category to which a predicted classification in each iteration belongs. (McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.” Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”. McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”).

As per Claim 8, Claim 8 is an apparatus claim corresponding to method Claim 1.  The difference is that it recites a memory and a processor.  (Li implies the use of a computer device, which comprises memory and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer.)  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 9, Claim 9 is an apparatus claim corresponding to method Claim 2.  The difference is that it recites a memory and a processor.  Claim 9 is rejected for the same reasons as Claim 2.

As per Claim 10, Claim 10 is an apparatus claim corresponding to method Claim 3.  The difference is that it recites a memory and a processor.  Claim 10 is rejected for the same reasons as Claim 3.

As per Claim 11, Claim 11 is an apparatus claim corresponding to method Claim 4.  The difference is that it recites a memory and a processor.  Claim 11 is rejected for the same reasons as Claim 4.

As per Claim 12, Claim 12 is an apparatus claim corresponding to method Claim 5.  The difference is that it recites a memory and a processor.  Claim 12 is rejected for the same reasons as Claim 5.

As per Claim 15, Claim 15 is a non-transitory computer readable storage medium claim corresponding to method Claim 1.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Li implies the use of a computer device, which comprises a non-transitory computer readable storage medium and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer, and the code must be saved on a non-transitory computer readable storage medium.)  Claim 15 is rejected for the same reasons as Claim 1.

As per Claim 16, Claim 16 is a non-transitory computer readable storage medium claim corresponding to method Claim 2.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 16 is rejected for the same reasons as Claim 2.

As per Claim 17, Claim 17 is a non-transitory computer readable storage medium claim corresponding to method Claim 3.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 17 is rejected for the same reasons as Claim 3.

As per Claim 18, Claim 18 is a non-transitory computer readable storage medium claim corresponding to method Claim 5.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 18 is rejected for the same reasons as Claim 5.

Claims 6-7, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Friedman (“Greedy Function Approximation:  A Gradient Boosting Machine”) and McDermott.
As per Claim 6, Li teaches the method according to claim 5 as shown above, as well as determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification model (Li, Page 27, discloses residual:  “yi – F(xi) are called residuals”.  Li, Page 13, indicates that yi is the true value and F(xi) is the predicted value:  “You are given (x1; y1); (x2; y2); …; (xn; yn), and the task is to fit a model F(x)”.  Li, Page 68, discloses “Give any differentiable loss function L”.  Li, Page 70, discloses that differentiating this loss function L produces a gradient:  “calculate negative gradients for class A”, and discloses an equation that shows this gradient is produced by differentiating the loss function.  Li, Page 71, discloses that this negative gradient is equivalent to the true value minus the predicted value: “(YA(xi) – PA(xi))”, and this is known as a residual).
However, Li does not teach wherein the determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification in each iteration comprises: determining, by the device, a residual between a predicted classification generated in the kth iteration and the true classification by using the following formulas:

[Equation image not reproduced: media_image1.png]

x is the training parameter, i is a positive integer greater than 1, Yk is the true classification, Yik is the residual between the predicted classification generated in the kth iteration and the true classification, Pk(x) is a prediction probability function of the kth iteration, Fk(xi) is a prediction function of the kth iteration, Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs, F(xi) is a modification prediction function of the kth iteration, F is a modification prediction function of the lth iteration, and a value of l ranges from 1 to K, wherein K is a quantity of classes of the true classification.

Friedman teaches wherein the determining, according to the training parameter and the true classification, the residual between the true classification and the predicted classification in each iteration comprises: determining, by the device, a residual between a predicted classification generated in the kth iteration and the true classification by using the following formulas:

[Equation image not reproduced: media_image2.png]

x is the training parameter, i is a positive integer greater than 1, Yk is the true classification, Yik is the residual between the predicted classification generated in the kth iteration and the true classification, Pk(x) is a prediction probability function of the kth iteration, Fk(xi) is a prediction function of the kth iteration, Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs, F(xi) is a modification prediction function of the kth iteration, F is a modification prediction function of the lth iteration, and a value of l ranges from 1 to K, wherein K is a quantity of classes of the true classification.
(First note that Li, Page 55, discloses under “Loss Function for each data point”:  “Step 3 Calculate the difference between the true probability distribution and the predicted probability distribution.  Here we use KL divergence.”
Examiner notes that KL divergence is defined as:

[Equation image not reproduced: media_image3.png]

Which is equivalent to:

[Equation image not reproduced: media_image4.png]

Friedman, Section 4.6 “Multi class logistic regression and classification”, teaches:
“Here we develop a gradient descent boosting algorithm for the K-class problem.  The loss function is”

[Equation image not reproduced: media_image5.png]

When considering the K-class problem (such as Li’s character recognition, which is a 26-class problem), the true probability distribution is a vector of several 0’s and one 1.  Therefore, when applied to the Wikipedia DKL equation, one of ordinary skill in the art can see that the second term of the equation in this case is always zero, because either p(x) itself is 0, or p(x) is 1, which makes log p(x) equal to 0.  So, the product p(x)*log p(x) is always 0.  One then arrives at Friedman’s loss function, which is the first term of the KL equation.  Therefore, Friedman is also using KL divergence in the loss function for the K-class problem.  It is therefore established that Li (“Here we use KL divergence”) is simply using Friedman’s method (and Li cites Friedman on Page 80).
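For reference, the reduction described above can be written out as follows. This is a sketch that assumes the standard definition of KL divergence, with p the true (one-hot) distribution and q the predicted distribution, and the convention 0 log 0 = 0; the symbols y_k and p_k(x) in the last line follow the K-class notation of the Friedman passage quoted above and are mapped to p and q here by assumption.

```latex
\begin{align}
D_{\mathrm{KL}}(P \,\|\, Q)
  &= \sum_{x} p(x)\,\log\frac{p(x)}{q(x)} \\
  &= -\sum_{x} p(x)\,\log q(x) \;+\; \sum_{x} p(x)\,\log p(x).
\end{align}
% One-hot case: every p(x) is 0 or 1, so each product p(x) log p(x) is 0
% (0 log 0 = 0 by convention, and 1 log 1 = 0).
\begin{align}
\sum_{x} p(x)\,\log p(x) &= 0, \\
D_{\mathrm{KL}}(P \,\|\, Q) &= -\sum_{x} p(x)\,\log q(x)
  \;=\; -\sum_{k=1}^{K} y_k \,\log p_k(x).
\end{align}
```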
Li, Page 71, discloses “Calculate negative gradients”, and shows this to be equivalent to the residual:  “YA (xi) – PA(xi)”.  Friedman provides more detail on this residual:

[Equation image not reproduced: media_image6.png]

Here, the residuals calculated using KL divergence of Li are shown to result in the claimed equations (without the claimed distance factor Dyk).
Li and Friedman are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the gradient boosting of Li, with the function of Friedman. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve accuracy of the classification model even when using suboptimal data (Friedman, Abstract:  “Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data.”)
The combination of Li and Friedman thus far fails to teach Dyk is a distance factor representing a distance between the category to which the true classification belongs and a category to which the predicted classification of the kth iteration belongs

[Equation image not reproduced: media_image7.png]

McDermott teaches and the distance factor representing the difference between the category to which the true classification belongs and the category to which the predicted classification in each iteration belongs

[Equation image not reproduced: media_image8.png]

(McDermott, Abstract Lines 7-10, discloses:  “Furthermore, we define a new MCE/GPD loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories.” Here, McDermott discloses that the loss function comprises a distance between a predicted category and the true category.  McDermott, Intro Lines 5-7, further discloses a classification problem in which gradient descent is used:  “GPD allows us to perform gradient descent on a classification loss measure that closely reflects the misclassification rate”. McDermott, pg 294 Top of left column, discloses more details on possible distance metrics:  “For a classification problem of M categories, we consider a matrix A of inter-category symbolic distances djk between categories j and k. These distances could be many kinds of distance, such as distances between syntactic parse trees”). 
McDermott, pg 294 3rd column, also discloses:  “Multiplying this expression, for each incorrect category a, by the inter-category symbolic distance djk, and summing over all incorrect categories thus gives an aggregate, weighted symbolic distance between the correct category k and all other categories. Multiplying this aggregate distance by the usual loss L() gives the new loss L2().”  Here, McDermott discloses scaling the Loss function to create a new Loss function, by multiplying by a distance between the correct category and incorrect categories.  Applying a scale factor Dyk to the loss function of Li and Friedman:
 
[Equation image not reproduced: media_image9.png]

Results in the claimed equation

[Equation image not reproduced: media_image8.png]

Li, Friedman, and McDermott are analogous art because they are in the field of endeavor of machine learning.
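The scaling described in the McDermott passage quoted above (an aggregate, distance-weighted term multiplied by the usual loss to give a new loss) can be sketched as follows. This is an illustrative reading only, not McDermott's formulation and not the claimed equations: it assumes that the per-category expression being weighted is the predicted probability of each incorrect category, that the usual loss is the negative log probability of the correct category, and that the distance between categories j and k is |j - k|. All names and values are hypothetical.

```python
import math

def usual_loss(probs, true_class):
    """Assumed usual loss: negative log probability of the correct category."""
    return -math.log(probs[true_class])

def aggregate_distance(probs, true_class, distances):
    """Sum over incorrect categories j of (assumed weight p_j) * distance d_jk."""
    return sum(probs[j] * distances[j][true_class]
               for j in range(len(probs)) if j != true_class)

def distance_weighted_loss(probs, true_class, distances):
    """New loss: aggregate weighted distance multiplied by the usual loss."""
    return aggregate_distance(probs, true_class, distances) * usual_loss(probs, true_class)

# Hypothetical inter-category distance matrix for three ordered categories.
distances = [[abs(j - k) for k in range(3)] for j in range(3)]

# Two hypothetical predictions for a sample whose true category is 0. Both give
# the correct category the same probability, but the second puts more of the
# remaining probability on the more distant category 2.
near = [0.5, 0.4, 0.1]
far = [0.5, 0.1, 0.4]

near_loss = distance_weighted_loss(near, 0, distances)
far_loss = distance_weighted_loss(far, 0, distances)
```

Under these assumptions the usual loss is identical for both predictions, while the distance-weighted loss is larger for the prediction that favors the more distant category.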


As per Claim 7, the combination of Li, McDermott, and Friedman teaches the method according to claim 6 as shown above, as well as further comprising: receiving, by the device, to-be-classified data (Li, Page 48, discloses receiving to-be-classified data:  “Data Set” with “20000 data points”)
(Li, Page 48, discloses classifying the data: “Recognize the given hand written capital letter”.  Li, Page 71, discloses using the final classification model:  “Iterate until converge”)
and outputting, by the device, the classification result. (Li, Page 66, displays an output of the classification result, which is a graph of all 26 letters, showing “G” with the highest probability)

As per Claim 13, Claim 13 is an apparatus claim corresponding to method Claim 6.  The difference is that it recites a memory and a processor.  (Li implies the use of a computer device, which comprises memory and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer.)  Claim 13 is rejected for the same reasons as Claim 6.

As per Claim 14, Claim 14 is an apparatus claim corresponding to method Claim 7.  The difference is that it recites a memory and a processor.  Claim 14 is rejected for the same reasons as Claim 7.

As per Claim 19, Claim 19 is a non-transitory computer readable storage medium claim corresponding to method Claim 6.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Li implies the use of a computer device, which comprises a non-transitory computer readable storage medium and a processor, since Li supplies a hyperlink to an online location for the data set, and also on Page 2 Li supplies a Github link to a code implementation. Such code implementation must be run on a computer, and the code must be saved on a non-transitory computer readable storage medium.)  Claim 19 is rejected for the same reasons as Claim 6.

As per Claim 20, Claim 20 is a non-transitory computer readable storage medium claim corresponding to method Claim 7.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 20 is rejected for the same reasons as Claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gong et al. (US 9,552,549 B1) discloses a multiple classification problem of image labeling, in which the level of error in the predicted value is determined by a semantic loss ranking, which may be based on a distance between the predicted label and the correct label on a tree that connects the labels semantically.
Jin et al. (US 2010/0250253 A1) [0008] discloses a weight that indicates the difference between a predicted rank position at an iteration and a true rank position.
Mohamed et al. (US 2008/0101705 A1) [0068] discloses an objective function that depends on the difference between the output of a neural network and the correct class labels.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.A.S./Examiner, Art Unit 2126



/BABOUCARR FAAL/Primary Examiner, Art Unit 2184