DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s submission filed 2022-11-16 has been entered.  Applicant’s amendments to the claims have overcome the objections set forth in the previous office action.  The status of claims is as follows:
Claims 1-14 are pending in the application.
Claims 1 and 8 are amended.
Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The Guizilini secondary reference has been replaced by Huang.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Cui et al. (“Class-Balanced Loss Based on Effective Number of Samples”; hereinafter “Cui”) in view of Huang et al. (“Cost-sensitive sparse linear regression for crowd counting with imbalanced training data”; hereinafter “Huang”).
As per Claim 1, Cui teaches A method for training a machine learning model comprising:  
5receiving, by a computer system comprising a processor and memory, a training data set comprising imbalanced data (Cui, Page 7 Top Right Paragraph, discloses a computer system comprising a processor, which must also have a memory:  “Models on CIFAR were trained with batch size of 128 on a single NVIDIA Titan X GPU for 200 epochs.”  Cui, Page 4 Section 4, discloses a training data set comprising imbalanced data:  “The Class-Balanced Loss is designed to address the problem of training from imbalanced data by introducing a weighting factor that is inversely proportional to the effective number of samples.”)
computing, by the computer system, a label density fx(x) of the training data set (Cui, Page 4 Section 4 Para 2, discloses:  “For an input sample x with label y ∈ {1, 2, …, C}, where C is the total number of classes, suppose the model’s estimated class probabilities are p = [p_1, p_2, …, p_C]^T.”  Here, Cui discloses a label density of the training data set, as they disclose a probability of each class label occurring, which is a discrete probability density function, and thus a label density.)
computing, by the computer system, a weight function w(x) comprising a term that is inversely proportional to the label density (Cui, Page 4 Section 4 Para 3, discloses:  “To balance the loss, we introduce a weighting factor α_i that is inversely proportional to the effective number of samples for class i: α_i ∝ 1 / E_{n_i}. To make the total loss roughly in the same scale when applying α_i, we normalize α_i so that Σ_{i=1}^{C} α_i = C. For simplicity, we abuse the notation of 1 / E_{n_i} to denote the normalized weighting factor in the rest of our paper.”  Here, Cui discloses a weight function (“1 / E_{n_i} to denote the normalized weighting factor”) that is inversely proportional to the label density (“inversely proportional to the effective number of samples for class”)).
weighting, by the computer system, a loss function L(x, x̂) in accordance with the weight function to generate a weighted loss function Lw(x, x̂), where x represents a value of the training data set and where x̂ represents a [continuous] value predicted by the [continuous] machine learning model. (Cui, Page 4 Section 4 Para 4, discloses:  “Formally speaking, given a sample from class i that contains n_i samples in total, we propose to add a weighting factor (1 - β) / (1 - β^{n_i}) to the loss function, with hyperparameter β ∈ [0, 1). The class-balanced (CB) loss can be written as:

    CB(p, y) = (1 / E_{n_y}) L(p, y) = ((1 - β) / (1 - β^{n_y})) L(p, y)

where n_y is the number of samples in the ground-truth class y.”  Here, Cui discloses a loss function (“L(p, y)”) that is weighted by the weight function (“1 / E_{n_y}”) to generate a weighted loss function (“CB(p, y)”).  Cui’s loss function represents the difference between the labeled training value and the predicted value, as described in Page 4 Section 4 Para 2:  “For an input sample x with label y ∈ {1, 2, …, C}, where C is the total number of classes, suppose the model’s estimated class probabilities are p = [p_1, p_2, …, p_C]^T.”  The label y and the estimated class probabilities p in Cui’s loss L(p, y) are analogous to the recited x and x̂, respectively.)
training, by the computer system, a continuous machine learning model in accordance with the training data set and the weighted loss function Lw(x, x̂) to compute a trained [continuous] machine learning model [configured to compute a prediction of a continuous value] (Cui, Page 5 Section 5, discloses:  “The proposed class-balanced losses are evaluated on artificially created long-tailed CIFAR [24] datasets with controllable degrees of data imbalance and real-world long-tailed datasets iNaturalist 2017 [40] and 2018 [1]. To demonstrate our loss is generic for visual recognition, we also present experiments on ImageNet data (ILSVRC 2012 [33]). We use deep residual networks (ResNet) [16] with various depths and train all networks from scratch.”  Here, Cui discloses applying their technique to image recognition, which comprises classifying images into a discrete set of labels.  One of ordinary skill in the art will appreciate that the domain of the training data is continuous, as images comprise real-numbered pixel values.  Thus, Cui discloses training (“train all networks”) with the training data set (“CIFAR”, “ImageNet”) and the weighted loss function (“class-balanced losses”) to result in a trained continuous machine learning model.)
and outputting, by the computer system, the trained [continuous] machine learning model (Cui, Page 6 last line of Section 5.2, discloses:  “Code, data and pre-trained models are available at: https://github.com/richardaecn/class-balanced-loss.”  Here, Cui discloses outputting the trained model, as they have published the “pre-trained models”.)
However, Cui does not explicitly teach a continuous machine learning model configured to compute a prediction of a continuous value.
Huang explicitly teaches a continuous machine learning model configured to compute a prediction of a continuous value (Huang, Page 2 Section 2.1, discloses:  “According to the extracted image features, a sparse linear regression model is trained. Accordingly, the modelling errors associated all training data can be calculated and are used to design the weighting factor for each training image. In the training stage 2, a cost-sensitive sparse linear regression model (CS-SLR) will be jointly learned using input training data and their associated weighting factors.”  Here, Huang discloses that “a cost-sensitive sparse linear regression model (CS-SLR) will be jointly learned”.  One of ordinary skill in the art will appreciate that a regression model predicts a continuous value, as acknowledged in multiple areas of the Instant Specification, in particular [0039]:  “In particular, aspects of embodiments of the present disclosure relate to addressing imbalanced data in making predictions of continuous values (as opposed to discrete classifications), such as in regression models.”  Examiner notes that Huang is directed to “crowd counting” (Huang, Page 2 Section 2.3:  “As a kind of linear regression, RR has shown a promising performance for crowd counting regression”).  Although a “count” is a discrete integer value, Huang still uses a regression model to estimate the count, and said regression model will result in a continuous, real-valued prediction that estimates the crowd size (and could subsequently be rounded).  Therefore, Huang computes a prediction of a continuous value.)
Cui and Huang are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the convolutional neural network for image classification with the weighted loss function of Cui with the cost-sensitive regression of Huang.  One of ordinary skill in the art would be motivated to do so in order to improve the accuracy of a regression model by accounting for imbalance in the training data set (Huang, Page 3 Section 2.4:  “It is noted that, for imbalanced data classification problem, the standard classifiers generally perform poorly because they just pursue the minimization of the overall error and ignore the errors of minor classes, which lead the model to have a tendency to misclassify data to be major classes. The similar observation is found in linear regression problem. Due to the imbalance of data, the errors of minor classes contribute few to the overall errors, therefore the estimated model will have a bias toward major classes. Consequently, test samples belonging to the minor classes will have high errors. Fortunately, cost-sensitive learning can handle the imbalanced classification problem by assuming higher misclassification costs with samples in minor class [17]. In order to solve the similar problem in regression, here we introduce cost items into the regression learning framework and propose a new cost-sensitive SLR (CS-SLR) model as follows.”).
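
For illustration only, the weighting quoted from Cui in the Claim 1 analysis above can be sketched as follows. This is a minimal sketch and not code from either reference: the function names and the class counts are hypothetical, and the effective number E_n = (1 - β^n) / (1 - β) is taken as the reciprocal of the quoted weighting factor (1 - β) / (1 - β^{n_i}).

```python
def effective_number(n, beta):
    """Effective number of samples E_n = (1 - beta**n) / (1 - beta), beta in [0, 1)."""
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts, beta):
    """Weights proportional to 1 / E_n, normalized so that they sum to C (number of classes)."""
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [scale * r for r in raw]


def class_balanced_loss(loss_value, label, counts, beta):
    """Unnormalized form: CB(p, y) = (1 - beta) / (1 - beta**n_y) * L(p, y)."""
    n_y = counts[label]
    return (1.0 - beta) / (1.0 - beta ** n_y) * loss_value


# Hypothetical long-tailed class sizes (assumption, not data from Cui).
counts = [5000, 500, 50]
weights = class_balanced_weights(counts, beta=0.999)
```

Under these assumptions the rarest class receives the largest weight, and the normalized weights sum to the number of classes, matching the quoted normalization.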

As per Claim 2, the combination of Cui and Huang teaches the method of claim 1.  Cui teaches wherein the label density fx(x) is a probability density function of the training data set. (Cui, Page 4 Section 4 Para 2, discloses:  “For an input sample x with label y ∈ {1, 2, …, C}, where C is the total number of classes, suppose the model’s estimated class probabilities are p = [p_1, p_2, …, p_C]^T.”  Here, Cui discloses a label density of the training data set, as they disclose a probability of each class label occurring, which is a discrete probability density function.)
	
As per Claim 3, the combination of Cui and Huang teaches the method of claim 1.  Cui teaches wherein the weight function w(x) is computed in accordance with a weighting parameter Δ reflecting a ratio between a maximum weight and minimum weight of the weighting function. (Cui, Page 6 right column under Eq 6, discloses:  “We visualize class-balanced loss in Figure 3 as a function of n_y for different β. Note that β = 0 corresponds to no re-weighting and β → 1 corresponds to re-weighing by inverse class frequency. The proposed novel concept of effective number of samples enables us to use a hyperparameter to smoothly adjust the class-balanced term between no re-weighting and re-weighing by inverse class frequency.”  Here, Cui discloses that the weight function is computed (“class-balanced term”) in accordance with a weighting parameter (“hyperparameter” β) that reflects a ratio (β is between 0 and 1, and is thus a ratio) between a maximum weight and minimum weight of the weighting function (“no re-weighting and re-weighing by inverse class frequency”)).
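
For illustration only, the limiting behavior quoted from Cui above (β = 0 giving no re-weighting, β → 1 approaching inverse class frequency) can be checked numerically. This is a sketch under stated assumptions: the helper name `cb_factor` and the two class sizes are hypothetical, and the sketch takes no position on how the recited parameter Δ should be construed.

```python
def cb_factor(n, beta):
    """Class-balanced factor (1 - beta) / (1 - beta**n) for a class with n samples."""
    return (1.0 - beta) / (1.0 - beta ** n)


# Hypothetical majority and minority class sizes (assumption).
n_major, n_minor = 1000, 10

betas = [0.0, 0.9, 0.999, 0.999999]
# Ratio of the minority-class weight to the majority-class weight for each beta.
ratios = [cb_factor(n_minor, b) / cb_factor(n_major, b) for b in betas]

for b, r in zip(betas, ratios):
    print(f"beta={b}: minority/majority weight ratio = {r:.3f}")
```

With β = 0 both classes receive the same weight (ratio 1), and as β approaches 1 the ratio approaches n_major / n_minor, which is the inverse-frequency ratio for these assumed counts.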

As per Claim 7, the combination of Cui and Huang teaches the method of claim 1.  Cui teaches wherein training the continuous machine learning model comprises iteratively updating a plurality of parameters of the continuous machine learning model in accordance with gradient descent to minimize the weighted loss function Lw(x, x̂) with respect to the training data set.  (Cui, Page 6 bottom left column, discloses training using gradient descent:  “We used Tensorflow [2] to implement and train all the models by stochastic gradient descent with momentum.”)
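
For illustration only, iteratively updating parameters by gradient descent to minimize a weighted loss can be sketched on a toy regression problem. This is a minimal sketch and not the training procedure of Cui or Huang: it uses full-batch gradient descent with momentum (not stochastic mini-batches) on a two-parameter linear model, and the function name, data, weights, and hyperparameters are all hypothetical.

```python
def train_weighted_linear(xs, ys, weights, lr=0.05, momentum=0.9, steps=500):
    """Fit y ~ a*x + b by gradient descent with momentum on the mean weighted squared error."""
    a = b = 0.0          # model parameters
    va = vb = 0.0        # momentum (velocity) terms
    m = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y, w in zip(xs, ys, weights):
            err = (a * x + b) - y
            # Gradient of (1/m) * w * err**2 with respect to a and b.
            grad_a += 2.0 * w * err * x / m
            grad_b += 2.0 * w * err / m
        va = momentum * va - lr * grad_a
        vb = momentum * vb - lr * grad_b
        a += va
        b += vb
    return a, b


# Hypothetical data lying exactly on y = 2x + 1; the last sample is up-weighted.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
weights = [1.0, 1.0, 1.0, 4.0]
a, b = train_weighted_linear(xs, ys, weights)
```

Because the assumed data are noise-free, the weighted and unweighted minimizers coincide here; with noisy, imbalanced data the weights would shift the fit toward the up-weighted samples.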

As per Claim 8, Claim 8 is a system claim corresponding to method Claim 1.  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 9, Claim 9 is a system claim corresponding to method Claim 2.  Claim 9 is rejected for the same reasons as Claim 2.

As per Claim 10, Claim 10 is a system claim corresponding to method Claim 3.  Claim 10 is rejected for the same reasons as Claim 3.

As per Claim 14, Claim 14 is a system claim corresponding to method Claim 7.  Claim 14 is rejected for the same reasons as Claim 7.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cui and Huang further in view of Khan et al. (“Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data”; hereinafter “Khan”).
As per Claim 6, the combination of Cui and Huang teaches the method of claim 1.  Cui teaches wherein weighting the loss function L(x, x̂) comprises multiplying the loss function L(x, x̂) by the weight function w(x) to compute the weighted loss function Lw(x, x̂):
    [Equation: media_image2.png]

Cui, Page 4 Section 4 Para 4, discloses:  “Formally speaking, given a sample from class i that contains n_i samples in total, we propose to add a weighting factor (1 - β) / (1 - β^{n_i}) to the loss function, with hyperparameter β ∈ [0, 1). The class-balanced (CB) loss can be written as:

    CB(p, y) = (1 / E_{n_y}) L(p, y) = ((1 - β) / (1 - β^{n_y})) L(p, y)

where n_y is the number of samples in the ground-truth class y.”  Here, Cui discloses multiplying a loss function (“L(p, y)”) by the weight function (“1 / E_{n_y}”) to generate a weighted loss function (“CB(p, y)”).
However, although it is common practice for loss functions, Cui does not explicitly disclose taking the mean of the loss function over the number of samples:

    [Equation: media_image3.png]

Khan teaches taking the mean of the loss function over the number of samples (Khan, Page 3576 Section C, discloses: “Our approach addresses the class imbalance problem during the training of CNNs. For this purpose, we introduce a CoSen error function, which can be expressed as the mean loss over the training set
    [Equation: media_image4.png]

where the predicted output (y) of the penultimate layer (before the loss layer) is parameterized by θ (network weights and biases) and ξ (class-sensitive costs), M is the total number of training examples, d ∈ {0, 1}^{1×N} is the desired output (s.t. Σ_n d_n := 1), and N denotes the total number of neurons in the output layer.”  Here, Khan discloses summing over the total number of samples M, and dividing by M.)
Khan and the combination of Cui and Huang are analogous art because they are both in the field of endeavor of machine learning on imbalanced data.
It would have been obvious before the effective filing date of the claimed invention to combine the weighted loss function of the combination of Cui and Huang with the mean cost-sensitive loss of Khan.  One of ordinary skill in the art would be motivated to do so in order to minimize the total loss in such a way that the results of the loss function are not dependent upon the batch size, and thus different sized batches can be evaluated with the loss function (Khan, Page 3576 Section C: “Note that the error is larger when the model performs poorly on the training set. The objective of the learning algorithm is to find the optimal parameters (θ*, ξ*), which give the minimum possible cost E* (4).”).
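
For illustration only, the stated motivation (a mean loss that does not scale with the number of samples) can be sketched as follows. This is a minimal sketch, not the CoSen loss of Khan: the function names and the per-sample loss and weight values are hypothetical.

```python
def weighted_sum_loss(losses, weights):
    """Sum of per-sample losses, each multiplied by its weight."""
    return sum(w * l for w, l in zip(weights, losses))


def weighted_mean_loss(losses, weights):
    """Weighted sum divided by the number of samples M."""
    return weighted_sum_loss(losses, weights) / len(losses)


# Hypothetical per-sample losses and weights for one batch (assumption).
batch_losses = [0.5, 1.5, 1.0]
batch_weights = [1.0, 2.0, 1.0]

# The same batch repeated twice, i.e. a batch of double the size.
doubled_losses = batch_losses * 2
doubled_weights = batch_weights * 2

sum_small = weighted_sum_loss(batch_losses, batch_weights)
sum_large = weighted_sum_loss(doubled_losses, doubled_weights)
mean_small = weighted_mean_loss(batch_losses, batch_weights)
mean_large = weighted_mean_loss(doubled_losses, doubled_weights)
```

Under these assumptions the summed loss doubles when the batch is doubled, while the mean loss is unchanged, so losses from batches of different sizes remain comparable.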

As per Claim 13, Claim 13 is a system claim corresponding to method Claim 6.  Claim 13 is rejected for the same reasons as Claim 6.

Allowable Subject Matter
Prior art has not been applied to Claims 4-5 and Claims 11-12, as no art has been found which teaches or suggests the equations recited in Claims 4 and 11.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126
/VIKER A LAMARDO/Primary Examiner, Art Unit 2126