DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2019-10-23 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Status
Claims 1-14 are pending in the application.
Claim Objections
Claims 1 and 8 are objected to because of the following informalities:  Applicant recites loss functions L(x, x̂) and Lw(x, x̂), but does not provide definitions for x and x̂.  In order to prevent any potential confusion, Applicant should define x and x̂ in the claim language.  Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Cui et al. (“Class-Balanced Loss Based on Effective Number of Samples”; hereinafter “Cui”) in view of Guizilini et al. (“Continuous Convolutional Neural Networks for Image Classification”; hereinafter “Guizilini”).
As per Claim 1, Cui teaches A method for training a machine learning model comprising:  
receiving, by a computer system comprising a processor and memory, a training data set comprising imbalanced data (Cui, Page 7 Top Right Paragraph, discloses a computer system comprising a processor, which must also have a memory:  “Models on CIFAR were trained with batch size of 128 on a single NVIDIA Titan X GPU for 200 epochs.”  Cui, Page 4 Section 4, discloses a training data set comprising imbalanced data:  “The Class-Balanced Loss is designed to address the problem of training from imbalanced data by introducing a weighting factor that is inversely proportional to the effective number of samples.”)
computing, by the computer system, a label density fx(x) of the training data set (Cui, Page 4 Section 4 Para 2, discloses:  “For an input sample x with label y ∈ {1, 2, …, C}, where C is the total number of classes, suppose the model’s estimated class probabilities are p = [p_1, p_2, …, p_C]^T.”  Here, Cui discloses a label density of the training data set, as they disclose a probability of each class label occurring, which is a discrete probability density function, and thus a label density.)
computing, by the computer system, a weight function w(x) comprising a term that is inversely proportional to the label density (Cui, Page 4 Section 4 Para 3, discloses:  “To balance the loss, we introduce a weighting factor α_i that is inversely proportional to the effective number of samples for class i: α_i ∝ 1 / E_{n_i}. To make the total loss roughly in the same scale when applying α_i, we normalize α_i so that Σ_{i=1}^{C} α_i = C. For simplicity, we abuse the notation of 1 / E_{n_i} to denote the normalized weighting factor in the rest of our paper.”  Here, Cui discloses a weight function (“1 / E_{n_i} to denote the normalized weighting factor”) that is inversely proportional to the label density (“inversely proportional to the effective number of samples for class”)).
weighting, by the computer system, a loss function L(x, x̂) in accordance with the weight function to generate a weighted loss function Lw(x, x̂) (Cui, Page 4 Section 4 Para 4, discloses:  “Formally speaking, given a sample from class i that contains n_i samples in total, we propose to add a weighting factor (1 - β) / (1 - β^{n_i}) to the loss function, with hyperparameter β ∈ [0, 1). The class-balanced (CB) loss can be written as:

    CB(p, y) = (1 / E_{n_y}) L(p, y) = ((1 - β) / (1 - β^{n_y})) L(p, y)

where n_y is the number of samples in the ground-truth class y.”  Here, Cui discloses a loss function (“L(p, y)”) that is weighted by the weight function (“1 / E_{n_y}”) to generate a weighted loss function (“CB(p, y)”)).
training, by the computer system, a continuous machine learning model in accordance with the training data set and the weighted loss function Lw(x, x̂) to compute a trained [continuous] machine learning model (Cui, Page 5 Section 5, discloses:  “The proposed class-balanced losses are evaluated on artificially created long-tailed CIFAR [24] datasets with controllable degrees of data imbalance and real-world long-tailed datasets iNaturalist 2017 [40] and 2018 [1]. To demonstrate our loss is generic for visual recognition, we also present experiments on ImageNet data (ILSVRC 2012 [33]). We use deep residual networks (ResNet) [16] with various depths and train all networks from scratch.”  Here, Cui discloses applying their technique to image recognition, which comprises classifying images into a discrete set of labels.  One of ordinary skill in the art will appreciate that the domain of the training data is continuous, as images comprise real-numbered pixel values.  Thus, Cui discloses training (“train all networks”) with the training data set (“CIFAR”, “ImageNet”) and the weighted loss function (“class-balanced losses”) to result in a trained continuous machine learning model.)
and outputting, by the computer system, the trained [continuous] machine learning model (Cui, Page 6 last line of Section 5.2, discloses:  “Code, data and pre-trained models are available at: https://github.com/richardaecn/class-balanced-loss.”  Here, Cui discloses outputting the trained model, as they have published the “pre-trained models”.)
However, Cui does not explicitly teach a continuous machine learning model.
Guizilini explicitly teaches a continuous machine learning model (Guizilini, Page 6 Section 3.3, discloses:  “A diagram depicting how the proposed convolutional Hilbert layer can be applied to an image classification task, to create a Continuous Convolutional Neural Network (CCNN), is shown in Figure 3. The original image, composed of discrete data Dn = {x, yn}, with x ∈ R^2 being the pixel coordinates and yn = [0, 1]^{C_0} their corresponding intensity values (C_0 is the number of input channels, i.e. 1 for grayscale and 3 for color images), is first modeled as a continuous function via projection to a RKHS.”  Here, Guizilini discloses a “Continuous Convolutional Neural Network”).
Cui and Guizilini are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the convolutional neural network for image classification with weighted loss function of Cui, with the continuous convolutional neural network for image classification of Guizilini.  One of ordinary skill in the art would be motivated to do so in order to save resources by requiring smaller network sizes (Guizilini, Pg 10 Conclusion:  “Experimental tests using standard benchmark datasets show that this proposed architecture is able to achieve competitive results with much smaller network sizes, by focusing instead on more descriptive individual filters that are used to extract more complex patterns.”)
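
For illustration only, the weighting relied upon from Cui can be sketched as follows. This is a minimal sketch and not Cui's published code: the function names, the class counts, the value of β, and the use of cross-entropy as the base loss L(p, y) are assumptions chosen for the example.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta):
    """Weights proportional to 1 / E_n = (1 - beta) / (1 - beta**n),
    normalized so that they sum to the number of classes C."""
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)  # E_n per class
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

def class_balanced_cross_entropy(probs, label, weights):
    """CB(p, y) = weights[y] * L(p, y), with L assumed to be cross-entropy."""
    return -weights[label] * np.log(probs[label])

# Hypothetical long-tailed class counts and model output (assumed values).
counts = [5000, 500, 50]
w = class_balanced_weights(counts, beta=0.999)
p = np.array([0.7, 0.2, 0.1])
loss_head = class_balanced_cross_entropy(p, 0, w)  # sample from the largest class
loss_tail = class_balanced_cross_entropy(p, 2, w)  # sample from the smallest class
```

In this sketch the rarest class receives the largest weight, which is the behavior the rejection maps to a weight function inversely related to how often a label occurs.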

As per Claim 2, the combination of Cui and Guizilini teaches the method of claim 1.  Cui teaches wherein the label density fx(x) is a probability density function of the training data set (Cui, Page 4 Section 4 Para 2, discloses:  “For an input sample x with label y ∈ {1, 2, …, C}, where C is the total number of classes, suppose the model’s estimated class probabilities are p = [p_1, p_2, …, p_C]^T.”  Here, Cui discloses a label density of the training data set, as they disclose a probability of each class label occurring, which is a discrete probability density function.)
	
As per Claim 3, the combination of Cui and Guizilini teaches the method of claim 1.  Cui teaches wherein the weight function w(x) is computed in accordance with a weighting parameter Δ reflecting a ratio between a maximum weight and minimum weight of the weighting function (Cui, Page 6 right column under Eq 6, discloses:  “We visualize class-balanced loss in Figure 3 as a function of n_y for different β. Note that β = 0 corresponds to no re-weighting and β → 1 corresponds to re-weighing by inverse class frequency. The proposed novel concept of effective number of samples enables us to use a hyperparameter to smoothly adjust the class-balanced term between no re-weighting and re-weighing by inverse class frequency.”  Here, Cui discloses that the weight function is computed (“class-balanced term”) in accordance with a weighting parameter (“hyperparameter” β) that reflects a ratio (β is between 0 and 1, and is thus a ratio) between a maximum weight and minimum weight of the weighting function (“no re-weighting and re-weighing by inverse class frequency”)).
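
As a hedged numerical illustration of the quoted passage, the sketch below evaluates the factor (1 - β) / (1 - β^n) for several values of β and records the ratio between the largest and smallest resulting weight. The class counts and the β values are assumed for the example and do not come from Cui or from the application.

```python
import numpy as np

def cb_weight(samples_per_class, beta):
    """Unnormalized class-balanced factor (1 - beta) / (1 - beta**n) per class."""
    n = np.asarray(samples_per_class, dtype=float)
    return (1.0 - beta) / (1.0 - np.power(beta, n))

counts = [5000, 500, 50]  # hypothetical class sizes
spread = {}               # beta -> (maximum weight / minimum weight)
for beta in (0.0, 0.9, 0.999, 0.999999):
    w = cb_weight(counts, beta)
    spread[beta] = w.max() / w.min()

# Limit of the spread as beta approaches 1 (inverse class frequency).
inverse_frequency_spread = max(counts) / min(counts)
```

With β = 0 every class receives the same weight, so the spread is 1. As β approaches 1 the spread approaches the ratio of the largest to the smallest class size, consistent with the quoted statement that β adjusts between no re-weighting and re-weighting by inverse class frequency.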

As per Claim 7, the combination of Cui and Guizilini teaches the method of claim 1.  Cui teaches wherein training the continuous machine learning model comprises iteratively updating a plurality of parameters of the continuous machine learning model in accordance with gradient descent to minimize the weighted loss function Lw(x, x̂) with respect to the training data set (Cui, Page 6 bottom left column, discloses training using gradient descent:  “We used Tensorflow [2] to implement and train all the models by stochastic gradient descent with momentum.”)
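
The iterative update recited in Claim 7 can be illustrated with the following sketch, which minimizes a weighted loss by gradient descent with momentum. It is a simplification under stated assumptions: a logistic-regression model stands in for the networks of Cui, full-batch updates are used instead of stochastic mini-batches, and the synthetic data, learning rate, and momentum value are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced binary data: 950 negatives and 50 positives.
x = np.concatenate([rng.normal(-1.0, 1.0, 950), rng.normal(1.0, 1.0, 50)])
y = np.concatenate([np.zeros(950), np.ones(50)])
X = np.stack([x, np.ones_like(x)], axis=1)  # one feature plus a bias column

# Class weights inversely proportional to class size, normalized to sum to 2.
counts = np.array([950.0, 50.0])
class_w = (1.0 / counts) * 2.0 / (1.0 / counts).sum()
sample_w = class_w[y.astype(int)]

def weighted_loss(theta):
    """Mean of per-sample weight times binary cross-entropy."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    ce = -(y * np.log(p + 1e-12) + (1.0 - y) * np.log(1.0 - p + 1e-12))
    return np.mean(sample_w * ce)

def weighted_grad(theta):
    """Gradient of the weighted loss with respect to theta."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (sample_w * (p - y)) / len(y)

theta = np.zeros(2)
velocity = np.zeros(2)
lr, momentum = 0.5, 0.9  # assumed hyperparameters
initial_loss = weighted_loss(theta)
for _ in range(200):
    # Momentum update: accumulate a velocity, then move the parameters.
    velocity = momentum * velocity - lr * weighted_grad(theta)
    theta = theta + velocity
final_loss = weighted_loss(theta)
```

Each pass through the loop updates both parameters in the direction that reduces the weighted loss, so the final loss is lower than the initial loss on this data.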

As per Claim 8, Claim 8 is a system claim corresponding to method Claim 1.  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 9, Claim 9 is a system claim corresponding to method Claim 2.  Claim 9 is rejected for the same reasons as Claim 2.

As per Claim 10, Claim 10 is a system claim corresponding to method Claim 3.  Claim 10 is rejected for the same reasons as Claim 3.

As per Claim 14, Claim 14 is a system claim corresponding to method Claim 7.  Claim 14 is rejected for the same reasons as Claim 7.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cui and Guizilini further in view of Khan et al. (“Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data”; hereinafter “Khan”).
As per Claim 6, the combination of Cui and Guizilini teaches the method of claim 1.  Cui teaches wherein weighting the loss function L(x, x̂) comprises multiplying the loss function L(x, x̂) by the weight function w(x) to compute the weighted loss function Lw(x, x̂):

    [Equation image (media_image2.png): claimed expression for the weighted loss function Lw(x, x̂)]

Cui, Page 4 Section 4 Para 4, discloses:  “Formally speaking, given a sample from class i that contains n_i samples in total, we propose to add a weighting factor (1 - β) / (1 - β^{n_i}) to the loss function, with hyperparameter β ∈ [0, 1). The class-balanced (CB) loss can be written as:

    CB(p, y) = (1 / E_{n_y}) L(p, y) = ((1 - β) / (1 - β^{n_y})) L(p, y)

where n_y is the number of samples in the ground-truth class y.”  Here, Cui discloses multiplying a loss function (“L(p, y)”) by the weight function (“1 / E_{n_y}”) to generate a weighted loss function (“CB(p, y)”).
However, although it is common practice for loss functions, Cui does not explicitly disclose taking the mean of the loss function over the number of samples:

    [Equation image (media_image3.png)]


Khan teaches taking the mean of the loss function over the number of samples (Khan, Page 3576 Section C, discloses: “Our approach addresses the class imbalance problem during the training of CNNs. For this purpose, we introduce a CoSen error function, which can be expressed as the mean loss over the training set

    [Equation image (media_image4.png): Khan's CoSen error function, expressed as the mean loss over the M training examples]

where the predicted output (y) of the penultimate layer (before the loss layer) is parameterized by θ (network weights and biases) and ξ (class-sensitive costs), M is the total number of training examples, d ∈ {0, 1}^{1×N} is the desired output (s.t. Σ_n d_n := 1), and N denotes the total number of neurons in the output layer.”  Here, Khan discloses summing over the total number of samples M, and dividing by M.)
Khan and the combination of Cui and Guizilini are analogous art because they are both in the field of endeavor of machine learning on imbalanced data.
It would have been obvious before the effective filing date of the claimed invention to combine the weighted loss function of the combination of Cui and Guizilini with the mean cost-sensitive loss of Khan.  One of ordinary skill in the art would be motivated to do so in order to minimize the total loss in such a way that the results of the loss function are not dependent upon the batch size, and thus different sized batches can be evaluated with the loss function (Khan, Page 3576 Section C: “Note that the error is larger when the model performs poorly on the training set. The objective of the learning algorithm is to find the optimal parameters (θ*, ξ*), which give the minimum possible cost E* (4).”)
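
The batch-size rationale stated above can be illustrated with a short sketch. It assumes a generic mean of per-sample weighted losses, (1/N) Σ w_i L_i, which is not asserted to be the exact equation of Claim 6 or of Khan; the function name and the numeric values are hypothetical.

```python
import numpy as np

def mean_weighted_loss(weights, losses):
    """Assumed generic form: sum of weight times loss, divided by sample count."""
    w = np.asarray(weights, dtype=float)
    per_sample = np.asarray(losses, dtype=float)
    return np.sum(w * per_sample) / len(per_sample)

# Hypothetical per-sample losses and weights for a batch of three samples.
losses = [0.2, 1.5, 0.7]
weights = [0.5, 2.0, 1.0]

# The same samples repeated four times form a batch of twelve samples.
small_batch_mean = mean_weighted_loss(weights, losses)
large_batch_mean = mean_weighted_loss(weights * 4, losses * 4)

# Without the division by N, the total grows with the batch size.
small_batch_sum = np.sum(np.multiply(weights, losses))
large_batch_sum = np.sum(np.multiply(weights * 4, losses * 4))
```

The mean is the same for both batches, whereas the unnormalized sum is four times larger for the larger batch, which is the sense in which a mean loss does not depend on batch size.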

As per Claim 13, Claim 13 is a system claim corresponding to method Claim 6.  Claim 13 is rejected for the same reasons as Claim 6.

Prior Art Not Applied
Prior art has not been applied to Claims 4-5 and Claims 11-12, as no art has been found which teaches or suggests the recited equations in Claims 4 and 11.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liu (CN 110598837 A) discloses, in the Abstract and Summary, adding weighting to the loss function in order to improve the accuracy of the classification of rare classes.
Novikov et al. (EP 3355270 A1) discloses a weighted loss function to account for underrepresented anatomical structures in medical imaging.
Tao et al. (“Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification”) discloses, on Page 39, an SVM-based cost-sensitive ensemble framework in which minority classes are given greater weight for the cost function.
Tomar et al. (“An effective weighted multi-class least squares twin support vector machine for imbalanced data classification”) discloses, on Page 769 Section 6, a “Weighted Multi-class Least Squares Twin Support Vector Machine” in which “The proposed approach selects and assigns a weight to the classes according to their size”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/L.A.S./Examiner, Art Unit 2126                                  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126