DETAILED ACTION
This action is responsive to the submission filed on 08/02/2022. Claims 2, 3, 4 and 6 are canceled. Claims 1, 5, 7, 8, 9, 11 and 13 are amended. Claims 1, 5 and 7-14 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/02/2022 has been entered.

Response to Arguments
Applicant's arguments filed 08/02/2022 have been fully considered and are persuasive, but they are moot in view of the updated rejection set forth below.
	
With respect to Claim 1
	Applicant argues that while Wen teaches a CNN which uses a softmax loss and a center loss, Wen fails to teach the Copernican loss as newly amended. Examiner agrees. Examiner points out that the amendments effectively roll up limitations which were previously recited in now-canceled claims 3 and 4. These limitations are taught not by Wen but by Liu et al. “Learning Deep Features via Congenerous Cosine Loss for Person Recognition”. Accordingly, the rejection has been updated to reflect the changes.
	Further, Examiner points out that Applicant did not comment on the suitability of the art, Liu, to teach these limitations.
	Examiner agrees with Applicant’s characterization of Wen in some respects. Wen teaches a center loss that is not equivalent to the Copernican loss but is similar to the planetary loss function. Wen describes a center loss function which augments a softmax function; Wen does not teach the claimed sun loss function. Liu, however, is relied upon for this limitation.

Claim Interpretation 
	Examiner notes that the amended limitations of at least claim 1 are directed to a method for training a deep neural network to learn an increased discrimination by refining the deep neural network through backpropagating according to a particular type of loss function. The loss function is only passively “determined” as part of the backpropagating step; there is no active “determination” of a “loss” to be propagated. Therefore, the recited details of the loss function are not given patentable weight. For the purposes of compact prosecution, the prior art rejections are set forth below.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 7, 11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wen et al. “A Discriminative Feature Learning Approach for Deep Face Recognition”, hereinafter Wen, further in view of Liu et al. “Learning Deep Features via Congenerous Cosine Loss for Person Recognition”, hereinafter Liu.

Regarding claim 1
Wen teaches a method, in a deep neural network, for training the network to learn an increased discrimination of feature vectors, comprising (Abstract “With the joint supervision of softmax loss and center loss, we can train a robust CNNs”) inputting a batch of training samples to the deep neural network; receiving a batch of feature vectors generated by the deep neural network; (pg 5 “instead of updating the centers with respect to the entire training set, we perform the update based on mini-batch. In each iteration, the centers are computed by averaging the features of the corresponding classes” The loss function is described as operating on mini-batches of a training set, which includes inputting such training samples into a neural network. Training a neural network on a mini-batch includes receiving the batch of feature vectors.) and refining the deep neural network by backpropagating a loss representing differences between the batch of feature vectors and ground truth results into the deep neural network, (pg 6 Algorithm 1
[Image: Wen, Algorithm 1]
Algorithm 1 demonstrates that backpropagation error computations are used to refine the parameters of the neural network;
[Image: excerpt from Wen]
further, the loss function incorporates the differences between the feature vectors x_i and the class centers c_{y_i} selected by the ground-truth labels y_i, which correspond to the claimed ground truth results.) the loss determined by: a softmax loss function (pg 3 ¶01 “The CNNs are trained under the joint supervision of the softmax loss and center loss”; Wen describes a joint loss function, one of whose terms is a softmax loss function) and a Copernican loss (Section 3.2 “Intuitively, minimizing the intra-class variations while keeping the features of different classes separable is the key. To this end, we propose the center loss function…
[Image: Wen, Section 3.2, center loss equation]
” The center loss corresponds to the Copernican loss because it enlarges inter-class variation while minimizing intra-class (inner-class) variation.)
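For readers less familiar with Wen's center loss, the following is a minimal sketch, not Wen's code, of the center loss L_C = ½ Σ ||x_i − c_{y_i}||² over a mini-batch and of a mini-batch center update of the form commonly stated for Wen's method, in which each center c_j is moved toward the batch features labelled j. The function names, the dictionary representation of the centers, and the default alpha = 0.5 are illustrative assumptions.

```python
def center_loss(features, labels, centers):
    """Center loss over a mini-batch: 0.5 * sum_i ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((xd - cd) ** 2 for xd, cd in zip(x, centers[y]))
    return 0.5 * total


def update_centers(features, labels, centers, alpha=0.5):
    """One mini-batch center update (assumed form):
    delta_c_j = sum_{i: y_i == j} (c_j - x_i) / (1 + n_j);  c_j <- c_j - alpha * delta_c_j.
    A center with no samples in the batch gets a zero delta and is left unchanged."""
    updated = {}
    for j, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == j]
        delta = [sum(cd - x[d] for x in members) / (1 + len(members))
                 for d, cd in enumerate(c)]
        updated[j] = [cd - alpha * dd for cd, dd in zip(c, delta)]
    return updated


features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = {0: [2.0, 0.0], 1: [0.0, 0.0]}
print(center_loss(features, labels, centers))     # 3.0
print(update_centers(features, labels, centers))  # {0: [2.0, 0.0], 1: [0.0, 0.5]}
```

Class 0's center already sits at the mean of its two samples, so it does not move; class 1's center moves halfway (alpha = 0.5) along its delta toward its single sample.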
Wen does not explicitly teach a planetary loss function that minimizes intra-class variation of the feature vectors by minimizing a cosine distance of the feature vectors to their corresponding planet centers; and a sun loss function that maximizes the inter-class variation of the feature vectors by maximizing a cosine distance of the feature vectors away from a mean of the batch of training samples.
However, Liu, when addressing issues related to a center loss function that employs cosine similarity, teaches a planetary loss function that minimizes intra-class variation of the feature vectors by minimizing a cosine distance of the feature vectors to their corresponding planet centers; (Section 3.2 pg 3 “We first define the cosine similarity of two features from a mini-batch B as:
[Image: Liu, Section 3.2, cosine similarity equation]
” The cosine similarity function measures the similarity between two vectors, and the cosine distance follows directly from it (cosine distance = 1 − cosine similarity). Section 3.2 pg 3 “we have the following output of sample i to maximize:…
[Image: Liu, Section 3.2, equation for the output p_{l_i} of sample i]
… The numerator ensures sample i is close enough to its own class li” The value p_{l_i} is maximized, and thus the numerator, a measure of the cosine similarity between a feature vector and the center of ‘its own class’, is maximized. Maximizing this cosine similarity minimizes the cosine distance between the feature vector and its class center, which corresponds to the claimed planet center. As described in the specification ¶009, the ‘planet center’ describes the center for each class. Therefore, the numerator corresponds to the claimed planetary loss function) and a sun loss function that maximizes the inter-class variation of the feature vectors by maximizing a cosine distance of the feature vectors away from a mean of the batch of training samples (Section 3.2 pg 3 “we define the centroid of class k as the average of features over a minibatch B:
[Image: Liu, Section 3.2, class centroid equation]
” pg 2 ¶02 “To this end, we propose a congenerous cosine loss, namely COCO, to enlarge the inter-class distinction as well as narrow down the inner-class variation” The quantity c_k corresponds to the claimed ‘mean of the batch of training samples’ in a mini-batch B. As shown in the equation above for p_{l_i}, the denominator is a function of the cosine similarity between this quantity, c_k, and the feature vectors. Maximizing p_{l_i} is accomplished in part by minimizing the denominator, i.e., by reducing the cosine similarity (increasing the cosine distance) between the feature vectors and the centroids of the other classes, which increases the inter-class variation.)
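The numerator/denominator reading of Liu discussed above can be made concrete with a short sketch. It assumes, following that discussion, a numerator built from the cosine similarity to the sample's own class centroid and a denominator summing over the centroids of the other classes in the mini-batch. Liu's published COCO formulation also normalizes and scales the features, which is omitted here, so this illustrates the structure rather than Liu's exact equation; all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def class_centroids(features, labels):
    """Centroid c_k of each class: mean of the mini-batch features labelled k."""
    groups = {}
    for f, k in zip(features, labels):
        groups.setdefault(k, []).append(f)
    return {k: [sum(col) / len(fs) for col in zip(*fs)] for k, fs in groups.items()}


def sample_output(f, label, centroids):
    """Ratio to be maximized: exp(similarity to own centroid) divided by the sum of
    exp(similarity to each other class centroid). Requires at least two classes."""
    numerator = math.exp(cosine_similarity(f, centroids[label]))
    denominator = sum(math.exp(cosine_similarity(f, c))
                      for k, c in centroids.items() if k != label)
    return numerator / denominator


features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
labels = [0, 0, 1]
cents = class_centroids(features, labels)
# A sample scores higher against its own label than against a wrong one.
print(sample_output(features[0], 0, cents) > sample_output(features[0], 1, cents))  # True
```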
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate into the disclosed invention of Wen a center loss that measures distance using cosine similarity, as taught by Liu, as an alternative to mean squared error.
One of ordinary skill in the art would have been motivated to make this modification in view of Liu's explanation that “a…loss is that we directly compare and optimize the cosine distance (similarity) between two features… The cosine similarity measures how close two samples are in the feature space,” whereas, regarding the MSE-based loss, “the positives of one class will get as far as possible from the negatives of other classes. It optimizes the inter-class distance in some sense but fails to differentiate among the negatives” (Sections 3.3 and 3.2).
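The motivation quoted above turns on the difference between a cosine-based measure and an MSE-style measure. The following illustrative check (function names are hypothetical) shows one concrete property: scaling a feature vector without changing its direction changes its squared Euclidean error to a center, but leaves its cosine distance to that center unchanged.

```python
import math


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


center = [1.0, 2.0]
for scale in (1.0, 3.0):
    f = [scale * v for v in center]  # same direction as the center, different length
    # abs/round only suppress floating-point noise around zero
    print(scale, squared_error(f, center), round(abs(cosine_distance(f, center)), 12))
# 1.0 0.0 0.0
# 3.0 20.0 0.0
```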

Regarding claim 5
Wen/Liu teaches claim 1.
Further, Wen teaches wherein the Copernican loss includes a loss weight enforcing a trade-off between the softmax loss function and the Copernican loss function. (pg 6 “We adopt the joint supervision of softmax loss and center loss … The formulation is given in Equation 5….
[Image: Wen, Equation 5]
…Clearly, the CNNs supervised by center loss are trainable and can be optimized by standard SGD. A scalar λ is used for balancing the two loss functions.” )
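Wen's joint formulation combines the two terms with the scalar λ. The sketch below illustrates only that weighting; it assumes a per-sample softmax cross-entropy summed over the mini-batch plus λ times the center loss ½ Σ ||x_i − c_{y_i}||², and the function names are hypothetical.

```python
import math


def softmax_loss(logits, label):
    """Cross-entropy of softmax(logits) for the true label (numerically stable)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def joint_loss(batch_logits, labels, features, centers, lam):
    """L = L_S + lam * L_C, with L_C = 0.5 * sum_i ||x_i - c_{y_i}||^2."""
    l_s = sum(softmax_loss(z, y) for z, y in zip(batch_logits, labels))
    l_c = 0.5 * sum(sum((a - b) ** 2 for a, b in zip(x, centers[y]))
                    for x, y in zip(features, labels))
    return l_s + lam * l_c


logits, labels = [[0.0, 0.0]], [0]
features, centers = [[1.0, 0.0]], {0: [0.0, 0.0]}
for lam in (0.0, 2.0):
    print(lam, joint_loss(logits, labels, features, centers, lam))
# lam = 0 leaves only the softmax term (log 2); lam = 2 adds 2 * 0.5 = 1.0
```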


Regarding claim 7
Wen/Liu teaches claim 1
Further, Wen teaches wherein the Copernican loss is minimized using stochastic gradient descent. (pg 6 “We adopt the joint supervision of softmax loss and center loss to train the CNNs for discriminative feature learning. The formulation is given in Equation 5….
[Image: Wen, Equation 5]
Clearly, the CNNs supervised by center loss are trainable and can be optimized by standard SGD” SGD is short for stochastic gradient descent. The loss is made up of a primary and a secondary loss, which are optimized using the SGD algorithm.)
Alternatively, Liu teaches wherein the Copernican loss is minimized using stochastic gradient descent (pg 5 Section 4.1 “We use stochastic gradient descent” in reference to training their model, which uses COCO loss).
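Both references minimize the loss with SGD. The sketch below shows only the bare mechanics (sample a random mini-batch, step against the gradient) on a deliberately tiny, hypothetical problem: learning a single 2-D center c that minimizes the mean of ½||x_i − c||² over the sampled batch. In the references, SGD updates the network parameters rather than a center directly; the learning rate, batch size, and step count here are arbitrary choices.

```python
import random


def sgd_minimize(grad_fn, params, data, lr=0.1, batch_size=2, steps=200, seed=0):
    """Plain SGD: each step samples a mini-batch and moves params against the gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        grads = grad_fn(params, batch)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


def center_grad(c, batch):
    """Gradient of mean_i 0.5 * ||x_i - c||^2 with respect to c."""
    return [sum(c[d] - x[d] for x in batch) / len(batch) for d in range(len(c))]


data = [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0], [3.0, 3.0]]
c = sgd_minimize(center_grad, [0.0, 0.0], data)
print(c)  # close to the data mean [2.0, 2.0], with some SGD noise
```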


Regarding claim 11
Wen/Liu teaches claim 1
Further, Wen teaches wherein each feature vector is adjusted based on a …gradient function. (pg 6 “In Algorithm 1, we summarize the learning details in the CNNs with joint supervision….
[Image: Wen, Algorithm 1]
” In steps 5-7, the parameters are updated based on the loss gradient. Changing the parameters of the network consequently changes the feature vectors.)
Further, Liu teaches a planetary loss gradient function representing a derivative of the planetary loss function with respect to each feature vector. (pg 4 “The derivative of loss L(i) w.r.t. the input feature f(i), written in an element-wise form and dropping sample index i for brevity, is as follow…
[Image: Liu, pg 4, derivative of the loss with respect to the input feature]
” The derivative of the loss is given element-wise with respect to the input feature vector (components f_j). There are two loss gradients, one of which can be considered the planetary loss gradient because it is the gradient of a loss function composed of a planetary loss function, as addressed in the rejection of claim 1.)
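To illustrate what a derivative of a planetary loss with respect to a feature vector involves, the sketch below assumes, for illustration only, a planetary term equal to the cosine distance 1 − cos(f, c) between a feature f and its class center c (held fixed), and checks the closed-form gradient against central finite differences. This assumed form is neither Liu's derivative on pg 4 nor the applicant's equation; all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def planetary_term(f, c):
    """Assumed planetary term: cosine distance between feature f and its center c."""
    return 1.0 - dot(f, c) / (norm(f) * norm(c))


def planetary_grad(f, c):
    """Closed-form d/df of planetary_term: -(c/(|f||c|) - (f.c) f / (|f|^3 |c|))."""
    nf, nc, fc = norm(f), norm(c), dot(f, c)
    return [-(cj / (nf * nc) - fc * fj / (nf ** 3 * nc)) for fj, cj in zip(f, c)]


def numeric_grad(fn, f, c, h=1e-6):
    """Central finite differences, one component at a time."""
    grad = []
    for j in range(len(f)):
        up = f[:j] + [f[j] + h] + f[j + 1:]
        down = f[:j] + [f[j] - h] + f[j + 1:]
        grad.append((fn(up, c) - fn(down, c)) / (2 * h))
    return grad


f, c = [1.0, 2.0], [3.0, -1.0]
print(planetary_grad(f, c))
print(numeric_grad(planetary_term, f, c))  # should agree closely with the line above
```

Because cosine distance ignores vector length, this gradient is always orthogonal to f itself.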


Regarding claim 13
Wen/Liu teaches claim 1
Further, Wen teaches wherein each feature vector is adjusted based on a …gradient function. (pg 6 “In Algorithm 1, we summarize the learning details in the CNNs with joint supervision….
[Image: Wen, Algorithm 1]
” In steps 5-7, the parameters are updated based on the loss gradient. Changing the parameters of the network consequently changes the feature vectors.)
Further, Liu teaches a sun loss gradient function representing a derivative of the sun loss function with respect to each feature vector. (pg 4 “The derivative of loss L(i) w.r.t. the input feature f(i), written in an element-wise form and dropping sample index i for brevity, is as follow…
[Image: Liu, pg 4, derivative of the loss with respect to the input feature]
” The derivative of the loss is given element-wise with respect to the input feature vector (components f_j). There are two loss gradients, one of which can be considered the sun loss gradient because it is the gradient of a loss function composed of a sun loss function, as addressed in the rejection of claim 1.)
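Correspondingly for the sun term, the sketch below assumes, for illustration only, a sun term equal to the negative cosine distance between a feature f and the batch mean μ (so that minimizing it maximizes the cosine distance from μ), with μ treated as a constant when differentiating. Holding μ fixed is an assumption made here to keep the derivative short; the sketch is neither Liu's derivative nor the applicant's equation, and all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def batch_mean(features):
    """Mean of all feature vectors in the batch, irrespective of class."""
    return [sum(col) / len(features) for col in zip(*features)]


def sun_term(f, mu):
    """Assumed sun term: -(1 - cos(f, mu)); lower values mean f points further from mu."""
    return dot(f, mu) / (norm(f) * norm(mu)) - 1.0


def sun_grad(f, mu):
    """Closed-form d/df of sun_term with mu held constant:
    mu/(|f||mu|) - (f.mu) f / (|f|^3 |mu|)."""
    nf, nm, fm = norm(f), norm(mu), dot(f, mu)
    return [mj / (nf * nm) - fm * fj / (nf ** 3 * nm) for fj, mj in zip(f, mu)]


features = [[1.0, 2.0], [3.0, -1.0], [0.0, 1.0]]
mu = batch_mean(features)  # [4/3, 2/3]
f, h = features[0], 1e-6
numeric = [(sun_term([f[0] + h, f[1]], mu) - sun_term([f[0] - h, f[1]], mu)) / (2 * h),
           (sun_term([f[0], f[1] + h], mu) - sun_term([f[0], f[1] - h], mu)) / (2 * h)]
print(sun_grad(f, mu), numeric)  # the two should agree closely
```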



Allowable Subject Matter
Claims 8, 9, 10, 12 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.	
Specifically, none of the references of record, either alone or in combination, fairly discloses or suggests the limitations of claim 9.
Further, as noted in the Claim Interpretation section, the loss function should be positively claimed in order to be given patentable weight.

	The closest prior art of record, Zhang et al. (“Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos”), teaches a secondary loss function which is the sum of two terms, where one term minimizes intra-class variation and the other maximizes inter-class variation. However, Zhang’s loss function is based on the Euclidean distance between feature vectors, not between a feature vector in a batch and a class center.
Further, the claimed equation measures “angular distance” rather than “Euclidean distance”, as described in the Specification ¶0020. Liu et al. (“Learning Deep Features via Congenerous Cosine Loss for Person Recognition”) discloses COCO loss, which uses “angular distance” to minimize intra-class variation while maximizing inter-class variation; however, this formulation maximizes inter-class variation against class-specific centroids instead of the centroid of an entire batch of samples irrespective of class.
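The two distinctions drawn above, a centroid per class versus a single centroid of the whole batch, and angular versus Euclidean distance, can be illustrated with a toy batch. The data and names below are hypothetical and are not taken from any reference or from the application.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


features = [[2.0, 0.1], [2.2, -0.1], [0.1, 1.0], [-0.1, 1.2]]
labels = [0, 0, 1, 1]

# Class-specific centroids: one per label.
class_centroids = {k: mean([f for f, y in zip(features, labels) if y == k])
                   for k in sorted(set(labels))}
# Batch centroid: a single mean of all features, irrespective of class.
batch_centroid = mean(features)

f = features[0]
print("angular, to other-class centroid:", cosine_distance(f, class_centroids[1]))
print("angular, to batch centroid:", cosine_distance(f, batch_centroid))
print("Euclidean, to batch centroid:", math.dist(f, batch_centroid))
```

The class-specific reading compares f against centroid 1 only, while the batch reading compares it against one point that mixes both classes.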
	Furthermore, with respect to claim 8, Cai et al. “Island Loss for Learning Discriminative Features in Facial Expression Recognition” discloses a loss function which is a summation of the softmax loss, the center loss, and the ‘pairwise distances between class centers in the feature space’. Although the composite loss function of Cai achieves a similar goal, namely minimizing intra-class distance while maximizing inter-class variation, the pairwise distances between class centers maximized by Cai’s third loss term are not equivalent to the claimed limitation “maximizing the distance of …feature vectors away from a mean of the batch of training samples”.
It would not have been obvious to one of ordinary skill in the art before the effective filing date to combine these references to teach at least the limitations of claim 8, or, by extension, claim 9.
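The difference noted for Cai can be seen in a short sketch. The pairwise term below is an assumption about the form of Cai's island term (a sum over ordered pairs of distinct class centers of their cosine similarity plus one); it depends only on the class centers. The batch-mean term, an illustrative stand-in for the claimed limitation, depends on every individual feature vector. Names and data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def pairwise_center_term(centers):
    """Assumed island-style term: sum over ordered pairs j != k of (cos(c_j, c_k) + 1).
    Minimizing it pushes class centers apart; individual features do not appear."""
    return sum(cosine_similarity(cj, ck) + 1.0
               for j, cj in enumerate(centers)
               for k, ck in enumerate(centers) if j != k)


def batch_mean_term(features):
    """Illustrative batch-mean term: average cosine distance of each feature from the
    batch mean (undefined if the batch mean is the zero vector)."""
    mu = [sum(col) / len(features) for col in zip(*features)]
    return sum(1.0 - cosine_similarity(f, mu) for f in features) / len(features)


print(pairwise_center_term([[1.0, 0.0], [0.0, 1.0]]))   # orthogonal centers: 2.0
print(pairwise_center_term([[1.0, 0.0], [-1.0, 0.0]]))  # opposite centers: 0.0
```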


Conclusion
Prior art
Wu et al. “Deep Face Recognition with Center Invariant Loss” discusses a loss function that minimizes the sum of three loss functions: a softmax loss, a center invariant loss for inter-class variations, and a center loss for intra-class variations.
Deng et al. “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” incorporates angular and cosine margins to enhance the discriminative power of the softmax loss.
Cai et al. “Island Loss for Learning Discriminative Features in Facial Expression Recognition” discloses island loss, which augments the softmax loss function with the purpose of reducing intra-class variation while enlarging inter-class differences simultaneously.
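As background to the Deng et al. entry, the additive angular margin idea can be sketched as follows: the target-class logit uses cos(θ + m) instead of cos θ, where θ is the angle between the feature and that class's weight vector, and all logits are scaled by s before the usual softmax cross-entropy. The defaults s = 64 and m = 0.5 are assumed here for illustration, and the extra handling that implementations typically add (for example when θ + m exceeds π) is omitted.

```python
import math


def arcface_logits(feature, weights, label, s=64.0, m=0.5):
    """Scaled cosine logits, with an additive angular margin m on the target class."""
    fn = math.hypot(*feature)
    logits = []
    for j, w in enumerate(weights):
        cos_t = sum(a * b for a, b in zip(feature, w)) / (fn * math.hypot(*w))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding
        if j == label:
            logits.append(s * math.cos(math.acos(cos_t) + m))
        else:
            logits.append(s * cos_t)
    return logits


print(arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0))
# target logit becomes 64 * cos(0.5) instead of 64 * cos(0) = 64
```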

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./Examiner, Art Unit 2122
/BRIAN M SMITH/
 Primary Examiner, Art Unit 2122