DETAILED ACTION
This action is responsive to the Application filed on 04/14/2022. Claims 1-14 are pending in the case. Claim 1 is the independent claim. Claims 4, 10, 12 and 14 are amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 04/14/2022 have been fully considered but they are not persuasive. 
	With respect to the rejection under 35 U.S.C. 102
Applicant argues that Wen's joint loss, i.e., the softmax loss (primary loss) combined with the center loss (secondary loss), is not the same as the claimed loss. Applicant argues: “In Wen, the Softmax loss is the primary loss function and the secondary loss function comprises a center loss function. No other loss functions are disclosed. As such, the overall loss function is a combination of the Softmax loss (the primary loss function) and the center loss (the secondary loss function…In Wen, the center loss (only) explicitly minimizes the intra-class variation. As a small by-product, it does tend to increase inter-class discrimination, but that is not by design. Planet loss or Copernican loss, does two things better than center loss; first it minimizes intra-class variation by using the cosine angular space and, second, it maximizes inter-class variation by using the sun loss (the third loss term)”
In response, examiner notes that the joint loss function in Wen explicitly not only minimizes intra-class variation but also enlarges inter-class feature differences, as demonstrated in the rejection (Wen pg 3 ¶01). Whether or not the function was expressly "designed" for a particular purpose has no bearing on whether the feature is taught by Wen; in fact, Wen explicitly identifies the disputed feature as a benefit.
Further, applicant argues: “Claim 1 requires a primary loss function which may be, for example, a Softmax function (see claim 6) and, in addition, a secondary loss function comprising both a planetary loss function and a sun loss function, for a total of three loss functions…”
In response, examiner notes that claim 1 does not require three loss functions, only a primary loss and a secondary loss; claim 1 presently merely states, at a high level of abstraction, that the secondary loss is composed of a planetary loss and a sun loss. The center loss function of Wen achieves the functions set forth for both the planetary loss and the sun loss, and thus is equivalent to the claimed secondary loss.
With respect to the rejection under 35 U.S.C. 103
Applicant argues "Liu, like Wen, teaches only minimizing the intra-class variations…First, with respect to claim 4, the Applicant would like to point out that neither Liu nor Wen teaches providing an explicit function to maximize the inter-class variations…In the present application, a separate loss function is provided for explicitly maximizing the inter-class variations, namely, the sun loss portion of the secondary loss function."
Examiner disagrees. Liu explicitly states that the loss function "enlarge[s] the centroid distance of samples across classes"; this is equivalent to maximizing inter-class variation. Further, while the application may broadly discuss that the "planetary loss portion" and "sun loss portion" are "separate loss functions," the present claims do not require this. Furthermore, Liu clearly discusses two distinct portions of the loss function, which may be considered separate "functions" that achieve the same goals as the "sun" loss and the "planetary" loss; in Liu, these are the numerator and the denominator of the loss function.

Applicant's remaining arguments, in light of the amendments filed 02/01/2018, have been fully considered; accordingly, the rejection under 35 U.S.C. 101 has been withdrawn.

Claim Interpretation 
	Examiner notes that the amended limitations of at least claim 1 are directed to a method for training a deep neural network to learn an increased discrimination by refining the deep neural network by backpropagating a loss according to a particular type of loss function. The loss function is only passively “determined” as part of the backpropagating step; there is no active step of “determining” the loss to be propagated. Therefore, the limitations reciting how the loss is determined are not given patentable weight. For the purposes of compact prosecution, the 102 and 103 rejections are set forth below.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 5-7, 11, and 13 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wen et al. “A Discriminative Feature Learning Approach for Deep Face Recognition” hereinafter Wen.

Regarding claim 1
Wen teaches, A method, in a deep neural network, for training the network to learn an increased discrimination of feature vectors comprising (Abstract “With the joint supervision of softmax loss and center loss, we can train a robust CNNs”) inputting a batch of training samples to the deep neural network; receiving a batch of feature vectors generated by the deep neural network; (pg 5 “instead of updating the centers with respect to the entire training set, we perform the update based on mini-batch. In each iteration, the centers are computed by averaging the features of the corresponding classes” the loss function is described as operating on mini-batches of a training set, which includes inputting such training samples into a neural network. Training a neural network according to a mini-batch includes receiving the batch of feature vectors.) and refining the deep neural network by backpropagating a loss representing differences between the batch of feature vectors and ground truth results into the deep neural network, ( pg 6 Algorithm 1 
[Image: Wen, pg 6, Algorithm 1]
Algorithm 1 clearly demonstrates that backpropagation error computations are used to refine the parameters of the neural network,
[Image: Wen, loss function equation]
further, the loss function incorporates the differences between the feature vectors xi and the class centers Cyi, which are indexed by the ground truth labels yi.) the loss determined by:  b. augmenting the primary loss function with a secondary loss function (pg 3 ¶01 “The CNNs are trained under the joint supervision of the softmax loss and center loss”; the softmax loss corresponds to the primary loss function provided by the CNN, and the joint loss function comprises a primary loss function, the softmax loss, and a secondary loss function, the center loss) the secondary loss function comprising a planetary loss portion that minimizes intra-class variation of the feature vectors and a sun loss portion that maximizes the inter-class variation of the feature vectors; (pg 3 ¶01 “With the joint supervision, not only the inter-class features differences are enlarged, but also the intra-class features variations [of the feature vectors] are reduced” pg 5 Section 3.2 “minimizing the intra-class variations while keeping the features of different classes separable is the key. To this end, we propose the center loss function” The joint supervision refers to the combined softmax loss and center loss functions. The center loss function, which corresponds to and includes both the planetary loss and the sun loss, minimizes intra-class variations and keeps the features ‘separable’, i.e., maximizes the inter-class variations.) c. minimizing the augmented loss function for each feature vector in the batch; and (pg 5 Section 3.2 ¶03 “instead of updating the centers with respect to the entire training set, we perform the update based on mini-batch.” The loss is computed for each mini-batch and, as previously stated, the loss function is minimized via the training.)
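For illustration only, the center loss relied on in the mapping above can be sketched as follows. This is a minimal sketch assuming Wen's center loss takes the form L_C = ½ Σ_i ||x_i − c_{y_i}||² over a mini-batch, where x_i is a feature vector and c_{y_i} is the center of its ground-truth class y_i; the function name and the plain-list data layout are hypothetical and are not taken from Wen.

```python
def center_loss(features, labels, centers):
    """Center loss over a mini-batch: 0.5 * sum_i ||x_i - c_{y_i}||^2.

    features: list of feature vectors (lists of floats)
    labels:   ground-truth class label for each feature vector
    centers:  dict mapping each class label to its center vector
    (names and layout are illustrative assumptions, not Wen's code)
    """
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]  # center of the sample's ground-truth class
        total += sum((xk - ck) ** 2 for xk, ck in zip(x, c))
    return 0.5 * total


# Two samples, each one unit away from its class center.
print(center_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], {0: [0.0, 0.0], 1: [0.0, 0.0]}))  # prints 1.0
```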

Regarding claim 5
Wen teaches claim 1
Further Wen teaches, wherein the secondary loss function includes a loss weight enforcing a trade-off between the primary loss function and the secondary loss function. ( pg 6 “We adopt the joint supervision of softmax loss and center loss … The formulation is given in Equation 5…. 
[Wen, Equation 5: L = L_S + λL_C]
…Clearly, the CNNs supervised by center loss are trainable and can be optimized by standard SGD. A scalar λ is used for balancing the two loss functions.” )

Regarding claim 6
Wen teaches claim 1
Further Wen teaches, wherein the primary loss function is Softmax. ( pg 6 “We adopt the joint supervision of softmax loss and center loss … The formulation is given in Equation 5…. 
[Wen, Equation 5: L = L_S + λL_C]
…Clearly, the CNNs supervised by center loss are trainable and can be optimized by standard SGD. A scalar λ is used for balancing the two loss functions. The conventional softmax loss can be considered as a special case of this joint supervision, if λ is set to 0.” The primary loss Ls is the softmax loss.)
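The joint supervision of Equation 5, and the λ = 0 special case noted by Wen, can be illustrated with a short sketch. It assumes the softmax loss L_S is the summed cross-entropy of precomputed class scores (logits), which stand in for Wen's W^T x + b terms, and reuses the center-loss form assumed above; all function names are hypothetical.

```python
import math


def softmax_loss(logits, labels):
    """Primary loss L_S: summed softmax cross-entropy over the mini-batch."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # shift by the max score for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_norm - z[y]  # equals -log(softmax(z)[y])
    return total


def center_loss(features, labels, centers):
    """Secondary loss L_C: 0.5 * sum_i ||x_i - c_{y_i}||^2."""
    return 0.5 * sum(
        sum((xk - ck) ** 2 for xk, ck in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )


def joint_loss(logits, features, labels, centers, lam):
    """L = L_S + lam * L_C; lam balances the two terms, and lam = 0 leaves plain softmax."""
    return softmax_loss(logits, labels) + lam * center_loss(features, labels, centers)


logits, feats, labels, centers = [[0.0, 0.0]], [[1.0, 0.0]], [0], {0: [0.0, 0.0]}
print(joint_loss(logits, feats, labels, centers, lam=0.0))  # log(2), the softmax loss alone
print(joint_loss(logits, feats, labels, centers, lam=0.5))  # log(2) + 0.5 * 0.5
```

Setting lam to 0 removes the secondary term entirely, which is the sense in which the conventional softmax loss is a special case of the joint supervision.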

Regarding claim 7
Wen teaches claim 1
Further Wen teaches, wherein the secondary loss function is minimized using stochastic gradient descent. ( pg 6 “We adopt the joint supervision of softmax loss and center loss to train the CNNs for discriminative feature learning. The formulation is given in Equation 5…. 
[Wen, Equation 5: L = L_S + λL_C]
 Clearly, the CNNs supervised by center loss are trainable and can be optimized by standard SGD” SGD is short for stochastic gradient descent. The loss is made up of a primary and secondary loss which are optimized using the SGD algorithm.)


Regarding claim 11
Wen teaches claim 1
Further Wen teaches, wherein each feature vector is adjusted based on a planetary loss gradient function representing a derivative of the planetary loss portion of the secondary loss function with respect to each feature vector. (pg 6 “In Algorithm 1, we summarize the learning details in the CNNs with joint supervision….
[Image: Wen, pg 6, Algorithm 1]
” In steps 5-7, the parameters are updated based on the loss gradient. The total loss gradient includes the secondary loss function, as previously described for claim 1. Thus, each feature vector is adjusted based on the derivative of the planetary loss, which is used to update the parameters.)
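Because Algorithm 1 appears in this action only as an image, the following sketch illustrates, under stated assumptions, the two quantities the mapping relies on: the per-sample gradient of the center loss, ∂L_C/∂x_i = x_i − c_{y_i}, and the mini-batch center update c_j ← c_j − α·Δc_j with Δc_j = Σ_{i: y_i = j}(c_j − x_i) / (1 + n_j), where n_j is the number of batch samples in class j. These forms are assumed from Wen Section 3.2; the function names and the value of α are illustrative.

```python
def center_loss_grad(features, labels, centers):
    """Gradient of L_C with respect to each feature vector: x_i - c_{y_i}."""
    return [[xk - ck for xk, ck in zip(x, centers[y])] for x, y in zip(features, labels)]


def update_centers(features, labels, centers, alpha):
    """Mini-batch center update: c_j <- c_j - alpha * delta_c_j, where
    delta_c_j = sum over batch samples of class j of (c_j - x_i), divided by (1 + n_j).
    A class with no samples in the batch gets delta_c_j = 0 and is left unchanged."""
    new_centers = {}
    for j, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == j]
        delta = [sum(ck - x[k] for x in members) / (1 + len(members)) for k, ck in enumerate(c)]
        new_centers[j] = [ck - alpha * dk for ck, dk in zip(c, delta)]
    return new_centers


feats, labels = [[2.0, 0.0], [4.0, 0.0]], [0, 0]
centers = {0: [0.0, 0.0], 1: [1.0, 1.0]}
print(center_loss_grad(feats, labels, centers))           # [[2.0, 0.0], [4.0, 0.0]]
print(update_centers(feats, labels, centers, alpha=1.0))  # {0: [2.0, 0.0], 1: [1.0, 1.0]}
```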

Regarding claim 13
Wen teaches claim 1
Further Wen teaches, wherein each feature vector is adjusted based on a planet loss gradient function representing a derivative of the planet loss portion of the secondary loss function with respect to each feature vector. (pg 6 “In Algorithm 1, we summarize the learning details in the CNNs with joint supervision…
[Image: Wen, pg 6, Algorithm 1]
” In steps 5-7, the parameters are updated based on the loss gradient. The total loss gradient includes the secondary loss function, as previously described for claim 1. Thus, each feature vector is adjusted based on the derivative of the sun loss, which is used to update the parameters.)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wen, further in view of Kang et al. “Learning Deep Semantic Embeddings for Cross-Modal Retrieval” hereinafter Kang.

Regarding claim 2
Wen teaches claim 1
Wen does not explicitly teach, wherein the secondary loss function maintains a center for each class of feature vectors and computes a mean of the batch of feature vectors.
However, Kang, when addressing issues related to maintaining an average embedding for each class in a machine learning loss function, teaches, wherein the secondary loss function maintains a center for each class of feature vectors and computes a mean of the batch of feature vectors. (Section 3.3 “the center loss is also exploited in this paper for cross-modal matching…
[Image: Kang, Section 3.3, center loss equation]
” where cyi ∈ Rd is the class center of the embedding xi. Differently, unlike other weight parameters to be learned by backpropagation, the updating of the class centers cj, j = 1, 2, . . . , C are additionally performed as follows” The secondary loss maintains the center of each class by updating the class centers for the batch. The summation in the cited Equation 2 corresponds to the mean of the batch of feature vectors, because each element of the feature vector is summed over the set in the batch and then multiplied by the reciprocal of the size of the batch (1/M).)
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a moving average center for each class embedding, as taught by Kang, into the disclosed invention of Wen.
One of ordinary skill in the art would have been motivated to make this modification so that “For class label guided cross-modal matching, such as matching between images and texts, it is even more important to reduce the intra-class variance, so that different modalities of the same class will have small distances to enable direct matching” (Kang Section 3.3)
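The reading of Kang applied above, that the summation scaled by 1/M yields a mean over the batch, can be illustrated with the sketch below. It assumes plain Python lists for the feature vectors and shows both the batch mean (1/M)·Σ_i x_i and a per-class average of the features in the batch; it does not reproduce Kang's equation, which appears in this action only as an image, and the function names are hypothetical.

```python
def batch_mean(features):
    """Mean of a batch of M feature vectors: (1/M) * sum_i x_i, computed per element."""
    m = len(features)
    return [sum(column) / m for column in zip(*features)]


def class_means(features, labels):
    """Average of the feature vectors of each class present in the batch."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    return {y: batch_mean(xs) for y, xs in groups.items()}


feats, labels = [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]], [0, 0, 1]
print(batch_mean(feats))           # [2.0, 2.0]
print(class_means(feats, labels))  # {0: [1.0, 1.0], 1: [4.0, 4.0]}
```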


Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Wen/Kang, further in view of Liu et al. “Learning Deep Features via Congenerous Cosine Loss for Person Recognition” hereinafter Liu.

Regarding claim 3
Wen/Kang teaches claim 2
Wen/Kang does not explicitly teach, wherein the planetary loss portion of the secondary loss function minimizes intra-class variation by minimizing the cosine distance of the samples to their corresponding class center.
However, Liu, when addressing issues related to a center loss function that employs cosine similarity, teaches, wherein the planetary loss portion of the secondary loss function minimizes intra-class variation by minimizing the cosine distance of the feature vectors to their corresponding class center. (pg 3 Section 3.2 “The intuition behind designing a COCO loss is that we directly compare and optimize the cosine distance (similarity) between two features…. We first define the cosine similarity of two features from a mini-batch B as:  … A natural intuition to a desirable loss is to increase the similarity of samples within a category and enlarge the centroid distance of samples across classes…. Incorporating the spirit of Eqn. 3 with class centroid, we have the following output of sample i to maximize:  ” “The numerator ensures sample i is close enough to its own class li” Examiner notes that the cosine similarity (and hence the cosine distance) is measured by the function C(x, y). Because the output of sample i that is maximized is a fraction, maximizing it requires the numerator to be large relative to the denominator. The numerator measures the cosine similarity between sample i and Cli, the centroid of the class that matches the class of the feature vector; maximizing this similarity minimizes the cosine distance of the feature vector to its class center, and thus minimizes intra-class variation.)
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a center loss that measures distance using cosine similarity, as an alternative to mean squared error, as taught by Liu, into the disclosed invention of Wen/Kang.
One of ordinary skill in the art would have been motivated to make this modification so that one could implement a loss in which “we directly compare and optimize the cosine distance (similarity) between two features… The cosine similarity measures how close two samples are in the feature space.” In contrast, regarding the MSE-based loss, “the positives of one class will get as far as possible from the negatives of other classes. It optimizes the inter-class distance in some sense but fails to differentiate among the negatives” (Liu Section 3.3 and Section 3.2)
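The role of the numerator and denominator discussed above can be seen in a small sketch. Liu's equations are elided in the quotation, so this assumes a COCO-style output in which the cosine similarity C(f, c) of a feature to each class centroid is exponentiated and normalized over all classes, with an optional scale factor; it illustrates that general form only, not Liu's exact formulation, and the names are hypothetical. Zero-length vectors are not handled.

```python
import math


def cosine_similarity(a, b):
    """C(a, b): cosine of the angle between a and b (assumes neither is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def coco_style_output(feature, label, centroids, scale=1.0):
    """exp(s*C(f, c_label)) / sum_k exp(s*C(f, c_k)).

    The numerator grows as f moves toward its own class centroid; the denominator
    shrinks as f moves away from the other centroids, so maximizing the ratio does both.
    """
    scores = {k: math.exp(scale * cosine_similarity(feature, c)) for k, c in centroids.items()}
    return scores[label] / sum(scores.values())


centroids = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(coco_style_output([1.0, 0.0], 0, centroids))  # e / (e + 1): aligned with its own centroid
print(coco_style_output([1.0, 1.0], 0, centroids))  # 0.5: equally similar to both centroids
```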

Regarding claim 4
Wen/Kang/Liu teaches claim 3
Further, Liu teaches, wherein the sun loss portion of the secondary loss function maximizes the inter-class variation of the feature vectors by maximizing the cosine distance of each feature vector away from the mean of the batch. (pg 3 Section 3.2 “The intuition behind designing a COCO loss is that we directly compare and optimize the cosine distance (similarity) between two features…. We first define the cosine similarity of two features from a mini-batch B as: … A natural intuition to a desirable loss is to increase the similarity of samples within a category and enlarge the centroid distance of samples across classes…. Incorporating the spirit of Eqn. 3 with class centroid, we have the following output of sample i to maximize:  … the denominator enforces a minimal distance against samples in other classes… we propose the congenerous cosine (COCO) loss, which is to increase similarity within classes and enlarge variation across categories in a cooperative way:” Examiner notes that the cosine similarity (and hence the cosine distance) is measured by the function C(x, y). Because the output of sample i that is maximized is a fraction, maximizing it requires the denominator to be small relative to the numerator. Ck is the centroid of class k, i.e., the average of the features of that class, and the centroids taken over all classes represent the inter-class average; thus, keeping the denominator small maximizes the cosine distance of each feature vector from these centroids and enlarges inter-class variation.)
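For comparison with the limitation mapped above, the quantity recited in claim 4, the cosine distance of each feature vector from the mean of the batch, can be sketched as follows. This is a hypothetical illustration of the claim language only, assuming cosine distance means 1 − cosine similarity; it is neither Liu's loss nor the applicant's equation, and it assumes the batch mean is not the zero vector.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mean_distance_from_batch_mean(features):
    """Average cosine distance of each feature vector from the batch mean (1/M) * sum_i x_i."""
    m = len(features)
    mu = [sum(column) / m for column in zip(*features)]
    return sum(cosine_distance(x, mu) for x in features) / m


print(mean_distance_from_batch_mean([[1.0, 0.0], [1.0, 0.0]]))  # 0.0: no spread around the mean
print(mean_distance_from_batch_mean([[1.0, 0.0], [0.0, 1.0]]))  # larger: features spread apart
```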

Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wen, further in view of Zhang et al. “Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos” hereinafter Zhang.

Regarding claim 8
Wen teaches claim 1
Further Wen teaches, wherein the augmented loss function is given by:  Lc = Lprimary + [Secondary loss]… LPrimary is the primary loss function; (pg 3 ¶01 “The CNNs are trained under the joint supervision of the softmax loss and center loss”; the softmax loss corresponds to the primary loss function provided by the CNN, and the joint loss function comprises a primary loss function, the softmax loss, and a secondary loss function, the center loss)
	Wen does not explicitly teach, wherein the [secondary loss is given by] (lambda)Lp + Ls  where:  lambda is a loss weight;…LP is the planetary loss portion of the secondary loss function; and LS is the sun loss portion of the secondary loss function.
	However, Zhang, when addressing issues related to an improved secondary loss function, teaches, wherein the [secondary loss is given by] (lambda)Lp + Ls  where:  lambda is a loss weight;…LP is the planetary loss portion of the secondary loss function; and LS is the sun loss portion of the secondary loss function. (pg 5 “To address these issues, we propose an improved triplet loss function… We define the ImpTriplet loss as:…
[Image: Zhang, pg 5, ImpTriplet loss equation]
The intra-class constraints correspond to the LP function, while the inter-class constraints correspond to the sun loss; Zhang thus provides a loss function that separates the inter-class and intra-class functions by addition.)
	It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate separable inter-class and intra-class loss functions, which enforce the angle at which a feature embedding is encouraged to update, as taught by Zhang, into the disclosed invention of Wen.
One of ordinary skill in the art would have been motivated to make this modification so that one could implement “an improved triplet loss function, which pushes the negative face away from the positive pairs simultaneously, and requires the distance of the positive pair to be less than a margin, such that the Euclidean distances correspond to a measure of semantic face similarity” (Zhang, Conclusion)
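Zhang's ImpTriplet equation appears here only as an image and is not reproduced. To illustrate the additive structure recited in claim 8, L = L_primary + λ·L_P + L_S, the sketch below combines a given primary loss value with hypothetical planetary and sun terms modeled on the language of claims 3 and 4 (the mean cosine distance to the class center, and the negative mean cosine distance from the batch mean). These component definitions are assumptions for illustration; the applicant's actual formulas are not set out in this action.

```python
import math


def _cos(a, b):
    """Cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def planetary_loss(features, labels, centers):
    """Hypothetical L_P: mean cosine distance of each feature to its class center."""
    return sum(1.0 - _cos(x, centers[y]) for x, y in zip(features, labels)) / len(features)


def sun_loss(features):
    """Hypothetical L_S: negative mean cosine distance from the batch mean, so that
    minimizing it pushes the feature vectors away from the batch mean."""
    m = len(features)
    mu = [sum(column) / m for column in zip(*features)]
    return -sum(1.0 - _cos(x, mu) for x in features) / m


def augmented_loss(primary, features, labels, centers, lam):
    """L = L_primary + lam * L_P + L_S (the additive form recited in claim 8)."""
    return primary + lam * planetary_loss(features, labels, centers) + sun_loss(features)


feats, labels = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(augmented_loss(1.0, feats, labels, centers, lam=0.5))
```

Here each feature already points at its class center, so L_P is 0 and the only reduction from the primary loss comes from the sun term.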

Allowable Subject Matter
Claims 9, 10, 12 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.	
Specifically, none of the references of record, either alone or in combination, fairly discloses or suggests the limitations of claim 9.
Further, examiner notes that, as discussed in the Claim Interpretation section, the loss function should be positively claimed in order to be given patentable weight.

	The closest prior art of record, Zhang et al. (“Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos”), teaches a secondary loss function that is the sum of two terms, in which one term minimizes intra-class variation and the other maximizes inter-class variation. However, Zhang’s loss function is based on the Euclidean distance between feature vectors, not on the distance between a feature vector in a batch and a class center. Further, the claimed equation measures “angular distance” rather than “Euclidean distance,” as described in Specification ¶0020. Further, Liu et al. (“Learning Deep Features via Congenerous Cosine Loss for Person Recognition”) discloses the COCO loss, which uses “angular distance” to minimize intra-class variation while maximizing inter-class variation; however, this formulation maximizes inter-class variation against class-specific centroids instead of against the centroid of an entire batch of samples irrespective of class. It would not have been obvious to one of ordinary skill in the art before the effective filing date to combine these references to teach at least the limitations of claim 9.
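The distinction drawn above between Euclidean and angular distance can be illustrated briefly: rescaling a feature vector changes its Euclidean distance to another vector but leaves the angle between them unchanged. The sketch below is a generic illustration of the two measures and does not reproduce Zhang's loss or the claimed equation; the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between a and b; changes when either vector is rescaled."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """Angle between a and b in radians; invariant to rescaling either vector."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding just outside [-1, 1]


a, scaled_a, b = [1.0, 0.0], [3.0, 0.0], [0.0, 1.0]
print(euclidean_distance(a, b), euclidean_distance(scaled_a, b))  # sqrt(2) vs sqrt(10)
print(angular_distance(a, b), angular_distance(scaled_a, b))      # pi/2 for both
```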


Conclusion
Prior art
Wu et al. “Deep Face Recognition with Center Invariant Loss” discusses a loss function that minimizes the sum of three loss functions: a softmax loss, a center invariant loss for inter-class variations, and a center loss for intra-class variations.
Deng et al. “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” incorporates angular and cosine margins to enhance the discriminative power of the softmax loss.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./Examiner, Art Unit 2122                                                                                                                                                                                                        
 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122                                                                                                                                                                                                        