ALLOWABILITY NOTICE
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. The Amendment filed 3 August 2022 has been entered and considered. Claims 1, 5, 19 and 20 have been amended, and claim 4 has been canceled. By way of Examiner’s Amendment herein, claims 1, 5, 6, 9, 11-13, 19, and 20 are further amended, and claims 3-4, 7-8, and 10 are canceled. Claims 1, 5-6, 9, and 11-20, all the claims pending in the application, are allowed.

Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Applicants’ representative, Luke Choi (Reg. No. 76,846), on 30 August 2022.
The application has been amended as follows: 


1. (Currently amended) A method of training an image recognition model, comprising: 
dividing an input training image into a plurality of subregions, wherein the divided subregions are non-overlapping;  
selecting subregions corresponding to a landmark portion from among the divided subregions of [[an]] the input training image from training data by excluding at least one subregion corresponding to a background from among the divided subregions of the input training image; 
calculating a class loss and a class-dependent localization loss for the selected subregions based on an image recognition model, the calculating of the class-dependent localization loss including transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image and a posture associated with the image sensor; and
training the image recognition model using a total loss comprising the class loss and the class-dependent localization loss, [[and]] 
wherein the calculating of the total loss comprises: 
after the selecting the subregions corresponding to the landmark portion, setting a plurality of anchor nodes for each of the selected subregions;
calculating temporary reference point information for each of the plurality of anchor nodes for each of the selected subregions;
for each of the selected subregions, calculating a partial localization loss as a sum of differences between ground truth reference point information and the temporary reference point information for each of the plurality of anchor nodes; 
determining a sum of partial localization losses calculated for each of the selected subregions of the input training image to be the class-dependent localization loss; 
calculating temporary class information for each of the plurality of anchor nodes for each of the selected subregions;
calculating a partial class loss between ground truth class information and the temporary class information calculated for each of the anchor nodes for the each of the selected subregions of the input training image; and 
determining a sum of partial class losses calculated for the each of the selected subregions of the input training image to be the class loss.  

2-4. (Canceled).  

5. (Currently amended) The method of claim 1, wherein the calculating of the class loss comprises: 
selecting subregions corresponding to a ground truth landmark portion from among the divided subregions of the input training image.  

6. (Currently Amended) The method of claim 5, wherein the selecting of the subregions comprises: 
further selecting a subregion corresponding to a ground truth background portion from among the divided subregions of the input training image.  

7-8. (Canceled).  

9. (Currently Amended) The method of claim 1 
excluding a subregion with a ground truth background portion from the selected subregions.  

10. (Canceled). 

11. (Currently Amended) The method of claim 1 
calculating temporary class information and temporary reference point information for an anchor node having a highest confidence level from among confidence levels calculated for each of the anchor nodes.  

12. (Currently Amended) The method of claim 1 
excluding an anchor node having a confidence level less than a threshold from among confidence levels calculated for each of the anchor nodes.  

13. (Currently Amended) The method of claim 1, wherein the calculating of the class loss and the class-dependent localization loss comprises: 
calculating a class-based weight based on the temporary class information; and 
determining the class-dependent localization loss based on the class-based weight, the temporary reference point information, and the ground truth reference point information.  

14. (Original) The method of claim 13, wherein the determining of the class-dependent localization loss comprises: 
determining the class-dependent localization loss by applying the class-based weight to a difference between the temporary reference point information and the ground truth reference point information.  

15. (Original) The method of claim 1, wherein the training comprises: 
updating a parameter of the image recognition model to minimize the total loss.  

16. (Original) The method of claim 15, wherein the updating of the parameter comprises: 
repeating the updating of the parameter of the image recognition model to converge the total loss.  

17. (Original) The method of claim 15, wherein the updating of the parameter comprises: 
updating the parameter such that the class loss is minimized before the localization loss is minimized.  

18. (Original) A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.  

19. (Currently amended) A training apparatus comprising: 
a memory configured to store an image recognition model; and 
a processor configured to: 
divide an input training image into a plurality of subregions, wherein the divided subregions are non-overlapping;
select subregions corresponding to a landmark portion from among the divided subregions of [[an]] the input training image from training data by excluding at least one subregion corresponding to a background from among the divided subregions of the input training image; 
calculate a class loss and a class-dependent localization loss for the selected subregions based on the image recognition model, the calculating of the class-dependent localization loss including transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image and a posture associated with the image sensor; and 
train the image recognition model using a total loss comprising the class loss and the class-dependent localization loss, [[and]] 
wherein the processor is further configured to: 
after the selecting the subregions corresponding to the landmark portion, set a plurality of anchor nodes for each of the selected subregions;
calculate temporary reference point information for each of the plurality of anchor nodes for each of the selected subregions;
for each of the selected subregions, calculate a partial localization loss as a sum of differences between ground truth reference point information and the temporary reference point information for each of the plurality of anchor nodes; 
determine a sum of partial localization losses calculated for each of the selected subregions of the input training image to be the class-dependent localization loss; 
calculate temporary class information for each of the plurality of anchor nodes for each of the selected subregions;
calculate a partial class loss between ground truth class information and temporary class information calculated for each of the anchor nodes for the each of the selected subregions of the input training image; and 
determine a sum of partial class losses calculated for the each of the selected subregions of the input training image to be the class loss.  

20. (Currently amended) An image recognition method comprising: 
obtaining an input image; and 
estimating, from the input image, a class of a landmark in the input image and a reference point of the landmark, based on an image recognition model, 
wherein the image recognition model is trained using a total loss comprising [[the]] class loss and [[the]] class-dependent localization loss being calculated by: 
dividing an input training image into a plurality of subregions, wherein the divided subregions are non-overlapping,
selecting subregions corresponding to a landmark portion from among the divided subregions of [[an]] the input training image from training data and excluding at least one subregion corresponding to a background from among the divided subregions of the input training image, 
calculating the [[a]] class-dependent localization loss for the selected subregions based on the image recognition model by transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image and a posture associated with the image sensor, and 
wherein the total loss is calculated by:
after the selecting the subregions corresponding to the landmark portion, setting a plurality of anchor nodes for each of the selected subregions;
calculating temporary reference point information for each of the plurality of anchor nodes for each of the selected subregions;
for each of the selected subregions, calculating a partial localization loss as a sum of differences between ground truth reference point information and the temporary reference point information for each of the plurality of anchor nodes; 
determining a sum of partial localization losses calculated for each of the selected subregions of the input training image to be the class-dependent localization loss; 
calculating temporary class information for each of the plurality of anchor nodes for each of the selected subregions;
calculating a partial class loss between ground truth class information and the temporary class information calculated for each of the anchor nodes for the each of the selected subregions of the input training image; and 
determining a sum of partial class losses calculated for the each of the selected subregions of the input training image to be the class loss.



Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Each of independent claims 1, 19, and 20 recites, in some variation: obtaining an input image; and estimating, from the input image, a class of a landmark in the input image and a reference point of the landmark, based on an image recognition model, wherein the image recognition model is trained using a total loss comprising class loss and class-dependent localization loss being calculated by: dividing an input training image into a plurality of subregions, wherein the divided subregions are non-overlapping, selecting subregions corresponding to a landmark portion from among the divided subregions of the input training image from training data and excluding at least one subregion corresponding to a background from among the divided subregions of the input training image, calculating the class-dependent localization loss for the selected subregions based on the image recognition model by transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image and a posture associated with the image sensor, and wherein the total loss is calculated by: after the selecting the subregions corresponding to the landmark portion, setting a plurality of anchor nodes for each of the selected subregions; calculating temporary reference point information for each of the plurality of anchor nodes for each of the selected subregions; for each of the selected subregions, calculating a partial localization loss as a sum of differences between ground truth reference point information and the temporary reference point information for each of the plurality of anchor nodes; determining a sum of partial localization losses calculated for each of the selected subregions of the input training image to be the class-dependent localization loss; calculating temporary class information for each of the plurality of anchor nodes for each of the selected subregions; calculating a partial class loss between ground truth class information and the temporary class information calculated for each of the anchor nodes for the each of the selected subregions of the input training image; and determining a sum of partial class losses calculated for the each of the selected subregions of the input training image to be the class loss. The cited art of record does not teach or suggest such a combination of features. 
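For illustration only, the loss structure recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the squared-distance measure used for the "differences", the cross-entropy form of the partial class loss, the unweighted sum used for the total loss, and every name in the code (`total_loss`, `is_background`, `pred_points`, and so on) are hypothetical and do not appear in the claims. The viewing-angle coordinate transformation is omitted.

```python
import math


def partial_localization_loss(pred_points, gt_point):
    """Sum, over the anchor nodes of one subregion, of the difference between
    each temporary reference point and the ground truth reference point.
    Squared Euclidean distance is an assumed choice of 'difference'."""
    gx, gy = gt_point
    return sum((px - gx) ** 2 + (py - gy) ** 2 for px, py in pred_points)


def partial_class_loss(pred_probs, gt_class):
    """Sum, over the anchor nodes of one subregion, of a class loss between the
    temporary class information and the ground truth class.
    Cross-entropy is an assumed choice of loss."""
    return sum(-math.log(max(probs[gt_class], 1e-12)) for probs in pred_probs)


def total_loss(subregions):
    """Return (total, class_loss, localization_loss) for one training image.
    Background subregions are excluded before any anchor-level loss is computed."""
    selected = [s for s in subregions if not s["is_background"]]
    loc = sum(partial_localization_loss(s["pred_points"], s["gt_point"])
              for s in selected)
    cls = sum(partial_class_loss(s["pred_probs"], s["gt_class"])
              for s in selected)
    # Assumption: the total loss is the plain sum of the two terms.
    return cls + loc, cls, loc
```

Each entry of `subregions` stands for one non-overlapping cell of the training image; `pred_points` and `pred_probs` hold one entry per anchor node set for that cell.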
Redmon (“You Only Look Once: Unified, Real-Time Object Detection”), the closest prior art of record, is directed to a method of detecting objects in an image. Redmon discloses dividing the image into a grid of S x S non-overlapping cells and predicting, for each cell, bounding boxes and confidence scores reflecting how confident the model is that the box contains an object. If no object exists in a particular cell, the confidence score for that particular cell is zero. Each grid cell also predicts a conditional class probability. A loss function is then minimized to train the object detection model, the loss function comprising a sum of class losses for each cell and a sum of bounding box coordinate losses for each cell, each class loss being based on the conditional class probability and the ground truth label for that cell, and each bounding box coordinate loss being based on an intersection over union between the predicted box and the ground truth box for that cell. 
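The grid division and the intersection over union mentioned in the summary of Redmon can be illustrated with a short sketch. It is not taken from Redmon; the function names, the corner-format boxes `(x1, y1, x2, y2)`, and the rule of assigning an object to the cell containing its box center are assumptions made for the example, and the way a loss function would consume these values is not shown.

```python
def cell_for_center(cx, cy, width, height, s):
    """Return the (row, col) of the cell, in an s x s grid of non-overlapping
    cells over a width x height image, that contains the point (cx, cy)."""
    col = min(int(cx / width * s), s - 1)   # clamp points on the right edge
    row = min(int(cy / height * s), s - 1)  # clamp points on the bottom edge
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```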
However, Redmon does not teach or suggest that calculating the class-dependent localization loss for the selected subregions is based on the image recognition model by transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image and a posture associated with the image sensor, as required by the independent claims. Redmon also fails to teach or suggest that the total loss is calculated by: after the selecting the subregions corresponding to the landmark portion, setting a plurality of anchor nodes for each of the selected subregions, or that the temporary reference point information, partial localization loss, temporary class information, and partial class loss for each of the selected subregions is calculated for each of the plurality of anchor nodes, as required by the independent claims.  
Ikeda (U.S. Patent Application Publication No. 2009/0080780) and Engedal (U.S. Patent Application Publication No. 2010/0232727) both disclose using a 3D model to calculate 2D real-world coordinates in an image based on a viewing angle and posture of the camera that captured the image. 
However, even if the teachings of Ikeda or Engedal were to be combined with the teachings of Redmon, the combination would still fail to teach or suggest that the total loss is calculated by: after the selecting the subregions corresponding to the landmark portion, setting a plurality of anchor nodes for each of the selected subregions, or that the temporary reference point information, partial localization loss, temporary class information, and partial class loss for each of the selected subregions is calculated for each of the plurality of anchor nodes, as required by the independent claims. 
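As a generic illustration of calculating 2D image coordinates from a 3D point using a camera's viewing angle and posture, a pinhole projection can be sketched as below. This is not the method of Ikeda, Engedal, or the present application. It assumes a simplified posture consisting of a camera position and a yaw angle about the vertical axis, a horizontal field of view as the viewing angle, and square pixels; all names are hypothetical.

```python
import math


def project_point(point, cam_pos, yaw, fov_deg, width, height):
    """Project a 3D world point (x, y, z) to pixel coordinates (u, v).

    Assumptions: y is up in the world; a camera with yaw 0 looks along +z;
    fov_deg is the horizontal field of view. Returns None if the point is
    not in front of the camera."""
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * dx - s * dz   # component along the camera's right vector
    zc = s * dx + c * dz   # component along the camera's forward vector
    if zc <= 0:
        return None
    # Focal length in pixels derived from the horizontal field of view.
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    u = width / 2 + f * xc / zc
    v = height / 2 - f * dy / zc   # image v grows downward
    return u, v
```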
Ren is directed to a network for localizing and classifying objects in an image. Ren discloses calculating, for each training image, a classification loss and a regression loss for each of 256 randomly chosen anchors. 
However, Ren is silent about dividing the image into non-overlapping regions and setting a plurality of anchors for each subregion. Furthermore, Redmon specifically cites Ren’s R-CNN method as requiring substantially more bounding boxes than Redmon’s method. Thus, Redmon teaches away from multiple predictions (anchors) per region. Accordingly, one of ordinary skill in the art would not have modified Redmon to arrive at the claimed invention based on the teachings of Ren.
For all of the foregoing reasons, independent claims 1, 19, and 20 are allowed. Claims 5-6, 9, and 11-18 are allowed by virtue of their dependency on claim 1. 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486. The examiner can normally be reached noon - 8:30 PM Monday through Thursday and Saturday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/SEAN M CONNER/
Primary Examiner, Art Unit 2663