DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20, all the claims pending in the application, are rejected.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 7, 8, 10, 15-16 and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” to Ren et al. (cited in the IDS filed 6 February 2020; hereinafter “Ren”).
As to independent claim 1, Ren discloses a method of training an image recognition model (Abstract and Section 3 discloses that Ren is directed to a network for localizing and classifying objects in an image), comprising: calculating a class loss and a class-dependent localization loss from training data based on an image recognition model (Section 3 discloses calculating, for each training image, a classification loss and a regression loss of a bounding box which localizes the object, the losses being based on the training images output from the network); and training the image recognition model using a total loss comprising the class loss and the localization loss (Section 3 discloses training the network by minimizing an objective function comprising the classification loss and the regression loss).
As to claim 2, Ren further discloses that the calculating of the class loss and the class-dependent localization loss comprises: calculating temporary class information and temporary reference point information from an input training image based on the image recognition model (Section 3 discloses calculating outputs pi of the cls layer and ti of the reg layer of the network for a particular anchor); calculating the class loss based on the temporary class information and ground truth class information (Section 3 discloses that the classification loss is calculated using a loss between pi and ground truth label pi*); and calculating the localization loss based on the temporary reference point information and ground truth reference point information (Section 3 discloses that the regression loss is calculated using a loss between ti and ground truth label ti*).
As to claim 3, Ren further discloses that the calculating of the temporary class information and the temporary reference point information comprises: calculating temporary class information and temporary reference point information for each of subregions of the input training image (Section 3 discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of a sampled 256 anchors in each mini-batch of an image). 
As to claim 4, Ren further discloses that the calculating of the class loss comprises: calculating a partial class loss between the ground truth class information and the temporary class information calculated for the each of the subregions of the input training image; and determining a sum of partial class losses calculated for the each of the subregions of the input training image to be the class loss (Section 3 discloses calculating the classification loss between outputs pi of the cls layer of the network and ground truth label pi* for each of the 256 anchors in each mini-batch of an image; equation 1 shows the summation of partial losses). 
As to claim 5, Ren further discloses that the calculating of the class loss comprises: selecting subregions corresponding to a ground truth landmark portion from among the subregions of the input training image; calculating a partial class loss between the ground truth class information and temporary class information calculated for each of the selected subregions; and determining a sum of partial class losses calculated for the selected subregions to be the class loss (Section 3 discloses sampling 256 anchors in a mini-batch of an image and calculating the classification loss between outputs pi of the cls layer of the network and ground truth label pi* for each of the 256 anchors in each mini-batch of an image; equation 1 shows the summation of partial losses).
As to claim 7, Ren further discloses that the calculating of the localization loss comprises: calculating, for each of the subregions of the input training image, a partial localization loss between the ground truth reference point information and temporary reference point information calculated for the each of the subregions of the input training image; and determining a sum of partial localization losses calculated for the each of the subregions to be the localization loss (Section 3 discloses calculating the regression loss between outputs ti of the reg layer of the network and ground truth label ti* for each of the 256 anchors in each mini-batch of an image; equation 2 shows the summation of partial losses).
As to claim 8, Ren further discloses that the calculating of the localization loss comprises: selecting subregions corresponding to a ground truth landmark portion from among the subregions of the input training image; calculating a partial localization loss between the ground truth reference point information and temporary reference point information of each of the selected subregions; and determining a sum of partial localization losses calculated for the selected subregions to be the localization loss (Section 3 discloses sampling 256 anchors in a mini-batch of an image and calculating the regression loss between outputs ti of the reg layer of the network and ground truth label ti* for each of the 256 anchors in each mini-batch of an image; equation 2 shows the summation of partial losses).
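For illustration only, the mapping applied to claims 1-5, 7 and 8 (a partial loss per sampled anchor, summed into a class loss and a localization loss, which together form a total loss) can be sketched as follows. This is a minimal sketch and not Ren's exact objective: the function names, the tuple layout, and the balancing weight lam are assumptions introduced here, and the normalization terms and per-anchor weighting of Ren's equation 1 are omitted.

```python
import math

def log_loss(p, p_star):
    """Partial class loss for one anchor (binary cross-entropy).
    p is the predicted probability, p_star the ground truth label (0 or 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1.0 - p))

def smooth_l1(t, t_star):
    """Partial localization loss for one anchor: smooth L1 summed over
    the box coordinates in t (predicted) and t_star (ground truth)."""
    total = 0.0
    for a, b in zip(t, t_star):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def total_loss(anchors, lam=1.0):
    """anchors: list of (p, p_star, t, t_star) tuples, one per sampled anchor.
    Returns (class_loss, localization_loss, total_loss).
    lam is a hypothetical balancing weight between the two terms."""
    class_loss = sum(log_loss(p, ps) for p, ps, _, _ in anchors)
    loc_loss = sum(smooth_l1(t, ts) for _, _, t, ts in anchors)
    return class_loss, loc_loss, class_loss + lam * loc_loss
```

Calling total_loss on a list of sampled anchors returns the two summed partial-loss terms and their combination, which corresponds to the "sum of partial losses" language of claims 4, 5, 7 and 8.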
As to claim 10, Ren further discloses that the calculating of the temporary class information and the temporary reference point information for the each of the subregions of the input training image comprises: calculating temporary class information and temporary reference point information for each of anchor nodes set for the each of the subregions (Section 3 discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of a sampled 256 anchors in each mini-batch of an image).
As to claim 15, Ren further discloses that the training comprises: updating a parameter of the image recognition model to minimize the total loss (Section 3 discloses that the weights in the network are adjusted using end-to-end training by back-propagation to minimize the loss function). 
As to claim 16, Ren further discloses that the updating of the parameter comprises: repeating the updating of the parameter of the image recognition model to converge the total loss (Section 3 discloses that the training is an iterative process which minimizes the loss function). 
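As a hedged illustration of the limitations of claims 15 and 16 (updating a parameter to minimize the total loss, and repeating the update until the loss converges), the following toy sketch uses plain gradient descent on a one-parameter quadratic loss. The function train, the learning rate, the tolerance, and the toy loss are all assumptions made for this example; Ren's network is trained by back-propagation over many parameters, which this sketch does not reproduce.

```python
def train(grad_fn, loss_fn, theta, lr=0.1, tol=1e-8, max_iters=10_000):
    """Repeat the parameter update until the change in loss falls below tol.
    Returns (theta, final_loss, number_of_steps)."""
    prev = loss_fn(theta)
    for step in range(1, max_iters + 1):
        theta = theta - lr * grad_fn(theta)  # one update of the parameter
        cur = loss_fn(theta)
        if abs(prev - cur) < tol:            # loss has converged
            return theta, cur, step
        prev = cur
    return theta, prev, max_iters

# Toy stand-in for the total loss: minimized at w = 3.
loss = lambda w: (w - 3.0) ** 2
grad = lambda w: 2.0 * (w - 3.0)

theta, final_loss, steps = train(grad, loss, theta=0.0)
```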

As to independent claim 19, Ren discloses a training apparatus comprising: a memory configured to store an image recognition model; and a processor (Abstract and Section 3 discloses that Ren is directed to a network for localizing and classifying objects in an image; such a network requires a computer having memory which stores the software code which implements the network; also, table 5 discloses that the network is implemented using a CPU) configured to calculate a class loss and a class-dependent localization loss from training data based on the image recognition model, and train the image recognition model using a total loss comprising the class loss and the localization loss (Section 3 discloses calculating, for each training image, a classification loss and a regression loss of a bounding box which localizes the object, the losses being based on the training images output from the network; this section further discloses that the network is trained by minimizing an objective function comprising the classification loss and the regression loss).

As to independent claim 20, Ren discloses an image recognition method comprising: obtaining an input image; and estimating, from the input image, a class of a landmark in the input image and a reference point of the landmark, based on an image recognition model (Abstract and Section 3 discloses that Ren is directed to a network for localizing and classifying objects in an image; in particular, an image is input into the network, and objects in the image are localized by a bounding box along with a label of the class of the object; see also Fig. 5). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 9, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of “Resnet-based Vehicle Classification and Localization in Traffic Surveillance Systems” to Jung et al. (hereinafter “Jung”).
As to claim 6, Ren does not expressly disclose that the selecting of the subregions comprises: further selecting a subregion corresponding a ground truth background portion from among the subregions of the input training image. 
Jung, like Ren, is directed to a neural network for localizing and classifying objects in images using an objective function comprising a weighted sum of a localization loss Lreg and a classification loss Lcls (Abstract and Section 3). In particular, the weight is a class-based weight I(c*) which multiplies the localization loss Lreg by 0 when the ground truth label of the ROI is background and multiplies the localization loss Lreg by 1 when the ground truth label of the ROI is not background (Section 3). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to select an ROI corresponding to a ground truth background, as taught by Jung, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to accurately classify all ROIs in the image. 
As to claim 9, the proposed combination of Ren and Jung further teaches that the calculating of the partial localization loss comprises: excluding a subregion with a ground truth background portion from the selected subregions (Section 3 of Jung discloses weighting the localization loss Lreg to 0 when the ground truth label of the ROI is background).
As to claim 13, the proposed combination of Ren and Jung further teaches that the calculating of the class loss and the class-dependent localization loss comprises: calculating a class-based weight based on temporary class information; and determining the class-dependent localization loss based on the class-based weight, temporary reference point information, and ground truth reference point information (Section 3 of Jung discloses a class-based weight I(c*) which multiplies the localization loss Lreg by 0 when the ground truth label of the ROI is background and multiplies the localization loss Lreg by 1 when the ground truth label of the ROI is not background; this section of Jung further discloses that the localization loss Lreg is calculated based on the class-based weight I(c*), the location information tb output by the network and the ground truth location information tb*).
As to claim 14, the proposed combination of Ren and Jung further teaches that the determining of the class-dependent localization loss comprises: determining the class-dependent localization loss by applying the class-based weight to a difference between the temporary reference point information and the ground truth reference point information (Section 3 of Jung discloses that the class-based weight I(c*) is multiplied by the localization loss Lreg which comprises a difference between the location information tb output by the network and the ground truth location information tb*). 
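For illustration only, the class-based weight I(c*) of Jung, as characterized above, can be sketched as an indicator that zeroes the localization term for background ROIs and passes it through otherwise. This is a minimal sketch under stated assumptions: the label id BACKGROUND, the function names, and the use of smooth L1 as the per-ROI difference are hypothetical choices made for this example and are not taken from Jung.

```python
BACKGROUND = 0  # hypothetical label id for the background class

def class_weight(c_star):
    """Indicator weight: 0 for a background ROI, 1 for any other class."""
    return 0.0 if c_star == BACKGROUND else 1.0

def smooth_l1(t, t_star):
    """Difference between predicted and ground truth location information,
    measured with smooth L1 (an assumed choice for this sketch)."""
    total = 0.0
    for a, b in zip(t, t_star):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def class_dependent_loc_loss(rois):
    """rois: list of (c_star, t, t_star) tuples.
    The class-based weight is applied to each ROI's localization difference."""
    return sum(class_weight(c) * smooth_l1(t, ts) for c, t, ts in rois)
```

In this sketch a background ROI contributes nothing to the localization loss regardless of how far its predicted box is from any reference box, which is the behavior relied upon for claims 9, 13 and 14.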

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of U.S. Patent Application Publication No. 2020/0117991 to Suzuki et al. (hereinafter “Suzuki”).
As to claim 11, Ren further discloses that the calculating of the temporary class information and the temporary reference point information for the each of the anchor nodes comprises: calculating temporary class information and temporary reference point information for an anchor node (Section 3 discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of a sampled 256 anchors in each mini-batch of an image).  
Ren does not expressly disclose that the anchor node is selected as the one having a highest confidence level from among confidence levels calculated for each of the anchor nodes. 
Suzuki, like Ren, is directed to a trained network for classifying and localizing objects in an image (Abstract). Suzuki discloses that the regions selected for classification and localization are those having a confidence measure equal to or greater than a threshold, which includes the region with the highest confidence level ([0067]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to select a region having a highest confidence level for classification and localization, as taught by Suzuki, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to enhance accuracy of the model.
As to claim 12, the proposed combination of Ren and Suzuki further teaches that the calculating of the temporary class information and the temporary reference point information for each of the anchor nodes comprises: excluding an anchor node having a confidence level less than a threshold from among confidence levels calculated for each of the anchor nodes ([0067] of Suzuki discloses that the regions selected for classification and localization are those having a confidence measure equal to or greater than a threshold such that regions having a confidence below the threshold are excluded). 
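As a hedged illustration of the confidence-based selection attributed to Suzuki above, the following sketch keeps regions whose confidence is equal to or greater than a threshold and shows that the highest-confidence region is among those kept whenever any region passes. The function names and the (region_id, confidence) tuple layout are assumptions made for this example and do not come from Suzuki.

```python
def select_regions(regions, threshold):
    """regions: list of (region_id, confidence) tuples.
    Keeps regions with confidence >= threshold; the rest are excluded."""
    return [r for r in regions if r[1] >= threshold]

def highest_confidence(regions):
    """Returns the region with the highest confidence level."""
    return max(regions, key=lambda r: r[1])

regions = [("a", 0.9), ("b", 0.4), ("c", 0.7)]
kept = select_regions(regions, threshold=0.5)
best = highest_confidence(regions)
```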

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of Wang.
As to claim 17, Ren contemplates a variety of training strategies which include training the RPN and the Fast R-CNN in different orders (Section 3.2). However, Ren does not expressly disclose that the updating of the parameter comprises: updating the parameter such that the class loss is minimized before the localization loss is minimized. 
Wang, like Ren, is directed to object detection in images based on minimizing a classification error and a localization error (Abstract and [0039]). Wang discloses that the minimization of the classification error and the minimization of the localization error may not be performed at the same time ([0039]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to minimize the classification loss before minimizing the localization loss, as contemplated by Wang, to arrive at the claimed invention discussed above because such a modification would have been obvious to try. More specifically, minimizing the classification loss before minimizing the localization loss is one of a predictable and ascertainable group of similar approaches contemplated by Wang: 1) minimizing the localization error first, or 2) minimizing the classification error first. This group addresses the .

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ren.
As to claim 18, Ren does not expressly disclose a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. However, official notice is taken to note that the uses and benefits of embodying software instructions for execution by a CPU (such as the arrangement disclosed by Ren in Table 5) on a non-transitory computer-readable medium are known and expected within the image processing arts.  It would have been obvious to the ordinarily-skilled artisan at the time of invention to embody Ren’s software on a non-transitory computer-readable medium, to achieve the known and expected uses and benefits of reproducing and transporting the software.

Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chen (U.S. Patent Application Publication No. 2019/0377949) discloses a neural network for classifying and localizing objects in images, wherein the neural network is trained by minimizing a first loss function and a second loss function. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486.  The examiner can normally be reached on noon - 8:30 PM Monday through Thursday and Saturday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to 




/SEAN M CONNER/Primary Examiner, Art Unit 2663