DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The Amendment filed 22 February 2022 (hereinafter “the Amendment”) has been entered and considered. Claims 1, 19, and 20 have been amended. Claims 1-20, all the claims pending in the application, are rejected. All new grounds of rejection set forth in the present action were necessitated by Applicants’ claim amendments; accordingly, this action is made final.

Response to Amendment
In view of the amendments to independent claims 1, 19, and 20, the previously applied prior art rejections are withdrawn. Applicants’ arguments are rendered moot in view of the new grounds of rejection set forth below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10, 13-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” by Ren et al. (cited in the IDS filed 6 February 2020; hereinafter “Ren”) in view of “Arbitrary-Oriented Scene Text Detection via Rotation Proposals” by Ma et al. (hereinafter “Ma”) and further in view of U.S. Patent Application Publication No. 2014/0343842 to Ranganathan et al. (hereinafter “Ranganathan”).
As to independent claim 1, Ren discloses a method of training an image recognition model (Abstract and Section 3 disclose that Ren is directed to a network for localizing and classifying objects in an image), comprising: selecting subregions corresponding to a landmark portion from among subregions of an input training image from training data (Section 3 discloses sampling 256 anchors in a mini-batch of a training image, the anchor regions necessarily including landmark portions since objects of interest are found therein); calculating a class loss and a class-dependent localization loss for the selected subregions based on an image recognition model (Section 3 discloses calculating, for each training image, a classification loss and a regression loss for each of the 256 anchors in each mini-batch of the training image); and training the image recognition model using a total loss comprising the class loss and the localization loss (Section 3 discloses training the network by minimizing an objective function comprising the classification loss and the regression loss).
Ren discloses that pi* weights the regression loss to zero for anchors i that do not include objects (see equation 1 and corresponding description in Section 3.1.2) and thus suggests that the selection of regions excludes background regions. However, Ren does not expressly disclose that the selection is performed by excluding a subregion corresponding to a background from among the subregions of the input training image.
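For illustration only, the objective described in Section 3.1.2 of Ren can be sketched as follows. This is a minimal sketch and not Ren's actual implementation: the function names, the tuple layout used for each anchor, the normalizers n_cls and n_reg, and the default balancing weight lam are assumptions introduced here, and the per-coordinate regression loss is assumed to be a smooth L1 loss. The sketch shows that an anchor whose label p_star is zero contributes nothing to the regression term.

```python
import math

def log_loss(p, p_star):
    """Binary log loss between predicted objectness p and label p_star (0 or 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1.0 - p))

def smooth_l1(x):
    """Smooth L1 loss applied to a single coordinate difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def total_loss(anchors, n_cls, n_reg, lam=10.0):
    """anchors: list of (p, p_star, t, t_star), where t and t_star are 4-tuples.

    Returns the normalized classification loss plus lam times the normalized
    regression loss, with each regression term multiplied by p_star.
    """
    cls = sum(log_loss(p, ps) for p, ps, _, _ in anchors) / n_cls
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(t, ts))
              for _, ps, t, ts in anchors) / n_reg
    return cls + lam * reg
```

In this sketch, a background anchor with a large coordinate error still yields a total loss equal to its classification loss alone, because its regression term is multiplied by zero.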
Ma, like Ren, is directed to a region-proposal-based architecture for detecting objects in images based on a class loss and a regression (localization) loss (Abstract and Section IV(C)). As part of the sampling strategy for anchor selection, Ma discloses weighting the regression loss to zero for background regions, and as such “the background RoIs are ignored” (Equation 4 and Section IV(C)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to perform anchor region selection by excluding background RoIs, as taught by Ma, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to save computation resources.
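The relationship between weighting a background region's regression loss to zero and excluding that region from the selection can be sketched as follows. This is a hypothetical illustration under stated assumptions, not code from Ren or Ma: each RoI is represented as a pair (l, partial_loss), where l is 1 for a foreground RoI and 0 for a background RoI, and the numeric values are invented for the example.

```python
def reg_loss_weighted(rois):
    """Sum of partial regression losses, each multiplied by its indicator l."""
    return sum(l * partial for l, partial in rois)

def reg_loss_excluding_background(rois):
    """Select only foreground RoIs (l == 1), then sum their partial losses."""
    selected = [partial for l, partial in rois if l == 1]
    return sum(selected)

# Made-up example values: two foreground RoIs and two background RoIs.
example_rois = [(1, 0.40), (0, 2.50), (1, 0.10), (0, 1.75)]
```

Both functions return the same value for the same input, while the second never evaluates a term for a background RoI, which is consistent with the stated motivation of saving computation resources.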
Ren discloses that the localization loss is calculated based on a measure of Intersection-over-Union (IoU) overlap between the anchors and the ground truth box in the training images (Section 3). However, Ren does not disclose any pre-processing of the training images. That is, the proposed combination of Ren and Ma does not expressly disclose the calculating of the class-dependent localization loss including transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image.
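For reference, the IoU overlap measure referred to above can be sketched as follows for axis-aligned boxes. The function name and the (x1, y1, x2, y2) box layout are assumptions made for this example and are not taken from Ren.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for each box.
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An anchor that coincides with the ground truth box has an IoU of 1, and an anchor that does not overlap the box at all has an IoU of 0.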
Ranganathan, like Ren, is directed to utilizing training images with landmarks (e.g., 110, 120, 210, 220, 310, 320) annotated by bounding boxes (Abstract, [0031-0038] and Figs. 1-4). In particular, Ranganathan discloses a pre-processing step of rectifying an original training image by transforming pixels therein based on at least one of the height, angle and GPS coordinates of the camera that captured the original training image in order to produce the annotated training image ([0033]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Ren and Ma to use training images that have been rectified by transforming coordinates of pixels of landmarks in the original training images based on the viewing angle of the camera that captured the training image, as taught by Ranganathan, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to facilitate matching between the images (Abstract of Ranganathan). 
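One common way a coordinate transformation based on camera height and viewing angle can be carried out is inverse perspective mapping under a pinhole camera model. The sketch below is offered only as an illustration of that general technique; it is not asserted to be the particular rectification of Ranganathan. The function name, the focal length in pixels, the principal point (cx, cy), and the flat-ground assumption are all assumptions introduced here.

```python
import math

def pixel_to_ground(u, v, cx, cy, focal_px, cam_height, pitch_rad):
    """Map pixel (u, v) to flat-ground coordinates (lateral, forward).

    Assumes a pinhole camera at height cam_height above a flat ground plane,
    pitched downward by pitch_rad, with image y increasing downward.
    Returns None when the pixel's ray does not intersect the ground.
    """
    x = u - cx
    y = v - cy
    # Downward and forward components of the pixel's ray after applying pitch.
    down = y * math.cos(pitch_rad) + focal_px * math.sin(pitch_rad)
    forward = -y * math.sin(pitch_rad) + focal_px * math.cos(pitch_rad)
    if down <= 0.0:
        return None  # ray points at or above the horizon
    scale = cam_height / down
    return (scale * x, scale * forward)
```

Under these assumptions, the center pixel of a camera mounted 2 units above the ground and pitched down by 45 degrees maps to a ground point about 2 units ahead of the camera.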
As to claim 2, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the class loss and the class-dependent localization loss comprises: calculating temporary class information and temporary reference point information from an input training image based on the image recognition model (Section 3 of Ren discloses calculating outputs pi of the cls layer and ti of the reg layer of the network for a particular anchor); calculating the class loss based on the temporary class information and ground truth class information (Section 3 of Ren discloses that the classification loss is calculated using a loss between pi and ground truth label pi*); and calculating the localization loss based on the temporary reference point information and ground truth reference point information (Section 3 of Ren discloses that the regression loss is calculated using a loss between ti and ground truth label ti*).
As to claim 3, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the temporary class information and the temporary reference point information comprises: calculating temporary class information and temporary reference point information for each of the selected subregions of the input training image (Section 3 of Ren discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of the 256 sampled anchors in each mini-batch of an image).
As to claim 4, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the class loss comprises: calculating a partial class loss between the ground truth class information and the temporary class information calculated for the each of the selected subregions of the input training image; and determining a sum of partial class losses calculated for the each of the selected subregions of the input training image to be the class loss (Section 3 of Ren discloses calculating the classification loss between outputs pi of the cls layer of the network and ground truth label pi* for each of the 256 anchors in each mini-batch of an image; equation 1 shows the summation of partial losses). 
As to claim 5, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the class loss comprises: selecting subregions corresponding to a ground truth landmark portion from among the subregions of the input training image; calculating a partial class loss between the ground truth class information and temporary class information calculated for each of the selected subregions; and determining a sum of partial class losses calculated for the selected subregions to be the class loss (Section 3 of Ren discloses sampling 256 anchors in a mini-batch of an image and calculating the classification loss between outputs pi of the cls layer of the network and ground truth label pi* for each of the 256 anchors in each mini-batch of an image; equation 1 shows the summation of partial losses).
As to claim 6, the proposed combination of Ren, Ma and Ranganathan further teaches that the selecting of the subregions comprises: further selecting a subregion corresponding a ground truth background portion from among the subregions of the input training image (Section IV(C) of Ma discloses weighting the regression loss to zero, via the indicator l, for background regions).
As to claim 7, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the localization loss comprises: calculating, for each of the selected subregions of the input training image, a partial localization loss between the ground truth reference point information and temporary reference point information calculated for the each of the selected subregions of the input training image; and determining a sum of partial localization losses calculated for the each of the selected subregions to be the localization loss (Section 3 of Ren discloses calculating the regression loss between outputs ti of the reg layer of the network and ground truth label ti* for each of the 256 anchors in each mini-batch of an image; equation 2 shows the summation of partial losses).
As to claim 8, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the localization loss comprises: selecting subregions corresponding to a ground truth landmark portion from among the subregions of the input training image; calculating a partial localization loss between the ground truth reference point information and temporary reference point information of each of the selected subregions; and determining a sum of partial localization losses calculated for the selected subregions to be the localization loss (Section 3 of Ren discloses sampling 256 anchors in a mini-batch of an image and calculating the regression loss between outputs ti of the reg layer of the network and ground truth label ti* for each of the 256 anchors in each mini-batch of an image; equation 2 shows the summation of partial losses).
As to claim 9, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the partial localization loss comprises: excluding a subregion with a ground truth background portion from the selected subregions (Section IV(C) of Ma discloses weighting the localization loss Lreg to 0 when the ground truth label of the ROI is background).
As to claim 10, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the temporary class information and the temporary reference point information for the each of the selected subregions of the input training image comprises: calculating temporary class information and temporary reference point information for each of anchor nodes set for the each of the selected subregions (Section 3 of Ren discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of the 256 sampled anchors in each mini-batch of an image).
As to claim 13, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the class loss and the class-dependent localization loss comprises: calculating a class-based weight based on temporary class information; and determining the class-dependent localization loss based on the class-based weight, temporary reference point information, and ground truth reference point information (Section IV(C) of Ma discloses a class-based weight l which multiplies the localization loss Lreg by 0 when the ground truth label of the ROI is background and multiplies the localization loss Lreg by 1 when the ground truth label of the ROI is not background; this section of Ma further discloses that the localization loss Lreg is calculated based on the class-based weight l, the location information v output by the network and the ground truth location information v*).
As to claim 14, the proposed combination of Ren, Ma and Ranganathan further teaches that the determining of the class-dependent localization loss comprises: determining the class-dependent localization loss by applying the class-based weight to a difference between the temporary reference point information and the ground truth reference point information (Section IV(C) of Ma discloses that the class-based weight l is multiplied by the localization loss Lreg which comprises a difference between the location information v output by the network and the ground truth location information v*).
As to claim 15, the proposed combination of Ren, Ma and Ranganathan further teaches that the training comprises: updating a parameter of the image recognition model to minimize the total loss (Section 3 of Ren discloses that the weights in the network are adjusted using end-to-end training by back-propagation to minimize the loss function). 
As to claim 16, the proposed combination of Ren, Ma and Ranganathan further teaches that the updating of the parameter comprises: repeating the updating of the parameter of the image recognition model to converge the total loss (Section 3 of Ren discloses that the training is an iterative process which minimizes the loss function). 
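The notion of repeating a parameter update until the loss converges can be sketched with plain gradient descent. This is a simplified, hypothetical illustration: Ren trains a neural network by back-propagation, whereas the sketch below updates a single scalar parameter, and the function name, learning rate, tolerance, and stopping rule are assumptions introduced here.

```python
def train_until_converged(grad_fn, loss_fn, w, lr=0.1, tol=1e-10, max_iters=10000):
    """Repeat gradient-descent updates until the loss changes by less than tol."""
    prev = loss_fn(w)
    for _ in range(max_iters):
        w = w - lr * grad_fn(w)   # one parameter update
        cur = loss_fn(w)
        if abs(prev - cur) < tol:  # loss has stopped changing: treat as converged
            break
        prev = cur
    return w

# Toy loss with its minimum at w = 3, used only to exercise the loop.
def toy_loss(w):
    return (w - 3.0) ** 2

def toy_grad(w):
    return 2.0 * (w - 3.0)
```

Starting from w = 0, the loop on the toy loss moves the parameter toward 3 and stops once successive loss values differ by less than the tolerance.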
As to claim 18, the proposed combination of Ren, Ma and Ranganathan does not expressly disclose a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. However, Official Notice is taken that the uses and benefits of embodying software instructions for execution by a CPU (such as the arrangement disclosed by Ren in Table 5) on a non-transitory computer-readable medium are known and expected within the image processing arts. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to embody Ren’s software on a non-transitory computer-readable medium, to achieve the known and expected uses and benefits of reproducing and transporting the software.

As to independent claim 19, Ren discloses a training apparatus comprising: a memory configured to store an image recognition model; and a processor (Abstract and Section 3 disclose that Ren is directed to a network for localizing and classifying objects in an image; such a network requires a computer having memory which stores the software code which implements the network; also, Table 5 discloses that the network is implemented using a CPU) configured to: select subregions corresponding to a landmark portion from among subregions of an input training image from training data (Section 3 discloses sampling 256 anchors in a mini-batch of a training image, the anchor regions necessarily including landmark portions since objects of interest are found therein); calculate a class loss and a class-dependent localization loss for the selected subregions based on the image recognition model, and train the image recognition model using a total loss comprising the class loss and the localization loss (Section 3 discloses calculating, for each training image, a classification loss and a regression loss for each of the 256 anchors in each mini-batch of the training image; this section further discloses that the network is trained by minimizing an objective function comprising the classification loss and the regression loss).
Ren discloses that pi* weights the regression loss to zero for anchors i that do not include objects (see equation 1 and corresponding description in Section 3.1.2) and thus suggests that the selection of regions excludes background regions. However, Ren does not expressly disclose that the selection is performed by excluding a subregion corresponding to a background from among the subregions of the input training image.
Ma, like Ren, is directed to a region-proposal-based architecture for detecting objects in images based on a class loss and a regression (localization) loss (Abstract and Section IV(C)). As part of the sampling strategy for anchor selection, Ma discloses weighting the regression loss to zero for background regions, and as such “the background RoIs are ignored” (Equation 4 and Section IV(C)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to perform anchor region selection by excluding background RoIs, as taught by Ma, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to save computation resources.
Ren discloses that the localization loss is calculated based on a measure of Intersection-over-Union (IoU) overlap between the anchors and the ground truth box in the training images (Section 3). However, Ren does not disclose any pre-processing of the training images. That is, the proposed combination of Ren and Ma does not expressly disclose the calculating of the class-dependent localization loss including transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image. 
Ranganathan, like Ren, is directed to utilizing training images with landmarks (e.g., 110, 120, 210, 220, 310, 320) annotated by bounding boxes (Abstract, [0031-0038] and Figs. 1-4). In particular, Ranganathan discloses a pre-processing step of rectifying an original training image by transforming pixels therein based on at least one of the height, angle and GPS coordinates of the camera that captured the original training image in order to produce the annotated training image ([0033]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Ren and Ma to use training images that have been rectified by transforming coordinates of pixels of landmarks in the original training images based on the viewing angle of the camera that captured the training image, as taught by Ranganathan, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to facilitate matching between the images (Abstract of Ranganathan).

As to independent claim 20, Ren discloses an image recognition method comprising: obtaining an input image; and estimating, from the input image, a class of a landmark in the input image and a reference point of the landmark, based on an image recognition model (Abstract and Section 3 disclose that Ren is directed to a network for localizing and classifying objects in an image; in particular, an image is input into the network, and objects in the image are localized by a bounding box along with a label of the class of the object; see also Fig. 5); wherein the image recognition model is trained using a total loss comprising the class loss and the localization loss being calculated by selecting subregions corresponding to a landmark portion from among subregions of an input training image from training data (Section 3 discloses sampling 256 anchors in a mini-batch of a training image, the anchor regions necessarily including landmark portions since objects of interest are found therein; Section 3 discloses calculating, for each training image, a classification loss and a regression loss for each of the 256 anchors in each mini-batch of the training image; this section further discloses that the network is trained by minimizing an objective function comprising the classification loss and the regression loss).
Ren discloses that pi* weights the regression loss to zero for anchors i that do not include objects (see equation 1 and corresponding description in Section 3.1.2) and thus suggests that the selection of regions excludes background regions. However, Ren does not expressly disclose excluding a subregion corresponding to a background from among the subregions of the input training image.
Ma, like Ren, is directed to a region-proposal-based architecture for detecting objects in images based on a class loss and a regression (localization) loss (Abstract and Section IV(C)). As part of the sampling strategy for anchor selection, Ma discloses weighting the regression loss to zero for background regions, and as such “the background RoIs are ignored” (Equation 4 and Section IV(C)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ren to perform anchor region selection by excluding background RoIs, as taught by Ma, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to save computation resources.
Ren discloses that the localization loss is calculated based on a measure of Intersection-over-Union (IoU) overlap between the anchors and the ground truth box in the training images (Section 3). However, Ren does not disclose any pre-processing of the training images. That is, the proposed combination of Ren and Ma does not expressly disclose that a class-dependent localization loss of the localization loss is calculated based on transforming coordinates of the landmark portion based on a viewing angle of an image sensor capturing the input training image. 
Ranganathan, like Ren, is directed to utilizing training images with landmarks (e.g., 110, 120, 210, 220, 310, 320) annotated by bounding boxes (Abstract, [0031-0038] and Figs. 1-4). In particular, Ranganathan discloses a pre-processing step of rectifying an original training image by transforming pixels therein based on at least one of the height, angle and GPS coordinates of the camera that captured the original training image in order to produce the annotated training image ([0033]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Ren and Ma to use training images that have been rectified by transforming coordinates of pixels of landmarks in the original training images based on the viewing angle of the camera that captured the training image, as taught by Ranganathan, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to facilitate matching between the images (Abstract of Ranganathan).

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of Ma and Ranganathan and further in view of U.S. Patent Application Publication No. 2020/0117991 to Suzuki et al. (hereinafter “Suzuki”).
As to claim 11, the proposed combination of Ren, Ma and Ranganathan further teaches that the calculating of the temporary class information and the temporary reference point information for the each of the anchor nodes comprises: calculating temporary class information and temporary reference point information for an anchor node (Section 3 of Ren discloses calculating the outputs pi of the cls layer and ti of the reg layer of the network for each of the 256 sampled anchors in each mini-batch of an image).
The proposed combination of Ren, Ma and Ranganathan does not expressly disclose that the anchor node is selected as the one having a highest confidence level from among confidence levels calculated for each of the anchor nodes. 
Suzuki, like Ren, is directed to a trained network for classifying and localizing objects in an image (Abstract). Suzuki discloses that the regions selected for classification and localization are those having a confidence measure equal to or greater than a threshold, which includes the region with the highest confidence level ([0067]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Ren, Ma and Ranganathan to select a region having a highest confidence level for classification and localization, as taught by Suzuki, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to enhance accuracy of the model.
As to claim 12, the proposed combination of Ren, Ma, Ranganathan and Suzuki further teaches that the calculating of the temporary class information and the temporary reference point information for each of the anchor nodes comprises: excluding an anchor node having a confidence level less than a threshold from among confidence levels calculated for each of the anchor nodes ([0067] of Suzuki discloses that the regions selected for classification and localization are those having a confidence measure equal to or greater than a threshold such that regions having a confidence below the threshold are excluded). 
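The relationship relied upon above, namely that a set of regions whose confidence is equal to or greater than a threshold includes the region having the highest confidence (provided at least one region meets the threshold), can be sketched as follows. This is a hypothetical illustration rather than code from Suzuki; the function names and the (region_id, confidence) pair layout are assumptions introduced here.

```python
def select_by_threshold(regions, threshold):
    """Keep regions whose confidence is equal to or greater than threshold.

    regions: list of (region_id, confidence) pairs.
    Regions below the threshold are excluded.
    """
    return [r for r in regions if r[1] >= threshold]

def highest_confidence(regions):
    """Return the (region_id, confidence) pair with the highest confidence, or None."""
    return max(regions, key=lambda r: r[1]) if regions else None
```

In this sketch, whenever the thresholded list is non-empty, the highest-confidence region of the full list is a member of it.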

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ren in view of Ma and Ranganathan and further in view of U.S. Patent Application Publication No. 2015/0371397 to Wang et al. (hereinafter “Wang”).
As to claim 17, Ren contemplates a variety of training strategies which include training the RPN and the Fast R-CNN in different orders (Section 3.2). However, the proposed combination of Ren, Ma, and Ranganathan does not expressly disclose that the updating of the parameter comprises: updating the parameter such that the class loss is minimized before the localization loss is minimized. 
Wang, like Ren, is directed to object detection in images based on minimizing a classification error and a localization error (Abstract and [0039]). Wang discloses that the minimization of the classification error and the minimization of the localization error may not be performed at the same time ([0039]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Ren, Ma, and Ranganathan to minimize the classification loss before minimizing the localization loss, as contemplated by Wang, to arrive at the claimed invention discussed above because such a modification would have been obvious to try. More specifically, minimizing the classification loss before minimizing the localization loss is one of a predictable and ascertainable group of similar approaches contemplated by Wang: 1) minimizing the localization error first, or 2) minimizing the classification error first. This group addresses the recognized problem of reducing computation time with a reasonable level of success. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try to modify Ren’s minimization by minimizing the classification loss first, as contemplated by Wang, since there are a finite number of identified, predictable potential solutions to the recognized need (as discussed above) and one of ordinary skill in the art could have pursued the known potential solutions with a reasonable expectation of success.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486. The examiner can normally be reached noon - 8:30 PM Monday through Thursday and Saturday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/SEAN M CONNER/Primary Examiner, Art Unit 2663