DETAILED ACTION

Applicant's arguments filed on 08/31/2021 have been fully considered but they are not persuasive. 

Regarding applicant’s argument that, in the invention of independent claims 1 and 11, the entire template image is not matched to the entire input image. In contrast, Carroll paragraphs [0101] and [0102] do exactly this. 
Moreover, in the invention of independent claims 1 and 11, the bounding box is a bounding box for individual keywords. In contrast, looking at Carroll paragraph [0083], there is no bounding box as the ordinarily skilled artisan would understand that term. Carroll's analysis concerns tables and table lines, including a table cell, more specifically, a table grid line. In this portion of Carroll, a user is drawing a table, and the system is detecting the table. The table grid lines are not bounding boxes as the ordinarily skilled artisan would understand them. Rather, the table grid lines define a periphery of a table cell, in which a word or data may be input. There is no detection carried out, as would be the purpose of a bounding box in claims 1 and 11. 
Examiner’s Response: Claims 1 and 11 recite “providing a bounding box around at least some of the text in the input image.” The word “some” is defined as “at least one,” and the word “individual” is not recited anywhere in the claims. Therefore, the bounding boxes in these claims are not limited to bounding boxes for individual keywords and could include multiple words, as in a table.
Applicant’s assertion that “table grid lines are not bounding boxes as the ordinarily skilled artisan would understand them” is not supported by any evidence. Moreover, Carroll discloses in ¶83, “The computer can analyze line segments that are within the indicated drawn bounding rectangle to determine both the location and extent of table 600/700.” Therefore, the bounding box is not necessarily the actual peripheral grid lines of the table, but rather the bounding box that is drawn around the table in response to the user indicating the location of the corners of the table.
It is also unclear what detection is being referred to; in any event, this “detection” is not recited in the claims, so the point is moot.

Regarding applicant’s argument that, still further, the independent claims recite, among other things, responsive to a determination that the bounding box in the input image is not the same size as the bounding box in the template image, scaling the bounding box in the input image to be substantially the same size as the bounding box in the template image. 
In contrast, Carroll paragraph [0108], describing block 920, refers to different resolutions. This is technically a very different concept. Different resolutions do not necessarily imply different sizes, and therefore do not necessarily lead to scaling, as independent claims 1 and 11 in the present application recite. 

Examiner’s Response: Although it is possible to increase the resolution of an image while keeping its scale the same, that is not the case here. Carroll clearly discloses in ¶108 that increasing the resolution scales the image to a larger size, not merely to a higher resolution.

Regarding applicant’s argument that, looking now at the rejection of claim 2, which depends from claim 1, Applicant notes initially that an initial value of bounding box translation is determined by the translation of an image origin through foreground detection, as denoted by the red dot in Applicant’s figure.
Carroll already has the deficiencies that Applicant has noted above with respect to claim 1. Carroll further is deficient with respect to claim 2 because whatever scaling and shifting Carroll discusses is for an entire image, not contents of a bounding box. Moreover, Carroll FIG. 12, as discussed at paragraphs [0151] - [0157], describes iterations of computations of scale and shifting, but is silent on identification of an origin. Carroll talks about ranges of X and Y offsets, but makes no mention of any kind of origin.
Examiner’s Response: Carroll discloses in ¶83, “The computer can analyze line segments that are within the indicated drawn bounding rectangle to determine both the location and extent of table 600/700.” Therefore, the location of the table is determined.

Regarding applicant’s argument that, “Looking now at the rejection of claim 10, Applicant notes that the box in Zlotnick and the bounding box of the invention of claim 10 are different. In Zlotnick FIG. 2, an image is divided into box regions according to pixel distribution. Zlotnick describes reference areas 44, 46, 48, and 50 as reference boxes. These are not bounding boxes as in claim 1, from which claim 10 depends, and in any event are very different from Carroll's table grids for the reasons Applicant has discussed previously. Consequently, Applicant submits that Zlotnick supplies none of the deficiencies of Carroll or of the other references on which the Examiner relies, so that claim 10 is patentable for at least this additional reason as well.”
Examiner’s Response: Each of the images of Carroll can have more than one table, and therefore each of them can also be matched to a template and then aligned/scaled. Therefore, it would have been obvious to combine Carroll with Zlotnick’s teaching that more than one box can be scaled. Zlotnick does not also need to teach bounding boxes, since these are already disclosed by Carroll; the only part of Zlotnick that is relied upon is the teaching that multiple boxes can be scaled. Even if the boxes are not of the same type, the teaching of scaling multiple boxes can be applied to another type of box. In fact, Zlotnick is a pertinent reference because he is also matching images to templates, and thus is solving a similar problem of how to scale multiple image boxes that do not match template boxes, so it would make sense to combine these references.

Regarding applicant’s argument that, looking now at Fua, the bounding box that Fua mentions in paragraph [0028] is a 3D spatio-temporal box, and has nothing to do with a bounding box for text, as claim 11, from which claim 20 depends, recites. See, e.g., FIG. 1 of Fua. Consequently, Fua supplies none of the deficiencies of Carroll or of the other prior art on which the Examiner has relied. Accordingly, Applicant submits that claim 20 is patentable for at least this additional reason as well.
Examiner’s Response: Fua teaches aligning boxes using a neural network. The fact that the boxes are not exactly the same type does not undermine this teaching. It would have been obvious to one of ordinary skill in the art to apply the teaching of aligning and scaling one type of box to another type of box.
U.S. Application No. 16/836,778, Attorney Docket No. 116538.00036, Response to Non-Final Office Action mailed June 14, 2021
	Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Carroll (US Pub. 2017/0147552 A1).
Regarding claim 1, Carroll discloses, a method comprising: responsive to receipt of an input image containing text, identifying a template image that matches said input image; (See Carroll ¶101, “At block 905, a computer system, such as processing system 1400 of FIG. 14, receives an image of a form, and receives an image of a 
providing a bounding box around at least some of the text in the input image; (See Carroll ¶83, “The computer can analyze line segments that are within the indicated drawn bounding rectangle to determine both the location and extent of table 600/700, as well as the location and extent of each of the fields of the table, such as field 605/705 (blocks 156 and 158).”)
matching the bounding box in the input image with a bounding box in the template image; (See Carroll ¶102, “An image of a form can be matched with a form template from the library of form templates, such as by matching the image of the form with an image of the form template.”)
responsive to a determination that the bounding box in the input image is not aligned with the bounding box in the template image, aligning the bounding box in the input image with the bounding box in the template image; and responsive to a determination that the bounding box in the input image is not the same size as the bounding box in the template image, scaling the bounding box in the input image to be substantially the same size as the bounding box in the template image.  (See Carroll ¶108, “At block 920, the computer system scales and shifts the rotation aligned version of the thin feature image.  When an image of a form is created, the image can have been created at a different scale, or at an offset relative to an image of the form template of which the form is an instance.  For example, the image of the form can have 

Regarding claim 11, Carroll discloses, a computer-implemented system comprising at least one processor, volatile memory, and non-volatile storage, the system, when the at least one processor is programmed, performing the following method: (See Carroll ¶174, “Memory 1411 may store data and instructions that configure the processor(s) 1410 to execute operations in accordance with the techniques described above.”)
responsive to receipt of an input image containing text, identifying a template image that matches said input image; providing a bounding box around at least some of the text in the input image; matching the bounding box in the input image with a bounding box in the template image; responsive to a determination that the bounding box in the input image is not aligned with the bounding box in the template image, aligning the bounding box in the input image with the bounding box in the template image; and responsive to a determination that the bounding box in the input image is not the same size as the bounding box in the template image, scaling the bounding box in the input image to be substantially the same size as the bounding box in the template image.  (See the rejection of claim 1 as it is equally applicable for claim 11 as well.)
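For illustration only, the following Python sketch shows one way the final two steps of claims 1 and 11 (aligning, then scaling, a matched input-image bounding box to its template counterpart) could be expressed. It is a hypothetical sketch, not code from Carroll or from the application; the `Box` type, its (x, y, w, h) fields with (x, y) treated as the box origin, and the function names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box; (x, y) is the origin (top-left corner)."""
    x: float
    y: float
    w: float
    h: float


def transform_to_template(inp: Box, tpl: Box) -> tuple[float, float, float, float]:
    """Return (dx, dy, sx, sy): the translation moving the input origin onto the
    template origin, and the per-axis factors resizing the input box to the template size."""
    return tpl.x - inp.x, tpl.y - inp.y, tpl.w / inp.w, tpl.h / inp.h


def align_and_scale(inp: Box, tpl: Box, tol: float = 1e-9) -> Box:
    dx, dy, sx, sy = transform_to_template(inp, tpl)
    out = inp
    # Aligning step: performed only when the two origins differ.
    if abs(dx) > tol or abs(dy) > tol:
        out = Box(out.x + dx, out.y + dy, out.w, out.h)
    # Scaling step: performed only when the sizes differ; scaling is about the origin.
    if abs(sx - 1.0) > tol or abs(sy - 1.0) > tol:
        out = Box(out.x, out.y, out.w * sx, out.h * sy)
    return out
```

For example, an input box at (12, 7) of size 50 x 10 matched to a template box at (10, 5) of size 100 x 20 yields a translation of (-2, -2) and a scale factor of 2.0 on each axis.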

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-9 and 12-19 are rejected under 35 U.S.C. 103 as being unpatentable over Carroll (US Pub. 2017/0147552 A1) in view of Wang et al. (US Pub. No. 2019/0065833 A1).
Regarding claim 2, Carroll discloses, a method as claimed in claim 1, wherein the aligning comprises: locating a template image bounding box origin and an input image bounding box origin; (See Carroll ¶83, “The computer can analyze line segments that are within the indicated drawn bounding rectangle to determine both the location and extent of table 600/700.”)

setting a first translation value for translation along a first axis and a second translation value for translation along a second axis to match the template image bounding box origin with the input image bounding box origin; (See Carroll ¶153, “The computer system, at blocks 1215 through 1235, iterates though various scale and shift values in an attempt to determine a scale and a shift value that optimizes an alignment of the first image with a second image.  In some embodiments, the scaling and shifting is done separately for the x and y dimensions.”)
responsive to a determination that translation along the first axis is required, incrementing the first translation value; responsive to a determination that translation along the second axis is required, incrementing the second translation value.  (See Carroll, ¶156, “After each iteration over the range of X or Y offsets, at block 1230, a determination is made whether the first image has been shifted over the range of X or Y offsets. If yes, then block 1240 is executed next.  At block 1240, the computer system determines a scale value and a shift value that optimizes a cross-correlation of the first image and the second image. …The shift value, which is an offset of the first image in the X and Y dimensions, causes a table in the first image to substantially align with a table in the second image.”)

Carroll discloses in ¶155 computing an “alignment score” using cross correlation to determine whether the bounding box forms are aligned properly, but he fails to disclose using intersection over union instead to determine correlation between bounding boxes. However, Wang discloses, calculating an intersection over union for the image bounding box and the template bounding box; (See Wang ¶57, “Using FIG. 3 as an example, the first bounding box 302 and the second bounding box 304 can be determined to match for tracking purposes if an overlapping area between the first bounding box 302 and the second bounding box 304 (the intersecting region 308) divided by the union 310 of the bounding boxes 302 and 304 is greater than an IOU threshold (denoted as TIOU < (Area of Intersecting Region 308) / (Area of Union 310). The IOU threshold can be set to any suitable amount, such as 50%, 60%, 70%, 75%, 80%, 90%, or other configurable amount.  In one illustrative example, the first bounding box 302 and the second bounding box 304 can be determined to be a match when the IOU for the bounding boxes is at least 70%.  The object in the current frame can be determined to be the same object from the previous frame based on the bounding boxes of the two objects being determined as a match.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the Intersection over Union used to compute correlation between bounding boxes as suggested by Wang for Carroll’s alignment score computed using cross correlation using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is in order to accurately compute correlation since IOU is a metric that rewards predicted bounding 
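To illustrate the proposed combination, the following Python sketch pairs an intersection over union score of the kind Wang describes (area of the intersecting region divided by area of the union) with a Carroll-style iteration over ranges of scale and shift values, keeping the candidate with the best score. It is a simplified sketch under stated assumptions, not either reference's implementation: boxes are hypothetical (x, y, w, h) tuples, shifts are whole pixels, and scaling is applied separately in x and y about the box origin.

```python
from itertools import product

Box = tuple[float, float, float, float]  # (x, y, w, h); (x, y) is the top-left origin


def iou(a: Box, b: Box) -> float:
    """Area of intersection divided by area of union for two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def best_scale_and_shift(inp: Box, tpl: Box, scales, shifts):
    """Try every (sx, sy, dx, dy) combination and keep the one with the highest IOU."""
    best_score, best_params = -1.0, None
    for sx, sy, dx, dy in product(scales, scales, shifts, shifts):
        candidate = (inp[0] + dx, inp[1] + dy, inp[2] * sx, inp[3] * sy)
        score = iou(candidate, tpl)
        if score > best_score:
            best_score, best_params = score, (sx, sy, dx, dy)
    return best_score, best_params
```

With the input box (12, 7, 50, 10), the template box (10, 5, 100, 20), scales [0.5, 1.0, 1.5, 2.0] and shifts from -5 to 5, the search returns an IOU of 1.0 at sx = sy = 2.0 and dx = dy = -2.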

Regarding claim 3, Carroll and Wang disclose, a method as claimed in claim 2, further comprising repeating incrementing the first translation value until translation along the first axis is complete, and repeating incrementing the second translation value until translation along the second axis is complete and the template image bounding box origin and the input image bounding box origin match.  (See Carroll, ¶156, “After each iteration over the range of X or Y offsets, at block 1230, a determination is made whether the first image has been shifted over the range of X or Y offsets. If yes, then block 1240 is executed next.  At block 1240, the computer system determines a scale value and a shift value that optimizes a cross-correlation of the first image and the second image. …The shift value, which is an offset of the first image in the X and Y dimensions, causes a table in the first image to substantially align with a table in the second image.”)
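The incremental translation recited in claims 2 and 3 can be sketched as follows. This is a hypothetical illustration, not code from Carroll, Wang, or the application: it assumes integer pixel origins and unit steps, and moves the input origin toward the template origin one pixel at a time, first along the first axis and then along the second, until the two origins match.

```python
def translate_until_aligned(inp_origin: tuple[int, int], tpl_origin: tuple[int, int]) -> tuple[int, int]:
    """Return (tx, ty) such that inp_origin + (tx, ty) == tpl_origin.

    Assumes integer pixel coordinates. Each loop iteration changes one
    translation value by a single pixel toward the template origin
    (a step of -1 is used when the template lies in the negative direction).
    """
    tx, ty = 0, 0  # first and second translation values
    # Repeat along the first axis while translation is still required.
    while inp_origin[0] + tx != tpl_origin[0]:
        tx += 1 if inp_origin[0] + tx < tpl_origin[0] else -1
    # Repeat along the second axis while translation is still required.
    while inp_origin[1] + ty != tpl_origin[1]:
        ty += 1 if inp_origin[1] + ty < tpl_origin[1] else -1
    return tx, ty
```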

Regarding claim 4, Carroll and Wang disclose, a method as claimed in claim 1, wherein the scaling comprises: estimating a range of scaling to match a size of the template image bounding box and a size of the input image bounding box; (See Carroll ¶151, “At block 1205, a computer system or a user identify a range of scales over which to scale a first image, such as from a 50% scale to a 150% scale.  With knowledge of the various ways that images of forms are created, the user can define maximum and 
setting a first scaling value for scaling along a first axis and a second scaling value for scaling along a second axis to match the size of the template image bounding box and the size of the input image bounding box; (See Carroll ¶153, “At block 1215, the computer system scales the first image in the X or Y dimension by a scale amount.  The computer system, at blocks 1215 through 1235, iterates though various scale and shift values in an attempt to determine a scale and a shift value that optimizes an alignment of the first image with a second image.  In some embodiments, the scaling and shifting is done separately for the x and y dimensions.”)
responsive to a determination that scaling along the first axis is required, incrementing the first scaling value; responsive to a determination that scaling along the second axis is required, incrementing the second scaling value. (See Carroll ¶156, “At block 1240, the computer system determines a scale value and a shift value that optimizes a cross-correlation of the first image and the second image.  The scale value, which may be a different scale in the X and Y dimensions, or may be a same value in both dimensions, stretches or shrinks the first image to cause a table in the first image to be substantially the same size as a table in the second image.”)
Carroll discloses in ¶155 computing an “alignment score” using cross correlation to determine whether the bounding box forms are aligned properly, but he fails to disclose using intersection over union instead to determine correlation between bounding boxes.
However, Wang discloses, calculating an intersection over union for the image bounding box and the template bounding box; (See Wang ¶57, “Using FIG. 3 as an 
The proposed combination of Carroll and Wang and the motivation presented in the rejection of claim 2 are equally applicable to claim 4 and are incorporated by reference.

Regarding claim 5, Carroll and Wang disclose, a method as claimed in claim 4, further comprising repeating incrementing the first scaling value until width scaling along the first axis is complete and repeating incrementing the second scaling value until width scaling along the second axis is complete and the template image bounding box and the input image bounding box are substantially the same size.  (See Carroll ¶156, “If yes, at block 1235, a determination is made whether the first image has been scaled over the range of scales.  If no, block 1215 is executed next at the next scale amount.  If yes, then block 1240 is executed next.  At block 1240, the computer system determines 

Regarding claim 6, Carroll and Wang disclose, a method as claimed in claim 1 wherein, responsive to a determination that an intersection over union of the bounding box in the input image and the bounding box in the template exceeds a predetermined amount, aligning and scaling are not required.  (As disclosed in the rejection of claim 2, Wang discloses that when the IOU exceeds a threshold there is correlation; therefore, the alignment and scaling of Carroll, which require a low alignment score (based on correlation) to perform the additional alignment in steps 1215 and 1220, are not performed.)
The proposed combination of Carroll and Wang and the motivation presented in the rejection of claim 2 are equally applicable to claim 6 and are incorporated by reference.

Regarding claim 7, Carroll and Wang disclose, a method as claimed in claim 1 wherein, after the aligning, responsive to a determination that an intersection over union of the bounding box in the input image and the bounding box in the template exceeds a predetermined amount, scaling is not required.  (As disclosed in the rejection of claim 2, Wang discloses that when the IOU exceeds a threshold there is correlation; therefore, the additional scaling of Carroll in step 1215, which requires a low alignment score (based on correlation), is not performed.)


Regarding claim 8, Carroll and Wang disclose, a method as claimed in claim 1, wherein the determination that the bounding box in the input image is not aligned with the bounding box in the template image is made by calculating an intersection over union for the two bounding boxes and determining whether the calculated intersection over union falls below a predetermined amount.  (As disclosed in the rejection of claim 2, Wang discloses that when the IOU is less than a threshold there is no correlation; therefore, the additional alignment of Carroll in step 1220, which requires a low alignment score (based on correlation), is performed to correct the alignment.)
The proposed combination of Carroll and Wang and the motivation presented in the rejection of claim 2 are equally applicable to claim 8 and are incorporated by reference.

Regarding claim 9, Carroll and Wang disclose, a method as claimed in claim 1, wherein the determination that the bounding box in the input image is not the same size as the bounding box in the template image is made by calculating an intersection over union for the two bounding boxes and determining whether the calculated intersection over union falls below a predetermined amount.  (As disclosed in the rejection of claim 2, Wang discloses that when the IOU is less than a threshold there is no correlation, 
The proposed combination of Carroll and Wang and the motivation presented in the rejection of claim 2 are equally applicable to claim 9 and are incorporated by reference.
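The intersection over union tests relied on for claims 6 through 9 can be sketched as a simple decision function. This is a hypothetical illustration, not code from Wang, Carroll, or the application: boxes are (x, y, w, h) tuples, the threshold of 0.7 is an assumed value borrowed from Wang's illustrative 70% figure, and aligning is modeled as moving the input origin onto the template origin before the size test is applied.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def corrections_needed(inp, tpl, threshold=0.7):
    """Return (align_required, scale_required) for an input box and its template box."""
    # Claims 6 and 8: an IOU above the threshold means neither aligning nor scaling is required.
    if iou(inp, tpl) > threshold:
        return False, False
    # Model the aligning step: move the input origin onto the template origin, keep its size.
    aligned = (tpl[0], tpl[1], inp[2], inp[3])
    # Claims 7 and 9: after aligning, scaling is required only if the IOU is still not above the threshold.
    return True, iou(aligned, tpl) <= threshold
```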

Regarding claim 12, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11, wherein the aligning comprises: locating a template image bounding box origin and an input image bounding box origin; estimating a range of translation to match the template image bounding box origin with the input image bounding box origin; setting a first translation value for translation along a first axis and a second translation value for translation along a second axis to match the template image bounding box origin with the input image bounding box origin;  calculating an intersection over union for the image bounding box and the template bounding box; responsive to a determination that translation along the first axis is required, incrementing the first translation value; responsive to a determination that translation along the second axis is required, incrementing the second translation value.  (See the rejection of claim 2 as it is equally applicable for claim 12 as well.)

Regarding claim 13, Carroll and Wang disclose, a computer-implemented system as claimed in claim 12, the method further comprising repeating incrementing the first translation value until translation along the first axis is complete, and repeating 

Regarding claim 14, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11, wherein the scaling comprises: estimating a range of scaling to match a size of the template image bounding box and a size of the input image bounding box; setting a first scaling value for scaling along a first axis and a second scaling value for scaling along a second axis to match the size of the template image bounding box and the size of the input image bounding box; calculating an intersection over union for the image bounding box and the template bounding box; responsive to a determination that scaling along the first axis is required, incrementing the first scaling value; responsive to a determination that scaling along the second axis is required, incrementing the second scaling value.  (See the rejection of claim 4 as it is equally applicable for claim 14 as well.)

Regarding claim 15, Carroll and Wang disclose, a computer-implemented system as claimed in claim 14, the method further comprising repeating incrementing the first scaling value until width scaling along the first axis is complete and repeating incrementing the second scaling value until width scaling along the second axis is complete and the template image bounding box and the input image bounding box are 

Regarding claim 16, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11 wherein, responsive to a determination that an intersection over union of the bounding box in the input image and the bounding box in the template exceeds a predetermined amount, aligning and scaling are not required.  (See the rejection of claim 6 as it is equally applicable for claim 16 as well.)

Regarding claim 17, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11 wherein, after the aligning, responsive to a determination that an intersection over union of the bounding box in the input image and the bounding box in the template exceeds a predetermined amount, scaling is not required.  (See the rejection of claim 7 as it is equally applicable for claim 17 as well.)

Regarding claim 18, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11, wherein the determination that the bounding box in the input image is not aligned with the bounding box in the template image is made by calculating an intersection over union for the two bounding boxes and determining whether the calculated intersection over union falls below a predetermined amount.  (See the rejection of claim 8 as it is equally applicable for claim 18 as well.)

Regarding claim 19, Carroll and Wang disclose, a computer-implemented system as claimed in claim 11, wherein the determination that the bounding box in the input image is not the same size as the bounding box in the template image is made by calculating an intersection over union for the two bounding boxes and determining whether the calculated intersection over union falls below a predetermined amount.  (See the rejection of claim 9 as it is equally applicable for claim 19 as well.)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Carroll (US Pub. 2017/0147552 A1) in view of Zlotnick (US Pat. No. 6,778,703 B1).
Regarding claim 10, Carroll discloses, a method as claimed in claim 1, but he fails to disclose the following limitations.
However, Zlotnick discloses, further comprising repeating the providing and matching for a different bounding box, and performing the scaling using the different bounding box.  (See Zlotnick 13:64-14:4, “FIG. 8 is a flow chart that schematically illustrates details of fine alignment step 122, in accordance with a preferred embodiment of the present invention.  Once all of the box matches have been performed at step 118, there is an estimated offset of each box located in document 24 with respect to the reference areas of the identified template.  This information is used to compute an exact geometric transformation between the observed and reference images.”
Further see Zlotnick 14:30-42, “Scale differences between the document image and the template are computed at a scale evaluation step 148.  For each two reference boxes that were not rejected at step 140, a horizontal projection of the line between centers of the two boxes in the template and a horizontal projection of the line between 

both projections are longer than a preset threshold, typically 100 pixels, then the ratio between these two lengths is computed.  This ratio is recorded, together with the minimum matching score of the two boxes, as a horizontal scale candidate.  Similarly, vertical scale candidates are found using vertical projections of lines taken between the centers of different pairs of boxes.”)
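The scale-candidate computation quoted from Zlotnick at 14:30-42 can be sketched as follows. This is a hypothetical Python illustration, not Zlotnick's code: each retained reference box is represented by an assumed tuple of (template center, document center, matching score), and the ratio is taken as document length over template length, a direction the quoted passage does not specify.

```python
from itertools import combinations


def scale_candidates(matches, axis=0, min_len=100):
    """Return (ratio, min_score) pairs for every pair of boxes whose projections are long enough.

    matches: list of (template_center, document_center, score), centers as (x, y).
    axis=0 uses horizontal projections; axis=1 uses vertical projections.
    min_len mirrors the "typically 100 pixels" threshold in the quoted passage.
    """
    candidates = []
    for (t1, d1, s1), (t2, d2, s2) in combinations(matches, 2):
        tpl_len = abs(t2[axis] - t1[axis])  # projection of the line between centers in the template
        doc_len = abs(d2[axis] - d1[axis])  # projection of the corresponding line in the document image
        if tpl_len > min_len and doc_len > min_len:
            # Record the length ratio together with the lower of the two matching scores.
            candidates.append((doc_len / tpl_len, min(s1, s2)))
    return candidates
```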
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the matching and scaling using multiple pairs of boxes, as suggested by Zlotnick, to Carroll’s scaling using a single box, using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to more accurately scale a document that contains multiple boxes at different scales.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Carroll (US Pub. 2017/0147552 A1) in view of Fua et al. (US Pub. No. 2017/0316578 A1).
Regarding claim 20, Carroll discloses, a computer implemented system according to claim 11, but he fails to disclose the following limitation.
However, Fua discloses, further comprising a neural network to perform at least one of the aligning and the scaling. (See Fua ¶28, “Furthermore, we show that, for this approach to perform to its best, it is essential to align the successive bounding boxes of the spatio-temporal volume so that the person inside them remains centered.  To this end, we trained two Convolutional Neural Networks to first predict large body shifts between consecutive frames and then refine them.”)

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN whose telephone number is (571) 270-1417. The examiner can normally be reached on Monday - Friday; 10:00am - 6:30pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached on (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID PERLMAN/Primary Examiner, Art Unit 2662