Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Arguments
Applicant's arguments filed 3/3/2021 have been fully considered but they are not persuasive.
Applicant’s remarks – (page 7) Applicant argues that, while the claim does not state a specific order of steps in so many words (i.e., it does not say “first do x, then y, then z”), it is clear that certain portions of the claimed method must be performed in a particular sequence in order to make sense.
Examiner’s response – Examiner agrees that the steps must be performed in a proper order, and Cao et al teaches, in figure 1 and figure 7, flowcharts with a specific order of steps to carry out the proper set of instructions for the invention to work. If the claim does not require a specific order, then a specific order will not be considered during examination. Please amend the claim if a particular order is required.

Applicant’s remarks – (page 8) Cao does not disclose, teach, or suggest any merging of outputs of the RPN whatsoever. The Office Action merely notes that Cao teaches the combination of CNN and RPN. See Office Action page 3. This is arguably a merger of processing techniques, but it is not a merger of text proposals into a patch. In fact, Cao is completely silent on this point. 
Examiner’s response – Examiner respectfully disagrees and directs Applicant to figure 2 of Cao et al. Figure 2 teaches CNN 110, within which is RPN 202, which further

35 USC 102 – Claim Rejection
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cao et al (US 10,878,270).
Claim 1, similarly claims 8 and 14:
Cao et al teaches the method by means of a flowchart, such as figure 7.
Cao et al (US 10,878,270) teaches the following subject matter:
generating, by a processor, a plurality of intermediate feature layers of an image using convolutional neural network (CNN) processing (figure 2, part 206, teaches convolutional layers/a plurality of layers of region proposals 112; column 6, starting at line 40, details the figure 2 structure of the CNN and RPN layers for incoming images);
for each intermediate feature layer, generating, by the processor, at least one text proposal using a region proposal network (RPN), the at least one text proposal comprising a portion of the intermediate feature layer that is predicted to contain text (figure 2, part 202, teaches a region proposal network);
merging, by the processor, at least two of the text proposals with one another to form a patch of the image that is predicted to contain text (the abstract teaches identifying bounding regions within an image to predict key point regions with text; column 2, starting at line 1, teaches the use of a CNN combined with an RPN, and, starting at line 25, a word detection module; figures 3-5, with further detail of figure 3 in column 7, line 20, and of figure 5 in column 8, lines 20-35, where bounding regions 314 together form a patch from per-word 504);
determining, by the processor, outer coordinates of the patch, the outer coordinates comprising at least leftmost, rightmost, topmost, and bottommost coordinates (column 5, starting at line 20, teaches patch considerations such as top-left, bottom-right, top-middle, etc.); and
generating, by the processor, a quadrilateral of the image that is a smallest quadrilateral including the leftmost, rightmost, topmost, and bottommost coordinates (figure 3, part 314, teaches a bounding region viewed as a quadrilateral; column 7, starting at line 20, teaches segmentation, such as of the word “amazon”, with the text/word oriented at a slightly declined angle; column 2, starting at line 35, teaches machine learning (ML) to predict region/bounding box locations, resulting in a quadrilateral mask; figure 5, part 314, teaches the different-sized quadrilaterals that are later formed together with the coordinates mentioned above).
Regarding claim 8: Cao et al teaches the method carried out by processor(s) in figure 10.
Regarding claim 14: the system with a memory configured to store an image and a plurality of instructions, and a processor in communication with the memory, is taught in figure 10, with processors 1010A-N and system memory 1020 all communicating by interfaces 1030.
These teachings are also applied to the dependent claims.
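
For illustration only, the merging, outer-coordinate, and quadrilateral limitations quoted above can be sketched in Python. This sketch is not taken from Cao et al or from the application. The box layout, the function names, and the reading of "smallest quadrilateral" as the quadrilateral whose vertices are the four extreme points are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y), with y growing downward as in image coordinates


def merge_boxes(a: Box, b: Box) -> Box:
    """Merge two axis-aligned proposals into the smallest box covering both."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def outer_coordinates(patch: List[Point]) -> Dict[str, Point]:
    """Return the extreme points of a patch given as a list of (x, y) points."""
    return {
        "leftmost": min(patch, key=lambda p: p[0]),
        "rightmost": max(patch, key=lambda p: p[0]),
        "topmost": min(patch, key=lambda p: p[1]),
        "bottommost": max(patch, key=lambda p: p[1]),
    }


def quadrilateral(outer: Dict[str, Point]) -> List[Point]:
    """Quadrilateral whose vertices are the four extreme points (assumed reading)."""
    return [outer["topmost"], outer["rightmost"], outer["bottommost"], outer["leftmost"]]


# Hypothetical patch: points of a slightly rotated block of text.
patch = [(2.0, 0.0), (10.0, 4.0), (8.0, 8.0), (0.0, 4.0), (5.0, 4.0)]
outer = outer_coordinates(patch)
quad = quadrilateral(outer)
```

In this hypothetical patch the text block is rotated, so the four extreme points coincide with its corners and the resulting quadrilateral follows the slanted text rather than an upright bounding box.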

Claim 2, similarly claims 9 and 15:
defining, by the processor, a projective plane of the image including the quadrilateral (figure 3 teaches the plane/region that is considered for the quadrilateral output); and
determining, by the processor, an inverse transformation of the projective plane of the image to transform the quadrilateral into a right angled rectangle (column 7, starting at line 40, especially line 60, teaches re-aligning text to a more typical orientation, where this method would increase the overall accuracy of the system; column 5, starting at line 60, teaches a box with vertical matter at 90 degrees).
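
For illustration only, transforming a quadrilateral into a right-angled rectangle can be sketched as solving for a 3x3 projective transform (homography) from four point correspondences. This sketch is not taken from Cao et al or from the application. The function names, the sample coordinates, and the normalization of the last matrix entry to 1 are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H mapping each src[i] to dst[i] for four point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map a point through H, dividing by the projective coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical quadrilateral and a 100 x 40 target rectangle.
quad = [(1.0, 1.0), (9.0, 2.0), (8.0, 7.0), (0.0, 5.0)]
rect = [(0.0, 0.0), (100.0, 0.0), (100.0, 40.0), (0.0, 40.0)]
H = homography(quad, rect)
mapped = [apply_homography(H, p) for p in quad]
```

Each corner of the hypothetical quadrilateral is mapped to the corresponding rectangle corner, up to floating-point error. Resampling the image pixels through the transform is omitted here.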





further comprising cropping, by the processor, at least a portion of the image outside the quadrilateral (column 2, starting at line 5, teaches rectifying the word crop to improve the quality of OCR).

Claim 4, similarly claim 17:
further comprising performing, by the processor, OCR processing on image data within the quadrilateral (column 2, starting at line 5, teaches the application of optical character recognition).

Claim 5, similarly claims 11 and 18:
training, by the processor, a machine learning model to recognize outer coordinates of patches within images using the quadrilateral as an input (column 3, starting at line 15, teaches training of one or more neural networks/machine learning models, not limited to the RPN, which is viewed as outside the patches).

Claim 6, similarly claims 12 and 19:
wherein the CNN, the RPN, or a combination thereof are components of a faster region-based convolutional neural network (Faster R-CNN) architecture executed by the processor (column 6, starting at line 50, teaches the use of Faster R-CNN).



identifying at least two horizontally-aligned text proposals or merged regions and merging them into a first region (column 2, starting at line 20, teaches consideration of segmentation of horizontal lines; column 3, starting at line 5, looks at labeling, segmentation, and localization to identify horizontal lines; column 6, starting at line 5; column 5, line 55; column 7, starting at line 29, teaches horizontal alignment by angle consideration);
identifying at least two vertically-aligned text proposals or merged regions and merging them into a second region (column 5, starting at line 60, teaches a box with vertical matter at 90 degrees); and
merging the first region and the second region (column 7, starting at line 40, especially line 60, teaches re-aligning text to a more typical orientation, where this method would increase the overall accuracy of the system).
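
For illustration only, the three limitations quoted above can be sketched as a horizontal merge, a vertical merge, and a final union. This sketch is not taken from Cao et al or from the application. The alignment tests, the tolerance, and the sample boxes are hypothetical choices made for the example.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def union(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def horizontally_aligned(a: Box, b: Box, tol: float = 1.0) -> bool:
    """Assumed criterion: top and bottom edges agree within tol (same row)."""
    return abs(a[1] - b[1]) <= tol and abs(a[3] - b[3]) <= tol


def vertically_aligned(a: Box, b: Box, tol: float = 1.0) -> bool:
    """Assumed criterion: left and right edges agree within tol (same column)."""
    return abs(a[0] - b[0]) <= tol and abs(a[2] - b[2]) <= tol


def merge_first_aligned_pair(
    boxes: List[Box], aligned: Callable[[Box, Box], bool]
) -> Optional[Box]:
    """Merge the first pair of boxes that satisfies the alignment test."""
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if aligned(boxes[i], boxes[j]):
                return union(boxes[i], boxes[j])
    return None


# Hypothetical proposals: two side by side, two stacked.
proposals = [
    (0.0, 0.0, 4.0, 2.0),
    (5.0, 0.0, 9.0, 2.0),
    (10.0, 3.0, 14.0, 5.0),
    (10.0, 6.0, 14.0, 8.0),
]
first_region = merge_first_aligned_pair(proposals, horizontally_aligned)
second_region = merge_first_aligned_pair(proposals, vertically_aligned)
merged = union(first_region, second_region)
```

The sketch merges only the first qualifying pair in each direction. Merging every aligned group would need a loop or a clustering step, which is omitted here.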

Claim 21, similarly claims 22 and 23:
wherein the merging comprises selecting the at least two text proposals from among all of the text proposals on the basis of the at least two text proposals having an x coordinate or y coordinate in common or respectively having adjacent x coordinates or y coordinates (the above teaches patches from the quadrilateral with text within; column 5, starting at lines 15 to 30, teaches the use of x, y coordinates to generate sets of line channels 114 and/or sets of word channels 116).
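
For illustration only, the selection criterion quoted above can be sketched as a pairwise test on box edges. This sketch is not taken from Cao et al or from the application. Treating "adjacent" as "edge coordinates at most one pixel apart" is an assumption made for the example, as are the function names and the sample boxes.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max) in pixels


def common_or_adjacent(a: Box, b: Box, max_gap: int = 1) -> bool:
    """True if any x edge or any y edge of a is equal or adjacent to one of b."""
    x_match = any(abs(p - q) <= max_gap for p in (a[0], a[2]) for q in (b[0], b[2]))
    y_match = any(abs(p - q) <= max_gap for p in (a[1], a[3]) for q in (b[1], b[3]))
    return x_match or y_match


def select_pairs(proposals: List[Box], max_gap: int = 1) -> List[Tuple[Box, Box]]:
    """Select every pair of proposals that satisfies the test above."""
    return [(a, b) for a, b in combinations(proposals, 2)
            if common_or_adjacent(a, b, max_gap)]


# Hypothetical proposals: the first two touch, the third is far away.
proposals = [(0, 0, 4, 2), (5, 0, 9, 2), (20, 10, 24, 12)]
pairs = select_pairs(proposals)
```

With these hypothetical boxes only the first two proposals are selected, since they share a y coordinate and have x edges one pixel apart.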


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yang et al (US 10,198,671) teaches Dense captioning with joint interference and visual context
SIGAL et al (US 2019/0138850) teaches WEAKLY-SUPERVISED SPATIAL CONTEXT NETWORKS
Cao et al (US 10,878,270) teaches Keypoint-based multi-label word segmentation and localization

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSUNG YIN TSAI whose telephone number is (571)270-1671.  The examiner can normally be reached on 5:30am - 3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached on (571)270-1051.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TSUNG YIN TSAI/Primary Examiner, Art Unit 2663