DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The closest prior art is Noce (Document Image Classification Combining Textual and Visual Features, Ph.D. Thesis, University of Insubria, 2016), Tomizawa et al. (US 2008/0170786 A1), Yang et al. (Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network, arXiv.org, Cornell University, 2017), and Bekas (US 2018/0204360 A1).

Regarding claim 1, Noce teaches a method comprising:
processing, by a computer, an input image of a document (Noce images in Fig. 4.1) to obtain textual information (Noce section 4.2.1 line 1 Textual information is extracted from each document image) and geometries of the textual information (Noce obvious from comparing embedded images in Fig. 4.3 to original images in Fig. 4.1);
converting, by the computer, the textual information to semantic information (Noce section 4.2.1 line 1 Textual information is extracted from each) utilizing a language knowledge base (Noce page 33 line 1 using Natural Language dictionaries);
generating, by the computer, a semantic image (Noce section 4.2.2 lines 5-6 create a specific visual color feature for each class key-word contained in the processed image), the generating comprising positioning the semantic information in the semantic image utilizing the geometries of the textual information (Noce obvious from comparing embedded images in Fig. 4.3 to original images in Fig. 4.1).
Noce does not teach
extracting, by the computer, regions from the semantic image, each of the regions representing a portion of the semantic information in the semantic image; 
mapping, by the computer, the regions extracted from the semantic image to data blocks on a text layer of the document image; and 
extracting, by the computer utilizing the regions extracted from the semantic image and the geometries of the textual information, text data from the data blocks on the text layer of the document image.

Another close prior art reference is Tomizawa et al. (US 2008/0170786 A1).

Regarding claim 1, Tomizawa et al. teaches a method comprising:
processing, by a computer, an input image of a document (Tomizawa [0022] image data of an inputted document) to obtain textual information (Tomizawa [0045] The character recognition module 113 has a function of) and geometries of the textual information (Tomizawa [0039] the document-area analysis module 102 analyzes, for example, positions and sizes of rectangular areas surrounding character strings);
converting, by the computer, the textual information to semantic information (Tomizawa Fig. 3 tables at the bottom); 
generating, by the computer, a semantic image (Tomizawa Fig. 4), the generating comprising positioning the semantic information in the semantic image utilizing the geometries of the textual information (Tomizawa [0040] semantics (e.g., "title", "header", "footer", and "paragraph") of the objects extracted by the object extracting unit 102 in a layout of the image data).
However, Tomizawa et al. do not teach
extracting, by the computer, regions from the semantic image, each of the regions representing a portion of the semantic information in the semantic image; 
mapping, by the computer, the regions extracted from the semantic image to data blocks on a text layer of the document image; and 
extracting, by the computer utilizing the regions extracted from the semantic image and the geometries of the textual information, text data from the data blocks on the text layer of the document image.
In a similar endeavor, Yang et al. and Bekas teach generating the semantic image using neural networks but still fail to teach
extracting, by the computer, regions from the semantic image, each of the regions representing a portion of the semantic information in the semantic image; 
mapping, by the computer, the regions extracted from the semantic image to data blocks on a text layer of the document image; and 
extracting, by the computer utilizing the regions extracted from the semantic image and the geometries of the textual information, text data from the data blocks on the text layer of the document image.

Therefore, independent claims 1, 8, and 15 are allowable.
Dependent claims 2-7, 9-14, and 16-20 are allowable for depending from a corresponding independent claim.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAID M ELNOUBI whose telephone number is (571) 272-9732.  The examiner can normally be reached Monday-Friday, 9:30AM to 6:00PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kathy Wang-Hurst, can be reached at 571-270-5371.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SAID M ELNOUBI/
Examiner, Art Unit 2644