Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance. The prior art of record includes the following: An, et al. (US 2021/0011974 A1) disclose a system for extracting named entity information from full text, depicted in accordance with an illustrative embodiment. System 200 may be implemented as, for example, server computer 104 in FIG. 1. System 200 includes a sequence classifier 202 that extracts named entity field values 220 from a document 222. System 200 includes an optical character recognition module 224 that scans document 222 to extract text data 226. Sequence classifier 202 includes a machine learning module 204 that extracts named entity field values 220 from text data 226. Machine learning module 204 includes a tokenizer 206 that tokenizes (i.e., converts the text to numerical values) text data 226 to create a tokenized layer 208. Machine learning module 204 creates an embedded layer 210 from the tokenized layer 208 and creates a one-dimensional Xception layer 212 from the embedded layer 210. A number of sequence of classification layers 214 are created from one-dimensional Xception layer 212. In various embodiments, sequence of classification layers 214 includes one or more LSTM layers 216 and/or one or more fully connected neural networks (NN) 218. Machine learning module 204 extracts named entity field values 220 from the last layers provided in sequence of classification layers 214 (see, for example, paragraph 0036); Stark (US 2019/0385001) discloses extracting multiple different data types from the same input image. For example, an image 350 may be a receipt including a purchaser name, vendor name, vendor address, and tax identification (Tax ID). The image is sent to an OCR system 302, which may return an ordered set of characters (e.g., recognized characters ordered from left to right and top to bottom).
The ordered set of input characters may be encoded as numeric values, for example, and input to a plurality of neural networks trained to extract different data types. For example, the input characters may be received by encoder RNN 300 and masking layer 301, which have been trained to extract a purchaser name data type. Probabilities generated for each character may be selected and translated to output a purchaser name (if found). Similarly, the input characters may be received by encoder RNN 310 and masking layer 311, which have been trained to extract a vendor name data type. Probabilities generated for each character may be selected and translated to output a vendor name (if found). Likewise, a vendor address and taxID may be extracted from the input characters using encoder RNN 320 and masking layer 321, which have been trained to extract a vendor address data type, and encoder RNN 330 and masking layer 331, which have been trained to extract a taxID data type. Probabilities generated for each character may be selected and translated to output a vendor address and taxID, respectively (if found) (see, for example, paragraph 0021); and Tang, et al. (WO 2016/197381 A1) disclose methods and apparatus for recognizing text in an image. According to an embodiment, the method comprises encoding the image into a first sequence with a convolutional neural network (CNN), wherein the first sequence is an output from a last second convolutional layer of the CNN; decoding the first sequence with a recurrent neural network (RNN) into a second sequence, which has a same length as the first sequence and indicates estimated probabilities over all possible characters corresponding to each component in the first sequence; and mapping the second sequence into a word string in which repeated labels and non-character labels are removed (see, for example, the Abstract).
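For illustration only, the mapping step summarized from the Tang abstract (removing repeated labels and non-character labels from a per-position label sequence) can be sketched in Python as below. This is a minimal sketch and not code from any cited reference. The names `collapse_labels`, `greedy_decode`, and `BLANK` are hypothetical, and choosing the single most probable label at each position is an assumption made here for simplicity.

```python
from itertools import groupby

BLANK = "-"  # hypothetical symbol standing in for the non-character label


def collapse_labels(labels, blank=BLANK):
    """Merge runs of consecutive identical labels, then drop the blank label."""
    merged = [label for label, _ in groupby(labels)]
    return "".join(label for label in merged if label != blank)


def greedy_decode(probabilities, alphabet, blank=BLANK):
    """Assumed decoding rule: take the most probable label per position, then collapse."""
    best = [
        alphabet[max(range(len(row)), key=row.__getitem__)]
        for row in probabilities
    ]
    return collapse_labels(best, blank)
```

In this sketch a blank between two identical labels keeps them as separate characters, so `list("hh-e-ll-ll-oo")` collapses to `hello` rather than `helo`.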
The instant invention is directed to a computer-implemented method and a server for extracting textual entities from a structured document. Upon receiving, by the server: a plurality of text sequences having been extracted from an image by an optical character recognition (OCR) model having processed the image to generate the structured document; and a plurality of structural elements, each structural element being indicative of a location of a respective text sequence of the plurality of text sequences in the structured document, the instant invention encodes, by a first machine learning algorithm (MLA), the plurality of text sequences and the plurality of structural elements to obtain a 3D encoded image, the 3D encoded image being indicative of semantic characteristics of the plurality of text sequences, the 3D encoded image having a spatial structure of the structured document. Thereafter, the instant invention compresses, by a second MLA, the 3D encoded image to obtain an aggregated feature vector, the aggregated feature vector being indicative of a combination of spatial characteristics and semantic characteristics of the 3D encoded image; and decodes, by a third MLA, the aggregated feature vector to extract an associated set of textual entities, a given textual entity being associated with at least one text sequence in the plurality of text sequences.
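For illustration only, the data flow of the first two steps described above (text sequences and their locations laid out as a 3D array, which is then reduced to a single vector) can be sketched in Python as below. This is a toy sketch under stated assumptions and is not the applicant's implementation: the functions `embed`, `encode_3d`, and `compress` are hypothetical stand-ins for trained MLAs, a structural element is assumed to be a (row, column) grid cell, and the decoding step is omitted because it would depend on a trained model.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # assumed structural element: (row, column) of a text sequence


def embed(text: str, dim: int = 4) -> List[float]:
    """Toy stand-in for a learned text encoder (not a real MLA)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def encode_3d(texts: Sequence[str], cells: Sequence[Cell],
              height: int, width: int, dim: int = 4) -> List[List[List[float]]]:
    """Write each text embedding into its grid cell: height x width x dim."""
    image = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for text, (row, col) in zip(texts, cells):
        image[row][col] = embed(text, dim)
    return image


def compress(image: List[List[List[float]]]) -> List[float]:
    """Toy pooling: average each row over its columns, then concatenate the rows."""
    vector: List[float] = []
    for row in image:
        dim = len(row[0])
        vector.extend(sum(cell[d] for cell in row) / len(row) for d in range(dim))
    return vector


# Hypothetical OCR output for a two-row, two-column document layout.
texts = ["Vendor:", "ACME Corp", "Total:", "42.00"]
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
image = encode_3d(texts, cells, height=2, width=2)
vector = compress(image)
```

The first two axes of the array follow the document layout and the third axis carries the per-sequence text features, so the pooled vector in this sketch retains coarse row position together with text content.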
None of the prior art discloses or fairly suggests encoding, by the first MLA, the plurality of text sequences and the plurality of structural elements to obtain a 3D encoded image, the 3D encoded image being indicative of semantic characteristics of the plurality of text sequences, the 3D encoded image having a spatial structure of the structured document; compressing, by the second MLA, the 3D encoded image to obtain an aggregated feature vector, the aggregated feature vector being indicative of a combination of spatial characteristics and semantic characteristics of the 3D encoded image; and decoding, by the third MLA, the aggregated feature vector to extract an associated set of textual entities, a given textual entity being associated with at least one text sequence in the plurality of text sequences. It is for these reasons and in combination with all of the other elements of the claims that claims 1-20 are allowable over the above-identified prior art or any of the prior art of record.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Patent Application Publication Nos. 2019/0385001 and 2021/0011974; and WIPO document WO 2016/197381 A1.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL G MARIAM whose telephone number is (571)272-7394. The examiner can normally be reached M-F 7:30-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, EDWARD F URBAN, can be reached at 571-272-7899. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DANIEL G MARIAM/Primary Examiner, Art Unit 2665