DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.



Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.


Drawings
The drawings filed 6/25/2020 were accepted.



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because they are directed to an abstract idea without significantly more. The claims recite the abstract idea of analyzing an unstructured document, partitioning it into columns based on the location of the text, and extracting key/value pairs from the text based on their location on the document.
The limitations that describe determining the locations of words, recognizing the words, partitioning the text, and extracting key/value pairs are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. Aside from the digitization of the data using optical character recognition and the exporting of key/value pairs as a JSON (JavaScript Object Notation) document, nothing in the claims precludes the steps from practically being performed in the mind.
The judicial exception is not integrated into a practical application because the additional elements of using optical character recognition and exporting data as a JSON can be performed on a generic computer. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because digitization/OCR of document images and exporting data are not considered significantly more than just applying the steps of analyzing an unstructured document, partitioning it into columns based on the location of the text, and extracting key/value pairs from the text based on their location on the document. 
In addition to the abstract idea, the claims describe a computer, a processing unit, a non-transitory computer readable medium and program code, but they represent only well-understood, routine, conventional activity that can be performed on generic computers. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claims are not patent eligible.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.




Claims 1-2, 4-5, 9, 12-13, 15, 18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dejean et al (US 20160063322 A1; filed 9/3/2014).

With regards to claim 1, Dejean discloses a computer-implemented method for extracting text from an unstructured document (Dejean, title: “Method and system of extracting label:value data from a document”), the method comprising: determining respective spatial locations of a plurality of words identified in the unstructured document (Dejean, paragraph 60: “An OCR engine is applied to a scanned invoice 802. Characters and their position are extracted”); reconstructing a plurality of lines from the unstructured document based upon the determined spatial locations of the plurality of words (Dejean, paragraph 99: “A 2-step method is used to extract data: first all label:value data are extracted, then a ; generating a vertical partition for the unstructured document based upon the determined spatial locations of the plurality of words (Dejean, paragraph 19: “performing a layout analysis of the digital version of the document to generate one or more layout structures associated with the document, each layout structure including a plurality of structural elements vertically or horizontally aligned where each structural element is defined as a typographical box including one or more lines of textual elements associated with the digital version of the document;” The vertical separations between boxes are being interpreted as the partitions); and generating a plurality of key/value pairs from one or more of the plurality of lines, including, for each line among the one or more lines (Dejean, paragraph 19: “processing the one or more lines of textual elements to identify and tag textual elements associated with label and value pairwise data”), assigning one or more words located on one side of the vertical partition as a key for the key/value pair for the line and assigning one or more words located on another side of the vertical partition as a value for the key/value pair for the line (Dejean, paragraph 19: “extracting label and value pairwise data from the one or more label:value sequences of tagged elements.” Fig. 3 and fig. 4 are used as examples which both show documents where the labels are in a block to the left of the values).

With regards to claim 2, which depends on claim 1, Dejean discloses receiving the unstructured document as an image; and performing optical character recognition on the image to identify the plurality of words in the unstructured document (Dejean, paragraph 52: “Step A) Text extraction 804: optical character recognition (OCR) for document image”).


With regards to claim 4, Dejean discloses wherein reconstructing the plurality of lines includes processing each of the plurality of words by assigning such word to an existing line if such word vertically overlaps the existing line and creating a new line if such word does not overlap an existing line (Dejean, paragraph 19: “performing a layout analysis of the digital version of the document to generate one or more layout structures associated with the document, each layout structure including a plurality of structural elements vertically or horizontally aligned where each structural element is defined as a typographical box including one or more lines of textual elements associated with the digital version of the document”).
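As an illustrative sketch only (not taken from Dejean or from the application), the line-reconstruction limitation recited above can be pictured as follows. The `Word` and `Line` types, the field names, and the convention that the vertical coordinate grows downward are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    x: float       # left edge of the word (hypothetical field)
    top: float     # top edge; y is assumed to grow downward
    bottom: float  # bottom edge


@dataclass
class Line:
    top: float
    bottom: float
    words: list = field(default_factory=list)


def reconstruct_lines(words):
    """Assign each word to the first line it vertically overlaps, else start a new line."""
    lines = []
    for w in words:
        for line in lines:
            # Two vertical intervals overlap when each starts before the other ends.
            if w.top < line.bottom and w.bottom > line.top:
                line.words.append(w)
                # Grow the line's extent to cover the newly added word.
                line.top = min(line.top, w.top)
                line.bottom = max(line.bottom, w.bottom)
                break
        else:
            # No existing line overlapped this word, so create a new line.
            lines.append(Line(w.top, w.bottom, [w]))
    return lines
```

In this sketch a word is placed on the first overlapping line found; a different tie-breaking rule would be an equally valid reading of the limitation.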

With regards to claim 5, which depends on claim 4, Dejean discloses wherein reconstructing the plurality of lines includes determining a vertical coordinate for each line from the vertical coordinates of the words assigned to such line (Dejean, paragraph 60: “An OCR engine is applied to a scanned invoice 802. Characters and their position are extracted” Paragraph 65: “Several layout structures are recognized, including… linear layout structures… A linear structure corresponds to lines, table rows, and table columns.”).


With regards to claim 9, which depends on claim 1, Dejean discloses wherein words located to the left of the vertical partition are keys and words located to the right of the vertical partition are values (Dejean, paragraph 40: “Some positional constraints can be added… “amount” key phrases that indicate a total are required to appear “Above” or “LeftOf” a currency amount entity in order to be associated with it.”).
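As an illustrative sketch only (not taken from Dejean or from the application), the left/right assignment recited above can be pictured as follows. Representing a word as a `(text, x)` tuple and deciding its side by its left edge are assumptions made for the example.

```python
def split_key_value(line_words, partition_x):
    """Split one line into (key, value) around a vertical partition.

    line_words: list of (text, x) tuples, where x is the word's left edge.
    partition_x: horizontal position of the vertical partition.
    """
    # Read the words left to right regardless of input order.
    ordered = sorted(line_words, key=lambda w: w[1])
    # Words starting left of the partition form the key.
    key = " ".join(text for text, x in ordered if x < partition_x)
    # Words starting at or right of the partition form the value.
    value = " ".join(text for text, x in ordered if x >= partition_x)
    return key, value
```

For example, `split_key_value([("Total:", 10), ("42.00", 200)], 150)` yields the pair `("Total:", "42.00")`.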



Claim 15 recites substantially similar limitations to claim 5 and is thus rejected along the same rationale.

Claim 18 recites substantially similar limitations to claim 9 and is thus rejected along the same rationale.

Claim 20 recites substantially similar limitations to claim 1 and is thus rejected along the same rationale.




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 6-8, 10, 14, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Dejean, in view of Singh (US 20110249905 A1; filed 6/23/2011).

With regards to claim 3, which depends on claim 1, Dejean does not disclose after reconstructing the plurality of lines, sorting the plurality of lines based upon respective vertical coordinates associated with the plurality of lines and sorting the words in each line based upon respective horizontal coordinates associated with the words in each line.
However, Singh teaches after reconstructing the plurality of lines, sorting the plurality of lines based upon respective vertical coordinates associated with the plurality of lines and sorting the words in each line based upon respective horizontal coordinates associated with the words in each line (Singh, paragraph 261: “(i) Sort all canonical label value pairs extracted in raster order”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Singh such that the data from the extracted lines are sorted according to their position in the document. This would have enabled the invention to organize the data (Singh, paragraph 280: “After the above process, data identification system 830 organizes the data extracted from such multi-form images into a set of m records as indicated by the layout.”).
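As an illustrative sketch only (not taken from Dejean, Singh, or the application), the two-level sort recited in claim 3 amounts to ordering lines top to bottom and words left to right, which is the usual sense of raster order. The dictionary layout with a `"y"` coordinate per line and `(text, x)` tuples per word is an assumption made for the example.

```python
def sort_lines(lines):
    """Return lines ordered top to bottom, each with its words ordered left to right.

    lines: list of dicts of the form {"y": float, "words": [(text, x), ...]}.
    """
    # First level: order the lines by their vertical coordinate.
    ordered = sorted(lines, key=lambda ln: ln["y"])
    # Second level: order the words within each line by horizontal coordinate.
    return [
        {"y": ln["y"], "words": sorted(ln["words"], key=lambda w: w[1])}
        for ln in ordered
    ]
```

The function returns new lists and leaves its input unmodified.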

With regards to claim 6, which depends on claim 1, Dejean does not disclose wherein generating the vertical partition includes: organizing the plurality of words into text blocks; traversing the unstructured document to determine an amount of overlap of text blocks for each of a plurality of horizontal positions; and positioning the vertical partition at a horizontal position having a minimum amount of overlap.
However, Singh teaches wherein generating the vertical partition includes: organizing the plurality of words into text blocks; traversing the unstructured document to determine an amount of overlap of text blocks for each of a plurality of horizontal positions; and positioning the vertical partition at a horizontal position having a minimum amount of overlap (Singh, paragraph : “wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Singh such that text blocks were divided using the vertical gaps between them. This would have allowed the invention to identify columns (Singh, paragraph 299: “Grow columns within each candidate table formation using gaps in their vertical projections until an obstruction is hit. Break the table formation at that point (FIG. 30).”).
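As an illustrative sketch only (not taken from Dejean, Singh, or the application), the minimum-overlap search recited in claim 6 can be pictured as a scan across horizontal positions. The integer coordinates, the `(left, right, top, bottom)` block tuples, and the choice to scan only between the leftmost and rightmost text are assumptions made for the example.

```python
def find_vertical_partition(blocks):
    """Return (x, overlap) for the horizontal position crossed by the fewest text blocks.

    blocks: non-empty list of (left, right, top, bottom) tuples with integer coordinates.
    """
    # Restrict the scan to the horizontal extent of the text so that
    # empty page margins are not mistaken for the partition.
    x_start = min(b[0] for b in blocks)
    x_end = max(b[1] for b in blocks)
    best_x, best_overlap = x_start, None
    for x in range(x_start, x_end + 1):
        # Count the blocks that a vertical line at x would cut through.
        overlap = sum(1 for left, right, top, bottom in blocks if left <= x <= right)
        if best_overlap is None or overlap < best_overlap:
            best_x, best_overlap = x, overlap
    return best_x, best_overlap
```

With two blocks on the left ending at x = 60 and x = 80 and two on the right starting at x = 120, the scan settles on the first gap position, x = 81, where the overlap is zero.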

With regards to claim 7, which depends on claim 6, Dejean does not disclose wherein the amount of overlap at each horizontal position is determined based upon a number of text blocks overlapped at such horizontal position and a vertical dimension of each overlapping text block.
However, Singh teaches wherein the amount of overlap at each horizontal position is determined based upon a number of text blocks overlapped at such horizontal position and a vertical dimension of each overlapping text block (Singh, paragraph : “wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Singh such that text blocks were divided using the vertical gaps between them. This would have allowed the invention to identify columns (Singh, paragraph 299: “Grow columns within each candidate table formation using gaps in their vertical projections until an obstruction is hit. Break the table formation at that point (FIG. 30).”).
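As an illustrative sketch only (not taken from Dejean, Singh, or the application), one way to make the amount of overlap depend on both the number of overlapped text blocks and their vertical dimensions, as claim 7 recites, is to sum the heights of the blocks crossed at a given horizontal position. The `(left, right, top, bottom)` tuple layout and the use of a plain sum are assumptions made for the example.

```python
def overlap_amount(blocks, x):
    """Sum the heights of all text blocks that a vertical line at x passes through.

    blocks: list of (left, right, top, bottom) tuples; y is assumed to grow downward.
    """
    # Each crossed block contributes its own height, so the result grows
    # with both the number of blocks and their vertical dimensions.
    return sum(bottom - top for left, right, top, bottom in blocks if left <= x <= right)
```

Under this weighting a tall paragraph block counts for more than a single short line at the same horizontal position.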

With regards to claim 8, which depends on claim 6, Dejean does not disclose wherein positioning the vertical partition further includes positioning the vertical partition at a horizontal position having a minimum amount of overlap and having regions of higher overlap on each side of the horizontal position.
However, Singh teaches wherein positioning the vertical partition further includes positioning the vertical partition at a horizontal position having a minimum amount of overlap and having regions of higher overlap on each side of the horizontal position (Singh, paragraph : “wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached.” The vertical gaps are interpreted as vertical lines with 0 overlapping text blocks at the horizontal positions).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Singh such that text blocks were divided using the vertical gaps between them. This would have allowed the invention to identify columns (Singh, paragraph 299: “Grow columns within each candidate table formation using gaps in their vertical projections until an obstruction is hit. Break the table formation at that point (FIG. 30).”).
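As an illustrative sketch only (not taken from Dejean, Singh, or the application), the further condition of claim 8, a minimum with regions of higher overlap on each side, can be pictured as a search over an overlap profile. The `profile` list, indexed by horizontal position, is an assumed input for the example.

```python
def find_interior_minimum(profile):
    """Return the position of the lowest overlap that has higher overlap on both sides.

    profile: list of overlap amounts indexed by horizontal position.
    Returns None when no position qualifies (for example, a blank page).
    """
    best = None
    for x, amount in enumerate(profile):
        # A candidate must have strictly higher overlap somewhere to its left
        # and somewhere to its right, which excludes the page margins.
        higher_left = any(v > amount for v in profile[:x])
        higher_right = any(v > amount for v in profile[x + 1:])
        if higher_left and higher_right and (best is None or amount < profile[best]):
            best = x
    return best
```

For the profile `[0, 0, 3, 3, 1, 0, 0, 2, 2, 0]`, the zero-overlap margins at both ends are rejected and position 5, the first zero between the two text regions, is returned.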

With regards to claim 10, which depends on claim 9, Dejean does not disclose for a line among the plurality of lines having no words located to the left of the vertical partition, assigning one or more words to the right of the vertical partition as at least a part of a value for a key/value pair for a preceding line in the unstructured document.
However, Singh teaches for a line among the plurality of lines having no words located to the left of the vertical partition, assigning one or more words to the right of the vertical partition as at least a part of a value for a key/value pair for a preceding line in the unstructured document (Singh, paragraph 281: “A single table row can span multiple text lines in the image.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Singh such that the identified rows of the tables can consist of multiple text lines. This would have enabled the invention to handle more table formats appropriately for extraction (Singh, paragraph 281: “The strategy addresses the following problems with extracting data from tables… Conventional approaches to extract tables do not handle such wrapped tables in a robust manner.”).
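As an illustrative sketch only (not taken from Dejean, Singh, or the application), the wrapped-value handling recited in claim 10 can be pictured as follows: a line with nothing left of the partition is appended to the value of the preceding pair. The `(text, x)` word tuples and the space used to join wrapped text are assumptions made for the example.

```python
def build_pairs(lines, partition_x):
    """Build (key, value) pairs, folding key-less lines into the preceding value.

    lines: lines in top-to-bottom order, each a list of (text, x) tuples
           already sorted left to right.
    """
    pairs = []
    for words in lines:
        key = " ".join(text for text, x in words if x < partition_x)
        value = " ".join(text for text, x in words if x >= partition_x)
        if not key and pairs:
            # No words left of the partition: treat the line as a continuation
            # of the value from the preceding line.
            prev_key, prev_value = pairs[-1]
            pairs[-1] = (prev_key, (prev_value + " " + value).strip())
        else:
            pairs.append((key, value))
    return pairs
```

A two-line address, for instance, collapses into a single pair whose value holds both lines of text.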

Claim 14 recites substantially similar limitations to claim 3 and is thus rejected along the same rationale.

Claims 16-17 recite substantially similar limitations to claims 7-8 respectively and are thus rejected along the same rationales.

Claim 19 recites substantially similar limitations to claim 10 and is thus rejected along the same rationale.




Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Dejean, in view of Northrup et al (US 20190171875 A1; filed 12/1/2017).

With regards to claim 11, which depends on claim 1, Dejean does not disclose outputting the plurality of key/value pairs in a JavaScript Object Notation format.
However, Northrup et al teaches outputting the plurality of key/value pairs in a JavaScript Object Notation format (Northrup et al, paragraph 32: “Metadata 140 which defines digitized computational data may include, for example, characteristics metadata 145 such as content, position, and style, key-value-pairs metadata 146… The document digitization engine 120 may output metadata in a suitable markup e.g. JSON or XML, and in one embodiment may output metadata in a machine readable stylesheet representative of content of the document.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dejean and Northrup such that the extracted data is exported using JSON. This would have allowed the invention to convert unstructured data into a structured, machine-readable form (Northrup, Paragraph 91: "Embodiments herein process documents including unstructured documents to present computational data to the consumer in a structured format, e.g. JSON or XML.").
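As an illustrative sketch only (not taken from Dejean, Northrup, or the application), outputting key/value pairs in JSON, as claim 11 recites, can be done with the Python standard library. The function name `pairs_to_json` and the sample pairs are hypothetical.

```python
import json


def pairs_to_json(pairs):
    """Serialize a list of (key, value) tuples as a JSON object string."""
    # dict() keeps the last value when a key repeats; a real system would
    # need its own policy for duplicate keys.
    return json.dumps(dict(pairs), indent=2)


if __name__ == "__main__":
    print(pairs_to_json([("Name:", "Alice"), ("Total:", "42.00")]))
```

Flattening the pairs into a single JSON object is one design choice; an array of `{"key": ..., "value": ...}` records would preserve duplicate keys and document order.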



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lucas (US10902952B2): Teaches extracting information from a document by matching it to a template model.
Pisipati (US11216425B2): Teaches recognizing data in a table area from an unstructured document.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRODERICK C ANDERSON whose telephone number is (313)446-6566. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.C.A/Examiner, Art Unit 2178
/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178