DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
This Office action is in response to the amendment filed 11/30/2021.  Claims 1-5, 7-8, and 10-22 are pending in this application and have been considered below.  Claims 6 and 9 have been canceled by the applicant.

Applicant’s arguments with respect to claims 1-5, 7-8, and 10-20 have been considered but are moot in view of the new ground(s) of rejection necessitated by the amendments.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5, 7-8, 10, and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Mansfield et al. (US 2010/0174975 A1 – hereinafter “Mansfield”) in view of Messina et al. (US 2018/0336405 A1 – hereinafter “Messina”) further in view of Buisson et al. (US 2019/0171704 A1 – hereinafter “Buisson”).
Claim 1:
Mansfield discloses a method comprising: 
receiving a text of a document image that includes a plurality of words depicted in the document image (¶251 discloses receiving a portion of document that includes words and graphical elements); 
calculating a plurality of feature sets for the plurality of words (¶251 discloses that character elements within the document portion are processed to find words based on spatial (location coordinates) and font features of the characters), wherein each feature set contains features (¶251 discloses spatial and font features of characters of the corresponding word) of a corresponding word of the plurality of words (¶257 discloses that the location coordinates of words are then used to determine lines and paragraphs); 
identifying candidate table words (¶¶222, 223 discloses words within layouts) among the plurality of words based on the feature sets (¶¶282, 294 disclose identified words are organized into paragraphs based on the location coordinates of the words, and the paragraphs are grouped into layouts; ¶379 discloses wherein potential layouts (and therefore words within the layouts) that could be a table are added to a set S) (¶¶115, 125, 161 discloses a node graph/tree 430, 700, 1600); 
identifying, with a clustering procedure (¶382 discloses table conditions), a cluster of candidate table words that correspond to a table within the document image (¶382 discloses tables are identified among the layouts of the set S using table conditions that specify qualifying clustering conditions such as cell alignments and spacing); and 
defining a candidate table border of the table (¶92 and Fig. 2 teaches, “The incoming document is an unstructured document that has a collection of primitive elements that a human viewing the document would recognize as text, borders.” Mansfield teaches borders in numerous locations. For example, ¶120 and Fig. 5 discloses borders. ¶¶374, 382 discloses layouts are defined by implied borders, and the borders are combined into a table upon identifying the qualifying layouts, see Figure 58 and Figure 59, where layouts 5820 and 5825 and layout of zone 5810 are combined into a table with implied borders) that 
Mansfield discloses all of the subject matter as described above except for specifically teaching using a machine learning model.  However, Messina in the same field of endeavor teaches using a machine learning model (¶¶2, 36 discloses machine learning for digitizing and extracting meaning from objects).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mansfield and Messina before the effective filing date of the claimed invention.  The motivation for this combination of references would have been to improve decomposition of the document in a convolutional neural network to aid in data extraction from graphic objects in a machine learning method (Messina ¶¶2, 53).  This motivation for the combination of Mansfield and Messina is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).  
Mansfield discloses all of the subject matter as described above except for specifically teaching “encompasses the cluster of candidate table words … and one or more words of the text that are not candidate table words.” However, Buisson in the same field of endeavor teaches “encompasses the cluster of candidate table words … and one or more words of the text that are not candidate table words” (¶66 discloses “At step 303 … a dictionary-based approach may be applied in combination with a fuzzy match algorithm to find table header terms specified in a semantic groups definition file … a defined set of known table header terms,” i.e. candidate table words; ¶67 discloses “At step 304, a potential table data zone is identified, starting with the location of a detected table header (from step 303) and continuing across one or more pages of the input document(s) … classify data 12 that is part of the table.” i.e. non-candidate table words that are part of the table; ¶70 discloses “the candidate table is identified”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mansfield and Buisson before the effective filing date of the claimed invention.  The motivation for this combination of references would have been to improve detecting and extracting table data from input documents by using semantic analysis to identify table headers and raw text analysis to extract table data (Buisson ¶14).  This motivation for the combination of Mansfield and Buisson is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).  
Claims 2 and 13:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 1, wherein the features of the feature set include one or more text features selected from the group consisting of: orthographic properties of the corresponding word, syntactic properties of the corresponding word, and formatting properties of the corresponding word (Mansfield ¶257 discloses features of a word can be word spacing between words).
Claims 3 and 14:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 1, wherein the features of the feature set include one or more spatial features selected from the group consisting of: a nearby ruler line distance, a neighbor alignment measurement, and a neighbor distance measurement (Mansfield ¶257 discloses features of a word can be word spacing between words).
Claims 4 and 15:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 1, wherein the candidate table border is defined as the rectangle with the smallest area that contains the cluster of candidate table words (Mansfield ¶373 discloses that layouts are defined by implied borders of the smallest rectangle that contains all paragraphs in a layout; ¶382 and Figure 59 disclose that a table of qualifying layouts is defined by the implied borders of the layouts; therefore, the implied border of the table is the smallest rectangle that contains the words of the qualifying layouts; Buisson ¶66 discloses candidate table words; ¶67 discloses non-candidate table words that are part of the table; ¶70 discloses “the candidate table is identified”).  The motivation to combine for these claims is the same as for the independent claims.
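For illustration only, the "rectangle with the smallest area that contains the cluster" limitation can be sketched as an axis-aligned bounding-box computation over word boxes. This sketch is not taken from Mansfield or Buisson; the (left, top, right, bottom) box format and the function name are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical word-box format: (left, top, right, bottom) in page coordinates.
Box = Tuple[float, float, float, float]


def smallest_enclosing_rectangle(boxes: Iterable[Box]) -> Box:
    """Return the smallest axis-aligned rectangle containing every box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one word box is required")
    left = min(b[0] for b in boxes)    # leftmost edge of any word
    top = min(b[1] for b in boxes)     # topmost edge of any word
    right = max(b[2] for b in boxes)   # rightmost edge of any word
    bottom = max(b[3] for b in boxes)  # bottommost edge of any word
    return (left, top, right, bottom)
```

Under these assumptions, any word whose box falls inside the returned rectangle is enclosed by the border, whether or not it was part of the cluster used to compute it.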
Claim 5:
Mansfield, Messina, and Buisson discloses the method of claim 1, wherein the clustering procedure (Mansfield ¶382) is a 
 (¶55).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mansfield and Messina before the effective filing date of the claimed invention.  The motivation for this combination of references would have been to improve decomposition of the document in a convolutional neural network to aid in data extraction from graphic objects in a machine learning method (Messina ¶¶2, 53).  This motivation for the combination of Mansfield and Messina is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).  	
Claims 7 and 16:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 1, further comprising: predicting a reading order for at least a subset of the words of the text (Mansfield ¶315 discloses assigning a reading order that follows the expected order in which a human would read a column).
Claims 8 and 17:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 7, wherein the text further includes a location for the subset of words (Mansfield ¶314 discloses layout information includes location coordinates; ¶282 discloses words within the layout; ¶294), and (Mansfield ¶424 discloses  an assigned reading order of top to bottom text flow dictates that texts on top of other texts come before the other texts), or (ii) the second word is at the same height as the first word and is positioned to the right of the first word according to the location.
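For illustration only, a location-based reading order of the kind discussed for these claims can be sketched as a sort that runs top to bottom and then left to right among words at the same height. This sketch is not taken from Mansfield; the (text, x, y) word format, the downward-increasing y axis, and the function name are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical word format: (text, x, y), with y increasing down the page.
Word = Tuple[str, float, float]


def reading_order(words: List[Word]) -> List[str]:
    """Order words top to bottom, then left to right at the same height."""
    # Sort key: vertical position first, horizontal position second.
    ordered = sorted(words, key=lambda w: (w[2], w[1]))
    return [w[0] for w in ordered]
```

A practical implementation would likely compare heights within a tolerance rather than by exact equality, since words on the same line rarely share identical coordinates.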
Claims 10 and 18:
The combination of Mansfield, Messina, and Buisson discloses the method of claim 9, wherein the machine learning model is a recurrent neural network or a convolutional neural network (Messina ¶¶12-13, 41, 52-53, 75).  The motivation for this claim is the same as the independent claims.
Claim 12:
Mansfield discloses a system comprising: a processor; and a memory containing instructions that, when executed by the processor (¶482), cause the processor to…
The combination of Mansfield, Messina, and Buisson discloses the remaining elements recited in claim 12 for at least the reasons discussed in claim 1 above.
Claim 20:
Mansfield discloses a computer-readable medium containing instructions which, when executed by a processor (¶481, Claim 1), cause the processor to…

Allowable Subject Matter
Claims 11 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 21-22 are allowed.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ross Varndell whose telephone number is (571)270-1922.  The examiner can normally be reached on M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kim Vu, can be reached on (571)272-3859.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.

/Ross Varndell/Primary Examiner, Art Unit 2666