DETAILED ACTION
	Claims 1-21 are rejected under 35 USC § 103.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Mukhopadhyay et al., U.S. PG-Publication No. 2020/0073878 A1, in view of Buisson et al., U.S. PG-Publication No. 2019/0171704 A1.

Claim 1
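	Mukhopadhyay discloses a method, comprising: identifying a plurality of word bounding boxes in a first electronic document, each word bounding box from the plurality of word bounding boxes associated with a word from a plurality of words in the first electronic document. Mukhopadhyay discloses a "method for extracting structured information from an implicit table" within a scanned document. Mukhopadhyay, ¶ 6. Figures 3A-3B illustrate an example method 300A-B embodiment including a step 304 of "identifying words in text of [an] input image document … with bounding boxes." Id. at ¶¶ 33; 41; Fig. 7.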
	Mukhopadhyay discloses the first electronic document including a table having a set of table cells positioned in a set of rows and a set of columns. The method includes "identifying rows and columns, such that the structure of the information from the implicit table may be categorized." Id. at ¶ 8.
	Mukhopadhyay discloses identifying, based on coordinates of the plurality of word bounding boxes, locations of horizontal white space between two adjacent rows from the set of rows. The information extraction method includes "identifying dominant rows of an implicit table." Id. at ¶ 11. In one embodiment, the method uses machine learning "to determine a set of features representing each of the dominant rows," wherein the comparing includes determining "a measure of similarity between each of the dominant rows." Id. at ¶ 47. Comparing individual features for each of the dominant rows "may include comparing the vertical separation of the dominant rows," i.e. "the spacing between each of [the] dominant rows." Id. at ¶ 63. Accordingly, the disclosed "vertical separation" between dominant rows is analogous to the claimed "horizontal white space between adjacent rows" of a table. Further, Mukhopadhyay discloses that "the rows may be each represented by two coordinates (x,y) in a two-dimensional space," and the coordinates are used to "find the distance between two rows." Id. at ¶ 57.
	Mukhopadhyay discloses determining, using a machine learning algorithm and based on (1) the locations of horizontal white space … a class from a set of classes for each row from the set of rows in the table. Mukhopadhyay discloses that "[m]achine learning may be performed to determine a set of features representing each of the dominant rows." Id. at ¶ 47. The set of features "may include row content type," indicating whether the row "has pure textual content or mixed," which "is helpful for finding column headers." Id. at ¶ 50. Specifically, "the features representing a row may include a header indicator" identified "by analyzing textual content and by analyzing the spacing of text that is vertically separated from the previous non-running text row." Accordingly, the disclosed "header indicator" classifies a given row as a header row.
	Mukhopadhyay discloses extracting a set of table cell values associated with the set of table cells based on, for each row from the set of rows, the class from the set of classes for that row. The placement of information in a column "may identify the information in the column as belonging to a category defined by a header of the column," and "the placement of information in a row may identify that the information in the row is related to each other." Further, the "method of extracting structured information from implicit tables may include identifying rows and columns, such that the structure of the information from the implicit table may be categorized" (i.e. classified). Id. at ¶¶ 7-8. 
	Mukhopadhyay discloses generating a second electronic document including the set of table cell values arranged in the set of rows and the set of columns and based on (1) the locations of horizontal white space, (2) locations of vertical white space, such that the plurality of words in the table are computer-readable in the second electronic document. Mukhopadhyay discloses that the categorization of information from implicit tables "makes it possible to automatically populate a two-dimensional data structure with the structured information from the implicit table." Id. at ¶¶ 8; 77. First, the extraction is based on "comparing the vertical separation of the dominant rows," (i.e. locations of horizontal white space). Id. at ¶ 63. Second, the extraction is based on "comparing spatial positions of white spaces of the dominant rows with one another" for "ensuring accuracy in locating columns by finding a consensus among the dominant rows." Id. at ¶¶ 35-38; See Also FIG. 3A-B (steps 318-322). The extracted information is used to populate a data structure "into a useful format, such as a relational database" (i.e. second electronic document). Id. at ¶¶ 6; 77; 79.
	Mukhopadhyay does not expressly disclose determining, using a Natural Language Processing algorithm, an entity name from a plurality of entity names for each table cell from the set of table cells; and determining, using a machine learning algorithm and based on (1) the locations of horizontal white space and (2) the plurality of entity names, a class from a set of classes for each row from the set of rows in the table.
	Buisson discloses determining, using a Natural Language Processing algorithm, an entity name from a plurality of entity names for each table cell from the set of table cells. Buisson discloses a method "for detecting and extracting table data from documents" that uses "semantic groups of table header terms to identify table headers." Buisson, ¶ 3. The method detects header terms using a "header detection module 111" that "scans each line of the input document 10 to look for identified common semantic terms 108 which are typically used for table headers in a target domain." The semantic terms include entity names (e.g. result name, test, test name, tests, or determination). Id. at ¶ 26.
	Buisson discloses determining, using a machine learning algorithm and based on (1) the locations of horizontal white space and (2) the plurality of entity names, a class from a set of classes for each row from the set of rows in the table. The detected header 41 "is a line of text" (i.e. a row in the table) "listing the detected header terms." Id. at ¶ 47; FIG. 4. Figure 8 illustrates an exemplary method 300 "for extracting table data from input documents" which may be performed by "an IBM Watson cognitive machine" using "a combination of semantics and character location techniques to identify and extract simple tables" from a document. At step 303, "each line of the input document is scanned to detect table header entries," wherein each word in a line is compared "to a defined set of known table header terms." Id. at ¶¶ 63-66. Accordingly, Buisson uses textual term matching (e.g. natural language processing, named entity recognition) to classify a particular line of text (e.g. row of table) as a header if terms (i.e. entities) are recognized.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the header row determination process of Mukhopadhyay to incorporate determining table rows by scanning table text for known table header terms as taught by Buisson. One of ordinary skill in the art would be motivated to integrate scanning table text for known table header terms into Mukhopadhyay, with a reasonable expectation of success, in order to use "semantic groups of table header terms to identify table headers," wherein the "semantic grouping technique creates a highly accurate targeted table type detection capability" (i.e. to increase table extraction accuracy). See Buisson, ¶ 3.
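	For illustration only, and not as a characterization of the actual implementation of any cited reference or of the claimed invention, the limitation of locating horizontal white space between adjacent rows from word bounding box coordinates can be sketched as follows. The box layout (x0, y0, x1, y1), the function name row_gaps, and the min_gap threshold are assumptions introduced for this sketch.

```python
from typing import List, Tuple

# Hypothetical word box layout: (x0, y0, x1, y1), with y increasing down the page.
Box = Tuple[float, float, float, float]

def row_gaps(boxes: List[Box], min_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (top, bottom) spans of horizontal white space between rows."""
    # Sort the vertical extents by top edge, then sweep down the page.
    spans = sorted((y0, y1) for _, y0, _, y1 in boxes)
    gaps: List[Tuple[float, float]] = []
    if not spans:
        return gaps
    current_bottom = spans[0][1]
    for top, bottom in spans[1:]:
        # A gap exists when the next box starts well below everything seen so far.
        if top - current_bottom >= min_gap:
            gaps.append((current_bottom, top))
        current_bottom = max(current_bottom, bottom)
    return gaps

# Two rows of two words each; the only white-space band lies between them.
words = [(10, 10, 50, 20), (60, 11, 90, 21), (10, 30, 40, 40), (60, 31, 95, 41)]
gaps = row_gaps(words)
```

	In this sketch the band between the two rows is reported as a single (top, bottom) span; a row classifier could take the height of each span as one input feature.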

Claim 2
	Mukhopadhyay discloses wherein the first electronic document is a scanned image. Mukhopadhyay discloses that the method extracts structured information "from a scanned … document." Mukhopadhyay, ¶ 6.

Claim 3
	Mukhopadhyay discloses receiving the first electronic document from an optical character recognition program. Mukhopadhyay discloses that "the method of extracting structured information can be performed with … optical character recognition." Mukhopadhyay, ¶¶ 11; 41.

Claim 5
	Mukhopadhyay discloses wherein the first electronic document includes a plurality of tables including the table. Mukhopadhyay discloses that "the dominant table may include the main and/or largest table of an input image document." Mukhopadhyay, ¶ 42.

Claim 6
	Mukhopadhyay discloses wherein the set of classes includes at least one of Header, Row, Partial Row, or Table End. Mukhopadhyay discloses that "the features representing a row may include a header indicator." Mukhopadhyay, ¶ 55. The header indicator classifies the row as at least a "Header." 

Claim 7
	Buisson discloses wherein the second electronic document is in a format of Comma-Separated Values (CSV), Excel, or JavaScript Object Notation (JSON). Buisson extracts the table information into a "predetermined output table format, such as a comma-separated value (CSV) file." Buisson, ¶ 52.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mukhopadhyay et al., U.S. PG-Publication No. 2020/0073878 A1, in view of Buisson et al., U.S. PG-Publication No. 2019/0171704 A1, further in view of Goulbev et al., U.S. PG-Publication No. 2015/0058374 A1.

Claim 4
	Buisson discloses the first electronic document includes a first page and a second page. The tables include "tables that span multiple pages" and the method can identify data zones in "tables that cross over … multiple pages." Buisson, ¶¶ 25, 45, 67, FIG. 8 (Step 304).
	Mukhopadhyay-Buisson does not expressly disclose the method further includes generating a third electronic document by appending a vertical coordinate of the second page with a vertical coordinate of the first page, the third electronic document having one page including the plurality of words from the first page and the second page.
	Goulbev discloses the method further includes generating a third electronic document by appending a vertical coordinate of the second page with a vertical coordinate of the first page, the third electronic document having one page including the plurality of words from the first page and the second page. Goulbev discloses a "system for capturing data from a document image" using a "flexible structure description" comprising fields. Goulbev, ¶¶ 21-22. Field data may include a table. Id. at ¶¶ 27; 45. Repetitive structure properties are defined, including "a particular row of a table," and "a column title of a multi-page table," and "repetitive tables in which data creeps over to the next page(s) mid-table." Id. at ¶¶ 48-52; 54; 63; See Also FIGS. 2A-B ("example of a multi-page document that contains a table without separator lines"). Two document coordinate systems are used: a local system of coordinates (bound to a particular page) and a global coordinate system (goes through the entire document). Id. at ¶ 42. Goulbev discloses that the "use of a multi-page sheet (global coordinate system) together with the images of individual pages (local coordinate system) makes it possible to solve tasks as complex as capturing data from documents with multi-page tables that have non-regular structures." Id. at ¶ 71. Accordingly, Goulbev discloses generating a multi-page sheet (i.e. third document) that appends the coordinates of subsequent pages together for use in a global coordinate system.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the table extraction method using word bounding box coordinates of Mukhopadhyay-Buisson to incorporate the multi-page sheet using a global coordinate system as taught by Goulbev. One of ordinary skill in the art would be motivated to integrate the multi-page sheet using a global coordinate system into Mukhopadhyay-Buisson, with a reasonable expectation of success, in order to ensure the tables in a document "are interpreted correctly even when elements are located at different pages." See Goulbev, ¶ 43.
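	For illustration only, and not as a statement of how Goulbev or the claimed invention is implemented, the mapping from page-local vertical coordinates to a single global coordinate system can be sketched as follows. The Word tuple layout, the function name to_global, and the page heights are assumptions introduced for this sketch.

```python
from typing import List, Tuple

# Hypothetical word record: (text, y_top, y_bottom) in page-local coordinates.
Word = Tuple[str, float, float]

def to_global(pages: List[List[Word]], page_heights: List[float]) -> List[Word]:
    """Stack pages vertically so every word shares one coordinate system."""
    merged: List[Word] = []
    offset = 0.0  # cumulative height of all preceding pages
    for words, height in zip(pages, page_heights):
        for text, y0, y1 in words:
            merged.append((text, y0 + offset, y1 + offset))
        offset += height
    return merged

# A word near the bottom of page 1 and a word near the top of page 2.
pages = [[("Item", 700.0, 712.0)], [("Total", 40.0, 52.0)]]
merged = to_global(pages, [792.0, 792.0])
```

	Under these assumptions the second-page word lands below the first-page word on the single combined page, so a table that continues across the page break can be processed as one contiguous region.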


Claims 8-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mukhopadhyay et al., U.S. PG-Publication No. 2020/0073878 A1, in view of Buisson et al., U.S. PG-Publication No. 2019/0171704 A1, further in view of Duta, U.S. PG-Publication No. 2019/0340240 A1.

Claim 8
	Mukhopadhyay discloses an apparatus, comprising: a memory; and a processor operatively coupled to the memory, the processor configured to: receive a first electronic document. Mukhopadhyay discloses a "method for extracting structured information from an implicit table" within a scanned document. Mukhopadhyay, ¶ 6.
	Mukhopadhyay discloses identify … a plurality of word bounding boxes, each word bounding box from the plurality of word bounding boxes associated with a word from a plurality of words in the first electronic document. Figures 3A-3B illustrate an example method 300A-B embodiment including a step 304 of "identifying words in text of [an] input image document … with bounding boxes." Id. at ¶¶ 33; 41; Fig. 7.
	Mukhopadhyay discloses determine, based on locations of the plurality of word bounding boxes, white space in a horizontal dimension of the first electronic document to identify a plurality of rows of a table included in the first electronic document, each row from the plurality of rows (1) including a set of words from the plurality of words and (2) being divided into a plurality of cells separated by the white space in the horizontal dimension. The information extraction method includes "identifying dominant rows of an implicit table." Id. at ¶ 11. In one embodiment, the method uses machine learning "to determine a set of features representing each of the dominant rows," wherein the comparing includes determining "a measure of similarity between each of the dominant rows." Id. at ¶ 47. Comparing individual features for each of the dominant rows "may include comparing the vertical separation of the dominant rows," i.e. "the spacing between each of [the] dominant rows." Id. at ¶ 63. Accordingly, the disclosed "vertical separation" between dominant rows is analogous to the claimed "horizontal white space between adjacent rows" of a table. Further, Mukhopadhyay discloses that "the rows may be each represented by two coordinates (x,y) in a two-dimensional space," and the coordinates are used to "find the distance between two rows." Id. at ¶ 57.
	Mukhopadhyay discloses determine, using a machine learning algorithm, a class from a set of classes for each row from the plurality of rows in the table. Mukhopadhyay discloses that "[m]achine learning may be performed to determine a set of features representing each of the dominant rows." Id. at ¶ 47. The set of features "may include row content type," indicating whether the row "has pure textual content or mixed," which "is helpful for finding column headers." Id. at ¶ 50. Specifically, "the features representing a row may include a header indicator" identified "by analyzing textual content and by analyzing the spacing of text that is vertically separated from the previous non-running text row." Accordingly, the disclosed "header indicator" classifies a given row as a header row.
	Mukhopadhyay discloses generate a second electronic document based on … (3) the plurality of rows of the table, (4) the plurality of columns of the table. The placement of information in a column "may identify the information in the column as belonging to a category defined by a header of the column," and "the placement of information in a row may identify that the information in the row is related to each other." Further, the "method of extracting structured information from implicit tables may include identifying rows and columns, such that the structure of the information from the implicit table may be categorized" (i.e. classified). Id. at ¶¶ 7-8. 
	Mukhopadhyay discloses generate a second electronic document based on (5) the white space in the horizontal dimension, and (6) white space in a vertical dimension such that the plurality of words in the table are computer-readable in the second electronic document. Mukhopadhyay discloses that the categorization of information from implicit tables "makes it possible to automatically populate a two-dimensional data structure with the structured information from the implicit table." Id. at ¶¶ 8; 77. First, the extraction is based on "comparing the vertical separation of the dominant rows," (i.e. locations of horizontal white space). Id. at ¶ 63. Second, the extraction is based on "comparing spatial positions of white spaces of the dominant rows with one another" for "ensuring accuracy in locating columns by finding a consensus among the dominant rows." The method generates a "column separator line" based on comparing white spaces within the dominant rows (i.e. comparing vertical white space). Id. at ¶¶ 35-38; See Also FIG. 3A-B (steps 318-322). The extracted information is used to populate a data structure "into a useful format, such as a relational database" (i.e. second electronic document). Id. at ¶¶ 6; 77; 79.
	Mukhopadhyay does not expressly disclose extract, for each row from a subset of rows from the plurality of rows and based on the class from the set of classes for that row, a first set of word values associated with a set of words included in that row; and generate a second electronic document based on (1) the first set of word values in each row from the subset of rows, (2) a second set of word values associated with a set of words included in remaining rows from the plurality of rows … the remaining rows from the plurality of rows being different from the subset of rows from the plurality of rows.
	Buisson discloses extract, for each row from a subset of rows from the plurality of rows and based on the class from the set of classes for that row, a first set of word values associated with a set of words included in that row. Buisson discloses a method "for detecting and extracting table data from documents" that uses "semantic groups of table header terms to identify table headers." Buisson, ¶ 3. The method detects header terms using a "header detection module 111" that "scans each line of the input document 10 to look for identified common semantic terms 108 which are typically used for table headers in a target domain." The semantic terms include entity names (e.g. result name, test, test name, tests, or determination). Id. at ¶ 26.
	Buisson discloses generate a second electronic document based on (1) the first set of word values in each row from the subset of rows, (2) a second set of word values associated with a set of words included in remaining rows from the plurality of rows … the remaining rows from the plurality of rows being different from the subset of rows from the plurality of rows. After detecting a table header location, the method "identifies a potential table data zone" by applying a "white space correlation function." Figure 5 illustrates "how extracted data zone columns 51a, 52a, 53a, 54a, 54b, 55a, 56a are aligned with closed header columns 51-56 for output." Id. at ¶¶ 45-51. Accordingly, Buisson discloses correlating (i.e. associating) first word values in the subset of header rows (e.g. 51-56) with second word values (e.g. 51a, 52a, 53a, 54a, 54b, 55a, 56a) in the data zone (i.e. remaining rows different from the subset of header rows).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the header row determination process of Mukhopadhyay to incorporate determining table rows by scanning table text for known table header terms as taught by Buisson. One of ordinary skill in the art would be motivated to integrate scanning table text for known table header terms into Mukhopadhyay, with a reasonable expectation of success, in order to use "semantic groups of table header terms to identify table headers," wherein the "semantic grouping technique creates a highly accurate targeted table type detection capability" (i.e. to increase table extraction accuracy). See Buisson, ¶ 3.
	Mukhopadhyay-Buisson does not expressly disclose receiving information on a character bounding box from a plurality of character bounding boxes and for each character from a plurality of characters in the first electronic document; and identifying, based on the plurality of character bounding boxes, a plurality of word bounding boxes.
	Duta discloses receiving information on a character bounding box from a plurality of character bounding boxes and for each character from a plurality of characters in the first electronic document; and identifying, based on the plurality of character bounding boxes, a plurality of word bounding boxes. Duta discloses a table extractor method that "applies a statistics based unsupervised learning process to identify a plurality of table candidates based … on tokens generated from the individual characters within the document and positions and alignments of those characters and tokens." The table extractor method begins "by identifying each character in that document and a positional bounding box for each of those characters," and then "converts the identified characters into a plurality of tokens," wherein the "tokens comprise a group of one or more adjacent characters having approximately the same linear position within the document based on their bounding boxes." Duta, ¶¶ 5-6; 36-37.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the table extraction method using word bounding boxes of Mukhopadhyay-Buisson to incorporate the table extraction method using character bounding boxes as taught by Duta. One of ordinary skill in the art would be motivated to integrate character bounding boxes into Mukhopadhyay-Buisson, with a reasonable expectation of success, in order to "operate on arbitrary documents having any combination of handwritten characters and machine-generated characters" and to use character positions to extract tables "without any requirement to consider existing table definitions, existing table lines that differentiate rows, columns, and/or cells." See Duta, ¶ 31.
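	For illustration only, and not as a description of Duta's actual tokenization process, deriving word bounding boxes from character bounding boxes can be sketched as follows. The sketch assumes all characters lie on a single text line; the box layout (x0, y0, x1, y1), the function name chars_to_words, and the max_gap threshold are assumptions introduced for this sketch.

```python
from typing import List, Tuple

# Hypothetical character box layout: (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]

def chars_to_words(chars: List[Box], max_gap: float = 3.0) -> List[Box]:
    """Merge character boxes on one text line into word boxes."""
    words: List[Box] = []
    # Sweep left to right; extend the current word while characters stay adjacent.
    for x0, y0, x1, y1 in sorted(chars):
        if words and x0 - words[-1][2] <= max_gap:
            wx0, wy0, wx1, wy1 = words[-1]
            words[-1] = (wx0, min(wy0, y0), max(wx1, x1), max(wy1, y1))
        else:
            words.append((x0, y0, x1, y1))
    return words

# Four characters: two close together, a wide gap, then two more.
chars = [(0, 0, 5, 10), (6, 0, 11, 10), (20, 0, 25, 10), (26, 1, 31, 11)]
word_boxes = chars_to_words(chars)
```

	Each resulting word box is the union of its adjacent character boxes, which is the form of input the word-level row and column analysis discussed above would consume.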

Claim 9
	Mukhopadhyay discloses wherein the first electronic document is a scanned image. Mukhopadhyay discloses that the method extracts structured information "from a scanned … document." Mukhopadhyay, ¶ 6.

Claim 10
	Mukhopadhyay discloses wherein the processor is configured to receive the first electronic document from an optical character recognition program. Mukhopadhyay discloses that "the method of extracting structured information can be performed with … optical character recognition." Mukhopadhyay, ¶¶ 11; 41.

Claim 12
	Mukhopadhyay discloses wherein the first electronic document includes a plurality of tables including the table. Mukhopadhyay discloses that "the dominant table may include the main and/or largest table of an input image document." Mukhopadhyay, ¶ 42.

Claim 13
	Mukhopadhyay discloses wherein the set of classes includes at least one of Header, Row, Partial Row, or Table End. Mukhopadhyay discloses that "the features representing a row may include a header indicator." Mukhopadhyay, ¶ 55. The header indicator classifies the row as at least a "Header."

Claim 14
	Buisson discloses wherein the second electronic document is in a format of Comma-Separated Values (CSV), Excel, or JavaScript Object Notation (JSON). Buisson extracts the table information into a "predetermined output table format, such as a comma-separated value (CSV) file." Buisson, ¶ 52.

Claim 15
	Duta discloses wherein: the set of classes includes at least one of Header, Row, Partial Row, or Table End; each row from the subset of rows has a class of Row; and each row from the remaining rows has a class of Partial Row. Duta discloses a Table Extractor that "computes a horizontal grid of cell separators … based on … the computed inter-row separations." Duta, ¶ 102. In one embodiment, the Table Extractor "merges rows of the horizontal grid … to remove tokens that … appear to be outside of the cells of a table." An example is illustrated in Figure 4, wherein the third row of the horizontal grid has "only a single non-empty cell in that row." The Table Extractor operates to "merge (420) the third row with the second row of the table candidate 400 because of the small distance to the token above in view of statistical differences in spacing or difference between tokens in the remainder of the table candidate." Id. at ¶¶ 105-107. Accordingly, in the example the second row is classified as a "Row" and the third row is classified as a "Partial Row" because it has only a single non-empty cell in that row. The classification of a row with a single non-empty cell causes the Table Extractor method to merge (i.e. associate) the word value of the third row into the word value of the second row located above it.

Claim 16
	Buisson discloses wherein the processor is configured to determine, using a Natural Language Processing algorithm, an entity name from a plurality of entity names for each cell from the plurality of cells in each row from the plurality of rows; and the processor is configured to determine the set of classes by using the plurality of entity names. The detected header 41 "is a line of text" (i.e. a row in the table) "listing the detected header terms." Buisson, ¶ 47; FIG. 4. Figure 8 illustrates an exemplary method 300 "for extracting table data from input documents" which may be performed by "an IBM Watson cognitive machine" using "a combination of semantics and character location techniques to identify and extract simple tables" from a document. At step 303, "each line of the input document is scanned to detect table header entries," wherein each word in a line is compared "to a defined set of known table header terms." Id. at ¶¶ 63-66. Accordingly, Buisson uses textual term matching (e.g. natural language processing, named entity recognition) to classify a particular line of text (e.g. row of table) as a header if terms (i.e. entities) are recognized.

Claim 17
	[NON-FUNCTIONAL DESCRIPTIVE MATERIAL] This limitation has no patentable weight, because it is directed to non-functional descriptive material. This limitation merely conveys meaning of the "entity name" to a human reader independent of the computer. There is no functional relationship between the claimed system and the "entity name" because the functional programming of the system does not change based on what the "entity name" represents; these limitations merely describe the data being processed to a human reader. See MPEP 2111.05.
	Buisson discloses wherein the plurality of entity names includes Date, Person, Location, Organization, Number, Money, Cardinal, or No Entity. The method detects header terms using a "header detection module 111" that "scans each line of the input document 10 to look for identified common semantic terms 108 which are typically used for table headers in a target domain." The semantic terms include entity names (e.g. result name, test, test name, tests, or determination). Buisson, ¶ 26.

Claim 18
	Mukhopadhyay discloses wherein the machine learning algorithm includes a set of weight values indicating probabilities of transition from one class from the set of classes to another class from the set of classes. Mukhopadhyay discloses that the method of extracting structured information includes "determining that each of the dominant rows is part of the cluster of rows" (i.e. rows are classified by cluster membership). Similar lines are given higher "membership score," and the "weighting of the line representation by membership score ensures that the bad members of the clusters are given less weightage while determining the column guard lines." Mukhopadhyay, ¶¶ 65-66.

Claim 19
	Buisson discloses wherein the table spans across multiple pages in the first electronic document. The tables include "tables that span multiple pages" and the method can identify data zones in "tables that cross over … multiple pages." Buisson, ¶¶ 25, 45, 67, FIG. 8 (Step 304).

Claim 20
	Mukhopadhyay discloses wherein the processor is configured to determine, based on (1) the locations of the plurality of word bounding boxes and (2) the set of classes for the plurality of rows, the white space in the vertical dimension of the first electronic document. Mukhopadhyay discloses that the table extraction is based on "comparing spatial positions of white spaces of the dominant rows with one another" for "ensuring accuracy in locating columns by finding a consensus among the dominant rows." The method generates a "column separator line" based on comparing white spaces within the dominant rows (i.e. comparing vertical white space). Mukhopadhyay, ¶¶ 35-38; See Also FIG. 3A-B (steps 318-322).


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Mukhopadhyay et al., U.S. PG-Publication No. 2020/0073878 A1, in view of Buisson et al., U.S. PG-Publication No. 2019/0171704 A1, further in view of Duta, U.S. PG-Publication No. 2019/0340240 A1, further in view of Goulbev et al., U.S. PG-Publication No. 2015/0058374 A1.

Claim 11
	Buisson discloses the first electronic document includes a first page and a second page. The tables include "tables that span multiple pages" and the method can identify data zones in "tables that cross over … multiple pages." Buisson, ¶¶ 25, 45, 67, FIG. 8 (Step 304).
	Mukhopadhyay-Buisson-Duta does not expressly disclose the processor is configured to generate a third electronic document by appending a vertical coordinate of the second page with a vertical coordinate of the first page, the third electronic document having one page including the plurality of characters from the first page and the second page.
	Goulbev discloses the processor is configured to generate a third electronic document by appending a vertical coordinate of the second page with a vertical coordinate of the first page, the third electronic document having one page including the plurality of characters from the first page and the second page. Goulbev discloses a "system for capturing data from a document image" using a "flexible structure description" comprising fields. Goulbev, ¶¶ 21-22. Field data may include a table. Id. at ¶¶ 27; 45. Repetitive structure properties are defined, including "a particular row of a table," and "a column title of a multi-page table," and "repetitive tables in which data creeps over to the next page(s) mid-table." Id. at ¶¶ 48-52; 54; 63; See Also FIGS. 2A-B ("example of a multi-page document that contains a table without separator lines"). Two document coordinate systems are used: a local system of coordinates (bound to a particular page) and a global coordinate system (goes through the entire document). Id. at ¶ 42. Goulbev discloses that the "use of a multi-page sheet (global coordinate system) together with the images of individual pages (local coordinate system) makes it possible to solve tasks as complex as capturing data from documents with multi-page tables that have non-regular structures." Id. at ¶ 71. Accordingly, Goulbev discloses generating a multi-page sheet (i.e. third document) that appends the coordinates of subsequent pages together for use in a global coordinate system.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the table extraction method using word bounding box coordinates of Mukhopadhyay-Buisson-Duta to incorporate the multi-page sheet using a global coordinate system as taught by Goulbev. One of ordinary skill in the art would be motivated to integrate the multi-page sheet using a global coordinate system into Mukhopadhyay-Buisson-Duta, with a reasonable expectation of success, in order to ensure the tables in a document "are interpreted correctly even when elements are located at different pages." See Goulbev, ¶ 43.


Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Mukhopadhyay et al., U.S. PG-Publication No. 2020/0073878 A1, in view of Duta, U.S. PG-Publication No. 2019/0340240 A1.

Claim 21
	Mukhopadhyay discloses a processor-readable non-transitory medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to receive a first electronic document. Mukhopadhyay discloses a "method for extracting structured information from an implicit table" within a scanned document. Mukhopadhyay, ¶ 6.
	Mukhopadhyay discloses identify … based on the locations of the plurality of character bounding boxes, a plurality of word bounding boxes, each word bounding box from the plurality of word bounding boxes associated with a word from a plurality of words in the first electronic document. Figures 3A-3B illustrate an example method 300A-B embodiment including a step 304 of "identifying words in text of [an] input image document … with bounding boxes." Id. at ¶¶ 33; 41; Fig. 7.
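	Mukhopadhyay discloses the first electronic document including a table having a set of table cells positioned in a set of rows and a set of columns. The method includes "identifying rows and columns, such that the structure of the information from the implicit table may be categorized." Id. at ¶ 8.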
	Mukhopadhyay discloses identify, based on coordinates of the plurality of word bounding boxes, locations of horizontal white space between two adjacent rows from the set of rows. The information extraction method includes "identifying dominant rows of an implicit table." Id. at ¶ 11. In one embodiment, the method uses machine learning "to determine a set of features representing each of the dominant rows," wherein the comparing includes determining "a measure of similarity between each of the dominant rows." Id. at ¶ 47. Comparing individual features for each of the dominant rows "may include comparing the vertical separation of the dominant rows," i.e. "the spacing between each of [the] dominant rows." Id. at ¶ 63. Accordingly, the disclosed "vertical separation" between dominant rows is analogous to the claimed "horizontal white space between adjacent rows" of a table. Further, Mukhopadhyay discloses that "the rows may be each represented by two coordinates (x,y) in a two-dimensional space," and the coordinates are used to "find the distance between two rows." Id. at ¶ 57.
	Mukhopadhyay discloses determine, using a machine learning algorithm and based on the locations of horizontal white space, a class from a set of classes for each row from the set of rows in the table. Mukhopadhyay discloses that "[m]achine learning may be performed to determine a set of features representing each of the dominant rows." Id. at ¶ 47. The set of features "may include row content type," indicating whether the row "has pure textual content or mixed," which "is helpful for finding column headers." Id. at ¶ 50. Specifically, "the features representing a row may include a header indicator" identified "by analyzing textual content and by analyzing the spacing of text that is vertically separated from the previous non-running text row." Accordingly, the disclosed "header indicator" classifies a given row as a header row.
	Mukhopadhyay discloses extract a set of table cell values associated with the set of table cells; and generate a second electronic document including the set of table cell values arranged in the set of rows and the set of columns and based on (1) the locations of horizontal white space, (2) locations of vertical white space, such that the plurality of words in the table are computer-readable in the second electronic document. Mukhopadhyay discloses that the categorization of information from implicit tables "makes it possible to automatically populate a two-dimensional data structure with the structured information from the implicit table." Id. at ¶¶ 8; 77. First, the extraction is based on "comparing the vertical separation of the dominant rows," (i.e. locations of horizontal white space). Id. at ¶ 63. Second, the extraction is based on "comparing spatial positions of white spaces of the dominant rows with one another" for "ensuring accuracy in locating columns by finding a consensus among the dominant rows." The method generates a "column separator line" based on comparing white spaces within the dominant rows (i.e. comparing vertical white space). Id. at ¶¶ 35-38; See Also FIG. 3A-B (steps 318-322). The extracted information is used to populate a data structure "into a useful format, such as a relational database" (i.e. second electronic document). Id. at ¶¶ 6; 77; 79.
	Mukhopadhyay does not expressly disclose receiving locations of a plurality of character bounding boxes for each character from a plurality of characters in the first electronic document; and identifying based on the locations of the plurality of character bounding boxes, a plurality of word bounding boxes, each word bounding box from the plurality of word bounding boxes.
	Duta discloses receiving locations of a plurality of character bounding boxes for each character from a plurality of characters in the first electronic document; and identifying based on the locations of the plurality of character bounding boxes, a plurality of word bounding boxes, each word bounding box from the plurality of word bounding boxes.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the table extraction method using word bounding boxes of Mukhopadhyay to incorporate the table extraction method using character bounding boxes as taught by Duta. One of ordinary skill in the art would be motivated to integrate character bounding boxes into Mukhopadhyay, with a reasonable expectation of success, in order to "operate on arbitrary documents having any combination of handwritten characters and machine-generated characters" and to use character positions to extract tables "without any requirement to consider existing table definitions, existing table lines that differentiate rows, columns, and/or cells." See Duta, ¶ 31. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAVITA PADMANABHAN, can be reached on (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FRANK D MILLS/
Primary Examiner, Art Unit 2176
November 20, 2021