DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
1.	This office action is in response to the application received 11/16/2020.
2.	Claims 1-31 are pending in the application. Claims 1, 18, and 31 are independent claims.



Information Disclosure Statement
3.	The information disclosure statements (IDS) submitted on 11/16/2020 and 2/9/2022 were filed after the mailing date of the application on 11/16/2020. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.




Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1, 2, 4-28, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Allen et al., PGPub. 2020/0302114 (filed 6/11/20) in view of Taylor et al., USPN 5,848,148 (filed 6/30/95).
In reference to independent claim 1, Allen teaches:
	obtaining an electronic document specifying a graphical layout of content items, the content items including at least text (See Allen, para. 0033-0036) a means of receiving an original document which includes both a graphical layout of content items such as a table and textual content.
	determining pixel data representing the graphical layout of the content items (See Allen, para. 0033-0036) a means of determining lines of a table and lines of text to identify the structure of portions of the original document, which can include PDF, XML, or HTML documents. The reference fails to explicitly state identifying pixel data representing the graphical layout of the content items; however, the reference to Taylor (See Col. 4 and 5) teaches a means of determining, using pixel data representing the graphical layout of the document image, textual and non-text content items to correctly identify specific features of a document image. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have combined the reference to Allen, which teaches methods for identifying the structure of various types of documents for classification, with the reference to Taylor, which includes a means of utilizing pixel data to further identify text and non-text content structures within a document image, since it would have provided increased relevance of a classification from an OCR result.
	providing input data based, at least in part, on the pixel data to a document parsing model that is trained to: detect functional regions within the graphical layout based on the input data (See Allen, para. 0027-0036-0039) receiving input data into a model representing the document and content items within the document and further identifying a table, cells, and header region of the document. 
	assign boundaries to the functional regions based on the input data (See Allen, para. 0034) a means of identifying boundaries of a table including rows and columns.
	assign a category label to each functional region that is detected (See Allen, para. 0038) a means of classifying, using a model, to label divisions of the table as header and non-header.
	matching portions of the text to corresponding functional regions based on the boundaries assigned to the functional regions and locations associated with the portions of the text; and (See Allen, para. 0037-0039 and 0043-0045) each cell can be correlated with a row header and with a column header according to the function of the cells. A cell document can then be generated for each cell including information concerning, e.g., the contents of the cell, the row header and the column header, any captions for the table with headers. Further, the cell documents can be annotated and thus modified to include data for identifying relationships between terms within the cell documents.
	storing data representing the content items, the functional regions, and the category labels in a searchable data structure (See Allen, para. 0030 and 0039) a means of creating a cell document for each cell including information concerning the contents of the cell, the row header and the column header, any captions for the table with headers and/or for the cell, surrounding text, table headers, titles. The cell documents can be communicated to a database to be searched.
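For illustration only, the matching and storing limitations of claim 1 can be pictured with the following minimal sketch. It is not taken from Allen, Taylor, or the application; the names Region, TextSpan, match_text_to_regions, and build_index are hypothetical, and the sketch assumes that each functional region has an axis-aligned rectangular boundary and that each text portion has a single (x, y) location.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical functional region: a category label plus a boundary box."""
    label: str                      # e.g. "page header", "table"
    box: tuple                      # (x0, y0, x1, y1), assumed axis-aligned
    texts: list = field(default_factory=list)


@dataclass
class TextSpan:
    """Hypothetical text portion with a single location on the page."""
    text: str
    x: float
    y: float


def contains(box, x, y):
    """Return True if the point (x, y) lies inside or on the boundary box."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def match_text_to_regions(regions, spans):
    """Attach each span to the first region whose boundary contains it.

    Returns the spans that fall outside every region.
    """
    unmatched = []
    for span in spans:
        for region in regions:
            if contains(region.box, span.x, span.y):
                region.texts.append(span.text)
                break
        else:
            unmatched.append(span)
    return unmatched


def build_index(regions):
    """Store matched text keyed by category label (a simple searchable structure)."""
    index = {}
    for region in regions:
        index.setdefault(region.label, []).extend(region.texts)
    return index
```

In this sketch a lookup such as build_index(regions)["table"] would return the text matched to table regions; a trained document parsing model, as recited in the claim, would supply the boundaries and labels that are hard-coded by the caller here.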
In reference to dependent claim 2, Allen teaches:
	Wherein the pixel data defines a plurality of display elements to render a display of the electronic document and each display element encodes at least one color bit representing a display color of the display element (See Allen, para. 0033-0036) a means of determining lines of a table and lines of text to identify the structure of portions of the original document, which can include PDF, XML, or HTML documents. The reference fails to explicitly state identifying pixel data representing the graphical layout of the content items; however, the reference to Taylor (See Col. 5 and 6) teaches a means of determining, using non-black pixel encoded data representing the graphical layout of the document image, textual and non-text content items to correctly identify specific features of a document image. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have combined the reference to Allen, which teaches methods for identifying the structure of various types of documents for classification, with the reference to Taylor, which includes a means of utilizing pixel data to further identify text and non-text content structures within a document image, since it would have provided increased relevance of a classification from an OCR result.
In reference to dependent claim 4, Allen teaches:
	Generating one or more search heuristics based on the content items, the functional regions, the category labels, or a combination thereof; and storing the one or more search heuristics for use when searching the searchable data structure (See Allen, para. 0042 and 0078) The domain-specific dictionary can form relationships between terms to identify, e.g., synonyms, relationships, definitions, and other relationships for building context and concepts of the terms used. The annotated cell documents are stored in a search database, wherein the document that includes information contained in tables can be made searchable with the annotated cell documents because each cell of the tables in the document includes relational and linguistic information.
In reference to dependent claim 5, Allen teaches:
	Receiving a search query related to a document corpus that includes the electronic document; accessing the one or more search heuristics; generating an augmented search query based on the search query and the one or more search heuristics; and searching the document corpus using the augmented search query (See Allen, para. 0051-0057) a means of receiving a query for information related to the table and utilizing a search engine to determine an augmented search strategy according to items and relations in the query. The search engine can perform natural language processing on the query to structure the query before searching the annotated cell documents according to concepts rather than precise words.
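For illustration only, the query augmentation recited in claim 5 can be pictured with the following minimal sketch. It is not taken from Allen or the application; it assumes that the stored search heuristics take the form of a synonym dictionary, and the names augment_query and search are hypothetical.

```python
def augment_query(query, synonyms):
    """Expand each query word with its stored synonyms, keeping first-seen order.

    `synonyms` is an assumed heuristic store: word -> list of alternative terms.
    """
    terms = []
    for word in query.lower().split():
        if word not in terms:
            terms.append(word)
        for alt in synonyms.get(word, []):
            if alt not in terms:
                terms.append(alt)
    return terms


def search(corpus, terms):
    """Return ids of documents whose text contains any of the augmented terms."""
    return [
        doc_id
        for doc_id, text in corpus.items()
        if any(term in text.lower() for term in terms)
    ]
```

Under these assumptions, a query for "revenue" would also retrieve a document that only mentions "sales", which is one simple way a search could operate on concepts rather than precise words.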
In reference to dependent claim 6, Allen teaches:
	Wherein the functional regions detected by the document parsing model include two or more of a page header, a page footer, a section heading, a paragraph, a table, an image, a footnote, or a list (See Allen, para. 0027-0036-0039) receiving input data into a model representing the document and content items within the document and further identifying a table, cells, and header region of the document. 
In reference to dependent claim 7, Allen teaches:
	Estimating column boundaries and row boundaries based on the input data associated with the particular functional region (See Allen, para. 0034) a means of identifying boundaries and the physical structure of tables, including rows and columns. 
	Determining a column heading of a column based on the text associated with the particular functional region (See Allen, para. 0036) detecting a column header for a table structure based on text associated with the table.
	Storing a portion of the text associated within the particular functional region in a first data element of the searchable data structure; and (See Allen, para. 0039) a means of generating cell documents using information from the headers including captions for the headers, cell information, surrounding text.
	Storing the column heading of the column in a second data element, wherein the first data element is subordinate to the second data element in the searchable data structure (See Allen, para. 0039) Each cell can be correlated with a row header and with a column header according to the location of the cell. A cell document can then be generated for each cell including information concerning the contents of the cell.
In reference to dependent claim 8, Allen teaches:
	Wherein determining the column heading includes using a natural language processing model to determine a semantic group represented by text of the column (See Allen, para. 0041) Cell documents generated for a table can undergo natural language processing and thus recognizing relationships and language used in a table. 
In reference to dependent claim 9, Allen teaches:
	Wherein the data specifying the graphical layout of the content items … (See Allen, para. 0034-0039) a means of utilizing the data input to identify textual features for particular text within the table. Assigning header and non-header labels based upon the identified textual features.
In reference to dependent claim 10, Allen teaches:
	Wherein the data specifying the graphical layout of the content items … (See Allen, para. 0034-0039) a means of utilizing the input to detect the spacing between characters and further determining a classification of the table content as header or non-header based on the information input.
In reference to dependent claim 11, Allen teaches:
	Wherein the data specifying the graphical layout of the content indicates a background color associated with a particular functional region … (See Allen, para. 0034-0039) a means of utilizing the data input to identify textual features for particular text within the table. Assigning header and non-header labels based upon the identified textual features.
In reference to dependent claim 12, Allen teaches:
	Wherein the text includes one or more special characters … (See Allen, para. 0034-0039) a means of utilizing the data input to identify textual features including special characters for particular text within the table. Assigning header and non-header labels based upon the identified textual features. (See Allen, para. 0037) a model trained to assign a header or non-header label to a particular table based upon data received with the particular table.
In reference to dependent claim 13, Allen teaches:
	Wherein the document parsing model is trained to assign a first category label to a particular functional region based on a probabilistic analysis of the pixel data associated with the particular functional region (See Allen, para. 0037) a model trained to assign a header or non-header label to a particular table based upon data received with the particular table.
The reference fails to explicitly state identifying pixel data representing the graphical layout of the content items; however, the reference to Taylor (See Col. 4 and 5) teaches a means of determining, using pixel data representing the graphical layout of the document image, textual and non-text content items to correctly identify specific features of a document image. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have combined the reference to Allen, which teaches methods for identifying the structure of various types of documents for classification, with the reference to Taylor, which includes a means of utilizing pixel data to further identify text and non-text content structures within a document image, since it would have provided increased relevance of a classification from an OCR result.
In reference to dependent claim 14, Allen teaches:
	Wherein the input data is further based on the text, wherein the document parsing model is trained to assign a particular category label to a particular functional region further based on a semantic analysis of text associated with the particular functional region (See Allen, para. 0037) a model trained to assign a header or non-header label to a particular table based upon data received with the particular table. By extracting relationships amongst recognized words in the cell documents, the cell recognition model can annotate the cell documents with the corresponding relationships. The relationships can include, e.g., case restoration, parsing, semantic role labeling.
In reference to dependent claim 15, Allen teaches:
	Wherein the searchable data structure has a smaller in-memory footprint than the electronic document (See Allen, para. 0022) creating a cell document for each cell in each table that defines the cell contents as well as the relationship of the cell to the headers.
In reference to dependent claim 16, Allen teaches:
	Further comprising determining a topology of the searchable data structure based on an arrangement of information in the electronic document (See Allen, para. 0037-0039 and 0043-0045) each cell can be correlated with a row header and with a column header according to the function of the cells. A cell document can then be generated for each cell including information concerning, e.g., the contents of the cell, the row header and the column header, any captions for the table with headers. Further, the cell documents can be annotated and thus modified to include data for identifying relationships between terms within the cell documents. (See Allen, para. 0030 and 0039) a means of creating a cell document for each cell including information concerning the contents of the cell, the row header and the column header, any captions for the table with headers and/or for the cell, surrounding text, table headers, titles. The cell documents can be communicated to a database to be searched.
In reference to dependent claim 17, Allen teaches:
	Wherein the document parsing model is trained using labeled training data based on a corpus of electronic documents, each electronic document of the corpus including a plurality of identified functional regions and a respective category label for each of the identified functional regions (See Allen, para. 0045-0047) as a result, the cell recognition model can be trained against the training cell documents and annotated cell documents which include added labels for the tables.
In reference to independent claim 18, the claim recites a system for carrying out similar limitations to those found in the method claim 1. Therefore, the claim is rejected under similar rationale.
In reference to dependent claims 19-28, and 30, the claims recite a system for carrying out similar limitations to those found in the method claims numbered 6-16, respectively. Therefore, the claims are rejected under similar rationale. 
In reference to independent claim 31, the claim recites a computer readable medium storing computer executable instructions for carrying out similar limitations to those found in independent claim 1. Therefore, the claim is rejected under similar rationale.




6.	Claims 3 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Allen in view of Taylor as applied to claim 1 above, and further in view of Feng et al., PGPub. 2010/0312728 (filed 8/18/2010).
In reference to dependent claim 3, Allen teaches:
	Wherein the searchable data structure comprises a tree structure having a plurality of leaf nodes, each leaf node associated with a corresponding branch node, and wherein the content items are assigned to nodes of the tree structure such that a hierarchy of the functional regions is represented in the tree structure (See Allen, para. 0030 and 0039) a means of creating a cell document for each cell including information concerning the contents of the cell, the row header and the column header, any captions for the table with headers and/or for the cell, surrounding text, table headers, titles. The cell documents can be communicated to a database to be searched. The reference fails to explicitly teach a tree structure having a plurality of leaf nodes, each leaf node associated with a corresponding branch node such that the functional regions are represented in the tree structure. However, the reference to Feng (See para. 0026) teaches a means of segmenting a web page and grouping text nodes on a web page into a sequence of information blocks. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have combined the references to Allen and Taylor, which teach storing identified document elements into a document structure to be searched, with the reference to Feng, which teaches a web page segmentation task that groups text nodes on a web page into a sequence of information blocks, since it would have provided increased structure to the document elements to be searched.
In reference to dependent claim 29, the claim recites a system for carrying out a similar step to that found in the method of dependent claim 3. Therefore, the claim is rejected under similar rationale. 
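For illustration only, the tree structure recited in claim 3 can be pictured with the following minimal sketch. It is not taken from Allen, Taylor, Feng, or the application; the Node class and the find_paths helper are hypothetical, and the sketch assumes that table cells are stored as leaf nodes under a branch node holding their column heading.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: a category label, optional content, and children."""
    label: str
    content: Optional[str] = None
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def is_leaf(self):
        """A leaf node has no children."""
        return not self.children


def find_paths(node, term, path=()):
    """Depth-first search returning the label path to every node whose content contains term."""
    here = path + (node.label,)
    hits = []
    if node.content is not None and term.lower() in node.content.lower():
        hits.append(here)
    for child in node.children:
        hits.extend(find_paths(child, term, here))
    return hits
```

In this sketch each matching cell is reported together with the chain of branch nodes above it, so the hierarchy of functional regions (document, table, column heading, cell) is recoverable from the search result itself.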




Conclusion
7.	The examiner recommends adding language to the independent claims to further detail the functional regions (Spec. para. 0030-0031) and how they are identified. Also, the examiner recommends further defining the searchable data structure (i.e., nodes), how the tree structure of nodes is created for specific content items, and how the functional regions are represented in the tree structure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW J LUDWIG whose telephone number is (571)272-4127. The examiner can normally be reached Mon-Fri, 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached on 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MATTHEW J. LUDWIG
Examiner
Art Unit 2178



/STEPHEN S HONG/
Supervisory Patent Examiner, Art Unit 2178